RequestList messes up URLs containing characters like ' or * when populated with requestsFromUrl #2755

Open
1 task
webrdaniel opened this issue Nov 29, 2024 · 1 comment
Labels
bug Something isn't working. t-tooling Issues with this label are in the ownership of the tooling team.

Comments

@webrdaniel
Collaborator

Which package is this bug report for? If unsure which one to select, leave blank

None

Issue description

RequestList seems to mangle URLs containing characters like ' or * when it is populated via requestsFromUrl.
It uses a regex to extract the URLs from the downloaded file, but real-world URLs might not follow the proper spec, so the match can stop early.
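
To illustrate the suspected failure mode, here is a minimal sketch. `SIMPLE_URL_REGEX`, `extractUrlsByRegex` and `extractUrlsByLine` are hypothetical names, and the pattern is a simplified stand-in, not the regex Crawlee actually uses. It only shows how a path character class that omits ' cuts a URL short, and how a line-based split (assuming the file holds one URL per line) would avoid that.

```javascript
// Hypothetical, simplified URL pattern (NOT Crawlee's real regex).
// The path character class deliberately omits ' and *.
const SIMPLE_URL_REGEX = /https?:\/\/[^\s\/]+(\/[-\w@:%+.~#?&\/=()]*)?/g;

// Regex-based extraction: the match ends at the first character
// that the path character class does not allow.
function extractUrlsByRegex(text) {
    return text.match(SIMPLE_URL_REGEX) ?? [];
}

// Alternative sketch, assuming one URL per line: split on newlines
// and keep every line that the WHATWG URL parser accepts.
function extractUrlsByLine(text) {
    return text
        .split(/\r?\n/)
        .map((line) => line.trim())
        .filter((line) => {
            try {
                new URL(line);
                return true;
            } catch {
                return false;
            }
        });
}

const listing = "https://www.zillow.com/homedetails/141-O'Canoe-Pl-Hampton-VA-23661/74398007_zpid/";

const truncated = extractUrlsByRegex(listing);
console.log(truncated);
// → [ 'https://www.zillow.com/homedetails/141-O' ]

const intact = extractUrlsByLine(`${listing}\n`);
console.log(intact[0] === listing);
// → true
```

The line-based variant trades flexibility for fidelity: it no longer finds URLs embedded in free text, but it passes through any line the URL parser can handle.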

Code sample

import { RequestList } from '@crawlee/core';

const startUrls1 = await RequestList.open('startUrls1', [
    {
        "requestsFromUrl": "https://pastebin.com/raw/VHLFnh2h"
    }
]);
// Same as above, but the URL is passed directly instead of via an external file
const startUrls2 = await RequestList.open('startUrls2', [
    {
        "url": "https://www.zillow.com/homedetails/141-O'Canoe-Pl-Hampton-VA-23661/74398007_zpid/"
    }
]);
console.log(startUrls1.requests);
console.log(startUrls2.requests);

Package version

v3.12.0

Node.js version

20.18.1

Operating system

No response

Apify platform

  • Tick me if you encountered this issue on the Apify platform

I have tested this on the next release

No response

Other context

No response

@webrdaniel webrdaniel added the bug Something isn't working. label Nov 29, 2024
@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Nov 29, 2024
@B4nan
Member

B4nan commented Nov 29, 2024

Please keep the link to the Slack thread in such reports so we have the additional context too.

https://apify.slack.com/archives/C0L33UM7Z/p1732817024848529
