
[Feat] Be able to pass a timeout param to the endpoints #59

Closed
nickscamara opened this issue Apr 24, 2024 · 10 comments

nickscamara commented Apr 24, 2024

Enable the user to pass a "timeout" parameter to both the scrape and the crawl endpoints. If the timeout is exceeded, send the user a clear error message. On the crawl endpoint, return any pages that have already been scraped, but include a message notifying the user that the timeout was exceeded.

If the task is completed within two days, we'll include a $10 tip :)

This is an intro bounty. We are looking for motivated people who will buy in so we can start to ramp up.
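The requested behavior can be sketched with a generic timeout wrapper; this is a minimal illustration, not Firecrawl's actual API (`withTimeout` and the error message are made-up names):

```typescript
// Minimal sketch: race an operation against a timeout.
// If the timeout wins, reject with a clear error message.
async function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("Timeout exceeded")), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    // Always clear the timer so the process can exit cleanly.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

A scrape handler could then wrap its page-fetch call in `withTimeout(fetchPage(url), timeoutMs)` and translate the rejection into an error response.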


ezhil56x commented Apr 25, 2024

@nickscamara
Can I get assigned?

@nickscamara (Member Author)

@ezhil56x all yours!

@nickscamara nickscamara changed the title [Feat]: Be able to pass a timeout param to /scrape endpoints [Feat]: Be able to pass a timeout param to the endpoints Apr 25, 2024
@ezhil56x

@nickscamara
Do we need a default timeout, or should the parameter be optional?

@kesh-007 kesh-007 mentioned this issue Apr 25, 2024
@rafaelsideguide rafaelsideguide changed the title [Feat]: Be able to pass a timeout param to the endpoints [Feat] Be able to pass a timeout param to the endpoints May 2, 2024

parthusun8 commented Jun 8, 2024

Hi, is this issue still open, or is someone already working on it?

@rafaelsideguide (Collaborator)

@parthusun8, the issue is still open, but fixing it would require some genuinely complex changes to our Bull queue system to allow the /crawl route to time out. So far, we've found that stopping an active job in Bull isn't possible, which means we'd have to rework the deepest parts of our system to add a timeout feature to Firecrawl.
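Since an active Bull job can't be killed from outside, one conceivable workaround is cooperative cancellation: the crawl loop checks its own deadline between pages and returns partial results, as the original request asks. This is only a sketch under that assumption, not what Firecrawl does; `crawlWithDeadline` and `scrapePage` are hypothetical names:

```typescript
// Sketch: check elapsed time between pages instead of killing the job.
// Returns whatever was scraped before the deadline, plus a timed-out flag.
async function crawlWithDeadline(
  urls: string[],
  scrapePage: (url: string) => Promise<string>,
  timeoutMs: number
): Promise<{ pages: string[]; timedOut: boolean }> {
  const deadline = Date.now() + timeoutMs;
  const pages: string[] = [];
  for (const url of urls) {
    if (Date.now() >= deadline) {
      // Deadline passed: return partial results and flag the timeout.
      return { pages, timedOut: true };
    }
    pages.push(await scrapePage(url));
  }
  return { pages, timedOut: false };
}
```

The trade-off is that a single slow page can still overrun the deadline, since the check only happens between pages.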

@rafaelsideguide (Collaborator)

@nickscamara should we close this for now?


haija45 commented Jul 5, 2024

Can I be assigned to this work?

@akay41024

/attempt #59
My implementation plan 👍

In the scrape endpoint, we use the scrapeUrl function and pass the timeout value as an option. If the scrape operation times out, we catch the TimeoutError and return a JSON response with a status code of 408 (Request Timeout).

In the crawl endpoint, we use the crawlUrl function and pass the timeout value as an option. If the crawl operation times out, we catch the TimeoutError and return a JSON response with a status code of 408 (Request Timeout). We also add a message to each page in the response indicating that the crawl timed out.
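The plan above could look roughly like the following; this is a hedged sketch, not Firecrawl's code. `scrapeUrl`, `TimeoutError`, and the handler shape are taken from the plan's wording but are assumptions about the actual implementation:

```typescript
// Assumed error type thrown when a scrape exceeds its timeout.
class TimeoutError extends Error {}

type ScrapeResult = { content: string };

// Express-style handler logic, reduced to a pure function for clarity.
// scrapeUrl is injected so the sketch stays self-contained.
async function handleScrape(
  scrapeUrl: (url: string, opts: { timeout: number }) => Promise<ScrapeResult>,
  url: string,
  timeout: number
): Promise<{ status: number; body: unknown }> {
  try {
    const data = await scrapeUrl(url, { timeout });
    return { status: 200, body: { success: true, data } };
  } catch (err) {
    if (err instanceof TimeoutError) {
      // 408 Request Timeout with a clear message, as the plan describes.
      return { status: 408, body: { success: false, error: "Scrape timed out" } };
    }
    throw err;
  }
}
```

The crawl endpoint would follow the same pattern, but attach the already-scraped pages to the 408 body along with the timeout message.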


algora-pbc bot commented Jul 31, 2024

@akay41024: Another person is already attempting this issue. Please don't start working on this issue unless you were explicitly asked to do so.

@algora-pbc algora-pbc bot removed the 💎 Bounty label Aug 26, 2024
@rafaelsideguide (Collaborator)

The queue management system (BullMQ) does not support this feature for crawl. For now, users can cancel a crawl via the DELETE endpoint. Closing this for now.

@rafaelsideguide rafaelsideguide closed this as not planned Sep 17, 2024
6 participants