
[Bug - Actions] All scraping engines failed! #884

Open
yupingsong-anylink-io opened this issue Nov 11, 2024 · 9 comments
Assignees
Labels
bug Something isn't working

Comments

yupingsong-anylink-io commented Nov 11, 2024

Describe the Bug
When using FirecrawlApp.scrape_url to scrape a page, the following error is returned:
Error: Internal Server Error: Failed to scrape URL. (Internal server error) - All scraping engines failed! - No additional error details provided.
The same code previously worked.

Environment (please complete the following information):

  • OS: Windows
  • Firecrawl Version: 1.3.1
  • Python Version: 3.10

Logs

Scrape {my url} failed. Error: Internal Server Error: Failed to scrape URL. (Internal server error) - All scraping engines failed! - No additional error details provided.

Additional Context

import json
import os
from typing import List

from firecrawl import FirecrawlApp
from pydantic import BaseModel

crawler = FirecrawlApp(api_key=os.getenv("FIRECRAWL_API_KEY"))


class ExtractSchema(BaseModel):
    urls: List[str]


def search_by_keyword(search_url: str, key_word: str) -> str:
    print(f"Starting search with keyword: {key_word}")
    try:
        scrape_result = crawler.scrape_url(
            search_url,
            params={
                "formats": ["extract"],
                # Specify the HTML tags, classes, and ids to include in the response.
                "includeTags": ["#t table.eps-table td.views-field-dummy-notice-title a"],
                # A prompt for the LLM to extract the data in the correct structure.
                "extract": {
                    "prompt": "Extract the url of <a> element",
                    "schema": ExtractSchema.model_json_schema()
                },
                "actions": [
                    {"type": "wait", "milliseconds": 2000},
                    {"type": "click", "selector": "#edit-words--7"},
                    {"type": "wait", "milliseconds": 500},
                    {"type": "write", "text": key_word},
                    {"type": "wait", "milliseconds": 500},
                    {"type": "press", "key": "Enter"},
                    {"type": "wait", "milliseconds": 8000}
                ]
            }
        )
        print("Search completed. Processing results...")
        return json.dumps(scrape_result)
    except Exception as e:
        print(f"Scrape {search_url} failed. Error: {e}")
        return ""
yupingsong-anylink-io added the bug label Nov 11, 2024
longmans commented Nov 11, 2024

Me, too. Mac, Python 3.12.4

mogery (Member) commented Nov 11, 2024

Can you please share an example URL where this fails?

mogery self-assigned this Nov 11, 2024
mogery (Member) commented Nov 11, 2024

I believe this may be fixed by f097cdd, but it is hard to debug without a URL.

yupingsong-anylink-io (Author) commented

The search_url is https://canadabuys.canada.ca/en/tender-opportunities?search_filter=&status%5B87%5D=87&status%5B1920%5D=1920&pub%5B3%5D=3&record_per_page=50&current_tab=t&words=, and the key_word is Walkway.

mogery (Member) commented Nov 13, 2024

Hi there, I cannot recreate the issue anymore. Is it fixed for you as well?

yupingsong-anylink-io (Author) commented

I just confirmed it's fixed on my end too, thanks!

Was this repaired on the Firecrawl server side, or was the cause something local to me? Knowing would make it easier to diagnose if the problem recurs.

mogery (Member) commented Nov 14, 2024

This was repaired server-side in commit f097cdd: we weren't accounting for the wait actions in our timeout logic.
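The timeout fix described above can be sketched roughly like this (a hypothetical helper, not Firecrawl's actual code): sum the durations of the explicit wait actions and add them to the base request timeout, so long waits no longer eat into the scrape budget.

```python
def effective_timeout_ms(base_timeout_ms: int, actions: list) -> int:
    """Add the total duration of explicit "wait" actions to the base timeout.

    Hypothetical illustration of the server-side fix described above,
    not Firecrawl's actual implementation.
    """
    wait_total = sum(
        a.get("milliseconds", 0) for a in actions if a.get("type") == "wait"
    )
    return base_timeout_ms + wait_total


# The actions from the original report wait 2000 + 500 + 500 + 8000 ms in
# total, so a 30-second base timeout becomes 41 seconds.
actions = [
    {"type": "wait", "milliseconds": 2000},
    {"type": "click", "selector": "#edit-words--7"},
    {"type": "wait", "milliseconds": 500},
    {"type": "write", "text": "Walkway"},
    {"type": "wait", "milliseconds": 500},
    {"type": "press", "key": "Enter"},
    {"type": "wait", "milliseconds": 8000},
]
print(effective_timeout_ms(30_000, actions))  # 41000
```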

mogery closed this as completed Nov 14, 2024
aakriti-14 commented Nov 20, 2024

I am also facing a similar issue and getting this error:

{"success":false,"error":"(Internal server error) - All scraping engines failed! -- Double check the URL to make sure it's not broken. If the issue persists, contact us at [[email protected]](mailto:[email protected])."}

I am providing these actions:

actions = [
    {"type": "wait", "milliseconds": 2000},  # Wait before clicking
    {"type": "click", "selector": 'button[data-v-257ec5c0]'},  # Click the "Show More" button
    ...
    # (repeated 50 times)
    {"type": "scrape"}
]

I am using a timeout of over 10 minutes as well. Can you please help here, @mogery?
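For reference, a wait/click sequence repeated this many times can be generated instead of written out by hand; a minimal sketch (the selector is the one from the snippet above, the helper name is made up):

```python
def build_show_more_actions(repeats: int,
                            selector: str = 'button[data-v-257ec5c0]') -> list:
    """Build `repeats` wait-then-click action pairs, ending with a scrape action."""
    actions = []
    for _ in range(repeats):
        actions.append({"type": "wait", "milliseconds": 2000})   # Wait before clicking
        actions.append({"type": "click", "selector": selector})  # Click "Show More"
    actions.append({"type": "scrape"})
    return actions


actions = build_show_more_actions(50)
print(len(actions))  # 101 entries: 50 wait/click pairs plus the final scrape
```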

rafaelsideguide (Collaborator) commented

I tested with 3 runs of the following code, and "All scraping engines failed!" still occurs for roughly half of the scrapes.

Testing code:

import FirecrawlApp from "@mendable/firecrawl-js";

const app = new FirecrawlApp({ apiKey: "fc-<redacted>" });

const main = async () => {
    let allEnginesFailedCounter = 0;
    let successCounter = 0;
    let otherErrorCounter = 0;

    for (let i = 0; i < 100; i++) {
      console.log(`Crawl: ${i + 1}`);
      try {
          const constructedUrl = 'https://www.bolagsfakta.se/5566352844-Runlack_Industrilackering_AB';
          const scrapeResponse = await app.scrapeUrl(constructedUrl, {
              formats: ["html"],
              actions: [
                  {
                      type: "wait",
                      milliseconds: 5000
                  },
                  {
                      type: "click",
                      selector: "#report-container > div:nth-child(18) > div > div > div.row > div > div > table:nth-child(4) > tbody > tr:nth-child(6)"
                  }
              ],
              onlyMainContent: false,
          });

          if (!scrapeResponse.success) {
              otherErrorCounter++;
          } else {
              successCounter++;
          }
      } catch (error) {
          if (error.message.includes("All scraping engines failed!")) {
              allEnginesFailedCounter++;
          } else {
              otherErrorCounter++;
          }
      }
    }

    console.log({
        allEnginesFailedCounter,
        successCounter,
        otherErrorCounter
    })
}

main()

run 1:

allEnginesFailedCounter: 47,
successCounter: 53,
otherErrorCounter: 0

run 2:

allEnginesFailedCounter: 43,
successCounter: 56,
otherErrorCounter: 1

run 3:

allEnginesFailedCounter: 50,
successCounter: 49,
otherErrorCounter: 1

@mogery @tomkosm any ideas?

linear bot changed the title from [Bug] All scraping engines failed! to [Bug - Actions] All scraping engines failed! Dec 9, 2024