
[Bug - Actions] All scraping engines failed! #884

Open
yupingsong-anylink-io opened this issue Nov 11, 2024 · 9 comments
Assignees
Labels
bug Something isn't working

Comments

yupingsong-anylink-io commented Nov 11, 2024

Describe the Bug
When using FirecrawlApp.scrape_url to scrape a page, the following error is returned:
Error: Internal Server Error: Failed to scrape URL. (Internal server error) - All scraping engines failed! - No additional error details provided.
The same code previously worked.

Environment (please complete the following information):

  • OS: Windows
  • Firecrawl Version: 1.3.1
  • Python Version: 3.10

Logs

Scrape {my url} failed. Error: Internal Server Error: Failed to scrape URL. (Internal server error) - All scraping engines failed! - No additional error details provided.

Additional Context

import json
import os
from typing import List

from firecrawl import FirecrawlApp
from pydantic import BaseModel

crawler = FirecrawlApp(api_key=os.getenv("FIRECRAWL_API_KEY"))


class ExtractSchema(BaseModel):
    urls: List[str]


def search_by_keyword(search_url: str, key_word: str) -> str:
    print(f"Starting search with keyword: {key_word}")
    try:
        scrape_result = crawler.scrape_url(
            search_url,
            params={
                "formats": ["extract"],
                # Specify the HTML tags, classes, and ids to include in the response.
                "includeTags": ["#t table.eps-table td.views-field-dummy-notice-title a"],
                # A prompt for the LLM to extract the data in the correct structure.
                "extract": {
                    "prompt": "Extract the url of <a> element",
                    "schema": ExtractSchema.model_json_schema()
                },
                "actions": [
                    {"type": "wait", "milliseconds": 2000},
                    {"type": "click", "selector": "#edit-words--7"},
                    {"type": "wait", "milliseconds": 500},
                    {"type": "write", "text": key_word},
                    {"type": "wait", "milliseconds": 500},
                    {"type": "press", "key": "Enter"},
                    {"type": "wait", "milliseconds": 8000}
                ]
            }
        )
        print("Search completed. Processing results...")
        return json.dumps(scrape_result)
    except Exception as e:
        print(f"Scrape {search_url} failed. Error: {e}")
        return ""
yupingsong-anylink-io added the bug label Nov 11, 2024
longmans commented Nov 11, 2024

Me, too. Mac, Python 3.12.4

mogery (Member) commented Nov 11, 2024

Can you please share an example URL where this fails?

mogery self-assigned this Nov 11, 2024
mogery (Member) commented Nov 11, 2024

I believe this may be fixed by f097cdd, but it is hard to debug without a URL.

yupingsong-anylink-io (Author) commented

The search_url is https://canadabuys.canada.ca/en/tender-opportunities?search_filter=&status%5B87%5D=87&status%5B1920%5D=1920&pub%5B3%5D=3&record_per_page=50&current_tab=t&words=, and the key_word is Walkway.

mogery (Member) commented Nov 13, 2024

Hi there, I cannot recreate the issue anymore. Is it fixed for you as well?

yupingsong-anylink-io (Author) commented

I just confirmed it's fixed on my end too, thanks!

Was this repaired on the Firecrawl server side, or was the cause something local to me? Knowing would make it easier to diagnose if the problem recurs.

mogery (Member) commented Nov 14, 2024

This was repaired server-side in commit f097cdd: we weren't accounting for the wait actions in our timeout logic.
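The timeout fix described above can be sketched roughly like this (a hypothetical helper, not Firecrawl's actual code): sum the durations of the explicit wait actions and add them to the base request timeout, so long waits no longer eat into the scrape budget.

```python
def effective_timeout_ms(base_timeout_ms: int, actions: list) -> int:
    """Add the total duration of explicit "wait" actions to the base timeout.

    Hypothetical illustration of the server-side fix described above,
    not Firecrawl's actual implementation.
    """
    wait_total = sum(
        a.get("milliseconds", 0) for a in actions if a.get("type") == "wait"
    )
    return base_timeout_ms + wait_total


# The actions from the original report wait 2000 + 500 + 500 + 8000 ms in
# total, so a 30-second base timeout becomes 41 seconds.
actions = [
    {"type": "wait", "milliseconds": 2000},
    {"type": "click", "selector": "#edit-words--7"},
    {"type": "wait", "milliseconds": 500},
    {"type": "write", "text": "Walkway"},
    {"type": "wait", "milliseconds": 500},
    {"type": "press", "key": "Enter"},
    {"type": "wait", "milliseconds": 8000},
]
print(effective_timeout_ms(30_000, actions))  # 41000
```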

mogery closed this as completed Nov 14, 2024
aakriti-14 commented Nov 20, 2024

I am also facing a similar issue and getting this error:

{"success":false,"error":"(Internal server error) - All scraping engines failed! -- Double check the URL to make sure it's not broken. If the issue persists, contact us at [[email protected]](mailto:[email protected])."}

I am providing these actions:

actions = [
    {"type": "wait", "milliseconds": 2000},  # Wait before clicking
    {"type": "click", "selector": 'button[data-v-257ec5c0]'},  # Click the "Show More" button
    ...
    # (repeated 50 times)
    {"type": "scrape"}
]

I am using a timeout of over 10 minutes as well. Can you please help here, @mogery?
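For reference, a wait/click sequence repeated this many times can be generated instead of written out by hand; a minimal sketch (the selector is the one from the snippet above, the helper name is made up):

```python
def build_show_more_actions(repeats: int,
                            selector: str = 'button[data-v-257ec5c0]') -> list:
    """Build `repeats` wait-then-click action pairs, ending with a scrape action."""
    actions = []
    for _ in range(repeats):
        actions.append({"type": "wait", "milliseconds": 2000})   # Wait before clicking
        actions.append({"type": "click", "selector": selector})  # Click "Show More"
    actions.append({"type": "scrape"})
    return actions


actions = build_show_more_actions(50)
print(len(actions))  # 101 entries: 50 wait/click pairs plus the final scrape
```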

rafaelsideguide (Collaborator) commented

I tested with 3 runs of the following code, and "All scraping engines failed!" still occurs for roughly half of the scrapes.

Testing code:

import FirecrawlApp from "@mendable/firecrawl-js";

const app = new FirecrawlApp({ apiKey: "fc-<redacted>" });

const main = async () => {
    let allEnginesFailedCounter = 0;
    let successCounter = 0;
    let otherErrorCounter = 0;

    for (let i = 0; i < 100; i++) {
      console.log(`Crawl: ${i + 1}`);
      try {
          const constructedUrl = 'https://www.bolagsfakta.se/5566352844-Runlack_Industrilackering_AB';
          const scrapeResponse = await app.scrapeUrl(constructedUrl, {
              formats: ["html"],
              actions: [
                  {
                      type: "wait",
                      milliseconds: 5000
                  },
                  {
                      type: "click",
                      selector: "#report-container > div:nth-child(18) > div > div > div.row > div > div > table:nth-child(4) > tbody > tr:nth-child(6)"
                  }
              ],
              onlyMainContent: false,
          });

          if (!scrapeResponse.success) {
              otherErrorCounter++;
          } else {
              successCounter++;
          }
      } catch (error) {
          if (error.message.includes("All scraping engines failed!")) {
              allEnginesFailedCounter++;
          } else {
              otherErrorCounter++;
          }
      }
    }

    console.log({
        allEnginesFailedCounter,
        successCounter,
        otherErrorCounter
    })
}

main()

run 1:

allEnginesFailedCounter: 47,
successCounter: 53,
otherErrorCounter: 0

run 2:

allEnginesFailedCounter: 43,
successCounter: 56,
otherErrorCounter: 1

run 3:

allEnginesFailedCounter: 50,
successCounter: 49,
otherErrorCounter: 1

@mogery @tomkosm any ideas?

linear bot changed the title from [Bug] All scraping engines failed! to [Bug - Actions] All scraping engines failed! Dec 9, 2024