[Bug]: BigQueryIO taking a long time to initialize when using STORAGE_API_WRITE_INCONSISTENT with CREATE_IF_NEEDED and many tables #25112

Closed
reuvenlax opened this issue Jan 22, 2023 · 3 comments

Comments

@reuvenlax
Contributor

What happened?

With STORAGE_API_WRITE_INCONSISTENT, every table is written from every worker. The current implementation of CREATE_IF_NEEDED calls GetTable to determine whether a table exists, but GetTable has a default quota of 100 QPS. When writing to 1000 tables from 360 workers, it takes an hour just for all the workers to determine whether the tables exist!
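The one-hour figure follows from the call count. This assumes each of the 360 workers issues one GetTable call per table and that all of those calls draw on the same 100 QPS quota:

```latex
\[
N_{\text{calls}} = 1000\ \text{tables} \times 360\ \text{workers} = 360{,}000,
\qquad
t = \frac{360{,}000\ \text{calls}}{100\ \text{calls/s}} = 3600\ \text{s} = 1\ \text{h}.
\]
```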

Suggested fix: instead of checking for the table up front, we should go ahead with the normal Storage API write stream calls, detect errors that indicate the table doesn't exist, and create the table only in that case.
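A minimal sketch of the "write first, create on miss" idea, assuming a simplified client abstraction. `StorageWriteClient`, `TableNotFoundException`, `CreateOnMissingWriter`, and `InMemoryClient` are hypothetical names for illustration only; they are not BigQueryIO or BigQuery Storage API classes, and the actual fix may detect the missing table differently. The point is that the common case (table already exists) makes no metadata call at all, so the GetTable quota is no longer on the hot path.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical: stands in for whatever error the write path reports
// when the destination table does not exist.
class TableNotFoundException extends RuntimeException {
  TableNotFoundException(String table) {
    super("Table not found: " + table);
  }
}

// Hypothetical client abstraction; not a real BigQuery or Beam API.
interface StorageWriteClient {
  void appendRows(String table, String rows);

  // Assumed idempotent: many workers may race to create the same table,
  // so an "already exists" outcome must be treated as success.
  void createTable(String table);
}

final class CreateOnMissingWriter {
  private final StorageWriteClient client;

  CreateOnMissingWriter(StorageWriteClient client) {
    this.client = client;
  }

  /** Writes optimistically and creates the table only if the write reports it missing. */
  void write(String table, String rows) {
    try {
      client.appendRows(table, rows); // common case: no GetTable-style lookup
    } catch (TableNotFoundException e) {
      client.createTable(table); // pay for creation only on a miss
      client.appendRows(table, rows); // retry once after creating
    }
  }
}

// Hypothetical in-memory fake that counts calls, for illustration.
class InMemoryClient implements StorageWriteClient {
  final Set<String> existing = new HashSet<>();
  int appendCalls = 0;
  int createCalls = 0;

  @Override
  public void appendRows(String table, String rows) {
    appendCalls++;
    if (!existing.contains(table)) {
      throw new TableNotFoundException(table);
    }
  }

  @Override
  public void createTable(String table) {
    createCalls++;
    existing.add(table); // adding an existing name is a no-op, so this is idempotent
  }
}
```

With this shape, writing twice to a new table triggers one create and three appends (one failed, one retry, one normal). Subsequent writes cost only the append itself.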

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@damccorm
Contributor

Per my comment on #25113, I don't think this is a release blocker, so I'm going to remove the milestone in preparation for the release cut next Wednesday. If you disagree, please re-add the blocker and we can discuss.

@damccorm damccorm removed this from the 2.46.0 Release milestone Feb 17, 2023
reuvenlax added a commit that referenced this issue May 22, 2023
@kennknowles
Member

@reuvenlax @Abacn Is this expected to be resolved by the associated pull requests? Do we have a reliable reproduction and proof of resolution?

@Abacn
Contributor

Abacn commented Sep 6, 2023

Per #25113, this is fixed.

Note: the associated test has been observed to be flaky (#27314), though the flakiness appears to be infrequent.

@Abacn Abacn closed this as completed Sep 6, 2023
@github-actions github-actions bot added this to the 2.51.0 Release milestone Sep 6, 2023
cushon pushed a commit to cushon/beam that referenced this issue May 24, 2024
…to implement CREATE_IF_NEEDED to avoid low quotas
6 participants