(BQ Python) Fix streaming with large loads by performing job waits in finish_bundle #23012

Merged: 12 commits, Sep 14, 2022


ahmedabu98
Contributor

@ahmedabu98 ahmedabu98 commented Sep 2, 2022

Streaming FILE_LOADS is currently not working with large loads. After the first load, it lags because WaitForBQJobs is only triggered once with beam.Create([None]). These changes move the wait into the finish_bundle of its respective step.

Note: Streaming with small loads is done in one step and thus was not getting stuck.
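The idea of waiting in finish_bundle can be illustrated with a minimal sketch. It does not use the real Beam or BigQuery APIs: `FakeBigQueryClient`, `TriggerLoadJobsSketch`, and `run_bundle` are hypothetical stand-ins that only mimic a DoFn's bundle lifecycle, on the assumption that each bundle starts some load jobs and should wait on exactly those jobs before it completes. A wait step fed by a single `beam.Create([None])` element would fire once; a wait in finish_bundle runs for every bundle.

```python
# Hypothetical stand-ins for illustration only; these are NOT the actual
# apache_beam classes or the BigQuery client used in bigquery_file_loads.py.

class FakeBigQueryClient:
    """Pretends to start load jobs and records each wait call."""

    def __init__(self):
        self.waited = []      # one entry per wait call
        self._next_id = 0

    def start_load_job(self, destination, files):
        # A real client would submit a load job here.
        self._next_id += 1
        return f"job-{self._next_id}"

    def wait_for_jobs(self, job_ids):
        # A real client would poll until every job is done.
        self.waited.append(list(job_ids))


class TriggerLoadJobsSketch:
    """Mimics the start_bundle / process / finish_bundle lifecycle."""

    def __init__(self, client):
        self.client = client
        self.pending_jobs = []

    def start_bundle(self):
        self.pending_jobs = []

    def process(self, element):
        destination, files = element
        job_id = self.client.start_load_job(destination, files)
        self.pending_jobs.append(job_id)
        yield destination, job_id

    def finish_bundle(self):
        # The wait happens here, once per bundle, for this bundle's jobs.
        if self.pending_jobs:
            self.client.wait_for_jobs(self.pending_jobs)
        self.pending_jobs = []


def run_bundle(fn, elements):
    """Drives one bundle through the lifecycle, as a runner would."""
    fn.start_bundle()
    outputs = []
    for element in elements:
        outputs.extend(fn.process(element))
    fn.finish_bundle()
    return outputs


client = FakeBigQueryClient()
fn = TriggerLoadJobsSketch(client)
first = run_bundle(fn, [("table_a", ["f1"]), ("table_b", ["f2"])])
second = run_bundle(fn, [("table_a", ["f3"])])
print(client.waited)  # [['job-1', 'job-2'], ['job-3']]
```

In this sketch the second bundle's job is waited on as well, which is the behavior a streaming pipeline needs when loads keep arriving after the first one.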

Also adding tests that more accurately determine if streaming with batch loads is working.

Needs #23011 to run tests correctly

Fixes #23104

@ahmedabu98
Contributor Author

Waiting for #23011

@github-actions
Contributor

github-actions bot commented Sep 2, 2022

Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:

R: @pabloem for label python.
R: @johnjcasey for label io.

Available commands:

  • `stop reviewer notifications` - opt out of the automated review tooling
  • `remind me after tests pass` - tag the comment author after tests pass
  • `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

…ing to separate outputs in the same finish_bundle
@codecov

codecov bot commented Sep 9, 2022

Codecov Report

Merging #23012 (80f399b) into master (ebacef9) will increase coverage by 0.00%.
The diff coverage is 79.48%.

@@           Coverage Diff           @@
##           master   #23012   +/-   ##
=======================================
  Coverage   73.58%   73.59%           
=======================================
  Files         716      716           
  Lines       95311    95327   +16     
=======================================
+ Hits        70138    70153   +15     
- Misses      23877    23878    +1     
  Partials     1296     1296           
| Flag | Coverage Δ | |
|---|---|---|
| python | 83.41% <79.48%> (+<0.01%) | ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| sdks/python/apache_beam/io/gcp/bigquery.py | 74.24% <ø> (ø) | |
| ...s/python/apache_beam/io/gcp/bigquery_file_loads.py | 87.24% <79.48%> (-0.47%) | ⬇️ |
| ...eam/runners/portability/fn_api_runner/execution.py | 92.44% <0.00%> (-0.65%) | ⬇️ |
| sdks/go/pkg/beam/util/gcsx/gcs.go | 27.41% <0.00%> (ø) | |
| sdks/go/pkg/beam/artifact/stage.go | 61.87% <0.00%> (ø) | |
| sdks/go/pkg/beam/io/filesystem/util.go | 96.29% <0.00%> (ø) | |
| sdks/go/pkg/beam/io/filesystem/memfs/memory.go | 96.15% <0.00%> (ø) | |
| ...hon/apache_beam/runners/worker/bundle_processor.py | 93.54% <0.00%> (ø) | |
| sdks/python/apache_beam/runners/direct/executor.py | 97.01% <0.00%> (+0.54%) | ⬆️ |
| sdks/python/apache_beam/internal/metrics/metric.py | 94.00% <0.00%> (+1.00%) | ⬆️ |

... and 2 more


@ahmedabu98
Contributor Author

Run Python 3.8 PostCommit

Development

Successfully merging this pull request may close these issues.

[Bug]: WriteToBigQuery with file_loads and dynamic table destination doesn't load after first File Load
2 participants