feat(bigquery): use storage api for query jobs #6822
Conversation
Left a few initial comments
	return nil
}

func (it *arrowIterator) processStream(readStream string) {
The current implementation is appropriate for a unary call, but it could be refined to something more stream-oriented (e.g. fail after N consecutive failed attempts, and reset the counter when a request succeeds).
I think this is ready to ship once you address the remaining items. Left one more comment as a holdover from our chat today about selecting the correct statement from a multi-statement execution (script) where there are different types of statements.
// This function uses a naive approach of checking the root-level query
// (ignoring subqueries, function calls, etc.) and checking
// if it contains an ORDER BY clause.
func HasOrderedResults(sql string) bool {
I think this is a reasonable approach, though something like unmatched parens in an inline comment might muck this up. Getting this more right would require more SQL parsing than we'd want to do locally, and the resolution is fairly simple in these instances.
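A minimal sketch of that naive root-level check (a hypothetical helper, not the library's actual implementation): it drops everything inside parentheses, so subqueries and function calls are ignored, and then looks for ORDER BY in the remainder. As noted above, unmatched parens inside a comment would confuse it:

```go
package main

import (
	"fmt"
	"strings"
)

// hasOrderedResults drops all parenthesized content (subqueries,
// function calls) and checks whether the remaining root-level text
// contains an ORDER BY clause.
func hasOrderedResults(sql string) bool {
	var b strings.Builder
	depth := 0
	for _, r := range sql {
		switch r {
		case '(':
			depth++
		case ')':
			if depth > 0 {
				depth--
			}
		default:
			if depth == 0 {
				b.WriteRune(r)
			}
		}
	}
	return strings.Contains(strings.ToUpper(b.String()), "ORDER BY")
}

func main() {
	fmt.Println(hasOrderedResults("SELECT * FROM t ORDER BY x"))            // true
	fmt.Println(hasOrderedResults("SELECT ARRAY_AGG(x ORDER BY x) FROM t")) // false: ORDER BY is inside parens
}
```

Whether results are ordered matters here because an ordered query must be read as a single stream to preserve row order, while an unordered one can be split across parallel streams.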
🤖 I have created a release *beep* *boop*

---

## [1.46.0](https://togithub.com/googleapis/google-cloud-go/compare/bigquery/v1.45.0...bigquery/v1.46.0) (2023-02-06)

### Features

* **bigquery:** Add dataset/table collation ([#7235](https://togithub.com/googleapis/google-cloud-go/issues/7235)) ([9f7bbeb](https://togithub.com/googleapis/google-cloud-go/commit/9f7bbeb466bd7572544c4178a33370a25b5f1496))
* **bigquery:** Use storage api for query jobs ([#6822](https://togithub.com/googleapis/google-cloud-go/issues/6822)) ([26c04f4](https://togithub.com/googleapis/google-cloud-go/commit/26c04f4cd5083b4aa3c219500572d3af2f291645))

### Bug Fixes

* **bigquery:** Create/update an isolated dataset for collation feature ([#7256](https://togithub.com/googleapis/google-cloud-go/issues/7256)) ([b371558](https://togithub.com/googleapis/google-cloud-go/commit/b3715585aa6892fc41a29027694c72f31390441a))
* **bigquery:** Fetch dst table for jobs when reading with Storage API ([#7325](https://togithub.com/googleapis/google-cloud-go/issues/7325)) ([0bf80d7](https://togithub.com/googleapis/google-cloud-go/commit/0bf80d72a893755adefdead900e8990ed53d9627)), refs [#7322](https://togithub.com/googleapis/google-cloud-go/issues/7322)

---

This PR was generated with [Release Please](https://togithub.com/googleapis/release-please). See [documentation](https://togithub.com/googleapis/release-please#release-please).
Initial work on using the Storage API for fetching the results of a query. This is more efficient because it can download data in parallel by splitting the read session, and it uses Arrow as a more efficient format. The API surface for users stays the same: they can still transform query results into user-defined structs, and under the hood the library takes care of converting the Arrow-encoded data into the user-defined struct.
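The parallel-download idea can be illustrated with a generic sketch (names and shapes are illustrative, not the library's internals): each stream of a split read session gets its own consumer goroutine, and rows are merged onto one channel:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// readStreams fans out one goroutine per stream and merges all rows
// onto a single channel, mirroring how a read session split into
// several streams can be consumed in parallel.
func readStreams(streams [][]int) []int {
	out := make(chan int)
	var wg sync.WaitGroup
	for _, s := range streams {
		wg.Add(1)
		go func(rows []int) {
			defer wg.Done()
			for _, r := range rows {
				out <- r // deliver each row to the merged channel
			}
		}(s)
	}
	// Close the merged channel once every stream is drained.
	go func() {
		wg.Wait()
		close(out)
	}()
	var all []int
	for r := range out {
		all = append(all, r)
	}
	sort.Ints(all) // arrival order is nondeterministic; sort for display
	return all
}

func main() {
	fmt.Println(readStreams([][]int{{1, 4}, {2, 5}, {3, 6}})) // [1 2 3 4 5 6]
}
```

Note the sort at the end: with parallel streams, row arrival order is not deterministic, which is why ordered queries need special handling.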
One thing to note is that this introduces the first external dependency on the Apache Arrow Go library.
Initially we are going to ship it as an experimental feature and explicitly ask users to create a bqStorage.BigQueryReadClient.

Proposed in issue #3880; prior work on the Python library is described at https://medium.com/google-cloud/announcing-google-cloud-bigquery-version-1-17-0-1fc428512171