feat(bigquery): Configurable table read session project #10924

Open
wants to merge 7 commits into
base: main

@juanli16 juanli16 commented Sep 26, 2024

When creating a BigQuery table RowIterator with the Storage Read API enabled, the read session is created in the table's project by default. We should be able to override this, so that the storage of the table data can be kept separate from the permission and cost management of the read session.

To accomplish this, this PR adds a TableReadOption, WithClientProject. When it is set, the read session is created in the client's project; otherwise the default behaviour is kept.
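
As a rough illustration of the behaviour described above, here is a minimal functional-options sketch. Only the names `TableReadOption` and `WithClientProject` come from this PR; `readConfig`, `sessionProject`, and the project IDs are hypothetical and are not the library's actual implementation.

```go
package main

import "fmt"

// readConfig is a hypothetical holder for read settings (not the PR's real type).
type readConfig struct {
	useClientProject bool
}

// TableReadOption customizes how a table read session is created.
// The name comes from the PR; this shape is an assumption.
type TableReadOption func(*readConfig)

// WithClientProject asks for the read session to be created in the
// client's project instead of the table's project.
func WithClientProject() TableReadOption {
	return func(c *readConfig) { c.useClientProject = true }
}

// sessionProject is a hypothetical helper that returns the project in
// which the read session would be created.
func sessionProject(tableProject, clientProject string, opts ...TableReadOption) string {
	cfg := &readConfig{}
	for _, opt := range opts {
		opt(cfg)
	}
	if cfg.useClientProject {
		return clientProject
	}
	// Default behaviour: the session lives in the table's project.
	return tableProject
}

func main() {
	fmt.Println(sessionProject("data-project", "billing-project"))
	fmt.Println(sessionProject("data-project", "billing-project", WithClientProject()))
}
```

Without the option, the sketch resolves to the table's project; with it, to the client's project.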

@juanli16 juanli16 requested review from a team as code owners September 26, 2024 16:05
@juanli16 juanli16 requested a review from tswast September 26, 2024 16:05

google-cla bot commented Sep 26, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the BigQuery API. label Sep 26, 2024
@juanli16 juanli16 changed the title bigquery: Create table read session using the client's project feat(bigquery): Configurable table read session project Sep 26, 2024
@alvarowolfx alvarowolfx requested review from alvarowolfx and removed request for tswast September 27, 2024 16:22
@alvarowolfx
Contributor

Hey @juanli16, thanks for the contribution. I have some questions about this improvement and your use case.

Do you have multiple billing projects that can be targeted for different table reads, or is it a single project ID that you want to match the project ID set up on the BigQuery client? We could make this configuration more global to the ReadClient and use the projectID already on the bigquery.Client.

One concern I have is changing the signature of the Table.Read method to add client options. We have thought about doing that in the past, but we always go back and forth on the idea. For now I'd prefer this to be an option on the ReadClient, like I started in PR #10580. I can look into merging PR #10580, and then you can turn this improvement into a ReadClient-level option. What do you think?

Can we also add integration tests? They could exercise reading a public dataset table while creating the session in the main project ID that the client is set up with.

@juanli16
Author

> Do you have multiple billing projects that can be targeted for different table reads, or is it a single project ID that you want to match the project ID set up on the BigQuery client?

Yes, in our use case we have a single billing project that we want to use for reading tables stored in different projects.

> For now I'd prefer this to be an option on the ReadClient, like I started in PR #10580. I can look into merging PR #10580, and then you can turn this improvement into a ReadClient-level option. What do you think?

This makes sense to me. It makes the setting more global, and we won't have to set it per table read.

And certainly, I can try to add an integration test for it.

@fsaintjacques

It's more about having the right to read from a table without having billing access to the project where it lives. In our use case, the service account has permission to read the table, but no other permissions, including on the project where the table lives.

It's akin to controlling in which project job.insert is submitted; sometimes you simply don't have the choice (think public datasets).
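
To make the separation concrete: in the Storage Read API, a CreateReadSession request names the `parent` project for the session separately from the table's resource name, so the two can point at different projects. The sketch below only formats those two resource names; the helper names and `my-billing-project` are hypothetical, and no API call is made.

```go
package main

import "fmt"

// sessionParent formats the parent under which a read session is created.
// This is the project that needs the session permissions and is billed.
func sessionParent(projectID string) string {
	return fmt.Sprintf("projects/%s", projectID)
}

// tablePath formats the resource name of the table being read,
// which may live in a different project than the session parent.
func tablePath(projectID, datasetID, tableID string) string {
	return fmt.Sprintf("projects/%s/datasets/%s/tables/%s", projectID, datasetID, tableID)
}

func main() {
	// Session in a (hypothetical) billing project, data in a public dataset.
	fmt.Println(sessionParent("my-billing-project"))
	fmt.Println(tablePath("bigquery-public-data", "samples", "shakespeare"))
}
```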

gcf-merge-on-green bot pushed a commit that referenced this pull request Oct 1, 2024
When reading result sets with Storage Read API acceleration enabled, the read session is currently created in the table's project by default. This works when the destination table is not specified and is created automatically, because it then defaults to the project where the query or job was created. But when reading a table directly or specifying a destination table, it fails if the client doesn't have BQ Storage permissions in that project (only table read permission, for example). This is a common use case: some customers have a main billing project that has access to other GCP projects only through permission to read data from BigQuery tables.

With this PR, we default to the defined Query/Job projectID (which itself defaults to the current `bigquery.Client.projectID`), or, when reading a table directly, to the `bigquery.Client.projectID`.

Reported initially on PR #10924

~Supersedes #10924~
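
The defaulting described in the commit message above could be sketched roughly as follows. `readSessionProject` and its parameters are hypothetical names used for illustration, not the code merged in the referenced commit. The assumption is that an empty job project ID stands for a direct table read.

```go
package main

import "fmt"

// readSessionProject is a hypothetical helper that picks the project for
// the read session. jobProjectID is assumed to be empty when a table is
// read directly, that is, when no query or job is involved.
func readSessionProject(jobProjectID, clientProjectID string) string {
	if jobProjectID != "" {
		// Reading query results: use the Query/Job project.
		return jobProjectID
	}
	// Reading a table directly: fall back to the client's project.
	return clientProjectID
}

func main() {
	fmt.Println(readSessionProject("job-project", "client-project"))
	fmt.Println(readSessionProject("", "client-project"))
}
```

In this sketch the table's own project is never consulted, which is the change in default that the commit message describes.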
@juanli16 juanli16 force-pushed the table-read-session-project branch from 0746c09 to 9d5da3c Compare October 1, 2024 22:10
@juanli16
Author

juanli16 commented Oct 2, 2024

Hi @alvarowolfx, I have rebased my PR on top of your change in #10932, which has been merged. It now only introduces the TableReadOption, which allows a different project to be used for read sessions, with a fallback to the original bigquery client's project. Let me know if this is still desired; otherwise I can close it, as your PR already addressed my original concern :)
