Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove databaseRole from metadata DB requests in SpannerIO.ReadChange… #25108

Merged
merged 1 commit into from
Jan 24, 2023

Conversation

nuggetwheat
Copy link
Contributor

This change removes the database role from the SpannerConfig that is used to access the Spanner Change Stream metadata database. The role shouldn't be passed to the Change Stream metadata database because there are no roles defined on that database. Addresses #25105.

@Abacn
Copy link
Contributor

Abacn commented Jan 20, 2023

Hi, thanks for the quick fix. Could you please first set base branch master, after merged, then cherry pick the merged commit on the master branch into release-2.45.0 branch? This guarantees every commit in release branch is seen on master

@nuggetwheat
Copy link
Contributor Author

Hi, thanks for the quick fix. Could you please first set base branch master, after merged, then cherry pick the merged commit on the master branch into release-2.45.0 branch? This guarantees every commit in release branch is seen on master

I added the commit to master, then cherry-picked it back into release-2.45.0. Is that sufficient, or do I need to create a new pull request?

@Abacn Abacn changed the base branch from release-2.45.0 to master January 21, 2023 01:31
@Abacn
Copy link
Contributor

Abacn commented Jan 21, 2023

Sorry if I was not clear. master means apache/beam:master branch. I've done that for this PR

EDIT: not working, need rebase because the current base is on a release branch...

@Abacn Abacn changed the base branch from master to release-2.45.0 January 21, 2023 01:32
@github-actions
Copy link
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @apilloud for label java.
R: @pabloem for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@Abacn
Copy link
Contributor

Abacn commented Jan 21, 2023

Well despite the base branch issue, shouldn't the correct behavior to be, without setting SpannerConfig.setDatabaseRole(null) the pipeline should just work fine? Sounds weird to me that we need to forcefully overwrite this option there.

@Abacn
Copy link
Contributor

Abacn commented Jan 21, 2023

I see the same fix #24177 for another use case. The appear to have same cause. Overwrite it to null here just fix this single use case, but it may again appear elsewhere. Preferably we should fix the underlying logic.

@johnjcasey
Copy link
Contributor

Run Java_GCP_IO_Direct PreCommit

@johnjcasey
Copy link
Contributor

Agreed with @Abacn . is there a general case fix instead?

@nuggetwheat
Copy link
Contributor Author

When reading from change streams there are two databases involved, 1) the primary database that contains the change stream, and 2) a metadata database (maintained by Beam) that holds partition progress state so a job can be restarted and pick up where it left off. To create a spanner config to access the metadata database, the following pattern is used:

SpannerConfig metadataConfig = spannerConfig.toBuilder()
  .setInstanceId(metadataInstanceId)
  .setDatabaseId(metadataDatabaseId)
  .build();

The metadata config (metadataConfig) is initialized using the primary database config (spannerConfig) since most of the fields are the same and then the ones that are different get overwritten. With the Fine-Grained Access Control project, we've added a new config field databaseRole that should only be set for primary database access. Unfortunately the pattern above blindly copies this field to the metadata config. A better approach might be to write a metadata config create function that copies a whitelisted set of fields from the primary config and leaves all of the other fields empty. I can implement something like this now if you'd like. Otherwise, I can create an issue and assign it to someone on the change stream team who maintains this code. What are your thoughts?

@johnjcasey
Copy link
Contributor

Lets create an issue, and have it fixed more robustly.

We can merge this workaround into the main branch and this release branch for now

@johnjcasey
Copy link
Contributor

It looks like the GCP precommit is failing with a timeout. I can't merge when it has failed after retrying it

@johnjcasey
Copy link
Contributor

Run Java_GCP_IO_Direct PreCommit

@nuggetwheat nuggetwheat reopened this Jan 23, 2023
@johnjcasey johnjcasey merged commit 1734d53 into apache:release-2.45.0 Jan 24, 2023
@Abacn
Copy link
Contributor

Abacn commented Jan 24, 2023

@johnjcasey FYI: #25108 (comment) this change was not in master; release-2.45.0 and master has diverged

@johnjcasey
Copy link
Contributor

yep, we got it merged to master as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants