This repository has been archived by the owner on Nov 11, 2022. It is now read-only.

BigQuery: fix an issue with option propagation and refactor to future-proof #540

Merged
merged 4 commits into from
Feb 16, 2017

Conversation

dhalperi
Contributor

We created a helper in BigQueryIO to create a JobConfigurationQuery capturing all options, but we had not yet propagated this cleanup into the Services abstraction or helper classes.

Refactor BigQueryServices and BigQueryTableRowIterator to propagate the same configuration.

Adds a new deprecated constructor to BigQueryTableRowIterator for backwards compatibility.

This fixes #539.
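
A minimal sketch of the backwards-compatibility pattern described above, assuming a factory-style entry point. `QueryConfigSketch` and `RowIteratorSketch` are hypothetical stand-ins for `JobConfigurationQuery` and `BigQueryTableRowIterator`; the BigQuery client argument is omitted, and the real signatures in the SDK may differ.

```java
import java.util.Objects;

// Hypothetical stand-in for JobConfigurationQuery, reduced to three options.
final class QueryConfigSketch {
    String query;
    boolean flattenResults;
    boolean useLegacySql;

    QueryConfigSketch setQuery(String query) { this.query = query; return this; }
    QueryConfigSketch setFlattenResults(boolean flatten) { this.flattenResults = flatten; return this; }
    QueryConfigSketch setUseLegacySql(boolean legacy) { this.useLegacySql = legacy; return this; }
}

// Hypothetical stand-in for BigQueryTableRowIterator; only construction is modeled.
final class RowIteratorSketch {
    final QueryConfigSketch queryConfig;
    final String projectId;

    private RowIteratorSketch(QueryConfigSketch queryConfig, String projectId) {
        this.queryConfig = queryConfig;
        this.projectId = projectId;
    }

    // New entry point: the caller hands over one fully populated configuration,
    // so every option set upstream reaches the iterator.
    static RowIteratorSketch fromQuery(QueryConfigSketch queryConfig, String projectId) {
        Objects.requireNonNull(queryConfig, "queryConfig");
        Objects.requireNonNull(projectId, "projectId");
        return new RowIteratorSketch(queryConfig, projectId);
    }

    /**
     * Old entry point, kept so existing callers still compile.
     *
     * @deprecated build a configuration and call the two-argument overload instead.
     */
    @Deprecated
    static RowIteratorSketch fromQuery(
            String query, String projectId, boolean flattenResults, boolean useLegacySql) {
        Objects.requireNonNull(query, "query");
        QueryConfigSketch config = new QueryConfigSketch()
                .setQuery(query)
                .setFlattenResults(flattenResults)
                .setUseLegacySql(useLegacySql);
        // Delegate so both paths share the same validation and construction.
        return fromQuery(config, projectId);
    }
}
```

The deprecated overload only translates its arguments into a configuration and delegates, so the old call shape keeps working without a second construction path to maintain.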

@dhalperi dhalperi requested a review from peihe January 30, 2017 20:46
@dhalperi
Contributor Author

@peihe, given the amount of code divergence, both PRs need careful review. Please review this one here.

@peihe
Contributor

peihe commented Jan 30, 2017

But I think we should still do the Beam change first; otherwise the code will diverge even further.

Also, I think forward-port PRs could cause additional inconvenience during review and backporting.

@peihe
Contributor

peihe commented Jan 30, 2017

I commented on apache/beam#1873.

Let's get Beam PR LGTMed, and then update this accordingly.

@peihe
Contributor

peihe commented Feb 1, 2017

Update this PR based on apache/beam#1873?

@dhalperi
Contributor Author

dhalperi commented Feb 1, 2017

The code is substantially different here, plus we are unable to make backwards-incompatible changes. Needs separate review.

Contributor

@peihe peihe left a comment

See comments.

I am okay with reviewing it separately, as long as we make it clear that we don't need to keep the Dataflow code from diverging from the Beam code.

.setFlattenResults(flattenResults)
.setPriority("BATCH")
Contributor

revert BATCH

Contributor Author

done

@@ -1167,9 +1168,12 @@ private void executeQuery(
}

private JobConfigurationQuery createBasicQueryConfig() {
// Due to deprecated functionality, if this function is updated
// then the similar code in BigQueryTableRowIterator#fromQuery should be updated.
Contributor

Would it be better to have a
public static createBasicQueryConfig(String query, boolean flattenResults, boolean useLegacySql)
and use it in both places?
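
A rough sketch of what this suggestion could look like, using the signature proposed in the comment. `BasicQueryConfig` is a hypothetical stand-in for `JobConfigurationQuery`, and `QueryConfigs`, `configForRead`, and `configForIterator` are illustrative names only; this shows the reviewer's idea and is not necessarily what was merged.

```java
// Hypothetical stand-in for JobConfigurationQuery with fluent setters.
final class BasicQueryConfig {
    String query;
    boolean flattenResults;
    boolean useLegacySql;

    BasicQueryConfig setQuery(String query) { this.query = query; return this; }
    BasicQueryConfig setFlattenResults(boolean flatten) { this.flattenResults = flatten; return this; }
    BasicQueryConfig setUseLegacySql(boolean legacy) { this.useLegacySql = legacy; return this; }
}

// Illustrative holder for the shared helper.
final class QueryConfigs {
    private QueryConfigs() {}

    // Single place where the options are translated into a configuration.
    static BasicQueryConfig createBasicQueryConfig(
            String query, boolean flattenResults, boolean useLegacySql) {
        return new BasicQueryConfig()
                .setQuery(query)
                .setFlattenResults(flattenResults)
                .setUseLegacySql(useLegacySql);
    }
}

final class SharedHelperDemo {
    // Stand-in for the read path that builds the query configuration.
    static BasicQueryConfig configForRead(String query, boolean flatten, boolean legacy) {
        return QueryConfigs.createBasicQueryConfig(query, flatten, legacy);
    }

    // Stand-in for the deprecated iterator path that builds the same configuration.
    static BasicQueryConfig configForIterator(String query, boolean flatten, boolean legacy) {
        return QueryConfigs.createBasicQueryConfig(query, flatten, legacy);
    }

    public static void main(String[] args) {
        BasicQueryConfig a = configForRead("SELECT 1", true, false);
        BasicQueryConfig b = configForIterator("SELECT 1", true, false);
        System.out.println(a.useLegacySql == b.useLegacySql);
    }
}
```

With one helper, a new option added later has to be wired in only once, which removes the need for a "keep these two in sync" code comment.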

checkNotNull(queryConfig, "queryConfig");
checkNotNull(projectId, "projectId");
checkNotNull(client, "client");
return new BigQueryTableRowIterator(null, queryConfig, projectId, client);
Contributor

nit:
null /* ref */

Contributor Author

Not refactoring in Dataflow 1.x; this is a minimal fix only.

.setPriority("BATCH")
.setQuery(query)
.setUseLegacySql(MoreObjects.firstNonNull(useLegacySql, Boolean.TRUE));
return new BigQueryTableRowIterator(null, queryConfig, projectId, client);
Contributor

nit:
null /* ref */

Contributor Author

Not refactoring in Dataflow 1.x; this is a minimal fix only.
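
For readers unfamiliar with the `MoreObjects.firstNonNull(useLegacySql, Boolean.TRUE)` call in the snippet under review: it returns its first argument unless that is null, in which case it returns the second. Here that means an unset `useLegacySql` falls back to legacy SQL. The sketch below reproduces that behavior in plain Java so it runs without Guava; `LegacySqlDefaultSketch` and `resolveUseLegacySql` are hypothetical names.

```java
final class LegacySqlDefaultSketch {
    private LegacySqlDefaultSketch() {}

    // Plain-Java equivalent of firstNonNull(useLegacySql, Boolean.TRUE):
    // a null (unset) option resolves to true, an explicit value is kept.
    static boolean resolveUseLegacySql(Boolean useLegacySql) {
        return useLegacySql != null ? useLegacySql : Boolean.TRUE;
    }

    public static void main(String[] args) {
        System.out.println(resolveUseLegacySql(null));          // unset option
        System.out.println(resolveUseLegacySql(Boolean.FALSE)); // standard SQL requested
    }
}
```

The nullable `Boolean` lets the deprecated path distinguish "caller did not say" from an explicit `false`, which a primitive `boolean` parameter cannot do.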

Contributor Author

@dhalperi dhalperi left a comment

PTAL @peihe

@peihe
Contributor

peihe commented Feb 16, 2017

LGTM

@dhalperi dhalperi merged commit fc5fee2 into GoogleCloudPlatform:master Feb 16, 2017
@dhalperi
Contributor Author

Thanks!

@dhalperi dhalperi deleted the bigquery-direct-standard-sql branch February 16, 2017 22:31
Successfully merging this pull request may close these issues.

DirectPipelineRunner doesn't support StandardSql with BigQueryIO.READ