-
Notifications
You must be signed in to change notification settings - Fork 322
BigQuery: fix an issue with option propagation and refactor to future-proof #540
BigQuery: fix an issue with option propagation and refactor to future-proof #540
Conversation
…-proof * We created a helper in BigQueryIO to create a JobConfigurationQuery capturing all options, but we had not yet propagated this cleanup into the Services abstraction or helper classes. Refactor BigQueryServices and BigQueryTableRowIterator to propagate the same configuration. Adds a new deprecated constructor to BigQueryTableRowIterator for backwards-compatibility. This fixes #539.
@peihe given the amount of code divergence, they both need careful review. Please review here |
But, I think we should still do the Beam first, otherwise it will diverge further more. And, I think forward ports PRs could cause additional inconvenience during review and backport. |
commented on apache/beam#1873 Let's get Beam PR LGTMed, and then update this accordingly. |
Update this PR based on apache/beam#1873? |
The code is substantially different here, plus we are unable to make backwards-incompatible changes. Needs separate review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comments
I am okay to review it separately, as long as we make it clear that we don't need to keep Dataflow code less divergent from Beam code.
.setFlattenResults(flattenResults) | ||
.setPriority("BATCH") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert BATCH
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -1167,9 +1168,12 @@ private void executeQuery( | |||
} | |||
|
|||
private JobConfigurationQuery createBasicQueryConfig() { | |||
// Due to deprecated functionality, if this function is updated | |||
// then the similar code in BigQueryTableRowIterator#fromQuery should be updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that better to have a
public static createBasicQueryConfig(String query, boolean flattenResults, boolean useLegacySql)
and, use it in two places?
checkNotNull(queryConfig, "queryConfig"); | ||
checkNotNull(projectId, "projectId"); | ||
checkNotNull(client, "client"); | ||
return new BigQueryTableRowIterator(null, queryConfig, projectId, client); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
null /* ref */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not refactoring in dataflow 1.x, this is a minimal fix only.
.setPriority("BATCH") | ||
.setQuery(query) | ||
.setUseLegacySql(MoreObjects.firstNonNull(useLegacySql, Boolean.TRUE)); | ||
return new BigQueryTableRowIterator(null, queryConfig, projectId, client); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
null /* ref */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not refactoring in dataflow 1.x, this is a minimal fix only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL @peihe
.setFlattenResults(flattenResults) | ||
.setPriority("BATCH") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
.setPriority("BATCH") | ||
.setQuery(query) | ||
.setUseLegacySql(MoreObjects.firstNonNull(useLegacySql, Boolean.TRUE)); | ||
return new BigQueryTableRowIterator(null, queryConfig, projectId, client); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not refactoring in dataflow 1.x, this is a minimal fix only.
checkNotNull(queryConfig, "queryConfig"); | ||
checkNotNull(projectId, "projectId"); | ||
checkNotNull(client, "client"); | ||
return new BigQueryTableRowIterator(null, queryConfig, projectId, client); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not refactoring in dataflow 1.x, this is a minimal fix only.
LGTM |
Thanks! |
We created a helper in BigQueryIO to create a JobConfigurationQuery capturing all options, but we had not yet propagated this cleanup into the Services abstraction or helper classes.
Refactor BigQueryServices and BigQueryTableRowIterator to propagate the same configuration.
Adds a new deprecated constructor to BigQueryTableRowIterator for backwards-compatibility.
This fixes #539.