Configurable max rows per streaming request #237
base: master
Conversation
Bhagyashree requested changes on this pull request.

Thanks for the PR, @FreCap, and apologies for the delay in reviewing. I have left a few comments; please take a look when you get a chance.
In kcbq-connector/src/main/java/com/wepay/kafka/connect/bigquery/config/BigQuerySinkConfig.java:

```diff
@@ -93,6 +93,18 @@ public class BigQuerySinkConfig extends AbstractConfig {
       "The interval, in seconds, in which to attempt to run GCS to BQ load jobs. Only relevant "
       + "if enableBatchLoad is configured.";
+  public static final String BQ_STREAMING_MAX_ROWS_PER_REQUEST_CONFIG = "bqStreamingMaxRowsPerRequest";
```
Can we rename this to maxRowsPerRequest?
In kcbq-connector/src/main/java/com/wepay/kafka/connect/bigquery/config/BigQuerySinkConfig.java:

```diff
@@ -93,6 +93,18 @@ public class BigQuerySinkConfig extends AbstractConfig {
       "The interval, in seconds, in which to attempt to run GCS to BQ load jobs. Only relevant "
       + "if enableBatchLoad is configured.";
+  public static final String BQ_STREAMING_MAX_ROWS_PER_REQUEST_CONFIG = "bqStreamingMaxRowsPerRequest";
+  private static final ConfigDef.Type BQ_STREAMING_MAX_ROWS_PER_REQUEST_TYPE = ConfigDef.Type.INT;
+  private static final Integer BQ_STREAMING_MAX_ROWS_PER_REQUEST_DEFAULT = 50000;
```
Let's keep the default behaviour the same. We can use -1 to say this is disabled and have that as the default.
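As a rough illustration (not part of the PR as submitted), registration with the proposed -1 default might look like the following; `configDef` is just a stand-in for however `BigQuerySinkConfig` builds its `ConfigDef`:

```java
import org.apache.kafka.common.config.ConfigDef;

// Sketch only: -1 is the proposed "disabled" default, preserving the current
// behaviour of sending the whole batch in a single streaming request.
ConfigDef configDef = new ConfigDef()
    .define(
        BQ_STREAMING_MAX_ROWS_PER_REQUEST_CONFIG,  // "bqStreamingMaxRowsPerRequest"
        ConfigDef.Type.INT,
        -1,                                        // proposed default: feature off
        ConfigDef.Importance.LOW,
        BQ_STREAMING_MAX_ROWS_PER_REQUEST_DOC);
```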
In kcbq-connector/src/main/java/com/wepay/kafka/connect/bigquery/config/BigQuerySinkConfig.java:

```diff
+  private static final ConfigDef.Type BQ_STREAMING_MAX_ROWS_PER_REQUEST_TYPE = ConfigDef.Type.INT;
+  private static final Integer BQ_STREAMING_MAX_ROWS_PER_REQUEST_DEFAULT = 50000;
+  private static final ConfigDef.Importance BQ_STREAMING_MAX_ROWS_PER_REQUEST_IMPORTANCE = ConfigDef.Importance.LOW;
+  private static final String BQ_STREAMING_MAX_ROWS_PER_REQUEST_DOC =
```
The maximum number of rows to be sent in one batch in the request payload to BigQuery. This can reduce the number of failed calls due to Request Too Large when the payload exceeds BigQuery's quota limits (https://cloud.google.com/bigquery/quotas#write-api-limits). Setting it to a low value can result in degraded performance of the connector.
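For concreteness, here is a minimal sketch of the splitting the doc string describes, assuming the sink simply chunks its in-memory batch; `BatchSplitter` and `partition` are hypothetical names, not connector APIs:

```java
import java.util.ArrayList;
import java.util.List;

final class BatchSplitter {
  // Split a batch into sublists of at most maxRowsPerRequest rows, so each
  // streaming insert stays under BigQuery's request-size quota instead of
  // discovering the limit through `Request Too Large` failures.
  static <T> List<List<T>> partition(List<T> rows, int maxRowsPerRequest) {
    List<List<T>> chunks = new ArrayList<>();
    for (int i = 0; i < rows.size(); i += maxRowsPerRequest) {
      chunks.add(rows.subList(i, Math.min(i + maxRowsPerRequest, rows.size())));
    }
    return chunks;
  }
}
```

With the proposed -1 default the writer would skip partitioning entirely and keep today's single-request path.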
"that would return a `Request Too Large` before finding the right size. " + | ||
"This config allows starting from a lower value altogether and reduce the amount of failed requests. " + | ||
"Only works with simple TableWriter (no GCS)"; | ||
|
Let's add a validator as well with the minimum and maximum values allowed:

-1 -> default (disabled)
1 -> min
50,000 -> max (https://cloud.google.com/bigquery/quotas#write-api-limits)
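A possible shape for that validator (illustrative only; `ConfigDef.Range` alone can't express "-1 or 1..50000", hence a custom `ConfigDef.Validator`):

```java
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigException;

// Sketch: accept the -1 "disabled" sentinel or a value within BigQuery's
// documented 50,000 rows-per-request limit.
static final ConfigDef.Validator MAX_ROWS_PER_REQUEST_VALIDATOR = new ConfigDef.Validator() {
  @Override
  public void ensureValid(String name, Object value) {
    int rows = (Integer) value;
    if (rows != -1 && (rows < 1 || rows > 50000)) {
      throw new ConfigException(name, value,
          "must be -1 (disabled) or between 1 and 50000");
    }
  }
};
```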
Hi! Unfortunately I moved on to another project, but feel free to pick up the MR.

Thank you,
Francesco