
Adding message metadata in logs in case of errors #128

Open · wants to merge 1 commit into base: master

Conversation

brunodomenici

Hello,

We have difficulty identifying bad messages in our workloads; we need to know the topic/partition/offset of a problematic message.
In this PR we propose to include topic/partition/offset/timestamp information in the log when an error occurs on the BigQuery side, and we also add a log message for the schema cycle error.

Thank you


@C0urante left a comment


Thanks @brunodomenici. I've left some thoughts here; let me know if you have any questions.

import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.*;


Please revert this change; we prefer explicit import lists instead of wildcards.

Author


Done!

@@ -44,6 +40,7 @@
* {@link com.google.cloud.bigquery.Schema BigQuery Schemas}.
*/
public class BigQuerySchemaConverter implements SchemaConverter<com.google.cloud.bigquery.Schema> {
private static final Logger logger = LoggerFactory.getLogger(AdaptiveBigQueryWriter.class);


This should have its own logging namespace instead of copying the one from the adaptive writer:

Suggested change
private static final Logger logger = LoggerFactory.getLogger(AdaptiveBigQueryWriter.class);
private static final Logger logger = LoggerFactory.getLogger(BigQuerySchemaConverter.class);

Author


Done!

import com.wepay.kafka.connect.bigquery.convert.logicaltype.DebeziumLogicalConverters;
import com.wepay.kafka.connect.bigquery.convert.logicaltype.KafkaLogicalConverters;
import com.wepay.kafka.connect.bigquery.convert.logicaltype.LogicalConverterRegistry;
import com.wepay.kafka.connect.bigquery.convert.logicaltype.LogicalTypeConverter;
import com.wepay.kafka.connect.bigquery.exception.ConversionConnectException;

import com.wepay.kafka.connect.bigquery.write.row.AdaptiveBigQueryWriter;


This can be removed once the logging namespace is corrected.

import com.google.cloud.bigquery.InsertAllRequest;
import com.google.cloud.bigquery.InsertAllResponse;

import com.google.cloud.bigquery.*;


Please revert this change and keep the import list explicit.

Author


Done!

import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.*;


Please revert this change and keep the import list explicit.

Author


Done!

@@ -106,7 +103,9 @@ public BigQuerySchemaConverter(boolean allFieldsNullable) {
ConversionConnectException("Top-level Kafka Connect schema must be of type 'struct'");
}

throwOnCycle(kafkaConnectSchema, new ArrayList<>());
if(throwOnCycle(kafkaConnectSchema, new ArrayList<>())) {
throw new ConversionConnectException("Kafka Connect schema contains cycle. See logs for detail");


Is the "See logs for detail" bit necessary? Why not include information on the failing schema in the exception message itself?

This is an important distinction because the message of the exception that causes the connector to fail is made visible to users via the REST API via things like the /connector/{connector}/status endpoint, whereas logs are harder to access. We should try to include important information directly in exception messages when possible so that people don't have to look at logs to get the info they need. See #150 for an example of where we're trying to improve things on that front.
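The suggestion above can be sketched as follows. This is a minimal illustration, not the connector's actual code: the exception class here is a stand-in for the project's ConversionConnectException, and the factory method and field names are hypothetical.

```java
// Hypothetical sketch: carry record coordinates in the exception message itself,
// so the REST API's /connector/{connector}/status endpoint surfaces them
// without requiring users to dig through logs.
public class ExceptionMessageSketch {

    // Stand-in for the connector's ConversionConnectException
    static class ConversionConnectException extends RuntimeException {
        ConversionConnectException(String message) {
            super(message);
        }
    }

    // Illustrative factory method; name and parameters are assumptions
    static ConversionConnectException cycleError(String schemaName,
                                                 String topic, int partition, long offset) {
        return new ConversionConnectException(String.format(
            "Kafka Connect schema '%s' contains a cycle "
                + "(record topic: %s, partition: %d, offset: %d)",
            schemaName, topic, partition, offset));
    }

    public static void main(String[] args) {
        System.out.println(cycleError("com.example.Order", "orders", 3, 42L).getMessage());
    }
}
```

With a message built this way, "See logs for detail" becomes unnecessary: the failing record's coordinates travel with the exception wherever it is reported.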

Author


Fair point. Done

}

if (seenSoFar.contains(kafkaConnectSchema)) {
throw new ConversionConnectException("Kafka Connect schema contains cycle");
logger.error("Cycle detected : " + kafkaConnectSchema.name());


I mention this above, but I don't think this should be a log message (or at least, not exclusively one) and can be included in the message of the exception that ends up being thrown.

Also, what happens if there's a cycle involving nested sub-schemas? Will kafkaConnectSchema.name() give us the name of the top-level schema (which is probably what we want to give users), or will it give the name of the nested sub-schema that creates the cycle (which may confuse people as it may not refer to the actual schema they've registered in, e.g., Schema Registry)?

Finally, similar to what you've done with the LoggerUtils::logRecord method, should we include metadata about the record that causes this issue in the exception message?

Ultimately, what I think would be useful here is an exception whose message explains that a cycle has been detected in the record schema, contains name of the top-level schema for the record, and has metadata on the topic/partition/offset for the record.
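The nested-cycle concern can be made concrete with a small sketch. The toy Schema type below is an assumption standing in for Kafka Connect's org.apache.kafka.connect.data.Schema; it only shows how a traversal can report both the top-level schema name and the sub-schema that closes the cycle.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of cycle detection over nested schemas, using a toy Schema type.
public class CycleDetectionSketch {

    static class Schema {
        final String name;
        final Map<String, Schema> fields = new LinkedHashMap<>();
        Schema(String name) { this.name = name; }
    }

    // Returns the name of the first sub-schema that closes a cycle, or null if none.
    static String findCycle(Schema schema, List<Schema> seenSoFar) {
        if (seenSoFar.contains(schema)) {
            return schema.name;
        }
        seenSoFar.add(schema);
        for (Map.Entry<String, Schema> field : schema.fields.entrySet()) {
            String cycle = findCycle(field.getValue(), seenSoFar);
            if (cycle != null) {
                return cycle;
            }
        }
        seenSoFar.remove(schema);
        return null;
    }

    public static void main(String[] args) {
        Schema top = new Schema("com.example.Order");
        Schema nested = new Schema("com.example.LineItem");
        top.fields.put("items", nested);
        nested.fields.put("self", nested); // a nested sub-schema creates the cycle

        String offending = findCycle(top, new ArrayList<>());
        // Report both names so the message stays meaningful to users who only
        // know the top-level schema they registered
        System.out.println("Cycle in schema '" + top.name + "' via '" + offending + "'");
    }
}
```

Note that a naive traversal that only reports the schema at the point of revisit would surface `com.example.LineItem`, which users may not recognize; keeping a reference to the top-level schema lets the message include both.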

Author


Yes, I agree this could be confusing. I've tried to be more explicit.


We've refactored this to give both the top-level schema name and the name of the attribute causing the cycle.
We also wrapped the ConversionConnectException in another exception that gives details about the specific Connect record causing the issue, so that we didn't need to alter the SchemaConverter interface.

@@ -120,33 +119,37 @@ public BigQuerySchemaConverter(boolean allFieldsNullable) {
return com.google.cloud.bigquery.Schema.of(fields);
}

private void throwOnCycle(Schema kafkaConnectSchema, List<Schema> seenSoFar) {
private boolean throwOnCycle(Schema kafkaConnectSchema, List<Schema> seenSoFar) {


Nit: if we're not throwing an exception from this method anymore and are instead returning a boolean indicating whether the schema has a cycle or not, the method should be renamed to something like schemaContainsCycle.

Author

@brunodomenici Dec 1, 2021


Fair point. Done


Reverted back to throwOnCycle with additional parameters.

@@ -324,6 +324,7 @@ private TableInfo getTableInfo(TableId table, List<SinkRecord> records, Boolean
List<com.google.cloud.bigquery.Schema> bigQuerySchemas = new ArrayList<>();
Optional.ofNullable(readTableSchema(table)).ifPresent(bigQuerySchemas::add);
for (SinkRecord record : records) {
logger.debug("Convert Schema for :"+record.topic()+"[p="+record.kafkaPartition()+"](o="+record.kafkaOffset()+")");


A few comments:

  • We may not actually do any conversion if the retriever returns null. This should probably be moved directly above the call to convertRecordSchema
  • We're only converting the value schema here; key conversion takes place elsewhere. We should note this in the message here ("Converting value schema for...") and also add a logging message for key schema conversion where that takes place
  • The message can be made more human-readable by eliminating shorthand syntax and becoming more verbose; something like "record with topic: , partition: , offset: "
  • Slf4j marker syntax (i.e., "{}") can be used instead of string concatenation
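The last two points can be sketched together. The slf4j call in the comment below is the marker-syntax form the review asks for; the helper method is a self-contained stand-in (names are illustrative, not the connector's API) that builds the same human-readable message.

```java
// Sketch of the more verbose, human-readable debug message.
// With slf4j on the classpath, the call would use marker syntax instead of
// string concatenation:
//   logger.debug("Converting value schema for record with topic: {}, partition: {}, offset: {}",
//       record.topic(), record.kafkaPartition(), record.kafkaOffset());
public class LogMessageSketch {

    // Illustrative helper; builds the message slf4j would render
    static String valueSchemaMessage(String topic, int partition, long offset) {
        return String.format(
            "Converting value schema for record with topic: %s, partition: %d, offset: %d",
            topic, partition, offset);
    }

    public static void main(String[] args) {
        System.out.println(valueSchemaMessage("orders", 3, 42L));
    }
}
```

Beyond readability, marker syntax also avoids the cost of building the string when the debug level is disabled, since slf4j only formats the message if the log statement is actually emitted.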

Author


OK for the marker syntax and a more human-readable message.
For the first point, I don't understand; I think this code was already there.


This log line is replaced with a clearer error message in the ConversionConnectException.
Therefore, there is not really a need for this log line anymore.

@@ -189,6 +186,7 @@ private boolean isPartialFailure(SortedMap<SinkRecord, InsertAllRequest.RowToIns
for (Map.Entry<SinkRecord, InsertAllRequest.RowToInsert> row: rows.entrySet()) {
if (failRowsSet.contains((long)index)) {
failRows.put(row.getKey(), row.getValue());
logger.trace("Failed Record: {}", row);
Author


After digging a little, this is the important point for us. We really need to know more about failed records; we don't need to relate them to the records polled by Kafka Connect.

This simplifies the changes, I think.

@brunodomenici
Author

Hi @C0urante. Thanks for the review. I'm really sorry, I haven't had time to work on this; I'm back now.

I think I'm able to simplify a little bit.

@brunodomenici brunodomenici marked this pull request as draft December 2, 2021 09:41
@brunodomenici
Author

Let me rebase... and resolve conflicts.

@brunodomenici brunodomenici marked this pull request as ready for review December 2, 2021 14:43
@hassankishk hassankishk requested a review from a team as a code owner October 18, 2022 14:33
Co-authored-by: Hassan Kishk <[email protected]>
Co-authored-by: Bruno Domenici <[email protected]>
@NicolasFruyAdeo

@C0urante Hey, sorry we've left this PR alone for so long. We've finally come back to it.
We've rebased our branch on the current master branch.
We've tried to revert a few things and hope this new approach is satisfactory.

@C0urante

@NicolasFruyAdeo I am no longer working at Confluent and am not actively reviewing PRs for this project; please reach out to the current maintainers if you'd still like to pursue this change.

@NicolasFruyAdeo

Hi @ManasjyotiSharma @kapilchhajer @ypmahajan,
Would you mind taking over ownership of this PR and reviewing our proposal to improve issue diagnosis through better error messages?
