
KAFKA-16858: Throw DataException from validateValue on array and map schemas without inner schemas #16161

Merged

Conversation

gharris1727 (Contributor)

The SchemaBuilder interface allows objects with a null valueSchema and/or keySchema to be constructed. These currently cause validateValue to throw an NPE on non-empty containers. This change makes both empty and non-empty containers throw DataException instead.

The first commit rewrites some existing tests and adds assertions for the prior behavior. Review only the changes in the later commits to see the actual behavior change.
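As a simplified, self-contained sketch of the behavior change (invented names; this is not the real ConnectSchema code, and the schema is modeled as a nullable type name): the old path dereferences the element schema per entry, so only non-empty containers NPE, while the new path rejects a null element schema up front with a DataException.

```java
import java.util.List;

public class ValidateValueSketch {
    // Stand-in for org.apache.kafka.connect.errors.DataException
    static class DataException extends RuntimeException {
        DataException(String message) { super(message); }
    }

    // Old behavior: the (possibly null) element type is only dereferenced per
    // entry, so an empty array passes silently and a non-empty one throws NPE.
    static void validateArrayOld(String elementType, List<?> array) {
        for (Object entry : array) {
            if (!elementType.equals(entry.getClass().getSimpleName()))
                throw new DataException("Invalid Java object for schema with type " + elementType);
        }
    }

    // New behavior: fail fast with DataException whether or not the array is empty.
    static void validateArrayNew(String elementType, List<?> array) {
        if (elementType == null)
            throw new DataException("No schema defined for array elements");
        for (Object entry : array) {
            if (!elementType.equals(entry.getClass().getSimpleName()))
                throw new DataException("Invalid Java object for schema with type " + elementType);
        }
    }
}
```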

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

gharris1727 (Contributor, Author)

Hey @C0urante @yashmayya Could either of you PTAL? Thanks!

C0urante (Contributor) left a comment

Can you expound on the motivation for this change? IIRC we began to allow null element schemas in order to support schema inference for empty collections parsed by the Values class, which is valid IMO.

gharris1727 (Contributor, Author)

> Can you expound on the motivation for this change? IIRC we began to allow null element schemas in order to support schema inference for empty collections parsed by the Values class, which is valid IMO.

Framed in that way, I also think it would make sense to try to infer a schema for empty lists. But I think that decision is undesirable in the broader context of the Connect ecosystem.

I tried to determine how a null schema is interpreted in the Connect type-system. When used in a ConnectRecord, it's a top-type, as it allows the Object value to be any type. The way Values uses it is also as a top-type, because a heterogeneous array infers a null schema. The current behavior for validateValue is as a bottom-type, as no value is valid for a null schema (although this seems to be an accident).
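The three readings above can be contrasted in a tiny sketch (hypothetical helpers; a schema is modeled as a nullable type name): under the top-type reading a null schema constrains nothing, while under the bottom-type reading nothing validates against it.

```java
public class NullSchemaSemantics {
    // Top-type reading (ConnectRecord, Values inference): null schema accepts any value.
    static boolean validAsTopType(String schemaType, Object value) {
        return schemaType == null || schemaType.equals(value.getClass().getSimpleName());
    }

    // Bottom-type reading (current validateValue, apparently by accident):
    // no value is valid for a null schema.
    static boolean validAsBottomType(String schemaType, Object value) {
        return schemaType != null && schemaType.equals(value.getClass().getSimpleName());
    }
}
```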

Much of our infrastructure is not prepared for element schemas to be null, such as the SchemaBuilder#array() and SchemaBuilder#map() methods, which still have null guards:

    public static SchemaBuilder array(Schema valueSchema) {
        if (null == valueSchema)
            throw new SchemaBuilderException("valueSchema cannot be null.");
        SchemaBuilder builder = new SchemaBuilder(Type.ARRAY);
        builder.valueSchema = valueSchema;
        return builder;
    }

    /**
     * @param keySchema the schema for keys in the map
     * @param valueSchema the schema for values in the map
     * @return a new {@link Schema.Type#MAP} SchemaBuilder
     */
    public static SchemaBuilder map(Schema keySchema, Schema valueSchema) {
        if (null == keySchema)
            throw new SchemaBuilderException("keySchema cannot be null.");
        if (null == valueSchema)
            throw new SchemaBuilderException("valueSchema cannot be null.");
        SchemaBuilder builder = new SchemaBuilder(Type.MAP);
        builder.keySchema = keySchema;
        builder.valueSchema = valueSchema;
        return builder;
    }
The field builder has a null guard as well:

    if (null == fieldSchema)
        throw new SchemaBuilderException("fieldSchema for field " + fieldName + " cannot be null.");

and obviously the validateValue method was written without null schemas in mind.

If we changed all of the above functions to remove the null guards, it would permit schemaless data within schema'd data. This would allow data sources to break the single schema vs schemaless check that most SMTs use to determine if they're operating on schema'd vs schemaless data. Rather than receiving DataExceptions when assembling the structs in the data sources, it would likely surface as NPEs in data sinks.
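The check in question is the usual SMT dispatch pattern, sketched here with invented names: a single null check on the top-level schema decides which path runs, so a null schema nested inside schema'd data would pass the check and only fail later, deep in the with-schema path.

```java
public class SmtDispatchSketch {
    // Typical SMT entry point: one null check on the top-level schema decides everything.
    static String apply(Object topLevelSchema, Object value) {
        if (topLevelSchema == null)
            return "schemaless";   // value treated as plain Maps/Lists, no schemas consulted
        return "with-schema";      // value walked against the schema; a nested null schema NPEs here
    }
}
```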

So I don't think we can change validateValue to be top-type without risking more bad data and NPEs overall. I also don't think it's viable to make ConnectRecord bottom-type, as schemaless data is intentionally supported. I think the best path forward is to attempt to align with the SchemaBuilder semantics before the SchemaBuilder(Type) constructor was made public, and disallow null element schemas entirely. Then, the assumption that the root Schema in ConnectRecord should be the only nullable schema will be correct. Of course, we still have to change the Values class to be more conservative with the types it outputs, which I opened this ticket for: https://issues.apache.org/jira/browse/KAFKA-16870 .

One of the good things for us is that the null-schema'd collections from the Values class and SimpleHeaderConverter don't commonly make it into the key or value, or as fields in a struct, because AK doesn't have HeaderTo$Key or HeaderTo$Value. This ticket actually isn't motivated by the Values parsing for that reason.

C0urante (Contributor) left a comment

Thanks Greg, I find the rationale about all-or-nothing expectations for schemas pretty convincing. Left a few comments on the implementation and tests, nothing major though.

    validateValue(schema.valueSchema(), entry);
    Schema arrayValueSchema = assertSchemaNotNull(name, "elements", schema.valueSchema());
    for (Object entry : array) {
        validateValue("entry", arrayValueSchema, entry);
Contributor


The field name doesn't seem like it'll be very useful here; can we just use the nameless variant? Something like "array element" could possibly work too, but it'd still lack the name of the field, which is IMO the only really useful piece of info.

It also seems a little strange with the error messages implying that the problematic value is a top-level value for the field, instead of a sub-value of it: Invalid Java object for schema with type STRING: class java.lang.Boolean for field: "entry". Something like Invalid Java object for schema with type STRING: class java.lang.Boolean for element of array field: "field" could be less ambiguous.

Contributor Author


I agree that the field name sucks here. I don't think that justifies keeping the old behavior, especially where field "null" appears in the exception :)

I've implemented your suggestion with a new validateValue method, so that existing external callers passing a field name will keep the existing quoting.
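A sketch of what such a location-aware variant might look like (hypothetical names; the real method signatures in the PR may differ): the message describes where inside the container the bad value sits, rather than quoting a synthetic field name like "entry".

```java
public class LocationMessages {
    // Builds the suggested, less ambiguous message for an invalid value.
    static String invalidValueMessage(String location, String schemaType, Class<?> valueClass) {
        return "Invalid Java object for schema with type " + schemaType
                + ": " + valueClass + " for " + location;
    }

    // Describes an array element's position in terms of its enclosing field.
    static String arrayElementLocation(String fieldName) {
        return "element of array field: \"" + fieldName + "\"";
    }
}
```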

C0urante (Contributor) left a comment

Ah, nice touch with the location variant. LGTM!

gharris1727 (Contributor, Author)

Test failures appear unrelated, and the connect tests pass for me locally.

@gharris1727 gharris1727 merged commit 52514a8 into apache:trunk Jun 5, 2024
1 check failed
gharris1727 added a commit that referenced this pull request Jun 5, 2024
…schemas without inner schemas (#16161)

Signed-off-by: Greg Harris <[email protected]>
Reviewers: Chris Egerton <[email protected]>
apourchet added a commit to apourchet/kafka that referenced this pull request Jun 6, 2024
TaiJuWu pushed a commit to TaiJuWu/kafka that referenced this pull request Jun 8, 2024
…schemas without inner schemas (apache#16161)

Signed-off-by: Greg Harris <[email protected]>
Reviewers: Chris Egerton <[email protected]>
gongxuanzhang pushed a commit to gongxuanzhang/kafka that referenced this pull request Jun 12, 2024
…schemas without inner schemas (apache#16161)

Signed-off-by: Greg Harris <[email protected]>
Reviewers: Chris Egerton <[email protected]>