
[Bug]: KafkaIO SplittableDoFn not resuming from last committed offset #21730

Closed
jeanwisser opened this issue Jun 7, 2022 · 7 comments · Fixed by #22450

Comments

@jeanwisser

jeanwisser commented Jun 7, 2022

What happened?

Using KafkaIO with ReadFromKafkaDoFn.java and commitOffsetsInFinalize() should commit the offsets of processed messages, and if the pipeline is restarted, it should resume from the last committed offset.

While committing the offsets works correctly, resuming from the latest committed offset does not, because the group IDs used are not the same (observed with Beam 2.39.0).
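
For context, a minimal pipeline that exercises this code path might look like the sketch below. It is only an illustration, not the reporter's actual pipeline; the broker address "broker:9092", topic "my-topic", and group ID "my-group" are placeholder values.

    import java.util.Collections;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.common.serialization.LongDeserializer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class KafkaResumeExample {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        p.apply(
            "ReadFromKafka",
            KafkaIO.<Long, String>read()
                .withBootstrapServers("broker:9092")          // placeholder broker
                .withTopic("my-topic")                        // placeholder topic
                .withKeyDeserializer(LongDeserializer.class)
                .withValueDeserializer(StringDeserializer.class)
                .withConsumerConfigUpdates(
                    Collections.<String, Object>singletonMap(
                        ConsumerConfig.GROUP_ID_CONFIG, "my-group")) // placeholder group ID
                // Offsets of finalized bundles are committed under "my-group"; on restart
                // the read is expected to resume from the last committed offset.
                .commitOffsetsInFinalize()
                .withoutMetadata());

        p.run().waitUntilFinish();
      }
    }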

  • Committing the offsets happens in KafkaCommitOffset.java with a consumer defined as:
    consumerFactoryFn.apply(updatedConsumerConfig)

  • Reading the start offset is done in initialRestriction() in ReadFromKafkaDoFn.java with a consumer defined as:
    consumerFactoryFn.apply(KafkaIOUtils.getOffsetConsumerConfig("initialOffset", offsetConsumerConfig, updatedConsumerConfig))

Since the group ID of the consumer committing the offsets is not the same as the group ID of the consumer fetching the start offset, the pipeline always starts again from the beginning of the (topic, partition).
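
The underlying Kafka semantics can be seen with the plain consumer API: committed offsets are tracked per group ID, so a consumer created under a derived group name never sees the offsets committed under the pipeline's group. The sketch below only illustrates that behavior; the broker, topic, and the derived group name "initialOffset_my-group" are hypothetical, and it assumes kafka-clients 2.4+ for committed(Set<TopicPartition>).

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class CommittedOffsetVisibility {

      static Properties props(String groupId) {
        Properties p = new Properties();
        p.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // hypothetical broker
        p.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        p.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        p.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        return p;
      }

      public static void main(String[] args) {
        TopicPartition tp = new TopicPartition("my-topic", 0); // hypothetical topic/partition

        // The pipeline commits offsets under its configured group ("my-group" here).
        // A consumer built with a derived group name, analogous to the "initialOffset"
        // consumer config, asks Kafka for the committed offsets of *its own* group only.
        try (KafkaConsumer<byte[], byte[]> reader =
            new KafkaConsumer<>(props("initialOffset_my-group"))) { // illustrative derived name
          Map<TopicPartition, OffsetAndMetadata> committed =
              reader.committed(Collections.singleton(tp));
          // committed.get(tp) is null: nothing was committed under this group, so the
          // read falls back to the partition's start position instead of the offset
          // committed under "my-group".
          System.out.println("Committed offset visible to reader group: " + committed.get(tp));
        }
      }
    }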

Issue Priority

Priority: 2

Issue Component

Component: io-java-kafka

@chamikaramj
Contributor

@johnjcasey I believe you are looking into a similar issue?

@johnjcasey
Contributor

I am, you can assign this to me

@chamikaramj
Contributor

Done. Feel free to mark it as a duplicate if needed.

@kennknowles
Member

Doesn't this mean that some stated functionality just isn't working? Is there a chance of data corruption if the user doesn't notice? That would make this P1 and release blocking, really.

@chamikaramj
Contributor

I think this was fixed by #22450

@johnjcasey
Contributor

It was

@Abacn
Contributor

Abacn commented Dec 22, 2022

There is still a reference to this issue in our code. Update: the reason is here:

append_args=['--experiments=use_unbounded_sdf_wrapper']))

Since this is fixed by #22450, is it safe to remove append_args=['--experiments=use_unbounded_sdf_wrapper'] below that line?

Abacn added a commit to Abacn/beam that referenced this issue Dec 23, 2022
* Fix Kafka running out of resources upon increasing the number of records
  for the streaming test

* Remove 'use_unbounded_sdf_wrapper' since apache#21730 is resolved

* Fix swapped expected and actual values in an assert

* Adjust string pipeline timeout 1800s -> 1500s, which suffices

* Fix Java test failing whenever a write bundle fails

* Use streaming pipeline to read in Python