Consumption stopped with slow consumer #386

Open
abialas opened this issue Mar 7, 2024 · 2 comments
Labels
❓need-triage This issue needs triage, hasn't been looked at by a team member yet

Comments


abialas commented Mar 7, 2024

I have a simple but slow consumer which consumes one record at a time:

  private Flux<Void> consumeRecords(Flux<ReceiverRecord<String, Value>> records) {
    return records
        .concatMap(receiverRecord -> handleReceivedRecord(receiverRecord))
        .concatMap(this::ackOffset)
        .doOnEach(logOnError(logErrorReceiveRecord()))
        .retryWhen(RETRY_FAILED_CONSUMER_SPEC);
  }

The processing time of handleReceivedRecord is under 500 ms per record. I understand this consumer is slow and needs to be fixed by adding concurrency (a possible shape is sketched at the end of this comment).
However, in my test I only produce about 3000 records per minute to the topic this consumer reads from. Initially it consumes fine, but after some time the consumer stops consuming entirely. There is no error log or anything similar.

In the logs I see messages such as:

Rebalance during back pressure, re-pausing new assignments
Rebalancing; waiting for 104 records in pipeline

and I have to restart the consumer instance to fix this. It is also worth mentioning that when I disable scaling up of consumers, it works fine.

Expected Behavior

Consuming records from the topic should not stop.

Actual Behavior

Consuming records from the topic gets stuck and a restart is required.

Your Environment

  • Reactor Core: 3.5.10
  • Reactor Kafka: 1.3.18
  • JVM version (java -version): 21.0.2
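
For context, a concurrent variant of the consumer above (only a sketch; it reuses handleReceivedRecord, ackOffset and the retry/logging helpers from my snippet, and groups by partition so per-partition ordering is preserved) could look roughly like this:

  private Flux<Void> consumeRecordsConcurrently(Flux<ReceiverRecord<String, Value>> records) {
    return records
        // one group per Kafka partition, so records within a partition stay ordered
        .groupBy(receiverRecord -> receiverRecord.partition())
        // process the partition groups concurrently, each group itself sequentially
        .flatMap(partitionRecords -> partitionRecords
            .concatMap(receiverRecord -> handleReceivedRecord(receiverRecord))
            .concatMap(this::ackOffset))
        .doOnEach(logOnError(logErrorReceiveRecord()))
        .retryWhen(RETRY_FAILED_CONSUMER_SPEC);
  }

Grouping by partition is the usual way to add concurrency with reactor-kafka without losing per-partition ordering.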
reactorbot added the ❓need-triage label Mar 7, 2024

3vl commented Jul 16, 2024

I am seeing similar behavior in 1.3.23. It seems to happen intermittently, possibly after a rebalance.

It looks like the partitions have been paused and never resumed. I added an endpoint that lets me see paused partitions, and when consumption stops I can see that the partitions are indeed paused. If I use another endpoint to force them to resume, consumption starts again.
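
For anyone wanting to do the same check, here is a minimal sketch (the helper names are made up, and it assumes you keep a reference to the KafkaReceiver) using KafkaReceiver#doOnConsumer together with the Kafka Consumer's paused()/resume() methods:

  // Report which partitions the underlying Kafka consumer currently has paused.
  private Mono<Set<TopicPartition>> pausedPartitions(KafkaReceiver<String, Value> receiver) {
    return receiver.doOnConsumer(Consumer::paused);
  }

  // Resume everything that is currently paused and return the set that was resumed.
  private Mono<Set<TopicPartition>> resumePausedPartitions(KafkaReceiver<String, Value> receiver) {
    return receiver.doOnConsumer(consumer -> {
      Set<TopicPartition> paused = consumer.paused();
      consumer.resume(paused);
      return paused;
    });
  }

doOnConsumer runs the function on the thread reactor-kafka uses for its other consumer operations, so this does not violate the KafkaConsumer's single-threaded access requirement.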

@abialas, can you reproduce this consistently, or is it an intermittent problem like I am seeing?


3vl commented Jul 16, 2024

I don't see how the partitions paused on this line are resumed.

The only place I see a resume is for the partitions in pausedByUs, and these aren't added to that collection.
