Consumption stopped with slow consumer #386

Open
abialas opened this issue Mar 7, 2024 · 2 comments
Labels
❓need-triage This issue needs triage, hasn't been looked at by a team member yet

Comments


abialas commented Mar 7, 2024

I have a simple but slow consumer which consumes one record at a time:

  private Flux<Void> consumeRecords(Flux<ReceiverRecord<String, Value>> records) {
    return records
        .concatMap(receiverRecord -> handleReceivedRecord(receiverRecord))
        .concatMap(this::ackOffset)
        .doOnEach(logOnError(logErrorReceiveRecord()))
        .retryWhen(RETRY_FAILED_CONSUMER_SPEC);
  }

The processing time of handleReceivedRecord is under 500 ms per record. I understand this consumer is slow and needs to be fixed by adding concurrency (a possible shape is sketched at the end of this comment).
However, in my test I only produce about 3000 records per minute to the topic this consumer reads from. Initially it consumes fine, but after some time the consumer stops consuming entirely. There is no error log or anything similar.

In the logs I see messages such as:

Rebalance during back pressure, re-pausing new assignments
Rebalancing; waiting for 104 records in pipeline

and I have to restart the consumer instance to fix this. It is also worth mentioning that when I disable scaling up of consumers, it works fine.

Expected Behavior

Consuming records from the topic should not stop.

Actual Behavior

Consuming records from the topic gets stuck and a restart is required.

Your Environment

  • Reactor Core: 3.5.10
  • Reactor Kafka: 1.3.18
  • JVM version (java -version): 21.0.2
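
For context, a concurrent variant of the consumer above (only a sketch; it reuses handleReceivedRecord, ackOffset and the retry/logging helpers from my snippet, and groups by partition so per-partition ordering is preserved) could look roughly like this:

  private Flux<Void> consumeRecordsConcurrently(Flux<ReceiverRecord<String, Value>> records) {
    return records
        // one group per Kafka partition, so records within a partition stay ordered
        .groupBy(receiverRecord -> receiverRecord.partition())
        // process the partition groups concurrently, each group itself sequentially
        .flatMap(partitionRecords -> partitionRecords
            .concatMap(receiverRecord -> handleReceivedRecord(receiverRecord))
            .concatMap(this::ackOffset))
        .doOnEach(logOnError(logErrorReceiveRecord()))
        .retryWhen(RETRY_FAILED_CONSUMER_SPEC);
  }

Grouping by partition is the usual way to add concurrency with reactor-kafka without losing per-partition ordering.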
reactorbot added the ❓need-triage label Mar 7, 2024

3vl commented Jul 16, 2024

I am seeing similar behavior in 1.3.23. It seems to happen intermittently, possibly after a rebalance.

It looks like the partitions have been paused and never resumed. I added an endpoint that lets me see paused partitions, and when consumption stops I can see that the partitions are indeed paused. If I use another endpoint to force them to resume, consumption starts again.
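
For anyone wanting to do the same check, here is a minimal sketch (the helper names are made up, and it assumes you keep a reference to the KafkaReceiver) using KafkaReceiver#doOnConsumer together with the Kafka Consumer's paused()/resume() methods:

  // Report which partitions the underlying Kafka consumer currently has paused.
  private Mono<Set<TopicPartition>> pausedPartitions(KafkaReceiver<String, Value> receiver) {
    return receiver.doOnConsumer(Consumer::paused);
  }

  // Resume everything that is currently paused and return the set that was resumed.
  private Mono<Set<TopicPartition>> resumePausedPartitions(KafkaReceiver<String, Value> receiver) {
    return receiver.doOnConsumer(consumer -> {
      Set<TopicPartition> paused = consumer.paused();
      consumer.resume(paused);
      return paused;
    });
  }

doOnConsumer runs the function on the thread reactor-kafka uses for its other consumer operations, so this does not violate the KafkaConsumer's single-threaded access requirement.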

@abialas, can you reproduce this consistently, or is it an intermittent problem like I am seeing?


3vl commented Jul 16, 2024

I don't see how the partitions paused on this line are resumed.

The only place I see a resume is for the partitions in pausedByUs, and these aren't added to that collection.
