Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: KafkaIO Batch Write Failing due to producer cannot commit messages within timeout #24963

Open
2 of 15 tasks
Abacn opened this issue Jan 10, 2023 · 4 comments
Open
2 of 15 tasks

Comments

@Abacn
Copy link
Contributor

Abacn commented Jan 10, 2023

What happened?

Found by Kafka performance test failing after #24879 in. This is because we removed shuffle=appliance there and then read from shuffle becomes faster. However, the producer is unable to digests data within timeout (can be seen by the flooding warning log of send failed : 'Expiring 148 record(s) for beam-sdf-0:130675 ms has passed since batch creation'). However, the message itself does not contain a timestamp and should be tolerant to throttling.

The error happens because producer.send is asynchronous. It adds a system timestamp when .send gets called and returns immediately. We need a mechanism to prevent overwhelming Kafka producer.

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@Abacn
Copy link
Contributor Author

Abacn commented Jan 10, 2023

Under umbrella issue #22303

@aromanenko-dev
Copy link
Contributor

Kind ping. What is status of this issue?

@Abacn
Copy link
Contributor Author

Abacn commented Mar 16, 2023

@aromanenko-dev unfortunately I do not have a good idea for short term fix. Throttling detection is considered as a long term solution, that is the IO connector has can detect throttling and make runner aware, then runner can prevent scaling up and possibly scaling down.

@aromanenko-dev
Copy link
Contributor

aromanenko-dev commented Mar 16, 2023

@Abacn Yes, throttling detection is a general problem across all IO connectors and runners. So, should we close this one then?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants