Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add throttling / backpressure and correct error handling to OTLP receiver #669

Open
tigrannajaryan opened this issue Mar 20, 2020 · 6 comments
Labels
area:receiver help wanted Good issue for contributors to OpenTelemetry Service to pick up
Milestone

Comments

@tigrannajaryan
Copy link
Member

tigrannajaryan commented Mar 20, 2020

OTLP must throttle the clients when it receives a backpressure from the pipeline (e.g. memory_limiter).

Must comply with OTLP spec and use gRPC status codes + RetryInfo (see the spec).

Also currently does not seem to correctly handle non-permanent errors returned by next consumer. We return 500, which according to OTLP spec means client will not retry. This means we will lose data.

Instead for retryable errors we must return 503 and for permanent errors must return 400 or similar.

@tigrannajaryan
Copy link
Member Author

@pjanotti can you please add details here about how exactly receivers are informed about backpressure?

@tigrannajaryan tigrannajaryan added the help wanted Good issue for contributors to OpenTelemetry Service to pick up label May 13, 2020
@flands flands added this to the GA 1.0 milestone Jun 18, 2020
@tigrannajaryan
Copy link
Member Author

This is a nice-to-have, removing from 1.0.

@tigrannajaryan tigrannajaryan modified the milestones: GA 1.0, Backlog Oct 19, 2020
@tigrannajaryan
Copy link
Member Author

We also need to test this for OTLP receiver's HTTP transport and ensure we return the correct backpressure code according to OTLP/HTTP spec.

Preferably these all should have automated tests.

@punya
Copy link
Member

punya commented Jul 29, 2021

Is this specific to OTLP, or does it apply to other push-based receivers too?

@tigrannajaryan
Copy link
Member Author

Briefly checked the source code and what we need to do is check that the error returned from the consumer is PermanentError (which can happen if connected directly to exporter) and return a BadData to the sender.

In all other cases we should return a "backoff" response to the sender (real-life use cases are either errors from memory limiter or errors from queued_retry when the queue is fill). If the error is throttleRetry error we can also extract and use the retry interval from throttleRetry.

All others receivers need to behave similarly. Not sure if we can implement some sort of helper (likely doable at least for all http-based receivers or for all grpc-based receivers).

MovieStoreGuy pushed a commit to atlassian-forks/opentelemetry-collector that referenced this issue Nov 11, 2021
…try#669)

* Reimplement histogram using mutex instead of stateLocker

Move existing implementation to histogram_statelocker.go. Implement
benchmarks for single thread and parallel histogram updates comparing
mutex version to stateLocker version

* Drop statelocker implementation and alignment tests, benchmarks

Co-authored-by: Joshua MacDonald <[email protected]>
MovieStoreGuy pushed a commit to atlassian-forks/opentelemetry-collector that referenced this issue Nov 11, 2021
Fixes open-telemetry#657

With the changes in open-telemetry#667 and open-telemetry#669 to use a plain-old-mutex for
concurrent access of Histogram and MinMaxSumCount aggregators,
the StateLocker implementation is no longer used in the project.

Co-authored-by: Joshua MacDonald <[email protected]>
@tigrannajaryan tigrannajaryan changed the title Add throttling / backpressure to OTLP receiver Add throttling / backpressure and correct error handling to OTLP receiver Mar 31, 2023
@tigrannajaryan
Copy link
Member Author

Related or duplicate of #4335 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:receiver help wanted Good issue for contributors to OpenTelemetry Service to pick up
Projects
None yet
Development

No branches or pull requests

3 participants