forked from open-telemetry/opentelemetry-collector-contrib
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[exporter/loadbalancing] Add top level sending_queue, retry_on_failur…
…e and timeout settings (open-telemetry#36094) #### Description ##### Problem statement `loadbalancing` exporter is actually a wrapper that's creates and manages set of actual `otlp` exporters Those `otlp` exporters technically shares same configuration parameters that are defined on `loadbalancing` exporter level, including `sending_queue` configuration. The only difference is `endpoint` parameter that are substituted by `loadbalancing` exporter itself This means, that `sending_queue`, `retry_on_failure` and `timeout` settings can be defined only on `otlp` sub-exporters, while top-level `loadbalancing` exporter is missing all those settings This configuration approach produces several issue, that are already reported by users: * Impossibility to use Persistent Queue in `loadbalancing` exporter (see open-telemetry#16826). That's happens because `otlp` sub-exporters are sharing the same configurations, including configuration of the queue, i.e. they all are using the same `storage` instance at the same time which is not possible at the moment * Data loss even using `sending_queue` configuration (see open-telemetry#35378). That's happens because Queue is defined on level of `otlp` sub-exporters and if this exporter cannot flush data from queue (for example, endpoint is not available anymore) there is no other options that just to discard data from queue, i.e. there is no higher level queue and persistent storage where data can be returned is case of permanent failure There might be some other potential issue that was already tracked and related to current configuration approach ##### Proposed solution The easiest way to solve issues above - is to use standard approach for queue, retry and timeout configuration using `exporterhelper` This will bring queue, retry and timeout functionality to the top-level of `loadbalancing` exporter, instead of `otlp` sub-exporters Related to mentioned issues it will bring: * Single Persistent Queue, that is used by all `otlp` sub-exporters (not directly of course) * Queue will not be discarded/destroyed if any (or all) of endpoint that are unreachable anymore, top-level queue will keep data until new endpoints will be available * Scale-up and scale-down event for next layer of OpenTelemetry Collectors in K8s environments will be more predictable, and will not include data loss anymore (potential fix for open-telemetry#33959). There is still a big chance of inconsistency when some data will be send to incorrect endpoint, but it's already better state that we have right now ##### Noticeable changes * `loadbalancing` exporter on top-level now uses `exporterhelper` with all supported functionality by it * `sending_queue` will be automatically disabled on `otlp` exporters when it already present on top-level `loadbalancing` exporter. This change is done to prevent data loss on `otlp` exporters because queue there doesn't provide expected result. Also it will prevent potential misconfiguration from user side and as result - irrelevant reported issues * `exporter` attribute for metrics generated from `otlp` sub-exporters now includes endpoint for better visibility and to segregate them from top-level `loadbalancing` exporter - was `"exporter": "loadbalancing"`, now `"exporter": "loadbalancing/127.0.0.1:4317"` * logs, generated by `otlp` sub-exporters now includes additional attribute `endpoint` with endpoint value with the same reasons as for metrics #### Link to tracking issue Fixes open-telemetry#35378 Fixes open-telemetry#16826 #### Testing Proposed changes was heavily tested on large K8s environment with set of different scale-up/scale-down event using persistent queue configuration - no data loss were detected, everything works as expected #### Documentation `README.md` was updated to reflect new configuration parameters available. Sample `config.yaml` was updated as well
- Loading branch information
1 parent
d35866a
commit 0b15a2b
Showing
12 changed files
with
354 additions
and
89 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
# Use this changelog template to create an entry for release notes. | ||
|
||
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix' | ||
change_type: enhancement | ||
|
||
# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver) | ||
component: loadbalancingexporter | ||
|
||
# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`). | ||
note: Adding sending_queue, retry_on_failure and timeout settings to loadbalancing exporter configuration | ||
|
||
# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists. | ||
issues: [35378,16826] | ||
|
||
# (Optional) One or more lines of additional information to render under the primary note. | ||
# These lines will be padded with 2 spaces and then inserted directly into the document. | ||
# Use pipe (|) for multiline entries. | ||
subtext: | | ||
When switching to top-level sending_queue configuration - users should carefully review queue size | ||
In some rare cases setting top-level queue size to n*queueSize might be not enough to prevent data loss | ||
# If your change doesn't affect end users or the exported elements of any package, | ||
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label. | ||
# Optional: The change log or logs in which this entry should be included. | ||
# e.g. '[user]' or '[user, api]' | ||
# Include 'user' if the change is relevant to end users. | ||
# Include 'api' if there is a change to a library API. | ||
# Default: '[user]' | ||
change_logs: [user] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.