
Memory leak in v0.109 when running collector in deployment mode #35344

Open
tomeroszlak opened this issue Sep 23, 2024 · 3 comments
Labels
bug (Something isn't working), Stale

Comments

@tomeroszlak

tomeroszlak commented Sep 23, 2024

Component(s)

cmd/otelcontribcol

What happened?

Description
We recently upgraded our OpenTelemetry collector from v0.94.0 to v0.109.0 and are running it as a deployment behind an NGINX ingress. We’ve observed that memory usage spikes to 80% of the pod's available memory within five minutes and does not decrease.

While reviewing the metrics for failed log records (otelcol_exporter_send_failed_log_records{}), we noticed two exporters, {exporter="otlphttp"} and {exporter="otlp"}, that are not defined in our configuration but are continuously dropping logs.

Additionally, it appears that the memory_limiter is not updating GOMEMLIMIT to 100%; instead, it remains fixed at 80%.
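
For reference, a percentage-based memory_limiter configuration of the kind described above looks roughly like the sketch below (illustrative only; the exact values in our deployment may differ, and the field names follow the upstream memorylimiterprocessor documentation):

    processors:
      memory_limiter:
        check_interval: 1s          # how often memory usage is checked
        limit_percentage: 80        # hard limit, as a percentage of the total available memory
        spike_limit_percentage: 20  # subtracted from the hard limit to form the soft limit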

Steps to Reproduce
Expected Result
The garbage collector should free up memory, preventing the pod from being stuck at 80%.

Actual Result
The pod reaches 80% of the available memory and remains at that level.

Collector version

v0.109.0

Environment information

Environment

running on K8S v1.28 as a deployment.

OpenTelemetry Collector configuration

Log output

2024/09/22 22:30:19 http: superfluous response.WriteHeader call from go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp/internal/request.(*RespWriterWrapper).writeHeader (resp_writer_wrapper.go:78)

Additional context

No response

@tomeroszlak added the bug (Something isn't working) and needs triage (New item requiring triage) labels on Sep 23, 2024
@ChrsMark
Member

It would be super helpful if you could enable the pprofextension and gather some heap dumps. That would help identify which components might be leaking. Please also provide the full Collector configuration.
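
A minimal sketch of what enabling the pprofextension could look like (the endpoint shown is the extension's documented default; adjust for your deployment and merge into your existing configuration):

    extensions:
      pprof:
        endpoint: localhost:1777   # default listen address of the pprof extension

    service:
      extensions: [pprof]

    # Once the collector is running, a heap profile can be fetched with, for example:
    #   go tool pprof http://<collector-pod>:1777/debug/pprof/heap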

@atoulme
Contributor

atoulme commented Oct 2, 2024

Please upgrade to the latest release to remove the superfluous logs. Please also provide the complete collector configuration, masking passwords and other confidential information, and follow @ChrsMark's lead on using the pprofextension to collect more data.

@atoulme removed the needs triage (New item requiring triage) label on Oct 12, 2024
@github-actions bot

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions bot added the Stale label on Dec 12, 2024