[exporter/loadbalancing] couldn't find the exporter for the endpoint bug #35153

Open
Frapschen opened this issue Sep 12, 2024 · 3 comments
Labels: bug (Something isn't working), exporter/loadbalancing

@Frapschen (Contributor)

Component(s)

exporter/loadbalancing

What happened?

Description

Using this exporter, I hit a bug. I have tried both the dns and the k8s resolver; both fail with the same error.

Using the dns resolver, with this config:

  loadbalancing/jaeger:
    routing_key: "traceID"
    protocol:
      otlp:
        # all options from the OTLP exporter are supported, except the endpoint
        timeout: 1s
        tls:
          insecure: true
        sending_queue:
          enabled: true
          storage: file_storage/all_settings
        retry_on_failure:
          enabled: true
          max_elapsed_time: 500s
    resolver:
      dns:
        hostname: jaeger-collector-headless.insight-system.svc.cluster.local
        port: "4317"
        interval: 1s
        timeout: 200ms

The resulting error logs (screenshot):
[screenshot of error logs attached to the issue]

Using the k8s resolver, with this config:

  loadbalancing/jaeger:
    routing_key: "traceID"
    protocol:
      otlp:
        # all options from the OTLP exporter are supported, except the endpoint
        timeout: 1s
        tls:
          insecure: true
        sending_queue:
          enabled: true
          storage: file_storage/all_settings
        retry_on_failure:
          enabled: true
          max_elapsed_time: 500s
    resolver:
      k8s:
        service: insight-jaeger-collector
        ports:
          - 4317

The resulting error logs (screenshot):
[screenshot of error logs attached to the issue]

I have also noticed this comment in the exporter's source:

// something is really wrong... how come we couldn't find the exporter??

which suggests the code itself does not expect this lookup to ever fail.
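
For context, a minimal sketch of the lookup that comment guards, under the assumption that the load balancer keeps a consistent-hash ring of endpoints plus an endpoint-to-exporter map, both rebuilt on every resolver update; the error from the logs then corresponds to the two getting out of sync (an endpoint is still on the ring but its exporter is missing from the map). All names below are placeholders, not the actual implementation:

package main

import (
    "fmt"
    "sync"
)

type backendExporter struct{ endpoint string } // stand-in for the wrapped OTLP exporter

type loadBalancer struct {
    mu        sync.RWMutex
    ring      []string                    // stand-in for the consistent-hash ring of endpoints
    exporters map[string]*backendExporter // endpoint -> exporter built for it
}

// endpointFor stands in for hashing the routing key (e.g. the traceID) onto the ring.
func (lb *loadBalancer) endpointFor(routingKey string) string {
    lb.mu.RLock()
    defer lb.mu.RUnlock()
    if len(lb.ring) == 0 {
        return ""
    }
    return lb.ring[len(routingKey)%len(lb.ring)] // naive placement, for illustration only
}

// exporterFor mirrors the lookup that the quoted comment sits next to.
func (lb *loadBalancer) exporterFor(endpoint string) (*backendExporter, error) {
    lb.mu.RLock()
    defer lb.mu.RUnlock()
    exp, ok := lb.exporters[endpoint]
    if !ok {
        // something is really wrong... how come we couldn't find the exporter??
        return nil, fmt.Errorf("couldn't find the exporter for the endpoint %q", endpoint)
    }
    return exp, nil
}

func main() {
    lb := &loadBalancer{
        ring: []string{"10.42.0.31:4317", "10.42.0.32:4317"},
        // only one ring member has an exporter: ring and map are out of sync
        exporters: map[string]*backendExporter{
            "10.42.0.32:4317": {endpoint: "10.42.0.32:4317"},
        },
    }
    endpoint := lb.endpointFor("4bf92f3577b34da6a3ce929d0e0e4736") // a 32-char trace ID as routing key
    if _, err := lb.exporterFor(endpoint); err != nil {
        fmt.Println(err) // prints: couldn't find the exporter for the endpoint "10.42.0.31:4317"
    }
}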

Collector version

v0.109.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

Frapschen added the labels bug (Something isn't working) and needs triage (New item requiring triage) on Sep 12, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@atoulme (Contributor) commented Nov 9, 2024

Please provide your complete configuration file and indicate the version of the collector used.

atoulme added the waiting for author label and removed the needs triage (New item requiring triage) label on Nov 9, 2024
jpkrohling self-assigned this on Dec 2, 2024

@jpkrohling (Member) commented Dec 2, 2024

It looks like I can reproduce this with a fairly simple config, like:

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otelcol-loadbalancer
spec:
  image: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.114.0
  config:
    receivers:
      otlp:
        protocols:
          grpc: {}

    exporters:
      loadbalancing:
        protocol:
          otlp:
            timeout: 1s
            tls:
              insecure: true
            sending_queue:
              enabled: true
              storage: file_storage/all_settings
            retry_on_failure:
              enabled: true
              max_elapsed_time: 500s
        resolver:
          dns:
            hostname: otelcol-backend-collector-headless

    service:
      extensions: [ ]
      pipelines:
        traces:
          receivers:  [ otlp ]
          processors: [  ]
          exporters:  [ loadbalancing ]

And with this backend:

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otelcol-backend
spec:
  image: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.114.0
  replicas: 10
  config:
    receivers:
      otlp:
        protocols:
          grpc: {}

    exporters:
      debug: {}

    service:
      extensions: [ ]
      pipelines:
        traces:
          receivers:  [ otlp  ]
          processors: [  ]
          exporters:  [ debug ]

This seems to be connected to the persistent sending queue: removing it confirms that everything works.
When everything is working, the following metrics are available:

# HELP otelcol_loadbalancer_backend_outcome Number of successes and failures for each endpoint.
# TYPE otelcol_loadbalancer_backend_outcome counter
otelcol_loadbalancer_backend_outcome{endpoint="10.42.0.31:4317",service_instance_id="19534c16-77db-4e3d-892a-82b65ed1db09",service_name="otelcol-contrib",service_version="0.114.0",success="true"} 2261
otelcol_loadbalancer_backend_outcome{endpoint="10.42.0.32:4317",service_instance_id="19534c16-77db-4e3d-892a-82b65ed1db09",service_name="otelcol-contrib",service_version="0.114.0",success="true"} 2261
otelcol_loadbalancer_backend_outcome{endpoint="10.42.0.33:4317",service_instance_id="19534c16-77db-4e3d-892a-82b65ed1db09",service_name="otelcol-contrib",service_version="0.114.0",success="true"} 2261
otelcol_loadbalancer_backend_outcome{endpoint="10.42.0.34:4317",service_instance_id="19534c16-77db-4e3d-892a-82b65ed1db09",service_name="otelcol-contrib",service_version="0.114.0",success="true"} 2261
otelcol_loadbalancer_backend_outcome{endpoint="10.42.0.35:4317",service_instance_id="19534c16-77db-4e3d-892a-82b65ed1db09",service_name="otelcol-contrib",service_version="0.114.0",success="true"} 2261
otelcol_loadbalancer_backend_outcome{endpoint="10.42.0.36:4317",service_instance_id="19534c16-77db-4e3d-892a-82b65ed1db09",service_name="otelcol-contrib",service_version="0.114.0",success="true"} 2261
otelcol_loadbalancer_backend_outcome{endpoint="10.42.0.37:4317",service_instance_id="19534c16-77db-4e3d-892a-82b65ed1db09",service_name="otelcol-contrib",service_version="0.114.0",success="true"} 2261
otelcol_loadbalancer_backend_outcome{endpoint="10.42.0.38:4317",service_instance_id="19534c16-77db-4e3d-892a-82b65ed1db09",service_name="otelcol-contrib",service_version="0.114.0",success="true"} 2261
otelcol_loadbalancer_backend_outcome{endpoint="10.42.0.39:4317",service_instance_id="19534c16-77db-4e3d-892a-82b65ed1db09",service_name="otelcol-contrib",service_version="0.114.0",success="true"} 2261
otelcol_loadbalancer_backend_outcome{endpoint="10.42.0.40:4317",service_instance_id="19534c16-77db-4e3d-892a-82b65ed1db09",service_name="otelcol-contrib",service_version="0.114.0",success="true"} 2261

With the sending queue, all spans end up being refused, like this:

otelcol_receiver_refused_spans{receiver="otlp",service_instance_id="e3fc5786-a50e-4df2-bd6f-71167e37a26e",service_name="otelcol-contrib",service_version="0.114.0",transport="grpc"} 1536
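
For reference, this is the exporter block from the reproducer above with the persistent sending queue removed, i.e. the variant that works in the test; whether keeping an in-memory sending_queue (without the storage reference) would also be enough has not been verified here:

    exporters:
      loadbalancing:
        protocol:
          otlp:
            timeout: 1s
            tls:
              insecure: true
            retry_on_failure:
              enabled: true
              max_elapsed_time: 500s
        resolver:
          dns:
            hostname: otelcol-backend-collector-headless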
