Fluent sends malformed requests for logs collected by opentelemetry input source to opensearch output #9679

nrmilstein commented Dec 2, 2024

Bug Report

Describe the bug

If I configure Fluent Bit with an opentelemetry input and an opensearch output, then Fluent Bit's bulk requests to OpenSearch fail with:

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The bulk request must be terminated by a newline [\\n]"}],"type":"illegal_argument_exception","reason":"The bulk request must be terminated by a newline [\\n]"},"status":400}

To Reproduce

Using the following Docker Compose file:

---
services:
  fluent-bit:
    image: fluent/fluent-bit:latest-debug
    volumes:
      - ./fluent-bit.yaml:/fluent-bit/etc/fluent-bit.yaml
    ports:
      - 4318:4318
    depends_on:
      - opensearch-node1
      - opensearch-node2
    command: [ "/fluent-bit/bin/fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
    networks:
      - opensearch-net

  opensearch-node1:
    image: opensearchproject/opensearch:latest
    container_name: opensearch-node1
    environment:
      - logger.org.opensearch.discovery=TRACE
      - plugins.security.ssl.http.enabled=false
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node1
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true # along with the memlock settings below, disables swapping
      - OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=dGEtFi4oXWtGy_brZcLX # Sets the demo admin user password when using demo configuration, required for OpenSearch 2.12 and higher
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536 # maximum number of open files for the OpenSearch user, set to at least 65536 on modern systems
        hard: 65536
    volumes:
      - opensearch-data1:/usr/share/opensearch/data
    ports:
      - 9200:9200
      - 9600:9600 # required for Performance Analyzer
    networks:
      - opensearch-net
  opensearch-node2:
    image: opensearchproject/opensearch:latest
    container_name: opensearch-node2
    environment:
      - logger.org.opensearch.discovery=TRACE
      - plugins.security.ssl.http.enabled=false
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node2
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true
      - OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m
      - OPENSEARCH_INITIAL_ADMIN_PASSWORD=dGEtFi4oXWtGy_brZcLX
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-data2:/usr/share/opensearch/data
    networks:
      - opensearch-net

volumes:
  opensearch-data1:
  opensearch-data2:

networks:
  opensearch-net:

And the following fluent-bit config:

service:
  log_level: trace
pipeline:
  inputs:
    - name: opentelemetry

  outputs:
    - name: opensearch
      host: opensearch-node1
      tls: false
      match: '*'
      index: myindex
      http_user: admin
      # dev credentials
      http_passwd: "dGEtFi4oXWtGy_brZcLX"
      suppress_type_name: true
      trace_output: true
    - name: stdout
      match: '*'
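
With both files in place, the stack starts with:

docker compose up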

If I log to Fluent Bit via OTLP, Fluent Bit then receives the following error response from OpenSearch:

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The bulk request must be terminated by a newline [\\n]"}],"type":"illegal_argument_exception","reason":"The bulk request must be terminated by a newline [\\n]"},"status":400}

If I examine the HTTP request, I see that it is malformed. A normal OpenSearch bulk request from Fluent Bit (e.g. for records from the cpu input) looks like this:

POST /_bulk HTTP/1.1
Host: opensearch-node1:9200
Content-Length: 686
Content-Type: application/x-ndjson
Authorization: Basic YWRtaW46ZEdFdEZpNG9YV3RHeV9iclpjTFg=
User-Agent: Fluent-Bit
Connection: keep-alive

{"create":{"_index":"myindex"}}
{"@timestamp":"2024-12-02T21:52:19.661Z","cpu_p":56.6,"user_p":55.3,"system_p":1.3,"cpu0.p_cpu":61.0,"cpu0.p_user":60.0,"cpu0.p_system":1.0,"cpu1.p_cpu":57.0,"cpu1.p_user":56.0,"cpu1.p_system":1.0,"cpu2.p_cpu":99.0,"cpu2.p_user":96.0,"cpu2.p_system":3.0,"cpu3.p_cpu":64.0,"cpu3.p_user":63.0,"cpu3.p_system":1.0,"cpu4.p_cpu":49.0,"cpu4.p_user":49.0,"cpu4.p_system":0.0,"cpu5.p_cpu":55.0,"cpu5.p_user":55.0,"cpu5.p_system":0.0,"cpu6.p_cpu":56.0,"cpu6.p_user":53.0,"cpu6.p_system":3.0,"cpu7.p_cpu":19.0,"cpu7.p_user":17.0,"cpu7.p_system":2.0,"cpu8.p_cpu":54.0,"cpu8.p_user":53.0,"cpu8.p_system":1.0,"cpu9.p_cpu":51.0,"cpu9.p_user":51.0,"cpu9.p_system":0.0}

Requests for records from the opentelemetry input, by contrast, contain only the payload data (without the leading "create" action line) and are not newline-terminated:

POST /_bulk HTTP/1.1
Host: opensearch-node1:9200
Content-Length: 1830
Content-Type: application/x-ndjson
Authorization: Basic YWRtaW46ZEdFdEZpNG9YV3RHeV9iclpjTFg=
User-Agent: Fluent-Bit
Connection: keep-alive

{ "resourceSpans": [ { "resource": { "attributes": { "service.version": "0.1.0", "service.name": "my-service", "deployment.environment.name": "develop" }, "dropped_attributes_count": 0 }, "schema_url": "https://opentelemetry.io/schemas/1.28.0", "scope_spans": [ { "scope": { "name": "tracing-otel-subscriber", "version": "", "attributes": {}, "dropped_attributes_count": 0 }, "spans": [ { "trace_id": "bed77d0d777c78a8f778423e0eab21a9", "span_id": "0ac24914dae204ce", "parent_span_id": "987702f35578d254", "trace_state": null, "name": "inter span", "kind": 1, "start_time_unix_nano": 1733176461628327000, "end_time_unix_nano": 1733176461628368000, "attributes": { "code.filepath": "src/main.rs", "code.namespace": "my_service", "code.lineno": 58, "thread.id": 1, "thread.name": "main", "busy_ns": 35375, "idle_ns": 10625 }, "dropped_attributes_count": 0, "events": [ { "time_unix_nano": 1733176461628356000, "name": "Inner span", "attributes": { "level": "INFO", "target": "my_service", "code.filepath": "src/main.rs", "code.namespace": "my_service", "code.lineno": 60 }, "dropped_attributes_count": 0 } ], "links": [], "status": { "code": 0, "message": "" } }, { "trace_id": "bed77d0d777c78a8f778423e0eab21a9", "span_id": "987702f35578d254", "parent_span_id": null, "trace_state": null, "name": "outer span", "kind": 1, "start_time_unix_nano": 1733176461628119000, "end_time_unix_nano": 1733176461628438000, "attributes": { "code.filepath": "src/main.rs", "code.namespace": "my_service", "code.lineno": 53, "thread.id": 1, "thread.name": "main", "busy_ns": 192750, "idle_ns": 129792 }, "dropped_attributes_count": 0, "events": [ { "time_unix_nano": 1733176461628287000, "name": "Outer span", "attributes": { "level": "INFO", "target": "my_service", "code.filepath": "src/main.rs", "code.namespace": "my_service", "code.lineno": 55 }, "dropped_attributes_count": 0 } ], "links": [], "status": { "code": 0, "message": "" } } ], "schema_url": "https://opentelemetry.io/schemas/1.28.0" } ] } ] }

I am using the Rust tracing-opentelemetry library to log via OTLP.

Your Environment

  • Version used: 3.2.2
  • Configuration:
  • Environment name and version (e.g. Kubernetes? What version?):
  • Server type and version:
  • Operating System and version: macOS
  • Filters and plugins:

Additional context
