
Kafka Exporter incorrectly sends data from filelog receiver to topic (Container operator does not parse the CRI-O logs.) #35686

Closed
xzizka opened this issue Oct 8, 2024 · 11 comments

@xzizka

xzizka commented Oct 8, 2024

Component(s)

receiver/filelog

Describe the issue you're reporting

I want to configure Kubernetes log collection and forward the logs to Kafka.
My testing setup is:

  • K8S 1.29.4
  • CRI-O 1.29.1
  • OpenTelemetry contrib 0.110.0

Logs are located in /var/log/pods/. This location is mounted inside the daemonset pods using the manifest below:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-agent
  namespace: otel-system
  labels:
    app: opentelemetry
    component: otel-agent
spec:
  selector:
    matchLabels:
      app: opentelemetry
      component: otel-agent
  template:
    metadata:
      labels:
        app: opentelemetry
        component: otel-agent
    spec:
      serviceAccountName: otel-collector
      securityContext:
        {}
      containers:
      - name: otel-agent
        command:
          - "/otelcol-contrib"
        args:
          - "--config=/conf/otel-agent-config.yaml"
        image: dockerhub.local/opentelemetry-collector-contrib/opentelemetry-collector-contrib:0.110.0-amd64
        imagePullPolicy: IfNotPresent
        ports:
          - name: otlp
            containerPort: 4317
            protocol: TCP
            hostPort: 4317
          - name: otlp-http
            containerPort: 4318
            protocol: TCP
            hostPort: 4318
          - name: zipkin
            containerPort: 9411
            protocol: TCP
            hostPort: 9411
        env:
          - name: KAFKA_TOPIC_NAME
            valueFrom:
              secretKeyRef:
                key: topicName
                name: kib-logging-secret        
          - name: MY_POD_IP
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: status.podIP
          - name: GOMEMLIMIT
            value: "152MiB"
        livenessProbe:
          httpGet:
            path: /
            port: 13133
        readinessProbe:
          httpGet:
            path: /
            port: 13133
        resources:
          limits:
            cpu: 200m
            memory: 400M
          requests:
            cpu: 100m
            memory: 200M
        securityContext:
          runAsUser: 0
          runAsGroup: 0
        volumeMounts:
          - name: otel-agent-config-vol
            mountPath: /conf
          - mountPath: /var/log
            name: varlog
          - name: kafka-tls
            mountPath: /etc/ca.pem
            subPath: ca.pem
          - name: kafka-tls
            mountPath: /etc/cert.pem
            subPath: cert.pem
          - name: kafka-tls
            mountPath: /etc/cert.key
            subPath: cert.key
      volumes:
        - name: kafka-tls
          secret:
            secretName: kib-logging-secret      
            items:
              - key: cert.key
                path: cert.key
              - key: cert.pem
                path: cert.pem
              - key: ca.pem
                path: ca.pem
        - hostPath:
            path: /var/log
            type: ""
          name: varlog      
        - name: otel-agent-config-vol
          configMap:
            name: otel-agent-conf
            items:
              - key: otel-agent-config
                path: otel-agent-config.yaml
      tolerations:
      - operator: Exists

The daemonset uses a ServiceAccount with the following ClusterRole (it is also used for metrics collection):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
  creationTimestamp: "2024-09-03T14:46:36Z"
  name: otel-collector
  resourceVersion: "15037896"
  uid: 093284eb-9be5-4673-9f0a-89842d527938
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - namespaces
  - services
  - endpoints
  - nodes
  - nodes/proxy
  verbs:
  - get
  - watch
  - list
- apiGroups:
  - apps
  resources:
  - replicasets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - replicasets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/stats
  verbs:
  - get

The configuration for the OTel Collector is the following:

apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-agent-conf
  namespace: otel-system
  labels:
    app: opentelemetry
    component: otel-agent-conf
data:
  otel-agent-config: |-
    exporters:
      kafka:
        brokers: ["kafka-logman-ha-test:9093"]
        topic: ${env:KAFKA_TOPIC_NAME}
        encoding: "raw"
        protocol_version: "2.0.0"
        auth:
          tls:
            insecure: false
            insecure_skip_verify: true
            ca_file: /etc/ca.pem    
            cert_file: /etc/cert.pem    
            key_file: /etc/cert.key                            
      debug:
        verbosity: basic       
    extensions:
      health_check:
        endpoint: ${env:MY_POD_IP}:13133
    processors:
      batch:
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 20
    receivers:
      filelog:
        start_at: end
        include_file_path: true
        include_file_name: false
        include: 
          - /var/log/pods/*/*/*.log 
        operators:
          - id: container-parser
            type: container
    service:
      extensions:
      - health_check
      pipelines:
        logs:
          exporters:
          - kafka
          - debug
          processors:
          - memory_limiter
          - batch
          receivers:
          - filelog

A part of the detailed logs from one of the pods looks like this:

e57toc@vncub17999:~/otel-logs $ klo otel-agent-l4hd6 | head -n 50
2024-10-08T14:05:00.739Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 40, "log records": 3827}
2024-10-08T14:05:01.126Z        info    ResourceLog #0
Resource SchemaURL:
Resource attributes:
     -> k8s.container.restart_count: Str(0)
     -> k8s.pod.uid: Str(31d2cc3d-f415-4397-b94a-54b2242638ba)
     -> k8s.container.name: Str(otel-agent)
     -> k8s.namespace.name: Str(otel-system)
     -> k8s.pod.name: Str(otel-agent-l4hd6)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope
LogRecord #0
ObservedTimestamp: 2024-10-08 14:04:59.83530993 +0000 UTC
Timestamp: 2024-10-08 14:01:37.431483255 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(SeverityText:)
Attributes:
     -> log.file.path: Str(/var/log/pods/otel-system_otel-agent-l4hd6_31d2cc3d-f415-4397-b94a-54b2242638ba/otel-agent/0.log)
     -> log.iostream: Str(stderr)
     -> logtag: Str(F)
Trace ID:
Span ID:
Flags: 0
LogRecord #1
ObservedTimestamp: 2024-10-08 14:04:59.835327215 +0000 UTC
Timestamp: 2024-10-08 14:01:37.431483255 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(SeverityNumber: Unspecified(0))
Attributes:
     -> logtag: Str(F)
     -> log.file.path: Str(/var/log/pods/otel-system_otel-agent-l4hd6_31d2cc3d-f415-4397-b94a-54b2242638ba/otel-agent/0.log)
     -> log.iostream: Str(stderr)
Trace ID:
Span ID:
Flags: 0
LogRecord #2
...

If I switch back to basic logging, I see the following:

e57toc@vncub17999:~/otel-logs $ klo otel-agent-xn8lt
2024-10-08T14:08:16.062Z        info    service@v0.110.0/service.go:137 Setting up own telemetry...
2024-10-08T14:08:16.063Z        info    service@v0.110.0/service.go:186 Skipped telemetry setup.
2024-10-08T14:08:16.063Z        info    builders/builders.go:26 Development component. May change in the future.        {"kind": "exporter", "data_type": "logs", "name": "debug"}
2024-10-08T14:08:16.064Z        info    memorylimiter/memorylimiter.go:151      Using percentage memory limiter {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "total_memory_mib": 381, "limit_percentage": 75, "spike_limit_percentage": 20}
2024-10-08T14:08:16.064Z        info    memorylimiter/memorylimiter.go:75       Memory limiter configured       {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "limit_mib": 286, "spike_limit_mib": 76, "check_interval": 1}
2024-10-08T14:08:16.143Z        info    service@v0.110.0/service.go:208 Starting otelcol-contrib...     {"Version": "0.110.0", "NumCPU": 4}
2024-10-08T14:08:16.143Z        info    extensions/extensions.go:39     Starting extensions...
2024-10-08T14:08:16.143Z        info    extensions/extensions.go:42     Extension is starting...        {"kind": "extension", "name": "health_check"}
2024-10-08T14:08:16.143Z        info    healthcheckextension@v0.110.0/healthcheckextension.go:33        Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Endpoint":"192.168.50.71:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"CompressionAlgorithms":null,"ReadTimeout":0,"ReadHeaderTimeout":0,"WriteTimeout":0,"IdleTimeout":0,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2024-10-08T14:08:16.143Z        info    extensions/extensions.go:59     Extension started.      {"kind": "extension", "name": "health_check"}
2024-10-08T14:08:16.443Z        info    adapter/receiver.go:47  Starting stanza receiver        {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2024-10-08T14:08:16.444Z        info    healthcheck/handler.go:132      Health Check state change       {"kind": "extension", "name": "health_check", "status": "ready"}
2024-10-08T14:08:16.444Z        info    service@v0.110.0/service.go:234 Everything is ready. Begin running and processing data.
2024-10-08T14:08:16.646Z        info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/argocd-system_argocd-notifications-controller-6c55d68cd9-4x9cd_9bbef952-54fd-43ba-ab7f-0737f796fcba/argocd-notifications-controller/0.log"}
2024-10-08T14:08:16.647Z        info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/argocd-system_argocd-notifications-controller-6c55d68cd9-4x9cd_9bbef952-54fd-43ba-ab7f-0737f796fcba/argocd-notifications-controller/1.log"}
2024-10-08T14:08:16.647Z        info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/argocd-system_argocd-redis-ha-haproxy-6977b8cd75-4zkrm_f3b504ce-b51b-441f-9361-6e03190259ce/config-init/0.log"}
2024-10-08T14:08:16.647Z        info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/argocd-system_argocd-redis-ha-haproxy-6977b8cd75-4zkrm_f3b504ce-b51b-441f-9361-6e03190259ce/config-init/1.log"}
2024-10-08T14:08:16.647Z        info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/argocd-system_argocd-redis-ha-haproxy-6977b8cd75-4zkrm_f3b504ce-b51b-441f-9361-6e03190259ce/haproxy/0.log"}
2024-10-08T14:08:16.647Z        info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/argocd-system_argocd-redis-ha-haproxy-6977b8cd75-4zkrm_f3b504ce-b51b-441f-9361-6e03190259ce/haproxy/1.log"}
2024-10-08T14:08:16.647Z        info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/argocd-system_argocd-redis-ha-haproxy-6977b8cd75-4zkrm_f3b504ce-b51b-441f-9361-6e03190259ce/secret-init/0.log"}
2024-10-08T14:08:16.647Z        info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/argocd-system_argocd-redis-ha-haproxy-6977b8cd75-4zkrm_f3b504ce-b51b-441f-9361-6e03190259ce/secret-init/2.log"}
2024-10-08T14:08:16.647Z        info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/argocd-system_argocd-redis-ha-server-1_03b67787-99ad-4ce0-99cb-c98edb2e03fb/config-init/0.log"}
2024-10-08T14:08:16.647Z        info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/argocd-system_argocd-redis-ha-server-1_03b67787-99ad-4ce0-99cb-c98edb2e03fb/config-init/1.log"}
2024-10-08T14:08:17.672Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 2}
2024-10-08T14:08:17.873Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 2}
2024-10-08T14:08:18.074Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-10-08T14:08:18.275Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-10-08T14:08:18.476Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-10-08T14:08:18.676Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-10-08T14:08:18.877Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-10-08T14:08:19.077Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-10-08T14:08:19.279Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 2, "log records": 3}

If I check what is in the Kafka topic, I see just the log message itself and nothing else.
[screenshot of the Kafka topic message]

I was following this blog post:
https://opentelemetry.io/blog/2024/otel-collector-container-log-parser/
I tried to use format: auto and format: crio, but without any positive effect.
Setting add_metadata_from_filepath: true also had no effect.
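For reference, the explicit variant I tried looked roughly like this (a sketch; format: crio is one of the formats documented for the container parser, and omitting format falls back to auto-detection):

operators:
  - id: container-parser
    type: container
    # force CRI-O parsing instead of relying on auto-detection
    format: crio
    add_metadata_from_filepath: true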

Could you please advise me where the issue might be? Why are the logs not parsed into JSON format and forwarded to the Kafka topic?
Is it possible that it is an issue with the Kafka exporter and not with the container parser?
Thank you

@djaglowski @TylerHelmuth

@xzizka xzizka added the needs triage label Oct 8, 2024
Contributor

github-actions bot commented Oct 8, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@djaglowski
Member

cc @ChrsMark

@ChrsMark
Member

ChrsMark commented Oct 9, 2024

Hey @xzizka!

I'm not really sure what the issue can be here, but let's try to narrow the scope down:

  1. Could you ensure that the Collector's own logs are not collected back? Since you have the debug exporter enabled, you would otherwise collect the whole output back again and again. Check how it's done in the Helm charts: https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/templates/_config.tpl#L167 . It should be something like the following:
filelog:
  exclude:
  - /var/log/pods/default_daemonset-collector*_*/opentelemetry-collector/*.log
  include:
  - /var/log/pods/*/*/*.log
  include_file_name: false
  include_file_path: true
  operators: ...

From the logs that you shared it looks like indeed the Collector is collecting and parsing its own output:
Body: Str(SeverityNumber: Unspecified(0))

  2. After step 1, please target a specific Pod and check if its logs are collected and logged in the Collector's output (see the sketch at the end of this comment).

I don't know if something can be wrong with the kafka exporter, but let's isolate this for now.
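For step 2, narrowing the include glob to a single Pod could look roughly like this (the namespace and pod name below are placeholders, not real values):

filelog:
  include:
    - /var/log/pods/<namespace>_<pod-name>*/*/*.log
  include_file_name: false
  include_file_path: true
  operators: ...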

@xzizka
Author

xzizka commented Oct 9, 2024

Hello @ChrsMark,

Thank you for your reaction.
ad 1)
I excluded all logs from the otel-system namespace (where all the OTel pods are installed). The filelog part:

    receivers:
      filelog:
        start_at: end
        include_file_path: true
        include_file_name: false
        include: 
          - /var/log/pods/*/*/*.log 
        exclude:
          # Exclude logs from otel-system namespace
          - /var/log/pods/otel-system_*/*/*.log          
        operators:
          - id: container-parser
            type: container
            add_metadata_from_filepath: true
          - type: add
            field: attributes.pod_name
            value: resource.k8s.pod.name
    service:

ad 2)
Here is the beginning of the log from one of the OTel pods, where a non-OTel pod's log is parsed.

2024-10-09T09:16:01.634Z        info    service@v0.110.0/service.go:137 Setting up own telemetry...
2024-10-09T09:16:01.634Z        info    service@v0.110.0/service.go:186 Skipped telemetry setup.
2024-10-09T09:16:01.635Z        info    builders/builders.go:26 Development component. May change in the future.        {"kind": "exporter", "data_type": "logs", "name": "debug"}
2024-10-09T09:16:01.635Z        info    memorylimiter/memorylimiter.go:151      Using percentage memory limiter {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "total_memory_mib": 381, "limit_percentage": 75, "spike_limit_percentage": 20}
2024-10-09T09:16:01.635Z        info    memorylimiter/memorylimiter.go:75       Memory limiter configured       {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "limit_mib": 286, "spike_limit_mib": 76, "check_interval": 1}
2024-10-09T09:16:01.637Z        info    service@v0.110.0/service.go:208 Starting otelcol-contrib...     {"Version": "0.110.0", "NumCPU": 4}
2024-10-09T09:16:01.637Z        info    extensions/extensions.go:39     Starting extensions...
2024-10-09T09:16:01.637Z        info    extensions/extensions.go:42     Extension is starting...        {"kind": "extension", "name": "health_check"}
2024-10-09T09:16:01.637Z        info    healthcheckextension@v0.110.0/healthcheckextension.go:33        Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Endpoint":"192.168.105.221:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"CompressionAlgorithms":null,"ReadTimeout":0,"ReadHeaderTimeout":0,"WriteTimeout":0,"IdleTimeout":0,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2024-10-09T09:16:01.638Z        info    extensions/extensions.go:59     Extension started.      {"kind": "extension", "name": "health_check"}
2024-10-09T09:16:01.827Z        info    adapter/receiver.go:47  Starting stanza receiver        {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2024-10-09T09:16:01.828Z        info    healthcheck/handler.go:132      Health Check state change       {"kind": "extension", "name": "health_check", "status": "ready"}
2024-10-09T09:16:01.828Z        info    service@v0.110.0/service.go:234 Everything is ready. Begin running and processing data.
2024-10-09T09:16:02.030Z        info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/argocd-system_argocd-server-f49dccbcb-tzntt_c9b0edb4-662a-484e-bd75-0a6816a7383f/argocd-server/0.log"}
2024-10-09T09:16:02.031Z        info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/argocd-system_argocd-server-f49dccbcb-tzntt_c9b0edb4-662a-484e-bd75-0a6816a7383f/argocd-server/1.log"}
2024-10-09T09:16:02.031Z        info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/dashboard-system_dashboard-kubernetes-dashboard-web-6b55b6df84-vjk7f_c0933ae9-e01f-41e8-86d4-8f1a8ae03fc9/kubernetes-dashboard-web/0.log"}
2024-10-09T09:16:02.031Z        info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/dashboard-system_dashboard-kubernetes-dashboard-web-6b55b6df84-vjk7f_c0933ae9-e01f-41e8-86d4-8f1a8ae03fc9/kubernetes-dashboard-web/1.log"}
2024-10-09T09:16:02.031Z        info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/dedo-system_dedo-upload-28803938-8mdss_37fc03f8-03a5-45e9-bd35-7569432a146b/dedo-python/0.log"}
2024-10-09T09:16:02.031Z        info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/dedo-system_dedo-upload-28805378-pqltr_7f579a0d-6710-44f9-b408-c2d516c8293d/dedo-python/0.log"}
2024-10-09T09:16:02.031Z        info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/dedo-system_dedo-upload-28806818-ds8sn_7e99d6f1-3ca6-409b-9008-9a9579bcaed4/dedo-python/0.log"}
2024-10-09T09:16:02.031Z        info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/istio-system_istio-ingressgateway-56dd698b66-9xrfz_155c2667-9d03-4cf7-8e73-1e73056b851a/istio-proxy/0.log"}
2024-10-09T09:16:02.031Z        info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/istio-test_nginx-57f6f7f454-gfzvq_71c83ba4-2af7-4099-b7aa-b4a2dd7cdca6/istio-proxy/0.log"}
2024-10-09T09:16:02.031Z        info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/istio-test_nginx-57f6f7f454-gfzvq_71c83ba4-2af7-4099-b7aa-b4a2dd7cdca6/istio-validation/0.log"}
2024-10-09T09:16:02.638Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 2}
2024-10-09T09:16:02.638Z        info    ResourceLog #0
Resource SchemaURL:
Resource attributes:
     -> k8s.namespace.name: Str(kube-system)
     -> k8s.pod.name: Str(calico-node-7bmg7)
     -> k8s.container.restart_count: Str(1)
     -> k8s.pod.uid: Str(a868bd35-6366-469f-9c17-dd812778b17f)
     -> k8s.container.name: Str(calico-node)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope
LogRecord #0
ObservedTimestamp: 2024-10-09 09:16:02.434378111 +0000 UTC
Timestamp: 2024-10-09 09:16:02.317561416 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(2024-10-09 09:16:02.317 [INFO][78] felix/int_dataplane.go 1836: Received *proto.WireguardEndpointUpdate update from calculation graph msg=hostname:"vncub28392" public_key:"ZdLWXSx4+QLr+4BWKB1CAzP73X7WE0WFN2bZgFoacjU=" interface_ipv4_addr:"192.168.227.193")
Attributes:
     -> log.file.path: Str(/var/log/pods/kube-system_calico-node-7bmg7_a868bd35-6366-469f-9c17-dd812778b17f/calico-node/1.log)
     -> log.iostream: Str(stdout)
     -> pod_name: Str(resource.k8s.pod.name)
     -> logtag: Str(F)
Trace ID:
Span ID:
Flags: 0
LogRecord #1
ObservedTimestamp: 2024-10-09 09:16:02.434896336 +0000 UTC
Timestamp: 2024-10-09 09:16:02.317625143 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(2024-10-09 09:16:02.317 [INFO][78] felix/int_dataplane.go 1836: Received *proto.HostMetadataV4V6Update update from calculation graph msg=hostname:"vncub28392" ipv4_addr:"10.85.47.151/25" labels:<key:"beta.kubernetes.io/arch" value:"amd64" > labels:<key:"beta.kubernetes.io/instance-type" value:"C_2.R_6.D_30" > labels:<key:"beta.kubernetes.io/os" value:"linux" > labels:<key:"failure-domain.beta.kubernetes.io/region" value:"RegionOne" > labels:<key:"failure-domain.beta.kubernetes.io/zone" value:"thp" > labels:<key:"kubernetes.io/arch" value:"amd64" > labels:<key:"kubernetes.io/hostname" value:"vncub28392" > labels:<key:"kubernetes.io/os" value:"linux" > labels:<key:"node-role.kubernetes.io/control-plane" value:"" > labels:<key:"node.kubernetes.io/exclude-from-external-load-balancers" value:"" > labels:<key:"node.kubernetes.io/instance-type" value:"C_2.R_6.D_30" > labels:<key:"topology.kubernetes.io/region" value:"RegionOne" > labels:<key:"topology.kubernetes.io/zone" value:"thp" >)
Attributes:
     -> log.file.path: Str(/var/log/pods/kube-system_calico-node-7bmg7_a868bd35-6366-469f-9c17-dd812778b17f/calico-node/1.log)
     -> log.iostream: Str(stdout)
     -> pod_name: Str(resource.k8s.pod.name)
     -> logtag: Str(F)
Trace ID:
Span ID:
Flags: 0
        {"kind": "exporter", "data_type": "logs", "name": "debug"}
2024-10-09T09:16:02.839Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 13}
2024-10-09T09:16:02.839Z        info    ResourceLog #0
Resource SchemaURL:
Resource attributes:
     -> k8s.pod.name: Str(calico-node-7bmg7)
     -> k8s.container.restart_count: Str(1)
     -> k8s.pod.uid: Str(a868bd35-6366-469f-9c17-dd812778b17f)
     -> k8s.container.name: Str(calico-node)
     -> k8s.namespace.name: Str(kube-system)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope
LogRecord #0
ObservedTimestamp: 2024-10-09 09:16:02.633545545 +0000 UTC
Timestamp: 2024-10-09 09:16:02.554610044 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(2024-10-09 09:16:02.554 [INFO][78] felix/calc_graph.go 467: Local endpoint updated id=WorkloadEndpoint(node=vncub28397, orchestrator=k8s, workload=otel-system/otel-agent-bzcws, name=eth0))
Attributes:
     -> log.file.path: Str(/var/log/pods/kube-system_calico-node-7bmg7_a868bd35-6366-469f-9c17-dd812778b17f/calico-node/1.log)
     -> log.iostream: Str(stdout)
     -> pod_name: Str(resource.k8s.pod.name)
     -> logtag: Str(F)
Trace ID:
Span ID:
Flags: 0
LogRecord #1
ObservedTimestamp: 2024-10-09 09:16:02.633689065 +0000 UTC
Timestamp: 2024-10-09 09:16:02.554682889 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(2024-10-09 09:16:02.554 [INFO][78] felix/int_dataplane.go 1836: Received *proto.WorkloadEndpointUpdate update from calculation graph msg=id:<orchestrator_id:"k8s" workload_id:"otel-system/otel-agent-bzcws" endpoint_id:"eth0" > endpoint:<state:"active" name:"calib3ea58f4866" profile_ids:"kns.otel-system" profile_ids:"ksa.otel-system.otel-collector" ipv4_nets:"192.168.105.221/32" >)
Attributes:
     -> log.iostream: Str(stdout)
     -> pod_name: Str(resource.k8s.pod.name)
     -> log.file.path: Str(/var/log/pods/kube-system_calico-node-7bmg7_a868bd35-6366-469f-9c17-dd812778b17f/calico-node/1.log)
     -> logtag: Str(F)
Trace ID:
Span ID:
Flags: 0

I also see that in the output above there is -> pod_name: Str(resource.k8s.pod.name), but I would expect the actual name of the pod there...

...
2024-10-09T09:16:02.839Z        info    ResourceLog #0
Resource SchemaURL:
Resource attributes:
     -> k8s.pod.name: Str(calico-node-7bmg7)
     -> k8s.container.restart_count: Str(1)
     -> k8s.pod.uid: Str(a868bd35-6366-469f-9c17-dd812778b17f)
     -> k8s.container.name: Str(calico-node)
     -> k8s.namespace.name: Str(kube-system)
ScopeLogs #0
...

Did I configure the add operator correctly?

Thank you for any advice....

@ChrsMark
Member

ChrsMark commented Oct 9, 2024

So everything looks to be working, right?

The logs seem to be parsed and the content ends up in the Body: Body: Str(2024-10-09 09:16:02.554 [INFO][78] felix/int_dataplane.go 1836: Received *proto.WorkloadEndpointUpdate update from calculation graph msg=id:<orchestrator_id:"k8s" workload_id:"otel-system/otel-agent-bzcws" endpoint_id:"eth0" > endpoint:<state:"active" name:"calib3ea58f4866" profile_ids:"kns.otel-system" profile_ids:"ksa.otel-system.otel-collector" ipv4_nets:"192.168.105.221/32" >)

Did I configure the add operator correctly?

What you have set just adds the literal string resource.k8s.pod.name as the value.
But I think you don't need the add operator at all. k8s.pod.name is already present in the resource attributes, for example k8s.pod.name: Str(calico-node-7bmg7).
If you still need such a copy, you can use the copy operator: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/docs/operators/copy.md.
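For example, something along these lines should copy the resource attribute onto each record (a sketch based on the copy operator docs; adjust the target field as needed):

operators:
  - id: container-parser
    type: container
  # copy the resource attribute set by the container parser into a record attribute
  - type: copy
    from: resource["k8s.pod.name"]
    to: attributes.pod_name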

@xzizka
Author

xzizka commented Oct 9, 2024

Hello @ChrsMark,

Thank you for the analysis.
So from your point of view, the filelog receiver and the container operator work correctly?
In that case the issue points to the kafka exporter. Am I right? @pavolloffay @MovieStoreGuy

Thank you

@xzizka
Author

xzizka commented Oct 9, 2024

I did some more tests... I changed the configuration to this one:

    receivers:
      filelog:
        start_at: end
        include_file_path: true
        include_file_name: false
        include: 
          - /var/log/pods/*/*/*.log 
        exclude:
          # Exclude logs from the otel-system namespace (where the collector pods run)
          - /var/log/pods/otel-system_*/*/*.log          
        operators:
          - id: container-parser
            type: container
            add_metadata_from_filepath: true
          - type: stdout
    service:

Then I saw log records in the output like this one (formatted for better readability):

{
    "observed_timestamp": "2024-10-09T10:57:30.794746368Z",
    "timestamp": "2024-10-09T10:57:30.6709744Z",
    "body": "time=\"2024-10-09T10:57:30Z\" level=info msg=\"Found 1 new/updated HPA(s)\" provider=hpa",
    "attributes": {
        "log.file.path": "/var/log/pods/kube-system_kube-metrics-adapter-75c64b5548-fkn8w_7eb00a60-581e-46cb-9fdc-74cdfe25a976/kube-metrics-adapter/1.log",
        "log.iostream": "stderr",
        "logtag": "F"
    },
    "resource": {
        "k8s.container.name": "kube-metrics-adapter",
        "k8s.container.restart_count": "1",
        "k8s.namespace.name": "kube-system",
        "k8s.pod.name": "kube-metrics-adapter-75c64b5548-fkn8w",
        "k8s.pod.uid": "7eb00a60-581e-46cb-9fdc-74cdfe25a976"
    },
    "severity": 0,
    "scope_name": ""
}

But in the Kafka topic I still see only the content of the body.
[screenshot of the Kafka topic message]

I created a new Kafka topic and manually sent the full message there. I was able to see the full message in the topic.
[screenshot of the manually produced full message]

Then I forwarded the parsed logs to this topic, and I see the same result as mentioned before.
[screenshots of the topic, again showing only the body]

So it looks like the Kafka exporter sends just the content of the body and not the whole message.
@pavolloffay @MovieStoreGuy, would it please be possible to investigate this issue?
Thank you.

/label exporter/kafka

Contributor

github-actions bot commented Oct 9, 2024

Pinging code owners for exporter/kafka: @pavolloffay @MovieStoreGuy. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@xzizka xzizka changed the title from "Container operator does not parse the CRI-O logs." to "Kafka Exporter incorrectly sends data from filelog receiver to topic (Container operator does not parse the CRI-O logs.)" Oct 10, 2024
@xzizka
Author

xzizka commented Oct 11, 2024

I was checking the Kafka exporter code. I am not a Go programmer, but I see in the code that the raw marshaler takes only the body of the log record (I assume Body means just the log message itself).
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/kafkaexporter/raw_marshaler.go#L32

...
for k := 0; k < sl.LogRecords().Len(); k++ {
    lr := sl.LogRecords().At(k)
    // only the record body is serialized; resource and record attributes are not included
    b, err := r.logBodyAsBytes(lr.Body())
...

This would answer my question of why I see just the content of the Body field in the Kafka topic and not the whole message produced by the filelog receiver. See the debug-mode output from the filelog receiver that I provided above:

...
LogRecord #1
ObservedTimestamp: 2024-10-09 09:16:02.434896336 +0000 UTC
Timestamp: 2024-10-09 09:16:02.317625143 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(2024-10-09 09:16:02.317 [INFO][78] felix/int_dataplane.go 1836: Received *proto.HostMetadataV4V6Update update from calculation graph msg=hostname:"vncub28392" ipv4_addr:"10.85.47.151/25" labels:<key:"beta.kubernetes.io/arch" value:"amd64" > labels:<key:"beta.kubernetes.io/instance-type" value:"C_2.R_6.D_30" > labels:<key:"beta.kubernetes.io/os" value:"linux" > labels:<key:"failure-domain.beta.kubernetes.io/region" value:"RegionOne" > labels:<key:"failure-domain.beta.kubernetes.io/zone" value:"thp" > labels:<key:"kubernetes.io/arch" value:"amd64" > labels:<key:"kubernetes.io/hostname" value:"vncub28392" > labels:<key:"kubernetes.io/os" value:"linux" > labels:<key:"node-role.kubernetes.io/control-plane" value:"" > labels:<key:"node.kubernetes.io/exclude-from-external-load-balancers" value:"" > labels:<key:"node.kubernetes.io/instance-type" value:"C_2.R_6.D_30" > labels:<key:"topology.kubernetes.io/region" value:"RegionOne" > labels:<key:"topology.kubernetes.io/zone" value:"thp" >)
Attributes:
     -> log.file.path: Str(/var/log/pods/kube-system_calico-node-7bmg7_a868bd35-6366-469f-9c17-dd812778b17f/calico-node/1.log)
     -> log.iostream: Str(stdout)
     -> pod_name: Str(resource.k8s.pod.name)
     -> logtag: Str(F)
Trace ID:
Span ID:
Flags: 0
        {"kind": "exporter", "data_type": "logs", "name": "debug"}
2024-10-09T09:16:02.839Z        info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 13}
2024-10-09T09:16:02.839Z        info    ResourceLog #0
Resource SchemaURL:
Resource attributes:
     -> k8s.pod.name: Str(calico-node-7bmg7)
     -> k8s.container.restart_count: Str(1)
     -> k8s.pod.uid: Str(a868bd35-6366-469f-9c17-dd812778b17f)
     -> k8s.container.name: Str(calico-node)
     -> k8s.namespace.name: Str(kube-system)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope
...

@pavolloffay @MovieStoreGuy Could you please confirm that?

@xzizka
Author

xzizka commented Oct 11, 2024

Ah yes... It is also written in the documentation:

The following encodings are valid only for logs.
-> raw: if the log record body is a byte array, it is sent as is. Otherwise, it is serialized to JSON. Resource and record attributes are discarded.

Maybe a feature request to send the whole record and not just the body is warranted. I need to think about the use case. Thanks to all participants in this discussion.
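In the meantime, a possible workaround might be to switch the exporter to one of the OTLP encodings, which should serialize the whole log record including resource and record attributes (a sketch, not verified in my setup):

exporters:
  kafka:
    brokers: ["kafka-logman-ha-test:9093"]
    topic: ${env:KAFKA_TOPIC_NAME}
    # otlp_json serializes the full OTLP payload (resource, attributes, body) as JSON
    encoding: otlp_json
    protocol_version: "2.0.0"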

@xzizka xzizka closed this as completed Oct 11, 2024
@ChrsMark
Member

Thanks for investigating this, @xzizka!
Without being really familiar with the kafka exporter, I think it does make sense to ship attributes and resource attributes along with log messages.
Please feel free to open a follow-up issue to propose this once you have the details figured out.
