Trace-preserving mode for processor/tailsampling #25122

Open
garry-cairns opened this issue Aug 9, 2023 · 11 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed), processor/tailsampling (Tail sampling processor)

Comments

@garry-cairns
Contributor

garry-cairns commented Aug 9, 2023

Component(s)

processor/tailsampling

Is your feature request related to a problem? Please describe.

We would like to use tail-based sampling because we believe it will give better insights into our running processes than head-based, and we have far too much data volume to store 100% of traces. We would, however, like to retain connections between our aggregated metrics, which we produce using the spanmetrics connector, and our stored traces. This is not currently possible.

Describe the solution you'd like

We would like a configurable option that separates the concern of sampling from that of filtering. In this model, the tail sampling processor could be configured in a "soft" mode (the name isn't important if you prefer another) that simply updates sampling.priority on all spans of a trace it has decided to sample and does no filtering. Subsequent processors, including but not limited to spanmetrics, could then use this information. The user would be responsible for filtering unsampled traces/spans with the filter processor in their trace pipeline(s).

To expand on the connector/spanmetrics example: this would involve a separate feature request to make its exemplar behavior smarter, so that when such an attribute is present it only uses trace IDs with sampling.priority > 0 as exemplars of aggregated metrics. Spanmetrics could then produce accurate metrics based on 100% of traces, which it needs, without incurring the cost of storing all of those traces.

[Attached image: sampling]
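For illustration, a collector configuration using the proposed mode might look roughly like the sketch below. The decision_only flag, the convention that unsampled spans carry sampling.priority == 0, and the backend endpoint are hypothetical placeholders for this proposal rather than existing options; the tail_sampling policy, filter processor, spanmetrics connector, and forward connector are existing components used here only as a sketch.

receivers:
  otlp:
    protocols:
      grpc:

processors:
  tail_sampling:
    # Hypothetical "soft" mode: record the decision as sampling.priority on
    # every span of a trace instead of dropping unsampled traces.
    decision_only: true   # not an existing option; part of this proposal
    decision_wait: 30s
    policies:
      - name: keep-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
  filter/unsampled:
    # Drop spans the first stage marked as not sampled (assumed convention:
    # sampling.priority == 0 means "not sampled").
    error_mode: ignore
    traces:
      span:
        - 'attributes["sampling.priority"] == 0'
  batch:

connectors:
  spanmetrics:   # sees 100% of spans, so its metrics stay accurate
  forward:

exporters:
  otlp:
    endpoint: backend:4317   # placeholder

service:
  pipelines:
    traces:                # all spans, with the sampling decision recorded
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [spanmetrics, forward]
    traces/export:         # only spans from sampled traces are stored
      receivers: [forward]
      processors: [filter/unsampled, batch]
      exporters: [otlp]
    metrics:
      receivers: [spanmetrics]
      exporters: [otlp]

With a shape like this, spanmetrics would compute metrics (and could pick exemplars) from every span, while only spans belonging to sampled traces would reach the trace backend.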

Describe alternatives you've considered

One alternative we considered was changing spanmetrics so that it would mutate any trace it used as an exemplar, making the connection between its metrics and the traces they were derived from simpler. But that would mean further changes to spanmetrics, which currently stores references to 100% of the traces it uses to produce its output as "exemplars", and it would couple the solution too tightly to spanmetrics. Our preferred solution leaves current behavior in place for those relying on it, while offering a clean separation of concerns that gives other users much more flexibility to innovate with their pipelines.

Additional context

We are working in an environment with many thousands of hosts running hundreds of thousands of services, each of which may pass context belonging to the same logical traces between them.

@garry-cairns garry-cairns added the enhancement and needs triage labels Aug 9, 2023
@github-actions github-actions bot added the processor/tailsampling label Aug 9, 2023
@github-actions
Contributor

github-actions bot commented Aug 9, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@jpkrohling jpkrohling removed the needs triage label Aug 22, 2023
@jpkrohling
Member

I see the problem, and instead of using the filter processor, it would probably make sense to use a second-stage sampler as a connector:

receivers:
  otlp:

processors:
  firststagesampling:   # (our current tail-sampling processor?)
  spanmetrics:
  batch:

exporters:
  otlp:

connectors:
  secondstagesampling:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [firststagesampling, spanmetrics]
      exporters: [secondstagesampling]
    traces/export:
      receivers: [secondstagesampling]
      processors: [batch]
      exporters: [otlp]

I'm not sure I would use the current tail-sampling for that.

@garry-cairns
Contributor Author

garry-cairns commented Aug 22, 2023

I like the pipeline design, and would likely use it, but couldn't we just use the existing routing connector, with the first-stage sampling decision as the criterion it routes on? (This may have been your intent, but it wasn't clear to me, so let me know.)
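Very roughly, and untested, something like the sketch below is what I have in mind. It assumes the first-stage decision is visible to the routing connector as a resource-level attribute, since the connector's route() statements match on resource attributes; the sampling.priority name, its value convention, and the firststagesampling component are placeholders carried over from the sketch above.

connectors:
  routing:
    error_mode: ignore
    table:
      # assumes the sampling decision is recorded on the resource, since
      # route() statements match resource attributes
      - statement: route() where attributes["sampling.priority"] > 0
        pipelines: [traces/sampled]

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [firststagesampling, spanmetrics]   # spanmetrics still sees 100% of spans
      exporters: [routing]
    traces/sampled:
      receivers: [routing]
      processors: [batch]
      exporters: [otlp]

Traces matching no route (with no default_pipelines configured) would, as I understand it, simply be dropped, which would act as the filtering step.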

@jpkrohling
Member

The idea is that the first-stage sampling will appropriately mark the root spans with the sampling decision, and the second-stage sampling will effectively sample out the traces that were not marked as selected. While the routing connector has some of the same features (filtering out data that is not relevant for a pipeline's specific exporter), I think having sampling in two stages will give a better user experience.

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Oct 23, 2023
@garry-cairns
Contributor Author

I've got some capacity just now so I'm going to have a go at implementing this.

@github-actions github-actions bot removed the Stale label Dec 16, 2023

@github-actions github-actions bot added the Stale label Feb 15, 2024
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Apr 15, 2024
@jpkrohling jpkrohling self-assigned this Apr 30, 2024
@jpkrohling jpkrohling reopened this Apr 30, 2024

@github-actions github-actions bot added the Stale label Jul 1, 2024
@jpkrohling jpkrohling added the help wanted label and removed the Stale label Jul 8, 2024
@jpkrohling jpkrohling removed their assignment Jul 8, 2024

@github-actions github-actions bot added the Stale label Sep 9, 2024
@jpkrohling jpkrohling removed the Stale label Sep 9, 2024

@github-actions github-actions bot added the Stale label Nov 11, 2024
@jpkrohling jpkrohling removed the Stale label Dec 4, 2024