Smarter waiting for late spans in tailsamplingprocessor #31498
Pinging code owners for processor/tailsampling: @jpkrohling. See Adding Labels via Comments if you do not have permissions to add labels yourself.
Hello. I think this idea is great. Currently, in tail sampling, there is no check to determine whether a span is the root span; only the time tick triggers the actual analysis. I would like to suggest some modifications based on your idea.
So the configuration layout could be:

```yaml
processors:
  tail_sampling:
    decision_wait: 30s
    # New config here. Default is empty, which is equal to the current mechanism.
    # It could also be set to 0s, so that analysis starts as soon as the root span is received.
    decision_wait_after_root_span: 5s
    num_traces: 100
    expected_new_traces_per_sec: 10
```
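In other words, once a root span is seen, the trace would be evaluated `decision_wait_after_root_span` later instead of waiting out the full `decision_wait`, while leaving the new option unset keeps today's behaviour.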
Hey @jiekun, thanks for your thoughts! You raise some good points, especially around async spans (I hadn't considered this edge case). For that reason, perhaps it's not enough to just use a shorter wait for any lagging or async spans? Maintaining a set of trace IDs that we have sampled in the past X minutes might be a better approach: it would consume less memory than storing all spans for the full `decision_wait` duration.

There's also the question of spans that arrive late for traces that we decided not to sample: what should we do with these? Do they become interesting because they arrived late?
I understand your perspective of reducing memory usage while maintaining sampling accuracy, which is great! However, I'm concerned that it might make the processor more complex and harder to understand. Additionally, users who don't need to worry about asynchronous spans will, if they upgrade without adjusting the tail sampling latency, only notice an increase in memory consumption from storing the additional trace IDs. I still support these new ideas, but they may require the support of the maintainer. Personally, I would be more than happy to implement them in our internal collector :)
BTW, may I ask if you plan to submit a PR for those ideas, or just the feature request? I would like to split them into (at least) two parts; they could be implemented independently if we have support from the maintainer.
I've been talking to a few people about a decision cache, which should solve the second problem. The idea would be to have a simple map of trace IDs to boolean values indicating whether they were sampled. Note that we want to cache both a positive and a negative answer: we don't want to sample spans for a trace that was rejected before, and we do want to sample spans for traces that were accepted.

A limitation we need to document is that this cache isn't distributed, so a scalable tail-sampling setup will likely still have the same problems as today if spans land on a different collector than the one where the decision was made, potentially because of topology changes.

My original idea was to implement a ring buffer as the cache, so that we keep a fixed number of decisions in memory. I also considered an LRU, but I'm not sure it brings any benefits.
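To make the ring-buffer idea concrete, here's a minimal sketch (not the processor's actual implementation; the type and method names are made up for illustration), assuming a plain `[16]byte` trace ID:

```go
package cache

// decisionCache keeps a fixed number of traceID -> sampled decisions,
// evicting the oldest entry once capacity is reached.
type decisionCache struct {
	decisions map[[16]byte]bool // trace ID -> was the trace sampled?
	ring      [][16]byte        // insertion order, fixed capacity (> 0)
	next      int               // index of the slot to overwrite next
	full      bool              // true once the ring has wrapped around
}

func newDecisionCache(capacity int) *decisionCache {
	return &decisionCache{
		decisions: make(map[[16]byte]bool, capacity),
		ring:      make([][16]byte, capacity),
	}
}

// Put records both positive and negative decisions, since late spans of
// rejected traces must also be recognized and dropped.
func (c *decisionCache) Put(traceID [16]byte, sampled bool) {
	if _, ok := c.decisions[traceID]; !ok {
		if c.full {
			delete(c.decisions, c.ring[c.next]) // evict the oldest decision
		}
		c.ring[c.next] = traceID
		c.next = (c.next + 1) % len(c.ring)
		if c.next == 0 {
			c.full = true
		}
	}
	c.decisions[traceID] = sampled
}

// Get returns (sampled, true) if a decision is cached for this trace.
func (c *decisionCache) Get(traceID [16]byte) (sampled, ok bool) {
	sampled, ok = c.decisions[traceID]
	return sampled, ok
}
```

A fixed-size ring keeps the memory bound strict and eviction O(1), which is presumably the appeal over an LRU here: decisions age out in arrival order rather than by access pattern.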
cc @kentquirk, as I believe you are interested in those components as well
Thanks for your thoughts @jpkrohling. I had a similar thought about the negative case but was thinking about a different solution: what about pairing the set of sampled trace IDs with a Bloom filter containing the set of non-sampled trace IDs? This approach is based on the premise that the tail sampler is likely to reject more traces than it accepts. Using a map for both accepted and rejected traces would increase memory use further; a Bloom filter could optimize the rejected case.

EDIT: I thought about it a bit more and modelled some potential parameters. I don't think the Bloom filter would be the first implementation choice: the map is simpler and doesn't require orders of magnitude more memory.
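For a rough sense of the trade-off (my own numbers, not from the thread): the standard Bloom filter sizing formula m = -n·ln(p) / (ln 2)² works out to about 9.6 bits per entry at a 1% false-positive rate, so 1M rejected trace IDs would need roughly 1.2 MB versus 16 MB of raw 16-byte IDs (before map overhead). That's a real saving but only about an order of magnitude, and a false positive here would mean misclassifying a sampled trace's late spans as rejected.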
@jiekun I'm interested in submitting a PR, but if you're keen to work on this too, we could always split the work between us. Happy either way 👍
Right, a trace ID is only 16 bytes, which means that ~10 MiB is enough to store more than 650k entries in the cache, if my math is right.
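(The math does check out, ignoring map overhead: 10 MiB = 10,485,760 bytes, and 10,485,760 / 16 = 655,360 entries.)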
I recorded some of the things I had in mind here: #31580
It looks like there's been a lot of good discussion here, and some action items have been noted. Removing
I'm happy to start thinking about #31583 if that's fine with everyone. I can move future discussion around the decision cache into this issue. EDIT: wrong issue link; I was referring to the decision cache only.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure which component this issue relates to, please ping the code owners. See Adding Labels via Comments if you do not have permissions to add labels yourself.
Personally, I'm also wondering whether disk can play a role in storing longer-lived traces. Or, additionally, an option to compress the spans in memory before caching them (if you're willing to take the CPU trade-off).
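As a rough sketch of the compress-before-caching idea (not part of the processor; this just pairs the pdata proto marshaler with stdlib gzip, and a real implementation would probably reach for a faster codec such as zstd or snappy):

```go
package sampling

import (
	"bytes"
	"compress/gzip"
	"io"

	"go.opentelemetry.io/collector/pdata/ptrace"
)

var (
	marshaler   = &ptrace.ProtoMarshaler{}
	unmarshaler = &ptrace.ProtoUnmarshaler{}
)

// compressTraces serializes buffered spans to protobuf and gzips them,
// trading CPU for a smaller resident set while a trace waits for its
// sampling decision.
func compressTraces(td ptrace.Traces) ([]byte, error) {
	raw, err := marshaler.MarshalTraces(td)
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // flush remaining compressed bytes
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompressTraces restores the spans when the decision fires.
func decompressTraces(blob []byte) (ptrace.Traces, error) {
	zr, err := gzip.NewReader(bytes.NewReader(blob))
	if err != nil {
		return ptrace.NewTraces(), err
	}
	defer zr.Close()
	raw, err := io.ReadAll(zr)
	if err != nil {
		return ptrace.NewTraces(), err
	}
	return unmarshaler.UnmarshalTraces(raw)
}
```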
I believe this has been addressed by the decision caches implemented recently by @jamesmoessis. If that's not the case, feel free to reopen.
Component(s)
No response
Is your feature request related to a problem? Please describe.
Currently we have to deploy the tailsamplingprocessor with a reasonably large `decision_wait` value (2m). We do this in order to be able to capture our long-tail traces, but it imposes an unwelcome cost due to how completed traces are buffered in memory:

- For most traces, waiting the full `decision_wait` period seems unnecessary (although it makes sense to wait a short period for any lagging spans that are a part of the trace).
- Completed traces are held for the full `decision_wait`, even though they are ready to view. In our case, this adds two minutes of latency to every trace!

Describe the solution you'd like
I'm not entirely sure what the solution looks like here but some thoughts:
Describe alternatives you've considered
I'm aware of no other alternatives.
Additional context
The services we tail sample deal with very high throughput and can process 100K+ spans/sec. Attempting to tail sample at this volume imposes a significant memory overhead, considering we effectively need to buffer 12 million spans (120 s × 100K spans/sec).
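For a very rough sense of scale (my assumption, not a measured figure): at something like 1 KiB of in-memory footprint per buffered span, those 12 million spans would be on the order of 12 GiB.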