Do not create ES indices too far into the past/future #841
Comments
I think this is a good idea. Internally, we've run into a problem where services were setting timestamps far enough in the future to cause overflows.
I don't think dropping spans with incorrect timestamps is reasonable. Instead, we could overwrite the timestamp with the ingestion time and log a warning on the span. (Ideally, we'd like users to be able to retrieve these spans as part of a trace even if the timestamps are invalid.) I'm not sure whether saving them into the current index accomplishes the same thing.
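The overwrite-and-warn approach can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Jaeger's collector code: the `Span` class, the `sanitize_start_time` helper and the 14-day/1-day bounds are all hypothetical. The original timestamp is kept in the warning text so it is not lost when the span is rewritten.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Hypothetical acceptance window (example values from the proposal).
MAX_PAST = timedelta(days=14)
MAX_FUTURE = timedelta(days=1)


@dataclass
class Span:
    """Minimal stand-in for a span; not Jaeger's data model."""
    operation: str
    start_time: datetime  # expected to be timezone-aware
    warnings: list = field(default_factory=list)


def sanitize_start_time(span: Span, now: datetime) -> Span:
    """Overwrite an implausible start time with the ingestion time.

    The span is kept (not dropped) so it can still be retrieved as part of
    its trace; the original value is preserved in a warning.
    """
    if now - MAX_PAST <= span.start_time <= now + MAX_FUTURE:
        return span  # timestamp looks plausible; leave it alone
    span.warnings.append(
        f"invalid start_time {span.start_time.isoformat()} "
        f"replaced with ingestion time {now.isoformat()}"
    )
    span.start_time = now
    return span


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    bogus = Span("checkout", now + timedelta(days=365 * 10))  # ~10 years ahead
    fixed = sanitize_start_time(bogus, now)
    print(fixed.start_time == now, fixed.warnings)
```

Because the span is stored with the ingestion time, it lands in the current day's index, while the warning tells users the stored time is not the one the service reported.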
Overwriting sounds good.
It turns out that it isn't a bug in the service: I've added extra logging wrapped around Sender and it didn't catch anything. I suspect that, once in a while, UDP packets sent to the agent are corrupted. Based on the number of extra indices in ES, it happens a few times per 10^9 spans.
Just adding a note: this behavior is not present when using rollover aliases.
A solution to this problem would really help. We are currently running into trouble with Elasticsearch having too many indices. I agree that we should rewrite timestamps if they are in the future.
Requirement - what kind of business use case are you trying to solve?
Jaeger performance should not degrade due to bugs in services reporting spans.
Problem - what in Jaeger blocks you from solving the requirement?
One of our services (due to some bug) reports spans with start timestamps far into the future (years). Because ES indices are created per day, this causes many indices to be created in Elasticsearch for strange dates. For example:
This affects the ES cluster, for example because each index's shards hold their own file handles.
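To illustrate why per-day indices amplify the problem: the index a span is written to is derived from its start date, so every distinct bogus date yields a brand-new index. The sketch below assumes index names of the form `jaeger-span-YYYY-MM-DD` based on the UTC date; treat that exact naming, and the helper names, as assumptions for illustration rather than a statement of Jaeger's configuration.

```python
from datetime import datetime, timezone


def daily_index_name(prefix: str, start_time: datetime) -> str:
    """Name of the per-day index a span would be written to (UTC date)."""
    day = start_time.astimezone(timezone.utc)
    return f"{prefix}-{day:%Y-%m-%d}"


def indices_touched(prefix: str, start_times) -> list:
    """Distinct daily indices needed to store spans with these start times."""
    return sorted({daily_index_name(prefix, t) for t in start_times})


if __name__ == "__main__":
    utc = timezone.utc
    start_times = [
        datetime(2018, 5, 1, 9, 30, tzinfo=utc),  # normal spans, same day
        datetime(2018, 5, 1, 17, 5, tzinfo=utc),
        datetime(2031, 1, 1, tzinfo=utc),         # corrupted timestamps
        datetime(2044, 7, 15, tzinfo=utc),
    ]
    for name in indices_touched("jaeger-span", start_times):
        print(name)
```

Here the two normal spans share one index, while each of the two corrupted timestamps forces an extra index (with its own shards) to be created.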
Additionally, our curator script does not remove those indices, because they are considered to be in the future and only past indices are removed.
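A typical age-based cleanup rule only selects indices whose date is older than a retention cutoff, which is why future-dated indices are never removed. The hypothetical helpers below assume index names of the form `jaeger-span-YYYY-MM-DD`; `stranded_future_indices` is an illustrative extra query for spotting indices dated beyond an allowed horizon, not a feature of any particular cleanup tool.

```python
from datetime import date, timedelta
from typing import List, Optional


def index_date(name: str, prefix: str) -> Optional[date]:
    """Parse the date suffix of '<prefix>-YYYY-MM-DD', or None if it doesn't match."""
    head = prefix + "-"
    if not name.startswith(head):
        return None
    try:
        return date.fromisoformat(name[len(head):])
    except ValueError:
        return None


def indices_to_delete(names: List[str], prefix: str, today: date, retention_days: int) -> List[str]:
    """Age-based cleanup: only indices dated strictly before the cutoff are selected."""
    cutoff = today - timedelta(days=retention_days)
    selected = []
    for name in names:
        d = index_date(name, prefix)
        if d is not None and d < cutoff:
            selected.append(name)
    return selected


def stranded_future_indices(names: List[str], prefix: str, today: date, max_future_days: int) -> List[str]:
    """Indices dated beyond the allowed horizon; age-based cleanup never selects these."""
    horizon = today + timedelta(days=max_future_days)
    stranded = []
    for name in names:
        d = index_date(name, prefix)
        if d is not None and d > horizon:
            stranded.append(name)
    return stranded


if __name__ == "__main__":
    names = ["jaeger-span-2018-04-01", "jaeger-span-2018-05-01", "jaeger-span-2031-01-01"]
    today = date(2018, 5, 1)
    print(indices_to_delete(names, "jaeger-span", today, 14))
    print(stranded_future_indices(names, "jaeger-span", today, 1))
```

With a 14-day retention, only the April index is selected for deletion; the 2031 index stays until its date is in the past, unless something like the second query is used to find it.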
Proposal - what do you suggest to solve the problem or improve the existing situation?
Restrict in the collector which timestamps are allowed, and reject spans that are too old or too far into the future, e.g. no older than 14 days and at most 1 day into the future. Spans outside this range would be dropped or saved into the "current" index.
The range could be configurable.
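A configurable version of the proposal might look like the sketch below. All names (`TimestampWindow`, `target_index`, `on_violation`) and the `jaeger-span-YYYY-MM-DD` index naming are hypothetical, and the defaults simply mirror the example values from the proposal. In the `current_index` mode the span keeps its original timestamp and only the index choice falls back to the ingestion date; in the `drop` mode the function returns `None`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class TimestampWindow:
    """Hypothetical collector settings for accepted span start times."""
    max_age: timedelta = timedelta(days=14)
    max_future: timedelta = timedelta(days=1)
    on_violation: str = "current_index"  # or "drop"

    def __post_init__(self):
        if self.on_violation not in ("drop", "current_index"):
            raise ValueError(f"unknown on_violation: {self.on_violation!r}")

    def contains(self, start_time: datetime, now: datetime) -> bool:
        return now - self.max_age <= start_time <= now + self.max_future


def target_index(start_time: datetime, now: datetime, window: TimestampWindow,
                 prefix: str = "jaeger-span") -> Optional[str]:
    """Pick the daily index for a span, or None if the span should be dropped."""
    if window.contains(start_time, now):
        day = start_time  # normal case: index by the span's own date
    elif window.on_violation == "current_index":
        day = now         # out of range: index by ingestion date instead
    else:
        return None       # out of range and policy says drop
    return f"{prefix}-{day.astimezone(timezone.utc):%Y-%m-%d}"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    far_future = now + timedelta(days=3650)
    print(target_index(far_future, now, TimestampWindow()))
    print(target_index(far_future, now, TimestampWindow(on_violation="drop")))
```

Routing by ingestion date caps the number of indices at one per real calendar day while leaving the reported timestamp untouched; how such settings would be exposed (for example as collector flags) is not specified in the issue.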