Do not create ES indices too far into the past/future #841

mabn · 2018-05-23T09:32:37Z

Requirement - what kind of business use case are you trying to solve?

Jaeger performance should not degrade due to bugs in services reporting spans.

Problem - what in Jaeger blocks you from solving the requirement?

One of our services (due to some bug) reports spans with start timestamps far in the future (years ahead). Because ES indices are created per day, this causes many indices to be created in Elasticsearch for nonsensical dates. For example:

$ curl $ESADDR/_cat/shards -s | wc -l
9986
$ curl $ESADDR/_cat/indices -s | head
green open jaeger-service-8160-12-02   E759tRv5TeyZD0cB2xubCw 48 0          1   0   9.4kb   9.4kb
green open jaeger-span-8154-09-01      tDhwyd9sQEqIgDYxmd8xXw 48 0          0   0     6kb     6kb
green open jaeger-span-8157-02-28      4Yam481RTF-jS7NpGKBS1g 48 0          0   0     6kb     6kb
green open jaeger-service-8154-03-05   YGICrbBkTNaFu-tiaoFYbw 48 0          1   0   9.1kb   9.1kb
green open jaeger-service-8148-12-15   97E695J7R3uyFSImODLx7g 48 1          1   0    19kb   9.5kb
green open jaeger-span-8151-11-09      U1owZ6ieQ6CrPMHlmcNc1g 48 0          0   0     6kb     6kb
green open jaeger-span-8160-02-18      FFVy6vQ1RamzyauHT_Njow 48 0          0   0     6kb     6kb
green open jaeger-span-8156-02-26      -hNsr6rtS6CCxAUH4HrJmw 48 0          0   0     6kb     6kb
green open jaeger-service-208917-08-31 Qx7Xm9jbQe-3Lf1ryv0paQ 48 0          1   0   9.1kb   9.1kb
green open jaeger-service-8163-10-27   g5mLfk9IQQCvYlCooWWqWQ 48 0          1   0   9.4kb   9.4kb

This hurts the ES cluster because, for example, every shard of every index holds its own file handles.
Additionally, our curator script does not remove those indices because they are dated in the future, and it only removes past ones.
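
To find the affected indices, the date suffix of each index name can be parsed and compared against today's date. This is a minimal sketch, assuming index names end in `-YYYY-MM-DD` as in the listing above. `suspicious_indices` and its default limits are hypothetical and not part of Jaeger or Curator. A year such as 208917 cannot be represented by Python's `datetime.date`, so the sketch treats it as out of range.

```python
import re
from datetime import date, timedelta

# Matches the trailing date of names such as "jaeger-span-8154-09-01".
# The year may have more than four digits (e.g. "208917").
INDEX_DATE = re.compile(r"-(\d{4,})-(\d{2})-(\d{2})$")


def suspicious_indices(names, today, max_age_days=14, max_future_days=1):
    """Return the index names whose date suffix lies outside the allowed window."""
    oldest = today - timedelta(days=max_age_days)
    newest = today + timedelta(days=max_future_days)
    flagged = []
    for name in names:
        match = INDEX_DATE.search(name)
        if match is None:
            continue  # no date suffix, so not a daily index
        year, month, day = (int(part) for part in match.groups())
        try:
            index_date = date(year, month, day)
        except ValueError:
            # Year beyond date.max (9999) or an impossible day: treat as bogus.
            flagged.append(name)
            continue
        if not oldest <= index_date <= newest:
            flagged.append(name)
    return flagged


names = [
    "jaeger-span-2018-05-23",
    "jaeger-span-8154-09-01",
    "jaeger-service-208917-08-31",
]
print(suspicious_indices(names, today=date(2018, 5, 23)))
```

The returned names could then be reviewed before deleting anything.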

Proposal - what do you suggest to solve the problem or improve the existing situation?

Restrict in the collector what timestamps are allowed and reject spans which are too old or too far into the future. E.g. not older than 14 days, at most 1 day into the future. Drop spans outside of this range or save them into the "current" index.
The range could be configurable.
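
The range check itself is small. Below is a sketch, assuming timestamps are timezone-aware `datetime` values and that the limits come from configuration. `TimestampWindow` and its field names are hypothetical, and the defaults simply mirror the 14 days / 1 day example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TimestampWindow:
    """Allowed span start times relative to the collector's clock (hypothetical)."""
    max_age: timedelta = timedelta(days=14)
    max_future: timedelta = timedelta(days=1)

    def contains(self, start_time: datetime, now: datetime) -> bool:
        # Both bounds are inclusive.
        return now - self.max_age <= start_time <= now + self.max_future


window = TimestampWindow()
now = datetime(2018, 5, 23, 9, 32, tzinfo=timezone.utc)

print(window.contains(now - timedelta(days=3), now))    # three days old
print(window.contains(now + timedelta(days=365), now))  # a year ahead
```

What happens to a span that fails the check (drop, rewrite, or store in the current index) is a separate decision.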

vprithvi · 2018-05-23T13:34:47Z

I think this is a good idea.

Internally, we've run into a problem where services were setting timestamps far enough in the future to cause overflows.

> Drop spans outside of this range or save them into the "current" index.

I don't think dropping spans for incorrect timestamps is reasonable. Instead, we could overwrite the timestamp with the ingestion time and log a warning on the span. (Ideally, we'd like users to be able to retrieve these spans as part of a trace even if the timestamps are invalid.) I'm not sure whether saving them into the current index accomplishes the same thing.
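
The overwrite could look roughly like this. It is a sketch only: the span is modeled as a plain dict with hypothetical `start_time` and `warnings` keys rather than Jaeger's real span model, and `sanitize_start_time` is an illustrative name.

```python
from datetime import datetime, timedelta, timezone


def sanitize_start_time(span, now, max_age=timedelta(days=14), max_future=timedelta(days=1)):
    """Replace an out-of-range start time with the ingestion time and record a warning.

    `span` is a plain dict standing in for a real span (an assumption of this sketch).
    The span is never dropped, so it stays retrievable as part of its trace.
    """
    start = span["start_time"]
    if now - max_age <= start <= now + max_future:
        return span
    fixed = dict(span)  # shallow copy, the caller's span is left untouched
    fixed["start_time"] = now
    fixed["warnings"] = list(span.get("warnings", [])) + [
        f"start time {start.isoformat()} out of range, replaced with ingestion time"
    ]
    return fixed


now = datetime(2018, 5, 23, 13, 34, tzinfo=timezone.utc)
span = {"operation": "GET /", "start_time": datetime(8154, 9, 1, tzinfo=timezone.utc)}
fixed = sanitize_start_time(span, now)
print(fixed["start_time"], fixed["warnings"])
```

Keeping the original value in the warning text preserves the evidence needed to debug the reporting service.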

mabn · 2018-05-23T14:09:20Z

Overwriting sounds good.

mabn · 2018-06-12T09:27:27Z

It turns out that it isn't a bug in the service - I've added extra logging wrapped around Sender and it didn't catch anything. I suspect that once in a while UDP packets sent to the agent are corrupted.

Based on the number of extra indices in ES, it happens a few times per 10^9 spans.

pavolloffay · 2019-12-05T14:43:46Z

Just a note: this behavior does not occur when using rollover aliases (the --es.use-aliases flag), because it then uses a single index to write data.

mehta-ankit · 2022-07-29T18:41:14Z

> I think this is a good idea.
>
> Internally, we've run into a problem where services were setting timestamps far enough in the future to cause overflows.
>
> > Drop spans outside of this range or save them into the "current" index.
>
> I don't think dropping spans for incorrect timestamps is reasonable, instead we could overwrite the timestamp with the ingestion time, and log a warning on the span. (Ideally, we'd like users to be able to retrieve these spans as part of a trace even if the timestamps are invalid). I'm not sure whether saving them into the current index accomplishes the same thing.

A solution to this problem would really help. We currently run into trouble with Elasticsearch having too many indices.

I agree that we should rewrite timestamps if they are in the future.
Ideally, though, there would be a flag that lets the user decide whether to accept out-of-order spans (i.e. future spans) or rewrite their timestamps.
This is similar to what Vector has: https://vector.dev/docs/reference/configuration/sinks/loki/#out_of_order_action
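
Such a flag could be modeled as a small enum that the collector consults for spans dated too far ahead. This is a sketch under assumptions: `OutOfRangeAction`, its values, and `resolve_start_time` are hypothetical names and not an existing Jaeger option.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class OutOfRangeAction(Enum):
    """Hypothetical values for a collector flag; not an existing Jaeger option."""
    ACCEPT = "accept"    # keep the reported timestamp as is
    REWRITE = "rewrite"  # replace it with the ingestion time
    DROP = "drop"        # discard the span


def resolve_start_time(start: datetime, now: datetime, action: OutOfRangeAction,
                       max_future: timedelta = timedelta(days=1)) -> Optional[datetime]:
    """Return the start time to store, or None if the span should be dropped."""
    if start <= now + max_future:
        return start  # not a future span, nothing to decide
    if action is OutOfRangeAction.ACCEPT:
        return start
    if action is OutOfRangeAction.REWRITE:
        return now
    return None


now = datetime(2022, 7, 29, 18, 41, tzinfo=timezone.utc)
future = now + timedelta(days=400)
for action in OutOfRangeAction:
    print(action.value, resolve_start_time(future, now, action))
```

Defaulting to the rewrite behavior would match the preference expressed earlier in this thread, while still letting operators opt out.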

pavolloffay added the storage/elasticsearch label Aug 3, 2018

pavolloffay mentioned this issue Jan 11, 2019

Support archive traces for ES storage #1197

Merged

pavolloffay mentioned this issue Jul 22, 2019

Collector: Some Spans are not persisted in ElasticSearch #1674

Closed

pavolloffay mentioned this issue Sep 18, 2019

Strange indices names in ES #1804

Closed
