add topk/bottomk that aggregates others #1248

brharrington · 2021-02-02T18:08:51Z

Adds variants of topk/bottomk that in addition to computing
the highest priority values will also return an aggregate
time series that includes all of the other time series that
were not high enough priority. This can be useful to see
what proportion of the overall volume is represented by
the highest priority time series. The operators have the
same signature as :topk and :bottomk, but the aggregation
used for the others series is specified in the operator
name, for example :topk-others-sum.
There are four aggregates supported:

- min
- max
- sum
- avg

Count was not included as an aggregate because its meaning
for the others series would be ambiguous, and it does not
seem useful.
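A minimal Python sketch of the semantics described above, under stated assumptions: each time series is a list of floats aligned on the same time steps, priority is ranked by a summary statistic over the whole series (the stat argument, such as `max`, passed before `k`), and NaN handling is ignored. The function `priority_others`, the `STATS` and `AGGS` tables, and the `highest` flag are hypothetical illustrations, not Atlas APIs. Count is deliberately absent from `AGGS`, matching the four supported aggregates.

```python
from statistics import fmean
from typing import Callable, Dict, List, Sequence

Series = List[float]
Reducer = Callable[[Sequence[float]], float]

# Summary statistics used to rank series (a hypothetical subset).
STATS: Dict[str, Reducer] = {"min": min, "max": max, "avg": fmean}

# Per-time-step aggregates for the others series; count is intentionally omitted.
AGGS: Dict[str, Reducer] = {"min": min, "max": max, "sum": sum, "avg": fmean}


def priority_others(
    series: Dict[str, Series], stat: str, k: int, agg: str, highest: bool = True
) -> Dict[str, Series]:
    """Keep the k highest (or lowest) priority series and fold the rest into '--others--'."""
    rank = STATS[stat]
    # highest=True behaves like topk, highest=False like bottomk.
    ordered = sorted(series, key=lambda name: rank(series[name]), reverse=highest)
    kept, rest = ordered[:k], ordered[k:]
    result = {name: series[name] for name in kept}
    if rest:
        # zip(*...) yields one tuple per time step across the remaining series.
        steps = zip(*(series[name] for name in rest))
        result["--others--"] = [AGGS[agg](values) for values in steps]
    return result


data = {"a": [1.0, 5.0], "b": [2.0, 2.0], "c": [4.0, 1.0]}
# topk-style: keep the series with the largest max, sum the rest per step.
print(priority_others(data, "max", 1, "sum"))
# prints {'a': [1.0, 5.0], '--others--': [6.0, 3.0]}
```

Summing the others series with the kept series gives the overall total at each step, which is what makes it possible to see what share of the volume the top series represent.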

As part of this work there was some refactoring of the
existing aggregation operations to make the count case
more consistent.

brharrington · 2021-02-02T18:09:28Z

Note: this PR depends on #1247 and will need to be updated once that is merged.

brharrington · 2021-02-02T18:17:24Z

Regarding count, it was considered problematic because it could mean either the aggregate count across the others (the sum of their counts) or the number of other time series that were present. For example:

Expr:
    Q,:count,(,foo,),:by

Output:
    foo1: 1
    foo2: 2
    foo3: 3 (means 3 time series had values for foo3)

Expr with priority operator:
    Q,:count,(,foo,),:by,max,1,:bottomk-others-count

Option 1:
    foo1:       1 (highest priority)
    --others--: 5 (cumulative number of input time series contributing to others)

Option 2:
    foo1:       1 (highest priority)
    --others--: 2 (number of time series that were not high enough priority)
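The two interpretations can be reproduced with a small Python sketch. It assumes each grouped series from `Q,:count,(,foo,),:by` reduces to a single value, as in the example, and that bottomk keeps the lowest values first. `others_count_options` is a hypothetical helper for illustration, not part of Atlas.

```python
from typing import Dict, Tuple


def others_count_options(counts: Dict[str, int], k: int) -> Tuple[int, int]:
    """Return (option 1, option 2) for the others series of a bottomk over grouped counts."""
    # Rank lowest first, as bottomk would; Python's sort is stable for ties.
    ranked = sorted(counts, key=lambda name: counts[name])
    rest = ranked[k:]
    # Option 1: cumulative number of input series behind the others.
    option1 = sum(counts[name] for name in rest)
    # Option 2: how many grouped series were folded into others.
    option2 = len(rest)
    return option1, option2


grouped = {"foo1": 1, "foo2": 2, "foo3": 3}
print(others_count_options(grouped, 1))  # prints (5, 2)
```

Both readings are computed from the same input and neither is obviously the right one, which is the ambiguity that led to leaving count out.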

Fixes Netflix#1224.

brharrington added this to the 1.7.0 milestone Feb 2, 2021

brharrington requested a review from jfz February 2, 2021 18:08

jfz approved these changes Feb 2, 2021


brharrington force-pushed the topk-others branch from af414f6 to cc9217c February 2, 2021 19:46

brharrington merged commit 690d6ce into Netflix:master Feb 2, 2021

brharrington deleted the topk-others branch February 2, 2021 19:53

brharrington added the enhancement label Feb 2, 2021
