Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add topk/bottomk that aggregates others #1248

Merged
merged 1 commit into from
Feb 2, 2021

Conversation

brharrington
Copy link
Contributor

Fixes #1224.

Adds variants of topk/bottomk that in addition to computing
the highest priority values will also return an aggregate
time series that includes all of the other time series that
were not high enough priority. This can be useful to see
what proportion of the overall volume is represented by
the highest priority time series. The operators have the
same signature as :topk and :bottomk, but the
aggregation to use is specified in the operator name.
There are four aggregates supported:

  • min
  • max
  • sum
  • avg

Count was not included as an aggregate because it could be
ambiguous and does not seem useful.

As part of this work there was some refactoring of the
existing aggregation operations to make the count case
more consistent.

@brharrington brharrington added this to the 1.7.0 milestone Feb 2, 2021
@brharrington brharrington requested a review from jfz February 2, 2021 18:08
@brharrington
Copy link
Contributor Author

Note, this PR depends on #1247 and will need to be updated once that is merged.

@brharrington
Copy link
Contributor Author

Regarding count, the reason it was considered problematic is that it could mean the aggregate count across others or the count of how many others were present. For example:

Expr:
    Q,:count,(,foo,),:by

Output:
    foo1: 1
    foo2: 2
    foo3: 3 (means 3 time series had values for foo3)

Expr with priority operator:
    Q,:count,(,foo,),:by,max,1,:bottomk-others-count

Option 1:
    foo1:       1 (highest priority)
    --others--: 5 (cumulative number of input time series contributing to others)

Option 2:
    foo1:       1 (highest priority)
    --others--: 2 (number of time series that were not high enough priority)

Fixes Netflix#1224.

Adds variants of topk/bottomk that in addition to computing
the highest priority values will also return an aggregate
time series that includes all of the other time series that
were not high enough priority. This can be useful to see
what proportion of the overall volume is represented by
the highest priority time series. The operators have the
same signature as `:topk` and `:bottomk`, but the
aggregation to use is specified in the operator name.
There are four aggregates supported:

- min
- max
- sum
- avg

Count was not included as an aggregate because it could be
ambiguous and does not seem useful.

As part of this work there was some refactoring of the
existing aggregation operations to make the count case
more consistent.
@brharrington brharrington merged commit 690d6ce into Netflix:master Feb 2, 2021
@brharrington brharrington deleted the topk-others branch February 2, 2021 19:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

proposal for top/bottom-K operators
2 participants