auto rollup or drop tags based on cardinality limits #1280

jfz · 2021-03-19T20:54:04Z

Automatically roll up or drop tags based on the cardinality limit configuration. Also provide an API to check whether a query can be served, based on the limiter stats.

The limiter tracks cardinality by a list of predefined keys and their associated limits, building out a tree structure level by level. At the leaf level it also tracks cardinality for the remaining tag keys.

Given the example configuration below:

atlas.auto-rollup = {
  prefix = [
    { key = "app", value-limit = 20, total-limit = 1000 },
    { key = "name", value-limit = 30, total-limit = 50 }
  ]
  tag-value-limit = 40
}

Level 1: track cardinality by "app". The max number of apps allowed is 20 and the max number of tags across all apps is 1000; new apps are dropped if either limit is reached.
Level 2: for a specific app, track cardinality by "name". The max number of names is 30 and the max number of tags across all names is 50; new names are dropped if either limit is reached.
Level 3: for a specific app and name, track the number of unique values per non-prefix tag key. The max number of values is 40; roll up if the limit is reached.
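
To make the three levels concrete, here is a minimal sketch of the idea, not the code from this PR. All names (PrefixLimit, LimiterSketch, the "_rollup_" and "_missing_" placeholders) are hypothetical. It counts distinct tag maps exactly with a set where a real implementation would likely use an estimator, and it assumes that once a non-prefix key exceeds the value limit, every value for that key is rolled up. A total limit of 0 is treated as "no limit", matching the reachLimit check in the diff.

```scala
import scala.collection.mutable

// Hypothetical names throughout; a simplified model, not the PR's implementation.
final case class PrefixLimit(key: String, valueLimit: Int, totalLimit: Int)

final class LimiterSketch(prefix: List[PrefixLimit], tagValueLimit: Int) {

  private val RollupValue = "_rollup_" // assumed placeholder for rolled-up values
  private val MissingKey = "_missing_" // assumed value when a prefix key is absent

  private final class Node {
    val children = mutable.Map.empty[String, Node]              // prefix levels
    val seen = mutable.Set.empty[Map[String, String]]           // exact stand-in for an estimator
    val values = mutable.Map.empty[String, mutable.Set[String]] // leaf level: key -> values
    val rollupKeys = mutable.Set.empty[String]                  // leaf level: keys over the limit
  }

  private val root = new Node
  private val prefixKeys = prefix.map(_.key).toSet

  /** Returns None if the tag map is dropped, otherwise the (possibly rolled-up) tags. */
  def update(tags: Map[String, String]): Option[Map[String, String]] =
    visit(root, prefix, tags)

  private def visit(
    node: Node,
    levels: List[PrefixLimit],
    tags: Map[String, String]
  ): Option[Map[String, String]] = {
    val result = levels match {
      case conf :: rest =>
        val value = tags.getOrElse(conf.key, MissingKey)
        // totalLimit == 0 disables the limits at this level
        val full = conf.totalLimit > 0 &&
          (node.seen.size >= conf.totalLimit || node.children.size >= conf.valueLimit)
        if (!node.children.contains(value) && full) None // new value at a full level: drop
        else visit(node.children.getOrElseUpdate(value, new Node), rest, tags)
      case Nil =>
        Some(rollup(node, tags))
    }
    // Count the tag map at this node only if it was accepted further down
    if (result.isDefined) node.seen += tags
    result
  }

  private def rollup(leaf: Node, tags: Map[String, String]): Map[String, String] =
    tags.map {
      case (k, v) if prefixKeys.contains(k) => (k, v)
      case (k, v) =>
        val known = leaf.values.getOrElseUpdate(k, mutable.Set.empty[String])
        if (!known.contains(v)) {
          if (known.size >= tagValueLimit) leaf.rollupKeys += k else known += v
        }
        if (leaf.rollupKeys.contains(k)) (k, RollupValue) else (k, v)
    }
}
```

With two allowed apps and a tag value limit of 2, a third distinct value for a non-prefix key under the same app and name comes back rolled up, and a third app is dropped.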

brharrington

Looks good, added some minor notes and questions

brharrington · 2021-03-29T15:09:59Z

atlas-core/src/main/scala/com/netflix/atlas/core/limiter/CardinalityLimiter.scala

+    update(tags, TaggedItem.computeId(tags))
+  }
+
+  /** Get total cardinality for total tags ever seen. */


Consider wording this as the "total number of distinct tag maps ever seen". As a tag can be a single key/value pair within a tag map, I found the current wording a bit confusing.

brharrington · 2021-03-29T15:13:58Z

atlas-core/src/main/scala/com/netflix/atlas/core/limiter/CardinalityLimiter.scala

+  /**
+    * A convenient way get topk keys by cardinality at all levels, mainly used for debug/inspect.
+    */
+  def topk(k: Int): AnyRef


Is there a better return type than AnyRef we could use here? Not really sure how to use it right now.

jfz · 2021-03-30

Introduced a new type: CardinalityStats.

brharrington · 2021-03-29T15:21:45Z

atlas-core/src/main/scala/com/netflix/atlas/core/limiter/LimiterConifg.scala

+  * @param prefixConfigs list of config with prefix key and associated limits
+  * @param tagValueLimit max number of values per non prefix key
+  */
+case class LimiterConfig(prefixConfigs: Array[PrefixConfig], tagValueLimit: Int) {


ArraySeq might be a better option here as it preserves immutability. It wraps an array so there is a bit of overhead, but I don't think it will be noticeable for this use-case. We mainly try to avoid it for things like payloads getting deserialized frequently where the additional allocations start to add up.
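
As a rough illustration of the difference, the sketch below uses hypothetical stand-ins for the config classes (ArrayConfig and SeqConfig are not names from the PR). It relies only on scala.collection.immutable.ArraySeq from the Scala 2.13+ standard library. Note that ArraySeq.unsafeWrapArray does not copy, so the immutability only holds if nothing else keeps a reference to the wrapped array.

```scala
import scala.collection.immutable.ArraySeq

// Hypothetical stand-ins for the PR's config classes.
final case class PrefixConfig(key: String, valueLimit: Int, totalLimit: Int)
final case class ArrayConfig(prefixConfigs: Array[PrefixConfig], tagValueLimit: Int)
final case class SeqConfig(prefixConfigs: ArraySeq[PrefixConfig], tagValueLimit: Int)

object ArraySeqExample {

  def prefixes(): Array[PrefixConfig] =
    Array(PrefixConfig("app", 20, 1000), PrefixConfig("name", 30, 50))

  // An Array field can be mutated by any holder of the config and compares
  // by reference, so two configs with the same contents are not equal.
  val arraysEqual: Boolean =
    ArrayConfig(prefixes(), 40) == ArrayConfig(prefixes(), 40)

  // ArraySeq exposes no mutators and compares element by element.
  val seqsEqual: Boolean =
    SeqConfig(ArraySeq.unsafeWrapArray(prefixes()), 40) ==
      SeqConfig(ArraySeq.unsafeWrapArray(prefixes()), 40)
}
```

A side benefit beyond immutability is that the case class gets structural equality, which can make the config easier to compare in tests.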

brharrington · 2021-03-29T15:26:58Z

atlas-core/src/main/scala/com/netflix/atlas/core/limiter/CardinalityLimiter.scala

+    val value = tags.getOrElse(key, MissingKey)
+    val conf = limiterConfig.getPrefixConfig(level)
+
+    def reachLimit: Boolean = {


Could key and conf be moved to member variables and then have reachLimit be a private method? Otherwise the lambda object would need to be created for each invocation of update.

jfz · 2021-03-30

Extracted it to class level without adding an extra member field, because the number of instances of this class is likely to be high.

brharrington · 2021-03-29T15:28:13Z

atlas-core/src/main/scala/com/netflix/atlas/core/limiter/CardinalityLimiter.scala

+    val conf = limiterConfig.getPrefixConfig(level)
+
+    def reachLimit: Boolean = {
+      conf.totalLimit > 0 && (cardinality >= conf.totalLimit || children.size() >= conf.valueLimit)


What is a use-case where we would have totalLimit set to 0 for an inner node? Wouldn't that essentially force everything to be dropped?

jfz · 2021-03-30

0 means no limit, because reachLimit then returns false. This can be useful for perf tests or when we don't want to apply a limit at the root level.

brharrington · 2021-03-29T15:58:27Z

atlas-core/src/main/scala/com/netflix/atlas/core/limiter/CardinalityLimiter.scala

+        if (rollupKeys.contains(k))
+          (k, RollupValue)
+        else
+          (k, v)


I think this could be refactored a bit so that the tuple passed into the map function could just be returned instead of creating a new one. Might be less readable though.

jfz · 2021-03-30

Changed to iterate over the tuple instead, since it's in the hot path.
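
One way the suggestion above could look is sketched here; it is an illustration, not the change that was pushed. RollupExample and the "_auto-rollup_" placeholder value are assumptions. The function takes the tuple as a whole and returns it unchanged for keys that are not rolled up, so a new tuple is allocated only for the rolled-up entries.

```scala
object RollupExample {

  // Assumed placeholder; the actual RollupValue constant is not shown in the diff.
  val RollupValue = "_auto-rollup_"

  def rollup(tags: Map[String, String], rollupKeys: Set[String]): Map[String, String] =
    tags.map { t =>
      // Reuse the incoming tuple unless the key has to be rolled up
      if (rollupKeys.contains(t._1)) (t._1, RollupValue) else t
    }
}
```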

brharrington · 2021-03-29T16:01:18Z

atlas-core/src/main/scala/com/netflix/atlas/core/limiter/CardinalityLimiter.scala

+  val prefixValues: Array[String] = genSearchPath()
+
+  // Not needed if drop found early in search path, so generate lazily
+  def queryKeys: Set[String] = {


Could you just declare it a lazy val?
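
For reference, a small sketch of the lazy val behavior being suggested. QuerySketch and its counter are hypothetical and exist only for the demonstration: the body runs on first access, the result is cached, and nothing is computed if the value is never read.

```scala
final class QuerySketch(tagKeys: List[String]) {

  // Counter only for demonstration: how often the body below has run
  var computations = 0

  // Evaluated on first access and cached afterwards; never evaluated if unused
  lazy val queryKeys: Set[String] = {
    computations += 1
    tagKeys.toSet
  }
}
```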

brharrington · 2021-03-29T16:08:30Z

atlas-core/src/main/scala/com/netflix/atlas/core/util/BoundedPriorityBuffer.scala

@@ -81,4 +81,13 @@ class BoundedPriorityBuffer[T <: AnyRef](maxSize: Int, comparator: Comparator[T]
    }
    builder.result()
  }
+
+  /** Return a list containing all of the items in the buffer - preserving order by priority. */


Not sure if there is a better name (drainToOrderedList, similar to BlockingQueue.drainTo?). I think we should at least update the scaladoc comment to indicate that this will empty out the buffer and doesn't just return a copy of the elements.
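
To illustrate the draining behavior the comment asks to document, here is a simplified stand-in built on java.util.PriorityQueue. DrainSketch is hypothetical and is not BoundedPriorityBuffer; the method name follows the suggestion above.

```scala
import java.util.{Comparator, PriorityQueue}

// Hypothetical stand-in for a priority buffer; not the Atlas class.
final class DrainSketch[T <: AnyRef](comparator: Comparator[T]) {

  private val queue = new PriorityQueue[T](comparator)

  def add(item: T): Unit = {
    queue.add(item)
    ()
  }

  def size: Int = queue.size()

  /**
    * Remove all items from the buffer and return them ordered by priority.
    * The buffer is empty after this call; it does not return a copy.
    */
  def drainToOrderedList(): List[T] = {
    val builder = List.newBuilder[T]
    while (!queue.isEmpty) builder += queue.poll()
    builder.result()
  }
}
```

Calling drainToOrderedList a second time returns an empty list, which is the behavior a reader would not expect from a comment that only says "return a list containing all of the items".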

brharrington · 2021-03-29T16:10:22Z

atlas-core/src/main/scala/com/netflix/atlas/core/util/CardinalityEstimator.scala

 }

 object CardinalityEstimator {

  /**
-    * Create a new estimator instance using the [CPC] algorithm.
+    * Create a new estimator instance using the [CPC] algorithm. This created estimator is NOT
+    * thread safe, use {@link newSyncEstimator} to create a thread-safe estimator.


Do we need both? Can we just make CpcEstimator thread safe?

jfz · 2021-03-30

Updated; so far we only need one.

brharrington · 2021-03-29T16:11:06Z

atlas-core/src/main/scala/com/netflix/atlas/core/util/CardinalityEstimator.scala

+    private val sketch = new CpcSketch(lgK)
+    private val _cardinality = new AtomicLong()
+
+    override def update(obj: AnyRef): Unit = {


This is only thread safe for reads, correct?

jfz · 2021-03-30

Right, added some comments for this.
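
The "thread safe for reads only" pattern discussed here can be sketched as follows. ReadSafeEstimatorSketch is hypothetical, and a mutable HashSet stands in for the CPC sketch so the example has no external dependency. The internal state is not thread safe, so update must stay on a single writer thread, while the cached value in the AtomicLong can be read from any thread.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.mutable

// Hypothetical sketch; a HashSet stands in for the real (non-thread-safe) sketch.
final class ReadSafeEstimatorSketch {

  private val seen = mutable.HashSet.empty[AnyRef]
  private val _cardinality = new AtomicLong()

  /** NOT thread safe: must only be called from a single writer thread. */
  def update(obj: AnyRef): Unit = {
    seen += obj
    // Publish the latest value so readers never touch the mutable state
    _cardinality.set(seen.size.toLong)
  }

  /** Safe to call from any thread; returns the last published value. */
  def cardinality: Long = _cardinality.get()
}
```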

auto rollup or drop tags based on cardinality limits

d7fafdc

jfz force-pushed the autoRollup branch from d3b127b to d7fafdc · March 19, 2021 21:02

brharrington added this to the 1.7.0 milestone Mar 29, 2021

brharrington reviewed Mar 29, 2021

jfz force-pushed the autoRollup branch from 2ffd095 to bc71a51 · March 30, 2021 20:22

fixes based on code review

22f5e37

jfz force-pushed the autoRollup branch from bc71a51 to 22f5e37 · March 30, 2021 20:25

brharrington approved these changes Mar 30, 2021

brharrington merged commit 0cf9f95 into Netflix:master Mar 30, 2021
