Models can not be scaled up to 2 copies #151

haiminh2001 · 2024-09-18T04:14:46Z

Context

I am learning how autoscaling works in model mesh. I found this passage in the docs:

Models will scale to two copies if they have been used recently regardless of the load - the autoscaling behaviour applies between 2 and N>2 copies.

It is so vague that I had to dive into the source code, where I found this:

                                // For 1->2 copies, scale-up can also be triggered by a pattern of recent usage
                                // See explanation of CacheEntry#usageSlices
                                if (loadedCount == 1) {
                                    // assert mr.getInstanceIds().containsKey(instanceId);

                                    int i1 = ce.earlierUseIteration, i2 = ce.lastUsedIteration;
                                    // invariants: lower < upper, i1 <= i2
                                    if (logger.isDebugEnabled()) {
                                        logger.debug("Second copy trigger evaluation for model " + modelId
                                            + ": target range [" + lower + ", " + upper + "], I1="
                                                + i1 + ", I2=" + i2 + ", curIteration=" + iterationCounter);
                                    }
                                    boolean i1inRange = false, i2inRange = false;
                                    if (i2 >= lower && i1 <= upper) {
                                        i1inRange = i1 >= lower;
                                        i2inRange = i2 <= upper;
                                    }
                                    if (i2inRange || !i1inRange) {
                                        ce.earlierUseIteration = i2;
                                    }
                                    ce.lastUsedIteration = iterationCounter;

                                    if (i1inRange || i2inRange) {
                                        // Model was used within the target range [MIN_AGE, MAX_AGE] iterations ago
                                        // so trigger loading of a second copy

                                        // Don't do it if > 90% full and cache is younger than secondCopyLruThresholdMillis
                                        if ((10 * clusterStats.totalFree) / clusterStats.totalCapacity >= 1
                                                || (now - clusterStats.globalLru) > secondCopyLruThresholdMillis) {
                                            logger.info("Attempting to add second copy of model " + modelId
                                                    + " in another instance since \"regular\" usage was detected");
                                            ensureLoadedInternalAsync(modelId, lastTime, ce.getWeight(), excludeThisInstance, 0);
                                            continue;
                                        }
                                    }
                                }
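The capacity guard at the end of the excerpt relies on integer division, which is easy to misread. Below is a minimal standalone sketch of that condition only. The class name SecondCopyGuard and the method allowSecondCopy are hypothetical and not part of the project; the parameters mirror the fields in the excerpt, and totalCapacity is assumed to be greater than zero.

```java
// Hypothetical helper that isolates the guard from the excerpt above.
final class SecondCopyGuard {
    /**
     * Returns true when a second copy may be attempted:
     * either at least 10% of the cluster capacity is free, or the
     * globally least-recently-used entry is older than the threshold.
     * Assumes totalCapacity > 0 (otherwise the division throws).
     */
    static boolean allowSecondCopy(long totalFree, long totalCapacity,
                                   long now, long globalLru,
                                   long secondCopyLruThresholdMillis) {
        // Integer division: (10 * free) / capacity is >= 1 exactly when free >= 10% of capacity
        boolean atLeastTenPercentFree = (10 * totalFree) / totalCapacity >= 1;
        boolean cacheOldEnough = (now - globalLru) > secondCopyLruThresholdMillis;
        return atLeastTenPercentFree || cacheOldEnough;
    }
}
```

With 9 of 100 units free the first clause evaluates to 0 and fails, so the outcome then depends only on the age of the globally least-recently-used entry.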

As far as I understand, the logic is: if a model has been used recently and there is a prior usage that falls between 40 minutes and 7 minutes before the current check, the model should be scaled to 2 copies.

Current behaviour

If a model is used consistently, earlierUseIteration and lastUsedIteration are continuously updated to the most recent check iterations. That happens in these lines of code:

                                    if (i2inRange || !i1inRange) {
                                        ce.earlierUseIteration = i2;
                                    }
                                    ce.lastUsedIteration = iterationCounter;

In that case i1inRange and i2inRange will never be true, since both i1 and i2 are always updated to the most recent points in time and are consequently always above upper.
Therefore a model has to be used once, receive no requests for 7 minutes, and then be used again in order to be scaled to 2 copies (I have tested that behaviour).
Having to wait for 7 minutes
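The behaviour described above can be reproduced with a small standalone simulation. This is only a sketch under stated assumptions: the class SecondCopyTriggerSim and its methods are hypothetical, one iteration is treated as one minute, lower and upper are assumed to be iterationCounter - MAX_AGE and iterationCounter - MIN_AGE, the initial values of the two fields are assumed to lie far in the past, and the evaluation is assumed to run only in iterations where the model was used. The range check and field updates are copied from the excerpt.

```java
// Hypothetical simulation of the second-copy trigger from the excerpt.
final class SecondCopyTriggerSim {
    static final int MIN_AGE = 7;    // assumption: 7 iterations, one per minute
    static final int MAX_AGE = 40;   // assumption: 40 iterations, one per minute
    static final int NEVER = -1_000_000; // assumption: "never used" marker

    int earlierUseIteration = NEVER;
    int lastUsedIteration = NEVER;

    /** Evaluates one iteration in which the model was used; returns true if a second copy is triggered. */
    boolean onUse(int iterationCounter) {
        int lower = iterationCounter - MAX_AGE;
        int upper = iterationCounter - MIN_AGE;
        int i1 = earlierUseIteration, i2 = lastUsedIteration;
        boolean i1inRange = false, i2inRange = false;
        if (i2 >= lower && i1 <= upper) {
            i1inRange = i1 >= lower;
            i2inRange = i2 <= upper;
        }
        if (i2inRange || !i1inRange) {
            earlierUseIteration = i2;
        }
        lastUsedIteration = iterationCounter;
        return i1inRange || i2inRange;
    }

    /** Replays the given use iterations; returns the first iteration that triggers, or -1 if none does. */
    static int firstTrigger(int... useIterations) {
        SecondCopyTriggerSim sim = new SecondCopyTriggerSim();
        for (int it : useIterations) {
            if (sim.onUse(it)) {
                return it;
            }
        }
        return -1;
    }

    /** Builds the sequence from, from + 1, ..., to (inclusive). */
    static int[] everyIteration(int from, int to) {
        int[] result = new int[to - from + 1];
        for (int i = 0; i < result.length; i++) {
            result[i] = from + i;
        }
        return result;
    }
}
```

Under these assumptions, firstTrigger(everyIteration(100, 200)) finds no trigger at all, while firstTrigger(100, 108) triggers at iteration 108, which matches the "use, wait, use again" pattern.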

Expectation

If a model is being used consistently for over 7 minutes, that model should be scaled to 2 copies.

Suggestion

Perhaps the time of the oldest request that is not more than 40 minutes old should be recorded instead of earlierUseIteration. The remaining logic would stay the same.
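One possible reading of this suggestion, sketched as standalone code. It is not taken from the project or from any PR: the class OldestUseTracker, its fields, and the fallback rule are all hypothetical, and one iteration is again treated as one minute. When the recorded oldest use expires, the sketch falls back to the previous use, which is only an approximation, since finding the true oldest non-expired use would require keeping a history.

```java
// Hypothetical sketch: track the oldest use that is still within MAX_AGE.
final class OldestUseTracker {
    static final int MIN_AGE = 7;    // assumption: iterations, one per minute
    static final int MAX_AGE = 40;   // assumption: iterations, one per minute
    static final int NEVER = -1_000_000; // assumption: "never used" marker

    int oldestUseIteration = NEVER;  // oldest recorded use not older than MAX_AGE
    int lastUsedIteration = NEVER;

    /** Records a use in the given iteration; returns true if a second copy should be triggered. */
    boolean onUse(int iterationCounter) {
        int lower = iterationCounter - MAX_AGE;
        int upper = iterationCounter - MIN_AGE;
        if (oldestUseIteration < lower) {
            // Recorded use expired: fall back to the previous use if it is
            // still young enough, otherwise start over from this iteration.
            oldestUseIteration = lastUsedIteration >= lower ? lastUsedIteration : iterationCounter;
        }
        lastUsedIteration = iterationCounter;
        // Trigger once some recorded use is at least MIN_AGE iterations old
        return oldestUseIteration <= upper;
    }

    /** Uses the model in every iteration from..to; returns the first triggering iteration, or -1. */
    static int firstTriggerUnderContinuousUse(int from, int to) {
        OldestUseTracker tracker = new OldestUseTracker();
        for (int it = from; it <= to; it++) {
            if (tracker.onUse(it)) {
                return it;
            }
        }
        return -1;
    }
}
```

With this variant, a model used in every iteration starting at 100 would trigger at iteration 107, that is, once the usage has been sustained for MIN_AGE iterations.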


spolti · 2024-10-07T18:05:56Z

Hi @haiminh2001, here is the commit with more details about the motivation. I am not sure whether you have already seen it; if not, it may help you understand the logic better.
2790ef2#diff-1955f855ee9b6cf2eed89d74caca0b4a5181d856c645c4247561ac6f140aace4R503

haiminh2001 · 2024-11-21T07:08:47Z

Hi @spolti, I had read the commit before. In short, my question is: why does using a model continuously not trigger the second copy, while using the model, waiting 7 minutes without any usage, and then using it again does? That does not make sense to me. The commit message says:

Piggy-back on the existing frequent rate-tracking task to keep track of "iteration numbers" in which single-copy models are loaded, and only trigger a second copy when there's a prior usage more than 7 minutes but less than 40 minutes ago. If the usage is confined to a < 7min window it could be isolated; if > 40min apart the value of a second copy is lower (probability of pod death causing disruption is minimal).

The commit says "there's a prior usage more than 7 minutes", but in the code it is the exact last usage, not any prior usage, that has to fall in the interval between 7 and 40 minutes ago.

haiminh2001 · 2024-11-23T14:33:58Z

Hi @spolti, I fixed the logic and opened a PR. Can you have a look? Thank you in advance.
