-
Notifications
You must be signed in to change notification settings - Fork 535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revise idleTime mechanism #10008
Milestone
Comments
Closed to track in AB#128 |
kian-thompson
added a commit
that referenced
this issue
Jun 16, 2022
AB#128 * Initial implementation * Use const variables so expression is easier to read * Add comments * Add to config in unit tests * Add comments * API changes * Fix unit tests * Unit test new idle timer functionality * Fix minIdleTime comment * Attempt to fix SharedTree tests * Using new min max idleTime in unit tests * Fix lint * Update BREAKING.md * Calculate idleTime based on op weights * Change default minIdleTime to 0 * Fix lint * Improve doc comments * Fix test coming from main integration * Changes to API md
ghost
pushed a commit
that referenced
this issue
Aug 24, 2022
# [AB#762](https://dev.azure.com/fluidframework/internal/_workitems/edit/762) For more information about how to contribute to this repo, visit [this page](https://github.com/microsoft/FluidFramework/blob/main/CONTRIBUTING.md). ## Description Deprecated by [AB#128](https://dev.azure.com/fluidframework/internal/_workitems/edit/128) and [#10008](#10008). ## Does this introduce a breaking change? Please move all usage to the new `minIdleTime` and `maxIdleTime` properties in `ISummaryConfigurationHeuristics`.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
@NicholasCouri asked in #10007
I believe it makes total sense to reduce idleTime to 10s, maybe even 5s. And data suggests that 72% of last summaries (files at rest, looking at last summary produced for such files) are done with reason = "idle" and P95 = 138 ops being summarized, so reducing this timer will reduce number of trailing ops for sure. It's possible that the need for that change will be reduced due to MaxOps change in that PR.
But the main reason why I think we should take it a bit slower is different - it's wasteful to summarize 2-3 ops frequently - it does not give us much but increases COGS and pressure on service. So I think we should think how to have much tighter idleTime when there are many unsummarized ops, and lax idleTime when we have few. I.e. it seems the best outcome is
That way it becomes less function of idleness of the product, and more a function of rate of changes.
Useful query to look at last summary stats:
let lastSummary = union Office_Fluid_FluidRuntime_*
| where Data_eventName == "fluid:telemetry:Summarizer:Running:Summarize_generate"
| summarize max(Event_Time) by Data_docId;
let end = union Office_Fluid_FluidRuntime_*
| where Data_eventName == "fluid:telemetry:Summarizer:Running:Summarize_generate"
| where Event_Time < ago(3d)
| project Event_Time, Data_docId, reason = Data_summarizeReason, Data_opsSinceLastSummary;
lastSummary | join (end) on Data_docId, $left.max_Event_Time == $right.Event_Time
| summarize count(), avg(Data_opsSinceLastSummary), percentile(Data_opsSinceLastSummary, 95) by reason
The text was updated successfully, but these errors were encountered: