-
Notifications
You must be signed in to change notification settings - Fork 352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Staging - [Alerting] Autoscale: Minutes to scale-up from zero machine alert #9217
Comments
@premun based off the queues in question this makes me concerned about your startup changes, mind taking a peek? |
Seems like the machines are stuck in the "Initializing" state. I'll try to connect there and see what's up |
There are two suspects - my change undoubtedly, and then "Forcibly take ownership of the log folder every work item". It wasn't failing before and after this it's 100% rate: (the purple build is my change but it fails on some windows queue while all the other linux queues are rebuilt there) |
I opened https://dev.azure.com/dnceng/internal/_git/dotnet-helix-machines/pullrequest/22601 so that it runs in the meanwhile while I investigate some more |
The build is green so it is something about the ownership change that makes these go in the boot loop so I merged a revert |
💚 Metric state changed to ok
|
💔 Metric state changed to alerting
Go to rule
@dotnet/dnceng, please investigate
Automation information below, do not change
Grafana-Automated-Alert-Id-54aa0d7e647e46ff9e880bf6ae532b99
The text was updated successfully, but these errors were encountered: