Make scheduling timeout constant to be a configurable option #4886
@guzzijones Can you expand your explanation and say why you commented out this line? I will ping @m4dcoder again for feedback; I think I saw a note that he was able to reproduce it. I just had it happen many times in one of my environments.
It prevents the action from redeploying on exit. Large JSON payloads take a while to update.
More info?
@blag mentioned we need a unit test for this PR. Before I spend a bunch of time on that, can someone please at least bless the approach here?
The scheduler's garbage collection actively looks for action executions that have been locked for a long time and have not been released. This may be caused by a scheduler being terminated abnormally before releasing its locks. The time before a lock is manually released by GC is set by the constant EXECUTION_SCHEDUELING_TIMEOUT_THRESHOLD_MS, which is currently not configurable. Since this issue is specific to the end user (large input that delays scheduling), the solution should be to make EXECUTION_SCHEDUELING_TIMEOUT_THRESHOLD_MS configurable so that the end user can adjust the value as needed.
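As a rough illustration of the behavior described above, here is a minimal sketch of a garbage-collection pass with a configurable threshold. It is not the st2 implementation: `QueueItem`, its fields, and `release_stale_locks` are hypothetical names invented for this example.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueueItem:
    """Hypothetical stand-in for a scheduler queue entry (not the st2 model)."""
    execution_id: str
    handling: bool      # True while a scheduler holds the lock on this item
    locked_at_ms: int   # when the lock was taken, in epoch milliseconds


def release_stale_locks(items: List[QueueItem],
                        timeout_threshold_ms: int,
                        now_ms: Optional[int] = None) -> List[str]:
    """Release every lock held longer than the threshold; return the released ids."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    released = []
    for item in items:
        if item.handling and now_ms - item.locked_at_ms > timeout_threshold_ms:
            item.handling = False  # the item becomes eligible for scheduling again
            released.append(item.execution_id)
    return released


items = [QueueItem("a", True, 0), QueueItem("b", True, 50_000)]
released = release_stale_locks(items, timeout_threshold_ms=60_000, now_ms=70_000)
```

With a 60-second threshold, only item "a" (locked for 70 seconds) is released. A user with large inputs would raise the threshold so that slow but healthy executions are not released and scheduled a second time.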
The alternative solution is to add service discovery for all the st2 components and, instead of blindly releasing the lock, first look to see if the scheduler that is processing the action execution is still healthy and alive. This takes more work, though.
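The liveness-check alternative could look roughly like the sketch below. Everything here is an assumption made for illustration: the heartbeat registry (a plain dict of scheduler id to last-seen time), the `heartbeat_ttl_ms` parameter, and both function names are hypothetical, and st2 service discovery may work quite differently.

```python
from typing import Dict


def owner_is_alive(heartbeats: Dict[str, int], owner_id: str,
                   now_ms: int, heartbeat_ttl_ms: int) -> bool:
    """Treat a scheduler as alive if it reported a heartbeat within the TTL."""
    last_seen_ms = heartbeats.get(owner_id)
    return last_seen_ms is not None and now_ms - last_seen_ms <= heartbeat_ttl_ms


def should_release_lock(heartbeats: Dict[str, int], owner_id: str,
                        lock_age_ms: int, timeout_threshold_ms: int,
                        now_ms: int, heartbeat_ttl_ms: int) -> bool:
    """Release only locks that are both old and held by a scheduler that looks dead."""
    if lock_age_ms <= timeout_threshold_ms:
        return False  # lock is still young; leave it alone
    return not owner_is_alive(heartbeats, owner_id, now_ms, heartbeat_ttl_ms)


# Hypothetical registry: scheduler-1 reported recently, scheduler-2 went silent.
heartbeats = {"scheduler-1": 99_000, "scheduler-2": 10_000}
```

In this sketch an old lock held by `scheduler-1` is kept because its owner is still reporting, while the same lock held by `scheduler-2` or by an unknown scheduler is released.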
Thanks @m4dcoder. I will redo this PR at some point soon, hopefully. That makes a lot more sense.
Yes, this is also breaking the unit test for this method. Working on this today. I will probably have to close this request and point it to a new one, as I made this change in the GUI through github.com.
You don't have to close this and create a new PR. You can just:
# Fetch the latest changes from all remotes
git fetch --all
# Check out the master branch
git checkout master
# Pull in all changes to your local master branch
git pull
# Switch back to your patch-5 branch
git checkout patch-5
# Rebase back on top of the master branch
git rebase master
And then you can just continue development in this branch as normal.
I simply cherry-picked this from a working local branch.
I don't understand the change here. Why not just remove the constant EXECUTION_SCHEDUELING_TIMEOUT_THRESHOLD_MS and replace it with a config option at https://github.com/StackStorm/st2/blob/master/st2actions/st2actions/scheduler/handler.py#L103? If the config option is in minutes, it should be a float/decimal to allow values under 1 minute. The calculation of self._execution_scheduling_timeout_threshold_ms should then be just cfg.CONF.scheduler.execution_scheduling_timeout_threshold_m * 60 * 1000.
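The minutes-to-milliseconds calculation suggested above can be sketched as follows. The real code reads the value through `cfg.CONF`; this sketch uses the standard-library `configparser` only to stay self-contained. The option name is copied from the comment as written, and the helper name and the 1-minute fallback are assumptions made for illustration.

```python
import configparser

# Hypothetical st2.conf fragment; 0.5 minutes shows why a float option is useful.
SAMPLE_CONF = """
[scheduler]
execution_scheduling_timeout_threshold_m = 0.5
"""


def threshold_ms_from_conf(conf_text: str, default_minutes: float = 1.0) -> float:
    """Read the threshold in (possibly fractional) minutes and convert it to milliseconds."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    minutes = parser.getfloat(
        "scheduler",
        "execution_scheduling_timeout_threshold_m",
        fallback=default_minutes,  # assumed default, not taken from the PR
    )
    return minutes * 60 * 1000


threshold_ms = threshold_ms_from_conf(SAMPLE_CONF)
```

Keeping the option in minutes and converting once at startup means the rest of the handler can keep comparing against a millisecond value.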
Is there a fixture I need to put my new scheduler config in? Everything runs fine locally when I start up StackStorm using the
I see it.
8bd2307 to 2d5a4e2
c8d99fe to 4b21db7
I undid the change to st2.conf.sample. I guess I am not supposed to touch that file. Someone will have to fill me in on how to handle that. It is failing the CI checks.
6e30c5f to f300da9
edit comments Co-Authored-By: Eugen C. <[email protected]>
edit sample config Co-Authored-By: Eugen C. <[email protected]>
I will need some help figuring out why the unit test for st2.conf.sample is failing.
Run |
Still failing.
"Thanks" to IDE "fixing" trailing spaces
@guzzijones If you're curious, commit 4774918 was close. If you look at the CI failure, it highlighted the following diff: https://travis-ci.org/github/StackStorm/st2/jobs/675013639#L1571-L1576, where the new st2.conf setting was expected to appear a few lines before its position. Computers ¯\_(ツ)_/¯.
@guzzijones Thanks for the contribution, and thanks to everyone involved (@m4dcoder @blag @punkrokk) for the reviews 👍
Oh wow, that is specific. Thanks for the quick fixes. And thanks for all the patience. A lot of hours went into this on our end, so we are glad it is merged.
Fixes #4887
This stopped my execution queue item from being requested more than once for long-running requests.
Is there a config setting for the timing of this?
I am not sure if this is needed since there is retry code in the code base.