Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make scheduling timeout constant to be a configurable option #4886

Merged
merged 9 commits into from
Apr 14, 2020

Conversation

guzzijones
Copy link
Contributor

@guzzijones guzzijones commented Mar 15, 2020

Fixes #4887

This stopped my execution queue item from being requested more than once for long running requests.
Is there a config setting for the timing of this?

I am not sure if this is needed since there is retry code in the code base.

@pull-request-size pull-request-size bot added the size/XS PR that changes 0-9 lines. Quick fix/merge. label Mar 15, 2020
@CLAassistant
Copy link

CLAassistant commented Mar 15, 2020

CLA assistant check
All committers have signed the CLA.

@blag blag requested a review from m4dcoder March 17, 2020 18:21
@punkrokk punkrokk added this to the 3.2.0 milestone Mar 21, 2020
@punkrokk
Copy link
Member

@guzzijones Can you expand your explanation and why you commented out this line? I will ping @m4dcoder again for feedback, I think I saw a note that he was able to reproduce it. I just had it happen many times in one of my environments.

@arm4b arm4b removed this from the 3.2.0 milestone Mar 21, 2020
@guzzijones
Copy link
Contributor Author

guzzijones commented Mar 22, 2020

It prevents the action from redeploying on exit. Large json take a while to update.

@guzzijones
Copy link
Contributor Author

More info?

@blag blag added this to the 3.2.0 milestone Mar 24, 2020
@guzzijones
Copy link
Contributor Author

@blag mentioned we need a unit test for this PR. before I spend a bunch of time on that can someone please at least bless the approach here.
my PR basically negates this entire function.
It probably is better to figure out why we end up here in the first place if the action is still not finished writing it's changes to the database.
I have the issue listed above if someone wants to verify we nuke this function.

@m4dcoder
Copy link
Contributor

@punkrokk I don't know where you read I was able to reproduce this. This is the first I look at this issue #4887 and the proposed solution here. I have not had a chance to review this to make the connection here between the issue and how this solution fixes it.

Copy link
Contributor

@m4dcoder m4dcoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scheduler's garbage collection actively looks for action execution that is locked for long time and hasn't been released. This may be caused by a scheduler being terminated abnormally before releasing locks. The time before a lock is manually released by GC is set at the constant EXECUTION_SCHEDUELING_TIMEOUT_THRESHOLD_MS. This is currently not configurable. Since this issue is specific to the end user (large input which causes delay in scheduling), the solution should be to make EXECUTION_SCHEDUELING_TIMEOUT_THRESHOLD_MS configurable and then the end user adjust the value according to needs.

@m4dcoder
Copy link
Contributor

The alternative solution is to add service discovery for all the st2 components and instead of blindly releasing the lock, first look to see if the scheduler that is processing the action execution is still healthy and alive. This takes more work though.

@guzzijones
Copy link
Contributor Author

Thanks @m4dcoder. I will redo this pr at some point soon hopefully. That makes a lot more sense.

@guzzijones
Copy link
Contributor Author

Yes , this is also breaking the unit test for this method. Working on this today. I will probably have to close this request and point it to a new one as I made this change in the gui through github.com

@blag
Copy link
Contributor

blag commented Mar 25, 2020

You don't have to close this a create a new PR, you can just:

# Fetch changes from all branches from GitHub
git fetch --all
# Checkout the master branch
git checkout master
# Pull in all changes to your local master branch
git pull
# Switch back to your patch-5 branch
git checkout patch-5
# Rebase back on top of the master branch
git rebase master

And then you can just continue development in this branch as normal.

@guzzijones
Copy link
Contributor Author

I simply cherry picked this from a working local branch.
I am not sure why it thinks the option doesn't exist. I will have to double check.
I see this error in the unit tests:
raise NoSuchOptError(opt_name, group) NoSuchOptError: no such option in group scheduler: execution_scheduling_timeout_threshold_m

Copy link
Contributor

@m4dcoder m4dcoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the change here. Why not just remove the constant EXECUTION_SCHEDUELING_TIMEOUT_THRESHOLD_MS and then replace it with a config option at https://github.com/StackStorm/st2/blob/master/st2actions/st2actions/scheduler/handler.py#L103? The config option if min should be float/decimal to allow for under 1 minute. The calculation self._execution_scheduling_timeout_threshold_ms should be just cfg.CONF.scheduler.execution_scheduling_timeout_threshold_m * 60 * 1000.

st2actions/st2actions/scheduler/config.py Outdated Show resolved Hide resolved
st2actions/st2actions/scheduler/handler.py Outdated Show resolved Hide resolved
st2actions/st2actions/scheduler/handler.py Outdated Show resolved Hide resolved
st2actions/st2actions/scheduler/handler.py Outdated Show resolved Hide resolved
st2actions/st2actions/scheduler/handler.py Outdated Show resolved Hide resolved
@m4dcoder m4dcoder changed the title Disabled removing lock for orphanded Make scheduling timeout constant to be a configurable option Apr 8, 2020
@pull-request-size pull-request-size bot added size/S PR that changes 10-29 lines. Very easy to review. and removed size/XS PR that changes 0-9 lines. Quick fix/merge. labels Apr 8, 2020
@guzzijones
Copy link
Contributor Author

Is there a fixture I need to put my new scheduler config in? Everything runs locally fine with I start up stackstorm using the launch.sh start command.

@guzzijones
Copy link
Contributor Author

I see it

@guzzijones guzzijones force-pushed the patch-5 branch 2 times, most recently from 8bd2307 to 2d5a4e2 Compare April 8, 2020 16:46
@guzzijones guzzijones force-pushed the patch-5 branch 2 times, most recently from c8d99fe to 4b21db7 Compare April 13, 2020 16:31
@guzzijones
Copy link
Contributor Author

I undid the change to st2.conf.sample. I guess I am not supposed to touch that file. Someone will have to fill me in on how to handle that. It is failing the CI checks.

conf/st2.conf.sample Outdated Show resolved Hide resolved
conf/st2.conf.sample Outdated Show resolved Hide resolved
CHANGELOG.rst Outdated Show resolved Hide resolved
@guzzijones guzzijones force-pushed the patch-5 branch 2 times, most recently from 6e30c5f to f300da9 Compare April 14, 2020 15:15
@guzzijones
Copy link
Contributor Author

I will need some help on why the unit test for st2.conf.sample is failing.

@m4dcoder
Copy link
Contributor

Run make configgen to update st2.conf.sample and then push the changes as commit.

@guzzijones
Copy link
Contributor Author

Run make configgen to update st2.conf.sample and then push the changes as commit.

still failing

@arm4b
Copy link
Member

arm4b commented Apr 14, 2020

st2.conf.sample looks good now.

@guzzijones If you're curious, the 4774918 was close. If you look at CI failure, - it highlighted the following diff: https://travis-ci.org/github/StackStorm/st2/jobs/675013639#L1571-L1576 where new st2.conf setting was expected to appear a few lines before its position. Computers ¯_(ツ)_/¯.

Copy link
Member

@arm4b arm4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guzzijones Thanks for the contribution and everyone involved @m4dcoder @blag @punkrokk for reviews 👍

@arm4b arm4b merged commit cc59af7 into StackStorm:master Apr 14, 2020
@guzzijones
Copy link
Contributor Author

Oh wow that is specific. Thanks for the quick fixes. And thanks for all the patience. A lot of hours went into this on our end so we are glad it is merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement size/S PR that changes 10-29 lines. Very easy to review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

large json input object cause double run tasks
6 participants