-
-
Notifications
You must be signed in to change notification settings - Fork 745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Action runner output streaming single collection approach #3729
Action runner output streaming single collection approach #3729
Conversation
objects. Those objects are deleted by default after 7 days.
have been re-generated, committed and included in the changeset.
values change based on the environment where this script run. This way we ensure consistent and stable output of this script, regardgless of where it runs.
CI check which verifies that all the automatically generated files are up to date
When result of action execution contains unicode, Mistral callback fails and leads to orphaned workflow execution.
The interface for the authenticate method has changed. Fix the unit tests associated with the interface change.
Fix mistral callback failure when result contains unicode
Fix Travis caching
Fix missing ';' in travis.yml
Clean up unit tests to be consistent in load runners and actions from st2test fixtures.
On cancellation request, if the liveaction has a parent, set the status to canceling to trigger post_run for the liveaction.
Separate the imports for readability and searchability.
Fix call to keystone auth in mistral runner
Fix cancellation of delayed action execution
…tackStorm/st2 into python_runner_actions_real_time_output
@lakshmi-kannan @m4dcoder @bigmstone Let's please agree on this data model today if possible so I can finish rest of the changes and get this feature out. |
for storing the output instead of utilizing two collections. This approach makes whole thing more generic and makes it more future-proof and work with other runner which doesn't necessary have two streams (stdout, stderr). Now "type" argument on the document stores type of the output and it means each runner execution can produce as little or as many output types as it needs. Most of the existing runners will produce two type of output (stdout, stderr), but it's likely that in the future we will have other runners which will just produce one type of the output.
46ba975
to
03c37d4
Compare
tail behavior by periodically reading from this collection. | ||
Stores output of a particular execution. | ||
|
||
New document is inserted dynamically when a new chunk / line is received which means you can |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From this, I gather that this model just represents one "write"? Meaning, for each instance of data that an execution wishes to stream about itself, each would be a unique instance of this model?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct - mentioned the reason for that in the other PR (performance - updating single document is not a good idea because of how locking works).
timestamp = me.DateTimeField(required=True, default=date_utils.get_datetime_utc_now) | ||
type = me.StringField(required=True, default='output') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a fan of using names of builtins as variable names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, will rename it to output_type
.
Do you need rebase this branch? It's including some unrelated commits/fixes from master (unicode, travis, etc.) |
execution_id: ID of the execution to which this output belongs. | ||
action_ref: Parent action reference. | ||
runner_ref: Parent action runner reference. | ||
timestamp: Timestamp when this output has been produced / received. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to add an auto-increment sequence field in the collection? In case the timestamps are the same for one or more entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I will just change it to use a custom ComplexDateTimeField
field we wrote which has micro-second precision and which we use for executions.
I believe microsecond precision should be fine.
LGTM except for a few minor questions regarding unrelated commits in this branch and adding an auto increment sequence field in the ActionExecutionOutput collection. |
It was already synced with master and the other branch is based on so no idea why Github is showing this diff the way it is. |
Will merge those changes into existing PR and work from there. |
This pull request works on top off #3657 and changes two collection approach with single "generic" collection approach - 03c37d4.
For some reason the actual diff looks a bit messed up, but the change is located in 03c37d4.
It addresses the feedback people had and makes the whole approach a bit more generic so it can work with other / future runners which produce just a single output stream.
The main difference is that this approach utilizes single "action_output_d_b" collection with the following fields:
execution_id
action_ref
runner_ref
timestamp
type
data
Most fields are self explanatory, but two fields of interest are "data" and "type".
"data" contains actual partial execution output. Depending on the runner, this could either be chunk, line or similar. For all the existing runners for which we implement streaming right now this is a line. I was thinking about a different name (e.g.
chunk
), but in the end I settled withdata
.type
contains type of the output. Right now that is eitherstdout
orstderr
, but for runners with just a single output stream, this could simply beoutput
or similar. I'm open to suggestions for a better name for this field.I was thinking about "stream" / "stream_type" / "output_type", but in the end I settled with "type". If we can't come up with anything better, I will probably change it to "output_type" so it doesn't clash with Python built-in
type
function.As for now, I only implemented model changes so I won't cause unnecessary additional work for myself in case people don't agree with this approach.
Once we agree that's the model we all like, I will go ahead and implement rest of the changes.
If we agree on this approach, actual API endpoint and CLI command will look like this:
/v1/executions/<execution id>/output[?type=stdout/stderr/etc]
st2 execution tail [--type=stdout/stderr/etc]
TODO