
Orquesta workflows have incorrect status #4740

Closed
bishopbm1 opened this issue Jul 12, 2019 · 12 comments · Fixed by StackStorm/orquesta#172

Comments

@bishopbm1
Contributor

SUMMARY

When using Orquesta workflows, if a task fails but still publishes content, the workflow is marked as succeeded. This happens both when there is no when clause and when the when clause matches on failed().

STACKSTORM VERSION

# st2 --version
st2 3.1.0, on Python 2.7.5

OS, environment, install method

# cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.6 (Maipo)"
ID="rhel"

Install method is Puppet forge module

Steps to reproduce the problem

Using the examples.orquesta-basic workflow, we need to modify the workflow to either remove the when: <% succeeded() %> clause or change it to when: <% succeeded() or failed() %>.

Workflow:

version: 1.0

description: A basic workflow that runs an arbitrary linux command.

input:
  - cmd
  - timeout

tasks:
  task1:
    action: core.local cmd=<% ctx(cmd) %> timeout=<% ctx(timeout) %>
    next:
      - when: <% succeeded() or failed() %>
        publish:
          - stdout: <% result().stdout %>
          - stderr: <% result().stderr %>

output:
  - stdout: <% ctx(stdout) %>

or:

version: 1.0

description: A basic workflow that runs an arbitrary linux command.

input:
  - cmd
  - timeout

tasks:
  task1:
    action: core.local cmd=<% ctx(cmd) %> timeout=<% ctx(timeout) %>
    next:
      - publish:
          - stdout: <% result().stdout %>
          - stderr: <% result().stderr %>

output:
  - stdout: <% ctx(stdout) %>

Action file was unaltered:

---
name: orquesta-basic
pack: examples
description: Run a local linux command
runner_type: orquesta
entry_point: workflows/orquesta-basic.yaml
enabled: true
parameters:
  cmd:
    required: true
    type: string
  timeout:
    type: integer
    default: 60

The workflow can be forced to fail by running an invalid command:

st2 run examples.orquesta-basic cmd="secho 'hi'"

Expected Results

I would expect the workflow result to be failed when the task fails, unless manually overridden.

Actual Results

The workflow reports succeeded even though the single task failed. Looking at the logs, it appears that a second subtask is created for the publish step; that subtask succeeds and its status is what gets reported to the workflow.

# st2 run examples.orquesta-basic cmd="secho 'hi'"
..
id: 5d289d9e49b3d2af833a29a9
action.ref: examples.orquesta-basic
parameters: 
  cmd: secho 'hi'
status: succeeded
start_timestamp: Fri, 12 Jul 2019 14:47:58 UTC
end_timestamp: Fri, 12 Jul 2019 14:48:02 UTC
result: 
  output:
    stdout: ''
+--------------------------+---------------------+-------+------------+-------------------------------+
| id                       | status              | task  | action     | start_timestamp               |
+--------------------------+---------------------+-------+------------+-------------------------------+
| 5d289da049b3d2af5f26fa4f | failed (1s elapsed) | task1 | core.local | Fri, 12 Jul 2019 14:48:00 UTC |
+--------------------------+---------------------+-------+------------+-------------------------------+

Added debug logging to https://github.com/StackStorm/orquesta/blob/master/orquesta/conducting.py#L624, which produced the following output showing the second subtask:

2019-07-12 09:38:43,757 140001394854000 DEBUG conducting [-] Conductor Event Status: [failed]
2019-07-12 09:38:43,757 140001394854000 DEBUG conducting [-] Conductor staged task: None
2019-07-12 09:38:43,758 140001394854000 DEBUG conducting [-] Conductor task state entry: {u'status': u'running', u'route': 0, u'next': {}, u'ctxs': {u'in': [0]}, u'prev': {}, u'id': u'task1'}
2019-07-12 09:38:43,758 140001394854000 DEBUG conducting [-] Conductor task state entry: {u'status': u'running', u'route': 0, u'next': {}, u'ctxs': {u'in': [0]}, u'prev': {}, u'id': u'task1'}
2019-07-12 09:38:43,759 140001394854000 DEBUG conducting [-] Conductor task state idx: 0
2019-07-12 09:38:43,759 140001394854000 DEBUG conducting [-] Conductor task state entry: {u'status': u'running', u'route': 0, u'next': {}, u'ctxs': {u'in': [0]}, u'prev': {}, u'id': u'task1'}
2019-07-12 09:38:43,760 140001394854000 DEBUG conducting [-] Conductor task state idx: 0
2019-07-12 09:38:43,761 140001394854000 DEBUG conducting [-] Conductor staged task: None
2019-07-12 09:38:43,761 140001394854000 DEBUG conducting [-] Conductor old task status: running
2019-07-12 09:38:43,762 140001394854000 DEBUG conducting [-] Conductor new task status: failed
2019-07-12 09:38:43,770 140001394854000 DEBUG conducting [-] Conductor Event Status: [succeeded]
2019-07-12 09:38:43,771 140001394854000 DEBUG conducting [-] Conductor staged task: {'ready': True, 'route': 0, 'ctxs': {'in': [0, 1]}, 'prev': {u'task1__t0': 0}, 'id': u'noop'}
2019-07-12 09:38:43,771 140001394854000 DEBUG conducting [-] Conductor task state entry: None
2019-07-12 09:38:43,772 140001394854000 DEBUG conducting [-] Conductor task state entry: {'ctxs': {'in': [0, 1]}, 'route': 0, 'prev': {u'task1__t0': 0}, 'id': u'noop', 'next': {}}
2019-07-12 09:38:43,772 140001394854000 DEBUG conducting [-] Conductor task state idx: 1
2019-07-12 09:38:43,773 140001394854000 DEBUG conducting [-] Conductor task state entry: {'ctxs': {'in': [0, 1]}, 'route': 0, 'prev': {u'task1__t0': 0}, 'id': u'noop', 'next': {}}
2019-07-12 09:38:43,774 140001394854000 DEBUG conducting [-] Conductor task state idx: 1
2019-07-12 09:38:43,774 140001394854000 DEBUG conducting [-] Conductor staged task: {'ready': True, 'route': 0, 'ctxs': {'in': [0, 1]}, 'prev': {u'task1__t0': 0}, 'id': u'noop'}
2019-07-12 09:38:43,775 140001394854000 DEBUG conducting [-] Conductor old task status: null
2019-07-12 09:38:43,775 140001394854000 DEBUG conducting [-] Conductor new task status: succeeded
2019-07-12 09:38:43,793 140001394854000 DEBUG workflows [-] [5d288d6149b3d2af833a2991] Conductor: {'state': {'status': 'succeeded', 'tasks': {u'task1__r0': 0, u'noop__r0': 1}, 'sequence': [{u'status': 'failed', 'term': True, u'route': 0, u'next': {u'noop__t0': True}, u'ctxs': {u'in': [0], 'out': {u'noop__t0': 1}}, u'prev': {}, u'id': u'task1'}, {'status': 'succeeded', 'term': True, 'route': 0, 'next': {}, 'ctxs': {'in': [0, 1]}, 'prev': {u'task1__t0': 0}, 'id': u'noop'}], 'contexts': [{u'parent': {u'user': u'test', u'pack': u'examples'}, u'cmd': u"secho 'hi'", u'st2': {u'workflow_execution_id': u'5d288d6249b3d2ae94b89c73', u'pack': u'examples', u'action_execution_id': u'5d288d6149b3d2af833a2991', u'api_url': u'http://127.0.0.1:9101/v1', u'user': u'test'}, u'timeout': 60}, {u'stdout': {u'failed': True, u'stderr': u'bash: secho: command not found', u'return_code': 127, u'succeeded': False, u'stdout': u''}}], 'staged': [], 'routes': [[]]}, 'errors': [{'message': 'Execution failed. See result for details.', 'type': 'error', 'result': {u'succeeded': False, u'failed': True, u'return_code': 127, u'stderr': u'bash: secho: command not found', u'stdout': u''}, 'task_id': u'task1'}], 'log': [], 'context': {u'parent': {u'user': u'test', u'pack': u'examples'}, u'st2': {u'workflow_execution_id': u'5d288d6249b3d2ae94b89c73', u'user': u'test', u'action_execution_id': u'5d288d6149b3d2af833a2991', u'api_url': u'http://127.0.0.1:9101/v1', u'pack': u'examples'}}, 'input': {u'cmd': u"secho 'hi'", u'timeout': 60}, 'graph': {'directed': True, 'graph': [], 'nodes': [{'id': u'task1'}, {'id': u'noop'}], 'adjacency': [[{u'ref': 0, u'id': u'noop', u'key': 0, u'criteria': []}], []], 'multigraph': True}, 'spec': {'catalog': 'native', 'version': '1.0', 'spec': {u'input': [u'cmd', u'timeout'], u'tasks': {u'task1': {u'action': u'core.local cmd=<% ctx(cmd) %> timeout=<% ctx(timeout) %>', u'next': [{u'publish': [{u'stdout': u'<% result() %>'}]}]}}, u'description': u'A basic workflow that runs an arbitrary linux command.', u'version': 1.0, u'output': [{u'stdout': u'<% ctx(stdout) %>'}]}}, 'output': {u'stdout': {u'failed': True, u'stderr': u'bash: secho: command not found', u'return_code': 127, u'succeeded': False, u'stdout': u''}}}
2019-07-12 09:38:43,795 140001394854000 INFO workflows [-] [5d288d6149b3d2af833a2991] Updating workflow execution from status "running" to "succeeded".
2019-07-12 09:38:43,857 140001394854000 DEBUG workflows [-] [5d288d6149b3d2af833a2991] Updating workflow liveaction from status "running" to "succeeded".
@nmaludy
Member

nmaludy commented Jul 12, 2019

@m4dcoder this is causing us issues because we're expecting the workflow to fail since the task failed.

However, when we publish a variable to return data to the calling workflow, this automatically marks the workflow as succeeding.

@namachieli

I'm seeing something similar, but the opposite: I have a workflow that should succeed but instead fails when using publish: and do: noop with no when clause.

Tasks fail and cause the workflow to fail, even though every task has do: noop.

@m4dcoder
Contributor

I'm seeing something similar, but the opposite: I have a workflow that should succeed but instead fails when using publish: and do: noop with no when clause.

@namachieli Please open a separate issue.

@m4dcoder
Contributor

@nmaludy This behavior is expected. When a task fails and there is a task transition (an entry under next), the task is considered remediated. When a task is remediated, the workflow will continue. If you want the workflow to fail, you have to explicitly tell it to fail with the fail engine command.
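
For example, something along these lines (a rough sketch, not tested against your exact workflow) would publish on failure and then explicitly fail the workflow:

tasks:
  task1:
    action: core.local cmd=<% ctx(cmd) %> timeout=<% ctx(timeout) %>
    next:
      # explicitly fail the workflow with the "fail" engine command
      # after the failure transition publishes its variables
      - when: <% failed() %>
        publish:
          - stderr: <% result().stderr %>
        do: fail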

@nmaludy
Member

nmaludy commented Jul 16, 2019

@m4dcoder well, that's confusing and inconsistent with the way Mistral works (that's fine)... it would be nice if this was called out somewhere.

It will also cause us to duplicate our next blocks, one for success and one for failure, just like Mistral.

@nmaludy
Member

nmaludy commented Jul 16, 2019

I also think (if I remember correctly) this only happens when we do a publish; if there is no publish, then it fails as expected.

@m4dcoder
Contributor

m4dcoder commented Jul 16, 2019

well, that's confusing and inconsistent with the way Mistral works

I really don't care about how Mistral works at this point.

@m4dcoder
Contributor

We can update our documentation so this is clearer.

@nmaludy
Member

nmaludy commented Jul 16, 2019

So, basically, in any task that contains any transition at all, we also need another transition like:

- when: <% failed() %>
  do: fail

and then copy that over and over?

@m4dcoder
Contributor

I also think (if I remember correctly) this only happens when we do a publish; if there is no publish, then it fails as expected

That's true. The rule here is that if there's an entry under next with any combination of the fields, then it's considered remediated. So even if there's just a publish, it's considered remediated.

It will also cause us to duplicate our next blocks, one for success and one for failure, just like Mistral

Yes, you will have to do that explicitly. We are also working on a proposal at StackStorm/orquesta#116. I believe this will help in cases like this, where you want to publish some vars regardless of task completion status and then nest other when conditions to say do this and that.

then copy that over and over

You would have to provide a more concrete use case, but conceptually you can do this:

  next:
    - publish: blahblahblah
      do: fail
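
And a sketch of the duplicated success/failure branches you mentioned (the published variables here are just placeholders):

  next:
    # success branch: publish stdout and let the workflow continue
    - when: <% succeeded() %>
      publish:
        - stdout: <% result().stdout %>
    # failure branch: publish stderr, then fail the workflow explicitly
    - when: <% failed() %>
      publish:
        - stderr: <% result().stderr %>
      do: fail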

@m4dcoder
Contributor

@nmaludy Per conversation on https://stackstorm-community.slack.com/team/U4H00PLGP, we'll have to explore this a bit more. Regarding your proposal for this use case: when you have a task that fails and no transitions are executed, the workflow itself should fail. By "no transitions are executed" we mean either 1) no task transition matched its condition, or 2) the when condition of one or more task transitions is met but there are no tasks to execute next.

@m4dcoder
Contributor

From an initial assessment, changes need to happen in the state machine where the task is evaluated for whether remediation has occurred, at https://github.com/StackStorm/orquesta/blob/master/orquesta/machines.py#L657. In this case, if 1) no task transition matched its condition, or 2) the when condition of one or more task transitions is met but there are no tasks to execute next, then the task is not considered remediated.
