Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix spawning of mixed-parentage tasks. #4906

Merged
merged 18 commits into from
Jul 20, 2022

Conversation

hjoliver
Copy link
Member

@hjoliver hjoliver commented Jun 9, 2022

(Those that have parents in some cycles but not in others).

Before spawn-on-demand, all tasks depended implicitly on their own previous instance submitting (same task at previous cycle point), which had several unpleasant consequences such as the inability of same-task instances to run out of order.

This fixes the last vestige of previous-instance dependence in Cylc 8: if a task instance has no parents to spawn it, we spawn it automatically when its previous instance is released from runahead. This (correctly) makes parentless tasks spawn out to the runahead limit immediately, but only if they have no parents in all cycle points. If they have parents at some points, those points temporarily block further spawning (because the parented instance does not get released from runahead until after it gets spawned by its parent).

This PR also tidies spawning-related code a lot (it was really hard to follow after my incremental shoe-horning of SOD into the old codebase ~2 yrs ago 🤦 :), and computes the runahead limit much less often.

[scheduling]
    cycling mode = integer
    initial cycle point = 1
    final cycle point = 10
    [[xtriggers]]
        never = xrandom("0")  # never satisfied
    [[graph]]
        # foo has no parents, should spawn to runahead limit
        P1 = "foo => bar"
        # except at point 3, where it should be spawned by baz
        R1/3 = "@never => baz => foo"
[runtime]
    [[foo, bar, baz]]
        script = sleep 10

On master, foo should auto-spawn to the runahead limit except for point 3 where it will be spawned by 3/baz, but spawning is blocked at point 3:

Screenshot from 2022-06-13 14-39-59

On this branch:
Screenshot from 2022-06-13 14-40-54

Also: close #4913

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg and conda-environment.yml.
  • Appropriate tests are included (unit and/or functional).
  • Appropriate change log entry included.
  • (master branch) I have opened a documentation PR at Update runahead limit documentation. cylc-doc#479 (for the runahead fix)

@hjoliver hjoliver added bug Something is wrong :( sod-follow-up labels Jun 9, 2022
@hjoliver hjoliver added this to the cylc-8.0rc4 milestone Jun 9, 2022
@hjoliver hjoliver self-assigned this Jun 9, 2022
@hjoliver hjoliver force-pushed the sod-parentless-spawn branch from 83decfb to fd38b0a Compare June 12, 2022 21:27
cylc/flow/id_match.py Outdated Show resolved Hide resolved
@@ -820,7 +820,8 @@ def select_task_pool_for_restart(self, callback):
LEFT OUTER JOIN
%(task_outputs)s
ON %(task_pool)s.cycle == %(task_outputs)s.cycle AND
%(task_pool)s.name == %(task_outputs)s.name
%(task_pool)s.name == %(task_outputs)s.name AND
%(task_pool)s.flow_nums == %(task_outputs)s.flow_nums
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug fix! Loading the task pool from the DB at restart, on master I get multiple rows per task proxy if a waiting task had also run in previous flows (and hence attempt to spawn multiple copies of the same task).

@hjoliver hjoliver force-pushed the sod-parentless-spawn branch 2 times, most recently from 938e4d9 to 4c98f6a Compare June 14, 2022 08:41
@hjoliver
Copy link
Member Author

hjoliver commented Jun 14, 2022

(NB: I've just pushed a commit to fix the runahead off-by-one error #4913; this will require a minor adjustment to a whole bunch of tests... which were all passing) DONE

@hjoliver hjoliver force-pushed the sod-parentless-spawn branch 2 times, most recently from e41a656 to 5183cdd Compare June 15, 2022 23:09
@hjoliver hjoliver marked this pull request as ready for review June 16, 2022 11:18
@hjoliver hjoliver mentioned this pull request Jun 17, 2022
7 tasks
@MetRonnie MetRonnie self-requested a review June 21, 2022 10:15
Copy link
Member

@dwsutherland dwsutherland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed, and found working (could reproduce the issue and fix) 👍
I reviewed the code, followed most of it...

cylc/flow/scheduler.py Show resolved Hide resolved
@hjoliver
Copy link
Member Author

hjoliver commented Jun 23, 2022

NB: I have one thing to investigate on this branch: it might stop the inactivity timer working properly (if so, should be easy to fix). [done]

@oliver-sanders
Copy link
Member

@hjoliver What font are you using in those terminal screenshots?

@hjoliver
Copy link
Member Author

hjoliver commented Jun 30, 2022

@hjoliver What font are you using in those terminal screenshots?

Ubuntu Mono Derivative. You like or no like?

@hjoliver
Copy link
Member Author

hjoliver commented Jul 1, 2022

@MetRonnie - this is good to go now.

@MetRonnie
Copy link
Member

I'm happy to review but will be fairly busy this upcoming week with workshops, so someone might like to beat me to it

@hjoliver
Copy link
Member Author

hjoliver commented Jul 3, 2022

That's fine (I just pinged you as you'd self-assigned as a reviewer above).

@hjoliver hjoliver modified the milestones: cylc-8.0rc4, cylc-8.0.0 Jul 5, 2022
Copy link
Member

@MetRonnie MetRonnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review

cylc/flow/cfgspec/workflow.py Outdated Show resolved Hide resolved
cylc/flow/task_pool.py Outdated Show resolved Hide resolved
cylc/flow/task_pool.py Outdated Show resolved Hide resolved
cylc/flow/task_pool.py Outdated Show resolved Hide resolved
cylc/flow/task_pool.py Outdated Show resolved Hide resolved
cylc/flow/task_pool.py Outdated Show resolved Hide resolved
cylc/flow/task_pool.py Outdated Show resolved Hide resolved
cylc/flow/task_pool.py Outdated Show resolved Hide resolved
cylc/flow/scheduler.py Show resolved Hide resolved
(Those that have parents in some cycles but not others).
Also tidy up task spawning and runahead computation.
@hjoliver hjoliver force-pushed the sod-parentless-spawn branch from c9e276c to 965d356 Compare July 9, 2022 02:35
@oliver-sanders
Copy link
Member

oliver-sanders commented Jul 11, 2022

Ubuntu Mono Derivative. You like or no like?

Me likey, is very nice for Tui.

Copy link
Member

@MetRonnie MetRonnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Halfway through...

cylc/flow/taskdef.py Outdated Show resolved Hide resolved
cylc/flow/taskdef.py Outdated Show resolved Hide resolved
CHANGES.md Outdated Show resolved Hide resolved
Comment on lines 5 to +6
[[events]]
inactivity timeout = PT10S
inactivity timeout = PT20S
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was getting some early aborts due to this, under load, when running many tests at once. Pretty sure not related to this branch.

@hjoliver
Copy link
Member Author

(All review comments addressed)

@hjoliver hjoliver merged commit 37ea94b into cylc:master Jul 20, 2022
@hjoliver hjoliver deleted the sod-parentless-spawn branch July 20, 2022 21:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is wrong :(
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Runahead limit consistency: P1 vs PT1H
4 participants