Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

speed up _remote_background_indep_poll tests #4433

Closed
hjoliver opened this issue Sep 27, 2021 · 1 comment · Fixed by #4465 or #4489
Closed

speed up _remote_background_indep_poll tests #4433

hjoliver opened this issue Sep 27, 2021 · 1 comment · Fixed by #4465 or #4489
Labels
could be better Not exactly a bug, but not ideal.
Milestone

Comments

@hjoliver
Copy link
Member

hjoliver commented Sep 27, 2021

This test batch times out a lot. Taking one particular run as an example (not sure if it is representative or not):

The following tests failed at first attempt:

Test Summary Report
-------------------
tests/k/cylc-poll/16-execution-time-limit.t                                    (Wstat: 0 Tests: 4 Failed: 3)
  Failed tests:  2-4
tests/f/events/33-task-event-job-logs-retrieve-3.t                             (Wstat: 0 Tests: 5 Failed: 2)
  Failed tests:  3-4
tests/f/intelligent-host-selection/03-polling.t                                (Wstat: 0 Tests: 4 Failed: 1)
  Failed test:  2
tests/f/job-submission/17-remote-localtime.t                                   (Wstat: 0 Tests: 2 Failed: 1)
  Failed test:  2
tests/f/restart/26-remote-kill.t                                               (Wstat: 0 Tests: 5 Failed: 3)
  Failed tests:  2-3, 5
tests/f/cylc-cat-log/11-remote-retrieve.t                                      (Wstat: 0 Tests: 7 Failed: 1)
  Failed test:  2
tests/f/platforms/02-host-to-platform-upgrade.t                                (Wstat: 0 Tests: 6 Failed: 1)
  Failed test:  5
tests/f/database/03-remote.t                                                   (Wstat: 0 Tests: 3 Failed: 1)
  Failed test:  2
tests/f/events/11-cycle-task-event-job-logs-retrieve.t                         (Wstat: 0 Tests: 3 Failed: 2)
  Failed tests:  2-3
tests/f/job-submission/09-activity-log-host-bad-submit.t                       (Wstat: 0 Tests: 2 Failed: 1)
  Failed test:  2
tests/f/job-submission/02-job-nn-remote-host.t                                 (Wstat: 0 Tests: 2 Failed: 1)
  Failed test:  2
tests/f/remote/03-polled-task-started.t                                        (Wstat: 0 Tests: 4 Failed: 1)
  Failed test:  2
tests/f/job-submission/13-tidy-submits-of-prev-run-remote-host.t               (Wstat: 0 Tests: 11 Failed: 3)
  Failed tests:  2, 4, 7
tests/f/events/32-task-event-job-logs-retrieve-2.t                             (Wstat: 0 Tests: 5 Failed: 1)
  Failed test:  2
tests/f/cylc-ping/04-check-keys-remote.t                                       (Wstat: 0 Tests: 5 Failed: 2)
  Failed tests:  2, 5
tests/f/events/17-task-event-job-logs-retrieve-command.t                       (Wstat: 0 Tests: 3 Failed: 2)
  Failed tests:  2-3
tests/f/events/10-task-event-job-logs-retrieve.t                               (Wstat: 0 Tests: 4 Failed: 3)
  Failed tests:  2-4
tests/f/remote/06-poll.t                                                       (Wstat: 0 Tests: 4 Failed: 1)
  Failed test:  2
tests/f/cylc-clean/01-remote.t                                                 (Wstat: 0 Tests: 10 Failed: 1)
  Failed test:  2
tests/f/cylc-cat-log/10-remote-no-retrieve.t                                   (Wstat: 0 Tests: 5 Failed: 1)
  Failed test:  2
tests/f/cylc-cat-log/08-editor-remote.t                                        (Wstat: 0 Tests: 16 Failed: 1)
  Failed test:  2
Files=85, Tests=141, 1688 wallclock secs ( 0.23 usr  0.14 sys + 1179.33 cusr 56.51 csys = 1236.21 CPU)
Result: FAIL

Then "rerun failed tests" timeout. From the output, all of the above tests passed, except no indication that tests/f/cylc-cat-log/08-editor-remote.t had started:

10m 13s
Run # re-run failed tests providing that they didn't time out first time
tr: write error: Broken pipe
tr: write error
[05:06:04] tests/f/cylc-ping/04-check-keys-remote.t .......................... 
/usr/bin/lsof
1..5
INSTALLED cylctb-20210926T233542Z-Cg5R/f/cylc-ping/04-check-keys-remote from /tmp/tmp.A9tFvRg7kX/cylctb-20210926T233542Z-Cg5R/f/cylc-ping/04-check-keys-remote
ok 1 - 04-check-keys-remote-validate
ok 2 - 04-check-keys-remote-run
ok 3 - job-find-out-cmp-ok
ok 4 - grep-ok: log-grep-ok
ok 5 - find.out-cmp-ok
ok    21475 ms ( 0.00 usr  0.00 sys +  8.70 cusr  0.98 csys =  9.68 CPU)
[05:06:24] tests/f/job-submission/02-job-nn-remote-host.t .................... 
/usr/bin/lsof
1..2
INSTALLED cylctb-20210926T233542Z-Cg5R/f/job-submission/02-job-nn-remote-host from /tmp/tmp.DmkA9OgmmE/cylctb-20210926T233542Z-Cg5R/f/job-submission/02-job-nn-remote-host
ok 1 - 02-job-nn-remote-host-validate
ok 2 - 02-job-nn-remote-host-restart
ok    19857 ms ( 0.00 usr  0.00 sys +  8.19 cusr  0.90 csys =  9.09 CPU)
[05:06:51] tests/f/platforms/02-host-to-platform-upgrade.t ................... 
/usr/bin/lsof
1..6
INSTALLED cylctb-20210926T233542Z-Cg5R/f/platforms/02-host-to-platform-upgrade from /tmp/tmp.SX43nEg4UX/cylctb-20210926T233542Z-Cg5R/f/platforms/02-host-to-platform-upgrade
ok 1 - 02-host-to-platform-upgrade-validate-fail
ok 2 - 02-host-to-platform-upgrade-run
ok 3 - grep-ok: 02-host-to-platform-upgrade-run.stderr-grep-ok
ok 4 - grep-ok: 02-host-to-platform-upgrade-run.stderr-grep-ok
ok 5 - 02-host-to-platform-upgrade-run
ok 6 - grep-ok: hosts.log-grep-ok
ok    27239 ms ( 0.00 usr  0.00 sys + 10.72 cusr  1.22 csys = 11.94 CPU)
[05:08:09] tests/f/events/33-task-event-job-logs-retrieve-3.t ................ 
/usr/bin/lsof
1..5
INSTALLED cylctb-20210926T233542Z-Cg5R/f/events/33-task-event-job-logs-retrieve-3 from /tmp/tmp.PrcpOmbdhO/cylctb-20210926T233542Z-Cg5R/f/events/33-task-event-job-logs-retrieve-3
ok 1 - 33-task-event-job-logs-retrieve-3-validate
ok 2 - 33-task-event-job-logs-retrieve-3-run
ok 3 - edited-activities.log-cmp-ok
ok 4 - job.out-file-exists-ok
ok 5 - job.err-file-exists-fail
ok    78578 ms ( 0.00 usr  0.00 sys +  7.50 cusr  0.85 csys =  8.35 CPU)
[05:08:30] tests/f/remote/06-poll.t .......................................... 
/usr/bin/lsof
1..4
INSTALLED cylctb-20210926T233542Z-Cg5R/f/remote/06-poll from /tmp/tmp.Xj7hqrBFjC/cylctb-20210926T233542Z-Cg5R/f/remote/06-poll
ok 1 - 06-poll-validate
ok 2 - 06-poll-run
scanning for '\[foo.1\] status=submitted: (p'.: succeeded
ok 3 - 06-poll-poll-1
scanning for '\[foo.1\] status=succeeded: (p'.: succeeded
ok 4 - 06-poll-poll-2
ok    20579 ms ( 0.00 usr  0.00 sys +  8.33 cusr  0.91 csys =  9.24 CPU)
[05:09:03] tests/f/events/11-cycle-task-event-job-logs-retrieve.t ............ 
/usr/bin/lsof
1..3
INSTALLED cylctb-20210926T233542Z-Cg5R/f/events/11-cycle-task-event-job-logs-retrieve from /tmp/tmp.1xOaETiIpm/cylctb-20210926T233542Z-Cg5R/f/events/11-cycle-task-event-job-logs-retrieve
ok 1 - 11-cycle-task-event-job-logs-retrieve-validate
ok 2 - 11-cycle-task-event-job-logs-retrieve-run
ok 3 - edited-activities.log-cmp-ok
ok    32766 ms ( 0.00 usr  0.00 sys +  7.86 cusr  0.96 csys =  8.82 CPU)
[05:09:37] tests/f/remote/03-polled-task-started.t ........................... 
/usr/bin/lsof
1..4
INSTALLED cylctb-20210926T233542Z-Cg5R/f/remote/03-polled-task-started from /tmp/tmp.nIdqqkdByf/cylctb-20210926T233542Z-Cg5R/f/remote/03-polled-task-started
ok 1 - 03-polled-task-started-validate
ok 2 - 03-polled-task-started-run
ok 3 - grep-ok: job-activity.log-grep-ok
ok 4 - grep-ok: job-activity.log-grep-ok
ok    33886 ms ( 0.01 usr  0.00 sys +  7.64 cusr  1.01 csys =  8.66 CPU)
[05:09:59] tests/k/cylc-poll/16-execution-time-limit.t ....................... 
/usr/bin/lsof
1..4
INSTALLED cylctb-20210926T233542Z-Cg5R/k/cylc-poll/16-execution-time-limit from /tmp/tmp.kl038dhiVB/cylctb-20210926T233542Z-Cg5R/k/cylc-poll/16-execution-time-limit
ok 1 - 16-execution-time-limit-validate
ok 2 - 16-execution-time-limit-run
ok 3 - 16-execution-time-limit-log-entry
ok 4 - 16-execution-time-limit-poll-time
ok    22695 ms ( 0.00 usr  0.00 sys +  8.26 cusr  0.97 csys =  9.23 CPU)
[05:10:19] tests/f/events/32-task-event-job-logs-retrieve-2.t ................ 
/usr/bin/lsof
1..5
INSTALLED cylctb-20210926T233542Z-Cg5R/f/events/32-task-event-job-logs-retrieve-2 from /tmp/tmp.OCUSm5EdvA/cylctb-20210926T233542Z-Cg5R/f/events/32-task-event-job-logs-retrieve-2
ok 1 - 32-task-event-job-logs-retrieve-2-validate
ok 2 - 32-task-event-job-logs-retrieve-2-run
ok 3 - edited-activities.log-cmp-ok
ok 4 - job.out-file-exists-ok
ok 5 - job.err-file-exists-fail
ok    19819 ms ( 0.00 usr  0.00 sys +  7.68 cusr  0.82 csys =  8.50 CPU)
[05:10:39] tests/f/database/03-remote.t ...................................... 
/usr/bin/lsof
1..3
INSTALLED cylctb-20210926T233542Z-Cg5R/f/database/03-remote from /tmp/tmp.CzJLwrfvbn/cylctb-20210926T233542Z-Cg5R/f/database/03-remote
ok 1 - 03-remote-validate
ok 2 - 03-remote-run
ok 3 - select-task-jobs.out-cmp-ok
ok    19379 ms ( 0.00 usr  0.00 sys +  8.11 cusr  0.98 csys =  9.09 CPU)
[05:11:10] tests/f/intelligent-host-selection/03-polling.t ................... 
/usr/bin/lsof
1..4
INSTALLED cylctb-20210926T233542Z-Cg5R/f/intelligent-host-selection/03-polling from /tmp/tmp.rQbY4qMdQs/cylctb-20210926T233542Z-Cg5R/f/intelligent-host-selection/03-polling
ok 1 - 03-polling-validate
ok 2 - 03-polling-run
ok 3 - grep-ok: job poll fails
ok 4 - grep-ok: job poll retries & succeeds
ok    31119 ms ( 0.00 usr  0.00 sys +  9.10 cusr  1.29 csys = 10.39 CPU)
[05:11:50] tests/f/job-submission/13-tidy-submits-of-prev-run-remote-host.t .. 
/usr/bin/lsof
1..11
INSTALLED cylctb-20210926T233542Z-Cg5R/f/job-submission/13-tidy-submits-of-prev-run-remote-host from /tmp/tmp.XXY5UZwfuC/cylctb-20210926T233542Z-Cg5R/f/job-submission/13-tidy-submits-of-prev-run-remote-host
ok 1 - 13-tidy-submits-of-prev-run-remote-host-validate
ok 2 - 13-tidy-submits-of-prev-run-remote-host-run
ok 3 - exists-rlogd1
ok 4 - exists-rlogd2
ok 5 - 01-file-exists-ok
ok 6 - 02-file-exists-ok
ok 7 - 13-tidy-submits-of-prev-run-remote-host-run
ok 8 - exists-rlogd1
ok 9 - not-exists-rlogd2
ok 10 - 01-file-exists-ok
ok 11 - 02-file-exists-fail
ok    40048 ms ( 0.01 usr  0.00 sys +  9.50 cusr  1.12 csys = 10.63 CPU)
[05:12:15] tests/f/cylc-clean/01-remote.t .................................... 
/usr/bin/lsof
1..10
INSTALLED cylctb-20210926T233542Z-Cg5R/f/cylc-clean/01-remote from /tmp/tmp.ppqSS5Auaq/cylctb-20210926T233542Z-Cg5R/f/cylc-clean/01-remote
ok 1 - 01-remote-validate
ok 2 - 01-remote-run
ok 3 - run-dir-readlink-pre-clean.remote.stdout-cmp-ok
ok 4 - test-dir-tree-pre-clean.remote
ok 5 - test-dir-tree-pre-clean.remote.stdout-cmp-ok
ok 6 - cylc-clean
ok 7 - run-dir-not-exist-post-clean.local
ok 8 - run-dir-not-exist-post-clean.remote
ok 9 - test-dir-tree-post-clean.remote
ok 10 - test-dir-tree-post-clean.remote.stdout-cmp-ok
ok    25275 ms ( 0.00 usr  0.00 sys + 10.34 cusr  1.22 csys = 11.56 CPU)
[05:12:53] tests/f/events/10-task-event-job-logs-retrieve.t .................. 
/usr/bin/lsof
1..4
INSTALLED cylctb-20210926T233542Z-Cg5R/f/events/10-task-event-job-logs-retrieve from /tmp/tmp.y4QdswG2Ce/cylctb-20210926T233542Z-Cg5R/f/events/10-task-event-job-logs-retrieve
ok 1 - 10-task-event-job-logs-retrieve-validate
ok 2 - 10-task-event-job-logs-retrieve-run
ok 3 - edited-activities.log-cmp-ok
ok 4 - edited-log-cmp-ok
ok    37900 ms ( 0.00 usr  0.01 sys +  7.99 cusr  0.95 csys =  8.95 CPU)
[05:13:13] tests/f/job-submission/17-remote-localtime.t ...................... 
/usr/bin/lsof
1..2
INSTALLED cylctb-20210926T233542Z-Cg5R/f/job-submission/17-remote-localtime from /tmp/tmp.zlVTkA7PXa/cylctb-20210926T233542Z-Cg5R/f/job-submission/17-remote-localtime
ok 1 - 17-remote-localtime-validate
ok 2 - 17-remote-localtime-run
ok    20350 ms ( 0.00 usr  0.00 sys +  7.49 cusr  0.83 csys =  8.32 CPU)
[05:13:39] tests/f/restart/26-remote-kill.t .................................. 
/usr/bin/lsof
1..5
INSTALLED cylctb-20210926T233542Z-Cg5R/f/restart/26-remote-kill from /tmp/tmp.Qq2nlXTOJ0/cylctb-20210926T233542Z-Cg5R/f/restart/26-remote-kill
ok 1 - 26-remote-kill-validate
ok 2 - 26-remote-kill-run
ok 3 - t1-status.out-cmp-ok
ok 4 - 26-remote-kill-restart
ok 5 - t1-status.out-cmp-ok
ok    25648 ms ( 0.00 usr  0.00 sys +  9.54 cusr  1.06 csys = 10.60 CPU)
[05:14:12] tests/f/events/17-task-event-job-logs-retrieve-command.t .......... 
/usr/bin/lsof
1..3
INSTALLED cylctb-20210926T233542Z-Cg5R/f/events/17-task-event-job-logs-retrieve-command from /tmp/tmp.9cAxEV3Shc/cylctb-20210926T233542Z-Cg5R/f/events/17-task-event-job-logs-retrieve-command
ok 1 - 17-task-event-job-logs-retrieve-command-validate
ok 2 - 17-task-event-job-logs-retrieve-command-run
ok 3 - my-rsync.log.edited-cmp-ok
ok    32758 ms ( 0.01 usr  0.00 sys +  7.91 cusr  0.94 csys =  8.86 CPU)
[05:14:31] tests/f/job-submission/09-activity-log-host-bad-submit.t .......... 
/usr/bin/lsof
1..2
INSTALLED cylctb-20210926T233542Z-Cg5R/f/job-submission/09-activity-log-host-bad-submit from /tmp/tmp.nwyBP7payB/cylctb-20210926T233542Z-Cg5R/f/job-submission/09-activity-log-host-bad-submit
ok 1 - 09-activity-log-host-bad-submit-validate
ok 2 - 09-activity-log-host-bad-submit-run
ok    19147 ms ( 0.00 usr  0.00 sys +  8.26 cusr  0.94 csys =  9.20 CPU)
[05:14:53] tests/f/cylc-cat-log/10-remote-no-retrieve.t ...................... 
/usr/bin/lsof
1..5
INSTALLED cylctb-20210926T233542Z-Cg5R/f/cylc-cat-log/10-remote-no-retrieve from /tmp/tmp.f1EeCZPp5a/cylctb-20210926T233542Z-Cg5R/f/cylc-cat-log/10-remote-no-retrieve
ok 1 - 10-remote-no-retrieve-validate
ok 2 - 10-remote-no-retrieve-run
ok 3 - job.out-file-exists-fail
ok 4 - 10-remote-no-retrieve-task-out
ok 5 - grep-ok: 10-remote-no-retrieve-task-out.stdout-grep-ok
ok    22594 ms ( 0.00 usr  0.00 sys +  8.72 cusr  1.07 csys =  9.79 CPU)
[05:15:16] tests/f/cylc-cat-log/11-remote-retrieve.t ......................... 
/usr/bin/lsof
1..7
INSTALLED cylctb-20210926T233542Z-Cg5R/f/cylc-cat-log/11-remote-retrieve from /tmp/tmp.zybF3GTxrq/cylctb-20210926T233542Z-Cg5R/f/cylc-cat-log/11-remote-retrieve
ok 1 - 11-remote-retrieve-validate
ok 2 - 11-remote-retrieve-run
ok 3 - job.out-file-exists-ok
ok 4 - 11-remote-retrieve-out-rem
ok 5 - grep-ok: 11-remote-retrieve-out-rem.stdout-grep-ok
ok 6 - 11-remote-retrieve-out-loc
ok 7 - grep-ok: 11-remote-retrieve-out-loc.stdout-grep-ok
ok    23094 ms ( 0.00 usr  0.00 sys +  9.37 cusr  1.05 csys = 10.42 CPU)
Error: The action has timed out.

From the first failed run of 08-editor-remote.t it failed test 2 which is running the workflow (which runs a single task that prints a message to stdout and stderr).

@hjoliver hjoliver added the could be better Not exactly a bug, but not ideal. label Sep 27, 2021
@hjoliver hjoliver added this to the cylc-8.x milestone Sep 27, 2021
@MetRonnie
Copy link
Member

Note: after #4465, the only tests that seem to fail consistently on the first go are

tests/f/events/11-cycle-task-event-job-logs-retrieve.t                         (Wstat: 0 Tests: 3 Failed: 2)
  Failed tests:  2-3
tests/f/events/33-task-event-job-logs-retrieve-3.t                             (Wstat: 0 Tests: 5 Failed: 2)
  Failed tests:  3-4
tests/f/events/10-task-event-job-logs-retrieve.t                               (Wstat: 0 Tests: 4 Failed: 3)
  Failed tests:  2-4
tests/k/cylc-poll/16-execution-time-limit.t                                    (Wstat: 0 Tests: 4 Failed: 2)
  Failed tests:  3-4
tests/f/job-submission/13-tidy-submits-of-prev-run-remote-host.t               (Wstat: 0 Tests: 11 Failed: 1)
  Failed test:  2

@MetRonnie MetRonnie linked a pull request Nov 1, 2021 that will close this issue
7 tasks
@oliver-sanders oliver-sanders modified the milestones: cylc-8.x, cylc-8.0b3 Nov 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
could be better Not exactly a bug, but not ideal.
Projects
None yet
3 participants