Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check symlink target. #4890

Merged
merged 6 commits into from
Jul 4, 2022
Merged

Conversation

hjoliver
Copy link
Member

@hjoliver hjoliver commented May 20, 2022

These changes close #4888 - abort cylc install if symlink dirs target already exists.

(venv) oliverh@w-nwp01:~/cylc-src/foo$ cylc install
INSTALLED foo/run1 from /scale_wlg_persistent/filesets/home/oliverh/cylc-src/foo

(venv) oliverh@w-nwp01:~/cylc-src/foo$ ls -lrt ~/cylc-run/foo
total 2
lrwxrwxrwx 1 oliverh oliverh   50 May 20 02:50 run1 -> /nesi/nobackup/niwa99999/oliverh/cylc-run/foo/run1
lrwxrwxrwx 1 oliverh oliverh    4 May 20 02:50 runN -> run1
drwxrwxr-x 2 oliverh oliverh 4096 May 20 02:50 _cylc-install

(venv) oliverh@w-nwp01:~/cylc-src/foo$ rm -rf ~/cylc-run

(venv) oliverh@w-nwp01:~/cylc-src/foo$ cylc install
WorkflowFilesError: Symlink target already exists: /nesi/nobackup/niwa99999/oliverh/cylc-run/foo/run1.

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg and conda-environment.yml.
  • Appropriate tests are included (unit).
  • Appropriate change log entry included.
  • No documentation update required.

@hjoliver hjoliver added the bug Something is wrong :( label May 20, 2022
@hjoliver hjoliver added this to the cylc-8.0rc4 milestone May 20, 2022
@hjoliver hjoliver self-assigned this May 20, 2022
@hjoliver
Copy link
Member Author

hjoliver commented May 24, 2022

Confirmed: this fix also covers job platform init, at run time:

Screenshot from 2022-05-24 18-30-53

Tested with a fresh swarm build using this branch, to get a remote host with independent FS:

  • set up a symlinked run dir for the _remote_background_indep_tcp host, in global config
  • install and run a small workflow that submits a job to the remote host
  • remove cylc-run on the remote host
  • install and run again: the new run should abort as above during host init

@hjoliver
Copy link
Member Author

I'll add or tweak tests tomorrow, before undrafting this.

@MetRonnie
Copy link
Member

@hjoliver Want anyone to take over this PR, if it would help?

@hjoliver
Copy link
Member Author

Thanks @MetRonnie - if you wouldn't mind. I'm currently a bit swamped with local setup stuff, for Cylc 8 on NIWA and NeSI HPC platforms.

@hjoliver
Copy link
Member Author

(We should get this in for 8.0, because it's a likely mode of confusing errors for users).

@MetRonnie MetRonnie self-assigned this Jun 22, 2022
@MetRonnie
Copy link
Member

Confirmed: this fix also covers job platform init, at run time...

Tested with a fresh swarm build using this branch, to get a remote host with independent FS:

  • set up a symlinked run dir for the _remote_background_indep_tcp host, in global config
  • install and run a small workflow that submits a job to the remote host
  • remove cylc-run on the remote host
  • install and run again: the new run should abort as above during host init

Ah, do you want me to create a functional test that does this?

@hjoliver
Copy link
Member Author

Ah, do you want me to create a functional test that does this?

I suppose that would be good. I can't think of an easier (unit or integration) test for this.

@MetRonnie MetRonnie force-pushed the symlink-target-check branch from a1e8696 to fd004af Compare June 30, 2022 17:33
@MetRonnie MetRonnie marked this pull request as ready for review June 30, 2022 17:34
@MetRonnie
Copy link
Member

MetRonnie commented Jun 30, 2022

I've added a functional test. However I've noticed the workflow only stalls instead of shutting down after the remote init failure. Is that how it should be?

@oliver-sanders
Copy link
Member

the workflow only stalls instead of shutting down after the remote init failure. Is that how it should be?

Remote init happens when the first task on an install target tries to submit to a platform on that install target. If the remote init fails then the task goes into the submit-failed state (as remote-init is kinda like the first stage of job submission). Tasks for platforms on other install targets might still be able to submit.

@oliver-sanders oliver-sanders merged commit ca81d57 into cylc:master Jul 4, 2022
@hjoliver hjoliver modified the milestones: cylc-8.0rc4, cylc-8.0.0 Jul 5, 2022
@hjoliver hjoliver deleted the symlink-target-check branch July 6, 2022 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is wrong :(
Projects
None yet
Development

Successfully merging this pull request may close these issues.

symlink dirs: check for existing link target?
4 participants