Fix cylc clean while cat-log running #5359

datamel · 2023-02-09T13:51:06Z

cylc clean fails when a cat-log process is holding one of the log files open. This is more pressing now that the log view is available in the GUI.
This is related to NFS behaviour and may not be reproducible on all systems.

The solution involves:

changing the default cat-log command to include --follow=name.
changing rmtree behaviour to retry on error.
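
As a rough illustration of the second point, retrying rmtree could look something like the sketch below. This is not the code from this PR: the function name, the retry count, the sleep interval and the set of errno values retried on (ENOTEMPTY and EBUSY, the two errors that come up later in this thread) are all assumptions.

```python
import errno
import shutil
import time

# Assumption: these are the errno values worth retrying on.
RETRY_ERRNOS = {errno.ENOTEMPTY, errno.EBUSY}


def rmtree_with_retry(path, retries=10, sleep_time=1.0, _rmtree=shutil.rmtree):
    """Remove a directory tree, retrying on "still in use" errors.

    Hypothetical helper, for illustration only.

    Returns the number of attempts made. Re-raises the OSError straight
    away for any other errno, or once the attempts are used up.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            _rmtree(path)
            return attempt
        except OSError as exc:
            # Give up on unrelated errors, or when out of attempts.
            if exc.errno not in RETRY_ERRNOS or attempt == retries:
                raise
            # Give the process holding the file a chance to let go.
            time.sleep(sleep_time)
```

The `_rmtree` parameter is only there so the retry logic can be exercised with a stand-in function, without needing an NFS filesystem.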

closes #5341

Note: to help debug this issue, I changed the error message so that it is passed through to the GUI.

To test, open the GUI and open a log file. With the log view still open, clean the workflow.
This errors on master but not on this branch.

In the GUI, I think we should maybe close any open views on clean, perhaps even redirect to the dashboard? But this would be a separate issue.

Check List

I have read CONTRIBUTING.md and added my name as a Code Contributor.
Contains logically grouped changes (else tidy your branch by rebase).
Does not contain off-topic changes (use other PRs for other changes).
Applied any dependency changes to both setup.cfg and conda-environment.yml.
Tests are included (or explain why tests are not needed).
CHANGES.md entry included if this is a change that can affect users
Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

oliver-sanders · 2023-02-16T12:52:21Z

> If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

Needs rebasing onto 8.1.x.

MetRonnie · 2023-03-06T15:13:34Z

> Note for debugging this issue, I changed the error message to be passed into the gui, so this (if reviewers are happy with the approach) will also close #5324.

Thanks, however this partially addresses that issue. We need to check for other instances of logging that should be moved into the exception message

datamel · 2023-03-07T13:47:15Z

> > Note for debugging this issue, I changed the error message to be passed into the gui, so this (if reviewers are happy with the approach) will also close #5324.
>
> Thanks, however this partially addresses that issue. We need to check for other instances of logging that should be moved into the exception message

Okay, I've removed the close from the description.

cylc/flow/pathutil.py

cylc/flow/scripts/clean.py

MetRonnie · 2023-03-17T11:19:27Z

Ah, just realised a slight flaw with only testing for ENOTEMPTY - the second time it tries it, the error will be EBUSY instead, so it will only retry once before failing. (Assuming it's a "sillyrenamed" .nfsXXXX... file causing the problem)

Also annoying is how rmtree only reports the filename, not path. If it's a job log file that's being kept open by a running job, there could be hundreds of job log dirs and you don't know which one contains the .nfsXXXX....

Anyway, we can sort these out in a follow-up issue.
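
In the meantime, the placeholder files could be tracked down by hand. A minimal sketch, assuming they can be recognised by a ".nfs" filename prefix as described above; the function name is hypothetical and this is not part of the PR:

```python
import os
from pathlib import Path


def find_nfs_placeholders(root):
    """Return the sorted paths of all ".nfs*" files under root.

    Hypothetical helper, for illustration only. It assumes the
    "sillyrenamed" files can be identified by their ".nfs" prefix.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith(".nfs"):
                found.append(Path(dirpath) / name)
    return sorted(found)
```

Pointing this at the workflow run directory would list the full path of each placeholder, which is the piece of information missing from the error.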


oliver-sanders · 2023-03-22T15:57:25Z

> Ah, just realised a slight flaw with only testing for ENOTEMPTY - the second time it tries it, the error will be EBUSY instead

Dammit, confirmed with this script:

#!/usr/bin/env bash

set -eux

TD="$HOME/cylc-run/foo/log/scheduler"
mkdir -p "$TD"
TF="$TD/log"

echo -e 'a\nb\nc' > "$TF"
tree "$TD"

tail --follow=name -f "$TF" & PROC="$!"

sleep 2

# this will likely fail due to FS locking stuffs
rm -r "$TD" && echo 'REMOVE PASSED' || echo 'REMOVE FAILED'
rm -r "$TD" && echo 'REMOVE PASSED' || echo 'REMOVE FAILED'
rm -r "$TD" && echo 'REMOVE PASSED' || echo 'REMOVE FAILED'

# tidy up after
kill "$PROC"
rm -r "$TD" || true

Which reliably gives:

Directory not empty
Device or resource busy
Device or resource busy

Will play a little more but I think we need to add this code too.

cylc/flow/pathutil.py

oliver-sanders · 2023-03-22T17:33:56Z

I used this script to double-check the premise that the tail process dies when the file is removed and to confirm the error message(s) returned:

#!/usr/bin/env bash

set -eu

TD="$HOME/cylc-run/foo/log/scheduler"
mkdir -p "$TD"
TF="$TD/log"

echo -e 'a\nb\nc' > "$TF"

# NOTE: for some reason the tail command doesn't die when launched
# as a background process
echo 'Start your tail process now:'
echo "i.e. tail -f --follow=name '$TF'"
sleep 5

# retry removal a few times
SLEEP=1
for try in $(seq 1 10); do
    sleep "$SLEEP"
    echo "Try: $try"
    rm -r "$TD" || continue
    echo "Success after $((try * SLEEP)) seconds."
    break
done

# tidy up afterwards, just in case
rm -r "$TD" 2>/dev/null || true

This is hard to test (it would need NFS in a container and whatnot), so I'm commenting it here for the record and will link this comment into the docstring.

cylc/flow/pathutil.py

oliver-sanders

Works a charm for me.

$ cylc vip whatever
$ cylc cat-log whatever -m t

$ cylc clean -y whatever

oliver-sanders · 2023-03-27T15:22:52Z

Managed to work out a functional test after a little bodging, can bump that and the changelog to #5432

MetRonnie

I've tested it out. Added a changelog entry

Fix cylc clean while cat-log running

7e45c1f

oliver-sanders added this to the cylc-8.1.2 milestone Feb 16, 2023

oliver-sanders added the bug Something is wrong :( label Feb 16, 2023

oliver-sanders modified the milestones: cylc-8.1.2, cylc-8.1.3 Feb 16, 2023

oliver-sanders marked this pull request as draft February 16, 2023 12:52

oliver-sanders assigned datamel Feb 20, 2023

datamel changed the base branch from master to 8.1.x February 20, 2023 10:33

datamel marked this pull request as ready for review March 7, 2023 13:47

oliver-sanders requested review from MetRonnie and oliver-sanders March 7, 2023 13:48

oliver-sanders reviewed Mar 7, 2023

cylc/flow/pathutil.py

Update _rmtree

dd818f2

MetRonnie approved these changes Mar 17, 2023

cylc/flow/scripts/clean.py

Update cylc/flow/scripts/clean.py

ca03d23

Co-authored-by: Ronnie Dutta

oliver-sanders reviewed Mar 22, 2023

cylc/flow/pathutil.py

oliver-sanders reviewed Mar 22, 2023

cylc/flow/pathutil.py

Apply suggestions from code review

f054199

oliver-sanders requested a review from MetRonnie March 22, 2023 17:35

oliver-sanders assigned oliver-sanders and unassigned datamel Mar 22, 2023

oliver-sanders reviewed Mar 23, 2023

cylc/flow/pathutil.py

Update cylc/flow/pathutil.py

eebc102

oliver-sanders reviewed Mar 27, 2023

cylc/flow/pathutil.py

Update cylc/flow/pathutil.py

df448ec

oliver-sanders approved these changes Mar 27, 2023


oliver-sanders mentioned this pull request Mar 27, 2023

clean: test cat-log interaction on NFS filesystems #5432

Merged

MetRonnie added 2 commits March 28, 2023 09:54

Merge branch '8.1.x' into pr-5359

02dd344

Update changelog

04efd7c

MetRonnie approved these changes Mar 28, 2023


oliver-sanders merged commit 65683fa into cylc:8.1.x Mar 28, 2023

oliver-sanders mentioned this pull request Mar 29, 2023

cat-log: cancel active subscriptions if clean is issued cylc/cylc-uiserver#417

Open

This was referenced Apr 24, 2023

cat-log: use --follow=name #5341

Closed

Move error information out of printing/logging and into exception messages #5324

Open

MetRonnie mentioned this pull request May 21, 2024

review use of rmtree #3659

Closed
