Refactor scraper to exit properly when exceptions are raised #288
Codecov Report
Attention: Patch coverage is […]
Additional details and impacted files

@@ Coverage Diff @@
##            main    #288    +/-  ##
=====================================
- Coverage  1.54%   1.53%   -0.01%
=====================================
  Files        11      11
  Lines      1102    1105      +3
  Branches    162     164      +2
=====================================
  Hits         17      17
- Misses     1085    1088      +3
I think this code makes much more sense. I could not reproduce the problem you had in the tests you showed me.
My only concern is that this change shows that we do not delete the build_dir should a problem occur somewhere at the beginning of the function, which is a bit sad.
I suggest we encapsulate the whole method in the `try` block.
Since we do not call `finish` when an exception occurs, we can get rid of the `self.zim_file.can_finish = False` statements, so that we do not need to care whether `self.zim_file` has been initialized or not.
I'll let you test this as well, but on my machine it works as expected.
I moved all the code inside the […]
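For illustration, the structure discussed in this review could look roughly like the sketch below. It is not the scraper's actual code: `Scraper`, `FakeZimFile`, and the `fail` flag are hypothetical stand-ins, and only the `try` / `except` / `else` / `finally` layout and the build_dir cleanup reflect what is discussed here.

```python
import shutil
from pathlib import Path


class FakeZimFile:
    """Hypothetical stand-in for the real ZIM file object."""

    def __init__(self):
        self.finished = False

    def finish(self):
        self.finished = True


class Scraper:
    """Hypothetical, heavily simplified scraper."""

    def __init__(self, build_dir):
        self.build_dir = Path(build_dir)
        self.zim_file = None

    def run(self, fail=False):
        """Return 0 on success and 1 if any exception is raised."""
        try:
            # The whole method body lives inside the try block, so a failure
            # at the very beginning still reaches the cleanup in `finally`.
            self.build_dir.mkdir(parents=True, exist_ok=True)
            if fail:
                raise RuntimeError("simulated early failure")
            self.zim_file = FakeZimFile()
            # ... scraping work would go here ...
        except Exception as exc:
            print(f"Scraper failed: {exc}")
            return 1
        else:
            # Only reached when no exception occurred, so zim_file is
            # guaranteed to be initialized and no can_finish flag is needed.
            self.zim_file.finish()
            return 0
        finally:
            # Runs on both paths, before the return value is handed back.
            shutil.rmtree(self.build_dir, ignore_errors=True)
```

Because `finish()` sits in the `else` branch, it is never called after a failure, and the `finally` branch removes the build directory whether or not the run succeeded.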
Sorry about that, but I think we need to fix the CHANGELOG as well: we've done way more than only handle "too many download failed" errors. You might keep the existing entry and add a new one (mentioning something about cleaning up properly on exceptions during the scraper run) pointing to this PR number (this is what we usually do when we have a change without a linked issue).
LGTM, thank you
Fix #285
Changes:
- […] `1` whenever an exception is raised.
- […] `self.zim_file.finish()` to an `else` section in the `try-exception` block.
- […] `finally` section in the `try-exception` block.
- […] `delete_callback` in utils.py, since the one in `zimscraperlib.filesystem` does not check for a file's existence before trying to delete it. […] `FileNotFound`. Checking for the existence of the files lets us avoid these errors.
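As a rough sketch of the last item, a deletion callback that checks for the file first might look like this. The body is an assumption for illustration, not the code added to utils.py in this PR; only the name `delete_callback` comes from the description above.

```python
from pathlib import Path


def delete_callback(fpath):
    """Delete fpath if it exists; do nothing if it is already gone.

    Hypothetical sketch: accepts a str or a Path.
    """
    path = Path(fpath)
    # Checking first avoids a FileNotFoundError when the file has already
    # been removed, for example by an earlier cleanup step.
    if path.exists():
        path.unlink()
```

On Python 3.8 and later, `Path(fpath).unlink(missing_ok=True)` gives the same result in a single call.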