In this playlist, we have a video (ID FbK-FPwSAFQ) which is now private but we have a cached video in S3, so it probably became private only "recently".
Currently, the scraper logic is (significantly simplified):
fetch all playlists we have to download
for every playlist, fetch its items (videos)
download these videos (preferably from the S3 cache) and add them directly to the ZIM to save disk space
get the channel details of every video successfully downloaded
filter out videos which failed to download or which do not have accessible channel details (hence private)
create the JSON (was HTML) for navigating to the videos which have been kept
delete the videos which have not been kept
The problem with this video is that it downloads successfully (it is present in the S3 cache) and is hence added to the ZIM, but it is then filtered out because it is private. The scraper then tries to delete the video while it is still being added to the ZIM by libzim (this is an async task in libzim), which causes exit code 139.
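For context, exit code 139 follows the usual shell convention of 128 + signal number, so it points at signal 11 (SIGSEGV), which would be consistent with a file disappearing under a native library that is still reading it. A minimal sketch to decode such codes, where `describe_exit_code` is a hypothetical helper and not part of the scraper:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a shell-style exit code (hypothetical helper, for illustration only).

    By convention, a value above 128 means the process was terminated
    by signal number (code - 128).
    """
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # The signal number is not known on this platform.
            return f"killed by unknown signal {signum}"
    return f"exited with status {code}"


print(describe_exit_code(139))
```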
I think we should reconsider the cleanup procedure so that it really deletes only videos which have not been successfully downloaded.
We should also avoid adding to the ZIM a private video which will then be inaccessible (while still consuming space, and probably causing copyright problems).
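One possible ordering that would satisfy both points is sketched below: check accessibility before anything is handed to the ZIM, and never delete something that has already been added. This is only an illustration under assumptions; `process_videos` and the three callables (`has_channel_details`, `download`, `add_to_zim`) are hypothetical stand-ins, not the scraper's actual functions.

```python
from typing import Callable, Iterable


def process_videos(
    video_ids: Iterable[str],
    has_channel_details: Callable[[str], bool],  # hypothetical accessibility check
    download: Callable[[str], bool],             # hypothetical; returns True on success
    add_to_zim: Callable[[str], None],           # hypothetical; hands the video to the ZIM
) -> tuple[list[str], list[str]]:
    """Return (kept, skipped) video IDs."""
    kept: list[str] = []
    skipped: list[str] = []
    for video_id in video_ids:
        # Check accessibility *before* touching the ZIM, so a private video
        # is never added and never needs to be deleted afterwards.
        if not has_channel_details(video_id):
            skipped.append(video_id)
            continue
        # A failed download is skipped before anything reaches the ZIM,
        # so cleanup only ever concerns videos that were not added.
        if not download(video_id):
            skipped.append(video_id)
            continue
        add_to_zim(video_id)
        kept.append(video_id)
    return kept, skipped


added: list[str] = []
kept, skipped = process_videos(
    ["public1", "FbK-FPwSAFQ", "broken"],
    has_channel_details=lambda v: v != "FbK-FPwSAFQ",  # simulate the private video
    download=lambda v: v != "broken",                  # simulate a failed download
    add_to_zim=added.append,
)
print(kept, skipped)
```

With this ordering, the navigation JSON can be built from `kept` alone, and no file that libzim may still be reading is ever removed.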
benoit74 changed the title from "Scraper tries to delete a video which is currently being added to the ZIM, causing exit code 139" to "Scraper does not properly filter-out private videos" on Oct 14, 2024
https://farm.openzim.org/recipes/cest-pas-sorcier_fr_astronomie has failed twice in a row with exit code 139, on two different (beefy and "empty") workers:
I will try to investigate locally.