Acceptance criteria - Kingfisher Collect #29
Comments
Then, we can use it to apply a policy. Here's a sample policy: https://github.com/open-contracting-archive/kingfisher-archive/blob/main/ocdskingfisherarchive/crawl.py#L136-L169
Related: open-contracting-archive/kingfisher-archive#44
We want this as a library, so that it can also be used by Kingfisher Collect: open-contracting/kingfisher-collect#531 |
We can test on
These all went wrong in the scrape phase. Therefore, the task should fail, and the process task should not start. |
If you update Collect, Mexico quien es quien will work again :) |
Also, Mexico INAI portal no longer exists in Collect (if you update it). |
@hrubyjan Where are the scrapyd log files? |
Assigning only for the last question for now. |
The job context contains a reference to the given log. For example, you can run a command like this to get the log for scraping Kyrgyzstan data:
|
I'll add this information to the Admin guide. |
Container files are also in the overlay2 directory. |
We can also check the dropped items statistic (following the idea from open-contracting/kingfisher-collect#1055). |
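As a rough illustration of the dropped items check, here is a minimal sketch. It assumes the crawl statistics are available as a plain dict using Scrapy's standard `item_scraped_count` and `item_dropped_count` keys. The function name `dropped_item_ratio` is hypothetical and is not part of Kingfisher Collect.

```python
def dropped_item_ratio(stats):
    """Return the share of items that were dropped during a crawl.

    `stats` is assumed to be a dict of Scrapy crawl statistics.
    Missing keys are treated as zero.
    """
    scraped = stats.get("item_scraped_count", 0)
    dropped = stats.get("item_dropped_count", 0)
    total = scraped + dropped
    # Avoid dividing by zero when the crawl produced no items at all.
    return dropped / total if total else 0.0
```

What ratio should raise a warning is left open here; it could be decided later, based on the values we observe.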
Re: notifications, we can have
To start, it's okay to not have any policy (other than no files to process, which already causes the process task to fail), and we can later decide on a policy based on what warnings we observe. |
At the end of each phase of data processing, we should evaluate whether it ended well, whether there is something suspicious, or whether the phase failed.
For the collect phase, define criteria that will:
a) prevent a dataset from being published in the data registry
b) raise a warning, but not prevent the dataset from being published
We should not insist on having criteria if we cannot identify meaningful rules.
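One way the three outcomes could be expressed is sketched below. It assumes the collect phase exposes Scrapy's crawl statistics as a dict (`item_scraped_count`, `item_dropped_count`, `log_count/ERROR` and `finish_reason` are standard Scrapy stats keys). The `Outcome` enum, the `evaluate_collect` function and the specific rules are hypothetical illustrations, not an agreed policy.

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"            # the phase ended well
    WARNING = "warning"  # suspicious, but does not block publication
    FAILED = "failed"    # blocks publication in the data registry


def evaluate_collect(stats):
    """Classify a finished collect phase from its crawl statistics."""
    # a) Blocking criterion (assumed): nothing was scraped, so there
    #    is nothing for the process task to work on.
    if stats.get("item_scraped_count", 0) == 0:
        return Outcome.FAILED

    # b) Non-blocking criteria (assumed): logged errors, dropped items,
    #    or a crawl that did not finish normally.
    if (
        stats.get("log_count/ERROR", 0) > 0
        or stats.get("item_dropped_count", 0) > 0
        or stats.get("finish_reason") != "finished"
    ):
        return Outcome.WARNING

    return Outcome.OK
```

For example, `evaluate_collect({"item_scraped_count": 0})` returns `Outcome.FAILED`, in which case the process task would not be started.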