Deduplicate ZIM tag values #156
@benoit74 I'd like to work on this. One possible solution is to convert the list into a set and then back into a list, which removes duplicates: `self.tags = list(set([*self.tags, "_category:ted", "ted", "_videos:yes"]))`. WDYT?
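One side effect of the set-based one-liner is that it does not keep the original tag order, because Python sets are unordered. A minimal comparison, using illustrative sample values (the extra `"talks"` tag is made up for the example), shows an order-preserving alternative based on `dict.fromkeys`:

```python
# Illustrative starting tags (sample values, not taken from ted2zim).
tags = ["ted", "_videos:yes", "talks"]
extra = ["_category:ted", "ted", "_videos:yes"]

# Set-based approach from the comment: duplicates are removed,
# but the resulting order is arbitrary.
deduped_unordered = list(set([*tags, *extra]))

# Order-preserving alternative: dict keys are unique and (since Python 3.7)
# keep insertion order, so the first occurrence of each tag wins.
deduped_ordered = list(dict.fromkeys([*tags, *extra]))

print(sorted(deduped_unordered))  # ['_category:ted', '_videos:yes', 'talks', 'ted']
print(deduped_ordered)            # ['ted', '_videos:yes', 'talks', '_category:ted']
```

Both variants contain the same four tags; the `dict.fromkeys` version just makes the output deterministic, which keeps the generated metadata stable from one run to the next.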
Should probably be done in scraperlib.
Agreed, let's transfer the issue. @dan-niles yes, that's the idea, but it should be done in scraperlib so that it benefits all scrapers. Are you still interested?
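Doing the deduplication once in a shared library means every scraper gets it for free. The sketch below shows what such a shared helper might look like. The names `deduplicate_tags` and `tags_to_metadata` are hypothetical and are not existing scraperlib functions. It also assumes that tags may arrive either as a list or as a single semicolon-separated string, which is the format of the ZIM `Tags` metadata entry:

```python
from __future__ import annotations

from collections.abc import Iterable


def deduplicate_tags(tags: Iterable[str] | str) -> list[str]:
    """Hypothetical helper (not part of scraperlib): strip whitespace,
    drop empty values and remove duplicates, keeping first occurrences."""
    # A plain string would otherwise be iterated character by character,
    # so treat it as a semicolon-separated list of tags.
    if isinstance(tags, str):
        tags = tags.split(";")
    cleaned = (tag.strip() for tag in tags)
    # dict.fromkeys keeps insertion order, so the result is deterministic.
    return list(dict.fromkeys(tag for tag in cleaned if tag))


def tags_to_metadata(tags: Iterable[str] | str) -> str:
    """Hypothetical helper: join deduplicated tags with semicolons."""
    return ";".join(deduplicate_tags(tags))


print(tags_to_metadata(["ted", " _videos:yes", "ted", "", "_category:ted"]))
# ted;_videos:yes;_category:ted
```

Putting the cleanup in one function keeps individual scrapers free to build their tag lists however they like, while the final metadata value is always normalized the same way.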
@benoit74 Sure, I'm up for it. I noticed that some scrapers like … Since these scrapers eventually end up calling the …
Yep, this makes sense. Good observations!
Strongly related to #164; both should be implemented together.
When computing the list of tags, it would help to deduplicate them so that no tag ends up included twice by mistake.
https://github.com/openzim/ted/blob/60fb82a127b371907c8d24ba70b4e50d29ff5005/src/ted2zim/scraper.py#L93