Update old wikidata items #971
Conversation
... and updated processing of `wikidata_names.json` so as to retain the update/fetch time of the translations and drop+refetch those which are older than `wikidata_max_age`, but no more than `wikidata_update_limit` items
Full logs: https://github.com/onthegomap/planetiler/actions/runs/10321914925
Looks good overall, a couple of minor improvements. At first I was a bit concerned this would just drop the first N translations by ID, whereas it seems like we want it to drop them somewhat randomly. But since we write the translation file in whatever order the hashmap stores it, it will be randomly sorted to begin with, so dropping the first N will be random as well 👍
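The reviewer's point about hashmap ordering can be illustrated with a small standalone sketch (not Planetiler code; the names here are made up): `HashMap` iteration order follows the internal bucket layout rather than insertion order, so dropping the "first N" entries of a `HashMap`-backed translation cache is effectively arbitrary rather than oldest-first or lowest-ID-first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashMapOrderDemo {

  /** Inserts keys in a fixed order and returns the map's iteration order. */
  static List<Long> iterationOrder(List<Long> insertionOrder) {
    Map<Long, String> translations = new HashMap<>();
    for (Long id : insertionOrder) {
      translations.put(id, "Q" + id);
    }
    // iteration order depends on the hash buckets, not on insertion order
    return new ArrayList<>(translations.keySet());
  }

  public static void main(String[] args) {
    List<Long> inserted = List.of(100L, 33L, 7L, 18L, 5L);
    System.out.println("insertion order: " + inserted);
    System.out.println("iteration order: " + iterationOrder(inserted));
  }
}
```

Since the cache file is written in this iteration order, reading it back and truncating at N entries removes a roughly arbitrary subset, which is the behavior the reviewer concluded is acceptable.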
Review comments (resolved) on:
- planetiler-core/src/main/java/com/onthegomap/planetiler/util/Wikidata.java
- planetiler-core/src/main/java/com/onthegomap/planetiler/Planetiler.java
- planetiler-core/src/test/java/com/onthegomap/planetiler/util/WikidataTest.java
e.g. all entries older than `wikidata_max_age` will be fetched again
and load(Path path, Duration maxAge, int updateLimit) made private
and clean-up of load() changes from previous commit
One minor tweak to the default CLI options but otherwise looks good!
Review comment (resolved) on:
- planetiler-core/src/main/java/com/onthegomap/planetiler/Planetiler.java
so that without command-line parameters we are fully backward-compatible
Quality Gate passed
Looks good! Thanks for adding this.
When Planetiler is used with `--fetch-wikidata`, it works roughly like this:
- on the first run, `wikidata_names.json` does not exist, hence all the translations are fetched from Wikidata
- on subsequent runs, `wikidata_names.json` exists, so translations are loaded from it and only translations for new OSM elements are fetched from Wikidata

The good thing is that this lowers the load on Wikidata and speeds up tileset generation.

Problem: if some translation changed for an existing OSM element, the old value from `wikidata_names.json` is used. If we want to get updates, we can delete `wikidata_names.json` and fetch all translations once again.

This PR tries to partially address the problem without the need for deletion (or manual tweaking) of `wikidata_names.json`:
- new option `wikidata_max_age` (with default value 0, i.e. "disabled")
- new option `wikidata_update_limit` (with default value 0, i.e. "disabled")
- drop and refetch up to `wikidata_update_limit` items which are older than `wikidata_max_age`

With the defaults, Planetiler works as before.

When called with, for example, `--wikidata-max-age=P30D --wikidata-update-limit=100000`, it should then work roughly as follows:
- on the first run, `wikidata_names.json` does not exist, hence all the translations are fetched from Wikidata
- on runs within the next 30 days, `wikidata_names.json` exists, so translations are loaded from it and only translations for new OSM elements are fetched from Wikidata
- after 30 days (`wikidata_max_age=P30D`), all translations are considered outdated
- due to `wikidata_update_limit=100000` (which is roughly 5% of the existing translations for the full planet), only up to 100,000 translations are dropped and fetched from Wikidata again on each run

Another combination might be `--wikidata-max-age=P30D` with `--wikidata-update-limit=0`, which would keep using the cached translations for a month, but the run after a month would drop all (now outdated) translations and fetch them again from Wikidata.
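The expiry policy described above can be sketched as follows. This is a hypothetical illustration, not Planetiler's actual `Wikidata.load` implementation: the record, method, and field names are invented. Each cached translation keeps its fetch time; entries older than `maxAge` are dropped so they get refetched later, but at most `updateLimit` of them per run (a limit of 0 meaning "no limit", and a max age of 0 disabling expiry entirely, matching the backward-compatible defaults).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class TranslationCacheSketch {

  /** A cached set of translations for one Wikidata item, with its fetch time. */
  record CachedTranslation(Map<String, String> names, Instant fetchedAt) {}

  /** Drops up to updateLimit entries older than maxAge; returns how many were dropped. */
  static int dropExpired(Map<Long, CachedTranslation> cache,
      Duration maxAge, int updateLimit, Instant now) {
    if (maxAge.isZero()) {
      return 0; // max age 0 disables expiry (the default): fully backward-compatible
    }
    Instant cutoff = now.minus(maxAge);
    int dropped = 0;
    Iterator<Map.Entry<Long, CachedTranslation>> it = cache.entrySet().iterator();
    while (it.hasNext() && (updateLimit <= 0 || dropped < updateLimit)) {
      if (it.next().getValue().fetchedAt().isBefore(cutoff)) {
        it.remove(); // missing entries get refetched from Wikidata later
        dropped++;
      }
    }
    return dropped;
  }

  public static void main(String[] args) {
    Map<Long, CachedTranslation> cache = new LinkedHashMap<>();
    Instant now = Instant.parse("2024-08-01T00:00:00Z");
    cache.put(1L, new CachedTranslation(Map.of("en", "old"), now.minus(Duration.ofDays(40))));
    cache.put(2L, new CachedTranslation(Map.of("en", "fresh"), now.minus(Duration.ofDays(5))));
    cache.put(3L, new CachedTranslation(Map.of("en", "also old"), now.minus(Duration.ofDays(40))));
    // like --wikidata-max-age=P30D --wikidata-update-limit=1:
    // two entries are stale, but only one may be dropped this run
    int dropped = dropExpired(cache, Duration.parse("P30D"), 1, now);
    System.out.println("dropped " + dropped + ", remaining " + cache.keySet());
  }
}
```

With `updateLimit = 0` and a non-zero `maxAge`, the loop runs unbounded and every stale entry is dropped in one run, which matches the `--wikidata-max-age=P30D --wikidata-update-limit=0` combination described above.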