Update old wikidata items #971

Merged
merged 11 commits into from
Aug 10, 2024

phanecak-maptiler
Contributor

@phanecak-maptiler phanecak-maptiler commented Aug 2, 2024

When Planetiler is used with --fetch-wikidata it works roughly like this:

  1. Planetiler run 1: wikidata_names.json does not exist, hence all the translations are fetched from WikiData
  2. Some OSM data gets updated, e.g. some new items get added and some existing items get updated
  3. Some WikiData items get updated, e.g. some new items get added and some existing items get updated
  4. Planetiler run 2: wikidata_names.json exists, so translations are loaded from it and only translations for new OSM elements are fetched from WikiData

The upside is that this lowers the load on WikiData and speeds up tileset generation.

Problem: if a translation has changed for an existing OSM element, the old value from wikidata_names.json is still used. To pick up such updates, we currently have to delete wikidata_names.json and fetch all translations again.

This PR tries to partially address the problem without the need to delete (or manually tweak) wikidata_names.json:

  • It adds a new option wikidata_max_age (with default value 0, i.e. "disabled")
  • It adds a new option wikidata_update_limit (with default value 0, i.e. "disabled")
  • It tweaks the loading of translations during the fetch phase to:
    • Skip up to wikidata_update_limit items which are older than wikidata_max_age
    • This causes the subsequent fetch to load those values again

With the defaults Planetiler works as before.
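
For illustration, the skip logic described above can be sketched as follows. This is a minimal Python sketch, not the actual Planetiler (Java) code: the function name `split_cache`, the in-memory layout of the cache, and the reading of `0` as "no age check" / "no limit" are assumptions made for this example, based on the option descriptions in this PR.

```python
from datetime import datetime, timedelta, timezone


def split_cache(entries, now, max_age, update_limit):
    """Split cached translations into those to keep and those to refetch.

    entries: dict mapping a Wikidata id to (fetched_at, names)  (hypothetical layout)
    max_age: timedelta; zero disables the age check (assumption)
    update_limit: max entries dropped per run; 0 means no limit (assumption)
    """
    kept, dropped = {}, []
    for qid, (fetched_at, names) in entries.items():
        outdated = max_age > timedelta(0) and now - fetched_at > max_age
        under_limit = update_limit <= 0 or len(dropped) < update_limit
        if outdated and under_limit:
            # Not loaded from the cache, so the fetch phase requests it again.
            dropped.append(qid)
        else:
            kept[qid] = (fetched_at, names)
    return kept, dropped


now = datetime(2024, 8, 2, tzinfo=timezone.utc)
old = now - timedelta(days=45)
cache = {
    1: (old, {"en": "A"}),
    2: (old, {"en": "B"}),
    3: (now, {"en": "C"}),
    4: (old, {"en": "D"}),
}
# timedelta(days=30) stands in for --wikidata-max-age=P30D
kept, dropped = split_cache(cache, now, timedelta(days=30), 2)
print(sorted(kept), dropped)
```

With a limit of 2, only the first two outdated entries encountered are dropped; the third outdated entry stays cached until a later run.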

When called with, for example, --wikidata-max-age=P30D --wikidata-update-limit=100000, it should work roughly as follows:

  1. Planetiler run 1: wikidata_names.json does not exist, hence all the translations are fetched from WikiData
  2. Some OSM data gets updated, e.g. some new items get added and some existing items get updated
  3. Some WikiData items get updated, e.g. some new items get added and some existing items get updated
  4. Planetiler run 2, say one month after the first run: wikidata_names.json exists, so translations are loaded from it and only translations for new OSM elements are fetched from WikiData
    • Given wikidata_max_age=P30D, all translations are now considered outdated
    • But given wikidata_update_limit=100000 (which is roughly 5% of existing translations for the full Planet), only up to 100,000 translations are dropped and fetched from WikiData again
  5. ...
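
As a rough back-of-the-envelope check of what those numbers imply, the short Python sketch below derives the approximate total number of cached translations from the "roughly 5%" figure and estimates how many runs a full refresh would take. It assumes every run happens after all remaining old entries are already outdated and that each run drops exactly `update_limit` entries; both are simplifications for this example.

```python
import math

update_limit = 100_000
share_of_planet = 0.05  # "roughly 5%" of existing translations, per the description

# Approximate number of cached translations for the full Planet.
total = round(update_limit / share_of_planet)

# Runs needed until every old entry has been dropped and refetched once.
runs_for_full_refresh = math.ceil(total / update_limit)

print(total, runs_for_full_refresh)
```

Under these assumptions the cache holds about two million translations, so cycling through all of them takes on the order of twenty runs.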

Another combination might be --wikidata-max-age=P30D and --wikidata-update-limit=0, which would keep using the cached translations for a month, but the first run after that month would drop all (now outdated) translations and fetch them again from WikiData.

... and updated processing of wikidata_names.json so as to retain the
update/fetch time of the translations and drop+refetch those which
are older than `wikidata_max_age`, but not more than
`wikidata_update_limit` items

github-actions bot commented Aug 2, 2024

This Branch 7078804 Base 797e250
0:01:10 DEB [archive] - Tile stats:
0:01:10 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (156k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:84k)
2. 9/154/190 (149k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:85k)
3. 10/308/380 (138k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
4. 10/308/381 (137k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:72k)
5. 14/4941/6092 (113k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:65k)
6. 14/4941/6093 (111k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:62k)
7. 14/4940/6092 (100k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
8. 11/616/762 (99k) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086 (landcover:71k)
9. 14/4942/6091 (96k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
10. 11/616/761 (96k) https://onthegomap.github.io/planetiler-demo/#11.5/41.83679/-71.63086 (landcover:72k)
0:01:10 DEB [archive] - Max tile sizes
                      z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
           boundary  155   375   444   584   939   341   435   550   775  1.6k  2.1k  7.2k  6.4k  5.8k  4.5k  7.2k
              water 7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   26k   15k   13k   17k   15k   12k   26k
              place    0     0   441   441   441   640   714    1k  1.6k  3.1k  5.7k  3.3k  1.7k   803   948  5.7k
            landuse    0     0     0     0   549   695  1.6k  6.8k   17k   44k   59k   50k   38k   19k   12k   59k
     transportation    0     0     0     0   314   850  1.2k    6k    8k   24k   17k   19k   65k   49k   34k   65k
           waterway    0     0     0     0   112   119     0     0     0  3.1k  2.3k  2.1k  2.1k  4.9k  2.4k  4.9k
               park    0     0     0     0     0     0  1.2k    4k  9.7k   19k   13k  8.2k  4.3k  3.4k  4.4k   19k
transportation_name    0     0     0     0     0     0   369   464  1.2k  1.8k  5.5k  4.7k  3.9k  3.4k   18k   18k
          landcover    0     0     0     0     0     0     0  9.5k   29k   85k   72k   81k   53k   30k   24k   85k
      mountain_peak    0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.3k  2.8k  1.4k  1.4k   869  4.3k
         water_name    0     0     0     0     0     0     0     0     0   486   461   433   452  1.2k  1.5k  1.5k
    aerodrome_label    0     0     0     0     0     0     0     0     0     0   666   328   273   221   221   666
            aeroway    0     0     0     0     0     0     0     0     0     0  1.6k  2.1k    3k  3.4k  2.8k  3.4k
                poi    0     0     0     0     0     0     0     0     0     0     0     0   506   503   84k   84k
           building    0     0     0     0     0     0     0     0     0     0     0     0     0   59k   92k   92k
        housenumber    0     0     0     0     0     0     0     0     0     0     0     0     0     0   35k   35k
          full tile 7.9k    4k  9.5k  6.5k  3.8k  6.1k   20k   42k   85k  203k  185k  135k  114k  129k  246k  246k
            gzipped 6.2k  3.6k  7.1k  5.2k  3.1k  4.9k   14k   29k   60k  149k  138k   99k   84k   92k  156k  156k
0:01:10 DEB [archive] -    Max tile: 246k (gzipped: 156k)
0:01:10 DEB [archive] -    Avg tile: 5.4k (gzipped: 4.1k) using weighted average based on OSM traffic
0:01:10 DEB [archive] -     # tiles: 4,115,029
0:01:10 DEB [archive] -  # features: 5,490,323
0:01:10 INF [archive] - Finished in 19s cpu:1m8s avg:3.7
0:01:10 INF [archive] -   read    1x(3% 0.6s wait:17s done:1s)
0:01:10 INF [archive] -   encode  4x(55% 10s wait:2s done:1s)
0:01:10 INF [archive] -   write   1x(22% 4s wait:12s done:1s)
0:01:10 INF [archive] - Finished in 1m11s cpu:3m39s gc:1s avg:3.1
0:01:10 INF [archive] - FINISHED!
0:01:10 INF [archive] - 
0:01:10 INF [archive] - ----------------------------------------
0:01:10 INF [archive] - data errors:
0:01:10 INF [archive] - 	render_snap_fix_input	16,673
0:01:10 INF [archive] - 	osm_multipolygon_missing_way	360
0:01:10 INF [archive] - 	osm_boundary_missing_way	73
0:01:10 INF [archive] - 	merge_snap_fix_input	12
0:01:10 INF [archive] - 	osm_boundary_duplicate_member	2
0:01:10 INF [archive] - 	feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix	2
0:01:10 INF [archive] - 	omt_fix_water_before_ne_intersect	1
0:01:10 INF [archive] - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	1
0:01:10 INF [archive] - 	feature_point_on_surface_osm_invalid_multipolygon_empty_after_fix	1
0:01:10 INF [archive] - ----------------------------------------
0:01:10 INF [archive] - 	overall          1m11s cpu:3m39s gc:1s avg:3.1
0:01:10 INF [archive] - 	lake_centerlines 3s cpu:6s avg:2.1
0:01:10 INF [archive] - 	  read     1x(16% 0.5s done:2s)
0:01:10 INF [archive] - 	  process  4x(0% 0s done:2s)
0:01:10 INF [archive] - 	  write    1x(0% 0s done:2s)
0:01:10 INF [archive] - 	water_polygons   15s cpu:41s avg:2.8
0:01:10 INF [archive] - 	  read     1x(41% 6s done:7s)
0:01:10 INF [archive] - 	  process  4x(27% 4s wait:4s done:5s)
0:01:10 INF [archive] - 	  write    1x(4% 0.5s wait:9s done:5s)
0:01:10 INF [archive] - 	natural_earth    12s cpu:18s avg:1.5
0:01:10 INF [archive] - 	  read     1x(52% 6s done:5s)
0:01:10 INF [archive] - 	  process  4x(7% 0.8s wait:6s done:5s)
0:01:10 INF [archive] - 	  write    1x(0% 0s wait:6s done:5s)
0:01:10 INF [archive] - 	osm_pass1        2s cpu:7s avg:3.4
0:01:10 INF [archive] - 	  read     1x(2% 0s wait:2s)
0:01:10 INF [archive] - 	  parse    4x(33% 0.7s)
0:01:10 INF [archive] - 	  process  1x(70% 1s)
0:01:10 INF [archive] - 	osm_pass2        19s cpu:1m14s avg:3.9
0:01:10 INF [archive] - 	  read     1x(0% 0s wait:11s done:8s)
0:01:10 INF [archive] - 	  process  4x(73% 14s)
0:01:10 INF [archive] - 	  write    1x(2% 0.4s wait:18s)
0:01:10 INF [archive] - 	ne_lakes         0s cpu:0s avg:0
0:01:10 INF [archive] - 	boundaries       0s cpu:0s avg:1.2
0:01:10 INF [archive] - 	agg_stop         0s cpu:0s avg:15.4
0:01:10 INF [archive] - 	sort             1s cpu:3s avg:2.5
0:01:10 INF [archive] - 	  worker  1x(51% 0.7s)
0:01:10 INF [archive] - 	archive          19s cpu:1m8s avg:3.7
0:01:10 INF [archive] - 	  read    1x(3% 0.6s wait:17s done:1s)
0:01:10 INF [archive] - 	  encode  4x(55% 10s wait:2s done:1s)
0:01:10 INF [archive] - 	  write   1x(22% 4s wait:12s done:1s)
0:01:10 INF [archive] - ----------------------------------------
0:01:10 INF [archive] - 	archive	108MB
0:01:10 INF [archive] - 	features	290MB
-rw-r--r-- 1 runner docker 85M Aug  9 15:44 run.jar
0:01:05 DEB [archive] - Tile stats:
0:01:05 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (156k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:84k)
2. 9/154/190 (149k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:85k)
3. 10/308/380 (138k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
4. 10/308/381 (137k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:72k)
5. 14/4941/6092 (113k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:65k)
6. 14/4941/6093 (111k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:62k)
7. 14/4940/6092 (100k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
8. 11/616/762 (99k) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086 (landcover:71k)
9. 14/4942/6091 (96k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
10. 11/616/761 (96k) https://onthegomap.github.io/planetiler-demo/#11.5/41.83679/-71.63086 (landcover:72k)
0:01:05 DEB [archive] - Max tile sizes
                      z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
           boundary  155   375   444   584   939   341   435   550   775  1.6k  2.1k  7.2k  6.4k  5.8k  4.5k  7.2k
              water 7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   26k   15k   13k   17k   15k   12k   26k
              place    0     0   441   441   441   640   714    1k  1.6k  3.1k  5.7k  3.3k  1.7k   803   948  5.7k
            landuse    0     0     0     0   549   695  1.6k  6.8k   17k   44k   59k   50k   38k   19k   12k   59k
     transportation    0     0     0     0   314   850  1.2k    6k    8k   24k   17k   19k   65k   49k   34k   65k
           waterway    0     0     0     0   112   119     0     0     0  3.1k  2.3k  2.1k  2.1k  4.9k  2.4k  4.9k
               park    0     0     0     0     0     0  1.2k    4k  9.7k   19k   13k  8.2k  4.3k  3.4k  4.4k   19k
transportation_name    0     0     0     0     0     0   369   464  1.2k  1.8k  5.5k  4.7k  3.9k  3.4k   18k   18k
          landcover    0     0     0     0     0     0     0  9.5k   29k   85k   72k   81k   53k   30k   24k   85k
      mountain_peak    0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.3k  2.8k  1.4k  1.4k   869  4.3k
         water_name    0     0     0     0     0     0     0     0     0   486   461   433   452  1.2k  1.5k  1.5k
    aerodrome_label    0     0     0     0     0     0     0     0     0     0   666   328   273   221   221   666
            aeroway    0     0     0     0     0     0     0     0     0     0  1.6k  2.1k    3k  3.4k  2.8k  3.4k
                poi    0     0     0     0     0     0     0     0     0     0     0     0   506   503   84k   84k
           building    0     0     0     0     0     0     0     0     0     0     0     0     0   59k   92k   92k
        housenumber    0     0     0     0     0     0     0     0     0     0     0     0     0     0   35k   35k
          full tile 7.9k    4k  9.5k  6.5k  3.8k  6.1k   20k   42k   85k  203k  185k  135k  114k  129k  246k  246k
            gzipped 6.2k  3.6k  7.1k  5.2k  3.1k  4.9k   14k   29k   60k  149k  138k   99k   84k   92k  156k  156k
0:01:05 DEB [archive] -    Max tile: 246k (gzipped: 156k)
0:01:05 DEB [archive] -    Avg tile: 5.4k (gzipped: 4.1k) using weighted average based on OSM traffic
0:01:05 DEB [archive] -     # tiles: 4,115,029
0:01:05 DEB [archive] -  # features: 5,490,323
0:01:05 INF [archive] - Finished in 18s cpu:1m8s avg:3.7
0:01:05 INF [archive] -   read    1x(3% 0.5s wait:17s done:1s)
0:01:05 INF [archive] -   encode  4x(55% 10s wait:2s)
0:01:05 INF [archive] -   write   1x(22% 4s wait:13s)
0:01:05 INF [archive] - Finished in 1m5s cpu:3m32s gc:1s avg:3.3
0:01:05 INF [archive] - FINISHED!
0:01:05 INF [archive] - 
0:01:05 INF [archive] - ----------------------------------------
0:01:05 INF [archive] - data errors:
0:01:05 INF [archive] - 	render_snap_fix_input	16,673
0:01:05 INF [archive] - 	osm_multipolygon_missing_way	360
0:01:05 INF [archive] - 	osm_boundary_missing_way	73
0:01:05 INF [archive] - 	merge_snap_fix_input	12
0:01:05 INF [archive] - 	osm_boundary_duplicate_member	2
0:01:05 INF [archive] - 	feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix	2
0:01:05 INF [archive] - 	omt_fix_water_before_ne_intersect	1
0:01:05 INF [archive] - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	1
0:01:05 INF [archive] - 	feature_point_on_surface_osm_invalid_multipolygon_empty_after_fix	1
0:01:05 INF [archive] - ----------------------------------------
0:01:05 INF [archive] - 	overall          1m5s cpu:3m32s gc:1s avg:3.3
0:01:05 INF [archive] - 	lake_centerlines 2s cpu:5s avg:2.3
0:01:05 INF [archive] - 	  read     1x(21% 0.5s done:2s)
0:01:05 INF [archive] - 	  process  4x(0% 0s done:2s)
0:01:05 INF [archive] - 	  write    1x(0% 0s done:2s)
0:01:05 INF [archive] - 	water_polygons   15s cpu:41s avg:2.7
0:01:05 INF [archive] - 	  read     1x(39% 6s done:7s)
0:01:05 INF [archive] - 	  process  4x(26% 4s wait:4s done:5s)
0:01:05 INF [archive] - 	  write    1x(4% 0.5s wait:10s done:5s)
0:01:05 INF [archive] - 	natural_earth    7s cpu:13s avg:2
0:01:05 INF [archive] - 	  read     1x(95% 6s)
0:01:05 INF [archive] - 	  process  4x(13% 0.8s wait:6s)
0:01:05 INF [archive] - 	  write    1x(0% 0s wait:6s)
0:01:05 INF [archive] - 	osm_pass1        2s cpu:6s avg:3.2
0:01:05 INF [archive] - 	  read     1x(2% 0s wait:2s)
0:01:05 INF [archive] - 	  parse    4x(32% 0.6s)
0:01:05 INF [archive] - 	  process  1x(72% 1s)
0:01:05 INF [archive] - 	osm_pass2        19s cpu:1m14s avg:3.9
0:01:05 INF [archive] - 	  read     1x(0% 0s wait:11s done:8s)
0:01:05 INF [archive] - 	  process  4x(74% 14s)
0:01:05 INF [archive] - 	  write    1x(2% 0.4s wait:18s)
0:01:05 INF [archive] - 	ne_lakes         0s cpu:0s avg:0
0:01:05 INF [archive] - 	boundaries       0s cpu:0s avg:1.3
0:01:05 INF [archive] - 	agg_stop         0s cpu:0s avg:0
0:01:05 INF [archive] - 	sort             1s cpu:3s avg:2.6
0:01:05 INF [archive] - 	  worker  1x(52% 0.7s)
0:01:05 INF [archive] - 	archive          18s cpu:1m8s avg:3.7
0:01:05 INF [archive] - 	  read    1x(3% 0.5s wait:17s done:1s)
0:01:05 INF [archive] - 	  encode  4x(55% 10s wait:2s)
0:01:05 INF [archive] - 	  write   1x(22% 4s wait:13s)
0:01:05 INF [archive] - ----------------------------------------
0:01:05 INF [archive] - 	archive	108MB
0:01:05 INF [archive] - 	features	290MB
-rw-r--r-- 1 runner docker 85M Aug  9 15:45 run.jar

Full logs: https://github.com/onthegomap/planetiler/actions/runs/10321914925

Contributor

@msbarry msbarry left a comment

Looks good overall, with a couple of minor improvements suggested. At first I was a bit concerned this would just drop the first N translations by ID, whereas it seems like we want to drop them somewhat randomly. But since we write the translation file in whatever order the hashmap yields, it is effectively randomly ordered to begin with, so dropping the first N will be random too 👍

Contributor

@msbarry msbarry left a comment

One minor tweak to the default CLI options but otherwise looks good!

so that without command-line parameters we are fully backward-compatible

sonarcloud bot commented Aug 9, 2024

@msbarry msbarry merged commit 8512b77 into onthegomap:main Aug 10, 2024
12 checks passed
Contributor

msbarry commented Aug 10, 2024

Looks good! Thanks for adding this.
