Parquet/no-build-tool java profile usability improvements #914

Merged
merged 10 commits into from
Jun 11, 2024

msbarry
Contributor

@msbarry msbarry commented Jun 11, 2024

Improvements and bug fixes found while using Planetiler no-build-tool Java profiles in https://github.com/onthegomap/planetiler-examples

improvements:

  • move the YAML profile validator to planetiler-core and hook it into PlanetilerRunner, so that running with --tests spec.yaml checks the profile against a set of example test-case features
  • add features.anyGeometry(layer) to FeatureCollector API that creates a line, point, or polygon vector tile feature based on the geometry type of the source feature
  • add inheritAttrsFromSource and inheritAttrsFromSourceWithMinzoom to FeatureCollector API
  • let ForwardingProfile handlers also return a boolean expression from a filter() method that limits the features they are called with (and allows entire parquet files that are not of interest to be skipped)
  • add Expression.matchSource and Expression.matchSourceLayer boolean expressions that generate optimized indexes for matching features at runtime
  • make Expression.matchField and Expression.matchAny handle structured attributes, for example: matchField("names.primary", "Massachusetts Turnpike")
  • let ForwardingProfile skip or include only certain layers automatically with the --only-layers and --exclude-layers options
  • throw only runtime exceptions from PlanetilerRunner#run so that profiles don't need to handle them
  • add OSM_ATTRIBUTION constant to Profile with default recommended openstreetmap attribution
  • add a default name implementation to Profile so profiles only need to implement the processFeature method, and can therefore use lambda syntax
  • use Distributor utility when reading parquet files to spread features from final row groups across other threads as they finish
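
To illustrate what matching on a structured attribute such as matchField("names.primary", "Massachusetts Turnpike") means, here is a minimal, self-contained sketch. It does not use Planetiler or show how Planetiler implements it: the DottedPathMatch class, its getByPath helper, and the nested-Map representation of a feature's attributes are all assumptions made for illustration. The idea is that a dotted path is resolved one segment at a time through nested maps before the value is compared.

```java
import java.util.Map;

public class DottedPathMatch {

  /**
   * Hypothetical helper: walks nested maps one path segment at a time.
   * Returns null if any segment is missing or a non-map value is reached early.
   */
  static Object getByPath(Map<String, Object> attrs, String path) {
    Object current = attrs;
    for (String key : path.split("\\.")) {
      if (!(current instanceof Map)) {
        return null; // reached a leaf (or a missing value) before the path was consumed
      }
      current = ((Map<?, ?>) current).get(key);
    }
    return current;
  }

  /** Stand-in for a field match: true when the value at the dotted path equals the expected value. */
  static boolean matchField(Map<String, Object> attrs, String path, Object expected) {
    return expected.equals(getByPath(attrs, path));
  }

  public static void main(String[] args) {
    // Assumed shape of a source feature with a structured "names" attribute
    Map<String, Object> feature = Map.of(
      "class", "motorway",
      "names", Map.of("primary", "Massachusetts Turnpike"));

    System.out.println(matchField(feature, "names.primary", "Massachusetts Turnpike")); // prints true
    System.out.println(matchField(feature, "names.secondary", "Massachusetts Turnpike")); // prints false
  }
}
```

A flat key such as "class" takes the same code path, since a path with no dots has a single segment.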

fixes:

  • make setAttrWithMinSize work correctly for partial-length line features
  • fix unwrapping Struct and ZoomFunction attribute values for partial-length line features
  • Struct#rawValue was returning nested structs inside maps and lists; changed it to return raw objects
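
The last fix can be pictured with a small, self-contained sketch. This is not Planetiler's Struct code: the Wrapped class and the rawValue method below are hypothetical stand-ins, assuming a wrapper type that can hold scalars, maps, or lists. The point is that unwrapping has to recurse into maps and lists, otherwise nested wrappers leak into the "raw" result.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RawValueSketch {

  /** Hypothetical wrapper standing in for a struct-like attribute value. */
  static final class Wrapped {
    final Object value;

    Wrapped(Object value) {
      this.value = value;
    }
  }

  /** Strips Wrapped from the value itself and from anything nested inside maps or lists. */
  static Object rawValue(Object value) {
    if (value instanceof Wrapped) {
      return rawValue(((Wrapped) value).value);
    }
    if (value instanceof Map) {
      Map<Object, Object> result = new LinkedHashMap<>();
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
        result.put(rawValue(entry.getKey()), rawValue(entry.getValue()));
      }
      return result;
    }
    if (value instanceof List) {
      List<Object> result = new ArrayList<>();
      for (Object item : (List<?>) value) {
        result.add(rawValue(item));
      }
      return result;
    }
    return value; // already a raw scalar
  }

  public static void main(String[] args) {
    Object nested = new Wrapped(Map.of("k", new Wrapped(List.of(new Wrapped(1), 2))));
    System.out.println(rawValue(nested)); // prints {k=[1, 2]}
  }
}
```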

@msbarry msbarry changed the title Parquet usability improvements Parquet/no-build-tool java profile usability improvements Jun 11, 2024

github-actions bot commented Jun 11, 2024

This Branch 67edc07 Base dd6fc44
0:01:10 DEB [archive] - Tile stats:
0:01:10 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (154k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:83k)
2. 9/154/190 (149k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:85k)
3. 10/308/380 (138k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
4. 10/308/381 (136k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:72k)
5. 14/4941/6092 (111k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:64k)
6. 14/4941/6093 (110k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:62k)
7. 14/4940/6092 (99k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
8. 11/616/762 (98k) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086 (landcover:71k)
9. 14/4942/6091 (96k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
10. 11/616/761 (96k) https://onthegomap.github.io/planetiler-demo/#11.5/41.83679/-71.63086 (landcover:72k)
0:01:10 DEB [archive] - Max tile sizes
                      z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
           boundary  154   374   443   583   938   339   433   548   773  1.6k  2.1k  7.2k  6.4k  5.8k  4.5k  7.2k
              water 7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   26k   15k   13k   17k   15k   12k   26k
              place    0     0   441   441   441   639   712    1k  1.5k  3.1k  5.6k  3.3k  1.7k   795   936  5.6k
            landuse    0     0     0     0   548   694  1.6k  6.8k   17k   44k   59k   50k   38k   19k   12k   59k
     transportation    0     0     0     0   243   782  1.2k  5.9k    8k   24k   17k   19k   65k   48k   34k   65k
           waterway    0     0     0     0   111   118     0     0     0  3.1k  2.4k  2.1k  2.1k  4.9k  2.4k  4.9k
               park    0     0     0     0     0     0  1.2k    4k  9.7k   19k   13k  8.2k  4.3k  3.4k  4.4k   19k
transportation_name    0     0     0     0     0     0   369   464  1.2k  1.8k  5.4k  4.6k  3.9k  3.4k   18k   18k
          landcover    0     0     0     0     0     0     0  9.5k   29k   85k   72k   81k   53k   30k   24k   85k
      mountain_peak    0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.3k  2.8k  1.4k  1.4k   869  4.3k
         water_name    0     0     0     0     0     0     0     0     0   486   461   433   452  1.2k  1.5k  1.5k
    aerodrome_label    0     0     0     0     0     0     0     0     0     0   664   327   273   220   220   664
            aeroway    0     0     0     0     0     0     0     0     0     0  1.6k  2.1k    3k  3.4k  2.7k  3.4k
                poi    0     0     0     0     0     0     0     0     0     0     0     0   501   498   83k   83k
           building    0     0     0     0     0     0     0     0     0     0     0     0     0   59k   92k   92k
        housenumber    0     0     0     0     0     0     0     0     0     0     0     0     0     0   35k   35k
          full tile 7.9k    4k  9.5k  6.5k  3.7k    6k   20k   42k   85k  203k  185k  135k  114k  128k  244k  244k
            gzipped 6.2k  3.5k  7.1k  5.2k  3.1k  4.8k   14k   29k   60k  149k  138k   98k   83k   92k  154k  154k
0:01:10 DEB [archive] -    Max tile: 244k (gzipped: 154k)
0:01:10 DEB [archive] -    Avg tile: 5.4k (gzipped: 4k) using weighted average based on OSM traffic
0:01:10 DEB [archive] -     # tiles: 4,115,036
0:01:10 DEB [archive] -  # features: 5,485,775
0:01:10 INF [archive] - Finished in 19s cpu:1m8s avg:3.6
0:01:10 INF [archive] -   read    1x(3% 0.6s wait:17s done:1s)
0:01:10 INF [archive] -   encode  4x(55% 10s wait:2s done:1s)
0:01:10 INF [archive] -   write   1x(21% 4s wait:12s done:1s)
0:01:10 INF [archive] - Finished in 1m11s cpu:3m33s gc:1s avg:3
0:01:10 INF [archive] - FINISHED!
0:01:10 INF [archive] - 
0:01:10 INF [archive] - ----------------------------------------
0:01:10 INF [archive] - data errors:
0:01:10 INF [archive] - 	render_snap_fix_input	16,669
0:01:10 INF [archive] - 	osm_multipolygon_missing_way	359
0:01:10 INF [archive] - 	osm_boundary_missing_way	73
0:01:10 INF [archive] - 	merge_snap_fix_input	12
0:01:10 INF [archive] - 	osm_boundary_duplicate_member	2
0:01:10 INF [archive] - 	feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix	2
0:01:10 INF [archive] - 	omt_fix_water_before_ne_intersect	1
0:01:10 INF [archive] - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	1
0:01:10 INF [archive] - 	feature_point_on_surface_osm_invalid_multipolygon_empty_after_fix	1
0:01:10 INF [archive] - ----------------------------------------
0:01:10 INF [archive] - 	overall          1m11s cpu:3m33s gc:1s avg:3
0:01:10 INF [archive] - 	lake_centerlines 3s cpu:6s avg:1.8
0:01:10 INF [archive] - 	  read     1x(14% 0.5s done:3s)
0:01:10 INF [archive] - 	  process  4x(0% 0s done:3s)
0:01:10 INF [archive] - 	  write    1x(0% 0s done:3s)
0:01:10 INF [archive] - 	water_polygons   15s cpu:39s avg:2.7
0:01:10 INF [archive] - 	  read     1x(43% 6s done:7s)
0:01:10 INF [archive] - 	  process  4x(26% 4s wait:4s done:5s)
0:01:10 INF [archive] - 	  write    1x(4% 0.5s wait:9s done:5s)
0:01:10 INF [archive] - 	natural_earth    12s cpu:18s avg:1.5
0:01:10 INF [archive] - 	  read     1x(52% 6s done:5s)
0:01:10 INF [archive] - 	  process  4x(7% 0.8s wait:6s done:5s)
0:01:10 INF [archive] - 	  write    1x(0% 0s wait:6s done:5s)
0:01:10 INF [archive] - 	osm_pass1        2s cpu:6s avg:3.2
0:01:10 INF [archive] - 	  read     1x(2% 0s wait:2s)
0:01:10 INF [archive] - 	  parse    4x(34% 0.6s)
0:01:10 INF [archive] - 	  process  1x(69% 1s)
0:01:10 INF [archive] - 	osm_pass2        18s cpu:1m11s avg:3.9
0:01:10 INF [archive] - 	  read     1x(0% 0s wait:10s done:8s)
0:01:10 INF [archive] - 	  process  4x(75% 13s)
0:01:10 INF [archive] - 	  write    1x(2% 0.4s wait:18s)
0:01:10 INF [archive] - 	ne_lakes         0s cpu:0s avg:11.5
0:01:10 INF [archive] - 	boundaries       0s cpu:0s avg:1
0:01:10 INF [archive] - 	agg_stop         0s cpu:0s avg:0
0:01:10 INF [archive] - 	sort             1s cpu:4s avg:2.6
0:01:10 INF [archive] - 	  worker  1x(51% 0.7s)
0:01:10 INF [archive] - 	archive          19s cpu:1m8s avg:3.6
0:01:10 INF [archive] - 	  read    1x(3% 0.6s wait:17s done:1s)
0:01:10 INF [archive] - 	  encode  4x(55% 10s wait:2s done:1s)
0:01:10 INF [archive] - 	  write   1x(21% 4s wait:12s done:1s)
0:01:10 INF [archive] - ----------------------------------------
0:01:10 INF [archive] - 	archive	108MB
0:01:10 INF [archive] - 	features	281MB
-rw-r--r-- 1 runner docker 84M Jun 11 12:42 run.jar
0:01:04 DEB [archive] - Tile stats:
0:01:04 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (154k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:83k)
2. 9/154/190 (149k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:85k)
3. 10/308/380 (138k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
4. 10/308/381 (136k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:72k)
5. 14/4941/6092 (111k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:64k)
6. 14/4941/6093 (110k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:62k)
7. 14/4940/6092 (99k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
8. 11/616/762 (98k) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086 (landcover:71k)
9. 14/4942/6091 (96k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
10. 11/616/761 (96k) https://onthegomap.github.io/planetiler-demo/#11.5/41.83679/-71.63086 (landcover:72k)
0:01:04 DEB [archive] - Max tile sizes
                      z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
           boundary  154   374   443   583   938   339   433   548   773  1.6k  2.1k  7.2k  6.4k  5.8k  4.5k  7.2k
              water 7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   26k   15k   13k   17k   15k   12k   26k
              place    0     0   441   441   441   639   712    1k  1.5k  3.1k  5.6k  3.3k  1.7k   795   936  5.6k
            landuse    0     0     0     0   548   694  1.6k  6.8k   17k   44k   59k   50k   38k   19k   12k   59k
     transportation    0     0     0     0   243   782  1.2k  5.9k    8k   24k   17k   19k   65k   48k   34k   65k
           waterway    0     0     0     0   111   118     0     0     0  3.1k  2.4k  2.1k  2.1k  4.9k  2.4k  4.9k
               park    0     0     0     0     0     0  1.2k    4k  9.7k   19k   13k  8.2k  4.3k  3.4k  4.4k   19k
transportation_name    0     0     0     0     0     0   369   464  1.2k  1.8k  5.4k  4.6k  3.9k  3.4k   18k   18k
          landcover    0     0     0     0     0     0     0  9.5k   29k   85k   72k   81k   53k   30k   24k   85k
      mountain_peak    0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.3k  2.8k  1.4k  1.4k   869  4.3k
         water_name    0     0     0     0     0     0     0     0     0   486   461   433   452  1.2k  1.5k  1.5k
    aerodrome_label    0     0     0     0     0     0     0     0     0     0   664   327   273   220   220   664
            aeroway    0     0     0     0     0     0     0     0     0     0  1.6k  2.1k    3k  3.4k  2.7k  3.4k
                poi    0     0     0     0     0     0     0     0     0     0     0     0   501   498   83k   83k
           building    0     0     0     0     0     0     0     0     0     0     0     0     0   59k   92k   92k
        housenumber    0     0     0     0     0     0     0     0     0     0     0     0     0     0   35k   35k
          full tile 7.9k    4k  9.5k  6.5k  3.7k    6k   20k   42k   85k  203k  185k  135k  114k  128k  244k  244k
            gzipped 6.2k  3.5k  7.1k  5.2k  3.1k  4.8k   14k   29k   60k  149k  138k   98k   83k   92k  154k  154k
0:01:04 DEB [archive] -    Max tile: 244k (gzipped: 154k)
0:01:04 DEB [archive] -    Avg tile: 5.4k (gzipped: 4k) using weighted average based on OSM traffic
0:01:04 DEB [archive] -     # tiles: 4,115,036
0:01:04 DEB [archive] -  # features: 5,485,775
0:01:04 INF [archive] - Finished in 19s cpu:1m10s avg:3.7
0:01:04 INF [archive] -   read    1x(3% 0.5s wait:17s done:1s)
0:01:04 INF [archive] -   encode  4x(54% 10s wait:2s done:1s)
0:01:04 INF [archive] -   write   1x(21% 4s wait:13s done:1s)
0:01:04 INF [archive] - Finished in 1m4s cpu:3m30s gc:1s avg:3.3
0:01:04 INF [archive] - FINISHED!
0:01:04 INF [archive] - 
0:01:04 INF [archive] - ----------------------------------------
0:01:04 INF [archive] - data errors:
0:01:04 INF [archive] - 	render_snap_fix_input	16,669
0:01:04 INF [archive] - 	osm_multipolygon_missing_way	359
0:01:04 INF [archive] - 	osm_boundary_missing_way	73
0:01:04 INF [archive] - 	merge_snap_fix_input	12
0:01:04 INF [archive] - 	osm_boundary_duplicate_member	2
0:01:04 INF [archive] - 	feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix	2
0:01:04 INF [archive] - 	omt_fix_water_before_ne_intersect	1
0:01:04 INF [archive] - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	1
0:01:04 INF [archive] - 	feature_point_on_surface_osm_invalid_multipolygon_empty_after_fix	1
0:01:04 INF [archive] - ----------------------------------------
0:01:04 INF [archive] - 	overall          1m4s cpu:3m30s gc:1s avg:3.3
0:01:04 INF [archive] - 	lake_centerlines 2s cpu:5s avg:2.3
0:01:04 INF [archive] - 	  read     1x(23% 0.5s done:2s)
0:01:04 INF [archive] - 	  process  4x(0% 0s done:2s)
0:01:04 INF [archive] - 	  write    1x(0% 0s done:2s)
0:01:04 INF [archive] - 	water_polygons   15s cpu:42s avg:2.8
0:01:04 INF [archive] - 	  read     1x(43% 6s done:7s)
0:01:04 INF [archive] - 	  process  4x(27% 4s wait:4s done:5s)
0:01:04 INF [archive] - 	  write    1x(4% 0.5s wait:10s done:5s)
0:01:04 INF [archive] - 	natural_earth    6s cpu:12s avg:1.9
0:01:04 INF [archive] - 	  read     1x(96% 6s)
0:01:04 INF [archive] - 	  process  4x(13% 0.8s wait:6s)
0:01:04 INF [archive] - 	  write    1x(0% 0s wait:6s)
0:01:04 INF [archive] - 	osm_pass1        2s cpu:6s avg:3.3
0:01:04 INF [archive] - 	  read     1x(2% 0s wait:2s)
0:01:04 INF [archive] - 	  parse    4x(34% 0.7s)
0:01:04 INF [archive] - 	  process  1x(70% 1s)
0:01:04 INF [archive] - 	osm_pass2        17s cpu:1m9s avg:4
0:01:04 INF [archive] - 	  read     1x(0% 0s wait:10s done:7s)
0:01:04 INF [archive] - 	  process  4x(75% 13s)
0:01:04 INF [archive] - 	  write    1x(2% 0.4s wait:17s)
0:01:04 INF [archive] - 	ne_lakes         0s cpu:0s avg:0
0:01:04 INF [archive] - 	boundaries       0s cpu:0s avg:1.3
0:01:04 INF [archive] - 	agg_stop         0s cpu:0s avg:0
0:01:04 INF [archive] - 	sort             1s cpu:4s avg:2.6
0:01:04 INF [archive] - 	  worker  1x(48% 0.7s)
0:01:04 INF [archive] - 	archive          19s cpu:1m10s avg:3.7
0:01:04 INF [archive] - 	  read    1x(3% 0.5s wait:17s done:1s)
0:01:04 INF [archive] - 	  encode  4x(54% 10s wait:2s done:1s)
0:01:04 INF [archive] - 	  write   1x(21% 4s wait:13s done:1s)
0:01:04 INF [archive] - ----------------------------------------
0:01:04 INF [archive] - 	archive	108MB
0:01:04 INF [archive] - 	features	281MB
-rw-r--r-- 1 runner docker 84M Jun 11 12:44 run.jar

Full logs: https://github.com/onthegomap/planetiler/actions/runs/9465725196

@msbarry msbarry marked this pull request as ready for review June 11, 2024 12:24

sonarcloud bot commented Jun 11, 2024

@msbarry msbarry merged commit bd5e527 into main Jun 11, 2024
12 checks passed
bdon added a commit to protomaps/basemaps that referenced this pull request Jun 13, 2024
Since onthegomap/planetiler#914 the registerSourceHandler is called implicitly
@bdon bdon mentioned this pull request Jun 13, 2024
bdon added a commit to protomaps/basemaps that referenced this pull request Jun 13, 2024
* Fix build
Since onthegomap/planetiler#914 the registerSourceHandler is called implicitly
* fix build: explicit source handling
* rename processFeature -> processOsm
1 participant