Tile stats #656

Merged
merged 30 commits into from
Sep 22, 2023

@msbarry (Contributor) commented Sep 6, 2023

Expose more detailed tile size statistics from planetiler:

  • Run planetiler with --output-layerstats to write an extra <output file>.layerstats.tsv.gz file (~5% of the original archive size) containing one row per layer per tile, which can be analyzed with duckdb (see layerstats/README.md).
  • Run java -jar planetiler.jar stats --input=<pmtiles or mbtiles file> --output=layerstats.tsv.gz to compute the same stats for an existing archive.
  • By default, planetiler now logs the 10 largest tiles, the max layer size by zoom, and the weighted-average tile size at the end of every run.
  • For either command, add --tile-weights=weights.tsv.gz to point planetiler to a file with z, x, y, loads columns that customizes the weights used for weighted-average tile sizes, or use --download-osm-tile-weights to download the pre-computed top-1m tiles from openstreetmap.org traffic.
  • Generate a custom top-tiles file from openstreetmap.org traffic by running java -jar planetiler.jar top-osm-tiles --days=<# days to fetch> --top=<# tiles to include> --output=weights.tsv.gz
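
To illustrate what a tile-weights file and a weighted-average tile size could look like, here is a minimal Python sketch. It is not planetiler's implementation. The header row, the tab delimiter, the formula sum(size * loads) / sum(loads), and the choice to ignore tiles missing from the weights file are all assumptions made for illustration, and the helper names are hypothetical.

```python
import csv
import gzip


def write_weights(path, rows):
    """Write (z, x, y, loads) rows as a gzipped TSV.

    Assumption: a header row naming the z, x, y, loads columns.
    """
    with gzip.open(path, "wt", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["z", "x", "y", "loads"])
        writer.writerows(rows)


def read_weights(path):
    """Read the gzipped TSV back into a {(z, x, y): loads} dict."""
    with gzip.open(path, "rt", newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        return {
            (int(r["z"]), int(r["x"]), int(r["y"])): int(r["loads"])
            for r in reader
        }


def weighted_average_size(sizes, weights, default_weight=0):
    """Average tile size in bytes, weighting each tile by its load count.

    Assumption: tiles missing from `weights` get `default_weight`
    (0 here, so they do not contribute to the average).
    """
    total = 0
    total_weight = 0
    for tile, size in sizes.items():
        w = weights.get(tile, default_weight)
        total += size * w
        total_weight += w
    return total / total_weight if total_weight else 0.0
```

With weights like these, a frequently requested tile pulls the average toward its own size, whereas the unweighted average is dominated by the large number of small tiles that are seldom requested.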

Fixes #391

github-actions bot commented Sep 6, 2023

Base e473c42 (first log below, timestamps 0:01:52) vs. This Branch f26c7b9 (second log below, timestamps 0:01:53):
0:01:52 DEB [archive] - Tile stats:
0:01:52 DEB [archive] - z0 avg:7.9k max:7.9k
0:01:52 DEB [archive] - z1 avg:4k max:4k
0:01:52 DEB [archive] - z2 avg:9.4k max:9.4k
0:01:52 DEB [archive] - z3 avg:4k max:6.4k
0:01:52 DEB [archive] - z4 avg:1.6k max:4.6k
0:01:52 DEB [archive] - z5 avg:1.4k max:7.2k
0:01:52 DEB [archive] - z6 avg:973 max:22k
0:01:52 DEB [archive] - z7 avg:769 max:58k
0:01:52 DEB [archive] - z8 avg:418 max:127k
0:01:52 DEB [archive] - z9 avg:282 max:298k
0:01:52 DEB [archive] - z10 avg:161 max:256k
0:01:52 DEB [archive] - z11 avg:106 max:136k
0:01:52 DEB [archive] - z12 avg:85 max:114k
0:01:52 DEB [archive] - z13 avg:72 max:128k
0:01:52 DEB [archive] - z14 avg:68 max:304k
0:01:52 DEB [archive] - all avg:70 max:304k
0:01:52 DEB [archive] -  # features: 5,440,450
0:01:52 DEB [archive] -     # tiles: 4,115,061
0:01:52 INF [archive] - Finished in 30s cpu:58s gc:1s avg:1.9
0:01:52 INF [archive] -   read    1x(2% 0.6s wait:28s)
0:01:52 INF [archive] -   encode  2x(64% 19s)
0:01:52 INF [archive] -   write   1x(13% 4s wait:24s)
0:01:52 INF - Finished in 1m52s cpu:3m22s gc:4s avg:1.8
0:01:52 INF - FINISHED!
0:01:52 INF - 
0:01:52 INF - ----------------------------------------
0:01:52 INF - data errors:
0:01:52 INF - 	render_snap_fix_input	16,475
0:01:52 INF - 	osm_boundary_missing_way	63
0:01:52 INF - 	osm_multipolygon_missing_way	57
0:01:52 INF - 	merge_snap_fix_input	14
0:01:52 INF - 	feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix	2
0:01:52 INF - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	2
0:01:52 INF - 	omt_park_area_osm_invalid_multipolygon_empty_after_fix	1
0:01:52 INF - ----------------------------------------
0:01:52 INF - 	overall          1m52s cpu:3m22s gc:4s avg:1.8
0:01:52 INF - 	lake_centerlines 3s cpu:6s avg:1.7
0:01:52 INF - 	  read     1x(45% 1s done:1s)
0:01:52 INF - 	  process  2x(6% 0.2s wait:2s done:1s)
0:01:52 INF - 	  write    1x(0% 0s wait:2s done:1s)
0:01:52 INF - 	water_polygons   26s cpu:48s gc:2s avg:1.8
0:01:52 INF - 	  read     1x(55% 15s sys:1s wait:3s)
0:01:52 INF - 	  process  2x(30% 8s wait:12s)
0:01:52 INF - 	  write    1x(2% 0.5s wait:26s)
0:01:52 INF - 	natural_earth    12s cpu:13s avg:1.1
0:01:52 INF - 	  read     1x(63% 7s sys:1s done:4s)
0:01:52 INF - 	  process  2x(10% 1s wait:7s done:3s)
0:01:52 INF - 	  write    1x(0% 0s wait:8s done:3s)
0:01:52 INF - 	osm_pass1        4s cpu:6s avg:1.6
0:01:52 INF - 	  read     1x(1% 0s wait:3s)
0:01:52 INF - 	  parse    1x(61% 2s wait:1s)
0:01:52 INF - 	  process  1x(40% 1s wait:1s)
0:01:52 INF - 	osm_pass2        35s cpu:1m8s avg:2
0:01:52 INF - 	  read     1x(0% 0s wait:19s done:16s)
0:01:52 INF - 	  process  2x(78% 27s)
0:01:52 INF - 	  write    1x(1% 0.4s wait:34s)
0:01:52 INF - 	boundaries       0s cpu:0s avg:1.1
0:01:52 INF - 	sort             2s cpu:3s avg:1.5
0:01:52 INF - 	  worker  1x(88% 2s)
0:01:52 INF - 	archive          30s cpu:58s gc:1s avg:1.9
0:01:52 INF - 	  read    1x(2% 0.6s wait:28s)
0:01:52 INF - 	  encode  2x(64% 19s)
0:01:52 INF - 	  write   1x(13% 4s wait:24s)
0:01:52 INF - ----------------------------------------
0:01:52 INF - 	archive	109MB
0:01:52 INF - 	features	283MB
-rw-r--r-- 1 runner docker 66M Sep 22 01:21 run.jar
0:01:53 DEB [archive] - Tile stats:
0:01:53 DEB [archive] - Biggest tiles (gzipped)
1. 9/154/190 (204k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:173k)
2. 10/308/381 (180k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:141k)
3. 10/308/380 (179k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:129k)
4. 14/4942/6092 (173k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (building:142k)
5. 14/4940/6092 (135k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:212k)
6. 14/4941/6093 (128k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:133k)
7. 14/4940/6091 (125k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.44409 (building:204k)
8. 14/4941/6092 (124k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (building:94k)
9. 14/4942/6091 (122k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:175k)
10. 14/4940/6093 (119k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.44409 (building:164k)
0:01:53 DEB [archive] - Max tile sizes
                      z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
           boundary  154   374   445   581   936   340   432   545   545  1.6k  2.1k  7.2k  6.4k  5.8k  4.5k  7.2k
              water 7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   25k   15k   13k   17k   15k   12k   25k
              place    0     0   377   377   377   586   675   963  1.6k  3.3k  5.7k  3.1k  1.7k   789   947  5.7k
            landuse    0     0     0     0   679   745  1.3k    6k   17k   44k   59k   50k   38k   23k   14k   59k
     transportation    0     0     0     0  1.1k  1.9k  2.9k  9.5k   12k   32k   22k   23k   64k   47k   33k   64k
           waterway    0     0     0     0   111   118     0     0     0  3.4k  2.3k    2k  1.6k  4.9k  2.4k  4.9k
               park    0     0     0     0     0     0  1.1k  3.7k  9.4k   18k   12k  7.4k  4.3k  3.4k  4.4k   18k
transportation_name    0     0     0     0     0     0   359   454  1.2k  1.7k  4.9k  3.9k  3.7k  3.4k   18k   18k
          landcover    0     0     0     0     0     0     0   23k   67k  173k  141k   81k   53k   30k   24k  173k
      mountain_peak    0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.3k  2.8k  1.4k  1.4k   869  4.3k
         water_name    0     0     0     0     0     0     0     0     0   487   462   434   445   549    1k    1k
    aerodrome_label    0     0     0     0     0     0     0     0     0     0   674   327   273   220   220   674
            aeroway    0     0     0     0     0     0     0     0     0     0  1.6k  2.1k    3k  3.4k  2.7k  3.4k
                poi    0     0     0     0     0     0     0     0     0     0     0     0   501   498   67k   67k
           building    0     0     0     0     0     0     0     0     0     0     0     0     0   60k  212k  212k
        housenumber    0     0     0     0     0     0     0     0     0     0     0     0     0     0   55k   55k
          full tile 7.9k    4k  9.4k  6.4k  4.6k  7.2k   22k   58k  127k  298k  256k  136k  114k  128k  304k  304k
            gzipped 6.2k  3.6k  7.1k  5.2k  3.5k  5.4k   15k   38k   84k  204k  180k  100k   84k   93k  173k  204k
0:01:53 DEB [archive] -    Max tile: 304k (gzipped: 204k)
0:01:53 DEB [archive] -    Avg tile: 68 (gzipped: 87) no tile weights, use --download-osm-tile-weights for weighted average
0:01:53 DEB [archive] -     # tiles: 4,115,061
0:01:53 DEB [archive] -  # features: 5,440,450
0:01:53 INF [archive] - Finished in 30s cpu:58s gc:1s avg:1.9
0:01:53 INF [archive] -   read    1x(2% 0.6s wait:28s)
0:01:53 INF [archive] -   encode  2x(63% 19s)
0:01:53 INF [archive] -   write   1x(13% 4s wait:24s)
0:01:53 INF - Finished in 1m54s cpu:3m21s gc:4s avg:1.8
0:01:53 INF - FINISHED!
0:01:53 INF - 
0:01:53 INF - ----------------------------------------
0:01:53 INF - data errors:
0:01:53 INF - 	render_snap_fix_input	16,475
0:01:53 INF - 	osm_boundary_missing_way	63
0:01:53 INF - 	osm_multipolygon_missing_way	57
0:01:53 INF - 	merge_snap_fix_input	14
0:01:53 INF - 	feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix	2
0:01:53 INF - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	2
0:01:53 INF - 	omt_park_area_osm_invalid_multipolygon_empty_after_fix	1
0:01:53 INF - ----------------------------------------
0:01:53 INF - 	overall          1m54s cpu:3m21s gc:4s avg:1.8
0:01:53 INF - 	lake_centerlines 3s cpu:6s avg:1.8
0:01:53 INF - 	  read     1x(44% 2s done:2s)
0:01:53 INF - 	  process  2x(6% 0.2s wait:2s done:2s)
0:01:53 INF - 	  write    1x(0% 0s wait:2s done:1s)
0:01:53 INF - 	water_polygons   27s cpu:48s gc:2s avg:1.8
0:01:53 INF - 	  read     1x(54% 15s wait:4s)
0:01:53 INF - 	  process  2x(30% 8s wait:12s)
0:01:53 INF - 	  write    1x(2% 0.5s wait:26s)
0:01:53 INF - 	natural_earth    13s cpu:13s avg:1
0:01:53 INF - 	  read     1x(59% 8s sys:1s done:4s)
0:01:53 INF - 	  process  2x(9% 1s wait:9s done:4s)
0:01:53 INF - 	  write    1x(0% 0s wait:9s done:4s)
0:01:53 INF - 	osm_pass1        3s cpu:6s avg:1.7
0:01:53 INF - 	  read     1x(1% 0s wait:3s)
0:01:53 INF - 	  parse    1x(65% 2s)
0:01:53 INF - 	  process  1x(41% 1s wait:2s)
0:01:53 INF - 	osm_pass2        34s cpu:1m5s avg:1.9
0:01:53 INF - 	  read     1x(0% 0s wait:18s done:15s)
0:01:53 INF - 	  process  2x(78% 26s)
0:01:53 INF - 	  write    1x(1% 0.4s wait:33s)
0:01:53 INF - 	boundaries       0s cpu:0s avg:1.3
0:01:53 INF - 	sort             2s cpu:3s avg:1.2
0:01:53 INF - 	  worker  1x(73% 2s)
0:01:53 INF - 	archive          30s cpu:58s gc:1s avg:1.9
0:01:53 INF - 	  read    1x(2% 0.6s wait:28s)
0:01:53 INF - 	  encode  2x(63% 19s)
0:01:53 INF - 	  write   1x(13% 4s wait:24s)
0:01:53 INF - ----------------------------------------
0:01:53 INF - 	archive	109MB
0:01:53 INF - 	features	283MB
-rw-r--r-- 1 runner docker 66M Sep 22 01:19 run.jar

https://github.com/onthegomap/planetiler/actions/runs/6268905270
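
The "Biggest tiles" entries in the log above pair each z/x/y tile with a demo link of the form #<zoom>/<lat>/<lon>. The following Python sketch shows one way to go from a tile coordinate to such a link using the standard Web Mercator (slippy map) tile formulas. The midpoint-of-bounds choice and the zoom + 0.5 convention are inferred from the logged URLs and are not taken from planetiler's source. The function names are hypothetical.

```python
import math


def tile_edge_lat(z, y):
    """Latitude in degrees of the northern edge of tile row y at zoom z."""
    n = 2 ** z
    return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))


def tile_center(z, x, y):
    """(lat, lon) midpoint of the tile's latitude/longitude bounds."""
    n = 2 ** z
    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    north = tile_edge_lat(z, y)
    south = tile_edge_lat(z, y + 1)
    return (north + south) / 2, (west + east) / 2


def demo_url(z, x, y):
    """Build a link shaped like the ones in the log (assumed format)."""
    lat, lon = tile_center(z, x, y)
    return (
        "https://onthegomap.github.io/planetiler-demo/"
        f"#{z + 0.5}/{lat:.5f}/{lon:.5f}"
    )


print(demo_url(9, 154, 190))
```

For tile 9/154/190 this gives a center within about a thousandth of a degree of the logged 41.77078/-71.36719.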

ℹ️ Base Logs e473c42
0:00:00 DEB - argument: config=null (path to config file)
0:00:00 DEB - argument: area=rhode island (name of the extract to download if osm_url/osm_path not specified (i.e. 'monaco' 'rhode island' 'australia' or 'planet'))
0:00:00 INF - argument: stats=use in-memory stats
0:00:00 DEB - argument: madvise=true (default value for whether to use linux madvise(random) to improve memory-mapped read performance for temporary storage)
0:00:00 DEB - argument: storage=mmap (default storage type for temporary data, one of [ram, mmap, direct])
0:00:00 DEB - argument: threads=2 (num threads)
0:00:00 DEB - argument: write_threads=1 (number of threads to use when writing temp features)
0:00:00 DEB - argument: process_threads=2 (number of threads to use when processing input features)
0:00:00 DEB - argument: bounds=Env[-74.07 : -17.84, 21.34 : 43.55] (bounds)
0:00:00 DEB - argument: polygon=null (a .poly file that limits output to tiles intersecting the shape)
0:00:00 DEB - argument: minzoom=0 (minimum zoom level)
0:00:00 DEB - argument: maxzoom=14 (maximum zoom level up to 15)
0:00:00 DEB - argument: render_maxzoom=14 (maximum rendering zoom level up to 15)
0:00:00 DEB - argument: feature_read_threads=1 (number of threads to use when reading features at tile write time)
0:00:00 DEB - argument: tile_write_threads=1 (number of threads used to write tiles - only supported by [csv, tsv, proto, pbf, json])
0:00:00 DEB - argument: loginterval=10 seconds (time between logs)
0:00:00 DEB - argument: force=false (overwriting output file and ignore disk/RAM warnings)
0:00:00 DEB - argument: append=false (append to the output file - only supported by [csv, tsv, proto, pbf, json])
0:00:00 DEB - argument: gzip_temp=false (gzip temporary feature storage (uses more CPU, but less disk space))
0:00:00 DEB - argument: mmap_temp=true (use memory-mapped IO for temp feature files)
0:00:00 DEB - argument: sort_max_readers=6 (maximum number of concurrent read threads to use when sorting chunks)
0:00:00 DEB - argument: sort_max_writers=6 (maximum number of concurrent write threads to use when sorting chunks)
0:00:00 DEB - argument: nodemap_type=sparsearray (type of node location map, one of [noop, sortedtable, sparsearray, array])
0:00:00 DEB - argument: nodemap_storage=mmap (storage for node location map, one of [ram, mmap, direct])
0:00:00 DEB - argument: nodemap_madvise=true (use linux madvise(random) for node locations)
0:00:00 DEB - argument: multipolygon_geometry_storage=mmap (storage for multipolygon geometries, one of [ram, mmap, direct])
0:00:00 DEB - argument: multipolygon_geometry_madvise=true (use linux madvise(random) for temporary multipolygon geometry storage)
0:00:00 DEB - argument: http_user_agent=Planetiler downloader (https://github.com/onthegomap/planetiler) (User-Agent header to set when downloading files over HTTP)
0:00:00 DEB - argument: http_timeout=30 seconds (Timeout to use when downloading files over HTTP)
0:00:00 DEB - argument: http_retries=1 (Retries to use when downloading files over HTTP)
0:00:00 DEB - argument: download_chunk_size_mb=100 (Size of file chunks to download in parallel in megabytes)
0:00:00 DEB - argument: download_threads=1 (Number of parallel threads to use when downloading each file)
0:00:00 DEB - argument: download_max_bandwidth= (Maximum bandwidth to consume when downloading files in units mb/s, mbps, kbps, etc.)
0:00:00 DEB - argument: min_feature_size_at_max_zoom=0.0625 (Default value for the minimum size in tile pixels of features to emit at the maximum zoom level to allow for overzooming)
0:00:00 DEB - argument: min_feature_size=1.0 (Default value for the minimum size in tile pixels of features to emit below the maximum zoom level)
0:00:00 DEB - argument: simplify_tolerance_at_max_zoom=0.0625 (Default value for the tile pixel tolerance to use when simplifying features at the maximum zoom level to allow for overzooming)
0:00:00 DEB - argument: simplify_tolerance=0.1 (Default value for the tile pixel tolerance to use when simplifying features below the maximum zoom level)
0:00:00 DEB - argument: osm_lazy_reads=true (Read OSM blocks from disk in worker threads)
0:00:00 DEB - argument: skip_filled_tiles=false (Skip writing tiles containing only polygon fills to the output)
0:00:00 DEB - argument: tile_warning_size_mb=1.0 (Maximum size in megabytes of a tile to emit a warning about)
0:00:00 DEB - argument: color=null (Color the terminal output)
0:00:00 DEB - argument: keep_unzipped=true (keep unzipped sources by default after reading)
0:00:00 DEB - argument: tile_compression=gzip (the tile compression, one of [gzip, none])
0:00:00 DEB - argument: tmpdir=data/tmp (temp directory)
0:00:00 DEB - argument: only_download=false (download source data then exit)
0:00:00 DEB - argument: download=false (download sources)
0:00:00 DEB - argument: temp_nodes=data/tmp/node.db (temp node db location)
0:00:00 DEB - argument: temp_multipolygons=data/tmp/multipolygon.db (temp multipolygon db location)
0:00:00 DEB - argument: temp_features=data/tmp/feature.db (temp feature db location)
0:00:00 DEB - argument: osm_parse_node_bounds=false (parse bounds from OSM nodes instead of header)
0:00:00 DEB - argument: only_fetch_wikidata=false (fetch wikidata translations then quit)
0:00:00 DEB - argument: fetch_wikidata=false (fetch wikidata translations then continue)
0:00:00 DEB - argument: use_wikidata=true (use wikidata translations)
0:00:00 DEB - argument: wikidata_cache=data/sources/wikidata_names.json (wikidata cache file)
0:00:00 DEB - argument: lake_centerlines_path=data/sources/lake_centerline.shp.zip (lake_centerlines shapefile path)
0:00:00 DEB - argument: free_lake_centerlines_after_read=false (delete lake_centerlines input file after reading to make space for output (reduces peak disk usage))
0:00:00 DEB - argument: water_polygons_path=data/sources/water-polygons-split-3857.zip (water_polygons shapefile path)
0:00:00 DEB - argument: free_water_polygons_after_read=false (delete water_polygons input file after reading to make space for output (reduces peak disk usage))
0:00:00 DEB - argument: natural_earth_path=data/sources/natural_earth_vector.sqlite.zip (natural_earth sqlite db path)
0:00:00 DEB - argument: free_natural_earth_after_read=false (delete natural_earth input file after reading to make space for output (reduces peak disk usage))
0:00:00 DEB - argument: natural_earth_keep_unzipped=true (keep unzipped natural_earth after reading)
0:00:00 DEB - argument: osm_path=data/sources/rhode_island.osm.pbf (osm OSM input file path)
0:00:00 DEB - argument: free_osm_after_read=false (delete osm input file after reading to make space for output (reduces peak disk usage))
0:00:00 DEB - argument: output=data/out.mbtiles (output tile archive path)
0:00:00 DEB - argument: version=false (show version then exit)
0:00:00 INF - Planetiler build git hash: e473c429c442d8a044f11e59e4990e2a8dbbdd14
0:00:00 INF - Planetiler build version: 0.6-SNAPSHOT
0:00:00 INF - Planetiler build timestamp: 2023-09-22T01:19:20.756Z
0:00:00 DEB - argument: transliterate=true (attempt to transliterate latin names)
0:00:00 DEB - argument: languages=am,ar,az,be,bg,br,bs,ca,co,cs,cy,da,de,el,en,eo,es,et,eu,fi,fr,fy,ga,gd,he,hi,hr,hu,hy,id,is,it,ja,ja_kana,ja_rm,ja-Latn,ja-Hira,ka,kk,kn,ko,ko-Latn,ku,la,lb,lt,lv,mk,mt,ml,nl,no,oc,pl,pt,rm,ro,ru,sk,sl,sq,sr,sr-Latn,sv,ta,te,th,tr,uk,zh (languages to use)
0:00:00 DEB - argument: only_layers= (Include only certain layers)
0:00:00 DEB - argument: exclude_layers= (Exclude certain layers)
0:00:00 DEB - argument: boundary_country_names=true (boundary layer: add left/right codes of neighboring countries)
0:00:00 DEB - argument: boundary_osm_only=false (boundary layer: only use OSM, even at low zoom levels)
0:00:00 DEB - argument: transportation_z13_paths=false (transportation(_name) layer: show all paths on z13)
0:00:00 DEB - argument: building_merge_z13=true (building layer: merge nearby buildings at z13)
0:00:00 DEB - argument: transportation_name_brunnel=false (transportation_name layer: set to false to omit brunnel and help merge long highways)
0:00:00 DEB - argument: transportation_name_size_for_shield=false (transportation_name layer: allow road names on shorter segments (ie. they will have a shield))
0:00:00 DEB - argument: transportation_name_limit_merge=false (transportation_name layer: limit merge so we don't combine different relations to help merge long highways)
0:00:00 DEB - argument: transportation_name_minor_refs=false (transportation_name layer: include name and refs from minor road networks if not present on a way)
0:00:00 DEB - argument: help=false (show arguments then exit)
0:00:00 INF - Building OpenMapTilesProfile profile into file:///home/runner/work/planetiler/planetiler/data/out.mbtiles in these phases:
0:00:00 INF -   lake_centerlines: Process features in data/sources/lake_centerline.shp.zip
0:00:00 INF -   water_polygons: Process features in data/sources/water-polygons-split-3857.zip
0:00:00 INF -   natural_earth: Process features in data/sources/natural_earth_vector.sqlite.zip
0:00:00 INF -   osm_pass1: Pre-process OpenStreetMap input (store node locations then relation members)
0:00:00 INF -   osm_pass2: Process OpenStreetMap nodes, ways, then relations
0:00:00 INF -   sort: Sort rendered features by tile ID
0:00:00 INF -   archive: Encode each tile and write to TileArchiveConfig[format=MBTILES, scheme=FILE, uri=file:///home/runner/work/planetiler/planetiler/data/out.mbtiles, options={}]
0:00:00 INF - no wikidata translations found, run with --fetch-wikidata to download
0:00:00 DEB - ✓ 197M storage on / (/dev/root) requested for read phase disk, 19G available
0:00:00 DEB -  - 44M used for temporary node location cache
0:00:00 DEB -  - 6.7M used for temporary multipolygon geometry cache
0:00:00 DEB -  - 146M used for temporary feature storage
0:00:00 DEB - ✓ 219M storage on / (/dev/root) requested for write phase disk, 19G available
0:00:00 DEB -  - 146M used for temporary feature storage
0:00:00 DEB -  - 73M used for archive output
0:00:00 DEB - ✓ 313M JVM heap requested for read phase, 4.2G available
0:00:00 DEB -  - 300M used for sparsearray node location in-memory index
0:00:00 DEB -  - 13M used for temporary profile storage
0:00:00 DEB - ✓ 51M storage on / (/dev/root) requested for read phase, 19G available
0:00:00 DEB -  - 44M used for sparsearray node location cache
0:00:00 DEB -  - 6.7M used for multipolygon way geometries
0:00:00 DEB - ✓ 51M temporary files and 2.9G of free memory for OS to cache them
0:00:00 DEB - argument: archive_name=OpenMapTiles ('name' attribute for tileset metadata)
0:00:00 DEB - argument: archive_description=A tileset showcasing all layers in OpenMapTiles. https://openmaptiles.org ('description' attribute for tileset metadata)
0:00:00 DEB - argument: archive_attribution=<a href="https://www.openmaptiles.org/" target="_blank">&copy; OpenMapTiles</a> <a href="https://www.openstreetmap.org/copyright" target="_blank">&copy; OpenStreetMap contributors</a> ('attribution' attribute for tileset metadata)
0:00:00 DEB - argument: archive_version=3.14.0 ('version' attribute for tileset metadata)
0:00:00 DEB - argument: archive_type=baselayer ('type' attribute for tileset metadata)
0:00:00 DEB - argument: archive_format=pbf ('format' attribute for tileset metadata)
0:00:00 DEB - argument: compact=true (mbtiles: reduce the DB size by separating and deduping the tile data)
0:00:00 DEB - argument: no_index=false (mbtiles: skip adding index to sqlite DB)
0:00:00 DEB - argument: vacuum_analyze=false (mbtiles: vacuum analyze sqlite DB after writing)
0:00:00 INF - Using merge sort feature map, chunk size=1431mb max workers=2
0:00:01 INF [lake_centerlines] - 
0:00:01 INF [lake_centerlines] - Starting...
0:00:04 INF [lake_centerlines] -  read: [  59k 100%  30k/s ] write: [    0    0/s ] 0    
    cpus: 1.7 gc:  3% heap: 174M/4.2G direct: 237k postGC: 78M
    ->     (0/3) -> read( -%) ->    (0/1k) -> process( -%  -%) ->   (0/53k) -> write( -%)
0:00:04 INF [lake_centerlines] - Finished in 3s cpu:6s avg:1.7
0:00:04 INF [lake_centerlines] -   read     1x(45% 1s done:1s)
0:00:04 INF [lake_centerlines] -   process  2x(6% 0.2s wait:2s done:1s)
0:00:04 INF [lake_centerlines] -   write    1x(0% 0s wait:2s done:1s)
0:00:04 INF [water_polygons] - 
0:00:04 INF [water_polygons] - Starting...
0:00:14 INF [water_polygons] -  read: [ 2.4k  17%  244/s ] write: [  94k 9.3k/s ] 1.4G 
    cpus: 1.9 gc: 10% heap: 1.1G/4.2G direct: 54M postGC: 1G
    ->     (0/3) -> read(57%) ->    (0/1k) -> process(35% 20%) ->  (1k/53k) -> write( 0%)
0:00:24 INF [water_polygons] -  read: [ 5.7k  39%  328/s ] write: [ 324k  22k/s ] 1.4G 
    cpus: 1.7 gc:  7% heap: 2G/4.2G direct: 54M postGC: 1.5G
    ->     (0/3) -> read(70%) ->    (0/1k) -> process(21% 19%) -> (409/53k) -> write( 0%)
0:00:30 INF [water_polygons] -  read: [  14k 100% 1.3k/s ] write: [ 4.3M 625k/s ] 193M 
    cpus: 1.8 gc:  9% heap: 2.2G/4.2G direct: 54M postGC: 1.9G
    ->     (0/3) -> read( -%) ->    (0/1k) -> process( -%  -%) ->   (0/53k) -> write( -%)
0:00:30 INF [water_polygons] - Finished in 26s cpu:48s gc:2s avg:1.8
0:00:30 INF [water_polygons] -   read     1x(55% 15s sys:1s wait:3s)
0:00:30 INF [water_polygons] -   process  2x(30% 8s wait:12s)
0:00:30 INF [water_polygons] -   write    1x(2% 0.5s wait:26s)
0:00:30 INF [natural_earth] - 
0:00:30 INF [natural_earth] - Starting...
0:00:42 INF [natural_earth] -  read: [ 349k 100%  43k/s ] write: [  181   22/s ] 193M 
    cpus: 1.5 gc:  1% heap: 140M/4.2G direct: 54M postGC: 168M
    ->     (0/3) -> read( -%) ->    (0/1k) -> process( -%  -%) ->   (0/53k) -> write( -%)
0:00:42 INF [natural_earth] - Finished in 12s cpu:13s avg:1.1
0:00:42 INF [natural_earth] -   read     1x(63% 7s sys:1s done:4s)
0:00:42 INF [natural_earth] -   process  2x(10% 1s wait:7s done:3s)
0:00:42 INF [natural_earth] -   write    1x(0% 0s wait:8s done:3s)
0:00:42 INF [osm_pass1] - 
0:00:42 INF [osm_pass1] - Starting...
0:00:45 INF [osm_pass1:process] - Finished nodes: 4,641,468 (1.7M/s) in 3s cpu:4s avg:1.4
0:00:45 INF [osm_pass1:process] - Finished ways: 345,544 (464k/s) in 0.7s cpu:1s avg:2
0:00:46 INF [osm_pass1:process] - Finished relations: 5,863 (52k/s) in 0.1s cpu:0.2s avg:2
0:00:46 INF [osm_pass1] -  nodes: [ 4.6M 1.2M/s ] 480M  ways: [ 345k  96k/s ] rels: [ 5.8k 1.6k/s ] blocks: [  626  174/s ]
    cpus: 1.6 gc:  0% heap: 3.3G/4.2G direct: 54M postGC: 234M hppc: 461k
    read( -%) ->     (0/4) -> parse( -%) ->     (0/4) -> process( -%)
0:00:46 DEB [osm_pass1] - Processed 626 blocks:
0:00:46 DEB [osm_pass1] -   nodes: 4,641,468 (1.7M/s) in 3s cpu:4s avg:1.4
0:00:46 DEB [osm_pass1] -   ways: 345,544 (464k/s) in 0.7s cpu:1s avg:2
0:00:46 DEB [osm_pass1] -   relations: 5,863 (52k/s) in 0.1s cpu:0.2s avg:2
0:00:46 INF [osm_pass1] - Finished in 4s cpu:6s avg:1.6
0:00:46 INF [osm_pass1] -   read     1x(1% 0s wait:3s)
0:00:46 INF [osm_pass1] -   parse    1x(61% 2s wait:1s)
0:00:46 INF [osm_pass1] -   process  1x(40% 1s wait:1s)
0:00:46 INF [osm_pass2] - 
0:00:46 INF [osm_pass2] - Starting...
0:00:48 DEB [osm_pass2:process] - Sorting long long multimap...
0:00:48 INF [osm_pass2:process] - Finished nodes: 4,641,468 (1.7M/s) in 3s cpu:5s avg:2
0:00:48 DEB [osm_pass2:process] - Sorted long long multimap 0s cpu:0s avg:2
0:00:48 WAR [osm_pass2:process] - No GB polygon for inferring route network types
0:00:56 INF [osm_pass2] -  nodes: [ 4.6M 100% 463k/s ] 480M  ways: [ 157k  46%  15k/s ] rels: [    0   0%    0/s ] features: [ 4.8M  58k/s ] 1.6G  blocks: [  600  96%   59/s ]
    cpus: 2 gc:  1% heap: 2.8G/4.2G direct: 54M postGC: 946M relInfo: 420k mpGeoms: 474k 
    read( 0%) ->   (11/13) -> process(63% 67%) -> (290/53k) -> write( 2%)
0:01:06 INF [osm_pass2] -  nodes: [ 4.6M 100%    0/s ] 480M  ways: [ 292k  85%  13k/s ] rels: [    0   0%    0/s ] features: [ 5.2M  37k/s ] 1.6G  blocks: [  616  98%    1/s ]
    cpus: 2 gc:  1% heap: 1.8G/4.2G direct: 54M postGC: 952M relInfo: 420k mpGeoms: 17M  
    read( -%) ->    (8/13) -> process(82% 82%) -> (368/53k) -> write( 1%)
0:01:07 INF [osm_pass2:process] - Finished ways: 345,544 (18k/s) in 19s cpu:37s avg:2
0:01:16 INF [osm_pass2] -  nodes: [ 4.6M 100%    0/s ] 480M  ways: [ 345k 100% 5.3k/s ] rels: [ 4.5k  77%  450/s ] features: [ 5.4M  15k/s ] 1.6G  blocks: [  625 100%   <1/s ]
    cpus: 2 gc:  1% heap: 2.7G/4.2G direct: 54M postGC: 951M relInfo: 420k mpGeoms: 18M  
    read( -%) ->    (0/13) -> process(77% 83%) -> (975/53k) -> write( 1%)
0:01:16 INF [osm_pass2:process] - Finished relations: 5,863 (629/s) in 9s cpu:18s avg:2
0:01:20 INF [osm_pass2] -  nodes: [ 4.6M 100%    0/s ] 480M  ways: [ 345k 100%    0/s ] rels: [ 5.8k 100%  299/s ] features: [ 5.4M 2.2k/s ] 283M  blocks: [  626 100%   <1/s ]
    cpus: 1.9 gc:  1% heap: 3.2G/4.2G direct: 54M postGC: 942M relInfo: 420k mpGeoms: 18M  
    read( -%) ->    (0/13) -> process( -%  -%) ->   (0/53k) -> write( -%)
0:01:20 DEB [osm_pass2] - Processed 626 blocks:
0:01:20 DEB [osm_pass2] -   nodes: 4,641,468 (1.7M/s) in 3s cpu:5s avg:2
0:01:20 DEB [osm_pass2] -   ways: 345,544 (18k/s) in 19s cpu:37s avg:2
0:01:20 DEB [osm_pass2] -   relations: 5,863 (629/s) in 9s cpu:18s avg:2
0:01:20 INF [osm_pass2] - Finished in 35s cpu:1m8s avg:2
0:01:20 INF [osm_pass2] -   read     1x(0% 0s wait:19s done:16s)
0:01:20 INF [osm_pass2] -   process  2x(78% 27s)
0:01:20 INF [osm_pass2] -   write    1x(1% 0.4s wait:34s)
0:01:20 INF [boundaries] - 
0:01:20 INF [boundaries] - Starting...
0:01:20 INF [boundaries] - Creating polygons for 1 boundaries
0:01:20 WAR [boundaries] - Unable to form closed polygon for OSM relation 148838 (likely missing edges)
0:01:20 INF [boundaries] - Finished creating 0 country polygons
0:01:20 INF [boundaries] - Finished in 0s cpu:0s avg:1.1
0:01:20 INF - Deleting node.db to make room for output file
0:01:20 INF [sort] - 
0:01:20 INF [sort] - Starting...
0:01:20 INF [sort] - Grouped 8 chunks into 1
0:01:22 INF [sort] -  chunks: [   1 /   1 100% ] 283M 
    cpus: 1.5 gc:  7% heap: 1.3G/4.2G direct: 54M postGC: 1.1G
    ->     (0/3) -> worker( -%)
0:01:22 INF [sort] - Finished in 2s cpu:3s avg:1.5
0:01:22 INF [sort] -   worker  1x(88% 2s)
0:01:22 INF - read:0s write:0s sort:0s
0:01:22 INF [archive] - 
0:01:22 INF [archive] - Starting...
0:01:22 DEB [archive:write] - Execute mbtiles: create table metadata (name text, value text);
0:01:23 DEB [archive:write] - Execute mbtiles: create unique index name on metadata (name);
0:01:23 DEB [archive:write] - Execute mbtiles: create table tiles_shallow (
  zoom_level integer,
  tile_column integer,
  tile_row integer,
  tile_data_id integer

  , primary key(zoom_level,tile_column,tile_row)

) without rowid

0:01:23 DEB [archive:write] - Execute mbtiles: create table tiles_data (
  tile_data_id integer primary key,
  tile_data blob
)

0:01:23 DEB [archive:write] - Execute mbtiles: create view tiles AS
select
  tiles_shallow.zoom_level as zoom_level,
  tiles_shallow.tile_column as tile_column,
  tiles_shallow.tile_row as tile_row,
  tiles_data.tile_data as tile_data
from tiles_shallow
join tiles_data on tiles_shallow.tile_data_id = tiles_data.tile_data_id

0:01:23 DEB [archive:write] - Set mbtiles metadata: format=pbf
0:01:23 DEB [archive:write] - Set mbtiles metadata: center=-45.955,32.445,3
0:01:23 DEB [archive:write] - Set mbtiles metadata: bounds=-74.07,21.34,-17.84,43.55
0:01:23 DEB [archive:write] - Set mbtiles metadata: json={"vector_layers":[{"id":"aerodrome_label","fields":{"name_int":"String","iata":"String","ele_ft":"Number","name_de":"String","name":"String","icao":"String","name:en":"String","class":"String","name_en":"String","name:latin":"String","ele":"Number"},"minzoom":10,"maxzoom":14},{"id":"aeroway","fields":{"ref":"String","class":"String"},"minzoom":10,"maxzoom":14},{"id":"boundary","fields":{"disputed":"Number","admin_level":"Number","maritime":"Number"},"minzoom":0,"maxzoom":14},{"id":"building","fields":{"colour":"String","render_height":"Number","render_min_height":"Number","hide_3d":"Boolean"},"minzoom":13,"maxzoom":14},{"id":"housenumber","fields":{"housenumber":"String"},"minzoom":14,"maxzoom":14},{"id":"landcover","fields":{"subclass":"String","class":"String","_numpoints":"Number"},"minzoom":7,"maxzoom":14},{"id":"landuse","fields":{"class":"String"},"minzoom":4,"maxzoom":14},{"id":"mountain_peak","fields":{"name_int":"String","customary_ft":"Number","ele_ft":"Number","name_de":"Str... 2358 more characters
0:01:23 DEB [archive:write] - Set mbtiles metadata: name=OpenMapTiles
0:01:23 DEB [archive:write] - Set mbtiles metadata: description=A tileset showcasing all layers in OpenMapTiles. https://openmaptiles.org
0:01:23 DEB [archive:write] - Set mbtiles metadata: attribution=<a href="https://www.openmaptiles.org/" target="_blank">&copy; OpenMapTiles</a> <a href="https://www.openstreetmap.org/copyright" target="_blank">&copy; OpenStreetMap contributors</a>
0:01:23 DEB [archive:write] - Set mbtiles metadata: version=3.14.0
0:01:23 DEB [archive:write] - Set mbtiles metadata: type=baselayer
0:01:23 DEB [archive:write] - Set mbtiles metadata: minzoom=0
0:01:23 DEB [archive:write] - Set mbtiles metadata: maxzoom=14
0:01:23 DEB [archive:write] - Set mbtiles metadata: compression=gzip
0:01:23 DEB [archive:write] - Set mbtiles metadata: planetiler:version=0.6-SNAPSHOT
0:01:23 DEB [archive:write] - Set mbtiles metadata: planetiler:githash=e473c429c442d8a044f11e59e4990e2a8dbbdd14
0:01:23 DEB [archive:write] - Set mbtiles metadata: planetiler:buildtime=2023-09-22T01:19:20.756Z
0:01:23 DEB [archive:write] - Set mbtiles metadata: planetiler:osm:osmosisreplicationtime=2023-09-21T20:21:26Z
0:01:23 DEB [archive:write] - Set mbtiles metadata: planetiler:osm:osmosisreplicationseq=3829
0:01:23 DEB [archive:write] - Set mbtiles metadata: planetiler:osm:osmosisreplicationurl=http://download.geofabrik.de/north-america/us/rhode-island-updates
0:01:25 INF [archive:write] - Starting z0
0:01:25 INF [archive:write] - Finished z0 in 0s cpu:0s avg:0, now starting z1
0:01:25 INF [archive:write] - Finished z1 in 0s cpu:0s avg:0, now starting z2
0:01:25 INF [archive:write] - Finished z2 in 0s cpu:0s avg:0, now starting z3
0:01:25 INF [archive:write] - Finished z3 in 0s cpu:0s avg:0, now starting z4
0:01:25 INF [archive:write] - Finished z4 in 0s cpu:0s avg:0, now starting z5
0:01:25 INF [archive:write] - Finished z5 in 0s cpu:0s avg:55.1, now starting z6
0:01:25 INF [archive:write] - Finished z6 in 0s cpu:0s avg:0, now starting z7
0:01:25 INF [archive:write] - Finished z7 in 0.9s cpu:2s avg:2, now starting z8
0:01:27 INF [archive:write] - Finished z8 in 2s cpu:4s avg:2, now starting z9
0:01:31 INF [archive:write] - Finished z9 in 3s cpu:6s avg:2, now starting z10
0:01:32 INF [archive:write] - Finished z10 in 1s cpu:2s avg:2, now starting z11
0:01:32 INF [archive] -  features: [ 174k   3%  17k/s ] 283M  tiles: [  18k 1.8k/s ] 2.7M 
    cpus: 2 gc:  2% heap: 1.8G/4.2G direct: 54M postGC: 1.4G
    read( 1%) -> (214/217) -> encode(60% 61%) -> (215/216) -> write( 1%)
    last tile: 11/616/762 (z11 4%) https://www.openstreetmap.org/#map=11/41.77131/-71.71875
0:01:34 INF [archive:write] - Finished z11 in 3s cpu:6s avg:2, now starting z12
0:01:38 INF [archive:write] - Finished z12 in 4s cpu:8s avg:2, now starting z13
0:01:43 INF [archive] -  features: [ 856k  16%  68k/s ] 283M  tiles: [ 291k  27k/s ] 13M  
    cpus: 2 gc:  2% heap: 2.2G/4.2G direct: 54M postGC: 1.5G
    read( 1%) -> (214/217) -> encode(69% 68%) -> (215/216) -> write( 3%)
    last tile: 13/2470/3047 (z13 5%) https://www.openstreetmap.org/#map=13/41.80408/-71.45508
0:01:46 INF [archive:write] - Finished z13 in 8s cpu:16s avg:2, now starting z14
0:01:52 DEB [archive:write] - Shallow tiles written: 4,115,061
0:01:52 DEB [archive:write] - Tile data written: 17,918 (100% omitted)
0:01:52 DEB [archive:write] - Unique tile hashes: 9,044
0:01:52 INF [archive:write] - Finished z14 in 5s cpu:10s avg:1.9
0:01:52 INF [archive] -  features: [ 5.4M 100% 506k/s ] 283M  tiles: [ 4.1M 422k/s ] 109M 
    cpus: 1.9 gc:  3% heap: 2G/4.2G direct: 54M postGC: 1.5G
    read( -%) ->   (0/217) -> encode( -%  -%) ->   (0/216) -> write( -%)
    last tile: 14/7380/5985 (z14 100%) https://www.openstreetmap.org/#map=14/43.56447/-17.84180
0:01:52 DEB [archive] - Tile stats:
0:01:52 DEB [archive] - z0 avg:7.9k max:7.9k
0:01:52 DEB [archive] - z1 avg:4k max:4k
0:01:52 DEB [archive] - z2 avg:9.4k max:9.4k
0:01:52 DEB [archive] - z3 avg:4k max:6.4k
0:01:52 DEB [archive] - z4 avg:1.6k max:4.6k
0:01:52 DEB [archive] - z5 avg:1.4k max:7.2k
0:01:52 DEB [archive] - z6 avg:973 max:22k
0:01:52 DEB [archive] - z7 avg:769 max:58k
0:01:52 DEB [archive] - z8 avg:418 max:127k
0:01:52 DEB [archive] - z9 avg:282 max:298k
0:01:52 DEB [archive] - z10 avg:161 max:256k
0:01:52 DEB [archive] - z11 avg:106 max:136k
0:01:52 DEB [archive] - z12 avg:85 max:114k
0:01:52 DEB [archive] - z13 avg:72 max:128k
0:01:52 DEB [archive] - z14 avg:68 max:304k
0:01:52 DEB [archive] - all avg:70 max:304k
0:01:52 DEB [archive] -  # features: 5,440,450
0:01:52 DEB [archive] -     # tiles: 4,115,061
0:01:52 INF [archive] - Finished in 30s cpu:58s gc:1s avg:1.9
0:01:52 INF [archive] -   read    1x(2% 0.6s wait:28s)
0:01:52 INF [archive] -   encode  2x(64% 19s)
0:01:52 INF [archive] -   write   1x(13% 4s wait:24s)
0:01:52 INF - Finished in 1m52s cpu:3m22s gc:4s avg:1.8
0:01:52 INF - FINISHED!
0:01:52 INF - 
0:01:52 INF - ----------------------------------------
0:01:52 INF - data errors:
0:01:52 INF - 	render_snap_fix_input	16,475
0:01:52 INF - 	osm_boundary_missing_way	63
0:01:52 INF - 	osm_multipolygon_missing_way	57
0:01:52 INF - 	merge_snap_fix_input	14
0:01:52 INF - 	feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix	2
0:01:52 INF - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	2
0:01:52 INF - 	omt_park_area_osm_invalid_multipolygon_empty_after_fix	1
0:01:52 INF - ----------------------------------------
0:01:52 INF - 	overall          1m52s cpu:3m22s gc:4s avg:1.8
0:01:52 INF - 	lake_centerlines 3s cpu:6s avg:1.7
0:01:52 INF - 	  read     1x(45% 1s done:1s)
0:01:52 INF - 	  process  2x(6% 0.2s wait:2s done:1s)
0:01:52 INF - 	  write    1x(0% 0s wait:2s done:1s)
0:01:52 INF - 	water_polygons   26s cpu:48s gc:2s avg:1.8
0:01:52 INF - 	  read     1x(55% 15s sys:1s wait:3s)
0:01:52 INF - 	  process  2x(30% 8s wait:12s)
0:01:52 INF - 	  write    1x(2% 0.5s wait:26s)
0:01:52 INF - 	natural_earth    12s cpu:13s avg:1.1
0:01:52 INF - 	  read     1x(63% 7s sys:1s done:4s)
0:01:52 INF - 	  process  2x(10% 1s wait:7s done:3s)
0:01:52 INF - 	  write    1x(0% 0s wait:8s done:3s)
0:01:52 INF - 	osm_pass1        4s cpu:6s avg:1.6
0:01:52 INF - 	  read     1x(1% 0s wait:3s)
0:01:52 INF - 	  parse    1x(61% 2s wait:1s)
0:01:52 INF - 	  process  1x(40% 1s wait:1s)
0:01:52 INF - 	osm_pass2        35s cpu:1m8s avg:2
0:01:52 INF - 	  read     1x(0% 0s wait:19s done:16s)
0:01:52 INF - 	  process  2x(78% 27s)
0:01:52 INF - 	  write    1x(1% 0.4s wait:34s)
0:01:52 INF - 	boundaries       0s cpu:0s avg:1.1
0:01:52 INF - 	sort             2s cpu:3s avg:1.5
0:01:52 INF - 	  worker  1x(88% 2s)
0:01:52 INF - 	archive          30s cpu:58s gc:1s avg:1.9
0:01:52 INF - 	  read    1x(2% 0.6s wait:28s)
0:01:52 INF - 	  encode  2x(64% 19s)
0:01:52 INF - 	  write   1x(13% 4s wait:24s)
0:01:52 INF - ----------------------------------------
0:01:52 INF - 	archive	109MB
0:01:52 INF - 	features	283MB
-rw-r--r-- 1 runner docker 66M Sep 22 01:21 run.jar
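
A note on the `last tile:` links, which differ between the two runs for the same tile. For `11/616/762`, the base log links to `openstreetmap.org/#map=11/41.77131/-71.71875`, while this branch's `debug_url` produces `#11.5/41.7057/-71.63086`. Those numbers are consistent with the base URL pointing at the tile's north-west corner and the branch URL pointing at the tile center with the zoom raised by 0.5. The sketch below reproduces both under the assumption of standard Web Mercator XYZ tiling. The class and method names are hypothetical and this is not planetiler's own code.

```java
// Hypothetical helper for illustration only, not part of planetiler.
// Converts (possibly fractional) XYZ tile coordinates to WGS84 degrees,
// assuming the usual Web Mercator tiling where row 0 is the northern edge.
final class TileCoords {
  private TileCoords() {}

  /** Longitude in degrees of tile column x at zoom z. */
  static double lon(int z, double x) {
    return x / Math.pow(2, z) * 360.0 - 180.0;
  }

  /** Latitude in degrees of tile row y at zoom z. */
  static double lat(int z, double y) {
    double n = Math.PI * (1.0 - 2.0 * y / Math.pow(2, z));
    return Math.toDegrees(Math.atan(Math.sinh(n)));
  }

  public static void main(String[] args) {
    int z = 11, x = 616, y = 762; // "last tile: 11/616/762" from the logs

    // North-west corner of the tile: matches the base run's link.
    System.out.printf("corner: %d/%.5f/%.5f%n", z, lat(z, y), lon(z, x));

    // Center of the tile (x + 0.5, y + 0.5) at zoom + 0.5: matches this branch's link.
    System.out.printf("center: %.1f/%.5f/%.5f%n", z + 0.5, lat(z, y + 0.5), lon(z, x + 0.5));
  }
}
```

The same arithmetic applied to `13/2470/3047` gives a center longitude of about -71.43311, which agrees with the branch log's z13 link.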
ℹ️ This Branch Logs f26c7b9
0:00:00 DEB - argument: config=null (path to config file)
0:00:00 DEB - argument: area=rhode island (name of the extract to download if osm_url/osm_path not specified (i.e. 'monaco' 'rhode island' 'australia' or 'planet'))
0:00:00 INF - argument: stats=use in-memory stats
0:00:00 DEB - argument: madvise=true (default value for whether to use linux madvise(random) to improve memory-mapped read performance for temporary storage)
0:00:00 DEB - argument: storage=mmap (default storage type for temporary data, one of [ram, mmap, direct])
0:00:00 DEB - argument: threads=2 (num threads)
0:00:00 DEB - argument: write_threads=1 (number of threads to use when writing temp features)
0:00:00 DEB - argument: process_threads=2 (number of threads to use when processing input features)
0:00:00 DEB - argument: bounds=Env[-74.07 : -17.84, 21.34 : 43.55] (bounds)
0:00:00 DEB - argument: polygon=null (a .poly file that limits output to tiles intersecting the shape)
0:00:00 DEB - argument: minzoom=0 (minimum zoom level)
0:00:00 DEB - argument: maxzoom=14 (maximum zoom level up to 15)
0:00:00 DEB - argument: render_maxzoom=14 (maximum rendering zoom level up to 15)
0:00:00 DEB - argument: tmpdir=data/tmp (temp directory)
0:00:00 DEB - argument: feature_read_threads=1 (number of threads to use when reading features at tile write time)
0:00:00 DEB - argument: tile_write_threads=1 (number of threads used to write tiles - only supported by [csv, tsv, proto, pbf, json])
0:00:00 DEB - argument: loginterval=10 seconds (time between logs)
0:00:00 DEB - argument: force=false (overwriting output file and ignore disk/RAM warnings)
0:00:00 DEB - argument: append=false (append to the output file - only supported by [csv, tsv, proto, pbf, json])
0:00:00 DEB - argument: gzip_temp=false (gzip temporary feature storage (uses more CPU, but less disk space))
0:00:00 DEB - argument: mmap_temp=true (use memory-mapped IO for temp feature files)
0:00:00 DEB - argument: sort_max_readers=6 (maximum number of concurrent read threads to use when sorting chunks)
0:00:00 DEB - argument: sort_max_writers=6 (maximum number of concurrent write threads to use when sorting chunks)
0:00:00 DEB - argument: nodemap_type=sparsearray (type of node location map, one of [noop, sortedtable, sparsearray, array])
0:00:00 DEB - argument: nodemap_storage=mmap (storage for node location map, one of [ram, mmap, direct])
0:00:00 DEB - argument: nodemap_madvise=true (use linux madvise(random) for node locations)
0:00:00 DEB - argument: multipolygon_geometry_storage=mmap (storage for multipolygon geometries, one of [ram, mmap, direct])
0:00:00 DEB - argument: multipolygon_geometry_madvise=true (use linux madvise(random) for temporary multipolygon geometry storage)
0:00:00 DEB - argument: http_user_agent=Planetiler downloader (https://github.com/onthegomap/planetiler) (User-Agent header to set when downloading files over HTTP)
0:00:00 DEB - argument: http_timeout=30 seconds (Timeout to use when downloading files over HTTP)
0:00:00 DEB - argument: http_retries=1 (Retries to use when downloading files over HTTP)
0:00:00 DEB - argument: download_chunk_size_mb=100 (Size of file chunks to download in parallel in megabytes)
0:00:00 DEB - argument: download_threads=1 (Number of parallel threads to use when downloading each file)
0:00:00 DEB - argument: download_max_bandwidth= (Maximum bandwidth to consume when downloading files in units mb/s, mbps, kbps, etc.)
0:00:00 DEB - argument: min_feature_size_at_max_zoom=0.0625 (Default value for the minimum size in tile pixels of features to emit at the maximum zoom level to allow for overzooming)
0:00:00 DEB - argument: min_feature_size=1.0 (Default value for the minimum size in tile pixels of features to emit below the maximum zoom level)
0:00:00 DEB - argument: simplify_tolerance_at_max_zoom=0.0625 (Default value for the tile pixel tolerance to use when simplifying features at the maximum zoom level to allow for overzooming)
0:00:00 DEB - argument: simplify_tolerance=0.1 (Default value for the tile pixel tolerance to use when simplifying features below the maximum zoom level)
0:00:00 DEB - argument: osm_lazy_reads=true (Read OSM blocks from disk in worker threads)
0:00:00 DEB - argument: skip_filled_tiles=false (Skip writing tiles containing only polygon fills to the output)
0:00:00 DEB - argument: tile_warning_size_mb=1.0 (Maximum size in megabytes of a tile to emit a warning about)
0:00:00 DEB - argument: color=null (Color the terminal output)
0:00:00 DEB - argument: keep_unzipped=true (keep unzipped sources by default after reading)
0:00:00 DEB - argument: tile_compression=gzip (the tile compression, one of [none, gzip])
0:00:00 DEB - argument: output_layerstats=false (output a tsv.gz file for each tile/layer size)
0:00:00 DEB - argument: debug_url=https://onthegomap.github.io/planetiler-demo/#{z}/{lat}/{lon} (debug url to use for displaying tiles with {z} {lat} {lon} placeholders)
0:00:00 DEB - argument: tile_weights=data/tile_weights.tsv.gz (tsv.gz file with columns z,x,y,loads to generate weighted average tile size stat)
0:00:00 DEB - argument: only_download=false (download source data then exit)
0:00:00 DEB - argument: download=false (download sources)
0:00:00 DEB - argument: download_osm_tile_weights=false (download OSM tile weights file)
0:00:00 DEB - argument: temp_nodes=data/tmp/node.db (temp node db location)
0:00:00 DEB - argument: temp_multipolygons=data/tmp/multipolygon.db (temp multipolygon db location)
0:00:00 DEB - argument: temp_features=data/tmp/feature.db (temp feature db location)
0:00:00 DEB - argument: osm_parse_node_bounds=false (parse bounds from OSM nodes instead of header)
0:00:00 DEB - argument: only_fetch_wikidata=false (fetch wikidata translations then quit)
0:00:00 DEB - argument: fetch_wikidata=false (fetch wikidata translations then continue)
0:00:00 DEB - argument: use_wikidata=true (use wikidata translations)
0:00:00 DEB - argument: wikidata_cache=data/sources/wikidata_names.json (wikidata cache file)
0:00:00 DEB - argument: lake_centerlines_path=data/sources/lake_centerline.shp.zip (lake_centerlines shapefile path)
0:00:00 DEB - argument: free_lake_centerlines_after_read=false (delete lake_centerlines input file after reading to make space for output (reduces peak disk usage))
0:00:00 DEB - argument: water_polygons_path=data/sources/water-polygons-split-3857.zip (water_polygons shapefile path)
0:00:00 DEB - argument: free_water_polygons_after_read=false (delete water_polygons input file after reading to make space for output (reduces peak disk usage))
0:00:00 DEB - argument: natural_earth_path=data/sources/natural_earth_vector.sqlite.zip (natural_earth sqlite db path)
0:00:00 DEB - argument: free_natural_earth_after_read=false (delete natural_earth input file after reading to make space for output (reduces peak disk usage))
0:00:00 DEB - argument: natural_earth_keep_unzipped=true (keep unzipped natural_earth after reading)
0:00:00 DEB - argument: osm_path=data/sources/rhode_island.osm.pbf (osm OSM input file path)
0:00:00 DEB - argument: free_osm_after_read=false (delete osm input file after reading to make space for output (reduces peak disk usage))
0:00:00 DEB - argument: output=data/out.mbtiles (output tile archive path)
0:00:00 DEB - argument: version=false (show version then exit)
0:00:00 INF - Planetiler build git hash: f26c7b9091d34a97b4d21e6455c44885e2cec81c
0:00:00 INF - Planetiler build version: 0.6-SNAPSHOT
0:00:00 INF - Planetiler build timestamp: 2023-09-22T01:18:46.026Z
0:00:00 DEB - argument: transliterate=true (attempt to transliterate latin names)
0:00:00 DEB - argument: languages=am,ar,az,be,bg,br,bs,ca,co,cs,cy,da,de,el,en,eo,es,et,eu,fi,fr,fy,ga,gd,he,hi,hr,hu,hy,id,is,it,ja,ja_kana,ja_rm,ja-Latn,ja-Hira,ka,kk,kn,ko,ko-Latn,ku,la,lb,lt,lv,mk,mt,ml,nl,no,oc,pl,pt,rm,ro,ru,sk,sl,sq,sr,sr-Latn,sv,ta,te,th,tr,uk,zh (languages to use)
0:00:00 DEB - argument: only_layers= (Include only certain layers)
0:00:00 DEB - argument: exclude_layers= (Exclude certain layers)
0:00:00 DEB - argument: boundary_country_names=true (boundary layer: add left/right codes of neighboring countries)
0:00:00 DEB - argument: boundary_osm_only=false (boundary layer: only use OSM, even at low zoom levels)
0:00:00 DEB - argument: transportation_z13_paths=false (transportation(_name) layer: show all paths on z13)
0:00:00 DEB - argument: building_merge_z13=true (building layer: merge nearby buildings at z13)
0:00:00 DEB - argument: transportation_name_brunnel=false (transportation_name layer: set to false to omit brunnel and help merge long highways)
0:00:00 DEB - argument: transportation_name_size_for_shield=false (transportation_name layer: allow road names on shorter segments (ie. they will have a shield))
0:00:00 DEB - argument: transportation_name_limit_merge=false (transportation_name layer: limit merge so we don't combine different relations to help merge long highways)
0:00:00 DEB - argument: transportation_name_minor_refs=false (transportation_name layer: include name and refs from minor road networks if not present on a way)
0:00:00 DEB - argument: help=false (show arguments then exit)
0:00:00 DEB - argument: layer_stats=/home/runner/work/planetiler/planetiler/data/out.mbtiles.layerstats.tsv.gz (layer stats output path)
0:00:00 INF - Building OpenMapTilesProfile profile into file:///home/runner/work/planetiler/planetiler/data/out.mbtiles in these phases:
0:00:00 INF -   lake_centerlines: Process features in data/sources/lake_centerline.shp.zip
0:00:00 INF -   water_polygons: Process features in data/sources/water-polygons-split-3857.zip
0:00:00 INF -   natural_earth: Process features in data/sources/natural_earth_vector.sqlite.zip
0:00:00 INF -   osm_pass1: Pre-process OpenStreetMap input (store node locations then relation members)
0:00:00 INF -   osm_pass2: Process OpenStreetMap nodes, ways, then relations
0:00:00 INF -   sort: Sort rendered features by tile ID
0:00:00 INF -   archive: Encode each tile and write to TileArchiveConfig[format=MBTILES, scheme=FILE, uri=file:///home/runner/work/planetiler/planetiler/data/out.mbtiles, options={}]
0:00:00 INF - no wikidata translations found, run with --fetch-wikidata to download
0:00:00 DEB - ✓ 197M storage on / (/dev/root) requested for read phase disk, 19G available
0:00:00 DEB -  - 44M used for temporary node location cache
0:00:00 DEB -  - 6.7M used for temporary multipolygon geometry cache
0:00:00 DEB -  - 146M used for temporary feature storage
0:00:00 DEB - ✓ 219M storage on / (/dev/root) requested for write phase disk, 19G available
0:00:00 DEB -  - 146M used for temporary feature storage
0:00:00 DEB -  - 73M used for archive output
0:00:00 DEB - ✓ 313M JVM heap requested for read phase, 4.2G available
0:00:00 DEB -  - 300M used for sparsearray node location in-memory index
0:00:00 DEB -  - 13M used for temporary profile storage
0:00:00 DEB - ✓ 51M storage on / (/dev/root) requested for read phase, 19G available
0:00:00 DEB -  - 44M used for sparsearray node location cache
0:00:00 DEB -  - 6.7M used for multipolygon way geometries
0:00:00 DEB - ✓ 51M temporary files and 2.9G of free memory for OS to cache them
0:00:00 DEB - argument: archive_name=OpenMapTiles ('name' attribute for tileset metadata)
0:00:00 DEB - argument: archive_description=A tileset showcasing all layers in OpenMapTiles. https://openmaptiles.org ('description' attribute for tileset metadata)
0:00:00 DEB - argument: archive_attribution=<a href="https://www.openmaptiles.org/" target="_blank">&copy; OpenMapTiles</a> <a href="https://www.openstreetmap.org/copyright" target="_blank">&copy; OpenStreetMap contributors</a> ('attribution' attribute for tileset metadata)
0:00:00 DEB - argument: archive_version=3.14.0 ('version' attribute for tileset metadata)
0:00:00 DEB - argument: archive_type=baselayer ('type' attribute for tileset metadata)
0:00:00 DEB - argument: archive_format=pbf ('format' attribute for tileset metadata)
0:00:00 DEB - argument: compact=true (mbtiles: reduce the DB size by separating and deduping the tile data)
0:00:00 DEB - argument: no_index=false (mbtiles: skip adding index to sqlite DB)
0:00:00 DEB - argument: vacuum_analyze=false (mbtiles: vacuum analyze sqlite DB after writing)
0:00:00 INF - Using merge sort feature map, chunk size=1431mb max workers=2
0:00:01 INF [lake_centerlines] - 
0:00:01 INF [lake_centerlines] - Starting...
0:00:04 INF [lake_centerlines] -  read: [  59k 100%  29k/s ] write: [    0    0/s ] 0    
    cpus: 1.7 gc:  3% heap: 483M/4.2G direct: 237k postGC: 81M
    ->     (0/3) -> read( -%) ->    (0/1k) -> process( -%  -%) ->   (0/53k) -> write( -%)
0:00:04 INF [lake_centerlines] - Finished in 3s cpu:6s avg:1.8
0:00:04 INF [lake_centerlines] -   read     1x(44% 2s done:2s)
0:00:04 INF [lake_centerlines] -   process  2x(6% 0.2s wait:2s done:2s)
0:00:04 INF [lake_centerlines] -   write    1x(0% 0s wait:2s done:1s)
0:00:04 INF [water_polygons] - 
0:00:04 INF [water_polygons] - Starting...
0:00:14 INF [water_polygons] -  read: [ 2.5k  17%  251/s ] write: [  94k 9.3k/s ] 1.4G 
    cpus: 1.9 gc: 11% heap: 1G/4.2G direct: 54M postGC: 1G
    ->     (0/3) -> read(58%) ->    (0/1k) -> process(30% 21%) ->  (1k/53k) -> write( 0%)
0:00:24 INF [water_polygons] -  read: [ 5.7k  39%  320/s ] write: [ 324k  22k/s ] 1.4G 
    cpus: 1.7 gc:  7% heap: 2.2G/4.2G direct: 54M postGC: 1.5G
    ->     (0/3) -> read(68%) ->    (0/1k) -> process(18% 25%) -> (409/53k) -> write( 0%)
0:00:31 INF [water_polygons] -  read: [  14k 100% 1.2k/s ] write: [ 4.3M 578k/s ] 193M 
    cpus: 1.8 gc:  9% heap: 2.4G/4.2G direct: 54M postGC: 1.9G
    ->     (0/3) -> read( -%) ->    (0/1k) -> process( -%  -%) ->   (0/53k) -> write( -%)
0:00:31 INF [water_polygons] - Finished in 27s cpu:48s gc:2s avg:1.8
0:00:31 INF [water_polygons] -   read     1x(54% 15s wait:4s)
0:00:31 INF [water_polygons] -   process  2x(30% 8s wait:12s)
0:00:31 INF [water_polygons] -   write    1x(2% 0.5s wait:26s)
0:00:31 INF [natural_earth] - 
0:00:31 INF [natural_earth] - Starting...
0:00:44 INF [natural_earth] -  read: [ 349k 100%  37k/s ] write: [  181   19/s ] 193M 
    cpus: 1.3 gc:  1% heap: 2.2G/4.2G direct: 54M postGC: 1.9G
    ->     (0/3) -> read( -%) ->    (0/1k) -> process( -%  -%) ->   (0/53k) -> write( -%)
0:00:44 INF [natural_earth] - Finished in 13s cpu:13s avg:1
0:00:44 INF [natural_earth] -   read     1x(59% 8s sys:1s done:4s)
0:00:44 INF [natural_earth] -   process  2x(9% 1s wait:9s done:4s)
0:00:44 INF [natural_earth] -   write    1x(0% 0s wait:9s done:4s)
0:00:44 INF [osm_pass1] - 
0:00:44 INF [osm_pass1] - Starting...
0:00:46 INF [osm_pass1:process] - Finished nodes: 4,641,468 (1.9M/s) in 2s cpu:4s avg:1.7
0:00:47 INF [osm_pass1:process] - Finished ways: 345,544 (438k/s) in 0.8s cpu:1s avg:1.8
0:00:47 INF [osm_pass1:process] - Finished relations: 5,863 (49k/s) in 0.1s cpu:0.2s avg:1.9
0:00:47 INF [osm_pass1] -  nodes: [ 4.6M 1.3M/s ] 480M  ways: [ 345k 101k/s ] rels: [ 5.8k 1.7k/s ] blocks: [  626  184/s ]
    cpus: 1.7 gc:  3% heap: 1.2G/4.2G direct: 54M postGC: 932M hppc: 461k
    read( -%) ->     (0/4) -> parse( -%) ->     (0/4) -> process( -%)
0:00:47 DEB [osm_pass1] - Processed 626 blocks:
0:00:47 DEB [osm_pass1] -   nodes: 4,641,468 (1.9M/s) in 2s cpu:4s avg:1.7
0:00:47 DEB [osm_pass1] -   ways: 345,544 (438k/s) in 0.8s cpu:1s avg:1.8
0:00:47 DEB [osm_pass1] -   relations: 5,863 (49k/s) in 0.1s cpu:0.2s avg:1.9
0:00:47 INF [osm_pass1] - Finished in 3s cpu:6s avg:1.7
0:00:47 INF [osm_pass1] -   read     1x(1% 0s wait:3s)
0:00:47 INF [osm_pass1] -   parse    1x(65% 2s)
0:00:47 INF [osm_pass1] -   process  1x(41% 1s wait:2s)
0:00:47 INF [osm_pass2] - 
0:00:47 INF [osm_pass2] - Starting...
0:00:50 DEB [osm_pass2:process] - Sorting long long multimap...
0:00:50 INF [osm_pass2:process] - Finished nodes: 4,641,468 (1.7M/s) in 3s cpu:5s avg:1.9
0:00:50 DEB [osm_pass2:process] - Sorted long long multimap 0s cpu:0s avg:2
0:00:50 WAR [osm_pass2:process] - No GB polygon for inferring route network types
0:00:57 INF [osm_pass2] -  nodes: [ 4.6M 100% 464k/s ] 480M  ways: [ 162k  47%  16k/s ] rels: [    0   0%    0/s ] features: [ 4.9M  59k/s ] 1.6G  blocks: [  600  96%   59/s ]
    cpus: 1.9 gc:  0% heap: 3.2G/4.2G direct: 54M postGC: 945M relInfo: 420k mpGeoms: 474k 
    read( 0%) ->   (11/13) -> process(61% 57%) -> (823/53k) -> write( 2%)
0:01:07 INF [osm_pass2] -  nodes: [ 4.6M 100%    0/s ] 480M  ways: [ 320k  93%  15k/s ] rels: [    0   0%    0/s ] features: [ 5.3M  43k/s ] 1.6G  blocks: [  620  99%    1/s ]
    cpus: 2 gc:  1% heap: 3.4G/4.2G direct: 54M postGC: 954M relInfo: 420k mpGeoms: 18M  
    read( -%) ->    (4/13) -> process(88% 90%) -> (400/53k) -> write( 1%)
0:01:08 INF [osm_pass2:process] - Finished ways: 345,544 (19k/s) in 18s cpu:35s avg:1.9
0:01:17 INF [osm_pass2] -  nodes: [ 4.6M 100%    0/s ] 480M  ways: [ 345k 100% 2.5k/s ] rels: [ 4.4k  77%  448/s ] features: [ 5.4M 8.2k/s ] 1.6G  blocks: [  625 100%   <1/s ]
    cpus: 2 gc:  1% heap: 1.7G/4.2G direct: 54M postGC: 953M relInfo: 420k mpGeoms: 18M  
    read( -%) ->    (0/13) -> process(82% 82%) -> (1.2k/53k) -> write( 0%)
0:01:18 INF [osm_pass2:process] - Finished relations: 5,863 (583/s) in 10s cpu:20s avg:2
0:01:21 INF [osm_pass2] -  nodes: [ 4.6M 100%    0/s ] 480M  ways: [ 345k 100%    0/s ] rels: [ 5.8k 100%  366/s ] features: [ 5.4M 2.7k/s ] 283M  blocks: [  626 100%   <1/s ]
    cpus: 2 gc:  1% heap: 1.3G/4.2G direct: 54M postGC: 940M relInfo: 420k mpGeoms: 18M  
    read( -%) ->    (0/13) -> process( -%  -%) ->   (0/53k) -> write( -%)
0:01:21 DEB [osm_pass2] - Processed 626 blocks:
0:01:21 DEB [osm_pass2] -   nodes: 4,641,468 (1.7M/s) in 3s cpu:5s avg:1.9
0:01:21 DEB [osm_pass2] -   ways: 345,544 (19k/s) in 18s cpu:35s avg:1.9
0:01:21 DEB [osm_pass2] -   relations: 5,863 (583/s) in 10s cpu:20s avg:2
0:01:21 INF [osm_pass2] - Finished in 34s cpu:1m5s avg:1.9
0:01:21 INF [osm_pass2] -   read     1x(0% 0s wait:18s done:15s)
0:01:21 INF [osm_pass2] -   process  2x(78% 26s)
0:01:21 INF [osm_pass2] -   write    1x(1% 0.4s wait:33s)
0:01:21 INF [boundaries] - 
0:01:21 INF [boundaries] - Starting...
0:01:21 INF [boundaries] - Creating polygons for 1 boundaries
0:01:21 WAR [boundaries] - Unable to form closed polygon for OSM relation 148838 (likely missing edges)
0:01:21 INF [boundaries] - Finished creating 0 country polygons
0:01:21 INF [boundaries] - Finished in 0s cpu:0s avg:1.3
0:01:21 INF - Deleting node.db to make room for output file
0:01:21 INF [sort] - 
0:01:21 INF [sort] - Starting...
0:01:21 INF [sort] - Grouped 8 chunks into 1
0:01:23 INF [sort] -  chunks: [   1 /   1 100% ] 283M 
    cpus: 1.2 gc:  0% heap: 1.8G/4.2G direct: 54M postGC: 940M
    ->     (0/3) -> worker( -%)
0:01:23 INF [sort] - Finished in 2s cpu:3s avg:1.2
0:01:23 INF [sort] -   worker  1x(73% 2s)
0:01:23 INF - read:0s write:0s sort:0s
0:01:23 INF [archive] - 
0:01:23 INF [archive] - Starting...
0:01:24 WAR [archive] - Unable to load tile weights from data/tile_weights.tsv.gz, will fall back to unweighted average: java.nio.file.NoSuchFileException: data/tile_weights.tsv.gz
0:01:24 DEB [archive:write] - Execute mbtiles: create table metadata (name text, value text);
0:01:24 DEB [archive:write] - Execute mbtiles: create unique index name on metadata (name);
0:01:24 DEB [archive:write] - Execute mbtiles: create table tiles_shallow (
  zoom_level integer,
  tile_column integer,
  tile_row integer,
  tile_data_id integer

  , primary key(zoom_level,tile_column,tile_row)

) without rowid

0:01:24 DEB [archive:write] - Execute mbtiles: create table tiles_data (
  tile_data_id integer primary key,
  tile_data blob
)

0:01:24 DEB [archive:write] - Execute mbtiles: create view tiles AS
select
  tiles_shallow.zoom_level as zoom_level,
  tiles_shallow.tile_column as tile_column,
  tiles_shallow.tile_row as tile_row,
  tiles_data.tile_data as tile_data
from tiles_shallow
join tiles_data on tiles_shallow.tile_data_id = tiles_data.tile_data_id

0:01:24 DEB [archive:write] - Set mbtiles metadata: format=pbf
0:01:24 DEB [archive:write] - Set mbtiles metadata: center=-45.955,32.445,3
0:01:24 DEB [archive:write] - Set mbtiles metadata: bounds=-74.07,21.34,-17.84,43.55
0:01:24 DEB [archive:write] - Set mbtiles metadata: json={"vector_layers":[{"id":"aerodrome_label","fields":{"name_int":"String","iata":"String","ele_ft":"Number","name_de":"String","name":"String","icao":"String","name:en":"String","class":"String","ele":"Number","name_en":"String","name:latin":"String"},"minzoom":10,"maxzoom":14},{"id":"aeroway","fields":{"ref":"String","class":"String"},"minzoom":10,"maxzoom":14},{"id":"boundary","fields":{"disputed":"Number","admin_level":"Number","maritime":"Number"},"minzoom":0,"maxzoom":14},{"id":"building","fields":{"colour":"String","render_height":"Number","render_min_height":"Number","hide_3d":"Boolean"},"minzoom":13,"maxzoom":14},{"id":"housenumber","fields":{"housenumber":"String"},"minzoom":14,"maxzoom":14},{"id":"landcover","fields":{"subclass":"String","class":"String","_numpoints":"Number"},"minzoom":7,"maxzoom":14},{"id":"landuse","fields":{"class":"String"},"minzoom":4,"maxzoom":14},{"id":"mountain_peak","fields":{"customary_ft":"Number","name_int":"String","ele_ft":"Number","name_de":"Str... 2358 more characters
0:01:24 DEB [archive:write] - Set mbtiles metadata: name=OpenMapTiles
0:01:24 DEB [archive:write] - Set mbtiles metadata: description=A tileset showcasing all layers in OpenMapTiles. https://openmaptiles.org
0:01:24 DEB [archive:write] - Set mbtiles metadata: attribution=<a href="https://www.openmaptiles.org/" target="_blank">&copy; OpenMapTiles</a> <a href="https://www.openstreetmap.org/copyright" target="_blank">&copy; OpenStreetMap contributors</a>
0:01:24 DEB [archive:write] - Set mbtiles metadata: version=3.14.0
0:01:24 DEB [archive:write] - Set mbtiles metadata: type=baselayer
0:01:24 DEB [archive:write] - Set mbtiles metadata: minzoom=0
0:01:24 DEB [archive:write] - Set mbtiles metadata: maxzoom=14
0:01:24 DEB [archive:write] - Set mbtiles metadata: compression=gzip
0:01:24 DEB [archive:write] - Set mbtiles metadata: planetiler:version=0.6-SNAPSHOT
0:01:24 DEB [archive:write] - Set mbtiles metadata: planetiler:githash=f26c7b9091d34a97b4d21e6455c44885e2cec81c
0:01:24 DEB [archive:write] - Set mbtiles metadata: planetiler:buildtime=2023-09-22T01:18:46.026Z
0:01:24 DEB [archive:write] - Set mbtiles metadata: planetiler:osm:osmosisreplicationtime=2023-09-21T20:21:26Z
0:01:24 DEB [archive:write] - Set mbtiles metadata: planetiler:osm:osmosisreplicationseq=3829
0:01:24 DEB [archive:write] - Set mbtiles metadata: planetiler:osm:osmosisreplicationurl=http://download.geofabrik.de/north-america/us/rhode-island-updates
0:01:26 INF [archive:write] - Starting z0
0:01:26 INF [archive:write] - Finished z0 in 0s cpu:0s avg:0, now starting z1
0:01:26 INF [archive:write] - Finished z1 in 0s cpu:0s avg:0, now starting z2
0:01:26 INF [archive:write] - Finished z2 in 0s cpu:0s avg:0, now starting z3
0:01:26 INF [archive:write] - Finished z3 in 0s cpu:0s avg:0, now starting z4
0:01:26 INF [archive:write] - Finished z4 in 0s cpu:0s avg:0, now starting z5
0:01:26 INF [archive:write] - Finished z5 in 0s cpu:0s avg:0, now starting z6
0:01:26 INF [archive:write] - Finished z6 in 0s cpu:0s avg:0, now starting z7
0:01:27 INF [archive:write] - Finished z7 in 0.8s cpu:2s avg:2, now starting z8
0:01:29 INF [archive:write] - Finished z8 in 2s cpu:4s avg:2, now starting z9
0:01:31 INF [archive:write] - Finished z9 in 2s cpu:5s avg:2, now starting z10
0:01:33 INF [archive:write] - Finished z10 in 2s cpu:3s avg:2, now starting z11
0:01:34 INF [archive] -  features: [ 171k   3%  17k/s ] 283M  tiles: [  18k 1.8k/s ] 2.9M 
    cpus: 1.9 gc:  2% heap: 2.8G/4.2G direct: 54M postGC: 1.4G
    read( 1%) -> (214/217) -> encode(57% 60%) -> (215/216) -> write( 1%)
    last tile: 11/616/762 (z11 4%) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086
0:01:36 INF [archive:write] - Finished z11 in 3s cpu:5s avg:2, now starting z12
0:01:39 INF [archive:write] - Finished z12 in 4s cpu:7s avg:2, now starting z13
0:01:44 INF [archive] -  features: [ 856k  16%  68k/s ] 283M  tiles: [ 291k  27k/s ] 15M  
    cpus: 2 gc:  3% heap: 3G/4.2G direct: 54M postGC: 1.5G
    read( 1%) -> (214/217) -> encode(72% 69%) -> (215/216) -> write( 3%)
    last tile: 13/2470/3047 (z13 5%) https://onthegomap.github.io/planetiler-demo/#13.5/41.78769/-71.43311
0:01:48 INF [archive:write] - Finished z13 in 9s cpu:17s avg:2, now starting z14
0:01:53 DEB [archive:write] - Shallow tiles written: 4,115,061
0:01:53 DEB [archive:write] - Tile data written: 17,717 (100% omitted)
0:01:53 DEB [archive:write] - Unique tile hashes: 8,843
0:01:53 INF [archive:write] - Finished z14 in 5s cpu:10s avg:1.9
0:01:53 INF [archive] -  features: [ 5.4M 100% 485k/s ] 283M  tiles: [ 4.1M 404k/s ] 109M 
    cpus: 1.9 gc:  3% heap: 2.9G/4.2G direct: 54M postGC: 1.5G
    read( -%) ->   (0/217) -> encode( -%  -%) ->   (0/216) -> write(37%)
    last tile: 14/7380/5985 (z14 100%) https://onthegomap.github.io/planetiler-demo/#14.5/43.55651/-17.83081
0:01:53 DEB [archive] - Tile stats:
0:01:53 DEB [archive] - Biggest tiles (gzipped)
1. 9/154/190 (204k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:173k)
2. 10/308/381 (180k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:141k)
3. 10/308/380 (179k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:129k)
4. 14/4942/6092 (173k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (building:142k)
5. 14/4940/6092 (135k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:212k)
6. 14/4941/6093 (128k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:133k)
7. 14/4940/6091 (125k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.44409 (building:204k)
8. 14/4941/6092 (124k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (building:94k)
9. 14/4942/6091 (122k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:175k)
10. 14/4940/6093 (119k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.44409 (building:164k)
0:01:53 DEB [archive] - Max tile sizes
                      z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
           boundary  154   374   445   581   936   340   432   545   545  1.6k  2.1k  7.2k  6.4k  5.8k  4.5k  7.2k
              water 7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   25k   15k   13k   17k   15k   12k   25k
              place    0     0   377   377   377   586   675   963  1.6k  3.3k  5.7k  3.1k  1.7k   789   947  5.7k
            landuse    0     0     0     0   679   745  1.3k    6k   17k   44k   59k   50k   38k   23k   14k   59k
     transportation    0     0     0     0  1.1k  1.9k  2.9k  9.5k   12k   32k   22k   23k   64k   47k   33k   64k
           waterway    0     0     0     0   111   118     0     0     0  3.4k  2.3k    2k  1.6k  4.9k  2.4k  4.9k
               park    0     0     0     0     0     0  1.1k  3.7k  9.4k   18k   12k  7.4k  4.3k  3.4k  4.4k   18k
transportation_name    0     0     0     0     0     0   359   454  1.2k  1.7k  4.9k  3.9k  3.7k  3.4k   18k   18k
          landcover    0     0     0     0     0     0     0   23k   67k  173k  141k   81k   53k   30k   24k  173k
      mountain_peak    0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.3k  2.8k  1.4k  1.4k   869  4.3k
         water_name    0     0     0     0     0     0     0     0     0   487   462   434   445   549    1k    1k
    aerodrome_label    0     0     0     0     0     0     0     0     0     0   674   327   273   220   220   674
            aeroway    0     0     0     0     0     0     0     0     0     0  1.6k  2.1k    3k  3.4k  2.7k  3.4k
                poi    0     0     0     0     0     0     0     0     0     0     0     0   501   498   67k   67k
           building    0     0     0     0     0     0     0     0     0     0     0     0     0   60k  212k  212k
        housenumber    0     0     0     0     0     0     0     0     0     0     0     0     0     0   55k   55k
          full tile 7.9k    4k  9.4k  6.4k  4.6k  7.2k   22k   58k  127k  298k  256k  136k  114k  128k  304k  304k
            gzipped 6.2k  3.6k  7.1k  5.2k  3.5k  5.4k   15k   38k   84k  204k  180k  100k   84k   93k  173k  204k
0:01:53 DEB [archive] -    Max tile: 304k (gzipped: 204k)
0:01:53 DEB [archive] -    Avg tile: 68 (gzipped: 87) no tile weights, use --download-osm-tile-weights for weighted average
0:01:53 DEB [archive] -     # tiles: 4,115,061
0:01:53 DEB [archive] -  # features: 5,440,450
0:01:53 INF [archive] - Finished in 30s cpu:58s gc:1s avg:1.9
0:01:53 INF [archive] -   read    1x(2% 0.6s wait:28s)
0:01:53 INF [archive] -   encode  2x(63% 19s)
0:01:53 INF [archive] -   write   1x(13% 4s wait:24s)
0:01:53 INF - Finished in 1m54s cpu:3m21s gc:4s avg:1.8
0:01:53 INF - FINISHED!
0:01:53 INF - 
0:01:53 INF - ----------------------------------------
0:01:53 INF - data errors:
0:01:53 INF - 	render_snap_fix_input	16,475
0:01:53 INF - 	osm_boundary_missing_way	63
0:01:53 INF - 	osm_multipolygon_missing_way	57
0:01:53 INF - 	merge_snap_fix_input	14
0:01:53 INF - 	feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix	2
0:01:53 INF - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	2
0:01:53 INF - 	omt_park_area_osm_invalid_multipolygon_empty_after_fix	1
0:01:53 INF - ----------------------------------------
0:01:53 INF - 	overall          1m54s cpu:3m21s gc:4s avg:1.8
0:01:53 INF - 	lake_centerlines 3s cpu:6s avg:1.8
0:01:53 INF - 	  read     1x(44% 2s done:2s)
0:01:53 INF - 	  process  2x(6% 0.2s wait:2s done:2s)
0:01:53 INF - 	  write    1x(0% 0s wait:2s done:1s)
0:01:53 INF - 	water_polygons   27s cpu:48s gc:2s avg:1.8
0:01:53 INF - 	  read     1x(54% 15s wait:4s)
0:01:53 INF - 	  process  2x(30% 8s wait:12s)
0:01:53 INF - 	  write    1x(2% 0.5s wait:26s)
0:01:53 INF - 	natural_earth    13s cpu:13s avg:1
0:01:53 INF - 	  read     1x(59% 8s sys:1s done:4s)
0:01:53 INF - 	  process  2x(9% 1s wait:9s done:4s)
0:01:53 INF - 	  write    1x(0% 0s wait:9s done:4s)
0:01:53 INF - 	osm_pass1        3s cpu:6s avg:1.7
0:01:53 INF - 	  read     1x(1% 0s wait:3s)
0:01:53 INF - 	  parse    1x(65% 2s)
0:01:53 INF - 	  process  1x(41% 1s wait:2s)
0:01:53 INF - 	osm_pass2        34s cpu:1m5s avg:1.9
0:01:53 INF - 	  read     1x(0% 0s wait:18s done:15s)
0:01:53 INF - 	  process  2x(78% 26s)
0:01:53 INF - 	  write    1x(1% 0.4s wait:33s)
0:01:53 INF - 	boundaries       0s cpu:0s avg:1.3
0:01:53 INF - 	sort             2s cpu:3s avg:1.2
0:01:53 INF - 	  worker  1x(73% 2s)
0:01:53 INF - 	archive          30s cpu:58s gc:1s avg:1.9
0:01:53 INF - 	  read    1x(2% 0.6s wait:28s)
0:01:53 INF - 	  encode  2x(63% 19s)
0:01:53 INF - 	  write   1x(13% 4s wait:24s)
0:01:53 INF - ----------------------------------------
0:01:53 INF - 	archive	109MB
0:01:53 INF - 	features	283MB
-rw-r--r-- 1 runner docker 66M Sep 22 01:19 run.jar

@bdon
Contributor

bdon commented Sep 6, 2023

I prefer the output format with the least dependencies possible. I assume that is tsv.gz or NDJSON.

  • TSV is more compact, but NDJSON is resilient to schema changes. Since we don't have anything settled TSV seems fine.
  • I think the output should be directly compressed, as long as the next step is to open directly in DuckDB.

@msbarry
Contributor Author

msbarry commented Sep 7, 2023

Sounds good to me. Any thoughts on the formats? Most recently I've been generating a single tsv.gz with columns:

  • z
  • x
  • y
  • hilbert
  • tile_bytes
  • gzipped_tile_bytes
  • deduped_tile_id
  • layer
  • features
  • layer_bytes
  • layer_attr_bytes (want to indicate if geometry vs. attributes are contributing to the size, but I'm not convinced this is super useful as-is vs. just visually inspecting the tile)
  • layer_attr_values
  • min_lon (these duplicate z,x,y data but make it easier to generate debug URLs to look at a tile, or filter within a bounding box)
  • max_lon
  • min_lat
  • max_lat

You can either split them out into 2 tables (z,x,y->id, and id,layer->stats) or duplicate all of the z,x,y tile IDs into one denormalized table. Split makes it easy to query tile-level stats but duckdb struggles with joining to use tile coordinates with layer-level stats (too big to add an index), and denormalized makes it easier to work with layer-level stats, but duckdb struggles grouping by tile ID to get tile level stats.

The denormalized version also gets a little big (4GB for the planet vs. <2GB for split).

@msbarry
Contributor Author

msbarry commented Sep 7, 2023

And in terms of outputting a high-level summary, the most useful ones have been:

  • a single "weighted average tile size" (raw and gzipped) number based on weighting tile sizes by top n most-visited osm tiles
  • the total archive size
  • a table of max layer size by layer and zoom
  • a list of the top N biggest tiles, along with which layers cause them to be big and z/lat/lon to generate a link to visually inspect them (there's probably overlap between biggest tiles and tiles with biggest layers - might want to merge the lists or something)
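
As a rough illustration of the "weighted average tile size" idea, here is a minimal Python sketch. It assumes tile sizes and per-tile load counts are already available as dictionaries keyed by `(z, x, y)`; the function name and the choice to skip tiles missing from the archive are assumptions for illustration, not planetiler's actual behavior.

```python
def weighted_average_tile_size(sizes, weights):
    """Average tile size weighted by how often each tile is loaded.

    sizes:   {(z, x, y): tile size in bytes}
    weights: {(z, x, y): number of loads}
    Tiles that have a weight but no size are skipped (an assumption).
    """
    weighted_sum = 0
    total_weight = 0
    for tile, loads in weights.items():
        size = sizes.get(tile)
        if size is None:
            continue
        weighted_sum += size * loads
        total_weight += loads
    if total_weight == 0:
        return None  # no overlap between weights and archive
    return weighted_sum / total_weight
```

The same function can be applied once to raw sizes and once to gzipped sizes to get both numbers.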

@bdon
Contributor

bdon commented Sep 8, 2023

min_lon (these duplicate z,x,y data but make it easier to generate debug URLs to look at a tile, or filter within a bounding box)

Can we solve this via a DuckDB function definition instead of repeating information in the stats?

@msbarry
Contributor Author

msbarry commented Sep 8, 2023

min_lon (these duplicate z,x,y data but make it easier to generate debug URLs to look at a tile, or filter within a bounding box)

Can we solve this via a DuckDB function definition instead of repeating information in the stats?

Yeah, it looks like that could work - would just need to put it in documentation somewhere for people to copy/paste:

D create or replace macro lat(z,y) as round((180/pi())*atan(0.5*(exp(pi()-2*pi()*(y/(2**z)))-exp(-(pi()-2*pi()*(y/(2**z)))))), 5);
D create or replace macro lon(z,x) as round((x/(2**z))*360-180, 5);
D create or replace macro url(z,x,y) as concat('https://onthegomap.github.io/planetiler-demo/#', z+0.5, '/', lat(z,y+0.5), '/', lon(z,x+0.5));
D select z,x,y,layer,round(layer_bytes/1000) as kbs, url(z,x,y) from stats order by layer_bytes desc limit 10;

returns:

┌───────┬───────┬───────┬─────────────┬────────┬───────────────────────────────────────────────────────────────────────┐
│   z   │   x   │   y   │    layer    │  kbs   │                             url(z, x, y)                              │
│ int64 │ int64 │ int64 │   varchar   │ double │                                varchar                                │
├───────┼───────┼───────┼─────────────┼────────┼───────────────────────────────────────────────────────────────────────┤
│    14 │ 13722 │  7013 │ housenumber │ 2412.0 │ https://onthegomap.github.io/planetiler-demo/#14.5/25.05575/121.51978 │
│    14 │ 13723 │  7014 │ housenumber │ 1850.0 │ https://onthegomap.github.io/planetiler-demo/#14.5/25.03584/121.54175 │
│    14 │ 13723 │  7013 │ housenumber │ 1827.0 │ https://onthegomap.github.io/planetiler-demo/#14.5/25.05575/121.54175 │
│    14 │  6435 │  8361 │ building    │ 1761.0 │ https://onthegomap.github.io/planetiler-demo/#14.5/-3.72175/-38.59497 │
│    14 │ 13724 │  7014 │ housenumber │ 1626.0 │ https://onthegomap.github.io/planetiler-demo/#14.5/25.03584/121.56372 │
│    14 │ 13667 │  7134 │ housenumber │ 1597.0 │ https://onthegomap.github.io/planetiler-demo/#14.5/22.62415/120.31128 │
│    14 │  6435 │  8364 │ building    │ 1550.0 │ https://onthegomap.github.io/planetiler-demo/#14.5/-3.78752/-38.59497 │
│    14 │  6435 │  8363 │ building    │ 1510.0 │ https://onthegomap.github.io/planetiler-demo/#14.5/-3.7656/-38.59497  │
│    14 │ 13683 │  7058 │ housenumber │ 1472.0 │ https://onthegomap.github.io/planetiler-demo/#14.5/24.15678/120.66284 │
│    14 │ 13724 │  7013 │ housenumber │ 1460.0 │ https://onthegomap.github.io/planetiler-demo/#14.5/25.05575/121.56372 │
├───────┴───────┴───────┴─────────────┴────────┴───────────────────────────────────────────────────────────────────────┤
│ 10 rows                                                                                                    6 columns │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

There's a benefit to having it persisted in a parquet file to allow efficient bbox filtering, but if you're going to go through duckdb to create a parquet file you could just add it there.
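
For anyone post-processing the stats outside DuckDB, the same tile-to-coordinate conversion can be sketched in Python. This assumes standard web-mercator z/x/y tiles and reuses the demo URL pattern from the query output above; the function names `tile_center` and `debug_url` are made up for illustration.

```python
import math


def tile_center(z, x, y):
    """Return (lat, lon) of the center of web-mercator tile z/x/y."""
    n = 2 ** z
    # Longitude is linear in x across the 360 degree range.
    lon = (x + 0.5) / n * 360.0 - 180.0
    # Latitude inverts the mercator projection: lat = atan(sinh(lat_n)).
    lat_n = math.pi - 2.0 * math.pi * (y + 0.5) / n
    lat = math.degrees(math.atan(math.sinh(lat_n)))
    return round(lat, 5), round(lon, 5)


def debug_url(z, x, y):
    """Build a link for visually inspecting a tile in the demo viewer."""
    lat, lon = tile_center(z, x, y)
    return f"https://onthegomap.github.io/planetiler-demo/#{z + 0.5}/{lat}/{lon}"
```

Calling `debug_url(14, 13722, 7013)` should give a link that matches the first row of the table above, up to rounding.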

@msbarry
Contributor Author

msbarry commented Sep 8, 2023

@bdon what do you think about the different format options?

  • tile-level (z,x,y->tile size)
  • deduped-tile-level (z,x,y->id, id->tile sizes)
  • layer-level (z,x,y,layer->layer sizes)
  • or layer/attribute level (z,x,y,layer,attr->size that attribute contributes to layer)

@bdon
Contributor

bdon commented Sep 8, 2023

  • I don't think we care about deduplicated data, since deduped data is uninteresting for stats 99.9% of the time
  • layer/attribute level is the most interesting, and the most useful for situations like we're doing in protomaps/basemaps where many types of things are combined into the landuse, pois and roads layer

so #4 would be great (though unclear to me what the TSV looks like), then #3, #1, #2

@msbarry
Contributor Author

msbarry commented Sep 8, 2023

OK that makes sense. Would you omit "deduped_tile_id" column as well? Or "hilbert"?

I was thinking 4 would be something like: z, x, y, layer, attribute/geom, size so you could see the size contributed by geometry vs. individual attributes. It would be expensive to compute and large, but might make sense if you're looking at a small number of tiles or low-zoom area. But it sounds like maybe you're looking for total size of features that have certain attributes set?

Unfortunately it's hard to get tile-level stats from 3 since duckdb runs out of memory on the huge "group by" unless you limit to a small set of tiles first.

One other route to go might be to have a row per tile, and include nested data in it... for example if you import a file:

{"id":1,"layers":[{"id":"layer1","size":100}, {"id":"layer2","size":200}]}
{"id":2,"layers":[{"id":"layer2","size":300}, {"id":"layer3","size":400}]}

Then you can do:

D select * from stats;
┌───────┬────────────────────────────────────────────────────────────┐
│  id   │                           layers                           │
│ int64 │             struct(id varchar, size bigint)[]              │
├───────┼────────────────────────────────────────────────────────────┤
│     1 │ [{'id': layer1, 'size': 100}, {'id': layer2, 'size': 200}] │
│     2 │ [{'id': layer2, 'size': 300}, {'id': layer3, 'size': 400}] │
└───────┴────────────────────────────────────────────────────────────┘
D select id, unnest(layers).id, unnest(layers).size from stats;
┌───────┬─────────────────────┬───────────────────────┐
│  id   │ (unnest(layers)).id │ (unnest(layers)).size │
│ int64 │       varchar       │         int64         │
├───────┼─────────────────────┼───────────────────────┤
│     1 │ layer1              │                   100 │
│     1 │ layer2              │                   200 │
│     2 │ layer2              │                   300 │
│     2 │ layer3              │                   400 │
└───────┴─────────────────────┴───────────────────────┘

I'm not sure how performant querying that would be though, might be worth a test.
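
For tools other than DuckDB, the same unnesting can be sketched in plain Python. This assumes newline-delimited JSON shaped like the two example rows above (`id`, plus a `layers` list of `{id, size}` objects); the function name `unnest_layers` is hypothetical.

```python
import json


def unnest_layers(ndjson_lines):
    """Yield one (tile_id, layer_id, size) row per layer per tile."""
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        tile = json.loads(line)
        for layer in tile["layers"]:
            yield tile["id"], layer["id"], layer["size"]
```

Because it is a generator, it can be fed a file object directly and never holds more than one tile in memory.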

@msbarry
Contributor Author

msbarry commented Sep 8, 2023

I tried outputting newline-delimited json that looks like:

{"z":0,"x":0,"y":0,"hilbert":0,"total_bytes":142374,"gzipped_bytes":82957,"layers":[{"name":"water","features":2,"total_bytes":8435,"attr_bytes":18,"attr_values":2},{"name":"landcover","features":11,"total_bytes":1578,"attr_bytes":27,"attr_values":2},{"name":"place","features":42,"total_bytes":115614,"attr_bytes":82249,"attr_values":4363},{"name":"water_name","features":6,"total_bytes":7042,"attr_bytes":5541,"attr_values":238},{"name":"boundary","features":32,"total_bytes":9689,"attr_bytes":807,"attr_values":37}]}
{"z":1,"x":0,"y":0,"hilbert":1,"total_bytes":103276,"gzipped_bytes":64171,"layers":[{"name":"water","features":2,"total_bytes":4139,"attr_bytes":18,"attr_values":2},{"name":"landcover","features":3,"total_bytes":468,"attr_bytes":27,"attr_values":2},{"name":"place","features":29,"total_bytes":68351,"attr_bytes":47302,"attr_values":2493},{"name":"water_name","features":7,"total_bytes":7136,"attr_bytes":5562,"attr_values":245},{"name":"boundary","features":13,"total_bytes":23165,"attr_bytes":277,"attr_values":14}]}
{"z":1,"x":0,"y":1,"hilbert":2,"total_bytes":38635,"gzipped_bytes":24721,"layers":[{"name":"water","features":2,"total_bytes":1340,"attr_bytes":18,"attr_values":2},{"name":"landcover","features":8,"total_bytes":982,"attr_bytes":27,"attr_values":2},{"name":"place","features":10,"total_bytes":25423,"attr_bytes":17164,"attr_values":915},{"name":"water_name","features":3,"total_bytes":3102,"attr_bytes":2431,"attr_values":97},{"name":"boundary","features":5,"total_bytes":7772,"attr_bytes":37,"attr_values":5}]}
{"z":1,"x":1,"y":1,"hilbert":3,"total_bytes":91365,"gzipped_bytes":52859,"layers":[{"name":"water","features":2,"total_bytes":2206,"attr_bytes":18,"attr_values":2},{"name":"landcover","features":1,"total_bytes":521,"attr_bytes":27,"attr_values":2},{"name":"place","features":23,"total_bytes":66179,"attr_bytes":48121,"attr_values":2174},{"name":"water_name","features":4,"total_bytes":6010,"attr_bytes":4716,"attr_values":196},{"name":"boundary","features":6,"total_bytes":16432,"attr_bytes":66,"attr_values":6}]}
{"z":1,"x":1,"y":0,"hilbert":4,"total_bytes":275506,"gzipped_bytes":166904,"layers":[{"name":"water","features":2,"total_bytes":3647,"attr_bytes":18,"attr_values":2},{"name":"place","features":76,"total_bytes":191026,"attr_bytes":129687,"attr_values":7421},{"name":"water_name","features":5,"total_bytes":5869,"attr_bytes":4640,"attr_values":193},{"name":"boundary","features":55,"total_bytes":74950,"attr_bytes":1212,"attr_values":55}]}

The raw json.gz file was 3.3GB, duckdb file was 1.8GB and exported parquet file was 1.4GB. You can query tile-level stats instantly:

select z,x,y,gzipped_bytes from 'nested_stats.parquet' order by gzipped_bytes desc limit 10;

But a layer-level query takes 40 seconds:

with unnested as (select z,x,y,unnest(layers) as layer from stats)
select z,x,y,layer.name, layer.total_bytes from unnested order by layer.total_bytes desc limit 10;

vs. this query which is instant with option 3:

select z,x,y,layer,layer_bytes from flat order by layer_bytes desc limit 10;

But if you know you are going to be doing layer-level analysis you could create an unnested table once (~60 seconds), then use it for instant layer-level queries afterwards:

create table unnested as (select z,x,y,unnest(layers) layer from stats); -- ~60 seconds
select z,x,y,layer.name,layer.total_bytes from unnested order by layer.total_bytes desc limit 10; -- instant

This approach seems better if we're targeting duckdb because both tile and layer level queries are possible from one output file (duckdb doesn't blow up) but nested json is a little less standard/easy to work with in other tools, and the structured json output is ~1.5x bigger than tsv.gz

@bdon
Contributor

bdon commented Sep 12, 2023

@msbarry I would omit both of those, and simply not include any tile that occurs more than once in the statistics (anything I'm missing here?)

My PoC is here: protomaps/go-pmtiles#75

for now it just outputs a .tsv.gz with z,x,y,compressed_length contents

next steps:

  • should we decide on a naming convention for these files? if I run pmtiles stats foo.pmtiles it will create the sidecar foo.pmtiles.stats.tsv.gz
  • How do you determine the bytes by each layer, by looking at the protobuf message length? I'm using https://github.com/paulmach/orb which unmarshals MVT straight to GeoJSON-like Simple Features, so I may need to use the raw protobuf-generated functions
  • would it make sense for the duckdb utility functions and rough spec to live in the planetiler repo at a dir like /stats? if so I can make go-pmtiles just follow that.

@msbarry
Contributor Author

msbarry commented Sep 12, 2023

  • should we decide on a naming convention for these files? if I run pmtiles stats foo.pmtiles it will create the sidecar foo.pmtiles.stats.tsv.gz

That sounds good to me, I could foresee wanting different flavors of stats so maybe foo.pmtiles.tilestats.tsv.gz or foo.pmtiles.layerstats.tsv.gz ?

  • How do you determine the bytes by each layer, by looking at the protobuf message length? I'm using https://github.com/paulmach/orb which unmarshals MVT straight to GeoJSON-like Simple Features, so I may need to use the raw protobuf-generated functions

Yeah you need to parse raw protobuf. I'm pretty sure the tile is just a series of concatenated layers, and the java library at least lets you get the serialized byte length of a parsed proto struct. You could probably also decompress with a very simple schema to only separate the layers (it just passes through all the unrecognized fields)

message Tile {
  message Layer {
    required string name = 1;
  }
  repeated Layer layers = 3;
}
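
A rough Python sketch of that idea, walking the raw protobuf wire format without a generated parser. It assumes the tile bytes are already decompressed and relies only on the field numbers in the minimal schema above (`layers = 3`, `name = 1`). The helper names are hypothetical, and this is not how planetiler or go-pmtiles actually compute sizes. The size reported is each layer's message payload, excluding its tag and length prefix.

```python
def read_varint(buf, pos):
    """Decode a protobuf varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf):
    """Yield (field_number, payload) for each field in a message.

    payload is bytes for length-delimited fields and None otherwise.
    """
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire_type = key >> 3, key & 0x7
        payload = None
        if wire_type == 0:  # varint
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            pos += 8
        elif wire_type == 2:  # length-delimited (strings, sub-messages)
            length, pos = read_varint(buf, pos)
            payload = buf[pos:pos + length]
            pos += length
        elif wire_type == 5:  # fixed 32-bit
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field, payload


def layer_sizes(tile_bytes):
    """Map layer name -> serialized layer size in bytes."""
    sizes = {}
    for field, payload in iter_fields(tile_bytes):
        if field != 3 or payload is None:
            continue  # not a Layer message
        name = ""
        for sub_field, sub_payload in iter_fields(payload):
            if sub_field == 1 and sub_payload is not None:
                name = sub_payload.decode("utf-8")
                break
        sizes[name] = sizes.get(name, 0) + len(payload)
    return sizes
```
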
  • would it make sense for the duckdb utility functions and rough spec to live in the planetiler repo at a dir like /stats? if so I can make go-pmtiles just follow that.

Yeah, anywhere common would work but would definitely be useful to have some common queries listed out (especially the z/x/y -> lat/lon macro)

@bdon
Contributor

bdon commented Sep 12, 2023

  • I realized the term "tilestats" is taken by https://github.com/mapbox/mapbox-geostats and well used by default output from tippecanoe so we should go with "layerstats"
  • I prefer we have one omnibus stats file instead of multiple flavors, as long as it's fast in DuckDB
  • maybe we could have .layerstats imply .tsv.gz? I don't think the .gz bit is useful if we're not expanding this in Finder (it would be really big) - so just planet.pmtiles.layerstats
  • If gzipping is the bottleneck maybe we should use zstd, lzw or lzma? prefer something that's in the go stdlib (lzw, gzip) but open to others

@msbarry
Contributor Author

msbarry commented Sep 12, 2023

👍

  • I prefer we have one omnibus stats file instead of multiple flavors, as long as it's fast in DuckDB

Sounds good, after more testing I think layer to tile grouping should be reasonably fast. Just wondering whether going down to attribute-level stats would make the file a lot bigger?

  • maybe we could have .layerstats imply .tsv.gz? I don't think the .gz bit is useful if we're not expanding this in Finder (it would be really big) - so just planet.pmtiles.layerstats

The benefit of naming .tsv.gz is that it tells duckdb how to parse it without any explicit instructions (select * from 'planet.pmtiles.layerstats.tsv.gz')

  • If gzipping is the bottleneck maybe we should use zstd, lzw or lzma? prefer something that's in the go stdlib (lzw, gzip) but open to others

Gzipping takes ~3m for the planet, maybe less with a smaller format so probably not a big enough deal to worry about...

@msbarry
Contributor Author

msbarry commented Sep 13, 2023

For the actual stats format I think we definitely need:

  • z
  • x
  • y
  • gzipped_tile_bytes
  • layer
  • layer_bytes

then maybe:

  • hilbert ID
  • layer_attr_bytes
  • features
  • tile_bytes (can be derived from sum(layer_bytes))
  • layer_attr_values
  • deduped_tile_id (or tile hash, something to indicate deduplication)

then I think we can skip the {min,max}_{lon,lat} values
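
To illustrate how tile-level numbers could be derived from a layer-level file with just the required columns, here is a minimal Python sketch. It assumes a tab-separated, gzipped file with a header row containing `z`, `x`, `y`, and `layer_bytes`, and it assumes all rows for a given tile are written contiguously, which lets it stream the file instead of holding a large group-by in memory. The function name `biggest_tiles` is made up for illustration.

```python
import csv
import gzip
import heapq
import itertools


def biggest_tiles(path, top=10):
    """Return the `top` largest tiles as (total_layer_bytes, (z, x, y))."""
    with gzip.open(path, "rt", newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        # groupby only merges adjacent rows, so this relies on rows for
        # the same tile being contiguous in the file (an assumption).
        per_tile = (
            (sum(int(row["layer_bytes"]) for row in rows), tile)
            for tile, rows in itertools.groupby(
                reader, key=lambda r: (int(r["z"]), int(r["x"]), int(r["y"]))
            )
        )
        return heapq.nlargest(top, per_tile)
```

Summing `layer_bytes` gives the raw tile size, as noted in the list above; the gzipped size would have to come from its own column.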

Are there any other of those "maybes" you think would be useful to include @bdon ?

@bdon
Contributor

bdon commented Sep 13, 2023

  • If we have deduped_tile_id it should be an opaque type, not necessarily a hash (I will use offset uint64)

@bdon
Contributor

bdon commented Sep 13, 2023

more notes while implementing:

  • instead of gzipped_tile_bytes why not archived_bytes, that makes it clear it is the size of the tile at-rest in the archive, not the result of gzipping the raw tile data which might be different depending on gzip params and implementation
  • should we sort the layers alphabetically? it would be ideal if the output of planetiler and go-pmtiles was identical (that means if we store a dedupe ID hash it needs to be re-calculated by go-pmtiles to match what planetiler does)

@msbarry msbarry marked this pull request as ready for review September 17, 2023 11:00
layerstats/README.md
```

NOTE: this group by uses a lot of memory so you need to be running in file-backed
mode `duckdb analysis.duckdb` (not in-memory mode)
Contributor

is there a one-liner to change the .tsv.gz into a .duckdb?

Contributor Author

I think it would be

duckdb analysis.duckdb -cmd "CREATE TABLE layerstats AS SELECT * FROM 'output.pmtiles.layerstats.tsv.gz';"

to drop you into a shell after importing the file, or -c "create... to just create the file. Given the shortcut's not much shorter than the individual steps, I'm inclined to just leave them as separate steps for clarity.

@msbarry
Copy link
Contributor Author

msbarry commented Sep 22, 2023

@sonarcloud

sonarcloud bot commented Sep 22, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
8 Code Smells (rating A)

88.5% Coverage
0.0% Duplication

@msbarry msbarry merged commit 1f23b55 into main Sep 22, 2023
12 of 13 checks passed

Successfully merging this pull request may close these issues.

[FEATURE] Expose more detailed tile size statistics
2 participants