OutOfMemoryException: Out of Memory Error #50

Closed
StephanGeorg opened this issue Mar 5, 2024 · 7 comments · Fixed by #51

StephanGeorg commented Mar 5, 2024

Command

> quackosm DE.pbf --osm-tags-filter-file='addr_filter.json' --keep-all-tags --output DE-addr.parquet
> quackosm -v
QuackOSM 0.4.4

Context:
addr_filter.json

{
  "addr:housenumber": true,
  "addr:street": true,
  "addr:city": true,
  "addr:postcode": true,
  "addr:country": true,
  "addr:state": true,
  "addr:district": true,
  "addr:place": true,
  "addr:suburb": true,
  "addr:province": true,
  "addr:neighbourhood": true,
  "addr:hamlet": true,
  "addr:full": true,
  "addr:conscriptionnumber": true,
  "addr:subdistrict": true,
  "addr:municipality": true,
  "addr:unit": true,
  "addr:floor": true,
  "addr:interpolation": true,
  "addr:streetnumber": true,
  "addr:housename": true
}
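
A filter file like the one above could also be generated rather than typed by hand. The following is only a sketch: the helper names `build_addr_filter` and `write_filter` are hypothetical and not part of QuackOSM; the key list is copied from the JSON above.

```python
import json
from pathlib import Path

# Address keys copied from addr_filter.json above (without the "addr:" prefix).
ADDR_KEYS = [
    "housenumber", "street", "city", "postcode", "country", "state",
    "district", "place", "suburb", "province", "neighbourhood", "hamlet",
    "full", "conscriptionnumber", "subdistrict", "municipality", "unit",
    "floor", "interpolation", "streetnumber", "housename",
]


def build_addr_filter(keys):
    """Map every `addr:<key>` tag to True, the same shape as the JSON above."""
    return {f"addr:{key}": True for key in keys}


def write_filter(path, keys=ADDR_KEYS):
    """Write the filter as JSON (hypothetical helper) and return the dict."""
    tags_filter = build_addr_filter(keys)
    Path(path).write_text(json.dumps(tags_filter, indent=2) + "\n", encoding="utf-8")
    return tags_filter


if __name__ == "__main__":
    write_filter("addr_filter.json")
```

The resulting file can be passed to `--osm-tags-filter-file` as in the command above. According to the traceback below, the CLI hands the parsed dict on as `tags_filter` to `convert_pbf_to_gpq`.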

DE.pbf
https://download.geofabrik.de/europe/germany-latest.osm.pbf

Traceback

➜  osm quackosm DE.pbf --osm-tags-filter-file='addr_filter.json' --keep-all-tags --output DE-addr.parquet
⠦ [   1/33] Reading nodes • 0:00:28
⠋ [   2/33] Filtering nodes - intersection • 0:00:00
⠴ [   3/33] Filtering nodes - tags • 0:00:04
⠙ [   4/33] Calculating distinct filtered nodes ids • 0:00:00
⠼ [   5/33] Reading ways • 0:01:25
⠹ [   6/33] Unnesting ways • 0:00:23
⠙ [   7/33] Filtering ways - valid refs • 0:00:16
⠋ [   8/33] Filtering ways - intersection • 0:00:00
⠦ [   9/33] Filtering ways - tags • 0:00:06
⠴ [  10/33] Calculating distinct filtered ways ids • 0:00:00
⠹ [  11/33] Reading relations • 0:00:08
⠧ [  12/33] Unnesting relations • 0:00:07
⠋ [  13/33] Filtering relations - valid refs • 0:00:00
⠋ [  14/33] Filtering relations - intersection • 0:00:00
⠋ [  15/33] Filtering relations - tags • 0:00:00
⠋ [  16/33] Calculating distinct filtered relations ids • 0:00:00
⠋ [  17/33] Loading required ways - by relations • 0:00:00
⠋ [  18/33] Calculating distinct required ways ids • 0:00:00
⠙ [  19/33] Saving filtered nodes with geometries • 0:00:08
⠼ [  20/33] Saving required nodes with structs • 0:00:31
⠧ [  21/33] Grouping filtered ways • 0:00:13
⠸ [  22/33] Saving filtered ways with linestrings  50% ━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━ 2/4 • 0:00:51 < 0:00:49 • 24.19 s/it
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/stephan/.local/lib/python3.10/site-packages/quackosm/cli.py:563 in main                    │
│                                                                                                  │
│   560 │   │   warnings.simplefilter("ignore")                                                    │
│   561 │   │   logging.disable(logging.CRITICAL)                                                  │
│   562 │   │   if pbf_file:                                                                       │
│ ❱ 563 │   │   │   geoparquet_path = convert_pbf_to_gpq(                                          │
│   564 │   │   │   │   pbf_path=pbf_file,                                                         │
│   565 │   │   │   │   tags_filter=osm_tags_filter or osm_tags_filter_file,  # type: ignore       │
│   566 │   │   │   │   keep_all_tags=keep_all_tags,                                               │
│                                                                                                  │
│ ╭──────────────────────────── locals ─────────────────────────────╮                              │
│ │                    explode_tags = None                          │                              │
│ │                  filter_osm_ids = None                          │                              │
│ │                geom_filter_file = None                          │                              │
│ │             geom_filter_geocode = None                          │                              │
│ │             geom_filter_geojson = None                          │                              │
│ │       geom_filter_index_geohash = None                          │                              │
│ │            geom_filter_index_h3 = None                          │                              │
│ │            geom_filter_index_s2 = None                          │                              │
│ │                 geom_filter_wkt = None                          │                              │
│ │           geometry_filter_value = None                          │                              │
│ │                    ignore_cache = False                         │                              │
│ │                   keep_all_tags = True                          │                              │
│ │   number_of_geometries_provided = 0                             │                              │
│ │              osm_extract_source = <OsmExtractSource.any: 'any'> │                              │
│ │                 osm_tags_filter = None                          │                              │
│ │            osm_tags_filter_file = {                             │                              │
│ │                                   │   'addr:housenumber': True, │                              │
│ │                                   │   'addr:street': True,      │                              │
│ │                                   │   'addr:city': True,        │                              │
│ │                                   │   'addr:postcode': True,    │                              │
│ │                                   │   'addr:country': True,     │                              │
│ │                                   │   'addr:state': True,       │                              │
│ │                                   │   'addr:district': True,    │                              │
│ │                                   │   'addr:place': True,       │                              │
│ │                                   │   'addr:suburb': True,      │                              │
│ │                                   │   'addr:province': True,    │                              │
│ │                                   │   ... +11                   │                              │
│ │                                   }                             │                              │
│ │ osm_way_polygon_features_config = None                          │                              │
│ │                        pbf_file = PosixPath('DE.pbf')           │                              │
│ │                result_file_path = PosixPath('DE-addr.parquet')  │                              │
│ │                         version = None                          │                              │
│ │               working_directory = PosixPath('files')            │                              │
│ ╰─────────────────────────────────────────────────────────────────╯                              │
│                                                                                                  │
│ /home/stephan/.local/lib/python3.10/site-packages/quackosm/functions.py:217 in                   │
│ convert_pbf_to_gpq                                                                               │
│                                                                                                  │
│   214 │   │   │ 2140 rows (20 shown)                                                         3   │
│   215 │   │   └───────────────────────────────────────────────────────────────────────────────   │
│   216 │   """                                                                                    │
│ ❱ 217 │   return PbfFileReader(                                                                  │
│   218 │   │   tags_filter=tags_filter,                                                           │
│   219 │   │   geometry_filter=geometry_filter,                                                   │
│   220 │   │   working_directory=working_directory,                                               │
│                                                                                                  │
│ ╭──────────────────────────── locals ─────────────────────────────╮                              │
│ │                    explode_tags = None                          │                              │
│ │                  filter_osm_ids = None                          │                              │
│ │                 geometry_filter = None                          │                              │
│ │                    ignore_cache = False                         │                              │
│ │                   keep_all_tags = True                          │                              │
│ │ osm_way_polygon_features_config = None                          │                              │
│ │                        pbf_path = PosixPath('DE.pbf')           │                              │
│ │                result_file_path = PosixPath('DE-addr.parquet')  │                              │
│ │                     tags_filter = {                             │                              │
│ │                                   │   'addr:housenumber': True, │                              │
│ │                                   │   'addr:street': True,      │                              │
│ │                                   │   'addr:city': True,        │                              │
│ │                                   │   'addr:postcode': True,    │                              │
│ │                                   │   'addr:country': True,     │                              │
│ │                                   │   'addr:state': True,       │                              │
│ │                                   │   'addr:district': True,    │                              │
│ │                                   │   'addr:place': True,       │                              │
│ │                                   │   'addr:suburb': True,      │                              │
│ │                                   │   'addr:province': True,    │                              │
│ │                                   │   ... +11                   │                              │
│ │                                   }                             │                              │
│ │               working_directory = PosixPath('files')            │                              │
│ ╰─────────────────────────────────────────────────────────────────╯                              │
│                                                                                                  │
│ /home/stephan/.local/lib/python3.10/site-packages/quackosm/pbf_file_reader.py:204 in             │
│ convert_pbf_to_gpq                                                                               │
│                                                                                                  │
│    201 │   │   │   │   │   keep_all_tags=keep_all_tags,                                          │
│    202 │   │   │   │   │   explode_tags=explode_tags,                                            │
│    203 │   │   │   │   )                                                                         │
│ ❱  204 │   │   │   │   parsed_geoparquet_file = self._parse_pbf_file(                            │
│    205 │   │   │   │   │   pbf_path=pbf_path,                                                    │
│    206 │   │   │   │   │   result_file_path=Path(result_file_path),                              │
│    207 │   │   │   │   │   filter_osm_ids=filter_osm_ids,                                        │
│                                                                                                  │
│ ╭─────────────────────────────────────── locals ───────────────────────────────────────╮         │
│ │     explode_tags = False                                                             │         │
│ │   filter_osm_ids = []                                                                │         │
│ │     ignore_cache = False                                                             │         │
│ │    keep_all_tags = True                                                              │         │
│ │         pbf_path = PosixPath('DE.pbf')                                               │         │
│ │ result_file_path = PosixPath('DE-addr.parquet')                                      │         │
│ │             self = <quackosm.pbf_file_reader.PbfFileReader object at 0x7fa97e09a320> │         │
│ ╰──────────────────────────────────────────────────────────────────────────────────────╯         │
│                                                                                                  │
│ /home/stephan/.local/lib/python3.10/site-packages/quackosm/pbf_file_reader.py:494 in             │
│ _parse_pbf_file                                                                                  │
│                                                                                                  │
│    491 │   │   │   │   ],                                                                        │
│    492 │   │   │   )                                                                             │
│    493 │   │   │                                                                                 │
│ ❱  494 │   │   │   filtered_ways_with_linestrings = self._get_filtered_ways_with_linestrings(    │
│    495 │   │   │   │   osm_parquet_files=converted_osm_parquet_files,                            │
│    496 │   │   │   │   ways_refs_with_nodes_structs=ways_refs_with_nodes_structs,                │
│    497 │   │   │   )                                                                             │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │       converted_osm_parquet_files = ConvertedOSMParquetFiles(                                │ │
│ │                                     │   nodes_valid_with_tags=<repr-error 'Connection Error: │ │
│ │                                     Connection has already been closed'>,                    │ │
│ │                                     │   nodes_filtered_ids=<repr-error 'Connection Error:    │ │
│ │                                     Connection has already been closed'>,                    │ │
│ │                                     │   ways_all_with_tags=<repr-error 'Connection Error:    │ │
│ │                                     Connection has already been closed'>,                    │ │
│ │                                     │   ways_with_unnested_nodes_refs=<repr-error            │ │
│ │                                     'Connection Error: Connection has already been closed'>, │ │
│ │                                     │   ways_required_ids=<repr-error 'Connection Error:     │ │
│ │                                     Connection has already been closed'>,                    │ │
│ │                                     │   ways_filtered_ids=<repr-error 'Connection Error:     │ │
│ │                                     Connection has already been closed'>,                    │ │
│ │                                     │   relations_all_with_tags=<repr-error 'Connection      │ │
│ │                                     Error: Connection has already been closed'>,             │ │
│ │                                     │   relations_with_unnested_way_refs=<repr-error         │ │
│ │                                     'Connection Error: Connection has already been closed'>, │ │
│ │                                     │   relations_filtered_ids=<repr-error 'Connection       │ │
│ │                                     Error: Connection has already been closed'>              │ │
│ │                                     )                                                        │ │
│ │                          elements = <repr-error 'Connection Error: Connection has already    │ │
│ │                                     been closed'>                                            │ │
│ │                      explode_tags = False                                                    │ │
│ │                    filter_osm_ids = []                                                       │ │
│ │ filtered_nodes_with_geometry_path = PosixPath('/home/stephan/data/osm/files/tmpyg0s3anf/fil… │ │
│ │                      ignore_cache = False                                                    │ │
│ │                     keep_all_tags = True                                                     │ │
│ │                          pbf_path = PosixPath('DE.pbf')                                      │ │
│ │                  result_file_path = PosixPath('DE-addr.parquet')                             │ │
│ │                              self = <quackosm.pbf_file_reader.PbfFileReader object at        │ │
│ │                                     0x7fa97e09a320>                                          │ │
│ │      ways_refs_with_nodes_structs = <repr-error 'Connection Error: Connection has already    │ │
│ │                                     been closed'>                                            │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /home/stephan/.local/lib/python3.10/site-packages/quackosm/pbf_file_reader.py:1172 in            │
│ _get_filtered_ways_with_linestrings                                                              │
│                                                                                                  │
│   1169 │   │   │   self._delete_directories(grouped_ways_tmp_path)                               │
│   1170 │   │                                                                                     │
│   1171 │   │   with TaskProgressBar("Saving filtered ways with linestrings", "22") as bar:       │
│ ❱ 1172 │   │   │   self._construct_ways_linestrings(                                             │
│   1173 │   │   │   │   bar=bar,                                                                  │
│   1174 │   │   │   │   groups=groups,                                                            │
│   1175 │   │   │   │   destination_dir_path=destination_dir_path,                                │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │                          bar = <quackosm._rich_progress.TaskProgressBar object at            │ │
│ │                                0x7fa9755a8490>                                               │ │
│ │         destination_dir_path = PosixPath('/home/stephan/data/osm/files/tmpyg0s3anf/filtered… │ │
│ │            grouped_ways_path = PosixPath('/home/stephan/data/osm/files/tmpyg0s3anf/filtered… │ │
│ │        grouped_ways_tmp_path = PosixPath('/home/stephan/data/osm/files/tmpyg0s3anf/filtered… │ │
│ │                       groups = 3                                                             │ │
│ │            osm_parquet_files = ConvertedOSMParquetFiles(                                     │ │
│ │                                │   nodes_valid_with_tags=<repr-error 'Connection Error:      │ │
│ │                                Connection has already been closed'>,                         │ │
│ │                                │   nodes_filtered_ids=<repr-error 'Connection Error:         │ │
│ │                                Connection has already been closed'>,                         │ │
│ │                                │   ways_all_with_tags=<repr-error 'Connection Error:         │ │
│ │                                Connection has already been closed'>,                         │ │
│ │                                │   ways_with_unnested_nodes_refs=<repr-error 'Connection     │ │
│ │                                Error: Connection has already been closed'>,                  │ │
│ │                                │   ways_required_ids=<repr-error 'Connection Error:          │ │
│ │                                Connection has already been closed'>,                         │ │
│ │                                │   ways_filtered_ids=<repr-error 'Connection Error:          │ │
│ │                                Connection has already been closed'>,                         │ │
│ │                                │   relations_all_with_tags=<repr-error 'Connection Error:    │ │
│ │                                Connection has already been closed'>,                         │ │
│ │                                │   relations_with_unnested_way_refs=<repr-error 'Connection  │ │
│ │                                Error: Connection has already been closed'>,                  │ │
│ │                                │   relations_filtered_ids=<repr-error 'Connection Error:     │ │
│ │                                Connection has already been closed'>                          │ │
│ │                                )                                                             │ │
│ │                         self = <quackosm.pbf_file_reader.PbfFileReader object at             │ │
│ │                                0x7fa97e09a320>                                               │ │
│ │ ways_refs_with_nodes_structs = <repr-error 'Connection Error: Connection has already been    │ │
│ │                                closed'>                                                      │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /home/stephan/.local/lib/python3.10/site-packages/quackosm/pbf_file_reader.py:1297 in            │
│ _construct_ways_linestrings                                                                      │
│                                                                                                  │
│   1294 │   │   │   │   FROM ({current_ways_group_relation.sql_query()})                          │
│   1295 │   │   │   │   GROUP BY id                                                               │
│   1296 │   │   │   """)                                                                          │
│ ❱ 1297 │   │   │   self._save_parquet_file(                                                      │
│   1298 │   │   │   │   relation=ways_with_linestrings,                                           │
│   1299 │   │   │   │   file_path=destination_dir_path / f"group={group}",                        │
│   1300 │   │   │   )                                                                             │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │                         bar = <quackosm._rich_progress.TaskProgressBar object at             │ │
│ │                               0x7fa9755a8490>                                                │ │
│ │     current_ways_group_path = PosixPath('/home/stephan/data/osm/files/tmpyg0s3anf/filtered_… │ │
│ │ current_ways_group_relation = <repr-error 'Connection Error: Connection has already been     │ │
│ │                               closed'>                                                       │ │
│ │        destination_dir_path = PosixPath('/home/stephan/data/osm/files/tmpyg0s3anf/filtered_… │ │
│ │                       group = 2                                                              │ │
│ │           grouped_ways_path = PosixPath('/home/stephan/data/osm/files/tmpyg0s3anf/filtered_… │ │
│ │                      groups = 3                                                              │ │
│ │                        self = <quackosm.pbf_file_reader.PbfFileReader object at              │ │
│ │                               0x7fa97e09a320>                                                │ │
│ │       ways_with_linestrings = <repr-error 'Connection Error: Connection has already been     │ │
│ │                               closed'>                                                       │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /home/stephan/.local/lib/python3.10/site-packages/quackosm/pbf_file_reader.py:1076 in            │
│ _save_parquet_file                                                                               │
│                                                                                                  │
│   1073 │   def _save_parquet_file(                                                               │
│   1074 │   │   self, relation: "duckdb.DuckDBPyRelation", file_path: Path                        │
│   1075 │   ) -> "duckdb.DuckDBPyRelation":                                                       │
│ ❱ 1076 │   │   self.connection.sql(f"""                                                          │
│   1077 │   │   │   COPY (                                                                        │
│   1078 │   │   │   │   SELECT * FROM ({relation.sql_query()})                                    │
│   1079 │   │   │   ) TO '{file_path}' (                                                          │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ file_path = PosixPath('/home/stephan/data/osm/files/tmpyg0s3anf/filtered_ways_with_linestri… │ │
│ │  relation = <repr-error 'Connection Error: Connection has already been closed'>              │ │
│ │      self = <quackosm.pbf_file_reader.PbfFileReader object at 0x7fa97e09a320>                │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OutOfMemoryException: Out of Memory Error: Failed to allocate block of 1048576 bytes
@RaczeQ
Collaborator

RaczeQ commented Mar 7, 2024

I've published a 0.4.5 version that should automatically downscale the process on this particular step 😉
If you could check whether it works now, that would be great.

@StephanGeorg
Author

Unfortunately not: I killed the process after it had been running for 50 minutes on a 16-core, 64 GB Ubuntu machine:

image

State was:

image

filtered_ways_with_linestrings contains a single subfolder, group=0:

image
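
To check progress on this step without screenshots, the `group=<n>` entries in the working directory can be listed. This is a rough sketch: the helper name `finished_groups` is hypothetical, and the `group=<n>` naming is taken from the `destination_dir_path / f"group={group}"` line in the traceback above. An entry that exists is not necessarily complete.

```python
from pathlib import Path


def finished_groups(linestrings_dir):
    """Return the sorted group numbers found as `group=<n>` entries.

    Assumes outputs are named `group=<n>`, as the traceback suggests.
    Entries with a non-numeric suffix are ignored.
    """
    groups = []
    for entry in Path(linestrings_dir).glob("group=*"):
        suffix = entry.name.split("=", 1)[1]
        if suffix.isdigit():
            groups.append(int(suffix))
    return sorted(groups)


if __name__ == "__main__":
    import sys

    # Pass the filtered_ways_with_linestrings directory as the first argument.
    print(finished_groups(sys.argv[1]))
```

In the run described above, this would list only group 0, while the traceback shows `groups = 3` for the earlier run.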

@RaczeQ
Collaborator

RaczeQ commented Mar 7, 2024

It seems like the query now takes a long time, but it's not overflowing the memory.
I'll try to investigate how to speed this up.

@StephanGeorg
Author

I have noticed the same behavior with version 0.4.4 as well. I killed/restarted the process a few times before it crashed.

@RaczeQ
Collaborator

RaczeQ commented Mar 15, 2024

Version 0.5.0 has been deployed; it should work now 😄

@StephanGeorg
Author

Awesome. I will test it on Monday.

@StephanGeorg
Author

Yes, I can confirm: everything works as expected.

At first, though, it failed again (the process ran indefinitely) after updating to 0.5.0.

  • I re-downloaded the Germany PBF from Geofabrik: failed again.
  • I downloaded it from an alternative source: failed again.
  • I ran it on my MacBook: worked.
  • I removed quackosm and all its dependencies and reinstalled it: it finally worked on my Linux machine.

Thank you. 🎉
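
Since only a clean reinstall fixed the Linux machine, comparing installed package versions between a working and a failing environment may help narrow such cases down. The thread does not establish the cause, so a stale dependency is only an assumption here. A minimal sketch using the standard library (the helper name `installed_versions` is hypothetical; `duckdb` is included because it appears in the traceback):

```python
from importlib import metadata


def installed_versions(packages=("quackosm", "duckdb")):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this on both machines and diffing the output is a cheap first check before removing everything.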
