pyrosm.OSM
- class pyrosm.OSM(filepath, bounding_box=None, keep_metadata=True, complete_relations=False, engine='in_memory', workers=None)
OpenStreetMap PBF reader object.
- Parameters:
filepath (str) – Filepath to input OSM dataset (*.osm.pbf).
bounding_box (list | shapely geometry) – OSM data can be filtered spatially by passing a bounding box, either as a list [minx, miny, maxx, maxy] or as a Shapely Polygon/MultiPolygon or closed LineString/LinearRing.
keep_metadata (bool (default: True)) – Whether to keep the OSM element metadata columns (timestamp, version, changeset) in the returned GeoDataFrames. Set to False to drop them and reduce memory use when the metadata is not needed; the per-node metadata is then also skipped while parsing, which lowers peak memory on node-heavy files. History (.osh.pbf) parsing keeps the metadata it requires regardless of this flag.
complete_relations (bool (default: False)) – When reading with a bounding_box, a relation (e.g. a multipolygon or boundary) is normally assembled from only the member ways that fall inside the box, so a relation straddling the edge of the box comes out with a partial geometry. Set this to True to fetch each such relation’s full member set (member ways and their nodes, even outside the box) so the geometry is complete. This adds two extra streaming passes over the file (only when a relation actually has missing members), so it is opt-in. It has no effect on a whole-file read (no bounding_box), which already holds every member. Only member ways are completed; relations whose members are themselves relations (super-relations) are not.
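The difference between a partial and a completed relation can be pictured with a toy model. The sketch below is plain Python and is not pyrosm's implementation: the `ways` and `relation` structures and the `assemble` helper are hypothetical, and treating a way as inside the box when any of its nodes is inside is an assumption made here for simplicity.

```python
# Toy model only: hypothetical data structures, not pyrosm internals.

def in_box(point, box):
    """True if (x, y) lies inside box = [minx, miny, maxx, maxy]."""
    x, y = point
    minx, miny, maxx, maxy = box
    return minx <= x <= maxx and miny <= y <= maxy


def assemble(relation, ways, box, complete_relations=False):
    """Return the member way ids used to build the relation geometry."""
    # Assumption: a way counts as "inside" if any of its nodes is in the box.
    inside = {
        way_id
        for way_id, nodes in ways.items()
        if any(in_box(p, box) for p in nodes)
    }
    if complete_relations:
        # Fetch every member way, including those outside the box.
        return list(relation["members"])
    # Default: only the members that survived the spatial filter.
    return [way_id for way_id in relation["members"] if way_id in inside]


ways = {
    1: [(0.0, 0.0), (1.0, 0.0)],  # fully inside the box
    2: [(1.0, 0.0), (5.0, 0.0)],  # straddles the edge of the box
    3: [(5.0, 0.0), (5.0, 5.0)],  # fully outside the box
}
relation = {"id": 100, "members": [1, 2, 3]}
box = [-1.0, -1.0, 2.0, 2.0]

partial = assemble(relation, ways, box)
full = assemble(relation, ways, box, complete_relations=True)
print(partial)  # member ways available without completion
print(full)     # member ways available with completion
```

In this model way 3 is dropped by the default read, which is the kind of gap that leaves a relation with a partial geometry.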
engine (str (default: 'in_memory')) –
Which reader backend to use. 'in_memory' (the default) parses the whole file into memory. 'out_of_core' decodes the file in a single streaming pass with bounded peak memory, spilling intermediate data to disk.
The out-of-core backend reads on a single core by default. To decode in parallel, pass workers="auto" (or an explicit workers=N); see the workers parameter below. The worker processes start with spawn on macOS and Windows, which re-imports the program's entry point, so a parallel OSM(…) read must run under an if __name__ == "__main__": guard:
    from pyrosm import OSM

    if __name__ == "__main__":
        # fp is the path to the input *.osm.pbf file
        osm = OSM(fp, engine="out_of_core", workers="auto")
        buildings = osm.get_buildings()
Without the guard a parallel read still completes – it falls back to a single process and warns – but it is not parallel. On Linux (fork) no guard is needed, and the default single-core read needs no guard anywhere.
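Whether the guard matters on a given machine depends on the multiprocessing start method in use. The following is a minimal standard-library sketch for checking it. `needs_main_guard` is a hypothetical helper name, not part of pyrosm, and the sketch assumes the workers are started with Python's default start method.

```python
import multiprocessing


def needs_main_guard():
    """Return True when child processes re-import the entry point.

    Hypothetical helper, not part of pyrosm. With any start method other
    than "fork" (for example "spawn"), each worker imports the main module
    again, so module-level code must sit under the __main__ guard.
    """
    # Note: this call also fixes the start method to the platform default
    # if none has been chosen yet.
    return multiprocessing.get_start_method() != "fork"


if __name__ == "__main__":
    print("start method:", multiprocessing.get_start_method())
    print("guard required:", needs_main_guard())
```

Writing the guard unconditionally is the simpler choice for scripts that must run on more than one platform.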
History reads (an .osh.pbf file, or any feature call with a timestamp) are served by the in-memory reader even when engine='out_of_core'. Selecting the latest version of each element at or before the timestamp uses pyrosm's get_latest_version, which pandas evaluates eagerly over the whole multi-version frame, so history is read in memory.
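The selection rule for history reads (the latest version of each element at or before a timestamp) can be illustrated in plain Python. This is a toy sketch over a list of dicts, not pyrosm's get_latest_version. The `latest_versions` helper, the record fields, and the integer timestamps are assumptions chosen for illustration.

```python
def latest_versions(records, timestamp):
    """Keep, per element id, the highest version with timestamp <= cutoff."""
    latest = {}
    for rec in records:
        if rec["timestamp"] > timestamp:
            continue  # this version did not exist yet at the cutoff
        current = latest.get(rec["id"])
        if current is None or rec["version"] > current["version"]:
            latest[rec["id"]] = rec
    return latest


# Hypothetical multi-version history: element 1 has three versions,
# element 2 is created only after the cutoff used below.
history = [
    {"id": 1, "version": 1, "timestamp": 100},
    {"id": 1, "version": 2, "timestamp": 200},
    {"id": 1, "version": 3, "timestamp": 300},
    {"id": 2, "version": 1, "timestamp": 250},
]

snapshot = latest_versions(history, timestamp=220)
versions = {element_id: rec["version"] for element_id, rec in snapshot.items()}
print(versions)
```

At the cutoff of 220, element 1 resolves to its second version and element 2 is absent because it had not been created yet.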
workers (int | str (default: None)) – Number of worker processes the 'out_of_core' engine uses to decode the file. By default (None) the engine reads on a single core, and the first out-of-core read reports how many CPU cores are available and how to opt into parallelism. Pass workers="auto" to let pyrosm choose the count automatically (a single core for small files and one worker per CPU core for larger files), or workers=N for an explicit count (a count above the available CPU cores is reduced to the core count, with a warning). Parallel reads need the if __name__ == "__main__": guard on macOS/Windows; pass workers=1 to read on a single core silently. Has no effect on the 'in_memory' engine.
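The rules for workers can be summarised as a small decision function. This is a sketch of the behaviour described above, not pyrosm's code. `resolve_workers` is a hypothetical name, and the size threshold separating small files from larger ones is not stated in this documentation, so `small_file_bytes` is an explicit, made-up parameter. Rejecting other values with a ValueError is likewise an assumption.

```python
import os
import warnings


def resolve_workers(workers, file_size, small_file_bytes, cpu_count=None):
    """Map the `workers` argument to a process count (illustrative only)."""
    cores = cpu_count or os.cpu_count() or 1
    if workers is None:
        return 1  # default: single-core read
    if workers == "auto":
        # Single core for small files, one worker per core otherwise.
        # The threshold is hypothetical; the real cutoff is not documented here.
        return 1 if file_size < small_file_bytes else cores
    if isinstance(workers, bool) or not isinstance(workers, int) or workers < 1:
        # Assumed validation, not taken from the documentation.
        raise ValueError("workers must be None, 'auto', or a positive int")
    if workers > cores:
        # An explicit count above the core count is reduced, with a warning.
        warnings.warn(f"workers={workers} exceeds {cores} cores; using {cores}")
        return cores
    return workers


# Example with an assumed 8-core machine and an assumed 100 MB threshold.
threshold = 100 * 1024 * 1024
print(resolve_workers(None, 2 * threshold, threshold, cpu_count=8))
print(resolve_workers("auto", 2 * threshold, threshold, cpu_count=8))
print(resolve_workers(4, 2 * threshold, threshold, cpu_count=8))
```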