Changelog
v0.10.0rc1 (Jun 22, 2026)
First release candidate for v0.10.0. This release adds an opt-in out-of-core reading engine that decodes large PBF files in a single streaming pass with bounded memory, with optional parallel decoding and automatic result caching. The default in-memory reader is unchanged.
NEW: Add an opt-in out-of-core reading engine, selected with OSM(filepath, engine="out_of_core") (the default engine="in_memory" is unchanged). It decodes the PBF in a single streaming pass – each block’s node coordinates and matching features are spilled to per-block files on disk, then only the coordinates the kept features reference are gathered to assemble geometries – so peak memory is bounded by the working set rather than the whole file. This makes whole-country extracts readable on modest machines and resolves the out-of-memory errors and kernel crashes previously reported when reading large extracts (#111, #147, #166, #205). Every feature method (get_network, get_buildings, get_pois, get_landuse, get_natural, get_boundaries, get_data_by_custom_criteria) and the options custom_filter, bounding_box, complete_relations, extra_attributes/tags_to_keep and keep_metadata behave as on the in-memory reader and return column-for-column identical GeoDataFrames; get_network(nodes=True) returns the graph-export nodes and edges. History (.osh.pbf) files and timestamped reads are served by the in-memory reader (#321, #322, #323, #324, #325, #327, #329)
NEW: The out-of-core engine caches each layer’s result to a GeoParquet file under a temporary directory, keyed by the source file and the read parameters, and reuses it on identical later reads – even in a later Python session – instead of re-decoding the PBF. Caching uses the optional pyarrow package; without it the reader still works but returns an in-memory GeoDataFrame without caching. Passing output="path" streams a layer straight to a GeoParquet file instead (#330, #331)
NEW: Control out-of-core parallelism with a workers parameter on OSM(...). By default the engine reads on a single core and the first out-of-core read reports how many CPU cores are available; pass workers="auto" to let pyrosm pick the count by file size (one worker per CPU core above ~70 MB, capped at the data-blob count), or workers=N for an explicit count (reduced to the CPU-core count, with a warning, if it exceeds it). On macOS and Windows a parallel read launched from a standalone script must run under an if __name__ == "__main__": guard; without it the read falls back to a single process with a warning (#326, #328, #334)
NEW: Add OSM helper methods for managing the out-of-core engine’s temporary files. OSM.list_cache() and OSM.clear_cache() list and remove the cached layer GeoParquet files, and OSM.list_downloads() and OSM.clear_downloads() list and remove the *.osm.pbf extracts that get_data downloads to the temp directory. list_cache/clear_cache and clear_downloads take an optional source file to scope the operation to a single file; the bundled package datasets are never touched (#336)
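The worker-count rule described for the workers parameter can be sketched in plain Python. This is only an illustration of the rule as worded in this changelog, not pyrosm's implementation: the function name resolve_workers, the exact byte threshold and the blob_count argument are assumptions made for the example.

```python
import os
import warnings

# Assumed threshold, taken from the "~70 MB" figure in the changelog entry.
AUTO_THRESHOLD_BYTES = 70 * 1024 * 1024


def resolve_workers(workers, file_size, blob_count, cpu_count=None):
    """Hypothetical helper: turn a ``workers`` setting into a process count."""
    cpu_count = cpu_count or os.cpu_count() or 1
    if workers is None:
        # Default behaviour: read on a single core.
        return 1
    if workers == "auto":
        if file_size <= AUTO_THRESHOLD_BYTES:
            # Small files are assumed not to benefit from a process pool.
            return 1
        # One worker per core, but never more workers than data blobs.
        return max(1, min(cpu_count, blob_count))
    if not isinstance(workers, int) or workers < 1:
        raise ValueError("workers must be None, 'auto' or a positive integer")
    if workers > cpu_count:
        warnings.warn(
            f"workers={workers} exceeds the {cpu_count} available CPU cores; "
            f"using {cpu_count} instead",
            UserWarning,
        )
        return cpu_count
    return workers
```

With these assumptions, resolve_workers("auto", 200 * 1024 * 1024, 3, cpu_count=8) yields 3, because the blob count caps the pool below the core count.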
v0.9.1 (Jun 18, 2026)
This release adds reading all tagged data without a filter and correct relation geometries for bounding-box reads, speeds up area-feature (Polygon) geometry building, and fixes incomplete-boundary handling and ambiguous region-name lookups.
NEW: Add a complete_relations parameter to OSM(...) (default False). When reading with a bounding_box, a relation (e.g. a multipolygon or boundary) is assembled from only the member ways inside the box, so a relation straddling the box edge gets a partial geometry. With complete_relations=True the reader fetches each such relation’s full member set (member ways and their nodes, even outside the box) so the geometry is correct, applying to every relation-returning layer (get_buildings, get_landuse, get_natural, get_boundaries, get_pois, get_data_by_custom_criteria). It is opt-in: the default reproduces the existing output, and completion adds two streaming passes over the file only when a relation actually has missing members. The fetched member ways are kept out of the normal way features, so other layers (e.g. get_network) are unaffected. Only member ways are completed; relations whose members are themselves relations (super-relations) are not. A whole-file read (no bounding_box) already holds every member, so the option is a no-op there. When a bounding-box read returns relations cut by the box and complete_relations is not enabled, a UserWarning reports how many relations were returned with incomplete geometry and points to the option (#236)
NEW: get_data_by_custom_criteria now accepts custom_filter=None (the new default) to read every tagged element without enumerating tag keys – tagged nodes as Points, ways as Lines/Polygons and relations as (Multi)Polygons/Lines; standalone untagged ways are dropped (matching GDAL’s OSM driver). When tags_as_columns is not given it defaults to the union of the per-feature default tag columns, so common keys become columns and the rest land in the JSON tags column, and keep_metadata/keep_nodes/keep_ways/keep_relations still apply (#113)
FIXED: get_boundaries() no longer force-closes incomplete administrative boundaries into polygons. A boundary relation whose member ways run off the PBF extent cannot form a closed ring, and was previously closed with a spurious straight edge bridging the gap (the stray lines crossing boundary plots); such incomplete boundaries are now dropped, matching how osmium and GDAL skip areas they cannot assemble (#154)
FIXED: get_data() no longer silently returns the wrong extract when a dataset name is shared by multiple regions – e.g. get_data("georgia") (the US state vs. the country) now raises a ValueError listing the region-qualified alternatives instead of returning the first match. Region-qualified names are accepted (get_data("usa/georgia") vs. get_data("europe/georgia")), and names that resolve to the same file (e.g. a UK county reachable via both great_britain and united_kingdom) still resolve as before (#162)
CHANGED: Vectorise closed-area (Polygon) way geometry construction – build the geometries for closed area ways (e.g. get_buildings, get_landuse, get_natural) with a single batched shapely.polygons/shapely.linearrings call instead of a per-way Python loop, falling back to the exact per-way builder for the rest (open ways, closed ways tagged as linear features, ways with dropped nodes), so the output is identical. Cuts get_buildings wall-clock ~28% on an Estonia extract (52.0 s -> 37.4 s), complementing the network-geometry vectorisation added in v0.9.0 (#315)
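The ambiguous-name behaviour described for get_data() can be illustrated with a small stand-alone lookup. This is a sketch under stated assumptions, not pyrosm's code: the SOURCES table, its placeholder example.com URLs and the resolve_dataset function are all hypothetical.

```python
# Hypothetical index: region-qualified name -> download URL (placeholder URLs).
SOURCES = {
    "usa/georgia": "https://example.com/north-america/us/georgia-latest.osm.pbf",
    "europe/georgia": "https://example.com/europe/georgia-latest.osm.pbf",
    "europe/estonia": "https://example.com/europe/estonia-latest.osm.pbf",
    "great_britain/kent": "https://example.com/europe/uk/kent-latest.osm.pbf",
    "united_kingdom/kent": "https://example.com/europe/uk/kent-latest.osm.pbf",
}


def resolve_dataset(name, sources=SOURCES):
    """Return the URL for ``name`` or raise if it is unknown or ambiguous."""
    key = name.lower().strip()
    if key in sources:
        # A region-qualified name is always unambiguous.
        return sources[key]
    # Otherwise match on the last path component of every qualified name.
    matches = {q: url for q, url in sources.items() if q.split("/")[-1] == key}
    if not matches:
        raise ValueError(f"Unknown dataset: {name!r}")
    if len(set(matches.values())) == 1:
        # Several names pointing at the same file are not a real ambiguity.
        return next(iter(matches.values()))
    options = ", ".join(sorted(matches))
    raise ValueError(f"Dataset name {name!r} is ambiguous; use one of: {options}")
```

In this sketch resolve_dataset("kent") still resolves because both qualified names share one file, while resolve_dataset("georgia") raises a ValueError that lists both region-qualified alternatives.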
Thanks for all the contributors who helped to improve the library either via PRs or reporting bugs:
v0.9.0 (Jun 16, 2026)
This release lowers the read path’s memory use and speeds it up, adds fetching data by bounding box and by place name, and writes OSM data back to PBF (editing attributes/tags and cropping).
NEW: Add a tags_to_keep parameter to the feature methods (get_network, get_buildings, get_pois, get_landuse, get_natural, get_boundaries). When given, only those OSM tag keys are kept as columns (replacing the default tag-column set), reducing memory; structural columns, filtering and extra_attributes are unaffected and the default behaviour is unchanged (#87)
NEW: Add a keep_metadata parameter to OSM(...) (default True). Set keep_metadata=False to drop the element metadata columns (timestamp, version, changeset) from the returned GeoDataFrames and skip decoding the per-node metadata while parsing, lowering memory use and parse time on node-heavy files; the default behaviour is unchanged and history (.osh.pbf) files keep the metadata they require (#87, #150)
CHANGED: Replace the per-node coordinate lookup (previously a dict-of-dicts) with a compact cykhash id->index map plus contiguous column arrays, cutting the read path’s peak memory (~12% on a 138 MB extract) and making coordinate lookups during geometry building a little faster, with no change to the returned data (#53)
CHANGED: Build network way geometries with a single batched shapely.linestrings call across ways (and skip the from/to ids and node-attribute records that a plain get_network discards), cutting network geometry construction ~37% and the multi-layer wall-clock ~17% on a 138 MB extract, with no change to the returned data (#53)
CHANGED: Stream PBF blocks through a generator (each block is parsed then discarded) instead of holding the whole decompressed file in a list, cutting the read path’s peak memory – ~20% for a whole-file read and ~50-70% for bounding-box reads, which no longer pay the whole-file cost. Output is unchanged; bounding-box reads re-stream the file once to complete boundary geometries, which adds roughly 10% time to those reads (#53)
CHANGED: When splitting node tags into columns, build only the tag-columns that actually occur in the data instead of materialising every candidate column and dropping the all-empty ones, cutting get_pois (and other node-feature) assembly ~37% and the multi-layer wall-clock ~14% on a 138 MB extract, with no change to the returned data (#53)
CHANGED: Apply the same build-only-occurring-columns approach when splitting way and relation tags into columns (get_network, get_buildings, get_landuse, get_natural and custom-criteria reads), so candidate tag-columns absent from the data are no longer materialised and dropped, cutting the way/relation tag-assembly step ~12-16% on a 138 MB extract (the full feature call improves less, as geometry construction dominates these features), with no change to the returned data (#53)
NEW: Add OSM.write_pbf(data, output_path) to write the OSM data back to a valid, re-readable PBF after modifying attributes/tags in pandas (e.g. fill maxspeed, add a travel_time tag). The whole dataset that was read is written; each row of data (a GeoDataFrame or list of them) updates the tags of the matching element by osm_type + id, and rows whose id is not in the source are added as new elements synthesized from their geometry (Point->node, LineString->way, hole-less Polygon->closed way; negative ids). Topology and coordinates come from the parsed data, so the output round-trips coordinates exactly and is read by pyrosm, osmium, GDAL and r5py/R5 (a modified pedestrian/car network exported from pyrosm builds a routable R5 network). v1 applies edits and additions, not deletions (#286, #285)
NEW: Add OSM.to_pbf(output_path=None, keep_relations=True, workers=1, compact=False, repack=False) to crop the source .osm.pbf by the object’s bounding_box and write a valid, re-readable PBF to disk (a temp file when no path is given). Cropping is memory-efficient (streams the file blob-by-blob, holds only id sets) and “complete-ways” (a way is kept when >=1 node is inside the box and keeps its full node list); coordinates round-trip exactly. workers>1 parallelizes the per-block work over a single process pool and produces byte-identical output (with a sequential fallback for files too small to amortize pool startup). compact=True prunes each output block’s string table to only the strings its kept elements reference, producing a smaller file (~18% smaller on a Helsinki-region crop) at the cost of some extra per-block work. repack=True goes further and re-chunks the kept elements into canonical, densely packed blocks (as osmium/Osmosis produce) for the smallest output, at the cost of speed (the re-pack write is sequential, though workers still parallelizes the selection); it supersedes compact. The defaults compact=False/repack=False keep the faster current behaviour, and the written OSM data – coordinates, tags and element metadata – is identical for every combination (#284, #6)
CHANGED: Import pyrosm lazily so import pyrosm no longer eagerly loads geopandas/shapely (~2 s); OSM, get_data and get_path are still importable as before and load on first use (#284, #6)
NEW: Add pyrosm.get_data_by_bbox(bbox, crop=True, download=True, update=False, directory=None, output_path=None) to download – and by default crop – the OSM data covering a bounding box. It finds the smallest Geofabrik extract whose extent fully covers the box, downloads it, and (with crop=True, the default) crops it to the box, returning the cropped file named bbox_<minx>_<miny>_<maxx>_<maxy>.osm.pbf; crop=False returns the full extract, and download=False returns the covering extract’s PBF URL without downloading. The bounding box may be a [minx, miny, maxx, maxy] list/tuple/array in lon/lat, a Shapely geometry, or a GeoDataFrame/GeoSeries. It is backed by a vendored snapshot of Geofabrik’s index-v1.json, so the extract lookup works offline (refresh it with scripts/update_geofabrik_index.py); update=True refreshes the index and re-downloads (#165, #197)
NEW: Add pyrosm.geocode(query) and pyrosm.get_data_by_geocoding(query, crop=True, download=True, update=False, directory=None, output_path=None) to fetch data by place name. geocode returns a Shapely polygon for a place (e.g. "Brighton and Hove, UK") via OpenStreetMap’s Nominatim service – its boundary polygon when available, otherwise its bounding-box rectangle. get_data_by_geocoding geocodes the place, then downloads – and by default crops – the smallest Geofabrik extract that covers it, returning the cropped file named after the place (e.g. brighton-and-hove-uk.osm.pbf); crop=False returns the full extract and download=False returns the covering extract’s PBF URL. No new dependencies (stdlib urllib/json with the bundled certifi, and shapely) (#165)
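The "smallest extract whose extent fully covers the box" lookup used by get_data_by_bbox and get_data_by_geocoding can be sketched as follows. This is a simplified illustration, not pyrosm's implementation: it assumes rectangular extents (the real Geofabrik index holds polygons), and the EXTENTS values are rough made-up rectangles, not real extract boundaries.

```python
# Hypothetical extents as (minx, miny, maxx, maxy) in lon/lat; illustrative only.
EXTENTS = {
    "europe": (-25.0, 34.0, 45.0, 72.0),
    "finland": (19.0, 59.0, 32.0, 71.0),
    "estonia": (21.0, 57.0, 29.0, 60.0),
}


def covers(extent, bbox):
    """True when ``extent`` fully contains ``bbox``."""
    return (
        extent[0] <= bbox[0]
        and extent[1] <= bbox[1]
        and extent[2] >= bbox[2]
        and extent[3] >= bbox[3]
    )


def area(extent):
    """Planar area of a rectangle in squared degrees (enough for ranking)."""
    return (extent[2] - extent[0]) * (extent[3] - extent[1])


def smallest_covering_extract(bbox, extents=EXTENTS):
    """Pick the name of the smallest extent that fully covers ``bbox``."""
    candidates = [
        (area(extent), name)
        for name, extent in extents.items()
        if covers(extent, bbox)
    ]
    if not candidates:
        raise ValueError(f"No extract covers bounding box {bbox}")
    return min(candidates)[1]
```

A box inside one country resolves to that country's extract, while a box that straddles two countries falls back to the next larger extract that contains it whole.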
Thanks for all the contributors who helped to improve the library either via PRs or reporting bugs:
v0.8.0 (Jun 10, 2026)
This is a major release that changes the PBF parsing backend.
CHANGED: Replace the Pyrobuf PBF backend with Google’s Protobuf (its fast C upb backend) for parsing the protocol-buffer messages. Pyrobuf is unmaintained and its source build fails with modern setuptools (breaking pip install pyrosm); Google’s Protobuf is actively maintained and ships wheels and conda-forge packages for Python 3.10–3.14. Parsing speed is unchanged; see the backend benchmark. v0.8.0 is the first release built on Google’s Protobuf; v0.7.0 was the last to use Pyrobuf. (#276)
NEW: Automate PyPI releases: a GitHub Actions release workflow builds binary wheels (cibuildwheel; Linux/macOS/Windows × CPython 3.10–3.14) and an sdist, then publishes them to PyPI via Trusted Publishing and creates a GitHub release when a vX.Y.Z tag is pushed (#288, #287)
NEW: Expose relation members under the members key of the JSON tags column (each {member_id, member_type, member_role}), so relations carry their members in the returned GeoDataFrame (#281, #216)
NEW: Raise a clear InvalidOSMFileError when the input .pbf is not a valid OSM PBF file, instead of a cryptic zlib/protobuf error (#280, #160)
NEW: Accept pathlib.Path (and any os.PathLike) filepaths in the OSM constructor, not just strings (#279, #145)
FIXED: Decode node coordinates at full float64 precision (exact OSM 7-decimal values, matching GDAL/osmium); they were truncated to float32, introducing a ~0.1 m error, false extra precision, and visible distortion of straight geometry edges (#283, #245, #225)
FIXED: Normalize polygon/multipolygon ring orientation to the OGC/GeoJSON right-hand rule (exterior counter-clockwise, holes clockwise), matching osmium and QGIS; previously rings inherited the OSM way node order and were inconsistently wound (#282, #230)
FIXED: get_bounding_box now reads the header bounding box correctly; it returned None for every file after the protobuf backend migration (#280, #160)
FIXED: Download data over HTTPS using certifi’s CA bundle instead of the OS trust store, so fetching datasets no longer fails on Windows with ssl.SSLError [ASN1: NOT_ENOUGH_DATA] (a CPython bug triggered by a malformed entry in the Windows certificate store) (#294)
Refresh the README badges (#278)
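The size of the float32 error mentioned in the coordinate-precision fix can be checked with the standard library alone. The snippet below is an illustrative calculation, not code from pyrosm: the stored integer raw_lon is a made-up value, and it assumes the default OSM PBF granularity, where one integer unit equals 1e-7 degrees.

```python
import struct

METRES_PER_DEGREE = 111_320  # approximate length of one degree at the equator


def to_float32(value):
    """Round a float64 to the nearest float32 and return it as a Python float."""
    return struct.unpack("f", struct.pack("f", value))[0]


raw_lon = 249_384_123           # hypothetical stored integer (1e-7 degree units)
lon64 = raw_lon / 10_000_000    # float64 keeps all seven decimals
lon32 = to_float32(lon64)       # what a float32 column would hold

error_deg = abs(lon32 - lon64)
error_m = error_deg * METRES_PER_DEGREE

print(f"float64: {lon64:.7f}")
print(f"float32: {lon32:.7f}")
print(f"error:   {error_deg:.2e} degrees (~{error_m:.3f} m at the equator)")
```

For values between 16 and 32 degrees, adjacent float32 numbers are 2^-19 (about 1.9e-6) degrees apart, so the worst-case rounding error is about 9.5e-7 degrees, or roughly 0.1 m; the spacing doubles with each power of two in magnitude, so coordinates further from zero lose more.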
Thanks for all the contributors who helped to improve the library either via PRs or reporting bugs:
v0.7.0 (Jun 7, 2026)
NEW: Add pandarm graph-export backend (the maintained, NumPy 2-compatible fork of pandana); deprecate graph_type="pandana" (#271)
NEW: Make cycling networks directed and honour oneway:bicycle (#255)
NEW: Add custom_filter to get_network so custom-filtered networks also return graph nodes (#264)
NEW: Add street_count node attribute to the NetworkX export (compatible with OSMnx basic_stats) (#265)
NEW: Support combining custom_filter True with explicit tag values (#251)
Support Python 3.10–3.14 (drop 3.9) and fix OSH parsing under pandas 3.0 (#248)
Return complete (uncut) geometries for ways/edges that straddle a bounding-box edge (#268)
Keep bounding-box network nodes consistent with the kept edges so graph export works without manual cleanup (#269)
Fix non-dense PBF node parsing (parse_nodes) (#275)
Handle bounding boxes that select no nodes instead of raising KeyError (#267)
Fix custom_filter with highway turning closed-way polygons into lines (#266)
Fix network exclude/keep filters leaking on multi-key filters (#263)
Fix duplicate “phantom” nodes in the NetworkX export (#259)
Correct relation ids and surface a colliding id tag as id_tag (#234, #249)
Stop get_* methods from mutating the shared default-tag config (#252)
Fix spurious pandas chained-assignment warnings from the Cython frame builders (#256)
Fix Geofabrik UK sub-region downloads (moved under united-kingdom) (#258)
Fix reading PBF produced by osmconvert (#238)
Fix documentation URL (#223)
Measure Cython (.pyx) coverage and raise overall test coverage (#273)
Document the pandarm graph backend and the pandana deprecation, and reading OSM history files (.osh.pbf) (#257)
Fix the Read the Docs build; run live download tests on a single CI runner; bump GitHub Actions to Node 24 (#250, #254, #260)
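What honouring oneway:bicycle means for a directed cycling network can be shown with a small tag-resolution sketch. It is a deliberate simplification and not pyrosm's implementation: the cycling_directions function is hypothetical, and it ignores related tags such as cycleway contraflow lanes and roundabouts.

```python
def cycling_directions(tags):
    """Return the directions a bicycle may travel along a way.

    "forward" follows the way's node order, "backward" runs against it.
    """
    # A bicycle-specific tag takes precedence over the general ``oneway`` tag.
    value = tags.get("oneway:bicycle", tags.get("oneway", "no"))
    if value in ("yes", "true", "1"):
        return ["forward"]
    if value in ("-1", "reverse"):
        return ["backward"]
    # Anything else (including "no") is treated as two-way.
    return ["forward", "backward"]


# A one-way street that explicitly allows cyclists in both directions.
print(cycling_directions({"highway": "residential", "oneway": "yes", "oneway:bicycle": "no"}))
```

Each returned direction would become one directed edge in the exported graph, so a street that is one-way for cars can still contribute two edges to the cycling network.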
Thanks for all the contributors who helped to improve the library either via PRs or reporting bugs:
v0.6.2 (Oct 26, 2023)
Fix installation issues and support only Python >= 3.9 (#221)
Fix GitHub Actions workflows and use micromamba to install environments (#221)
Use Shapely 2.0 instead of pygeos (#214)
Thanks for the following contributors:
v0.6.1 (Oct 11, 2021)
v0.6.0 (Nov 18, 2020)
NEW: Adds possibility to export street networks to igraph, networkx and pandana (#57, #58, #70)
Add functionality to parse/return the nodes of the network when requested (#52)
Calculate length of the edge for networks in meters (#56, #70)
Filter out weakly connected component by default when exporting to graph (#59)
Add (vectorized) functionality to create directed edges according to oneway rules (#68)
Fix installation issue with pip on Windows (#61)
Fix numpy deprecation warning (#50)
Update the documentation to use new theme (#74)
Add possibility to test the tool using JupyterLab in browser (#75)
Fix issue when parsing POIs using rare tags as a custom filter (#47)
Fix issue when filtering with bounding box polygon (#54)
Add documentation about exporting the networks to graphs (#69)
Improve documentation overall
v0.5.3 (Sep 13, 2020)
Changes:
Ensures that geometry construction works with new Pygeos release v0.8.0 (#46)
v0.5.1/2 (May 11, 2020)
Fix multi-level filtering
Add support for using “exclude” also with nodes and relations
Fix data source for New York City
v0.5.0 (May 7, 2020)
Adds a function to download PBF data from Geofabrik and BBBike easily from hundreds of locations across the world
Improved geometry parsing for relations
Parse boundary geometries as Polygons instead of LinearRings (following OSM definition)
Fix invalid geometries automatically (self-intersection and “bowties”)
Add better documentation about custom filters
Make parsing more robust for incorrectly tagged OSM entries.
Bug fixes
Update website to a new theme.
v0.4.3 (April 27, 2020)
Fixes a bug related to filtering with custom filters (see details here.)
v0.4.2 (April 23, 2020)
Add functionality to parse boundaries from PBF (+ integrate name search for finding e.g. specific administrative boundary)
Support using Shapely Polygon / MultiPolygon to filter the data spatially
add possibility to add “extra attributes” (i.e. OSM keys) that will be parsed as columns.
improve documentation
v0.4.1 (April 17, 2020)
add documentation
create website: https://pyrosm.readthedocs.io
v0.4.0 (April 16, 2020)
read PBF using custom queries (allows anything to be fetched)
read landuse from PBF
read natural from PBF
improve geometry parsing so that geometry type is read automatically according to OSM rules
modularize code-base
improve test coverage
v0.3.1 (April 15, 2020)
generalize code base
read Points of Interest (POI) from PBF
v0.2.0 (April 13, 2020)
read buildings from PBF into GeoDataFrame
enable applying custom filter to filter data: e.g. with buildings you can filter specific types of buildings with {"building": ["residential", "retail"]}
handle Relations as well
handle cases where data is not available (warn user and return empty GeoDataFrame)
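How a custom filter of this shape selects elements can be sketched in a few lines. This only illustrates the matching idea and is not pyrosm's filtering code: the matches_filter function is hypothetical, and it assumes a filter value is either a list of accepted tag values or True to accept any value for that key.

```python
def matches_filter(tags, custom_filter):
    """Return True when an element's tags satisfy at least one filter key."""
    for key, allowed in custom_filter.items():
        if key not in tags:
            continue
        # ``True`` accepts any value; a list accepts only the listed values.
        if allowed is True or tags[key] in allowed:
            return True
    return False


custom_filter = {"building": ["residential", "retail"]}
elements = [
    {"building": "residential", "name": "A"},
    {"building": "industrial", "name": "B"},
    {"highway": "primary", "name": "C"},
]
kept = [e["name"] for e in elements if matches_filter(e, custom_filter)]
print(kept)
```

Only element A passes here: B has the right key but an unlisted value, and C lacks the key entirely.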
v0.1.8 (April 8, 2020)
read street networks from PBF into GeoDataFrame (separately for driving, cycling, walking and all-combined)
filter data based on bounding box
v0.1.0 (April 7, 2020)
first release on PyPI