Pyrosm – Python’s Rapid OSM Parser

Pyrosm is a Python library for reading OpenStreetMap data from Protocolbuffer Binary Format files (*.osm.pbf) into Geopandas GeoDataFrames. It makes it easy to extract various datasets from OpenStreetMap PBF dumps, including road networks, buildings, Points of Interest (POI), landuse, natural elements, administrative boundaries and much more. Fully customized queries are also supported, which makes it possible to parse any kind of data from OSM, even with more specific filters. Getting the data is just as easy: pyrosm lets you search for and download a PBF for any location in the world based on a place name (via geocoding) or a bounding box. It can also crop a PBF to a smaller area before reading. Pyrosm is designed for speed, and it is currently one of the fastest PBF extraction and cropping tools available (see benchmarks).

Pyrosm is easy to use and provides a user interface somewhat similar to that of OSMnx. The main difference between the two is that OSMnx reads the data through the Overpass API, whereas pyrosm reads it from local OSM data dumps downloaded from PBF data providers (Geofabrik, BBBike). This makes parsing OSM data faster and makes it more feasible to extract data covering large regions.
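As a rough sketch of the local-dump workflow described above, the helper below fetches a named extract and reads two layers from it. The helper name `read_roads_and_buildings` is hypothetical. The pyrosm calls (`get_data`, `OSM`, `get_network`, `get_buildings`) follow the library's documented interface, but signatures may differ between versions, so check them against your installed release.

```python
def read_roads_and_buildings(place, bbox=None):
    """Return (roads, buildings) GeoDataFrames for a named extract.

    `place` is a dataset name understood by pyrosm's `get_data`;
    `bbox` is an optional [min_lon, min_lat, max_lon, max_lat] list.
    """
    # Imported lazily so the helper can be defined without pyrosm installed.
    from pyrosm import OSM, get_data

    pbf_path = get_data(place)              # path to a local .osm.pbf file
    osm = OSM(pbf_path, bounding_box=bbox)  # nothing is parsed yet

    roads = osm.get_network(network_type="driving")  # drivable streets
    buildings = osm.get_buildings()                  # building footprints
    return roads, buildings
```

A call such as `read_roads_and_buildings("Helsinki")` would then return two GeoDataFrames that can be plotted or analysed with the usual geopandas tools. No web API is queried after the extract has been downloaded, which is the difference from OSMnx that the paragraph above points out.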

_images/NY_roads_and_buildings.PNG

Explore a live example below – building footprints for Lower Manhattan to Midtown, parsed from an OpenStreetMap PBF with pyrosm and coloured by their construction year (grey = year unknown):

Interactive map built with lonboard (deck.gl) — drag to pan, scroll to zoom, hover a building for its details.

The code that builds this map:
"""Generate the interactive New York buildings map embedded on the landing page.

Reads buildings for a Lower/Midtown Manhattan bounding box from the New York City extract,
colours them by construction year, and writes a self-contained lonboard map.
"""
from pathlib import Path

import matplotlib as mpl
from lonboard import Map, PolygonLayer
from lonboard.basemap import CartoStyle, MaplibreBasemap
from lonboard.colormap import apply_continuous_cmap

from pyrosm import OSM, get_data

OUT = Path(__file__).parent / "_static" / "ny_buildings.html"
# Lower Manhattan to Midtown, kept on the island (min_lon, min_lat, max_lon, max_lat)
BBOX = [-74.017, 40.703, -73.972, 40.762]


def extract_year(series):
    """Pull a 4-digit year out of a free-text OSM column, as float (NaN if none)."""
    return series.astype(str).str.extract(r"(\d{4})")[0].astype(float)


def join_address(row):
    """Join street and house number into one string (None if neither is set)."""
    parts = [row.get("addr:street"), row.get("addr:housenumber")]
    parts = [p for p in parts if isinstance(p, str) and p]
    return " ".join(parts) if parts else None


def main():
    osm = OSM(get_data("New York City"), bounding_box=BBOX)
    gdf = osm.get_buildings(extra_attributes=["start_date", "year_of_construction"])
    gdf = gdf.loc[gdf.geometry.geom_type != "MultiLineString"].copy()

    gdf["year"] = extract_year(gdf["year_of_construction"]).combine_first(
        extract_year(gdf["start_date"])
    )
    gdf["address"] = gdf.apply(join_address, axis=1)

    attr_cols = [c for c in ["year", "name", "address", "building", "amenity"] if c in gdf.columns]
    keep = attr_cols + [gdf.geometry.name]

    dated = gdf[gdf["year"].notna()].copy()
    undated = gdf[gdf["year"].isna()].copy()

    years = dated["year"].to_numpy()
    norm = mpl.colors.Normalize(vmin=years.min(), vmax=years.max(), clip=True)
    colors = apply_continuous_cmap(norm(years), mpl.colormaps["viridis"], alpha=0.85)

    dated_layer = PolygonLayer.from_geopandas(
        dated[keep],
        get_fill_color=colors,
        get_line_color=[40, 40, 40, 100],
        pickable=True,
        auto_highlight=True,
    )
    undated_layer = PolygonLayer.from_geopandas(
        undated[keep],
        get_fill_color=[180, 180, 180, 120],
        get_line_color=[120, 120, 120, 100],
        pickable=True,
        auto_highlight=True,
    )

    m = Map(
        [undated_layer, dated_layer],
        basemap=MaplibreBasemap(style=CartoStyle.DarkMatter),
        view_state={"longitude": -73.9945, "latitude": 40.7325, "zoom": 12.4},
    )
    m.to_html(OUT, title="New York City buildings by construction year (pyrosm)")
    print(f"wrote {OUT} ({OUT.stat().st_size / 1e6:.1f} MB) | "
          f"dated={len(dated)} undated={len(undated)} total={len(gdf)}")


if __name__ == "__main__":
    main()
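To make the year handling in the script above more concrete, here is a small standalone illustration of `extract_year` and the `combine_first` fallback. The function body is copied from the script; the sample values are invented for illustration and do not come from the New York extract.

```python
import pandas as pd


def extract_year(series):
    """Pull a 4-digit year out of a free-text OSM column, as float (NaN if none)."""
    return series.astype(str).str.extract(r"(\d{4})")[0].astype(float)


# Hypothetical tag values of the kind found in free-text OSM date columns.
year_of_construction = pd.Series(["1931", None, None, None])
start_date = pd.Series(["1930", "1902-05-01", "c. 1890", "unknown"])

# Prefer year_of_construction; fall back to start_date where it is missing.
year = extract_year(year_of_construction).combine_first(extract_year(start_date))
print(year.tolist())
```

With these inputs the first building keeps 1931 from `year_of_construction`, the next two fall back to 1902 and 1890 from `start_date`, and the last one stays NaN, so it would end up in the grey "year unknown" layer.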

Current features

  • download PBF data easily from any location in the world

  • find and download the right extract for a bounding box or a place name (NEW in v0.9.0)

  • read street networks (separately for driving, cycling, walking and all-combined)

  • read buildings from PBF

  • read Points of Interest (POI) from PBF

  • read landuse from PBF

  • read “natural” from PBF

  • read boundaries from PBF (such as administrative borders)

  • read any other data from PBF by using a custom user-defined filter

  • read large PBF extracts (country level, even some continents) with bounded memory using the opt-in out-of-core engine, with parallel decoding and automatic result caching (NEW in v0.10.0)

  • filter data based on bounding box

  • control which OSM tags are parsed into columns

  • crop a PBF to a smaller area and write modified OSM data back to PBF (NEW in v0.9.0)

  • export networks as a directed graph to igraph, networkx and pandarm
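Two of the features listed above, custom filters and graph export, can be sketched as follows. The helper names `read_custom_features` and `driving_graph` are hypothetical. The pyrosm methods (`get_data_by_custom_criteria`, `get_network` with `nodes=True`, `to_graph`) follow the library's documented interface as far as we know it, so treat the exact keyword arguments as assumptions to verify against your installed version.

```python
def read_custom_features(pbf_path, tag_filter, bbox=None):
    """Read every OSM element matching `tag_filter` into a GeoDataFrame.

    `tag_filter` maps an OSM key to True (any value) or to a list of
    accepted values, e.g. {"amenity": ["cafe", "restaurant"], "shop": True}.
    """
    from pyrosm import OSM  # lazy import: only needed when the helper runs

    osm = OSM(pbf_path, bounding_box=bbox)
    return osm.get_data_by_custom_criteria(
        custom_filter=tag_filter,
        filter_type="keep",   # keep matching elements instead of excluding them
        keep_nodes=True,
        keep_ways=True,
        keep_relations=True,
    )


def driving_graph(pbf_path, bbox=None):
    """Build a networkx graph of the drivable street network."""
    from pyrosm import OSM

    osm = OSM(pbf_path, bounding_box=bbox)
    # nodes=True asks for the node table as well as the edge table.
    nodes, edges = osm.get_network(network_type="driving", nodes=True)
    return osm.to_graph(nodes, edges, graph_type="networkx")
```

Passing a different `graph_type` would select one of the other export targets named in the list. The bounding box is optional in both helpers and is simply forwarded to `OSM`.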

When should I use Pyrosm?

Pyrosm can of course be used whenever you need to parse data from OSM into geopandas GeoDataFrames. However, it is best suited for situations where you want to fetch data for a whole city or a larger region (even a whole country).

If you want to fetch OSM data for smaller areas such as neighborhoods, or to search for data around a specific location or address, we recommend OSMnx, which is more flexible in specifying the area of interest and fetches only the requested data via an API. That said, it is also possible to extract neighborhood-level information with pyrosm by filtering data with a bounding box (see docs).
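For the neighborhood-level case, one possible approach is to derive a small bounding box around a point and pass it to `OSM`. The sketch below is only an illustration: `bbox_around` and `read_neighborhood_buildings` are hypothetical helpers, and the metres-to-degrees conversion is a flat-earth approximation that is reasonable for boxes a few kilometres wide but not for large regions.

```python
import math

# Rough length of one degree of latitude in metres (assumed constant here).
METERS_PER_DEGREE_LAT = 111_320.0


def bbox_around(lon, lat, half_size_m):
    """Return [min_lon, min_lat, max_lon, max_lat] for a square centred on (lon, lat)."""
    dlat = half_size_m / METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = half_size_m / (METERS_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return [lon - dlon, lat - dlat, lon + dlon, lat + dlat]


def read_neighborhood_buildings(pbf_path, lon, lat, half_size_m=500):
    """Read buildings within roughly `half_size_m` metres of a point."""
    from pyrosm import OSM  # lazy import: only needed when the helper runs

    osm = OSM(pbf_path, bounding_box=bbox_around(lon, lat, half_size_m))
    return osm.get_buildings()
```

The box uses the same `[min_lon, min_lat, max_lon, max_lat]` order as the `BBOX` constant in the map script above. The whole city or country extract still has to be available locally, which is why OSMnx remains the more convenient choice when only a single small area is needed.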

License

Pyrosm is licensed under MIT (see license).

Data © Geofabrik GmbH, BBBike and OpenStreetMap Contributors. All data from OpenStreetMap is licensed under the OpenStreetMap License.

Citation

If you use pyrosm in your work, please cite it. Pyrosm is archived on Zenodo with a citable DOI:

Tenkanen, H. (2026). pyrosm: A Python library for reading and writing OpenStreetMap PBF data with GeoDataFrames. (v0.10.0) Zenodo. https://doi.org/10.5281/zenodo.3755057

See How to cite pyrosm for the full reference and a BibTeX entry.
