Saving and cropping data

Pyrosm can write .osm.pbf files, not only read them. This lets you crop a large extract down to a smaller area to share or re-read faster (OSM.to_pbf), and write modified OSM data back to a valid PBF after editing tags or attributes in pandas (OSM.write_pbf).

How to?

Crop a PBF to a bounding box

to_pbf() crops the source file to the OSM object’s bounding_box and writes a valid, re-readable PBF. The crop uses complete-ways semantics: a way is kept whole when any of its nodes is inside the box. The file is streamed blob by blob, which keeps memory use low, and coordinates round-trip exactly.

import os
from pyrosm import OSM, get_data

fp = get_data("helsinki_pbf")

# Crop to a bounding box [minx, miny, maxx, maxy] in lon/lat
bbox = [24.93, 60.16, 24.96, 60.18]
cropped = OSM(fp, bounding_box=bbox).to_pbf()
print(os.path.basename(cropped), os.path.getsize(cropped), "bytes")

pyrosm_crop_10y3v1jx.osm.pbf 682143 bytes

# The cropped file reads back like any other PBF
OSM(cropped).get_buildings().shape

(490, 35)
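
The complete-ways rule can be illustrated with a toy model in plain Python. This is a minimal sketch, not pyrosm's implementation: the function names are hypothetical, the box edges are assumed to be inclusive, and relations are ignored.

```python
from typing import Dict, List, Set, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy) in lon/lat


def in_bbox(lon: float, lat: float, bbox: BBox) -> bool:
    """Return True when the point lies inside the box (edges assumed inclusive)."""
    minx, miny, maxx, maxy = bbox
    return minx <= lon <= maxx and miny <= lat <= maxy


def complete_ways_crop(
    nodes: Dict[int, Tuple[float, float]],
    ways: Dict[int, List[int]],
    bbox: BBox,
) -> Tuple[Set[int], Set[int]]:
    """Return (kept_node_ids, kept_way_ids) under the complete-ways rule."""
    inside = {nid for nid, (lon, lat) in nodes.items() if in_bbox(lon, lat, bbox)}
    # A way is kept when at least one of its nodes is inside the box.
    kept_ways = {wid for wid, refs in ways.items() if any(r in inside for r in refs)}
    # A kept way stays whole, so its nodes outside the box are kept too.
    kept_nodes = set(inside)
    for wid in kept_ways:
        kept_nodes.update(ways[wid])
    return kept_nodes, kept_ways


# Toy data: node 3 lies outside the box but belongs to way 10, which crosses it.
nodes = {
    1: (24.94, 60.17),  # inside
    2: (24.95, 60.17),  # inside
    3: (24.99, 60.17),  # outside
    4: (25.10, 60.30),  # outside
    5: (25.20, 60.30),  # outside
}
ways = {10: [1, 2, 3], 20: [4, 5]}
bbox = (24.93, 60.16, 24.96, 60.18)

kept_nodes, kept_ways = complete_ways_crop(nodes, ways, bbox)
print(sorted(kept_nodes), sorted(kept_ways))  # [1, 2, 3] [10]
```

Way 10 survives intact, including node 3 outside the box, while way 20 is dropped because none of its nodes fall inside.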

Smaller output: compact and repack

By default the crop keeps each source block’s string table, which is the fastest option. Two flags trade a little speed for a smaller file:

  • compact=True prunes each block’s string table to only the strings its kept elements reference.

  • repack=True repacks the kept elements into densely filled blocks (as osmium/Osmosis produce), which typically gives the smallest output on large crops; it supersedes compact.

The written OSM data is identical for every option.
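
To make the idea behind compact concrete: in the PBF format, tags are stored as integer indices into a per-block string table, so pruning the table means dropping unreferenced strings and renumbering the indices that remain. The sketch below is a simplified illustration under those assumptions, not pyrosm's code. The function names are hypothetical, and index 0 is kept in place on the understanding that PBF string tables reserve it as an empty entry.

```python
from typing import Dict, List, Tuple

Tag = Tuple[int, int]  # (key index, value index) into the block's string table


def compact_string_table(
    strings: List[str], elements: List[List[Tag]]
) -> Tuple[List[str], List[List[Tag]]]:
    """Drop unreferenced strings and remap tag indices to the smaller table."""
    used = sorted({i for tags in elements for pair in tags for i in pair})
    # Keep index 0 first (assumed reserved), then the referenced strings in order.
    order = [0] + [i for i in used if i != 0]
    remap: Dict[int, int] = {old: new for new, old in enumerate(order)}
    new_strings = [strings[i] for i in order]
    new_elements = [[(remap[k], remap[v]) for k, v in tags] for tags in elements]
    return new_strings, new_elements


def decode(strings: List[str], elements: List[List[Tag]]) -> List[Dict[str, str]]:
    """Resolve the indices back to readable key/value tags."""
    return [{strings[k]: strings[v] for k, v in tags} for tags in elements]


strings = ["", "highway", "residential", "name", "Dropped Street", "building", "yes"]
kept = [[(1, 2)], [(5, 6)]]  # the element that used "name" was cropped away

new_strings, new_kept = compact_string_table(strings, kept)
print(new_strings)  # ['', 'highway', 'residential', 'building', 'yes']
print(new_kept)     # [[(1, 2)], [(3, 4)]]
```

Decoding the tags before and after gives the same result, which mirrors the statement that the written OSM data is identical for every option.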

default = OSM(fp, bounding_box=bbox).to_pbf()
compact = OSM(fp, bounding_box=bbox).to_pbf(compact=True)
repack = OSM(fp, bounding_box=bbox).to_pbf(repack=True)
for label, path in [("default", default), ("compact", compact), ("repack", repack)]:
    print(f"{label:8s} {os.path.getsize(path):>7d} bytes")

default   682143 bytes
compact   679913 bytes
repack    689447 bytes

On this small bundled extract the compact and repack files are within about 1% of the default, and repack’s fixed per-block overhead even makes it marginally larger, because there is little sparsity to reclaim. The savings appear on large country-to-region crops, where the kept source blocks are sparse: a Finland → Helsinki crop drops from about 83 MB (default) to 65 MB (compact) to 59 MB (repack), matching what osmium/Osmosis produce.
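
Why sparsity matters can be shown with a toy model of repacking. After a crop, many source blocks retain only a handful of elements, and regrouping those elements into full blocks reduces the block count. The sketch below models blocks as lists of element ids; the repack function and the block_size of 4 are illustrative assumptions (real writers use blocks of several thousand elements and also rebuild each block's string table).

```python
from typing import List


def repack(blocks: List[List[int]], block_size: int) -> List[List[int]]:
    """Flatten the kept elements and regroup them into densely filled blocks."""
    kept = [elem for block in blocks for elem in block]
    return [kept[i:i + block_size] for i in range(0, len(kept), block_size)]


# Five sparse blocks left over from a crop (element ids only).
sparse = [[1, 2], [7], [12, 13, 14], [20], [31, 32]]
dense = repack(sparse, block_size=4)

print(len(sparse), "->", len(dense), "blocks")  # 5 -> 3 blocks
print(dense)  # [[1, 2, 7, 12], [13, 14, 20, 31], [32]]
```

When the source blocks are already nearly full, as in the small bundled extract, there is nothing to merge and only the per-block overhead remains.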

Write modified OSM data back to a PBF

write_pbf() writes data that you have read and then edited in pandas back to a valid PBF, for example after filling in a missing maxspeed or adding a computed travel_time tag. Each row updates the matching element (matched by osm_type + id); rows whose id is not in the source are added as new elements synthesised from their geometry.

import tempfile

osm = OSM(get_data("test_pbf"))
edges = osm.get_network("driving")

# Edit or add tags in pandas; here every edge gets a travel_time tag
edges["travel_time"] = 1.5

out_path = os.path.join(tempfile.gettempdir(), "modified.osm.pbf")
osm.write_pbf(edges, out_path)
print("wrote", os.path.basename(out_path), os.path.getsize(out_path), "bytes")

wrote modified.osm.pbf 136559 bytes

# Read it back, requesting the new tag as a column
reread = OSM(out_path).get_network("driving", extra_attributes=["travel_time"])
reread["travel_time"].value_counts(dropna=False)

travel_time
1.5    200
Name: count, dtype: int64
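
The update-or-add rule described for write_pbf() can be sketched with plain dictionaries standing in for the source elements and the edited rows. This is an illustrative model only: apply_rows is a hypothetical helper, values are converted with str() because OSM tag values are strings, and the handling of missing values and the assumption that unmentioned elements pass through unchanged are guesses. Geometry synthesis for new elements is left out.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (osm_type, id)


def apply_rows(
    source: Dict[Key, Dict[str, str]], rows: List[dict]
) -> Dict[Key, Dict[str, str]]:
    """Update tags of matching elements; add rows with unknown keys as new elements."""
    result = {key: dict(tags) for key, tags in source.items()}  # copy the source
    for row in rows:
        key = (row["osm_type"], row["id"])
        tags = {
            k: str(v)
            for k, v in row.items()
            if k not in ("osm_type", "id") and v is not None  # assumed: skip missing
        }
        # Existing key: merge the tags. Unknown key: create a new element.
        result.setdefault(key, {}).update(tags)
    return result


source = {
    ("way", 1): {"highway": "residential"},
    ("way", 2): {"highway": "primary", "maxspeed": "50"},
}
rows = [
    {"osm_type": "way", "id": 1, "maxspeed": 30, "travel_time": 1.5},  # update
    {"osm_type": "way", "id": 99, "highway": "service"},               # new element
]

merged = apply_rows(source, rows)
print(merged[("way", 1)])
# {'highway': 'residential', 'maxspeed': '30', 'travel_time': '1.5'}
print(("way", 99) in merged)  # True
```

Way 1 gains the edited tags, way 2 is carried over untouched, and way 99 appears as a new element because its id is not in the source.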