Saving and cropping data
Pyrosm can write .osm.pbf files, not only read them. This lets you crop a large extract down to a smaller area to share or re-read faster (OSM.to_pbf), and write modified OSM data back to a valid PBF after editing tags or attributes in pandas (OSM.write_pbf).
How to?
Crop a PBF to a bounding box
to_pbf() crops the source file by the OSM object’s bounding_box and writes a valid, re-readable PBF. The crop follows a complete-ways strategy: a way is kept whole when any of its nodes is inside the box. The file is streamed blob by blob, which keeps memory use low, and coordinates round-trip exactly.
import os
from pyrosm import OSM, get_data
fp = get_data("helsinki_pbf")
# Crop to a bounding box [minx, miny, maxx, maxy] in lon/lat
bbox = [24.93, 60.16, 24.96, 60.18]
cropped = OSM(fp, bounding_box=bbox).to_pbf()
print(os.path.basename(cropped), os.path.getsize(cropped), "bytes")
pyrosm_crop_10y3v1jx.osm.pbf 682143 bytes
# The cropped file reads back like any other PBF
OSM(cropped).get_buildings().shape
(490, 35)
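The complete-ways rule described above can be illustrated with a toy model. This is a minimal sketch in plain Python, not pyrosm's implementation: the helper names `in_bbox` and `crop_complete_ways` and the dict-based data layout are hypothetical, and the handling of stand-alone nodes (kept when inside the box) is an assumption made for the illustration.

```python
# Toy model of a "complete ways" crop. Hypothetical helpers for illustration;
# pyrosm works on PBF blobs, not on Python dicts like these.

def in_bbox(lon, lat, bbox):
    """Return True when (lon, lat) lies inside [minx, miny, maxx, maxy]."""
    minx, miny, maxx, maxy = bbox
    return minx <= lon <= maxx and miny <= lat <= maxy


def crop_complete_ways(nodes, ways, bbox):
    """nodes: {node_id: (lon, lat)}, ways: {way_id: [node_id, ...]}."""
    inside = {nid for nid, (lon, lat) in nodes.items() if in_bbox(lon, lat, bbox)}
    # A way is kept as soon as any one of its nodes is inside the box.
    kept_ways = {
        wid: refs for wid, refs in ways.items() if any(r in inside for r in refs)
    }
    # Every node of a kept way is kept too, even when it lies outside the box,
    # so the way's geometry stays whole.
    kept_ids = set(inside)
    for refs in kept_ways.values():
        kept_ids.update(refs)
    kept_nodes = {nid: nodes[nid] for nid in kept_ids if nid in nodes}
    return kept_nodes, kept_ways


bbox = [24.93, 60.16, 24.96, 60.18]
nodes = {
    1: (24.94, 60.17),  # inside the box
    2: (24.99, 60.17),  # outside, but shares way 10 with node 1
    3: (25.10, 60.30),  # outside
    4: (25.20, 60.30),  # outside
}
ways = {10: [1, 2], 20: [3, 4]}

kept_nodes, kept_ways = crop_complete_ways(nodes, ways, bbox)
print(sorted(kept_nodes), sorted(kept_ways))  # [1, 2] [10]
```

Way 10 survives with both of its nodes although node 2 is outside the box, while way 20 is dropped entirely. This is why a cropped file can contain geometry that extends slightly beyond the requested bounding box.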
Smaller output: compact and repack
By default the crop keeps each source block’s string table, which is the fastest option. Two flags trade a little speed for a smaller file:
compact=True prunes each block’s string table to only the strings its kept elements reference.
repack=True re-packs the kept elements into densely filled blocks (as osmium/Osmosis produce) for the smallest output; it supersedes compact.
The written OSM data is identical for every option.
default = OSM(fp, bounding_box=bbox).to_pbf()
compact = OSM(fp, bounding_box=bbox).to_pbf(compact=True)
repack = OSM(fp, bounding_box=bbox).to_pbf(repack=True)
for label, path in [("default", default), ("compact", compact), ("repack", repack)]:
    print(f"{label:8s} {os.path.getsize(path):>7d} bytes")
default 682143 bytes
compact 679913 bytes
repack 689447 bytes
On this small bundled extract the three files differ by less than 2%, and repack’s fixed per-block overhead even makes its output marginally larger than the default, because there is little sparsity to reclaim. The savings appear on large country → region crops, where the kept source blocks are sparse: a Finland → Helsinki crop drops from about 83 MB (default) to 65 MB (compact) to 59 MB (repack), matching what osmium/Osmosis produce.
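To make the idea behind compact=True concrete, here is a minimal sketch of pruning a block's string table. It is a simplified model, not pyrosm's code: the function `prune_string_table` is hypothetical, tags are modelled as (key index, value index) pairs, and index 0 is kept as the reserved empty string that PBF string tables start with.

```python
# Simplified model of string-table pruning; hypothetical helper, not pyrosm's API.

def prune_string_table(strings, elements):
    """strings: list of str with index 0 reserved as the empty string.
    elements: one list of (key_idx, val_idx) tag pairs per kept element.
    Returns a pruned string table and the elements re-indexed against it."""
    used = {idx for tags in elements for pair in tags for idx in pair}
    used.discard(0)
    kept = [0] + sorted(used)  # index 0 always stays in place
    remap = {old: new for new, old in enumerate(kept)}
    new_strings = [strings[old] for old in kept]
    new_elements = [[(remap[k], remap[v]) for k, v in tags] for tags in elements]
    return new_strings, new_elements


strings = ["", "highway", "residential", "building", "yes", "name", "Mannerheimintie"]
# Only two elements survived the crop; the building=yes elements were dropped.
kept_elements = [[(1, 2)], [(1, 2), (5, 6)]]

new_strings, new_elements = prune_string_table(strings, kept_elements)
print(new_strings)   # ['', 'highway', 'residential', 'name', 'Mannerheimintie']
print(new_elements)  # [[(1, 2)], [(1, 2), (3, 4)]]
```

The decoded tags are unchanged; only the unreferenced strings ("building", "yes") disappear. On a sparse crop most of a source block's strings belong to elements that were dropped, which is where the size reduction comes from.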
Write modified OSM data back to a PBF
write_pbf() writes the data you read back to a valid PBF after you edit it in pandas — for example to fill a missing maxspeed or add a computed travel_time tag. Each row updates the matching element (by osm_type + id); rows whose id is not in the source are added as new elements synthesised from their geometry.
import tempfile
osm = OSM(get_data("test_pbf"))
edges = osm.get_network("driving")
# Edit/add tags in pandas — here we tag every edge with a travel_time
edges["travel_time"] = 1.5
out_path = os.path.join(tempfile.gettempdir(), "modified.osm.pbf")
osm.write_pbf(edges, out_path)
print("wrote", os.path.basename(out_path), os.path.getsize(out_path), "bytes")
wrote modified.osm.pbf 136559 bytes
# Read it back, requesting the new tag as a column
reread = OSM(out_path).get_network("driving", extra_attributes=["travel_time"])
reread["travel_time"].value_counts(dropna=False)
travel_time
1.5 200
Name: count, dtype: int64
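The row-matching rule described above (update by osm_type + id, otherwise add) behaves like an upsert. The sketch below models it with plain dictionaries. It is an illustration under assumptions, not pyrosm's implementation: `apply_rows` is a hypothetical helper, elements that no row refers to are assumed to stay unchanged, and the geometry synthesis for new elements is not modelled.

```python
# Toy upsert keyed by (osm_type, id); hypothetical helper, not pyrosm's API.

def apply_rows(source, rows):
    """source: {(osm_type, id): {tag: value}}.
    rows: list of dicts with 'osm_type', 'id' and tag columns.
    Returns the merged elements and the keys that were newly added."""
    result = {key: dict(tags) for key, tags in source.items()}
    added = []
    for row in rows:
        key = (row["osm_type"], row["id"])
        # Every non-empty column other than the key columns is treated as a tag.
        tags = {
            k: v
            for k, v in row.items()
            if k not in ("osm_type", "id") and v is not None
        }
        if key in result:
            result[key].update(tags)  # matching element: update its tags
        else:
            result[key] = tags        # unknown id: add as a new element
            added.append(key)
    return result, added


source = {
    ("way", 100): {"highway": "residential"},
    ("way", 101): {"highway": "primary", "maxspeed": "50"},
}
rows = [
    {"osm_type": "way", "id": 100, "maxspeed": "30", "travel_time": "1.5"},
    {"osm_type": "way", "id": 999, "highway": "service"},  # id not in the source
]

merged, added = apply_rows(source, rows)
print(merged[("way", 100)])
print(added)  # [('way', 999)]
```

Way 100 gains maxspeed and travel_time while keeping its highway tag, way 101 is left as it was, and way 999 is reported as a new element.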