Downloading OSM data

Pyrosm is designed to work with Protocolbuffer Binary Format (PBF) files. This is an efficient format that OpenStreetMap contributors use to distribute OSM data. A few free providers distribute OSM data as PBF:

  • Geofabrik provides data for countries and other regions, and

  • BBBike provides data for various cities across the world.

Pyrosm can download any of these datasets for you, and it can also pick the right extract for a bounding box or a place name, so you rarely need to visit the providers’ websites by hand.

How to?

Download data by name

Pyrosm provides a function get_data() that downloads any PBF dataset available at Geofabrik or BBBike to your local machine, so you do not need to fetch it from the website manually. PBF data is currently available for hundreds of regions around the world. To download the data for a specific city, such as "Helsinki", call:

from pyrosm import get_data
# Download data for the city of Helsinki
fp = get_data("Helsinki")
print(fp)
/var/folders/f2/pgp09jl542zffhtrt2hx8zhh0000gp/T/pyrosm/Helsinki.osm.pbf

By default, the get_data() function downloads the PBF file into a local TEMP directory and returns a filepath to the location where the data was downloaded.
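
The path printed above suggests that the file lands in a pyrosm subfolder of the system temporary directory. A minimal sketch of how such a path can be composed with the standard library is shown below. The helper name default_download_path is hypothetical, and this is an illustration only, not pyrosm's actual implementation.

```python
import os
import tempfile


def default_download_path(filename, subfolder="pyrosm"):
    """Compose <system temp dir>/<subfolder>/<filename> (hypothetical helper)."""
    # tempfile.gettempdir() returns the platform's temporary directory,
    # which differs between operating systems and user accounts.
    return os.path.join(tempfile.gettempdir(), subfolder, filename)


print(default_download_path("Helsinki.osm.pbf"))
```

Because the temporary directory differs between machines, the printed path will not match the one shown in this tutorial exactly.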

It is also possible to define your own directory where the data will be downloaded if you don’t want to store it in TEMP.

# Download the data into specified directory
fp = get_data("Helsinki", directory="/Users/Shared/MyFolder")
print("Data was downloaded to:", fp)
Downloaded Protobuf data 'Helsinki.osm.pbf' (59.91 MB) to:
'/Users/Shared/MyFolder/Helsinki.osm.pbf'
Data was downloaded to: /Users/Shared/MyFolder/Helsinki.osm.pbf

If you have previously downloaded the data to your computer, pyrosm will by default reuse that file. If you want to update the data, specify update=True, which removes the old PBF file and downloads a fresh version from Geofabrik or BBBike.

# Refresh the data
# ----------------

# The first call won't download the data because it was already downloaded earlier
fp = get_data("Helsinki")
print(fp)

# This call removes the old file and downloads a fresh copy
print("\nDownload will happen:")
fp = get_data("Helsinki", update=True)
print(fp)
/var/folders/f2/pgp09jl542zffhtrt2hx8zhh0000gp/T/pyrosm/Helsinki.osm.pbf

Download will happen:
Downloaded Protobuf data 'Helsinki.osm.pbf' (59.91 MB) to:
'/var/folders/f2/pgp09jl542zffhtrt2hx8zhh0000gp/T/pyrosm/Helsinki.osm.pbf'
/var/folders/f2/pgp09jl542zffhtrt2hx8zhh0000gp/T/pyrosm/Helsinki.osm.pbf
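
The reuse-or-refresh behaviour described above can be sketched with the standard library alone. This is a simplified illustration, not pyrosm's code: get_cached and fake_download are hypothetical names, and the "download" only writes placeholder bytes so the example runs without network access.

```python
import os
import tempfile


def get_cached(filename, directory, download, update=False):
    """Return the path to `filename`, calling `download` only when needed.

    `download` is a hypothetical callable that writes the file to a given path.
    """
    path = os.path.join(directory, filename)
    if update and os.path.exists(path):
        os.remove(path)  # drop the stale copy first
    if not os.path.exists(path):
        os.makedirs(directory, exist_ok=True)
        download(path)
    return path


calls = []


def fake_download(path):
    # Stand-in for a real network download: writes a few placeholder bytes.
    calls.append(path)
    with open(path, "wb") as f:
        f.write(b"placeholder")


workdir = tempfile.mkdtemp()
get_cached("demo.osm.pbf", workdir, fake_download)               # downloads
get_cached("demo.osm.pbf", workdir, fake_download)               # reuses the file
get_cached("demo.osm.pbf", workdir, fake_download, update=True)  # downloads again
print(len(calls))  # 2
```

Only the first and the third call trigger a download, which mirrors the output of the get_data() calls above.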

Available datasets

You can investigate the available datasets in Geofabrik and BBBike easily by calling:

from pyrosm.data import sources

# Print available source categories
sources.available.keys()
dict_keys(['africa', 'antarctica', 'asia', 'australia_oceania', 'central_america', 'europe', 'north_america', 'south_america', 'cities', 'subregions'])

The available datasets are divided into categories, which makes it easier to navigate the available PBF files: continents, cities, and subregions (countries whose data is split into smaller subregions).

  • As an example, you can see all available data sources in Africa by calling:

# Prints a list of countries in Africa that can be downloaded
print(sources.africa.available)
['algeria', 'angola', 'benin', 'botswana', 'burkina_faso', 'burundi', 'cameroon', 'canary_islands', 'cape_verde', 'central_african_republic', 'chad', 'comores', 'congo_brazzaville', 'congo_democratic_republic', 'djibouti', 'egypt', 'equatorial_guinea', 'eritrea', 'ethiopia', 'gabon', 'ghana', 'guinea', 'guinea_bissau', 'ivory_coast', 'kenya', 'lesotho', 'liberia', 'libya', 'madagascar', 'malawi', 'mali', 'mauritania', 'mauritius', 'morocco', 'mozambique', 'namibia', 'niger', 'nigeria', 'rwanda', 'saint_helena_ascension_and_tristan_da_cunha', 'sao_tome_and_principe', 'senegal_and_gambia', 'seychelles', 'sierra_leone', 'somalia', 'south_africa', 'south_africa_and_lesotho', 'south_sudan', 'sudan', 'swaziland', 'tanzania', 'togo', 'tunisia', 'uganda', 'zambia', 'zimbabwe']
  • If you want to see all available cities that can be downloaded, call:

# Prints a list of all cities that can be downloaded
print(sources.cities.available)
['Aachen', 'Aarhus', 'Adelaide', 'Albuquerque', 'Alexandria', 'Amsterdam', 'Antwerpen', 'Arnhem', 'Auckland', 'Augsburg', 'Austin', 'Baghdad', 'Baku', 'Balaton', 'Bamberg', 'Bangkok', 'Barcelona', 'Basel', 'Beijing', 'Beirut', 'Berkeley', 'Berlin', 'Bern', 'Bielefeld', 'Birmingham', 'Bochum', 'Bogota', 'Bombay', 'Bonn', 'Bordeaux', 'Boulder', 'BrandenburgHavel', 'Braunschweig', 'Bremen', 'Bremerhaven', 'Brisbane', 'Bristol', 'Brno', 'Bruegge', 'Bruessel', 'Budapest', 'BuenosAires', 'Cairo', 'Calgary', 'Cambridge', 'CambridgeMa', 'Canberra', 'CapeTown', 'Chemnitz', 'Chicago', 'ClermontFerrand', 'Colmar', 'Copenhagen', 'Cork', 'Corsica', 'Corvallis', 'Cottbus', 'Cracow', 'CraterLake', 'Curitiba', 'Cusco', 'Dallas', 'Darmstadt', 'Davis', 'DenHaag', 'Denver', 'Dessau', 'Dortmund', 'Dresden', 'Dublin', 'Duesseldorf', 'Duisburg', 'Edinburgh', 'Eindhoven', 'Emden', 'Erfurt', 'Erlangen', 'Eugene', 'Flensburg', 'FortCollins', 'Frankfurt', 'FrankfurtOder', 'Freiburg', 'Gdansk', 'Genf', 'Gent', 'Gera', 'Glasgow', 'Gliwice', 'Goerlitz', 'Goeteborg', 'Goettingen', 'Graz', 'Groningen', 'Halifax', 'Halle', 'Hamburg', 'Hamm', 'Hannover', 'Heilbronn', 'Helsinki', 'Hertogenbosch', 'Huntsville', 'Innsbruck', 'Istanbul', 'Jena', 'Jerusalem', 'Johannesburg', 'Kaiserslautern', 'Karlsruhe', 'Kassel', 'Katowice', 'Kaunas', 'Kiel', 'Kiew', 'Koblenz', 'Koeln', 'Konstanz', 'LaPaz', 'LaPlata', 'LakeGarda', 'Lausanne', 'Leeds', 'Leipzig', 'Lima', 'Linz', 'Lisbon', 'Liverpool', 'Ljubljana', 'Lodz', 'London', 'Luebeck', 'Luxemburg', 'Lyon', 'Maastricht', 'Madison', 'Madrid', 'Magdeburg', 'Mainz', 'Malmoe', 'Manchester', 'Mannheim', 'Marseille', 'Melbourne', 'Memphis', 'MexicoCity', 'Miami', 'Moenchengladbach', 'Montevideo', 'Montpellier', 'Montreal', 'Moscow', 'Muenchen', 'Muenster', 'NewDelhi', 'NewOrleans', 'NewYorkCity', 'Nuernberg', 'Oldenburg', 'Oranienburg', 'Orlando', 'Oslo', 'Osnabrueck', 'Ostrava', 'Ottawa', 'Paderborn', 'Palma', 'PaloAlto', 'Paris', 'Perth', 'Philadelphia', 
'PhnomPenh', 'Portland', 'PortlandME', 'Porto', 'PortoAlegre', 'Potsdam', 'Poznan', 'Prag', 'Providence', 'Regensburg', 'Riga', 'RiodeJaneiro', 'Rostock', 'Rotterdam', 'Ruegen', 'Saarbruecken', 'Sacramento', 'Saigon', 'Salzburg', 'SanFrancisco', 'SanJose', 'SanktPetersburg', 'SantaBarbara', 'SantaCruz', 'Santiago', 'Sarajewo', 'Schwerin', 'Seattle', 'Seoul', 'Sheffield', 'Singapore', 'Sofia', 'Stockholm', 'Stockton', 'Strassburg', 'Stuttgart', 'Sucre', 'Sydney', 'Szczecin', 'Tallinn', 'Tehran', 'Tilburg', 'Tokyo', 'Toronto', 'Toulouse', 'Trondheim', 'Tucson', 'Turin', 'UlanBator', 'Ulm', 'Usedom', 'Utrecht', 'Vancouver', 'Victoria', 'WarenMueritz', 'Warsaw', 'WashingtonDC', 'Waterloo', 'Wien', 'Wroclaw', 'Wuerzburg', 'Wuppertal', 'Zagreb', 'Zuerich']
  • Some countries have smaller sub-regions that can be downloaded separately (such as states in the USA):

# Check all countries having sub-regions
print("All countries with sub-regions:", sources.subregions.available.keys())

# Check sub-regions in Brazil
print("Sub-regions in Brazil:", sources.subregions.brazil.available)
All countries with sub-regions: dict_keys(['brazil', 'canada', 'france', 'germany', 'great_britain', 'italy', 'japan', 'netherlands', 'poland', 'russia', 'united_kingdom', 'usa'])
Sub-regions in Brazil: ['centro_oeste', 'nordeste', 'norte', 'sudeste', 'sul']

In a similar manner, you can easily investigate all other regions that are available for download.

To download data for any of these areas, pass the name of the area to the get_data() function:

# Download data for Aachen
fp = get_data("Aachen")
print(fp)
Downloaded Protobuf data 'Aachen.osm.pbf' (73.29 MB) to:
'/var/folders/f2/pgp09jl542zffhtrt2hx8zhh0000gp/T/pyrosm/Aachen.osm.pbf'
/var/folders/f2/pgp09jl542zffhtrt2hx8zhh0000gp/T/pyrosm/Aachen.osm.pbf

Note

Some of the available names are written in CamelCase (e.g. in cities) and others with underscores (e.g. canary_islands among the countries). Pyrosm tries to recognize different ways of writing a place name automatically. For example, "Rio de Janeiro" works even though the cities.available list spells it "RiodeJaneiro":

# Passing names in slightly different style does not matter
fp = get_data("Rio de Janeiro")
print(fp)
Downloaded Protobuf data 'RiodeJaneiro.osm.pbf' (35.3 MB) to:
'/var/folders/f2/pgp09jl542zffhtrt2hx8zhh0000gp/T/pyrosm/RiodeJaneiro.osm.pbf'
/var/folders/f2/pgp09jl542zffhtrt2hx8zhh0000gp/T/pyrosm/RiodeJaneiro.osm.pbf
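
One way such tolerant matching can work is to compare names after stripping case, spaces, and underscores. The sketch below illustrates that idea only. How pyrosm resolves names internally is not shown in this tutorial, and normalize and match_name are hypothetical helpers.

```python
def normalize(name):
    """Lowercase and keep only letters and digits, ignoring spaces and underscores."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def match_name(query, available):
    """Return the entry of `available` that matches `query` after normalization, or None."""
    lookup = {normalize(item): item for item in available}
    return lookup.get(normalize(query))


# A few names copied from the lists shown earlier
names = ["RiodeJaneiro", "NewYorkCity", "canary_islands", "Helsinki"]
print(match_name("Rio de Janeiro", names))  # RiodeJaneiro
print(match_name("Canary Islands", names))  # canary_islands
print(match_name("Atlantis", names))        # None
```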

Find the extract that covers a bounding box

If your area of interest does not match a named extract, get_data_by_bbox() finds the smallest Geofabrik extract that covers a bounding box, downloads it, and – by default – crops it to the box. The bounding box can be a [minx, miny, maxx, maxy] list (lon/lat), a Shapely geometry, or a GeoDataFrame/GeoSeries.

Pass download=False to just look up the covering extract’s URL without downloading anything (handy to check what you would get before pulling a possibly large file):

from pyrosm import get_data_by_bbox

# Look up the covering extract for a bounding box (Brighton & Hove); no download
get_data_by_bbox([-0.245, 50.798, -0.016, 50.892], download=False)
Geofabrik extract covering the area: 'England' (id: england)
'https://download.geofabrik.de/europe/united-kingdom/england-latest.osm.pbf'
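
The idea of picking the smallest extract that covers a box can be sketched in plain Python. This is an illustration under simplifying assumptions: the extents below are made-up rectangles with placeholder names (real extract boundaries are generally not simple rectangles), and covers, area, and smallest_covering are hypothetical helpers, not part of pyrosm.

```python
def covers(outer, inner):
    """True if the [minx, miny, maxx, maxy] box `outer` fully contains `inner`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def area(box):
    """Area of a box in squared degrees (good enough for ranking candidates)."""
    return (box[2] - box[0]) * (box[3] - box[1])


def smallest_covering(bbox, extents):
    """Name of the smallest extent that covers `bbox`, or None if none does."""
    candidates = [name for name, ext in extents.items() if covers(ext, bbox)]
    return min(candidates, key=lambda name: area(extents[name]), default=None)


# Made-up rectangular extents, for illustration only
extents = {
    "continent": [-30.0, 30.0, 45.0, 75.0],
    "country": [-10.0, 49.0, 3.0, 61.0],
    "other_country": [5.0, 47.0, 15.0, 55.0],
}

print(smallest_covering([-0.245, 50.798, -0.016, 50.892], extents))  # country
```

Both "continent" and "country" contain the box, so the sketch returns the one with the smaller area.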

By default (crop=True, download=True) it downloads the covering extract and crops it to the box, returning a file named after the bounding box:

# Download the covering extract and crop it to the box
fp = get_data_by_bbox([-0.245, 50.798, -0.016, 50.892])
print("Cropped filename:", fp)

# crop=False keeps the full extract instead
fp = get_data_by_bbox([-0.245, 50.798, -0.016, 50.892], crop=False)
print("No cropping:", fp)
Geofabrik extract covering the area: 'England' (id: england)
Cropped filename: /var/folders/f2/pgp09jl542zffhtrt2hx8zhh0000gp/T/pyrosm/bbox_-0.245_50.798_-0.016_50.892.osm.pbf
Geofabrik extract covering the area: 'England' (id: england)
No cropping: /var/folders/f2/pgp09jl542zffhtrt2hx8zhh0000gp/T/pyrosm/england-latest.osm.pbf
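
The cropped file name above appears to be built from the four bounding box values. The sketch below validates a [minx, miny, maxx, maxy] box in lon/lat order and reproduces a name of that shape. The helpers validate_bbox and bbox_filename are hypothetical, and pyrosm may format or validate the values differently.

```python
def validate_bbox(bbox):
    """Check a [minx, miny, maxx, maxy] box given in lon/lat degrees."""
    minx, miny, maxx, maxy = bbox
    if not (-180 <= minx < maxx <= 180 and -90 <= miny < maxy <= 90):
        raise ValueError(f"Invalid lon/lat bounding box: {bbox}")
    return [minx, miny, maxx, maxy]


def bbox_filename(bbox):
    """Build a file name shaped like the one printed above (hypothetical helper)."""
    return "bbox_" + "_".join(str(v) for v in validate_bbox(bbox)) + ".osm.pbf"


print(bbox_filename([-0.245, 50.798, -0.016, 50.892]))
# bbox_-0.245_50.798_-0.016_50.892.osm.pbf
```

A common mistake is to pass the values in lat/lon order. For many locations the range check in validate_bbox catches this, because a longitude above 90 is not a valid latitude.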

Download data for a place name

geocode() turns a place name into a Shapely polygon (via OpenStreetMap’s Nominatim service):

from pyrosm import geocode

area = geocode("Tartu, Estonia")
area.geom_type, [round(b, 3) for b in area.bounds]
Geocoded 'Tartu, Estonia' to: Tartu linn, Tartu maakond, Eesti
('Polygon', [26.667, 58.339, 26.798, 58.411])

get_data_by_geocoding() chains geocoding with the download and crop above – it geocodes the place, downloads the covering extract, and (by default) crops it to the place, returning a file named after the place:

from pyrosm import get_data_by_geocoding

# Download the covering extract and crop it to the place
fp = get_data_by_geocoding("Tartu, Estonia")
print("Cropped output:", fp)

# crop=False keeps the full extract; download=False just returns the extract URL
fp = get_data_by_geocoding("Tartu, Estonia", crop=False)
print("No cropping:", fp)

url = get_data_by_geocoding("Tartu, Estonia", download=False)
print("No download, return URL instead:", url)
Geocoded 'Tartu, Estonia' to: Tartu linn, Tartu maakond, Eesti
Geofabrik extract covering the area: 'Estonia' (id: estonia)
Cropped output: /var/folders/f2/pgp09jl542zffhtrt2hx8zhh0000gp/T/pyrosm/tartu-estonia.osm.pbf
Geocoded 'Tartu, Estonia' to: Tartu linn, Tartu maakond, Eesti
Geofabrik extract covering the area: 'Estonia' (id: estonia)
No cropping: /var/folders/f2/pgp09jl542zffhtrt2hx8zhh0000gp/T/pyrosm/estonia-latest.osm.pbf
Geocoded 'Tartu, Estonia' to: Tartu linn, Tartu maakond, Eesti
Geofabrik extract covering the area: 'Estonia' (id: estonia)
No download, return URL instead: https://download.geofabrik.de/europe/estonia-latest.osm.pbf
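
The cropped file in the output above is named tartu-estonia.osm.pbf, which looks like a slug of the place name. A minimal sketch of such a slug function is shown below. slugify is a hypothetical helper, and the exact rules pyrosm applies (for example, to non-ASCII characters) are not covered by this tutorial.

```python
import re


def slugify(place):
    """Lowercase the name and join its ASCII alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", place.lower()))


print(slugify("Tartu, Estonia") + ".osm.pbf")  # tartu-estonia.osm.pbf
```

Note that this simple pattern drops any character outside a-z and 0-9, so names with accented letters would need extra handling.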