Basic usage

Using pyrosm is straightforward. Following sections introduce the basics how to parse various kind of datasets from OSM Protobuf files.

How to?

Protobuf file: What is it and how to get one?

Pyrosm is designed to work with Protocolbuffer Binary Format (PBF) -files. This file format is a commonly used and efficient method to serialize and compress structured data which is also used by OpenStreetMap contributors to distribute the OSM data. There are a few free data providers distributing OSM data in PBF format, such as:

  • Geofabrik provides data for countries and other-regions, and

  • BBBike provides data for various cities across the world. BBBike also provides a handy tool that makes it possible to define your own region that will be used to extract data in PBF format (up to 512 MB in size).

Pyrosm provides a function get_data() that can be used to download any PBF dataset available at Geofabrik or BBBike to your local machine without the need to go to the website and do this manually. Currently, PBF data can be downloaded from 654 regions in the world.

  • To download data from a specific city such as "Helsinki", you can simply call:

from pyrosm import get_data
# Download data for the city of Helsinki
fp = get_data("Helsinki")
print(fp)
/tmp/pyrosm/Helsinki.osm.pbf

By default, the get_data() function downloads the PBF file into a local TEMP directory and returns a filepath to the location where the data was downloaded.

It is also possible to define your own directory where the data will be downloaded if you don’t want to store it in TEMP.

# Download the data into specified directory
fp = get_data("helsinki", directory="my_data")
print("Data was downloaded to:", fp)
Data was downloaded to: /mnt/c/HY-DATA/hentenka/KOODIT/Uni/Pyrosm/docs/my_data/Helsinki.osm.pbf

If you have downloaded the data previously into your computer, pyrosm will by default use that same data file. However, if you want to update the data, it is possible to specify update=True which will remove the old PBF file and download a fresh version from Geofabrik or BBBike.

# Refresh the data
# ----------------

# The first call won't download the data because it was already downloaded earlier
fp = get_data("Helsinki")
print(fp)

# This one will update the data and download the data
print("\nDownload will happen:")
fp = get_data("Helsinki", update=True)
print(fp)
/tmp/pyrosm/Helsinki.osm.pbf

Download will happen:
Downloaded Protobuf data 'Helsinki.osm.pbf' (28.3 MB) to:
'/tmp/pyrosm/Helsinki.osm.pbf'
/tmp/pyrosm/Helsinki.osm.pbf

“UserWarning: The Shapely GEOS …” ?

Following warning (or something similar) might appear, but is nothing to be worried about (will get fixed eventually): UserWarning: The Shapely GEOS version (3.8.0-CAPI-1.13.1 ) is incompatible with the GEOS version PyGEOS was compiled with (3.8.1-CAPI-1.13.3). Conversions between both will be slow.

Available datasets

You can investigate the available datasets easily by calling:

from pyrosm.data import sources

# Print available source categories
sources.available.keys()
dict_keys(['africa', 'antarctica', 'asia', 'australia_oceania', 'central_america', 'europe', 'north_america', 'south_america', 'cities', 'subregions'])

The available datasets (654) have been divided into categories which makes it easier to navigate through the available PBF files.

The datasets are divided under continents, cities and subregions (countries with data divided into smaller subregions).

  • As an example, you can see all available data sources in Africa by calling:

# Prints a list of countries in Africa that can be downloaded
print(sources.africa.available)
['algeria', 'angola', 'benin', 'botswana', 'burkina_faso', 'burundi', 'cameroon', 'canary_islands', 'cape_verde', 'central_african_republic', 'chad', 'comores', 'congo_brazzaville', 'congo_democratic_republic', 'djibouti', 'egypt', 'equatorial_guinea', 'eritrea', 'ethiopia', 'gabon', 'ghana', 'guinea', 'guinea_bissau', 'ivory_coast', 'kenya', 'lesotho', 'liberia', 'libya', 'madagascar', 'malawi', 'mali', 'mauritania', 'mauritius', 'morocco', 'mozambique', 'namibia', 'niger', 'nigeria', 'rwanda', 'saint_helena_ascension_and_tristan_da_cunha', 'sao_tome_and_principe', 'senegal_and_gambia', 'seychelles', 'sierra_leone', 'somalia', 'south_africa', 'south_africa_and_lesotho', 'south_sudan', 'sudan', 'swaziland', 'tanzania', 'togo', 'tunisia', 'uganda', 'zambia', 'zimbabwe']
  • If you want to see all available cities that can be downloaded, call:

# Prints a list of all cities that can be downloaded
print(sources.cities.available)
['Aachen', 'Aarhus', 'Adelaide', 'Albuquerque', 'Alexandria', 'Amsterdam', 'Antwerpen', 'Arnhem', 'Auckland', 'Augsburg', 'Austin', 'Baghdad', 'Baku', 'Balaton', 'Bamberg', 'Bangkok', 'Barcelona', 'Basel', 'Beijing', 'Beirut', 'Berkeley', 'Berlin', 'Bern', 'Bielefeld', 'Birmingham', 'Bochum', 'Bogota', 'Bombay', 'Bonn', 'Bordeaux', 'Boulder', 'BrandenburgHavel', 'Braunschweig', 'Bremen', 'Bremerhaven', 'Brisbane', 'Bristol', 'Brno', 'Bruegge', 'Bruessel', 'Budapest', 'BuenosAires', 'Cairo', 'Calgary', 'Cambridge', 'CambridgeMa', 'Canberra', 'CapeTown', 'Chemnitz', 'Chicago', 'ClermontFerrand', 'Colmar', 'Copenhagen', 'Cork', 'Corsica', 'Corvallis', 'Cottbus', 'Cracow', 'CraterLake', 'Curitiba', 'Cusco', 'Dallas', 'Darmstadt', 'Davis', 'DenHaag', 'Denver', 'Dessau', 'Dortmund', 'Dresden', 'Dublin', 'Duesseldorf', 'Duisburg', 'Edinburgh', 'Eindhoven', 'Emden', 'Erfurt', 'Erlangen', 'Eugene', 'Flensburg', 'FortCollins', 'Frankfurt', 'FrankfurtOder', 'Freiburg', 'Gdansk', 'Genf', 'Gent', 'Gera', 'Glasgow', 'Gliwice', 'Goerlitz', 'Goeteborg', 'Goettingen', 'Graz', 'Groningen', 'Halifax', 'Halle', 'Hamburg', 'Hamm', 'Hannover', 'Heilbronn', 'Helsinki', 'Hertogenbosch', 'Huntsville', 'Innsbruck', 'Istanbul', 'Jena', 'Jerusalem', 'Johannesburg', 'Kaiserslautern', 'Karlsruhe', 'Kassel', 'Katowice', 'Kaunas', 'Kiel', 'Kiew', 'Koblenz', 'Koeln', 'Konstanz', 'LaPaz', 'LaPlata', 'LakeGarda', 'Lausanne', 'Leeds', 'Leipzig', 'Lima', 'Linz', 'Lisbon', 'Liverpool', 'Ljubljana', 'Lodz', 'London', 'Luebeck', 'Luxemburg', 'Lyon', 'Maastricht', 'Madison', 'Madrid', 'Magdeburg', 'Mainz', 'Malmoe', 'Manchester', 'Mannheim', 'Marseille', 'Melbourne', 'Memphis', 'MexicoCity', 'Miami', 'Moenchengladbach', 'Montevideo', 'Montpellier', 'Montreal', 'Moscow', 'Muenchen', 'Muenster', 'NewDelhi', 'NewOrleans', 'NewYorkCity', 'Nuernberg', 'Oldenburg', 'Oranienburg', 'Orlando', 'Oslo', 'Osnabrueck', 'Ostrava', 'Ottawa', 'Paderborn', 'Palma', 'PaloAlto', 'Paris', 'Perth', 'Philadelphia', 'PhnomPenh', 'Portland', 'PortlandME', 'Porto', 'PortoAlegre', 'Potsdam', 'Poznan', 'Prag', 'Providence', 'Regensburg', 'Riga', 'RiodeJaneiro', 'Rostock', 'Rotterdam', 'Ruegen', 'Saarbruecken', 'Sacramento', 'Saigon', 'Salzburg', 'SanFrancisco', 'SanJose', 'SanktPetersburg', 'SantaBarbara', 'SantaCruz', 'Santiago', 'Sarajewo', 'Schwerin', 'Seattle', 'Seoul', 'Sheffield', 'Singapore', 'Sofia', 'Stockholm', 'Stockton', 'Strassburg', 'Stuttgart', 'Sucre', 'Sydney', 'Szczecin', 'Tallinn', 'Tehran', 'Tilburg', 'Tokyo', 'Toronto', 'Toulouse', 'Trondheim', 'Tucson', 'Turin', 'UlanBator', 'Ulm', 'Usedom', 'Utrecht', 'Vancouver', 'Victoria', 'WarenMueritz', 'Warsaw', 'WashingtonDC', 'Waterloo', 'Wien', 'Wroclaw', 'Wuerzburg', 'Wuppertal', 'Zagreb', 'Zuerich']
  • Some of the countries have smaller sub-regions that can be downloaded separately (such as states in USA):

# Check all countries having sub-regions
print("All countries with sub-regions:", sources.subregions.available.keys())

# Check sub-regions in Brazil
print("Sub-regions in Brazil:", sources.subregions.brazil.available)
All countries with sub-regions: dict_keys(['brazil', 'canada', 'france', 'germany', 'great_britain', 'italy', 'japan', 'netherlands', 'poland', 'russia', 'usa'])
Sub-regions in Brazil: ['centro_oeste', 'nordeste', 'norte', 'sudeste', 'sul']

In a similar manner, you can easily investigate all other regions that are available for download.

When you want to download data for any of these areas, you just need to pass the name of the area into the get_data() -function:

# Download data for Aachen
fp = get_data("Aachen")
print(fp)
/tmp/pyrosm/Aachen.osm.pbf

Note

Some of the available names e.g. in cities are written in CamelCase format and some of the countries are written with underscore (e.g. canary_islands). Pyrosm tries to automatically identify different styles of writing the place name. For example, writing "Rio de Janeiro" works fine even though the name in cities.available list is written as "RiodeJaneiro":

# Passing names in slightly different style does not matter
fp = get_data("Rio de Janeiro")
print(fp)
/tmp/pyrosm/RiodeJaneiro.osm.pbf

What to do if you cannot find the data for your area of interest?

In case, you cannot find a region for your needs from the above data sources, it is possible to download data from your own area of interest by using BBBike’s extract tool, which allows you to parse OSM data from anywhere in the world and download it in Protocolbuffer Binary format.

Steps:

  1. Go to Extract tool

  2. Specify the format as Protocolbuffer (PBF) from the panel on the left

  3. Specify the name for your data file

  4. Provide the email addresss where a download link for the data file will be sent in a few minutes

  5. When the email arrives, download the data and use that with pyrosm as shown in the basic tutorial.

Initializing the Pyrosm OSM -reader object

When using Pyrosm, the first step is to initialize a specific reader object called OSM that is available from the pyrosm library:

# Import the library
import pyrosm

# Print information about the basic usage of the `OSM` reader object
help(pyrosm.OSM.__init__)
Help on function __init__ in module pyrosm.pyrosm:

__init__(self, filepath, bounding_box=None)
    Parameters
    ----------
    
    filepath : str
        Filepath to input OSM dataset ( *.osm.pbf )
    
    bounding_box : list | shapely geometry
        Filtering OSM data spatially is allowed by passing a
        bounding box either as a list `[minx, miny, maxx, maxy]` or
        as a Shapely Polygon/MultiPolygon or closed LineString/LinearRing.

As we can see from the documentation, the OSM object accepts two parameters:

  1. filepath which is the filepath to the PBF file (*.osm.pbf) which will be read (see info above how to get one), and

  2. bounding_box which is an optional parameter that can be used to filter OSM data geographically from specific area (see here for further details)

The following shows how to initialize an OSM reader object using a test dataset that comes with Pyrosm, and which can be retrieved using a get_data function:

import pyrosm

# Get filepath to test PBF dataset
fp = pyrosm.get_data("test_pbf")
print("Filepath to test data:", fp)

# Initialize the OSM object 
osm = pyrosm.OSM(fp)

# See the type
print("Type of 'osm' instance: ", type(osm))
Filepath to test data: /home/hentenka/miniconda3/envs/pyrosm/lib/python3.8/site-packages/pyrosm/data/test.osm.pbf
Type of 'osm' instance:  <class 'pyrosm.pyrosm.OSM'>

As we can see, the test dataset lives in my case somewhere under the miniconda3 package, and the type of the osm instance is something called pyrosm.pyrosm.OSM.

Notice that osm (lower case) is the actually initialized reader instance for the given PBF dataset that should always be used to make the calls for fetching different datasets from the OpenStreetMap PBF -file. Read further to see how things work.

Read street networks

Pyrosm makes it easy to filter street networks using the get_network() method. You can parse streets separately for different travel modes by specifying the type of network using network_type -parameter. The allowed network types are:

The following shows how to read all drivable roads from OSM. Notice that from here on, we will import the OSM reader object directly from the package:

from pyrosm import OSM
from pyrosm import get_data

# Pyrosm comes with a couple of test datasets 
# that can be used straight away without
# downloading anything
fp = get_data("test_pbf")

# Initialize the OSM parser object
osm = OSM(fp)

# Read all drivable roads
# =======================
drive_net = osm.get_network(network_type="driving")
drive_net.plot()
<AxesSubplot:>
_images/basics_26_1.png

The network contains various information that is parsed from the OSM data, and includes length column that contains information about the length of the road in meters (scroll right):

drive_net.head(2)
access bridge highway int_ref lanes lit maxspeed name oneway ref service surface id timestamp version tags osm_type geometry length
0 None None secondary None 2 None 80 Hurukselantie None 357 None asphalt 4732994 1441800394 23 {"name:fi":"Hurukselantie"} way MULTILINESTRING ((26.94310 60.52580, 26.94295 ... 1504.0
1 None None secondary None None None None None yes 170 None None 5184588 1378828296 7 None way MULTILINESTRING ((26.94778 60.52231, 26.94717 ... 242.0

Notice that each way in the network is represented as a MultiLineString geometry constructed from multiple road segments. This is how the data is represented by default in OSM. However, this differs if reading nodes and edges: in that case each road segment is represented as a separate row in data (to improve connectivity).

Hint

It is also possible to export network to routable graphs in various formats using to_graph() function (new in version 0.6.0). Read more from “Export street networks to graph -section”.

Understanding the “osm_type” -column values

pyrosm will create a column osm_type to the result which can contain values node, way or relation. These correspond to the three basic components of OpenStreetMap’s conceptual data model of the physical world:

  • nodes (points in space),

  • ways (linear features and area boundaries),

  • relations (sometimes used to explain how other elements work together).

Hence, the “way” values in osm_type column might not necessarily represent only LineString features, as they can also be Polygons or LinearRings. If you want to know the geometry types of your data, you can access such information with geopandas by calling (gdf here represents a GeoDataFrame): gdf["geometry_types"] = gdf.geom_type

Check an example here to see how to filter your GeoDataFrame based on specific geometry type.

Read buildings

from pyrosm import OSM
from pyrosm import get_data
fp = get_data("test_pbf")
# Initialize the OSM parser object
osm = OSM(fp)
buildings = osm.get_buildings()
buildings.plot()
<AxesSubplot:>
_images/basics_31_1.png

Read Points of Interest

# Read POIs such as amenities and shops
# =====================================
from pyrosm import OSM
from pyrosm import get_data
fp = get_data("helsinki_pbf")
# Initialize the OSM parser object
osm = OSM(fp)

# By default pyrosm reads all elements having "amenity", "shop" or "tourism" tag
# Here, let's read only "amenity" and "shop" by applying a custom filter that
# overrides the default filtering mechanism
custom_filter = {'amenity': True, "shop": True}
pois = osm.get_pois(custom_filter=custom_filter)

# Gather info about POI type (combines the tag info from "amenity" and "shop")
pois["poi_type"] = pois["amenity"]
pois["poi_type"] = pois["poi_type"].fillna(pois["shop"])

# Plot
ax = pois.plot(column='poi_type', markersize=3, figsize=(12,12), legend=True, legend_kwds=dict(loc='upper left', ncol=5, bbox_to_anchor=(1, 1)))
_images/basics_33_0.png

Read landuse

# Read landuse
# ============
from pyrosm import OSM
from pyrosm import get_data
fp = get_data("test_pbf")
# Initialize the OSM parser object
osm = OSM(fp)
landuse = osm.get_landuse()
landuse.plot(column='landuse', legend=True, figsize=(10,6))
<AxesSubplot:>
_images/basics_35_1.png

Read natural

# Read natural
# ============
from pyrosm import OSM
from pyrosm import get_data
fp = get_data("helsinki_pbf")
# Initialize the OSM parser object
osm = OSM(fp)
natural = osm.get_natural()
natural.plot(column='natural', legend=True, figsize=(10,6))
<AxesSubplot:>
_images/basics_37_1.png

Read boundaries

Pyrosm supports reading boundaries such as administrative borders from PBF using get_boundaries() -function. By default, the function reads all "administrative" borders from the PBF. You can adjust the type of boundary that is parsed from PBF by modifying boundary_type -parameter. You can also search boundaries for specific name using name parameter:

from pyrosm import OSM
from pyrosm import get_data
fp = get_data("helsinki_region_pbf")
osm = OSM(fp)

# Read all boundaries using the default settings
boundaries = osm.get_boundaries()
boundaries.plot(facecolor="none", edgecolor="blue")
<AxesSubplot:>
_images/basics_39_1.png

The following shows how to search a specific boundary using the name -parameter.

# Note: the following uses the same osm instance initialized above
selected_boundary = osm.get_boundaries(name="Punavuori")
selected_boundary.plot()
<AxesSubplot:>
_images/basics_41_1.png

The name search functionality supports partial text search, meaning that e.g. a query "vuori" would return all elements where the work "vuori" is included in the name tag (such as “Punavuori”):

# Use a partial name "vuori" to look for data
selected_boundary = osm.get_boundaries(name="vuori")
selected_boundary.plot()
<AxesSubplot:>
_images/basics_43_1.png

As we can see there were multiple boundaries in the data that included the word "vuori" in their name:

# Check all records that have the word "vuori" in their name
selected_boundary['name'].unique()
array(['Punavuori', 'Munkkivuori', 'Roihuvuori', 'Mustavuori',
       'Vilhonvuori'], dtype=object)

It is also possible to search different kind of boundaries from the PBF.

Supported boundary types are:

  • "administrative (default)

  • "national_park"

  • "political"

  • "postal_code"

  • "protected_area"

  • "aboriginal_lands"

  • "maritime"

  • "lot"

  • "parcel"

  • "tract"

  • "marker"

  • "all"

Let’s read all "protected_area" boundaries from the PBF:

# Note: the following uses the same osm instance initialized above 
protected_areas = osm.get_boundaries(boundary_type="protected_area")
protected_areas.plot()
<AxesSubplot:>
_images/basics_47_1.png

Read OSM data with custom filter

Pyrosm also allows making custom queries. For example, to parse all transit related OSM elements you can use following approach and create a custom filter combining multiple criteria:

from pyrosm import OSM
from pyrosm import get_data
fp = get_data("helsinki_pbf")

# Initialize the OSM parser object with test data from Helsinki
osm = OSM(fp)

# Test reading all transit related data (bus, trains, trams, metro etc.)
# Exclude nodes (not keeping stops, etc.)
routes = ["bus", "ferry", "railway", "subway", "train", "tram", "trolleybus"]
rails = ["tramway", "light_rail", "rail", "subway", "tram"]
bus = ['yes']
transit = osm.get_data_by_custom_criteria(custom_filter={
                                        'route': routes,
                                        'railway': rails,
                                        'bus': bus,
                                        'public_transport': True},
                                        # Keep data matching the criteria above
                                        filter_type="keep",
                                        # Do not keep nodes (point data)    
                                        keep_nodes=False, 
                                        keep_ways=True, 
                                        keep_relations=True)
transit.plot()
<AxesSubplot:>
_images/basics_49_1.png

Further information on how to make customized queries is available in Parsing OSM data with custom queries.

Filtering data based on bounding box

Quite often one might be needing to extract only a subset of the whole OSM PBF file covering e.g. a specific region. Pyrosm provides an easy way to filter even larger PBF files using a bounding box (rectangular shape) or a more complex geometric feature (e.g. a Polygon). In the following, we will go through the process of extracting a small sample from the whole PBF dataset for specific area of interest. We will use a data dump from Greater London region and extract data covering the Borough of Camden.

from pyrosm import OSM, get_data

# Download a dataset for Greater London (update if exists in the temp already)
fp = get_data("Greater London", update=True)
osm = OSM(fp)
Downloaded Protobuf data 'greater-london-latest.osm.pbf' (55.49 MB) to:
'C:\Users\hentenka\AppData\Local\Temp\pyrosm\greater-london-latest.osm.pbf'
# Read buildings (takes ~30 seconds)
buildings = osm.get_buildings()
buildings.head(2)
addr:city addr:country addr:full addr:housenumber addr:housename addr:postcode addr:place addr:street email name ... source start_date wikipedia id timestamp version tags geometry osm_type changeset
0 None None None None None None None None None Laurence House ... None None None 2956186 1469657765 2 None POLYGON ((-0.02162 51.44472, -0.02033 51.44469... way NaN
1 None None None None Town Hall SE6 4RU None Catford Broadway None Lewisham Town Hall ... None None None 2956187 1504282380 5 None POLYGON ((-0.02110 51.44523, -0.02132 51.44508... way NaN

2 rows × 41 columns

# Plot the buildings (will take awhile to plot)
buildings.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x1c18de4b9c8>
_images/basics_54_1.png

Now have parsed quite a few buildings from the Greater London area (~488,000).

Let’s filter the data spatially and include only buildings from the Borough of Camden. There are a couple of ways how you can pass the bounding box information to the Pyrosm:

  1. You can specify the bounding box by a list of x- and y-coordinates (in decimal degrees) of the lower left corner and upper right corner of the geographical area (rectangular) that you want to keep as a result: [minx, miny, maxx, maxy]

  2. You can also specify the bounding by passing a Shapely Polygon, MultiPolygon or LinearRing (all closed geometries supported) that can be used to filter the data with a more complex geographical features.

We will now use the boundary of the Camden Borough as our spatial filter. For finding the boundaries of Camden Borough is easy by utilizing the get_boundaries() -function and using the name parameter:

# Get the borough of Camden as our bounding box
bounding_box = osm.get_boundaries(name="London Borough of Camden")
bounding_box.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x1c2a2d6e588>
_images/basics_56_1.png

Now we can initialize the OSM reader with the given bounding box that will then keep the data only from the areas that are within the given bounding box:

# Get the shapely geometry from GeoDataFrame
bbox_geom = bounding_box['geometry'].values[0]

# Initiliaze with bounding box
osm = OSM(fp, bounding_box=bbox_geom)

Now the bounding box information is stored in the attribute bounding_box that will be applied every time when an extract of the PBF (e.g. buildings, roads, etc.) is parsed:

# Bounding box is now stored as an attribute 
osm.bounding_box
_images/basics_60_0.svg

Finally, let’s read the buildings now from the Camden Borough using our bounding box filter. Notice, that you do not need to make any changes to the actual get_buildings() call, as the bounding box information is read automatically from the osm instance (osm.bounding_box).

# Retrieve buildings for Camden
camden = osm.get_buildings()

Okay, now we have data for the Camden area! Let’s take a look what it looks like on a map. Here, we will color the building based on how it has been tagged in the OSM:

# Let's plot the buildings and specify colors according the type of the building
ax = camden.plot(column="building", figsize=(12,12), legend=True, legend_kwds=dict(loc='upper left', ncol=3, bbox_to_anchor=(1, 1)))
_images/basics_64_0.png

Great, now we can see that a subset of the data was taken according our bounding box coordinates.

We can utilize the same bounding box for filtering other datasets as well, which can be handy. Let’s also filter the walkable roads from the same area:

# Apply the same bounding box filter and retrieve walking network
walk = osm.get_network("walking")
walk.plot(color="k", figsize=(12,12), lw=0.7, alpha=0.6)
<matplotlib.axes._subplots.AxesSubplot at 0x1c1ecc78088>
_images/basics_66_1.png

Pyrosm/OSM tagging system

OpenStreetMap uses a “free tagging system” that allows the map to include an unlimited number of attributes describing each feature. A tag consists of two items, a key and a value. Tags describe specific features of map elements (nodes, ways, or relations) or changesets. Both items are free format text fields, but can often represent also numeric or other structured items (e.g. maxspeed attribute contains speed limit information at a given road represented in numbers) (OSM Wiki, 2020).

Because of this flexibility, OSM data tend to contain huge number of different attributes. Because keeping all of these attributes in their own columns is not very practical (the dataframe can end up having even hundreds of columns), Pyrosm implements its own tagging system where only specific tags are kept as columns (separately for each OSM key). All the rest of the attributes are stored into a separete column "tags" which is a valid JSON object.

It is possible to see these default tags from the osm instance directly by accessing its configuration settings. Let’s see how:

from pyrosm import OSM, get_data

# Initialize the OSM reader with test data
fp = get_data("test_pbf")
osm = OSM(fp)

# The instance has a configuration attribute containing:
print([item for item in osm.conf.__dict__.keys() if not item.startswith("_")])
['network_filters', 'tags']

Okay, from here we can see that the configuration includes network_filter attribute and tags attribute:

  • network_filter attribute contains information about the rules that are applied when parsing different kind of roads from the OSM

  • tags attribute contains information about the tags that are parsed into columns by default

Let’s take a closer look into the tags:

# Show all available tag attributes
osm.conf.tags.available
['aerialway',
 'aeroway',
 'amenity',
 'boundary',
 'building',
 'craft',
 'emergency',
 'geological',
 'highway',
 'historic',
 'landuse',
 'leisure',
 'natural',
 'office',
 'power',
 'public_transport',
 'railway',
 'route',
 'place',
 'shop',
 'tourism',
 'waterway']

This is a list basically containing all OSM primary features that can be parsed from the OSM (see wiki for details). Each of these items contain a list of default tags (OSM keys) that will be inserted into columns when parsing the OSM data with Pyrosm.

For example the default tags that will be turned into columns from buildings can be accessed by:

# Show all tags that are converted into columns from building features
osm.conf.tags.building
['addr:city',
 'addr:country',
 'addr:full',
 'addr:housenumber',
 'addr:housename',
 'addr:postcode',
 'addr:place',
 'addr:street',
 'email',
 'name',
 'opening_hours',
 'operator',
 'phone',
 'ref',
 'url',
 'website',
 'yes',
 'building',
 'amenity',
 'building:flats',
 'building:levels',
 'building:material',
 'building:max_level',
 'building:min_level',
 'building:fireproof',
 'building:use',
 'craft',
 'height',
 'internet_access',
 'landuse',
 'levels',
 'office',
 'operator',
 'shop',
 'source',
 'start_date',
 'wikipedia']

As we can see, there are quite a few attributes that will be parsed into columns if they exist in the data. The list is mostly based on the OSM documentation about Key:building but it also contains some generic attributes that are commonly useful for many types of OSM features such as name, address information, opening_hours, website etc. Similar approach is used with all OSM Keys listed above in conf.tags.available. If the data contains additional attributes not listed in the default tags, such attributes are stored separately into a column "tags".

Let’s make an example to understand this better:

# Parse buildings
buildings = osm.get_buildings()

# Print columns
buildings.columns
Index(['addr:city', 'addr:country', 'addr:housenumber', 'addr:postcode',
       'addr:street', 'name', 'opening_hours', 'phone', 'building',
       'building:levels', 'landuse', 'shop', 'source', 'id', 'timestamp',
       'version', 'tags', 'geometry', 'osm_type'],
      dtype='object')

Our test data contains quite many of the default tags as columns (not all though). We seem to have also some additional data in the “tags” columns which were not listed in the default tag list.

  • Let’s take a closer look at those:

# List "extra" tags that were associated with some of the buildings
buildings["tags"].unique()
array([None, '{"mml:class":"42211"}', '{"mml:class":"42221"}',
       '{"mml:class":"42261"}', '{"mml:class":"42241"}',
       '{"mml:class":"42212"}'], dtype=object)

As we can see, some of the OSM elements included information about "mml:class" which is additional data that might be relevant for some, but most probably not for most, hence it is not added as a column to the GeoDataFrame.

It is still possible to access the data values of these “extra tags” by parsing the data from the JSON e.g. as follows:

import json 

# Iterate over rows having extra tags and print out the values
rows_with_extra_info = buildings.dropna(subset=["tags"])


i = 0
for row in rows_with_extra_info.itertuples():
    
    # Read the JSON
    tags = json.loads(row.tags)
    
    # Print the keys and values
    for key, value in tags.items():
        print("Key:", key, ", value: ", value)
    
    # Continue only up to first 10 
    if i == 9:
        break
    i+=1
Key: mml:class , value:  42211
Key: mml:class , value:  42211
Key: mml:class , value:  42211
Key: mml:class , value:  42211
Key: mml:class , value:  42211
Key: mml:class , value:  42211
Key: mml:class , value:  42221
Key: mml:class , value:  42211
Key: mml:class , value:  42211
Key: mml:class , value:  42211

Controlling which OSM attributes are parsed into columns

In some cases, it might be useful to parse some of these “extra” attributes directly into columns. Doing this is easy with pyrosm which is demonstrated below.

from pyrosm import OSM, get_data
# Get test data 
fp = get_data("test_pbf")

# Initialize the reader
osm = OSM(fp)
            
buildings = osm.get_buildings()

# Print info
print("Existing columns:\n", buildings.columns)
print("\nAdditional attributes in the 'tags': \n", buildings.tags.unique())
Existing columns:
 Index(['addr:city', 'addr:country', 'addr:housenumber', 'addr:postcode',
       'addr:street', 'name', 'opening_hours', 'phone', 'building',
       'building:levels', 'landuse', 'shop', 'source', 'id', 'timestamp',
       'version', 'tags', 'osm_type', 'geometry'],
      dtype='object')

Additional attributes in the 'tags': 
 [None '{"mml:class":"42211"}' '{"mml:class":"42221"}'
 '{"mml:class":"42261"}' '{"mml:class":"42241"}' '{"mml:class":"42212"}']

The "tags" column includes additional information with key "mml:class". If we would like to parse this attribute also as a column in our resulting GeoDataFrame, we can easily do this by using extra_attributes -parameter which accepts a list of keys (one or multiple) that will be converted into columns:

# Parse buildings and store also "mml:class" as a column
buildings2 = osm.get_buildings(extra_attributes=["mml:class"])

# Print columns
buildings2.columns
Index(['addr:city', 'addr:country', 'addr:housenumber', 'addr:postcode',
       'addr:street', 'name', 'opening_hours', 'phone', 'building',
       'building:levels', 'landuse', 'shop', 'source', 'id', 'timestamp',
       'version', 'mml:class', 'osm_type', 'geometry'],
      dtype='object')

Great! Now the "mml:class" was also added as column in our GeoDataFrame:

buildings2.tail(5)
addr:city addr:country addr:housenumber addr:postcode addr:street name opening_hours phone building building:levels landuse shop source id timestamp version mml:class osm_type geometry
2188 None None None None None None None None residential None None None None 424115702 1465573852 1 42211 way POLYGON ((26.96337 60.52196, 26.96330 60.52205...
2189 None None None None None None None None residential None None None None 424115707 1465573852 1 42211 way POLYGON ((26.96773 60.53151, 26.96771 60.53167...
2190 None None None None None None None None residential None None None None 424115720 1465573853 1 42211 way POLYGON ((26.95398 60.52896, 26.95416 60.52883...
2191 None None None None None None None None residential None None None None 424115722 1465573853 1 42211 way POLYGON ((26.96623 60.53462, 26.96615 60.53469...
2192 None None None None None None None None residential None None None None 424115743 1465573855 1 42211 way POLYGON ((26.93940 60.52654, 26.93940 60.52662...

Now it is easy to access and use the values of the new column in a similar manner as any other column:

# Get unique values in the "mml:class" column
print(buildings2["mml:class"].unique())
[None '42211' '42221' '42261' '42241' '42212']

Export street networks to graph

If you want to analyze the street networks using your favourite network analysis library, you can export the street network from Pyrosm into a graph (new in version 0.6.0). Supported graphs are iGraph, NetworkX (compatible with OSMnx) and Pandana. Those libraries provide numerous possibilities to analyze different properties of the graph (e.g. centrality) or conduct e.g. shortest path analysis to find the fastest (or shortest) route from a location to another. Notice that the numerous algorithms provided by these libraries are not going to be integrated into Pyrosm, but you can easily export the graphs to these libraries and continue working with them.

Exporting the network into a graph can be done as follows:

  1. Retrieve the graph elements (nodes and edges) from a given OSM network by specifying nodes=True in the osm.get_network() function.

  2. Use osm.to_graph() function to convert the nodes and edges into a graph.

    • The output graph type can be specified with graph_type parameter. Available types are: "igraph" (default), "networkx" and "pandana".

Following sections show how to do this in practice.

What is a graph? (a super short intro)

Graphs are, in principle, quite simple data structures consisting of:

  • nodes (e.g. intersections on a street, or a person in social network), and

  • edges (a link that connects the nodes to each other)

A simple graph could look like this:

Simple graph with five nodes and edges between them. Simple graph with five nodes and edges between them.

Graph can be directed or undirected, which basically determines whether the roads can be travelled to any direction or whether the travel direction is restricted to certain direction (e.g. a one-way-street). A directed graph looks something like this:

Directed graph Directed graph.

Notes about Pyrosm graph building

  • Pyrosm will always create a directed graph when exporting the street network to graph (works similarly for all libraries).

    • When exporting network that is used for driving (i.e. network_type="driving"), pyrosm considers the oneway restrictions that are defined in oneway column in the data.

    • With “walking”, “cycling” and “all”, pyrosm creates a bidirectional graph, meaning that the travel is allowed to both directions.

  • By default, Pyrosm will only keep connected edges in the output graph (largest strongly connected component). This means that all “isolated islands” of the network will be filtered out because those cannot be reached from other parts of the network (you can change this behavior by specifying retain_all=True, see further info).

  • When constructing the graph all road segments are kept separately to enable good connectivity in the graph. However, it is possible to simplify the NetworkX graph (hence reducing it’s size) using OSMnx by merging all road segments belonging to the same link (i.e. road between two intersections). Read more from OSMnx docs here. (this might be integrated into Pyrosm in the future iterations of the library)

Read nodes and edges (first step)

The first step that needs to be done is to read the nodes and edges from the graph:

from pyrosm import OSM, get_data

# Initialize reader 
osm = OSM(get_data("test_pbf"))

# Read nodes and edges of the 'driving' network
nodes, edges = osm.get_network(nodes=True, network_type="driving")

# Plot nodes and edges on a map
ax = edges.plot(figsize=(6,6), color="gray")
ax = nodes.plot(ax=ax, color="red", markersize=2.5)
_images/basics_91_0.png

The map shows the nodes (red color) and edges (gray color) which connect the nodes to each other (thus constructing the network).

When parsing the nodes, two extra columns (u and v) are added to the GeoDataFrame. These columns specify the source (u) and target (u) nodes for each edge (also commonly called as from- and to-ids):

# Show the last 5 columns of the first 5 rows in edges 
edges.iloc[:5, -5:]
osm_type geometry u v length
0 way LINESTRING (26.94310 60.52580, 26.94295 60.52596) 36156596 2316826913 20.096
1 way LINESTRING (26.94295 60.52596, 26.94261 60.52639) 2316826913 3735963133 51.356
2 way LINESTRING (26.94261 60.52639, 26.94132 60.52804) 3735963133 277446336 196.370
3 way LINESTRING (26.94132 60.52804, 26.94108 60.52835) 277446336 3730253796 36.410
4 way LINESTRING (26.94108 60.52835, 26.93975 60.52998) 3730253796 277446337 195.452

The u and v values have corresponding data in the nodes GeoDataFrame (column id):

# 'id' column here corresponds to the 'u' and 'v' values in edges GeoDataFrame
nodes.head()
lon lat tags timestamp version changeset id geometry
0 26.943103 60.525798 None 1369300078 4 0 36156596 POINT (26.94310 60.52580)
1 26.942948 60.525962 {'highway': 'crossing'} 1369300072 1 0 2316826913 POINT (26.94295 60.52596)
2 26.942611 60.526393 None 1441800372 1 0 3735963133 POINT (26.94261 60.52639)
3 26.941323 60.528041 None 1282588818 4 0 277446336 POINT (26.94132 60.52804)
4 26.941076 60.528345 None 1441438154 1 0 3730253796 POINT (26.94108 60.52835)

Export to iGraph

Python’s iGraph library is used by default when exporting the nodes and edges to graph. iGraph is a good choice for analyzing large graphs (such as the street networks parsed with Pyrosm) as it is faster and more memory efficient than e.g. NetworkX.

To export the nodes and edges into directed graph that can be used with igraph library, you pass the data into osm.to_graph() -function:

from pyrosm import OSM, get_data
osm = OSM(get_data("test_pbf"))
nodes, edges = osm.get_network(nodes=True, network_type="driving")

# Create a graph for igraph from nodes and edges
G = osm.to_graph(nodes, edges)
G
<igraph.Graph at 0x7f793b2a5d60>

As an output, you get a directed igraph.Graph object that can be used for further analysis using the various functionalities of igraph.

See all available parameters of to_graph() from here and usage examples from working with graphs.

Export to NetworkX / OSMnx

NetworkX and OSMnx (focusing on street networks) are two widely used network analysis libraries for Python. Exporting the OSM street network to these libraries is easy. You just need to specify graph_type="networkx" when calling the to_graph() function:

from pyrosm import OSM, get_data
osm = OSM(get_data("test_pbf"))
nodes, edges = osm.get_network(nodes=True, network_type="driving")

# Export the nodes and edges to NetworkX graph
G = osm.to_graph(nodes, edges, graph_type="networkx")
G
<networkx.classes.multidigraph.MultiDiGraph at 0x7f7951272790>

As an output, you get a directed networkx MultiDiGraph object that can be used for further analysis using either NetworkX or OSMnx.

See all available parameters of to_graph() from here and usage examples from working with graphs.

Note

By default, when exporting to networkx, the edge and node attributes are named in such a way that you can directly start using osmnx functionalities. Also a column key (with value 0) is automatically added to the edge table to make the data compatible with osmnx.

If you want to disable these modifications, specify osmnx_compatible=False when exporting the data to graph, i.e:

G2 = osm.to_graph(nodes, edges, graph_type="networkx", osmnx_compatible=False)

Export to Pandana

Pandana is a Python library for conducting accessibility/reachability analysis using contraction hierarchies algorithm. It has useful functionalities to conduct more specific queries such as “find me all restaurants that are within 500 meters from given locations (e.g. buildings)”.

Exporting the OSM street network to Pandana works in a similar manner as with the previous ones. By specifying graph_type=”pandana” for the to_graph() function, the output will be a pandana graph. In addition, you can specify with pandana_weights -parameter which columns in your edges data is used as edge weights in the pandana graph. By default length column is used but you can add any other numerical column as an edge weight (you can use multiple weights in the same graph):

from pyrosm import OSM, get_data
osm = OSM(get_data("test_pbf"))
nodes, edges = osm.get_network(nodes=True, network_type="driving")

# Export the nodes and edges to Pandana graph
G = osm.to_graph(nodes, edges, graph_type="pandana", pandana_weights=["length"])
G
<pandana.network.Network at 0x7f7904c5c820>

As an output, you get a directed pandana Network object that can be used for further analysis using pandana.

See all available parameters of to_graph() from here and usage examples from working with graphs.

to_graph parameters explained

to_graph() function has multiple parameters that can be adjusted. Below you can read all available parameters, and their explanations:

from pyrosm import OSM, get_data
osm = OSM(get_data("test_pbf"))

# To see all available parameters and their explanations, simply call help
help(osm.to_graph)
Help on method to_graph in module pyrosm.pyrosm:

to_graph(nodes, edges, graph_type='igraph', direction='oneway', from_id_col='u', to_id_col='v', edge_id_col='id', node_id_col='id', force_bidirectional=False, network_type=None, retain_all=False, osmnx_compatible=True, pandana_weights=['length']) method of pyrosm.pyrosm.OSM instance
    `
    Export OSM network to routable graph. Supported output graph types are:
      - "igraph" (default),
      - "networkx",
      - "pandana"
    
    For walking and cycling, the output graph will be bidirectional by default
    (i.e. travel along the street is allowed to both directions). For driving,
    one-way streets are taken into account by default and the travel is restricted
    based on the rules in OSM data (based on "oneway" attribute).
    
    Parameters
    ----------
    
    nodes : GeoDataFrame
        GeoDataFrame containing nodes of the road network.
        Note: Use `osm.get_network(nodes=True)` to retrieve both the nodes and edges.
    
    edges : GeoDataFrame
        GeoDataFrame containing the edges of the road network.
    
    graph_type : str
        Type of the output graph. Available graphs are:
          - "igraph" --> returns an igraph.Graph -object.
          - "networkx" --> returns a networkx.MultiDiGraph -object.
          - "pandana" --> returns an pandana.Network -object.
    
    direction : str
        Name for the column containing information about the allowed driving directions
    
    from_id_col : str
        Name for the column having the from-node-ids of edges.
    
    to_id_col : str
        Name for the column having the to-node-ids of edges.
    
    edge_id_col : str
        Name for the column having the unique id for edges.
    
    node_id_col : str
        Name for the column having the unique id for nodes.
    
    force_bidirectional : bool
        If True, all edges will be created as bidirectional (allow travel to both directions).
    
    network_type : str (optional)
        Network type for the given data. Determines how the graph will be constructed.
        The network type is typically extracted automatically from the metadata of
        the edges/nodes GeoDataFrames. This parameter can be used if this metadata is not
        available for a reason or another. By default, bidirectional graph is created for walking, cycling and all,
        and directed graph for driving (i.e. oneway streets are taken into account).
        Possible values are: 'walking', 'cycling', 'driving', 'driving+service', 'all'.
    
    retain_all : bool
        if True, return the entire graph even if it is not connected.
        otherwise, retain only the connected edges.
    
    osmnx_compatible : bool (default True)
        if True, modifies the edge and node-attribute naming to be compatible with OSMnx
        (allows utilizing all OSMnx functionalities).
        NOTE: Only applicable with "networkx" graph type.
    
    pandana_weights : list
        Columns that are used as weights when exporting to Pandana graph. By default uses "length" column.