Working with graphs#

This notebook gives a few examples how Pyrosm can be used together with Python network analysis libraries. Before starting, ensure that you have read the basic documentation about how to export graphs with Pyrosm.

Contents:

  1. Find shortest path with pyrosm + networkx + osmnx

  2. Calculate the catchment areas of hospitals in Estonia with pyrosm + igraph.

  3. Calculate the job accessibility in Helsinki Region with pyrosm + pandarm

Note

  1. The idea of this page is not to give a comprehensive overview of all the possibilities that you can do with these network analysis libraries, but to just give you a bit of idea what can be done. You can get more detailed and further information from the documentation of the packages themselves.

  2. Integrating routing or network analysis functionalities is not planned for Pyrosm. There is a separate Python project called cafein that will be built on top of pyrosm including different routing and network analysis functionalities (will become available in 2021).

Export street networks to graph#

If you want to analyze the street networks using your favourite network analysis library, you can export the street network from Pyrosm into a graph (new in version 0.6.0). Supported graphs are iGraph, NetworkX (compatible with OSMnx), Pandarm and Pandana. Those libraries provide numerous possibilities to analyze different properties of the graph (e.g. centrality) or conduct e.g. shortest path analysis to find the fastest (or shortest) route from a location to another. Notice that the numerous algorithms provided by these libraries are not going to be integrated into Pyrosm, but you can easily export the graphs to these libraries and continue working with them.

Exporting the network into a graph can be done as follows:

  1. Retrieve the graph elements (nodes and edges) from a given OSM network by specifying nodes=True in the osm.get_network() function.

  2. Use osm.to_graph() function to convert the nodes and edges into a graph.

    • The output graph type can be specified with graph_type parameter. Available types are: "igraph" (default), "networkx", "pandarm" and "pandana" (deprecated; use "pandarm").

Following sections show how to do this in practice.

What is a graph? (a super short intro)

Graphs are, in principle, quite simple data structures consisting of:

  • nodes (e.g. intersections on a street, or a person in social network), and

  • edges (a link that connects the nodes to each other)

A simple graph could look like this:

Simple graph with five nodes and edges between them. Simple graph with five nodes and edges between them.

Graph can be directed or undirected, which basically determines whether the roads can be travelled to any direction or whether the travel direction is restricted to certain direction (e.g. a one-way-street). A directed graph looks something like this:

Directed graph Directed graph.

Notes about Pyrosm graph building#

  • Pyrosm will always create a directed graph when exporting the street network to graph (works similarly for all libraries).

    • When exporting network that is used for driving (i.e. network_type="driving"), pyrosm considers the oneway restrictions that are defined in oneway column in the data.

    • With “walking”, “cycling” and “all”, pyrosm creates a bidirectional graph, meaning that the travel is allowed to both directions.

  • By default, Pyrosm will only keep connected edges in the output graph (largest strongly connected component). This means that all “isolated islands” of the network will be filtered out because those cannot be reached from other parts of the network (you can change this behavior by specifying retain_all=True, see further info).

  • When constructing the graph all road segments are kept separately to enable good connectivity in the graph. However, it is possible to simplify the NetworkX graph (hence reducing it’s size) using OSMnx by merging all road segments belonging to the same link (i.e. road between two intersections). Read more from OSMnx docs here. (this might be integrated into Pyrosm in the future iterations of the library)

Read nodes and edges (first step)#

The first step that needs to be done is to read the nodes and edges from the graph:

from pyrosm import OSM, get_data

# Initialize reader 
osm = OSM(get_data("test_pbf"))

# Read nodes and edges of the 'driving' network
nodes, edges = osm.get_network(nodes=True, network_type="driving")

# Plot nodes and edges on a map
ax = edges.plot(figsize=(6,6), color="gray")
ax = nodes.plot(ax=ax, color="red", markersize=2.5)
_images/8beca949127b264674218f9f4c5397fc3f1879803b9f294e3c046f344f3b2334.png

The map shows the nodes (red color) and edges (gray color) which connect the nodes to each other (thus constructing the network).

When parsing the nodes, two extra columns (u and v) are added to the GeoDataFrame. These columns specify the source (u) and target (u) nodes for each edge (also commonly called as from- and to-ids):

# Show the last 5 columns of the first 5 rows in edges 
edges.iloc[:5, -5:]
osm_type geometry u v length
0 way LINESTRING (26.94310 60.52580, 26.94295 60.52596) 36156596 2316826913 20.096
1 way LINESTRING (26.94295 60.52596, 26.94261 60.52639) 2316826913 3735963133 51.356
2 way LINESTRING (26.94261 60.52639, 26.94132 60.52804) 3735963133 277446336 196.370
3 way LINESTRING (26.94132 60.52804, 26.94108 60.52835) 277446336 3730253796 36.410
4 way LINESTRING (26.94108 60.52835, 26.93975 60.52998) 3730253796 277446337 195.452

The u and v values have corresponding data in the nodes GeoDataFrame (column id):

# 'id' column here corresponds to the 'u' and 'v' values in edges GeoDataFrame
nodes.head()
lon lat tags timestamp version changeset id geometry
0 26.943103 60.525798 None 1369300078 4 0 36156596 POINT (26.94310 60.52580)
1 26.942948 60.525962 {'highway': 'crossing'} 1369300072 1 0 2316826913 POINT (26.94295 60.52596)
2 26.942611 60.526393 None 1441800372 1 0 3735963133 POINT (26.94261 60.52639)
3 26.941323 60.528041 None 1282588818 4 0 277446336 POINT (26.94132 60.52804)
4 26.941076 60.528345 None 1441438154 1 0 3730253796 POINT (26.94108 60.52835)

Export to iGraph#

Python’s iGraph library is used by default when exporting the nodes and edges to graph. iGraph is a good choice for analyzing large graphs (such as the street networks parsed with Pyrosm) as it is faster and more memory efficient than e.g. NetworkX.

To export the nodes and edges into directed graph that can be used with igraph library, you pass the data into osm.to_graph() -function:

from pyrosm import OSM, get_data
osm = OSM(get_data("test_pbf"))
nodes, edges = osm.get_network(nodes=True, network_type="driving")

# Create a graph for igraph from nodes and edges
G = osm.to_graph(nodes, edges)
G
<igraph.Graph at 0x7f793b2a5d60>

As an output, you get a directed igraph.Graph object that can be used for further analysis using the various functionalities of igraph.

See all available parameters of to_graph() from here and usage examples from working with graphs.

Export to NetworkX / OSMnx#

NetworkX and OSMnx (focusing on street networks) are two widely used network analysis libraries for Python. Exporting the OSM street network to these libraries is easy. You just need to specify graph_type="networkx" when calling the to_graph() function:

from pyrosm import OSM, get_data
osm = OSM(get_data("test_pbf"))
nodes, edges = osm.get_network(nodes=True, network_type="driving")

# Export the nodes and edges to NetworkX graph
G = osm.to_graph(nodes, edges, graph_type="networkx")
G
<networkx.classes.multidigraph.MultiDiGraph at 0x7f7951272790>

As an output, you get a directed networkx MultiDiGraph object that can be used for further analysis using either NetworkX or OSMnx.

See all available parameters of to_graph() from here and usage examples from working with graphs.

Note

By default, when exporting to networkx, the edge and node attributes are named in such a way that you can directly start using osmnx functionalities. Also a column key (with value 0) is automatically added to the edge table to make the data compatible with osmnx.

If you want to disable these modifications, specify osmnx_compatible=False when exporting the data to graph, i.e:

G2 = osm.to_graph(nodes, edges, graph_type="networkx", osmnx_compatible=False)

Export to Pandarm#

Pandarm (the maintained, NumPy 2-compatible fork of Pandana) is a Python library for conducting accessibility/reachability analysis using the contraction hierarchies algorithm. It has useful functionalities to conduct more specific queries such as “find me all restaurants that are within 500 meters from given locations (e.g. buildings)”.

Exporting the OSM street network to Pandarm works in a similar manner as with the previous ones. By specifying graph_type="pandarm" for the to_graph() function, the output will be a pandarm graph. In addition, you can specify with pandana_weights -parameter which columns in your edges data is used as edge weights in the graph. By default length column is used but you can add any other numerical column as an edge weight (you can use multiple weights in the same graph):

Note

graph_type="pandana" is still accepted for backwards compatibility, but pandana is deprecated (unmaintained and incompatible with NumPy 2 on Windows) and will be removed in a future release. Use "pandarm" instead.

from pyrosm import OSM, get_data
osm = OSM(get_data("test_pbf"))
nodes, edges = osm.get_network(nodes=True, network_type="driving")

# Export the nodes and edges to Pandarm graph
G = osm.to_graph(nodes, edges, graph_type="pandarm", pandana_weights=["length"])
G
<pandarm.network.Network at 0x7f7904c5c820>

As an output, you get a directed pandarm Network object that can be used for further analysis using pandarm.

See all available parameters of to_graph() from here and usage examples from working with graphs.

to_graph parameters explained#

to_graph() function has multiple parameters that can be adjusted. Below you can read all available parameters, and their explanations:

from pyrosm import OSM, get_data
osm = OSM(get_data("test_pbf"))

# To see all available parameters and their explanations, simply call help
help(osm.to_graph)
Help on method to_graph in module pyrosm.pyrosm:

to_graph(nodes, edges, graph_type='igraph', direction='oneway', from_id_col='u', to_id_col='v', edge_id_col='id', node_id_col='id', force_bidirectional=False, network_type=None, retain_all=False, osmnx_compatible=True, pandana_weights=['length']) method of pyrosm.pyrosm.OSM instance
    `
    Export OSM network to routable graph. Supported output graph types are:
      - "igraph" (default),
      - "networkx",
      - "pandana"
    
    For walking and cycling, the output graph will be bidirectional by default
    (i.e. travel along the street is allowed to both directions). For driving,
    one-way streets are taken into account by default and the travel is restricted
    based on the rules in OSM data (based on "oneway" attribute).
    
    Parameters
    ----------
    
    nodes : GeoDataFrame
        GeoDataFrame containing nodes of the road network.
        Note: Use `osm.get_network(nodes=True)` to retrieve both the nodes and edges.
    
    edges : GeoDataFrame
        GeoDataFrame containing the edges of the road network.
    
    graph_type : str
        Type of the output graph. Available graphs are:
          - "igraph" --> returns an igraph.Graph -object.
          - "networkx" --> returns a networkx.MultiDiGraph -object.
          - "pandana" --> returns an pandana.Network -object.
    
    direction : str
        Name for the column containing information about the allowed driving directions
    
    from_id_col : str
        Name for the column having the from-node-ids of edges.
    
    to_id_col : str
        Name for the column having the to-node-ids of edges.
    
    edge_id_col : str
        Name for the column having the unique id for edges.
    
    node_id_col : str
        Name for the column having the unique id for nodes.
    
    force_bidirectional : bool
        If True, all edges will be created as bidirectional (allow travel to both directions).
    
    network_type : str (optional)
        Network type for the given data. Determines how the graph will be constructed.
        The network type is typically extracted automatically from the metadata of
        the edges/nodes GeoDataFrames. This parameter can be used if this metadata is not
        available for a reason or another. By default, bidirectional graph is created for walking, cycling and all,
        and directed graph for driving (i.e. oneway streets are taken into account).
        Possible values are: 'walking', 'cycling', 'driving', 'driving+service', 'all'.
    
    retain_all : bool
        if True, return the entire graph even if it is not connected.
        otherwise, retain only the connected edges.
    
    osmnx_compatible : bool (default True)
        if True, modifies the edge and node-attribute naming to be compatible with OSMnx
        (allows utilizing all OSMnx functionalities).
        NOTE: Only applicable with "networkx" graph type.
    
    pandana_weights : list
        Columns that are used as weights when exporting to Pandana graph. By default uses "length" column.

Find shortest path with Pyrosm + NetworkX + OSMnx#

This tutorial shows how to construct simple shortest path routing between selected source and destination addresses using NetworkX and OSMnx. Pyrosm is used for extracting the data from OSM PBF file and constructing the graph, while NetworkX provides the basic network analysis capabilities and OSMnx provides many useful functionalities that makes it easy to work with street networks.

from pyrosm import OSM, get_data
import osmnx as ox

# Initialize the reader
osm = OSM(get_data("helsinki_pbf"))

# Get all walkable roads and the nodes 
nodes, edges = osm.get_network(nodes=True)

# Check first rows in the edge 
edges.head()
access bicycle bridge cycleway est_width foot footway highway int_ref lanes ... width id timestamp version tags osm_type geometry u v length
0 None None None None None None None unclassified None 2 ... None 4236349 1380031970 21 {"name:fi":"Erottajankatu","name:sv":"Skillnad... way LINESTRING (24.94327 60.16651, 24.94337 60.16644) 1372477605 292727220 9.370
1 None None None None None None None unclassified None 2 ... None 4236349 1380031970 21 {"name:fi":"Erottajankatu","name:sv":"Skillnad... way LINESTRING (24.94337 60.16644, 24.94340 60.16641) 292727220 2394117042 4.499
2 None None None None None None None unclassified None 2 ... None 4243035 1543430213 12 {"name:fi":"Korkeavuorenkatu","name:sv":"H\u00... way LINESTRING (24.94567 60.16767, 24.94567 60.16763) 296250563 2049084195 4.174
3 None None None None None None None unclassified None 2 ... None 4243035 1543430213 12 {"name:fi":"Korkeavuorenkatu","name:sv":"H\u00... way LINESTRING (24.94567 60.16763, 24.94569 60.16744) 2049084195 60072359 21.692
4 None None None None None None None unclassified None 2 ... None 4243035 1543430213 12 {"name:fi":"Korkeavuorenkatu","name:sv":"H\u00... way LINESTRING (24.94569 60.16744, 24.94571 60.16726) 60072359 6100704327 19.083

5 rows × 33 columns

Now we can easily export the nodes and edges into a directed NetworkX graph, as shown in the basic documentation:

# Create NetworkX graph
G = osm.to_graph(nodes, edges, graph_type="networkx")
G
<networkx.classes.multidigraph.MultiDiGraph at 0x7fa338335760>

As we can see the output is now a NetworkX MultiDiGraph.

By default, the graph is exported in such a way that you can continue your analysis using OSMnx library that has many useful functions for analyzing and visualizing street networks.

# Plot the graph with OSMnx
ox.plot_graph(G)
_images/057a8b7ceff8921aed47766cff7a4e68f8b035ac9a079afea5dd4d8fd9ede930.png
(<Figure size 576x576 with 1 Axes>, <AxesSubplot:>)

Note

When exporting to networkx graph, Pyrosm will by default change names of a few variables:

  • id --> osmid,

  • lon --> x,

  • lat --> y

Also a key column will be added to the edge attributes. This makes it possible to use OSMnx straight away when you export the data. You can distable this behavior by using osmnx_compatible=False in to_graph function.

Calculate shortest paths#

Let’s first see how we can now calculate shortest path between two addresses using networkx and osmnx:

source_address = "Bulevardi 5, Helsinki"
target_address = "Unioninkatu 40, Helsinki"

source = ox.geocode(source_address)
target = ox.geocode(target_address)
# Source and target are now points tuple of (x, y) coordinates
print(source)
print(target)
(60.1651049, 24.9387954)
(60.1726635, 24.949283641875)

Now we need to find the nearest graph nodes for our locations:

# Find the closest nodes from the graph
source_node = ox.get_nearest_node(G, source)
target_node = ox.get_nearest_node(G, target)
# Check the nodeids of the source/target node
print(source_node)
print(target_node)
537519894
1012307807

Now it is very easy to calculate the shortest path between those two addresses based on length using networkx function shortest_path(). OSMnx provides very handy function to plot our result on top of the graph:

# Find shortest path (by distance)
import networkx as nx
route = nx.shortest_path(G, source_node, target_node, weight="length")
fig, ax = ox.plot_graph_route(G, route, route_linewidth=6, node_size=0, bgcolor='k')
_images/fdc88d641b2e16edfdc6d3a260c933c43d0de8f3e414ab9281c03ac22bd92794.png

That’s it! Following this approach it is easy to find shortest paths between selected locations.

You can read many more examples from OSMnx examples gallery.

Note

Although NetworkX is very easy to use, it tends to use quite a lot of memory and it is relatively slow with large networks. Hence, the default output graph for pyrosm is iGraph, which performs much better and consumes less memory.

Calculate the catchment areas of hospitals in Estonia with pyrosm + igraph#

Analyzing the catchment areas of specific services (such as hospitals) is a typical example where large scale spatial network analysis is used. In this example, we will see how pyrosm can be used together with igraph (+ a few other libraries) to conduct large scale network analysis and find the closest hospital for each building in (mainland) Estonia. The whole process should take less than 10 minutes with computer having 16GB of available RAM. The end result will look something like following:

Catchment areas of mainland hospitals in Estonia

Warning

As OpenStreetMap is a voluntarily maintained data source, the hospitals extracted from OSM might not be up-to-date and some of them might be missing or otherwise incorrect. The hospitals represented in this example can be of any kind (not resticted e.g. to acute care hospitals). Lastly, it would make more sense to assess the number of inhabitants instead of buildings (after all, it’s the people who need care, not the buildings). HOWEVER, the main purpose of this demo is just to give you an idea what can be done, not to make a scientific assessment of health care accessibility in Estonia. If you find a mistake in the OSM data, please contribute and make it better by suggesting an edit at www.openstreetmap.org/

  • Let’s start by defining a few helper functions for our analysis:

from pyrosm import OSM, get_data
import geopandas as gpd
import pandas as pd
from sklearn.neighbors import BallTree
import numpy as np
import mapclassify as mc
import matplotlib.pyplot as plt
import time

def get_igraph_nodes(G):
    """Retrieves a frame from nodes of the igaph"""
    attributes = G.vs.attribute_names()
    if len(attributes) == 0:
        raise ValueError("Nodes does not have data.")
    data = {name: G.vs[name] for name in attributes}
    if "geometry" in attributes:
        return gpd.GeoDataFrame(data, 
                                geometry="geometry", 
                                crs="epsg:4326")
    return pd.DataFrame(data)

def get_nearest(src_points, candidates, k_neighbors=1):
    """Find nearest neighbors for all source points from a set of candidate points"""
    tree = BallTree(candidates, leaf_size=15, metric='haversine')
    distances, indices = tree.query(src_points, k=k_neighbors)
    distances = distances.transpose()
    indices = indices.transpose()
    closest = indices[0]
    closest_dist = distances[0]
    return (closest, closest_dist)

def nearest_neighbor(left_gdf, right_gdf, return_dist=False):
    """
    For each point in left_gdf, find closest point in right GeoDataFrame and return them.
    
    For further info, take a look this lesson: 
        https://autogis-site.readthedocs.io/en/latest/notebooks/L3/06_nearest-neighbor-faster.html
    """
    left_geom_col = left_gdf.geometry.name
    right_geom_col = right_gdf.geometry.name
    right = right_gdf.copy().reset_index(drop=True)
    left_radians = np.array(left_gdf[left_geom_col].apply(lambda geom: 
                                                          (geom.x * np.pi / 180, geom.y * np.pi / 180)
                                                         ).to_list())
    right_radians = np.array(right[right_geom_col].apply(lambda geom: 
                                                         (geom.x * np.pi / 180, geom.y * np.pi / 180)
                                                        ).to_list())
    closest, dist = get_nearest(src_points=left_radians, candidates=right_radians)
    closest_points = right.loc[closest]
    closest_points = closest_points.reset_index(drop=True)
    if return_dist:
        earth_radius = 6371000  # meters
        closest_points['distance'] = dist * earth_radius
    return closest_points

def find_nearest_nodeids(nodes, src_gdf):
    """Finds the nearest node-ids for all points in 'src_gdf'."""
    nearest = nearest_neighbor(src_gdf, nodes, return_dist=True)
    return list(set(nearest["node_id"].values)), nearest["distance"].values
  • Retrieve the network, buildings and hospitals for Estonia:

# Get the data
osm = OSM(get_data("estonia"))
nodes, edges = osm.get_network(nodes=True, network_type="driving")
hospitals = osm.get_pois({"amenity": ["hospital"]})
buildings = osm.get_buildings()

# Track the number of buildings
building_cnt = len(buildings)
  • Create the directed graph for Estonia:

# Create graph
G = osm.to_graph(nodes, edges)
  • Get nodes from the graph

    • This step needs to be done because some of the nodes originally parsed from the street network are likely to be dropped out when the graph is generated (because unconnected edges are removed). Note: you can keep the whole network by specifying retain_all=True when calling to_graph().

nodes = get_igraph_nodes(G)
  • Find closest network node for each hospital:

# Ensure that all hospitals are represented as a point (take centroid)
# Warning: using a centroid is not necessarily a smart thing if doing the analysis for real, 
# but for the sake of simplicity we assume that the centroid of the hospital polygon 
# is the destination to be reached.
hospitals["geometry"] = hospitals.centroid

# Get a list of nearest source-ids of the network to all hospitals (with distance)
# Warning: Using the nearest node (in stead of nearest edge) can produce slightly incorrect results, 
# but for the sake of simplicity we use nodes.
src_ids, euclidean_distance = find_nearest_nodeids(nodes, hospitals)
  • Calculate travel distances to all hospitals in Estonia:

# Keep track of how many new columns are inserted to the frame
src_cnt = len(src_ids)

# Iterate over hospitals and calculate the network distances
# Note: this could also be run by passing all `src_ids` at once 
# to the shortest_paths_dijkstra -function. 
for src_id, distance_to_closest_node in zip(src_ids, euclidean_distance):
    #print(f"Calculate shortest paths to {src_id}")
    
    # Calculate shortest path lengths to given locations
    path_lengths = G.shortest_paths_dijkstra(source=src_id, weights="length", mode="IN")
    
    # Add the euclidean distance between the hospital and closest node in the network 
    # to keep track of the door-to-door travel distance
    path_lengths = np.array(path_lengths) + distance_to_closest_node

    # Attach the path lenghts to nodes
    nodes[f"distance_to_{src_id}"] = path_lengths[0]

At this point, we have distance from all nodes in the network to each hospital. They are stored as separate columns in the nodes GeoDataFrame.

  • Next, we calculate which of the hospitals is the closest one for each node:

# Calculate distance to the closest hospital
nodes["distance_to_closest"] = nodes.iloc[:, -src_cnt:].min(axis=1)

# At this point you could already see the catchment areas from each road network node 
# to the closest hospital.
# Comment out the following if you want to see:
# ax = nodes.plot(column="distance_to_closest", cmap="RdYlBu", markersize=0.5)

At this point, we have calculated the catchments for each hospital, but those are not linked yet to buildings.

  • Let’s find the closest node for each building and keep track of the distance between them:

# Find closest node for each building (centroid)
# Warning: Again, taking a centroid of a building might not be the smartest thing to do for real..
# But for the sake of simplicity, we do it such a way
buildings["centroid"] = buildings.centroid
buildings = buildings.set_geometry("centroid")

# Note: With ~900000 x ~830000 point pairs, 
# this takes awhile to solve. 
# Also keep track of the distance between the points.
closest = nearest_neighbor(buildings, nodes, return_dist=True)
buildings["node_id"] = closest["node_id"]
buildings["distance_to_closest_node"] = closest["distance"]
  • Now we can associate the distance information to closest hospital into the buildings:

# Link the distance information
access = buildings.merge(nodes[["distance_to_closest", "node_id"]], on="node_id")

# Add the (Euclidean) distance between building 
# and the closest node in the road network to get a full "door-to-door" distance
access["distance"] = access["distance_to_closest"] + access["distance_to_closest_node"]

# Calculate distance in kilometers (meters by default)
access["distance_km"] = (access["distance"] / 1000).round(1)
  • Some of the buildings might be very far from the closest road. Drop buildings that are further than 5km from the road:

# Drop such buildings that are further than 5 km from closest road
access = access.loc[access["distance_to_closest_node"]<=5000]

Notice: the previous step will remove most of the buildings from the islands of Saarenmaa and Hiiumaa, because those are not linked to the mainland network (requires taking a ferry).

  • Finally, to make things a bit more interesting, we can also classify the distances and calculate some statistics:

# Classify distances to every 2 km zones, specify that the upper boundary is 50 kilometer
# everything above this will be put into a same class
upper_boundary_distance = 50
width = 5

# Use self-defined classification
classifier = mc.UserDefined(access["distance_km"], bins=[x for x in range(0, int(upper_boundary_distance)+1, width)])
access["cls"] = access[["distance_km"]].apply(classifier)
# Replace the class numbers to distance categories (e.g. 0-2, 2-4 .. km etc.)
access["travel_distance"] = access["cls"].replace({k: v 
                                                   for k, v in 
                                                   zip([x for x in range(len(classifier.bins))], 
                                                       classifier.bins)})

# Set all values over 50 km as 51 (to improve how the colorscale works)
access.loc[access["travel_distance"] > upper_boundary_distance, "travel_distance"] = upper_boundary_distance + 1

# Convert the observation counts as percentages
classifier.counts = (classifier.counts / classifier.counts.sum()).round(3)

# Convert stats to dataframe
bins = classifier.bins.astype(int)
categories = np.vstack([bins[:-1], bins[1:]]).T
categories = [f"{low}-{high}" for low, high in categories]
access_zone_classes = pd.DataFrame({"Distance": categories, "% of buildings": classifier.counts[1:]})

Finally, we can plot the results as a map:

plt.style.use('dark_background')
ax = access.plot(column="travel_distance", markersize=0.5, legend=True, cmap="RdYlBu", figsize=(17,10))
# Adjust the map extent
ax.set_xlim(22.5, 30.5)
ax.set_ylim(57.4, 59.9)
# Add some useful info
ax.text(23, 59.8, f"Network distance (km) from {building_cnt} buildings to closest hospital (n={src_cnt}) in Estonia (mainland).", {"size": 14})
ax.text(28.3, 58.0, access_zone_classes.to_string(index=False), {"size": 14});
_images/e555f65076466073a906da742500aeb0b5a8f1bd3a07c32bdd16d155b8e5f2e9.png

The map shows the catchment areas of all hospitals in mainland Estonia based on network distance between buildings and the closest hospital. It is possible to spot the 5 kilometer distance zones around each hospital quite easily, and the summary statistics reveal that ~20 % of the buildings in Estonia are within 5 kilometer from the closest hospital, and ~50 % of the buildings in Estonia are within 15 kilometers from closest hospital (however, remember/see the caveats in the beginning of this section about the data quality and the rationale).

This example shows, how pyrosm can be used to conduct even national level analysis quite easily and fairly efficiently. The total time to run this tutorial took about 8.5 minutes (with a laptop having 16GB memory, SSD drive, and Intel Core i5-8250U CPU 1.6 GHZ). This included all steps from downloading the OSM data, parsing the required datasets (steets, buildings and hospitals), exporting the network to graph, conducting the network distance calculations and parsing the statistics (plotting was not taken into account).

Warning

Although pyrosm works fairly well with large datasets, conducting national level analyses still consume a fair amount of memory. Hence, be aware that you might end up having memory errors if you try to conduct national level analyses without sufficient RAM. Repeating this example with 16 GB of RAM should work well in smaller countries (~up to 150MB size of the PBF), but conducting similar analysis e.g. in Finland (PBF size ~539MB) would most likely end up with memory error. (the idea is to allow optimizing the memory usage in the future versions of pyrosm, see #87)

Calculate the job accessibility in Helsinki Region with pyrosm + pandarm#

Pandarm (the maintained, NumPy 2-compatible fork of pandana) is a Python network analysis library supported by pyrosm export functionality. Pandarm operates using a fast contraction hierarchies -algorithm and includes various useful functions. In this tutorial, we will see 1) how to calculate distance to nearest 5 restaurants for each network node in the Helsinki Region, and 2) how to calculate a simple job accessibility index for the region based on number of restaurant employees (simulated).

Note

pyrosm also still accepts graph_type="pandana", but pandana is deprecated because it is unmaintained and incompatible with NumPy 2 on Windows. Use "pandarm" (a drop-in replacement with the same API) instead.

from pyrosm import OSM, get_data
import numpy as np
import matplotlib.pyplot as plt
osm = OSM(get_data("helsinki"))
nodes, edges = osm.get_network(network_type="driving", nodes=True)
restaurants = osm.get_pois(custom_filter={"amenity": ["restaurant"]})
G = osm.to_graph(nodes, edges, graph_type="pandarm")
# For simplicity, ensure all restaurants are represented as points
restaurants["geometry"] = restaurants.centroid
restaurants = restaurants.dropna(subset=["lon", "lat"])
# Precompute distances up to 2000 meters
# Notice using long distance with large network can consume memory quite a bit
G.precompute(2000)

Find nearest X number of POIs for each network node#

After we have done precalculations with pandarm, it allows conducting performant queries, such as querying distances to X number of nearest restaurants for all network nodes.

  • The following example shows how to find the 5 closest restaurants from each network node (up to 2000 meter distance threshold):

# Attach restaurants to Pandana graph
G.set_pois(category="restaurants", maxdist=2000, maxitems=10, 
           x_col=restaurants.lon, y_col=restaurants.lat)
# For each node in the network find distances to 5 closest restaurants (up to 2000 meters)
nearest_five = G.nearest_pois(2000, "restaurants", num_pois=5)
nearest_five.tail(10)
1 2 3 4 5
8094713890 517.169983 1408.347046 1479.694946 1656.630005 1695.170044
1956624053 520.372009 1411.548950 1482.896973 1659.832031 1698.371948
8094751564 529.398010 1420.574951 1491.922974 1668.858032 1707.397949
8094751563 537.838013 1429.015015 1500.363037 1677.297974 1715.838013
1956624064 540.979004 1432.156006 1503.504028 1680.438965 1718.979004
223537823 554.515991 1445.692993 1517.041016 1693.975952 1732.515991
8094751562 550.070984 1441.248047 1512.595947 1689.531006 1728.071045
7747100203 549.541992 1440.718994 1512.067017 1689.001953 1727.541992
8094751561 549.187988 1440.364990 1511.713013 1688.647949 1727.187988
8094751560 545.604004 1436.781006 1508.129028 1685.063965 1723.604004

As a result, we get the distance to five nearest restaurants for each node (node-id is the index here).

Calculate the job accessibility#

Next we will use a useful pandarm function called aggregate that allows to calculate accessibility index based on a given numeric variable, such as employee count. Here, we won’t be using any real employee information, but assign a random number of employees for each restaurant (between 1-10). Then we will calculate a specific job accessibility index based on cumulative number of restaurant employees within 500 meters (from each network node). This kind of index can provide a nice overview of the (restaurant) job accessibility in a city.

# Assign a random number of workers for each restaurant (between 1-10)
restaurants["employee_cnt"] = np.random.choice([x for x in range(1,11)], size=len(restaurants))
# Find the closest node-id for each restaurant
node_ids = G.get_node_ids(restaurants.lon, restaurants.lat)
# Add employee counts to the graph
G.set(node_ids, variable=restaurants.employee_cnt, name="employee_cnt")
# Calculate the number of employees (cumulative sum) from each node up to 500 meters
result = G.aggregate(500, type="sum", decay="linear", name="employee_cnt")
result = result.to_frame(name="cum_employees")
# Attach the information from nodes
result = nodes.merge(result, left_on="id", right_on=result.index)
# Visualize the results
plt.style.use('dark_background')
ax = result.plot(column="cum_employees", cmap='BrBG', markersize=0.1, legend=True, figsize=(20, 20))
_images/6164ebcbac50556c5931d511afdf851ac2dc770be10157d56f2933031cd8aa79.png

As a result, we get a map showing the restaurant job accessibility in Helsinki Region. The highlighted colors show areas where there are high number of restaurant jobs/employees working (notice that this is not based on real data).