Working with graphs#

This notebook gives a few examples how Pyrosm can be used together with Python network analysis libraries. Before starting, ensure that you have read the basic documentation about how to export graphs with Pyrosm.

Contents:

  1. Find shortest path with pyrosm + networkx + osmnx

  2. Calculate the catchment areas of hospitals in Estonia with pyrosm + igraph.

  3. Calculate the job accessibility in Helsinki Region with pyrosm + pandana

Note

  1. The idea of this page is not to give a comprehensive overview of all the possibilities that you can do with these network analysis libraries, but to just give you a bit of idea what can be done. You can get more detailed and further information from the documentation of the packages themselves.

  2. Integrating routing or network analysis functionalities is not planned for Pyrosm. There is a separate Python project called cafein that will be built on top of pyrosm including different routing and network analysis functionalities (will become available in 2021).

Find shortest path with Pyrosm + NetworkX + OSMnx#

This tutorial shows how to construct simple shortest path routing between selected source and destination addresses using NetworkX and OSMnx. Pyrosm is used for extracting the data from OSM PBF file and constructing the graph, while NetworkX provides the basic network analysis capabilities and OSMnx provides many useful functionalities that makes it easy to work with street networks.

from pyrosm import OSM, get_data
import osmnx as ox

# Initialize the reader
osm = OSM(get_data("helsinki_pbf"))

# Get all walkable roads and the nodes 
nodes, edges = osm.get_network(nodes=True)

# Check first rows in the edge 
edges.head()
access bicycle bridge cycleway est_width foot footway highway int_ref lanes ... width id timestamp version tags osm_type geometry u v length
0 None None None None None None None unclassified None 2 ... None 4236349 1380031970 21 {"name:fi":"Erottajankatu","name:sv":"Skillnad... way LINESTRING (24.94327 60.16651, 24.94337 60.16644) 1372477605 292727220 9.370
1 None None None None None None None unclassified None 2 ... None 4236349 1380031970 21 {"name:fi":"Erottajankatu","name:sv":"Skillnad... way LINESTRING (24.94337 60.16644, 24.94340 60.16641) 292727220 2394117042 4.499
2 None None None None None None None unclassified None 2 ... None 4243035 1543430213 12 {"name:fi":"Korkeavuorenkatu","name:sv":"H\u00... way LINESTRING (24.94567 60.16767, 24.94567 60.16763) 296250563 2049084195 4.174
3 None None None None None None None unclassified None 2 ... None 4243035 1543430213 12 {"name:fi":"Korkeavuorenkatu","name:sv":"H\u00... way LINESTRING (24.94567 60.16763, 24.94569 60.16744) 2049084195 60072359 21.692
4 None None None None None None None unclassified None 2 ... None 4243035 1543430213 12 {"name:fi":"Korkeavuorenkatu","name:sv":"H\u00... way LINESTRING (24.94569 60.16744, 24.94571 60.16726) 60072359 6100704327 19.083

5 rows × 33 columns

Now we can easily export the nodes and edges into a directed NetworkX graph, as shown in the basic documentation:

# Create NetworkX graph
G = osm.to_graph(nodes, edges, graph_type="networkx")
G
<networkx.classes.multidigraph.MultiDiGraph at 0x7fa338335760>

As we can see the output is now a NetworkX MultiDiGraph.

By default, the graph is exported in such a way that you can continue your analysis using OSMnx library that has many useful functions for analyzing and visualizing street networks.

# Plot the graph with OSMnx
ox.plot_graph(G)
_images/057a8b7ceff8921aed47766cff7a4e68f8b035ac9a079afea5dd4d8fd9ede930.png
(<Figure size 576x576 with 1 Axes>, <AxesSubplot:>)

Note

When exporting to networkx graph, Pyrosm will by default change names of a few variables:

  • id --> osmid,

  • lon --> x,

  • lat --> y

Also a key column will be added to the edge attributes. This makes it possible to use OSMnx straight away when you export the data. You can distable this behavior by using osmnx_compatible=False in to_graph function.

Calculate shortest paths#

Let’s first see how we can now calculate shortest path between two addresses using networkx and osmnx:

source_address = "Bulevardi 5, Helsinki"
target_address = "Unioninkatu 40, Helsinki"

source = ox.geocode(source_address)
target = ox.geocode(target_address)
# Source and target are now points tuple of (x, y) coordinates
print(source)
print(target)
(60.1651049, 24.9387954)
(60.1726635, 24.949283641875)

Now we need to find the nearest graph nodes for our locations:

# Find the closest nodes from the graph
source_node = ox.get_nearest_node(G, source)
target_node = ox.get_nearest_node(G, target)
# Check the nodeids of the source/target node
print(source_node)
print(target_node)
537519894
1012307807

Now it is very easy to calculate the shortest path between those two addresses based on length using networkx function shortest_path(). OSMnx provides very handy function to plot our result on top of the graph:

# Find shortest path (by distance)
import networkx as nx
route = nx.shortest_path(G, source_node, target_node, weight="length")
fig, ax = ox.plot_graph_route(G, route, route_linewidth=6, node_size=0, bgcolor='k')
_images/fdc88d641b2e16edfdc6d3a260c933c43d0de8f3e414ab9281c03ac22bd92794.png

That’s it! Following this approach it is easy to find shortest paths between selected locations.

You can read many more examples from OSMnx examples gallery.

Note

Although NetworkX is very easy to use, it tends to use quite a lot of memory and it is relatively slow with large networks. Hence, the default output graph for pyrosm is iGraph, which performs much better and consumes less memory.

Calculate the catchment areas of hospitals in Estonia with pyrosm + igraph#

Analyzing the catchment areas of specific services (such as hospitals) is a typical example where large scale spatial network analysis is used. In this example, we will see how pyrosm can be used together with igraph (+ a few other libraries) to conduct large scale network analysis and find the closest hospital for each building in (mainland) Estonia. The whole process should take less than 10 minutes with computer having 16GB of available RAM. The end result will look something like following:

Catchment areas of mainland hospitals in Estonia

Warning

As OpenStreetMap is a voluntarily maintained data source, the hospitals extracted from OSM might not be up-to-date and some of them might be missing or otherwise incorrect. The hospitals represented in this example can be of any kind (not resticted e.g. to acute care hospitals). Lastly, it would make more sense to assess the number of inhabitants instead of buildings (after all, it’s the people who need care, not the buildings). HOWEVER, the main purpose of this demo is just to give you an idea what can be done, not to make a scientific assessment of health care accessibility in Estonia. If you find a mistake in the OSM data, please contribute and make it better by suggesting an edit at www.openstreetmap.org/

  • Let’s start by defining a few helper functions for our analysis:

from pyrosm import OSM, get_data
import geopandas as gpd
import pandas as pd
from sklearn.neighbors import BallTree
import numpy as np
import mapclassify as mc
import matplotlib.pyplot as plt
import time

def get_igraph_nodes(G):
    """Retrieves a frame from nodes of the igaph"""
    attributes = G.vs.attribute_names()
    if len(attributes) == 0:
        raise ValueError("Nodes does not have data.")
    data = {name: G.vs[name] for name in attributes}
    if "geometry" in attributes:
        return gpd.GeoDataFrame(data, 
                                geometry="geometry", 
                                crs="epsg:4326")
    return pd.DataFrame(data)

def get_nearest(src_points, candidates, k_neighbors=1):
    """Find nearest neighbors for all source points from a set of candidate points"""
    tree = BallTree(candidates, leaf_size=15, metric='haversine')
    distances, indices = tree.query(src_points, k=k_neighbors)
    distances = distances.transpose()
    indices = indices.transpose()
    closest = indices[0]
    closest_dist = distances[0]
    return (closest, closest_dist)

def nearest_neighbor(left_gdf, right_gdf, return_dist=False):
    """
    For each point in left_gdf, find closest point in right GeoDataFrame and return them.
    
    For further info, take a look this lesson: 
        https://autogis-site.readthedocs.io/en/latest/notebooks/L3/06_nearest-neighbor-faster.html
    """
    left_geom_col = left_gdf.geometry.name
    right_geom_col = right_gdf.geometry.name
    right = right_gdf.copy().reset_index(drop=True)
    left_radians = np.array(left_gdf[left_geom_col].apply(lambda geom: 
                                                          (geom.x * np.pi / 180, geom.y * np.pi / 180)
                                                         ).to_list())
    right_radians = np.array(right[right_geom_col].apply(lambda geom: 
                                                         (geom.x * np.pi / 180, geom.y * np.pi / 180)
                                                        ).to_list())
    closest, dist = get_nearest(src_points=left_radians, candidates=right_radians)
    closest_points = right.loc[closest]
    closest_points = closest_points.reset_index(drop=True)
    if return_dist:
        earth_radius = 6371000  # meters
        closest_points['distance'] = dist * earth_radius
    return closest_points

def find_nearest_nodeids(nodes, src_gdf):
    """Finds the nearest node-ids for all points in 'src_gdf'."""
    nearest = nearest_neighbor(src_gdf, nodes, return_dist=True)
    return list(set(nearest["node_id"].values)), nearest["distance"].values
  • Retrieve the network, buildings and hospitals for Estonia:

# Get the data
osm = OSM(get_data("estonia"))
nodes, edges = osm.get_network(nodes=True, network_type="driving")
hospitals = osm.get_pois({"amenity": ["hospital"]})
buildings = osm.get_buildings()

# Track the number of buildings
building_cnt = len(buildings)
  • Create the directed graph for Estonia:

# Create graph
G = osm.to_graph(nodes, edges)
  • Get nodes from the graph

    • This step needs to be done because some of the nodes originally parsed from the street network are likely to be dropped out when the graph is generated (because unconnected edges are removed). Note: you can keep the whole network by specifying retain_all=True when calling to_graph().

nodes = get_igraph_nodes(G)
  • Find closest network node for each hospital:

# Ensure that all hospitals are represented as a point (take centroid)
# Warning: using a centroid is not necessarily a smart thing if doing the analysis for real, 
# but for the sake of simplicity we assume that the centroid of the hospital polygon 
# is the destination to be reached.
hospitals["geometry"] = hospitals.centroid

# Get a list of nearest source-ids of the network to all hospitals (with distance)
# Warning: Using the nearest node (in stead of nearest edge) can produce slightly incorrect results, 
# but for the sake of simplicity we use nodes.
src_ids, euclidean_distance = find_nearest_nodeids(nodes, hospitals)
  • Calculate travel distances to all hospitals in Estonia:

# Keep track of how many new columns are inserted to the frame
src_cnt = len(src_ids)

# Iterate over hospitals and calculate the network distances
# Note: this could also be run by passing all `src_ids` at once 
# to the shortest_paths_dijkstra -function. 
for src_id, distance_to_closest_node in zip(src_ids, euclidean_distance):
    #print(f"Calculate shortest paths to {src_id}")
    
    # Calculate shortest path lengths to given locations
    path_lengths = G.shortest_paths_dijkstra(source=src_id, weights="length", mode="IN")
    
    # Add the euclidean distance between the hospital and closest node in the network 
    # to keep track of the door-to-door travel distance
    path_lengths = np.array(path_lengths) + distance_to_closest_node

    # Attach the path lenghts to nodes
    nodes[f"distance_to_{src_id}"] = path_lengths[0]

At this point, we have distance from all nodes in the network to each hospital. They are stored as separate columns in the nodes GeoDataFrame.

  • Next, we calculate which of the hospitals is the closest one for each node:

# Calculate distance to the closest hospital
nodes["distance_to_closest"] = nodes.iloc[:, -src_cnt:].min(axis=1)

# At this point you could already see the catchment areas from each road network node 
# to the closest hospital.
# Comment out the following if you want to see:
# ax = nodes.plot(column="distance_to_closest", cmap="RdYlBu", markersize=0.5)

At this point, we have calculated the catchments for each hospital, but those are not linked yet to buildings.

  • Let’s find the closest node for each building and keep track of the distance between them:

# Find closest node for each building (centroid)
# Warning: Again, taking a centroid of a building might not be the smartest thing to do for real..
# But for the sake of simplicity, we do it such a way
buildings["centroid"] = buildings.centroid
buildings = buildings.set_geometry("centroid")

# Note: With ~900000 x ~830000 point pairs, 
# this takes awhile to solve. 
# Also keep track of the distance between the points.
closest = nearest_neighbor(buildings, nodes, return_dist=True)
buildings["node_id"] = closest["node_id"]
buildings["distance_to_closest_node"] = closest["distance"]
  • Now we can associate the distance information to closest hospital into the buildings:

# Link the distance information
access = buildings.merge(nodes[["distance_to_closest", "node_id"]], on="node_id")

# Add the (Euclidean) distance between building 
# and the closest node in the road network to get a full "door-to-door" distance
access["distance"] = access["distance_to_closest"] + access["distance_to_closest_node"]

# Calculate distance in kilometers (meters by default)
access["distance_km"] = (access["distance"] / 1000).round(1)
  • Some of the buildings might be very far from the closest road. Drop buildings that are further than 5km from the road:

# Drop such buildings that are further than 5 km from closest road
access = access.loc[access["distance_to_closest_node"]<=5000]

Notice: the previous step will remove most of the buildings from the islands of Saarenmaa and Hiiumaa, because those are not linked to the mainland network (requires taking a ferry).

  • Finally, to make things a bit more interesting, we can also classify the distances and calculate some statistics:

# Classify distances to every 2 km zones, specify that the upper boundary is 50 kilometer
# everything above this will be put into a same class
upper_boundary_distance = 50
width = 5

# Use self-defined classification
classifier = mc.UserDefined(access["distance_km"], bins=[x for x in range(0, int(upper_boundary_distance)+1, width)])
access["cls"] = access[["distance_km"]].apply(classifier)
# Replace the class numbers to distance categories (e.g. 0-2, 2-4 .. km etc.)
access["travel_distance"] = access["cls"].replace({k: v 
                                                   for k, v in 
                                                   zip([x for x in range(len(classifier.bins))], 
                                                       classifier.bins)})

# Set all values over 50 km as 51 (to improve how the colorscale works)
access.loc[access["travel_distance"] > upper_boundary_distance, "travel_distance"] = upper_boundary_distance + 1

# Convert the observation counts as percentages
classifier.counts = (classifier.counts / classifier.counts.sum()).round(3)

# Convert stats to dataframe
bins = classifier.bins.astype(int)
categories = np.vstack([bins[:-1], bins[1:]]).T
categories = [f"{low}-{high}" for low, high in categories]
access_zone_classes = pd.DataFrame({"Distance": categories, "% of buildings": classifier.counts[1:]})

Finally, we can plot the results as a map:

plt.style.use('dark_background')
ax = access.plot(column="travel_distance", markersize=0.5, legend=True, cmap="RdYlBu", figsize=(17,10))
# Adjust the map extent
ax.set_xlim(22.5, 30.5)
ax.set_ylim(57.4, 59.9)
# Add some useful info
ax.text(23, 59.8, f"Network distance (km) from {building_cnt} buildings to closest hospital (n={src_cnt}) in Estonia (mainland).", {"size": 14})
ax.text(28.3, 58.0, access_zone_classes.to_string(index=False), {"size": 14});
_images/e555f65076466073a906da742500aeb0b5a8f1bd3a07c32bdd16d155b8e5f2e9.png

The map shows the catchment areas of all hospitals in mainland Estonia based on network distance between buildings and the closest hospital. It is possible to spot the 5 kilometer distance zones around each hospital quite easily, and the summary statistics reveal that ~20 % of the buildings in Estonia are within 5 kilometer from the closest hospital, and ~50 % of the buildings in Estonia are within 15 kilometers from closest hospital (however, remember/see the caveats in the beginning of this section about the data quality and the rationale).

This example shows, how pyrosm can be used to conduct even national level analysis quite easily and fairly efficiently. The total time to run this tutorial took about 8.5 minutes (with a laptop having 16GB memory, SSD drive, and Intel Core i5-8250U CPU 1.6 GHZ). This included all steps from downloading the OSM data, parsing the required datasets (steets, buildings and hospitals), exporting the network to graph, conducting the network distance calculations and parsing the statistics (plotting was not taken into account).

Warning

Although pyrosm works fairly well with large datasets, conducting national level analyses still consume a fair amount of memory. Hence, be aware that you might end up having memory errors if you try to conduct national level analyses without sufficient RAM. Repeating this example with 16 GB of RAM should work well in smaller countries (~up to 150MB size of the PBF), but conducting similar analysis e.g. in Finland (PBF size ~539MB) would most likely end up with memory error. (the idea is to allow optimizing the memory usage in the future versions of pyrosm, see #87)

Calculate the job accessibility in Helsinki Region with pyrosm + pandana#

Pandana is a third Python network analysis library supported by pyrosm export functionality at the time of writing. Pandana operates using a fast contraction hierarchies -algorithm and includes various useful functions. In this tutorial, we will see 1) how to calculate distance to nearest 5 restaurants for each network node in the Helsinki Region, and 2) how to calculate a simple job accessibility index for the region based on number of restaurant employees (simulated).

from pyrosm import OSM, get_data
import numpy as np
import matplotlib.pyplot as plt
osm = OSM(get_data("helsinki"))
nodes, edges = osm.get_network(network_type="driving", nodes=True)
restaurants = osm.get_pois(custom_filter={"amenity": ["restaurant"]})
G = osm.to_graph(nodes, edges, graph_type="pandana")
# For simplicity, ensure all restaurants are represented as points
restaurants["geometry"] = restaurants.centroid
restaurants = restaurants.dropna(subset=["lon", "lat"])
# Precompute distances up to 2000 meters
# Notice using long distance with large network can consume memory quite a bit
G.precompute(2000)

Find nearest X number of POIs for each network node#

After we have done precalculations with pandana, it allows conducting performant queries, such as querying distances to X number of nearest restaurants for all network nodes.

  • The following example shows how to find the 5 closest restaurants from each network node (up to 2000 meter distance threshold):

# Attach restaurants to Pandana graph
G.set_pois(category="restaurants", maxdist=2000, maxitems=10, 
           x_col=restaurants.lon, y_col=restaurants.lat)
# For each node in the network find distances to 5 closest restaurants (up to 2000 meters)
nearest_five = G.nearest_pois(2000, "restaurants", num_pois=5)
nearest_five.tail(10)
1 2 3 4 5
8094713890 517.169983 1408.347046 1479.694946 1656.630005 1695.170044
1956624053 520.372009 1411.548950 1482.896973 1659.832031 1698.371948
8094751564 529.398010 1420.574951 1491.922974 1668.858032 1707.397949
8094751563 537.838013 1429.015015 1500.363037 1677.297974 1715.838013
1956624064 540.979004 1432.156006 1503.504028 1680.438965 1718.979004
223537823 554.515991 1445.692993 1517.041016 1693.975952 1732.515991
8094751562 550.070984 1441.248047 1512.595947 1689.531006 1728.071045
7747100203 549.541992 1440.718994 1512.067017 1689.001953 1727.541992
8094751561 549.187988 1440.364990 1511.713013 1688.647949 1727.187988
8094751560 545.604004 1436.781006 1508.129028 1685.063965 1723.604004

As a result, we get the distance to five nearest restaurants for each node (node-id is the index here).

Calculate the job accessibility#

Next we will use a useful pandana function called aggregate that allows to calculate accessibility index based on a given numeric variable, such as employee count. Here, we won’t be using any real employee information, but assign a random number of employees for each restaurant (between 1-10). Then we will calculate a specific job accessibility index based on cumulative number of restaurant employees within 500 meters (from each network node). This kind of index can provide a nice overview of the (restaurant) job accessibility in a city.

# Assign a random number of workers for each restaurant (between 1-10)
restaurants["employee_cnt"] = np.random.choice([x for x in range(1,11)], size=len(restaurants))
# Find the closest node-id for each restaurant
node_ids = G.get_node_ids(restaurants.lon, restaurants.lat)
# Add employee counts to the graph
G.set(node_ids, variable=restaurants.employee_cnt, name="employee_cnt")
# Calculate the number of employees (cumulative sum) from each node up to 500 meters
result = G.aggregate(500, type="sum", decay="linear", name="employee_cnt")
result = result.to_frame(name="cum_employees")
# Attach the information from nodes
result = nodes.merge(result, left_on="id", right_on=result.index)
# Visualize the results
plt.style.use('dark_background')
ax = result.plot(column="cum_employees", cmap='BrBG', markersize=0.1, legend=True, figsize=(20, 20))
_images/6164ebcbac50556c5931d511afdf851ac2dc770be10157d56f2933031cd8aa79.png

As a result, we get a map showing the restaurant job accessibility in Helsinki Region. The highlighted colors show areas where there are high number of restaurant jobs/employees working (notice that this is not based on real data).