Customizing the queries#
As some of the examples in the basic Pyrosm tutorial show, it is possible to customize the OSM parsing using the custom_filter parameter. This parameter is available for all reading methods, including get_network().
The custom_filter is useful if you want to parse only certain types of OpenStreetMap elements from the PBF, such as “residential” buildings or specific shops such as “book” shops.
In addition, Pyrosm lets you customize which attributes are parsed from the OSM elements into GeoDataFrame columns with the extra_attributes parameter. A specific set of default attributes is always parsed from the OSM elements, but because OpenStreetMap is highly flexible in terms of what information can be associated with the data, this parameter makes it easy to parse some of the more “exotic” tags from OSM.
Constructing a custom filter#
Before diving into how to construct a custom filter, it helps to understand how OpenStreetMap data is structured. OpenStreetMap represents:
“physical features on the ground (e.g., roads or buildings) using tags attached to its basic data structures (its nodes, ways, and relations). Each tag describes a geographic attribute of the feature being shown by that specific node, way or relation” (OSM Wiki, 2020).
Pyrosm uses these tags to filter OSM data elements according to specific predefined criteria, which makes it possible to parse e.g. buildings or roads from the data. Passing your own custom_filter modifies this process.
There are certain rules for constructing the custom_filter. The filter should always be a Python dictionary where the key is a string and the value is a list of OSM tag values matching the criteria defined by the user. The key should correspond to the key in OpenStreetMap tags (e.g. “building”) and the value list should correspond to the OSM values that are associated with that key. You can find a long list of possible OSM keys and associated values on the OSM Map Features wiki page.
As an example, a filter can look something like the one below which would parse all residential and retail buildings from the data:
{"building": ["residential", "retail"]}
This custom_filter can be used with the get_buildings() or get_data_by_custom_criteria() function. With any other function, this filter does not have any effect on the results, because the "building" tag is only associated with physical features representing buildings. Hence, if you used this filter when parsing roads with get_network(), for example, it wouldn’t do anything, because none of the roads contain information about buildings (or at least they shouldn’t).
Let’s test:
from pyrosm import OSM, get_data
# Get test data
fp = get_data("test_pbf")
# Initialize the reader
osm = OSM(fp)
# Read buildings with custom filter
my_filter = {"building": ["residential", "retail"]}
buildings = osm.get_buildings(custom_filter=my_filter)
# Plot
title = "Filtered buildings: " + ", ".join(buildings["building"].unique())
ax = buildings.plot(column="building", cmap="RdBu", legend=True)
ax.set_title(title);
As we can see, as a result the data now only includes buildings that have residential or retail as a value for the key “building”.
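The rules for the shape of a dictionary filter can be checked before a (potentially slow) read. The helper below is a minimal sketch and not part of pyrosm: the name check_custom_filter is hypothetical, and it covers only the dictionary form of the filter. It lets through True values and compiled regular expressions, both of which are described later on this page.

```python
import re


def check_custom_filter(custom_filter):
    """Return the filter unchanged, or raise ValueError if its shape is off.

    Hypothetical helper for illustration; pyrosm does its own validation.
    """
    if not isinstance(custom_filter, dict):
        raise ValueError("custom_filter should be a dictionary")
    for key, values in custom_filter.items():
        if not isinstance(key, str):
            raise ValueError(f"filter key {key!r} should be a string")
        if values is True:
            # True stands for "any value of this key" (see the next section).
            continue
        if not isinstance(values, list):
            raise ValueError(
                f"values for {key!r} should be a list or True, "
                f"got {type(values).__name__}"
            )
        for value in values:
            # Plain strings are the basic case; compiled patterns are
            # covered in the regular-expression section further below.
            if not isinstance(value, (str, re.Pattern)):
                raise ValueError(f"value {value!r} for {key!r} should be a string")
    return custom_filter


# A well-formed filter passes through unchanged.
check_custom_filter({"building": ["residential", "retail"]})

# A common slip: a bare string instead of a list of strings.
try:
    check_custom_filter({"building": "residential"})
except ValueError as err:
    print("Rejected:", err)
```

A check like this is mostly useful when filters are assembled programmatically, where a bare string can slip in unnoticed.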
Different kind of filters#
In some cases, such as when parsing Points of Interest (POI) from the PBF, it might be useful to parse, for example, all OSM features that are shops. If you want to parse all kinds of shops regardless of the tag value, you can pass True as the value in the custom_filter, as in case A below.
Example filters:
A:
custom_filter={"shop": True}
B:
custom_filter={"shop": True, "tourism": True, "amenity": True, "leisure": True}
C:
custom_filter={"shop": ["alcohol"], "tourism": True, "amenity": ["restaurant", "bar"], "leisure": ["dance"]}
All of the filters above produce slightly different results. Filter A would return all shops, and filter B would return a broad selection of POIs including all data that relates to shops, tourism, amenities or leisure.
Filter C is a very specific filter that might be used by someone in a party mood who wants to find shops selling alcohol, restaurants and bars, everything related to tourism, and leisure activities related to dancing.
Let’s test:
Filter A#
from pyrosm import OSM, get_data
# Get test data
fp = get_data("helsinki_pbf")
# Initialize the reader
osm = OSM(fp)
# Read POIs with custom filter A
my_filter = {"shop": True}
pois = osm.get_pois(custom_filter=my_filter)
# Plot
ax = pois.plot(column="shop", legend=True, markersize=1, figsize=(14,6), legend_kwds=dict(loc='upper left', ncol=4, bbox_to_anchor=(1, 1)))
Filter B#
from pyrosm import OSM, get_data
# Get test data
fp = get_data("helsinki_pbf")
# Initialize the reader
osm = OSM(fp)
# Read POIs with custom filter B
my_filter={"shop": True, "tourism": True, "amenity": True, "leisure": True}
pois = osm.get_pois(custom_filter=my_filter)
# Merge poi type information into a single column
pois["shop"] = pois["shop"].fillna(' ')
pois["amenity"] = pois["amenity"].fillna(' ')
pois["leisure"] = pois["leisure"].fillna(' ')
pois["tourism"] = pois["tourism"].fillna(' ')
pois["poi_type"] = pois["amenity"] + pois["shop"] + pois["leisure"] + pois["tourism"]
# Plot
ax = pois.plot(column="poi_type", legend=True, markersize=1, figsize=(14,8), legend_kwds=dict(loc='upper left', ncol=6, bbox_to_anchor=(1, 1)))
Filter C#
from pyrosm import OSM, get_data
# Get test data
fp = get_data("helsinki_pbf")
# Initialize the reader
osm = OSM(fp)
# Read POIs with custom filter C
my_filter={"shop": ["alcohol"], "tourism": True, "amenity": ["restaurant", "bar"], "leisure": ["dance"]}
pois = osm.get_pois(custom_filter=my_filter)
# Merge poi type information into a single column
pois["shop"] = pois["shop"].fillna(' ')
pois["amenity"] = pois["amenity"].fillna(' ')
pois["leisure"] = pois["leisure"].fillna(' ')
pois["tourism"] = pois["tourism"].fillna(' ')
pois["poi_type"] = pois["amenity"] + pois["shop"] + pois["leisure"] + pois["tourism"]
# Plot
ax = pois.plot(column="poi_type", legend=True, markersize=4, figsize=(14,8), legend_kwds=dict(loc='upper left', ncol=2, bbox_to_anchor=(1, 1)))
As these examples show, using the custom_filter is an efficient way to customize what data is extracted from OpenStreetMap.
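To make the difference between True and a list of values concrete, the sketch below applies filters A, B and C to a handful of hand-written tag dictionaries. It only illustrates the semantics described above (an element is accepted if any filter key accepts it) and is not pyrosm's implementation; the matches function and the sample elements are made up for this example.

```python
def matches(tags, custom_filter):
    """Return True if any key of the filter accepts the element's tags."""
    for key, accepted in custom_filter.items():
        if key not in tags:
            continue
        # True accepts any value of the key; a list accepts only listed values.
        if accepted is True or tags[key] in accepted:
            return True
    return False


# Made-up OSM-like elements, each reduced to its tags.
elements = [
    {"shop": "alcohol", "name": "Alko"},
    {"shop": "books"},
    {"amenity": "bar"},
    {"amenity": "school"},
    {"tourism": "museum"},
    {"leisure": "dance"},
    {"highway": "residential"},
]

filter_a = {"shop": True}
filter_b = {"shop": True, "tourism": True, "amenity": True, "leisure": True}
filter_c = {"shop": ["alcohol"], "tourism": True,
            "amenity": ["restaurant", "bar"], "leisure": ["dance"]}

for name, custom_filter in [("A", filter_a), ("B", filter_b), ("C", filter_c)]:
    kept = [tags for tags in elements if matches(tags, custom_filter)]
    print(name, len(kept))
# prints:
# A 2
# B 6
# C 4
```

Filter C drops the bookshop and the school because their values are not listed, while every tourism element is still accepted.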
Regular expressions and Overpass-style filters#
The custom filters above match tag values exactly. Two opt-in forms let you match more flexibly. Both work anywhere a custom_filter is accepted (get_network, get_pois, get_buildings, get_data_by_custom_criteria, …), and an ordinary dictionary keeps behaving exactly as before.
Matching values with a regular expression#
OpenStreetMap values are often tagged inconsistently — the same road might appear as ref="I 20", ref="I 20;US 259", or under tiger:name_base="I-20". Pass a compiled regular expression (from re.compile(...)) as a value, and pyrosm matches it with re.search (a substring / pattern match) instead of requiring an exact string:
import re
from pyrosm import OSM, get_data
osm = OSM(get_data("copenhagen"))
# A compiled regex value is matched with re.search, so one value can cover several
# alternatives. Here it keeps any way whose highway value contains "footway" or "cycleway":
regex_filter = {"highway": [re.compile("footway|cycleway")]}
paths = osm.get_network(custom_filter=regex_filter, filter_type="keep")
print(f"{len(paths)} ways match the regex")
54548 ways match the regex
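The difference between an exact match and re.search can be seen without reading any OSM data. The sketch below uses only the standard library and a few illustrative ref-style values modelled on the inconsistent tagging mentioned above; the pattern is one possible choice, not something prescribed by pyrosm.

```python
import re

# Illustrative tag values, including the inconsistent variants mentioned above.
ref_values = ["I 20", "I 20;US 259", "I-20", "US 259", "I 200"]

# Exact matching only finds the value that is spelled exactly this way.
exact = [value for value in ref_values if value == "I 20"]

# Allow an optional space or hyphen between "I" and "20";
# the trailing \b keeps "I 200" from matching.
pattern = re.compile(r"\bI[ -]?20\b")
by_regex = [value for value in ref_values if pattern.search(value)]

print(exact)     # ['I 20']
print(by_regex)  # ['I 20', 'I 20;US 259', 'I-20']
```

Because re.search looks for the pattern anywhere in the value, anchoring with \b (or ^ and $) is worth considering whenever a short pattern could match inside a longer value.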
Overpass-style filters#
If you already know the Overpass QL tag-filter syntax — for example from OSMnx — you can pass a bracket-filter string, or a list of them. Each string is the AND of its brackets, and a list of strings is their OR. The supported operators are:
["key"="value"]: value equals
["key"!="value"]: value does not equal
["key"~"regex"]: value matches a regex (re.search)
["key"!~"regex"]: value does not match
["key"]: key is present (any value)
[!"key"]: key is absent
Append ,i to a ~ / !~ bracket for a case-insensitive match (e.g. ["name"~"oxford",i]). For example, to collect protected cycling infrastructure:
# Each string is the AND of its brackets; the list is their OR (Overpass-style).
cf = [
'["cycleway"~"track"]',
'["highway"~"cycleway"]',
'["highway"~"path"]["bicycle"~"designated"]',
'["cycleway:right"~"track"]',
'["cycleway:left"~"track"]',
'["cycleway:both"~"track"]',
'["cyclestreet"]',
'["highway"~"living_street"]',
]
cycle_infra = osm.get_network(network_type="cycling", custom_filter=cf)
cycle_infra.plot()
print(f"{len(cycle_infra)} cycling edges")
16149 cycling edges
A few things worth knowing:
An advanced filter selects candidate elements by the keys it mentions, so a get_network filter is no longer limited to highway: you can build a railway or cycleway network, for instance.
For an advanced filter, get_network’s filter_type defaults to keep (the Overpass/OSMnx union semantics); the predefined network_type filters still default to exclude. Pass filter_type explicitly to override.
Only the Overpass tag-filter bracket subset is supported, not the full Overpass query language (no area, recursion, or output statements). For SQL-style querying, read the layer into a GeoDataFrame and query it with your tool of choice (e.g. DuckDB).
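The AND/OR semantics of bracket filters can be illustrated with a small stand-alone evaluator. This is a rough sketch written for this page, not pyrosm's parser: the names BRACKET, parse, clause_holds and matches are hypothetical, escaped quotes inside values are not handled, and it assumes (as in Overpass) that a missing key satisfies != and !~.

```python
import re

# One bracket: ["key"], [!"key"], or ["key" OP "value"] with an optional ,i flag.
BRACKET = re.compile(
    r'\[\s*(?P<neg>!)?"(?P<key>[^"]+)"'
    r'(?:\s*(?P<op>!=|!~|=|~)\s*"(?P<value>[^"]*)"(?P<ci>\s*,\s*i)?)?'
    r'\s*\]'
)


def parse(filter_string):
    """Split one filter string into its bracket clauses (a list of dicts)."""
    clauses, pos = [], 0
    while pos < len(filter_string):
        m = BRACKET.match(filter_string, pos)
        if m is None or (m["neg"] and m["op"]):
            raise ValueError(f"cannot parse {filter_string!r} at position {pos}")
        clauses.append(m.groupdict())
        pos = m.end()
    if not clauses:
        raise ValueError("empty filter string")
    return clauses


def clause_holds(clause, tags):
    """Evaluate a single bracket against an element's tags."""
    key, op, value = clause["key"], clause["op"], clause["value"]
    if op is None:
        # ["key"] means the key is present, [!"key"] means it is absent.
        return (key not in tags) if clause["neg"] else (key in tags)
    actual = tags.get(key)
    if op == "=":
        return actual == value
    if op == "!=":
        # Assumption: a missing key counts as "not equal".
        return actual != value
    flags = re.IGNORECASE if clause["ci"] else 0
    found = actual is not None and re.search(value, actual, flags) is not None
    return found if op == "~" else not found


def matches(tags, overpass_filter):
    """A string is the AND of its brackets; a list of strings is their OR."""
    strings = [overpass_filter] if isinstance(overpass_filter, str) else overpass_filter
    return any(all(clause_holds(c, tags) for c in parse(s)) for s in strings)


cf = [
    '["cycleway"~"track"]',
    '["highway"~"path"]["bicycle"~"designated"]',
    '["cyclestreet"]',
]
ways = [
    {"highway": "path", "bicycle": "designated"},
    {"highway": "path"},
    {"highway": "residential", "cycleway": "opposite_track"},
    {"highway": "residential", "cyclestreet": "yes"},
    {"highway": "primary"},
]
print([matches(tags, cf) for tags in ways])
# [True, False, True, True, False]
print(matches({"name": "Oxford Street"}, '["name"~"oxford",i]'))  # True
print(matches({"name": "Oxford Street"}, '["name"~"oxford"]'))    # False
```

The second way fails because the path bracket is ANDed with the bicycle bracket in the same string, while the third way is accepted through a different string in the list.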
Advanced filtering#
If the above methods do not meet your needs, pyrosm provides the get_data_by_custom_criteria() method to fully customize what kind of data is parsed from the OSM PBF and how the filtering is conducted. The method lets you specify what kind of OSM elements are parsed (nodes, ways, relations, or any combination of these) and whether the specified filter should be used to "keep" the data or "exclude" it.
Let’s start by looking at the help:
from pyrosm import OSM, get_data
fp = get_data("helsinki_pbf")
# Initialize the reader
osm = OSM(fp)
help(osm.get_data_by_custom_criteria)
Help on method get_data_by_custom_criteria in module pyrosm.pyrosm:
get_data_by_custom_criteria(
custom_filter=None,
osm_keys_to_keep=None,
filter_type='keep',
tags_as_columns=None,
keep_nodes=True,
keep_ways=True,
keep_relations=True,
extra_attributes=None,
keep_other_tags=True,
timestamp=None
) method of pyrosm.pyrosm.OSM instance
Parse OSM data based on custom criteria.
Parameters
----------
custom_filter : dict (optional)
A custom filter to filter only specific elements from OpenStreetMap.
If ``None`` (the default), every tagged element is returned without
key/value filtering (tagged nodes, ways, and relations); standalone
ways with no tags are dropped. Reading everything is memory-heavy on
large extracts, so prefer a pre-filtered PBF and/or a bounding box.
``filter_type`` is ignored in this mode.
osm_keys_to_keep : str | list
A filter to specify which OSM keys should be kept.
filter_type : str
"keep" | "exclude"
Whether the filters should be used to keep or exclude the data from OSM.
tags_as_columns : list
Which tags should be kept as columns in the resulting GeoDataFrame.
keep_nodes : bool
Whether or not the nodes should be kept in the resulting GeoDataFrame if they are found.
keep_ways : bool
Whether or not the ways should be kept in the resulting GeoDataFrame if they are found.
keep_relations : bool
Whether or not the relations should be kept in the resulting GeoDataFrame if they are found.
extra_attributes : list (optional)
Additional OSM tag keys that will be converted into columns in the resulting GeoDataFrame.
keep_other_tags : bool
By default (``True``) every tag is parsed: ``tags_as_columns`` become their own
columns and the rest are kept in a JSON ``tags`` column. ``False`` resolves only
the requested tags (``tags_as_columns`` plus the filter keys) and drops the JSON
``tags`` column, so the read does minimal tag work (a stray tag literally keyed
``id`` is not surfaced as ``id_tag`` in this mode). Only supported by the
out-of-core engine (``OSM(..., engine='out_of_core')``).
timestamp: str | datetime | int
If provided, the data from given moment of time will be returned. The time should be provided in UTC.
Note: This functionality only works with OSH.PBF files that can be downloaded manually e.g. from Geofabrik
(requires login with OSM account).
The logic: the closest version of each element up to given timestamp will be selected to the result.
This means that elements can be older than the given timestamp (the most up-to-date version is selected),
but not newer (records having exactly the selected timestamp will be kept). In case only a date is given,
the time will represent midnight of the given day, such as "2021-01-01 00:00:00".
As we can see, the function contains more parameters than any of the other functions.
The first two parameters custom_filter and osm_keys_to_keep can be used to filter the data on an OSM tag level.
Pyrosm implements a data filtering system that works on two levels.
osm_keys_to_keep -parameter can be used to specify which kind of OSM elements should be considered as “valid” records for further filtering (i.e. a first level of filtering). For instance, specifying osm_keys_to_keep="highway" tells the filtering algorithm to only consider OSM elements representing roads for further filtering. You can also pass multiple keys to this parameter inside a list, such as osm_keys_to_keep=["amenity", "shop"], which would pass all OSM elements containing either an “amenity” or a “shop” tag-key for further consideration in the second level of filtering.
custom_filter -parameter specifies the second level of filtering, which defines more specifically what kind of OSM elements are accepted for the final GeoDataFrame, such as {"amenity": ["restaurant", "bar"]}. See more details above.
Remarks
Notice that osm_keys_to_keep is an optional parameter, and by default the keys are taken directly from the custom_filter dictionary (its keys). However, there are cases when it is useful to specify osm_keys_to_keep yourself. For example, if you are interested in parsing schools from the data, you could use custom_filter={"amenity": ["school"]}. By default, this would parse all elements tagged as amenity=school. However, if you are interested in finding only buildings that are tagged as schools, you could use a combination of the two filters:
osm_keys_to_keep="building"
custom_filter={"amenity": ["school"]}
The osm_keys_to_keep parameter ensures that only OSM elements with a "building" tag are considered for further filtering, and the custom_filter then ensures that, of those buildings, only the rows tagged as "school" are accepted into the final result.
Let’s try this out:
from pyrosm import OSM, get_data
osm = OSM(get_data("helsinki_region_pbf"))
# Create a custom filter that finds all schools from the data
custom_filter = {"amenity": ["school"]}
# Specify that you are only interested in such
# elements that have been tagged as buildings
osm_keys_to_keep = ["building"]
# Parse the data
schools_that_are_buildings = osm.get_data_by_custom_criteria(osm_keys_to_keep=osm_keys_to_keep,
custom_filter=custom_filter)
print("Number of schools that have been tagged as buildings:", len(schools_that_are_buildings))
# ============
# Comparison
# ============
# For comparison, let's parse all schools without the requirement of being a building
# i.e. we do not use the 'osm_keys_to_keep' parameter at all
all_schools = osm.get_data_by_custom_criteria(custom_filter=custom_filter)
print("Number of schools altogether:", len(all_schools))
Downloaded Protobuf data 'Helsinki_region.osm.pbf' (34.99 MB) to:
'/private/var/folders/f2/pgp09jl542zffhtrt2hx8zhh0000gp/T/pyrosm/Helsinki_region.osm.pbf'
Number of schools that have been tagged as buildings: 72
Number of schools altogether: 512
As the results show, there are 72 buildings tagged as schools in the data. This is considerably fewer than the 512 schools found in the data altogether. Following this principle, it is possible to make highly customized queries.
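The two filtering levels can also be traced on a few hand-written tag dictionaries. The function below is a simplified sketch of the logic described in this section, not pyrosm's code: two_level_filter and the sample elements are hypothetical, and only list-valued filters are handled.

```python
def two_level_filter(elements, custom_filter, osm_keys_to_keep=None):
    """Apply the first-level key filter, then the second-level value filter."""
    # Level 1: which keys make an element a candidate at all.
    # By default these are the keys of the custom_filter itself.
    if osm_keys_to_keep is None:
        keys = list(custom_filter)
    elif isinstance(osm_keys_to_keep, str):
        keys = [osm_keys_to_keep]
    else:
        keys = list(osm_keys_to_keep)
    candidates = [tags for tags in elements if any(key in tags for key in keys)]

    # Level 2: keep candidates whose tag values match the custom_filter.
    return [
        tags for tags in candidates
        if any(tags.get(key) in values for key, values in custom_filter.items())
    ]


# Made-up elements, each reduced to its tags.
elements = [
    {"amenity": "school", "building": "school"},  # school mapped as a building
    {"amenity": "school"},                        # school without a building tag
    {"building": "residential"},
    {"amenity": "restaurant", "building": "yes"},
]
custom_filter = {"amenity": ["school"]}

buildings_only = two_level_filter(elements, custom_filter, osm_keys_to_keep="building")
all_schools = two_level_filter(elements, custom_filter)
print(len(buildings_only))  # 1
print(len(all_schools))     # 2
```

As in the Helsinki example, requiring the "building" key at the first level returns a subset of the schools found by the custom_filter alone.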
Controlling which OSM element types are returned#
It is also possible to determine what kind of OSM elements are returned to the final GeoDataFrame. By default, get_data_by_custom_criteria() returns all elements, i.e. nodes, ways and relations.
Let’s continue from the previous example and assume that you want to find all schools that are Polygons. There are different ways to filter such data (e.g. using the geom_type attribute of a GeoDataFrame), but one way to accomplish this is to filter out the OSM elements that are nodes (i.e. points).
You can control the type of OSM elements that are returned with the parameters:
keep_nodes
keep_ways
keep_relations
By default, all of these parameters are True. If you specify, for example, keep_nodes=False, pyrosm returns only ways and relations and skips nodes.
Let’s test this by filtering out the schools that are nodes:
# Continuing from the previous example ..
# Parse all schools that are not nodes
custom_filter = {"amenity": ["school"]}
# Pass keep_nodes=False to filter out nodes
schools_that_are_not_nodes = osm.get_data_by_custom_criteria(custom_filter=custom_filter,
keep_nodes=False)
print("Number of schools that are not nodes:", len(schools_that_are_not_nodes))
Number of schools that are not nodes: 421
Now we have only 421 schools left (from 512).
Let’s take a look at the geometry types:
# Add information about the geometry type
schools_that_are_not_nodes["geom_type"] = schools_that_are_not_nodes.geometry.geom_type
# Print the geom types
schools_that_are_not_nodes["geom_type"].unique()
<ArrowStringArray>
['Polygon', 'MultiPolygon']
Length: 2, dtype: str
Great! Now we only have Polygons and MultiPolygons in the data.
Note: An OSM way does not necessarily produce a Polygon geometry. It can also produce a LineString (depending on what kind of OSM data is parsed). Hence, if you need to select OSM data based on geometry type, it is safer to use the GeoDataFrame.geometry.geom_type attribute (as above) and select the rows using Pandas.
keep vs exclude data with custom filters#
The get_data_by_custom_criteria() method also makes it possible to filter out records based on certain criteria. With the filter_type parameter, you specify whether the filters should be used as criteria for keeping the records or for excluding them.
Using filter_type="exclude" is useful, for example, when filtering specific roads from the OSM data. In fact, the get_network() function works in exactly this way.
As an example of the excluding filter, let’s create a custom filter that parses all the cycling roads from OSM in a similar manner as is done by get_network("cycling"):
from pyrosm import OSM, get_data
# To keep only roads, include only data that has a "highway" tag (i.e. a road)
# by passing osm_keys_to_keep: this is the "first level" of filtering
osm_keys_to_keep = "highway"
# Second level of filtering is done by passing our custom filter:
custom_filter = dict(
# Areas are not parsed for networks by default
area=['yes'],
# OSM "highway" elements with these values cannot be cycled
highway=['footway', 'steps', 'corridor', 'elevator', 'escalator', 'motor', 'proposed',
'construction', 'abandoned', 'platform', 'raceway', 'motorway', 'motorway_link'],
# If specifically said that cycling is not allowed, exclude such
bicycle=['no'],
# Do not include private roads
service=['private']
)
# In this case we want to EXCLUDE all the rows that have tags matching the criteria above
filter_type = "exclude"
# Run and get all cycling roads
osm = OSM(get_data("test_pbf"))
cycling = osm.get_data_by_custom_criteria(custom_filter=custom_filter,
osm_keys_to_keep=osm_keys_to_keep,
filter_type=filter_type)
cycling.plot()
<Axes: >
Now we have filtered all the roads that can be cycled. This corresponds to the network produced by default with get_network("cycling"):
cycling2 = osm.get_network("cycling")
cycling2.plot()
<Axes: >
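The effect of filter_type can be shown on plain tag dictionaries as well. The sketch below is an illustration of the keep/exclude idea only, not pyrosm's implementation: apply_filter and the sample roads are made up, and the filter is a shortened version of the cycling filter above.

```python
def apply_filter(elements, custom_filter, filter_type="keep"):
    """Keep or exclude the elements whose tags match the filter."""
    if filter_type not in ("keep", "exclude"):
        raise ValueError("filter_type should be 'keep' or 'exclude'")

    def hit(tags):
        # An element matches if any filter key has one of the listed values.
        return any(tags.get(key) in values for key, values in custom_filter.items())

    if filter_type == "keep":
        return [tags for tags in elements if hit(tags)]
    return [tags for tags in elements if not hit(tags)]


# Made-up road elements, each reduced to its tags.
roads = [
    {"highway": "residential"},
    {"highway": "cycleway"},
    {"highway": "motorway"},
    {"highway": "footway"},
    {"highway": "service", "service": "private"},
    {"highway": "tertiary", "bicycle": "no"},
]
custom_filter = {
    "highway": ["footway", "motorway"],
    "bicycle": ["no"],
    "service": ["private"],
}

kept = apply_filter(roads, custom_filter, filter_type="keep")
cyclable = apply_filter(roads, custom_filter, filter_type="exclude")
print(len(kept))                                # 4
print([tags["highway"] for tags in cyclable])   # ['residential', 'cycleway']
```

With the same filter, "keep" and "exclude" return complementary sets, which is why an exclusion list is a compact way to describe a network such as the cycling one.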