PBFReadParse.read_pbf

classmethod PBFReadParse.read_pbf(pbf_pathname, readable=True, expand=False, parse_geometry=False, parse_properties=False, parse_other_tags=False, number_of_chunks=None, max_tmpfile_size=5000, **kwargs)

Parse a PBF data file (using GDAL).

Parameters:
  • pbf_pathname (str) – pathname of a PBF data file

  • readable (bool) – whether to parse each feature in the raw data, defaults to True

  • expand (bool) – whether to expand dict-like data into separate columns, defaults to False

  • parse_geometry (bool) – whether to represent the 'geometry' field in a shapely.geometry format, defaults to False

  • parse_properties (bool) – whether to represent the 'properties' field in a tabular format, defaults to False

  • parse_other_tags (bool) – whether to represent the 'other_tags' field (of 'properties') in a dict format, defaults to False

  • number_of_chunks (int | None) – number of chunks, defaults to None

  • max_tmpfile_size (int | None) – maximum size of the temporary file, defaults to 5000; when max_tmpfile_size=None, 5000 is used

  • kwargs – [optional] parameters of the function pyhelpers.settings.gdal_configurations()

Returns:

parsed OSM PBF data

Return type:

dict

Note

The GDAL/OGR driver categorizes the features of OSM PBF data into five layers:

  • 0: ‘points’ - “node” features having significant tags attached

  • 1: ‘lines’ - “way” features being recognized as non-area

  • 2: ‘multilinestrings’ - “relation” features forming a multilinestring (type=’multilinestring’ / type=’route’)

  • 3: ‘multipolygons’ - “relation” features forming a multipolygon (type=’multipolygon’ / type=’boundary’), and “way” features being recognized as area

  • 4: ‘other_relations’ - “relation” features not belonging to the above 2 layers

For more information, please refer to OpenStreetMap XML and PBF.
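
As a small illustration of the five layers listed above, the sketch below looks up one layer of a read_pbf()-style dict by index or by name. The helper get_layer() and the stand-in dict demo_pbf are hypothetical and are not part of pydriosm; the only thing assumed from this page is that the returned dict is keyed by the five layer names.

```python
# Layer names in the order given in the note above (index -> name).
OSM_PBF_LAYER_NAMES = (
    'points', 'lines', 'multilinestrings', 'multipolygons', 'other_relations')


def get_layer(parsed_pbf, layer):
    """Hypothetical helper: return one layer, given its index (0-4) or its name."""
    name = OSM_PBF_LAYER_NAMES[layer] if isinstance(layer, int) else layer
    if name not in OSM_PBF_LAYER_NAMES:
        raise ValueError(f"Unknown layer: {layer!r}")
    return parsed_pbf[name]


# Stand-in for the dict returned by read_pbf(); real values are pandas objects.
demo_pbf = {name: [] for name in OSM_PBF_LAYER_NAMES}
demo_pbf['lines'] = ['way-1']

print(get_layer(demo_pbf, 1))         # the 'lines' layer, by index
print(get_layer(demo_pbf, 'points'))  # the 'points' layer, by name
```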

Warning

  • Parsing large PBF data files (e.g. > 50MB) can be time-consuming!

  • The function read_osm_pbf() may require a fairly large amount of physical memory to parse large files; in that case, setting number_of_chunks to a reasonable value is recommended.
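
The idea behind number_of_chunks can be sketched as follows: split the features into a fixed number of contiguous parts and process one part at a time, so that only one part is held in its expanded form at once. This is a minimal sketch under that assumption, not pydriosm's actual implementation; split_into_chunks() and the sample features list are hypothetical.

```python
def split_into_chunks(items, number_of_chunks):
    """Split a sequence into at most `number_of_chunks` contiguous, near-equal parts."""
    if number_of_chunks < 1:
        raise ValueError("number_of_chunks must be a positive integer")
    items = list(items)
    size, extra = divmod(len(items), number_of_chunks)
    chunks, start = [], 0
    for i in range(number_of_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when there are fewer items than chunks
            chunks.append(items[start:end])
        start = end
    return chunks


# Hypothetical stand-in for raw features read from one layer.
features = [{'id': i} for i in range(10)]

parsed_ids = []
for chunk in split_into_chunks(features, number_of_chunks=3):
    # Process one chunk at a time, then combine the results.
    parsed_ids.extend(feature['id'] for feature in chunk)

print(parsed_ids)
```

A larger number of chunks lowers the peak memory needed per step at the cost of more iterations; what counts as a reasonable value depends on the file size and the available memory.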

Examples:

>>> from pydriosm.reader import PBFReadParse
>>> from pydriosm.downloader import GeofabrikDownloader
>>> from pyhelpers.dirs import delete_dir
>>> import os

>>> # Download the PBF data file of 'Rutland' as an example
>>> subrgn_name = 'rutland'
>>> file_format = ".pbf"
>>> dwnld_dir = "tests\\osm_data"

>>> gfd = GeofabrikDownloader()

>>> gfd.download_osm_data(subrgn_name, file_format, dwnld_dir, verbose=True)
To download .osm.pbf data of the following geographic (sub)region(s):
    Rutland
? [No]|Yes: yes
Downloading "rutland-latest.osm.pbf"
    to "tests\osm_data\rutland\" ... Done.

>>> rutland_pbf_path = gfd.data_paths[0]
>>> os.path.relpath(rutland_pbf_path)
'tests\osm_data\rutland\rutland-latest.osm.pbf'

>>> # Read the downloaded PBF data
>>> rutland_pbf = PBFReadParse.read_pbf(rutland_pbf_path)
>>> type(rutland_pbf)
dict
>>> list(rutland_pbf.keys())
['points', 'lines', 'multilinestrings', 'multipolygons', 'other_relations']

>>> rutland_pbf_points = rutland_pbf['points']
>>> rutland_pbf_points.head()
0    {'type': 'Feature', 'geometry': {'type': 'Poin...
1    {'type': 'Feature', 'geometry': {'type': 'Poin...
2    {'type': 'Feature', 'geometry': {'type': 'Poin...
3    {'type': 'Feature', 'geometry': {'type': 'Poin...
4    {'type': 'Feature', 'geometry': {'type': 'Poin...
Name: points, dtype: object

>>> # Set `expand` to be `True`
>>> pbf_0 = PBFReadParse.read_pbf(rutland_pbf_path, expand=True)
>>> type(pbf_0)
dict
>>> list(pbf_0.keys())
['points', 'lines', 'multilinestrings', 'multipolygons', 'other_relations']
>>> pbf_0_points = pbf_0['points']
>>> pbf_0_points.head()
         id  ...                                         properties
0    488432  ...  {'osm_id': '488432', 'name': None, 'barrier': ...
1    488658  ...  {'osm_id': '488658', 'name': 'Tickencote Inter...
2  13883868  ...  {'osm_id': '13883868', 'name': None, 'barrier'...
3  14049101  ...  {'osm_id': '14049101', 'name': None, 'barrier'...
4  14558402  ...  {'osm_id': '14558402', 'name': None, 'barrier'...
[5 rows x 3 columns]

>>> pbf_0_points['geometry'].head()
0    {'type': 'Point', 'coordinates': [-0.5134241, ...
1    {'type': 'Point', 'coordinates': [-0.5313354, ...
2    {'type': 'Point', 'coordinates': [-0.7229332, ...
3    {'type': 'Point', 'coordinates': [-0.7249816, ...
4    {'type': 'Point', 'coordinates': [-0.7266581, ...
Name: geometry, dtype: object

>>> # Set both `expand` and `parse_geometry` to be `True`
>>> pbf_1 = PBFReadParse.read_pbf(rutland_pbf_path, expand=True, parse_geometry=True)
>>> pbf_1_points = pbf_1['points']
>>> # Check the difference in 'geometry' column, compared to `pbf_0_points`
>>> pbf_1_points['geometry'].head()
0    POINT (-0.5134241 52.6555853)
1    POINT (-0.5313354 52.6737716)
2    POINT (-0.7229332 52.5889864)
3    POINT (-0.7249816 52.6748426)
4    POINT (-0.7266581 52.6695058)
Name: geometry, dtype: object

>>> # Set both `expand` and `parse_properties` to be `True`
>>> pbf_2 = PBFReadParse.read_pbf(rutland_pbf_path, expand=True, parse_properties=True)
>>> pbf_2_points = pbf_2['points']
>>> pbf_2_points['other_tags'].head()
0                 "odbl"=>"clean"
1                            None
2                            None
3    "traffic_calming"=>"cushion"
4        "direction"=>"clockwise"
Name: other_tags, dtype: object

>>> # Set `expand`, `parse_properties` and `parse_other_tags` to be `True`
>>> pbf_3 = PBFReadParse.read_pbf(rutland_pbf_path, expand=True, parse_properties=True,
...                               parse_other_tags=True)
>>> pbf_3_points = pbf_3['points']
>>> # Check the difference in 'other_tags', compared to `pbf_2_points`
>>> pbf_3_points['other_tags'].head()
0                 {'odbl': 'clean'}
1                              None
2                              None
3    {'traffic_calming': 'cushion'}
4        {'direction': 'clockwise'}
Name: other_tags, dtype: object

>>> # Delete the downloaded PBF data file
>>> delete_dir(gfd.download_dir, verbose=True)
To delete the directory "tests\osm_data\" (Not empty)
? [No]|Yes: yes
Deleting "tests\osm_data\" ... Done.
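
The 'other_tags' values shown in the examples above are strings of "key"=>"value" pairs, which parse_other_tags=True turns into dicts. The sketch below shows one way such a string could be converted using only the standard library. It is an illustration under that assumption, not pydriosm's own parser; the function name parse_other_tags_string() is hypothetical, and escaped characters inside keys or values are left as they are.

```python
import re

# Matches one "key"=>"value" pair; backslash-escaped characters are allowed inside the quotes.
_PAIR = re.compile(r'"((?:[^"\\]|\\.)*)"\s*=>\s*"((?:[^"\\]|\\.)*)"')


def parse_other_tags_string(other_tags):
    """Hypothetical helper: convert an 'other_tags' string into a dict (None stays None)."""
    if other_tags is None:
        return None
    return {key: value for key, value in _PAIR.findall(other_tags)}


print(parse_other_tags_string('"traffic_calming"=>"cushion"'))
print(parse_other_tags_string('"direction"=>"clockwise","odbl"=>"clean"'))
print(parse_other_tags_string(None))
```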

See also