parse_osm_pbf

pydriosm.reader.parse_osm_pbf(path_to_osm_pbf, parse_raw_feat=False, transform_geom=False, transform_other_tags=False, number_of_chunks=None, max_tmpfile_size=None)

Parse a PBF data file.

Parameters
  • path_to_osm_pbf (str) – path to a PBF data file

  • parse_raw_feat (bool) – whether to parse each feature in the raw data, defaults to False

  • transform_geom (bool) – whether to transform a single coordinate (or a collection of coordinates) into a geometric object, defaults to False

  • transform_other_tags (bool) – whether to transform the 'other_tags' field into a dictionary, defaults to False

  • number_of_chunks (int or None) – number of chunks into which the data is split for parsing, defaults to None

  • max_tmpfile_size (int or None) – maximum size of the temporary file, defaults to None; see also gdal_configurations()

Returns

parsed OSM PBF data

Return type

dict

Note

The driver categorises features into 5 layers:

  • 0: ‘points’ - “node” features having significant tags attached

  • 1: ‘lines’ - “way” features being recognized as non-area

  • 2: ‘multilinestrings’ - “relation” features forming a multilinestring (type='multilinestring' / type='route')

  • 3: ‘multipolygons’ - “relation” features forming a multipolygon (type='multipolygon' / type='boundary'), and “way” features being recognized as area

  • 4: ‘other_relations’ - “relation” features not belonging to the above 2 layers

See also [POP-1].
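
The correspondence between layer indices and layer names listed above can be written down as a small lookup table. This is only an illustrative sketch: OSM_PBF_LAYERS and layer_name are hypothetical names and are not part of pydriosm.

```python
# Hypothetical lookup table (not part of pydriosm): layer index -> layer name,
# following the five layers listed above.
OSM_PBF_LAYERS = {
    0: 'points',
    1: 'lines',
    2: 'multilinestrings',
    3: 'multipolygons',
    4: 'other_relations',
}


def layer_name(index):
    """Return the layer name for a given layer index."""
    return OSM_PBF_LAYERS[index]


print([layer_name(i) for i in range(5)])
```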

This function may require a fairly large amount of physical memory to parse large files (e.g. > 200 MB); in such cases, it is recommended to set number_of_chunks to a reasonable value.
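
To illustrate the idea behind chunking, the sketch below splits a sequence of features into consecutive chunks so that they can be handled one at a time. It is a minimal illustration under stated assumptions, not pydriosm's actual implementation: split_into_chunks is a hypothetical helper, and it works on an in-memory list for simplicity.

```python
def split_into_chunks(items, number_of_chunks):
    """Yield up to `number_of_chunks` consecutive slices of `items`."""
    items = list(items)
    # Ceiling division, so that every item lands in some chunk
    chunk_size = max(1, -(-len(items) // number_of_chunks))
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


# Placeholder features (hypothetical data, for illustration only)
features = ['feature_{}'.format(i) for i in range(10)]

for chunk in split_into_chunks(features, number_of_chunks=3):
    # Each chunk could be parsed separately before moving on to the next
    print(len(chunk))
```

A larger number_of_chunks gives smaller chunks, which would typically lower the peak memory needed per step at the cost of more iterations.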

Example:

>>> import os
>>> from pydriosm.reader import GeofabrikDownloader, parse_osm_pbf

>>> # Download the PBF data file of Rutland as an example
>>> geofabrik_downloader = GeofabrikDownloader()

>>> path_to_rutland_pbf = geofabrik_downloader.download_osm_data(
...     subregion_names='Rutland', osm_file_format=".pbf", download_dir="tests",
...     verbose=True, ret_download_path=True)
To download .osm.pbf data of the following geographic region(s):
    Rutland
? [No]|Yes: yes
Downloading "rutland-latest.osm.pbf" to "tests\" ... Done.

>>> print(os.path.relpath(path_to_rutland_pbf))
tests\rutland-latest.osm.pbf

>>> # Parse the downloaded PBF data
>>> rutland_pbf_raw = parse_osm_pbf(path_to_rutland_pbf)

>>> type(rutland_pbf_raw)
dict
>>> list(rutland_pbf_raw.keys())
['points', 'lines', 'multilinestrings', 'multipolygons', 'other_relations']

>>> rutland_pbf_raw_points = rutland_pbf_raw['points']
>>> rutland_pbf_raw_points.head()
                                              points
0  {"type": "Feature", "geometry": {"type": "Poin...
1  {"type": "Feature", "geometry": {"type": "Poin...
2  {"type": "Feature", "geometry": {"type": "Poin...
3  {"type": "Feature", "geometry": {"type": "Poin...
4  {"type": "Feature", "geometry": {"type": "Poin...

>>> # Set ``parse_raw_feat`` to be ``True``
>>> rutland_pbf_parsed_0 = parse_osm_pbf(path_to_rutland_pbf, parse_raw_feat=True)

>>> rutland_pbf_parsed_points_0 = rutland_pbf_parsed_0['points']
>>> rutland_pbf_parsed_points_0.head()
         id               coordinates  ... man_made                    other_tags
0    488432  [-0.5134241, 52.6555853]  ...     None               "odbl"=>"clean"
1    488658  [-0.5313354, 52.6737716]  ...     None                          None
2  13883868  [-0.7229332, 52.5889864]  ...     None                          None
3  14049101  [-0.7249922, 52.6748223]  ...     None  "traffic_calming"=>"cushion"
4  14558402  [-0.7266686, 52.6695051]  ...     None      "direction"=>"clockwise"
[5 rows x 12 columns]

>>> # Set both ``parse_raw_feat`` and ``transform_geom`` to be ``True``
>>> rutland_pbf_parsed_1 = parse_osm_pbf(path_to_rutland_pbf, parse_raw_feat=True,
...                                      transform_geom=True)

>>> rutland_pbf_parsed_points_1 = rutland_pbf_parsed_1['points']
>>> # Check the difference in 'coordinates', compared to ``rutland_pbf_parsed_points_0``
>>> rutland_pbf_parsed_points_1[['coordinates']].head()
                                coordinates
0             POINT (-0.5134241 52.6555853)
1             POINT (-0.5313354 52.6737716)
2    POINT (-0.7229332000000001 52.5889864)
3             POINT (-0.7249922 52.6748223)
4             POINT (-0.7266686 52.6695051)

>>> # Further, set ``transform_other_tags`` to be ``True``
>>> rutland_pbf_parsed_2 = parse_osm_pbf(path_to_rutland_pbf, parse_raw_feat=True,
...                                      transform_other_tags=True)

>>> rutland_pbf_parsed_points_2 = rutland_pbf_parsed_2['points']
>>> # Check the difference in 'other_tags', compared to ``rutland_pbf_parsed_points_0``
>>> rutland_pbf_parsed_points_2[['other_tags']].head()
                       other_tags
0               {'odbl': 'clean'}
1                            None
2                            None
3  {'traffic_calming': 'cushion'}
4      {'direction': 'clockwise'}

>>> # Delete the downloaded PBF data file
>>> os.remove(path_to_rutland_pbf)
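
As a rough illustration of what parse_raw_feat and transform_other_tags do to a single record in the example above, the sketch below flattens one GeoJSON-style feature string and converts an 'other_tags' string into a dictionary. It uses only the standard library; flatten_feature and parse_other_tags are hypothetical helpers, the sample feature is made up to resemble the first row of the output above, and none of this is claimed to be how pydriosm implements these options.

```python
import json
import re

# Illustrative raw feature (an assumption about the layout, for demonstration)
raw_feature = json.dumps({
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-0.5134241, 52.6555853]},
    "properties": {"other_tags": '"odbl"=>"clean"'},
    "id": 488432,
})


def flatten_feature(raw):
    """Turn a GeoJSON-style feature string into a flat record."""
    feature = json.loads(raw)
    record = {'id': feature.get('id'),
              'coordinates': feature['geometry']['coordinates']}
    # Promote each property to a top-level field
    record.update(feature.get('properties', {}))
    return record


def parse_other_tags(other_tags):
    """Convert a '"key"=>"value",...' string into a dict; pass None through."""
    if other_tags is None:
        return None
    # Match every "key"=>"value" pair, allowing backslash-escaped characters
    pairs = re.findall(r'"((?:[^"\\]|\\.)*)"=>"((?:[^"\\]|\\.)*)"', other_tags)
    return dict(pairs)


record = flatten_feature(raw_feature)
record['other_tags'] = parse_other_tags(record['other_tags'])
print(record)
```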

See also

For more examples, see the method GeofabrikReader.read_osm_pbf().