PBFReadParse.read_pbf
- classmethod PBFReadParse.read_pbf(pbf_pathname, readable=True, expand=False, parse_geometry=False, parse_properties=False, parse_other_tags=False, number_of_chunks=None, max_tmpfile_size=5000, **kwargs)
Parse a PBF data file (using GDAL).
- Parameters:
pbf_pathname (str) – pathname of a PBF data file
readable (bool) – whether to parse each feature in the raw data, defaults to True
expand (bool) – whether to expand dict-like data into separate columns, defaults to False
parse_geometry (bool) – whether to represent the 'geometry' field in a shapely.geometry format, defaults to False
parse_properties (bool) – whether to represent the 'properties' field in a tabular format, defaults to False
parse_other_tags (bool) – whether to represent 'other_tags' (of 'properties') in a dict format, defaults to False
number_of_chunks (int | None) – number of chunks, defaults to None
max_tmpfile_size (int | None) – maximum size of the temporary file, defaults to 5000; when max_tmpfile_size=None, it also falls back to 5000
kwargs – [optional] parameters of the function pyhelpers.settings.gdal_configurations()
- Returns:
parsed OSM PBF data
- Return type:
dict
Note
The GDAL/OGR driver categorizes the features of OSM PBF data into five layers:
0: 'points' - "node" features having significant tags attached
1: 'lines' - "way" features being recognized as non-area
2: 'multilinestrings' - "relation" features forming a multilinestring (type='multilinestring' / type='route')
3: 'multipolygons' - "relation" features forming a multipolygon (type='multipolygon' / type='boundary'), and "way" features being recognized as area
4: 'other_relations' - "relation" features not belonging to the above two layers
For more information, please refer to OpenStreetMap XML and PBF.
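The layer numbering above can be sketched as a simple lookup. This is only an illustration: the names `OSM_LAYER_NAMES`, `layer_name` and `count_features` are hypothetical helpers, not part of pydriosm, and the `parsed` dict below is stand-in data shaped like the return value described above (layer name mapped to a collection of features).

```python
# Layer names in the order listed in the note above (index 0 to 4).
OSM_LAYER_NAMES = (
    'points',
    'lines',
    'multilinestrings',
    'multipolygons',
    'other_relations',
)


def layer_name(index):
    """Return the layer name for a layer index (hypothetical helper)."""
    return OSM_LAYER_NAMES[index]


def count_features(parsed):
    """Count the features per layer in a dict keyed by layer name.

    Layers missing from `parsed` are reported with a count of 0.
    """
    return {name: len(parsed.get(name, [])) for name in OSM_LAYER_NAMES}


# Stand-in data; real values would come from a parsed PBF file.
parsed = {'points': [{'type': 'Feature'}] * 3, 'lines': [{'type': 'Feature'}]}
counts = count_features(parsed)
```

With real data, the same counting could be applied to the dict returned by the method, assuming each value supports `len()`.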
Warning
Parsing large PBF data files (e.g. larger than 50 MB) can be time-consuming!
The function read_osm_pbf() may require a fairly large amount of physical memory to parse large files, in which case it is recommended to set number_of_chunks to a reasonable value.
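To illustrate why chunking helps with memory, here is a minimal sketch of splitting a sequence of features into a fixed number of chunks so that each can be processed in turn. It is not pydriosm's internal implementation; `split_into_chunks` is a hypothetical helper written only for this illustration.

```python
def split_into_chunks(items, number_of_chunks):
    """Split `items` into at most `number_of_chunks` nearly equal lists.

    When `number_of_chunks` is None (or not greater than 1), everything is
    returned as a single chunk, mirroring the idea of "no chunking".
    """
    items = list(items)
    if number_of_chunks is None or number_of_chunks <= 1:
        return [items]

    size, remainder = divmod(len(items), number_of_chunks)
    chunks, start = [], 0
    for i in range(number_of_chunks):
        # The first `remainder` chunks each take one extra item.
        end = start + size + (1 if i < remainder else 0)
        if end > start:  # skip empty chunks when there are few items
            chunks.append(items[start:end])
        start = end
    return chunks


chunks = split_into_chunks(range(10), 3)
```

In the method itself, chunking is requested through the `number_of_chunks` parameter, e.g. `PBFReadParse.read_pbf(pbf_pathname, number_of_chunks=50)`, where 50 is an arbitrary value chosen for illustration.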
Examples:
>>> from pydriosm.reader import PBFReadParse
>>> from pydriosm.downloader import GeofabrikDownloader
>>> from pyhelpers.dirs import delete_dir
>>> import os

>>> # Download the PBF data file of 'Rutland' as an example
>>> subrgn_name = 'rutland'
>>> file_format = ".pbf"
>>> dwnld_dir = "tests\\osm_data"

>>> gfd = GeofabrikDownloader()
>>> gfd.download_osm_data(subrgn_name, file_format, dwnld_dir, verbose=True)
To download .osm.pbf data of the following geographic (sub)region(s):
    Rutland
? [No]|Yes: yes
Downloading "rutland-latest.osm.pbf"
    to "tests\osm_data\rutland\" ... Done.

>>> rutland_pbf_path = gfd.data_paths[0]
>>> os.path.relpath(rutland_pbf_path)
'tests\\osm_data\\rutland\\rutland-latest.osm.pbf'

>>> # Read the downloaded PBF data
>>> rutland_pbf = PBFReadParse.read_pbf(rutland_pbf_path)
>>> type(rutland_pbf)
dict
>>> list(rutland_pbf.keys())
['points', 'lines', 'multilinestrings', 'multipolygons', 'other_relations']

>>> rutland_pbf_points = rutland_pbf['points']
>>> rutland_pbf_points.head()
0    {'type': 'Feature', 'geometry': {'type': 'Poin...
1    {'type': 'Feature', 'geometry': {'type': 'Poin...
2    {'type': 'Feature', 'geometry': {'type': 'Poin...
3    {'type': 'Feature', 'geometry': {'type': 'Poin...
4    {'type': 'Feature', 'geometry': {'type': 'Poin...
Name: points, dtype: object

>>> # Set `expand` to be `True`
>>> pbf_0 = PBFReadParse.read_pbf(rutland_pbf_path, expand=True)
>>> type(pbf_0)
dict
>>> list(pbf_0.keys())
['points', 'lines', 'multilinestrings', 'multipolygons', 'other_relations']
>>> pbf_0_points = pbf_0['points']
>>> pbf_0_points.head()
         id  ...                                        properties
0    488432  ...  {'osm_id': '488432', 'name': None, 'barrier': ...
1    488658  ...  {'osm_id': '488658', 'name': 'Tickencote Inter...
2  13883868  ...  {'osm_id': '13883868', 'name': None, 'barrier'...
3  14049101  ...  {'osm_id': '14049101', 'name': None, 'barrier'...
4  14558402  ...  {'osm_id': '14558402', 'name': None, 'barrier'...
[5 rows x 3 columns]

>>> pbf_0_points['geometry'].head()
0    {'type': 'Point', 'coordinates': [-0.5134241, ...
1    {'type': 'Point', 'coordinates': [-0.5313354, ...
2    {'type': 'Point', 'coordinates': [-0.7229332, ...
3    {'type': 'Point', 'coordinates': [-0.7249816, ...
4    {'type': 'Point', 'coordinates': [-0.7266581, ...
Name: geometry, dtype: object

>>> # Set both `expand` and `parse_geometry` to be `True`
>>> pbf_1 = PBFReadParse.read_pbf(rutland_pbf_path, expand=True, parse_geometry=True)
>>> pbf_1_points = pbf_1['points']
>>> # Check the difference in 'geometry' column, compared to `pbf_0_points`
>>> pbf_1_points['geometry'].head()
0    POINT (-0.5134241 52.6555853)
1    POINT (-0.5313354 52.6737716)
2    POINT (-0.7229332 52.5889864)
3    POINT (-0.7249816 52.6748426)
4    POINT (-0.7266581 52.6695058)
Name: geometry, dtype: object

>>> # Set both `expand` and `parse_properties` to be `True`
>>> pbf_2 = PBFReadParse.read_pbf(rutland_pbf_path, expand=True, parse_properties=True)
>>> pbf_2_points = pbf_2['points']
>>> pbf_2_points['other_tags'].head()
0                 "odbl"=>"clean"
1                            None
2                            None
3    "traffic_calming"=>"cushion"
4        "direction"=>"clockwise"
Name: other_tags, dtype: object

>>> # Set `expand`, `parse_properties` and `parse_other_tags` to be `True`
>>> pbf_3 = PBFReadParse.read_pbf(rutland_pbf_path, expand=True, parse_properties=True,
...                               parse_other_tags=True)
>>> pbf_3_points = pbf_3['points']
>>> # Check the difference in 'other_tags', compared to `pbf_2_points`
>>> pbf_3_points['other_tags'].head()
0                 {'odbl': 'clean'}
1                              None
2                              None
3    {'traffic_calming': 'cushion'}
4        {'direction': 'clockwise'}
Name: other_tags, dtype: object

>>> # Delete the downloaded PBF data file
>>> delete_dir(gfd.download_dir, verbose=True)
To delete the directory "tests\osm_data\" (Not empty)
? [No]|Yes: yes
Deleting "tests\osm_data\" ... Done.
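The 'other_tags' values in the examples change from a `"key"=>"value"` text form to a dict when `parse_other_tags=True`. The conversion can be sketched roughly as follows. This is a minimal illustration rather than pydriosm's actual parser: `other_tags_to_dict` is a hypothetical helper, it assumes the text consists of double-quoted key/value pairs joined by `=>`, and it leaves any backslash escapes inside the quotes untouched.

```python
import re

# One "key"=>"value" pair; quoted parts may contain backslash-escaped characters.
_PAIR = re.compile(r'"((?:[^"\\]|\\.)*)"\s*=>\s*"((?:[^"\\]|\\.)*)"')


def other_tags_to_dict(other_tags):
    """Convert a '"k"=>"v","k2"=>"v2"' string into a dict.

    Returns None when the input is None, matching the rows in the
    examples that have no 'other_tags' value.
    """
    if other_tags is None:
        return None
    return {key: value for key, value in _PAIR.findall(other_tags)}


single = other_tags_to_dict('"odbl"=>"clean"')
multiple = other_tags_to_dict('"traffic_calming"=>"cushion","direction"=>"clockwise"')
```

A regular expression keeps the sketch short; a stricter parser would be needed for values that rely on escaped quotes being unescaped.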
See also
Examples for the methods: GeofabrikReader.read_osm_pbf() and BBBikeReader.read_osm_pbf().