GeofabrikReader.read_osm_pbf

GeofabrikReader.read_osm_pbf(subregion_name, data_dir=None, readable=False, expand=False, parse_geometry=False, parse_properties=False, parse_other_tags=False, update=False, download=True, pickle_it=False, ret_pickle_path=False, rm_pbf_file=False, chunk_size_limit=50, verbose=False, **kwargs)

Read a PBF (.osm.pbf) data file of a geographic (sub)region.

Parameters:
  • subregion_name (str) – name of a geographic (sub)region (case-insensitive) that is available on the Geofabrik free download server

  • data_dir (str | None) – directory where the .osm.pbf data file is located/saved; if None (default), the default local directory is used

  • readable (bool) – whether to parse each feature in the raw data, defaults to False

  • expand (bool) – whether to expand dict-like data into separate columns, defaults to False

  • parse_geometry (bool) – whether to represent the 'geometry' field in a shapely.geometry format, defaults to False

  • parse_properties (bool) – whether to represent the 'properties' field in a tabular format, defaults to False

  • parse_other_tags (bool) – whether to represent the 'other_tags' field (of 'properties') in a dict format, defaults to False

  • download (bool) – whether to download/update the PBF data file of the given subregion, if it is not available at the specified path, defaults to True

  • update (bool) – whether to check for an update to the pickle backup (if available), defaults to False

  • pickle_it (bool) – whether to save the .pbf data as a pickle file, defaults to False

  • ret_pickle_path (bool) – (when pickle_it=True) whether to return a path to the saved pickle file, defaults to False

  • rm_pbf_file (bool) – whether to delete the downloaded .osm.pbf file, defaults to False

  • chunk_size_limit (int | None) – threshold (in MB) that triggers the use of the chunk parser, defaults to 50; if the size of the .osm.pbf file (in MB) is greater than chunk_size_limit, the file is parsed chunk by chunk

  • verbose (bool | int) – whether to print relevant information in the console as the function runs, defaults to False

  • kwargs – [optional] parameters of the method PBFReadParse.read_pbf()
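
The chunk_size_limit rule described above can be illustrated with a minimal sketch. The helper name should_parse_in_chunks is hypothetical and not part of pydriosm; the sketch also assumes that 1 MB means 1024 ** 2 bytes and that chunk_size_limit=None switches chunk-wise parsing off, neither of which is stated on this page.

```python
import os
import tempfile


def should_parse_in_chunks(path_to_pbf, chunk_size_limit=50):
    """Hypothetical helper: True if the file is larger than the limit (in MB)."""
    if not chunk_size_limit:
        # Assumption: None (or 0) means "never parse chunk-wise"
        return False
    # Assumption: 1 MB is taken as 1024 ** 2 bytes
    size_in_mb = os.path.getsize(path_to_pbf) / (1024 ** 2)
    return size_in_mb > chunk_size_limit


# Demo with a throwaway 2 MB file standing in for a real .osm.pbf file
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.osm.pbf")
    with open(demo_path, "wb") as f:
        f.write(b"\0" * (2 * 1024 ** 2))

    over_small_limit = should_parse_in_chunks(demo_path, chunk_size_limit=1)
    over_default_limit = should_parse_in_chunks(demo_path)
    no_limit = should_parse_in_chunks(demo_path, chunk_size_limit=None)
```

With the 2 MB stand-in file, only the call with chunk_size_limit=1 reports that chunk-wise parsing would be triggered.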

Returns:

dictionary of the .osm.pbf data; when pickle_it=True and ret_pickle_path=True, a tuple of the dictionary and a path to the pickle file

Return type:

dict | tuple | None
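
Because the return type is dict | tuple | None, calling code may want to normalise the result before using it. The following is a minimal sketch under that reading; split_result is a hypothetical helper, not part of pydriosm, and the sample values are stand-ins rather than real parsed data. The conditions under which None is returned are not spelled out on this page, so the sketch simply passes None through.

```python
def split_result(result):
    """Hypothetical helper: return (data, pickle_path) for any documented return shape."""
    if result is None:
        # Nothing was read; no data and no pickle path
        return None, None
    if isinstance(result, tuple):
        # Tuple form: (dictionary of the data, path to the pickle file)
        data, pickle_path = result
        return data, pickle_path
    # Plain dictionary form: no pickle path was returned
    return result, None


# Stand-in values that only mimic the documented shapes
only_data = split_result({'points': [], 'lines': []})
data_and_path = split_result(({'points': []}, 'rutland-latest.pickle'))
nothing = split_result(None)
```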

Examples:

>>> from pydriosm.reader import GeofabrikReader
>>> from pyhelpers.dirs import delete_dir

>>> gfr = GeofabrikReader()

>>> subrgn_name = 'rutland'
>>> dat_dir = "tests\\osm_data"

>>> # If the PBF data of Rutland is not available at the specified data directory,
>>> # the function can download the latest data by setting `download=True` (default)
>>> pbf_raw = gfr.read_osm_pbf(subrgn_name, data_dir=dat_dir, verbose=True)
Downloading "rutland-latest.osm.pbf"
    to "tests\osm_data\rutland\" ... Done.
Reading "tests\osm_data\rutland\rutland-latest.osm.pbf" ... Done.
>>> type(pbf_raw)
dict
>>> list(pbf_raw.keys())
['points', 'lines', 'multilinestrings', 'multipolygons', 'other_relations']

>>> pbf_raw_points = pbf_raw['points']
>>> type(pbf_raw_points)
list
>>> type(pbf_raw_points[0])
osgeo.ogr.Feature

>>> # Set `readable=True`
>>> pbf_parsed = gfr.read_osm_pbf(subrgn_name, dat_dir, readable=True, verbose=True)
Parsing "tests\osm_data\rutland\rutland-latest.osm.pbf" ... Done.
>>> pbf_parsed_points = pbf_parsed['points']
>>> pbf_parsed_points.head()
0    {'type': 'Feature', 'geometry': {'type': 'Poin...
1    {'type': 'Feature', 'geometry': {'type': 'Poin...
2    {'type': 'Feature', 'geometry': {'type': 'Poin...
3    {'type': 'Feature', 'geometry': {'type': 'Poin...
4    {'type': 'Feature', 'geometry': {'type': 'Poin...
Name: points, dtype: object

>>> # Set `expand=True`, which would force `readable=True`
>>> pbf_parsed_ = gfr.read_osm_pbf(subrgn_name, dat_dir, expand=True, verbose=True)
Parsing "tests\osm_data\rutland\rutland-latest.osm.pbf" ... Done.
>>> pbf_parsed_points_ = pbf_parsed_['points']
>>> pbf_parsed_points_.head()
         id  ...                                         properties
0    488432  ...  {'osm_id': '488432', 'name': None, 'barrier': ...
1    488658  ...  {'osm_id': '488658', 'name': 'Tickencote Inter...
2  13883868  ...  {'osm_id': '13883868', 'name': None, 'barrier'...
3  14049101  ...  {'osm_id': '14049101', 'name': None, 'barrier'...
4  14558402  ...  {'osm_id': '14558402', 'name': None, 'barrier'...
[5 rows x 3 columns]

>>> # Set `readable` and `parse_geometry` to be `True`
>>> pbf_parsed_1 = gfr.read_osm_pbf(subrgn_name, dat_dir, readable=True,
...                                 parse_geometry=True)
>>> pbf_parsed_1_point = pbf_parsed_1['points'][0]
>>> pbf_parsed_1_point['geometry']
'POINT (-0.5134241 52.6555853)'
>>> pbf_parsed_1_point['properties']['other_tags']
'"odbl"=>"clean"'

>>> # Set `readable` and `parse_other_tags` to be `True`
>>> pbf_parsed_2 = gfr.read_osm_pbf(subrgn_name, dat_dir, readable=True,
...                                 parse_other_tags=True)
>>> pbf_parsed_2_point = pbf_parsed_2['points'][0]
>>> pbf_parsed_2_point['geometry']
{'type': 'Point', 'coordinates': [-0.5134241, 52.6555853]}
>>> pbf_parsed_2_point['properties']['other_tags']
{'odbl': 'clean'}

>>> # Set `readable`, `parse_geometry` and `parse_other_tags` to be `True`
>>> pbf_parsed_3 = gfr.read_osm_pbf(subrgn_name, dat_dir, readable=True,
...                                 parse_geometry=True, parse_other_tags=True)
>>> pbf_parsed_3_point = pbf_parsed_3['points'][0]
>>> pbf_parsed_3_point['geometry']
'POINT (-0.5134241 52.6555853)'
>>> pbf_parsed_3_point['properties']['other_tags']
{'odbl': 'clean'}

>>> # Delete the example data and the test data directory
>>> delete_dir(dat_dir, verbose=True)
To delete the directory "tests\osm_data\" (Not empty)
? [No]|Yes: yes
Deleting "tests\osm_data\" ... Done.
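
The examples above show 'other_tags' first as a raw string ('"odbl"=>"clean"') and then, with parse_other_tags=True, as a dict ({'odbl': 'clean'}). A rough idea of that conversion is sketched below. This is not the library's own implementation: parse_other_tags_string is a hypothetical helper, it assumes the raw value is a comma-separated list of "key"=>"value" pairs, it keeps any backslash escapes verbatim, and returning an empty dict for a missing value is a choice made for this sketch only.

```python
import re

# One "key"=>"value" pair; quoted parts may contain backslash-escaped characters
_PAIR = re.compile(r'"((?:[^"\\]|\\.)*)"=>"((?:[^"\\]|\\.)*)"')


def parse_other_tags_string(other_tags):
    """Hypothetical helper: turn '"k1"=>"v1","k2"=>"v2"' into a dict."""
    if not other_tags:
        # Assumption for this sketch: a missing or empty value gives an empty dict
        return {}
    return {key: value for key, value in _PAIR.findall(other_tags)}


single = parse_other_tags_string('"odbl"=>"clean"')
several = parse_other_tags_string('"a"=>"1","b"=>"x, y"')
missing = parse_other_tags_string(None)
```

Matching whole quoted pairs, rather than splitting on commas, keeps values that themselves contain commas intact.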