PostgresOSM.read_sql_query
PostgresOSM.read_sql_query(sql_query, method='tempfile', tempfile_mode='w+b', max_size_spooled=1, delimiter=',', dtype=None, tempfile_kwargs=None, stringio_kwargs=None, **kwargs)
Read table data by executing a SQL query (recommended for large tables).
See also [SQL-P-RSQ-1], [SQL-P-RSQ-2] and [SQL-P-RSQ-3].
- Parameters
  - sql_query (str) – SQL query to be executed
  - method (str) – method to be used for buffering temporary data:
    - 'tempfile' (default): use tempfile.TemporaryFile
    - 'stringio': use io.StringIO
    - 'spooled': use tempfile.SpooledTemporaryFile
  - tempfile_mode (str) – mode of the specified method, defaults to 'w+b'
  - max_size_spooled (int or float) – max_size of tempfile.SpooledTemporaryFile (in gigabytes), defaults to 1
  - delimiter (str) – delimiter used in the data, defaults to ','
  - dtype (dict or None) – data type for specified data columns, dtype used by pandas.read_csv, defaults to None
  - tempfile_kwargs – optional parameters of tempfile.TemporaryFile or tempfile.SpooledTemporaryFile
  - stringio_kwargs – optional parameters of io.StringIO, e.g. initial_value (default: '')
  - kwargs – optional parameters of pandas.read_csv
- Returns
  data frame as queried by the statement sql_query
- Return type
  pandas.DataFrame
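The three buffering options above can be sketched without a database. This is a minimal illustration, not the method's actual implementation: `make_buffer` and `csv_payload` are hypothetical names, and the canned CSV string stands in for the data that the real method streams out of PostgreSQL into the buffer before handing it to pandas.read_csv.

```python
import io
import tempfile

import pandas

# Canned CSV standing in for the query result streamed out of the database.
csv_payload = 'city,easting,northing\nLondon,530034,180381\nLeeds,582044,152953\n'


def make_buffer(method='tempfile', tempfile_mode='w+b', max_size_spooled=1):
    """Hypothetical helper mirroring the three documented buffering options."""
    if method == 'tempfile':
        return tempfile.TemporaryFile(mode=tempfile_mode)
    if method == 'stringio':
        return io.StringIO()
    if method == 'spooled':
        # tempfile.SpooledTemporaryFile takes max_size in bytes;
        # the docstring above counts it in gigabytes.
        return tempfile.SpooledTemporaryFile(
            max_size=int(max_size_spooled * 10 ** 9), mode=tempfile_mode)
    raise ValueError("method must be 'tempfile', 'stringio' or 'spooled'")


buf = make_buffer('stringio')
buf.write(csv_payload)  # the real method would COPY query results into the buffer
buf.seek(0)             # rewind before handing the buffer to pandas
dat = pandas.read_csv(buf, delimiter=',')
```

Whichever option is chosen, the buffer is written once, rewound, and parsed; 'spooled' keeps small results in memory and spills to disk only past max_size.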
Examples:
>>> import pandas
>>> from pyhelpers.sql import PostgreSQL

>>> testdb = PostgreSQL('localhost', 5432, username='postgres', database_name='testdb')
Password (postgres@localhost:5432): ***
Connecting postgres:***@localhost:5432/testdb ... Successfully.

>>> # Create a pandas.DataFrame
>>> xy_array = [(530034, 180381),
...             (406689, 286822),
...             (383819, 398052),
...             (582044, 152953)]
>>> idx_labels = ['London', 'Birmingham', 'Manchester', 'Leeds']
>>> col_names = ['Easting', 'Northing']
>>> dat = pandas.DataFrame(xy_array, index=idx_labels, columns=col_names)
>>> print(dat)
            Easting  Northing
London       530034    180381
Birmingham   406689    286822
Manchester   383819    398052
Leeds        582044    152953

>>> table = 'England'
>>> schema = 'points'
>>> testdb.import_data(dat, table, schema, if_exists='replace', index=True,
...                    chunk_size=None, force_replace=False, col_type=None, verbose=2)
To import data into "points"."England" at postgres:***@localhost:5432/testdb
? [No]|Yes: yes
Creating a schema: "points" ... Done.
Importing the data into the table "points"."England" ... Done.

>>> res = testdb.table_exists(table, schema)
>>> print('The table "{}"."{}" exists? {}.'.format(schema, table, res))
The table "points"."England" exists? True.

>>> dat_retrieval = testdb.read_table(table, schema, index_col='index')
>>> dat_retrieval.index.name = None
>>> print(dat_retrieval)
            Easting  Northing
London       530034    180381
Birmingham   406689    286822
Manchester   383819    398052
Leeds        582044    152953

>>> # Alternatively
>>> sql_qry = 'SELECT * FROM "{}"."{}"'.format(schema, table)
>>> dat_retrieval_alt = testdb.read_sql_query(sql_qry, index_col='index')
>>> dat_retrieval_alt.index.name = None
>>> print(dat_retrieval_alt)
            Easting  Northing
London       530034    180381
Birmingham   406689    286822
Manchester   383819    398052
Leeds        582044    152953

>>> # Delete the table "England"
>>> testdb.drop_table(table_name=table, schema_name=schema, verbose=True)
To drop the table "points"."England" from postgres:***@localhost:5432/testdb
? [No]|Yes: yes
Dropping "points"."England" ... Done.

>>> # Delete the schema "points"
>>> testdb.drop_schema(schema, verbose=True)
To drop the schema "points" from postgres:***@localhost:5432/testdb
? [No]|Yes: yes
Dropping "points" ... Done.

>>> # Delete the database "testdb"
>>> testdb.drop_database(verbose=True)
To drop the database "testdb" from postgres:***@localhost:5432
? [No]|Yes: yes
Dropping "testdb" ... Done.
Aside: a brief example of using the parameter params for pandas.read_sql (note that the keys of the params dict must match the %(name)s placeholders in the query):

    import datetime

    sql_qry = ('SELECT * FROM "table_name" '
               'WHERE "timestamp_column_name" BETWEEN %(ts_start)s AND %(ts_end)s')
    params = {'ts_start': datetime.datetime.today(), 'ts_end': datetime.datetime.today()}
    data_frame = pandas.read_sql(sql_qry, testdb.engine, params=params)
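The same params mechanism can be tried without a PostgreSQL server. The sketch below is an assumption-laden stand-in: it uses an in-memory SQLite database (stdlib sqlite3) in place of testdb.engine, so the placeholder style changes from psycopg2's %(name)s to SQLite's :name; the table and rows are made up for illustration.

```python
import sqlite3

import pandas

# In-memory SQLite database standing in for the PostgreSQL engine.
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE events (name TEXT, ts TEXT)')
con.execute("INSERT INTO events VALUES ('a', '2020-01-01 10:00:00')")
con.execute("INSERT INTO events VALUES ('b', '2020-06-01 10:00:00')")

# SQLite uses ':name' placeholders; psycopg2 (PostgreSQL) uses '%(name)s'.
sql_qry = 'SELECT * FROM events WHERE ts BETWEEN :ts_start AND :ts_end'
params = {'ts_start': '2020-01-01 00:00:00', 'ts_end': '2020-03-01 00:00:00'}

# pandas forwards the dict to the DB driver, which substitutes the values safely.
data_frame = pandas.read_sql(sql_qry, con, params=params)
```

Passing values via params (rather than formatting them into the SQL string) lets the driver handle quoting and typing, which also guards against SQL injection.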