API Reference¶
-
class
pymapd.
Connection
(uri=None, user=None, password=None, host=None, port=6274, dbname=None, protocol='binary', sessionid=None, bin_cert_validate=None, bin_ca_certs=None, idpurl=None, idpformusernamefield='username', idpformpasswordfield='password', idpsslverify=True)¶ Connect to your OmniSci database.
-
close
()¶ Disconnect from the database unless created with sessionid
-
commit
()¶ This is a no-op, as OmniSci does not provide transactions.
Implemented to comply with the DB API 2.0 specification.
-
create_table
(table_name, data, preserve_index=False)¶ Create a table from a pandas.DataFrame
- Parameters
- table_name: str
- data: DataFrame
- preserve_index: bool, default False
Whether to create a column in the table for the DataFrame index
-
deallocate_ipc
(df, device_id=0)¶ Deallocate a DataFrame using CPU shared memory.
- Parameters
- device_id: int
GPU which contains TDataFrame
-
deallocate_ipc_gpu
(df, device_id=0)¶ Deallocate a DataFrame using GPU memory.
- Parameters
- device_id: int
GPU which contains TDataFrame
-
duplicate_dashboard
(dashboard_id, new_name=None, source_remap=None)¶ Duplicate an existing dashboard, returning the new dashboard id.
- Parameters
- dashboard_id: int
The id of the dashboard to duplicate
- new_name: str
The name for the new dashboard
- source_remap: dict
EXPERIMENTAL A dictionary remapping table names. The old table name(s) should be keys of the dict, with each value being another dict with a ‘name’ key holding the new table value. This structure can be used later to support changing column names.
Examples
>>> source_remap = {'oldtablename1': {'name': 'newtablename1'}, 'oldtablename2': {'name': 'newtablename2'}}
>>> newdash = con.duplicate_dashboard(12345, "new dash", source_remap)
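The nested shape of source_remap is easy to get wrong when written by hand. The following is a minimal sketch that builds it from a plain old-name-to-new-name mapping; `build_source_remap` is a hypothetical helper for illustration and is not part of pymapd.

```python
def build_source_remap(renames):
    """Wrap each new table name in the {'name': ...} dict described above.

    `renames` maps old table names to new table names. This helper is
    hypothetical: it only builds the argument and never calls pymapd.
    """
    return {old: {"name": new} for old, new in renames.items()}


source_remap = build_source_remap({
    "oldtablename1": "newtablename1",
    "oldtablename2": "newtablename2",
})
```

The resulting dict can then be passed as the source_remap argument of duplicate_dashboard.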
-
execute
(operation, parameters=None)¶ Execute a SQL statement
- Parameters
- operation: str
A SQL statement to execute
- Returns
- c: Cursor
-
get_dashboard
(dashboard_id)¶ Return the dashboard object of a specific dashboard
Examples
>>> con.get_dashboard(123)
-
get_dashboards
()¶ List all the dashboards in the database
Examples
>>> con.get_dashboards()
-
get_table_details
(table_name)¶ Get the column names and data types associated with a table.
- Parameters
- table_name: str
- Returns
- details: List[tuples]
Examples
>>> con.get_table_details('stocks')
[ColumnDetails(name='date_', type='STR', nullable=True, precision=0, scale=0, comp_param=32, encoding='DICT'),
 ColumnDetails(name='trans', type='STR', nullable=True, precision=0, scale=0, comp_param=32, encoding='DICT'),
 ...
]
-
get_tables
()¶ List all the tables in the database
Examples
>>> con.get_tables()
['flights_2008_10k', 'stocks']
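get_tables and get_table_details combine naturally into a quick schema overview. A minimal sketch, assuming `con` is an open Connection and that each returned detail exposes a `name` attribute, as the ColumnDetails output above suggests; `column_names_by_table` is a hypothetical helper name.

```python
def column_names_by_table(con):
    """Map every table name to the list of its column names.

    Assumes `con` provides get_tables() and get_table_details() as
    documented above, and that each detail has a `name` attribute.
    """
    return {
        table: [col.name for col in con.get_table_details(table)]
        for table in con.get_tables()
    }
```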
-
load_table
(table_name, data, method='infer', preserve_index=False, create='infer')¶ Load data into a table
- Parameters
- table_name: str
- data: pyarrow.Table, pandas.DataFrame, or iterable of tuples
- method: {‘infer’, ‘columnar’, ‘rows’, ‘arrow’}
Method to use for loading the data. Three options are available:
1. pyarrow and Apache Arrow loader
2. columnar loader
3. row-wise loader
The Arrow loader is typically the fastest, followed by the columnar loader, followed by the row-wise loader. If a DataFrame or pyarrow.Table is passed and pyarrow is installed, the Arrow-based loader will be used. If Arrow isn’t available, the columnar loader is used. Finally, if data is an iterable of tuples, the row-wise loader is used.
- preserve_index: bool, default False
Whether to keep the index when loading a pandas DataFrame
- create: {“infer”, True, False}
Whether to issue a CREATE TABLE before inserting the data.
infer: check to see if the table already exists, and create a table if it does not
True: attempt to create the table, without checking if it exists
False: do not attempt to create the table
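The selection order described for method='infer' can be sketched as a small decision function. This is illustrative only and is not pymapd's implementation: `choose_loader` is a hypothetical name, and detecting tabular input by duck typing (a `columns` or `schema` attribute) is an assumption made to keep the sketch free of dependencies.

```python
def choose_loader(data, arrow_available):
    """Return 'arrow', 'columnar', or 'rows' following the documented order.

    Illustrative only: tabular input is detected by duck typing, which is
    an assumption and not necessarily how pymapd checks its arguments.
    """
    is_tabular = hasattr(data, "columns") or hasattr(data, "schema")
    if is_tabular:
        # DataFrame or pyarrow.Table: prefer Arrow, fall back to columnar.
        return "arrow" if arrow_available else "columnar"
    # Anything else is treated as an iterable of tuples.
    return "rows"
```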
-
load_table_arrow
(table_name, data, preserve_index=False)¶ Load a pandas.DataFrame or a pyarrow Table or RecordBatch to the database using Arrow columnar format for interchange
- Parameters
- table_name: str
- data: pandas.DataFrame, pyarrow.RecordBatch, pyarrow.Table
- preserve_index: bool, default False
Whether to include the index of a pandas DataFrame when writing.
Examples
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ['d', 'e', 'f']})
>>> con.load_table_arrow('foo', df, preserve_index=False)
-
load_table_columnar
(table_name, data, preserve_index=False, chunk_size_bytes=0, col_names_from_schema=False)¶ Load a pandas DataFrame to the database using OmniSci’s Thrift-based columnar format
- Parameters
- table_name: str
- data: DataFrame
- preserve_index: bool, default False
Whether to include the index of a pandas DataFrame when writing.
- chunk_size_bytes: integer, default 0
Chunk the loading of columns to prevent large Thrift requests. A value of 0 means do not chunk and send the dataframe as a single request
- col_names_from_schema: bool, default False
Read the existing table schema to determine the column names. This will read the schema of an existing table in OmniSci and match those names to the column names of the dataframe. This is for user convenience when loading from data that is unordered, especially handy when a table has a large number of columns.
Notes
Use pymapd >= 0.11.0 while running with omnisci >= 4.6.0 in order to avoid loading inconsistent values into DATE columns.
Examples
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ['d', 'e', 'f']})
>>> con.load_table_columnar('foo', df, preserve_index=False)
-
load_table_rowwise
(table_name, data)¶ Load data into a table row-wise
- Parameters
- table_name: str
- data: Iterable of tuples
Each element of data should be a row to be inserted
Examples
>>> data = [(1, 'a'), (2, 'b'), (3, 'c')]
>>> con.load_table('bar', data)
-
register_runtime_udfs
()¶ Register any pending Runtime UDF functions with the OmniSci server.
If no Runtime UDFs have been defined, calling this method is a no-op.
-
render_vega
(vega, compression_level=1)¶ Render vega data on the database backend, returning the image as a PNG.
- Parameters
- vega: dict
The vega specification to render.
- compression_level: int
The level of compression for the rendered PNG. Ranges from 0 (low compression, faster) to 9 (high compression, slower).
-
select_ipc
(operation, parameters=None, first_n=-1, release_memory=True)¶ Execute a SELECT operation using CPU shared memory.
- Parameters
- operation: str
A SQL select statement
- parameters: dict, optional
Parameters to insert for a parametrized query
- first_n: int, optional
Number of records to return
- release_memory: bool, optional
Call self.deallocate_ipc(df) after the DataFrame is created
- Returns
- df: pandas.DataFrame
Notes
This method requires the Python code to be executed on the same machine where OmniSci is running.
-
select_ipc_gpu
(operation, parameters=None, device_id=0, first_n=-1, release_memory=True)¶ Execute a SELECT operation using GPU memory.
- Parameters
- operation: str
A SQL statement
- parameters: dict, optional
Parameters to insert into a parametrized query
- device_id: int
GPU to return results to
- first_n: int, optional
Number of records to return
- release_memory: bool, optional
Call self.deallocate_ipc_gpu(df) after the DataFrame is created
- Returns
- gdf: cudf.GpuDataFrame
Notes
This method requires cudf and libcudf to be installed. An ImportError is raised if those aren’t available.
This method requires the Python code to be executed on the same machine where OmniSci is running.
-
-
class
pymapd.
Cursor
(connection)¶ A database cursor.
-
property
arraysize
¶ The number of rows to fetch at a time with fetchmany. Default 1.
-
close
()¶ Close this cursor.
-
property
description
¶ Read-only sequence describing columns of the result set. Each column is an instance of Description describing
name
type_code
display_size
internal_size
precision
scale
null_ok
We only use name, type_code, and null_ok; the rest are always None.
-
execute
(operation, parameters=None)¶ Execute a SQL statement.
- Parameters
- operation: str
A SQL query
- parameters: dict
Parameters to substitute into operation.
- Returns
- self: Cursor
Examples
>>> c = conn.cursor()
>>> c.execute("select symbol, qty from stocks")
>>> list(c)
[('RHAT', 100.0), ('IBM', 1000.0), ('MSFT', 1000.0), ('IBM', 500.0)]
Passing in parameters:
>>> c.execute("select symbol, qty from stocks where qty <= :max_qty",
...           parameters={"max_qty": 500})
[('RHAT', 100.0), ('IBM', 500.0)]
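The parametrized call above can be wrapped so the cursor is always closed after use. A minimal sketch, assuming `conn` is an open Connection with a cursor() method as in the example, and reusing the stocks table from it; `rows_up_to` is a hypothetical helper name.

```python
def rows_up_to(conn, max_qty):
    """Return the (symbol, qty) rows whose qty does not exceed max_qty."""
    c = conn.cursor()
    try:
        c.execute(
            "select symbol, qty from stocks where qty <= :max_qty",
            parameters={"max_qty": max_qty},
        )
        # Iterating the cursor yields the result rows, as in the example above.
        return list(c)
    finally:
        c.close()
```

Binding max_qty through parameters, rather than formatting it into the SQL string, keeps the statement text constant across calls.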
-
executemany
(operation, parameters)¶ Execute a SQL statement for many sets of parameters.
- Parameters
- operation: str
- parameters: list of dict
- Returns
- results: list of lists
-
fetchmany
(size=None)¶ Fetch size rows from the results set.
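fetchmany is typically used in a loop to process a large result set in batches. A minimal sketch, assuming fetchmany returns an empty sequence once no rows remain (the behavior the DB API 2.0 specification describes); `iter_batches` is a hypothetical helper name.

```python
def iter_batches(cursor, size):
    """Yield lists of up to `size` rows until the result set is exhausted.

    Assumes cursor.fetchmany(size) returns an empty sequence when no rows
    remain, as described by the DB API 2.0 specification.
    """
    while True:
        batch = cursor.fetchmany(size)
        if not batch:
            break
        yield list(batch)
```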
-
fetchone
()¶ Fetch a single row from the results set
-
property
-
pymapd.
connect
(uri=None, user=None, password=None, host=None, port=6274, dbname=None, protocol='binary', sessionid=None, bin_cert_validate=None, bin_ca_certs=None, idpurl=None, idpformusernamefield='username', idpformpasswordfield='password', idpsslverify=True)¶ Create a new Connection.
- Parameters
- uri: str
- user: str
- password: str
- host: str
- port: int
- dbname: str
- protocol: {‘binary’, ‘http’, ‘https’}
- sessionid: str
- bin_cert_validate: bool, optional, binary encrypted connection only
Whether to continue if there is any certificate error
- bin_ca_certs: str, optional, binary encrypted connection only
Path to the CA certificate file
- idpurl: str
EXPERIMENTAL Enable SAML authentication by providing the logon page of the SAML Identity Provider.
- idpformusernamefield: str
The HTML form ID for the username, defaults to ‘username’.
- idpformpasswordfield: str
The HTML form ID for the password, defaults to ‘password’.
- idpsslverify: bool
Enable / disable certificate checking, defaults to True.
- Returns
- conn: Connection
Examples
You can either pass a string uri, all the individual components, or an existing sessionid excluding user, password, and database:
>>> connect('mapd://admin:HyperInteractive@localhost:6274/omnisci?'
...         'protocol=binary')
Connection(mapd://mapd:***@localhost:6274/mapd?protocol=binary)
>>> connect(user='admin', password='HyperInteractive', host='localhost',
...         port=6274, dbname='omnisci')
>>> connect(user='admin', password='HyperInteractive', host='localhost',
...         port=443, idpurl='https://sso.localhost/logon', protocol='https')
>>> connect(sessionid='XihlkjhdasfsadSDoasdllMweieisdpo', host='localhost',
...         port=6273, protocol='http')
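The uri form in the first example follows a user:password@host:port/dbname layout with the protocol as a query parameter. A minimal sketch of assembling such a string from its components; `build_uri` is a hypothetical helper, and it assumes none of the values contain characters that would need escaping in a URI.

```python
def build_uri(user, password, host, dbname, port=6274, protocol="binary"):
    """Assemble a connection string in the layout shown in the first example.

    Hypothetical helper: values are inserted verbatim, so this assumes they
    contain no characters such as '@', ':' or '/' that would need escaping.
    """
    return f"mapd://{user}:{password}@{host}:{port}/{dbname}?protocol={protocol}"


uri = build_uri("admin", "HyperInteractive", "localhost", "omnisci")
```

The resulting string can be passed as the uri argument of connect in place of the individual components.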
Exceptions¶
Define exceptions as specified by the DB API 2.0 spec.
Includes some helper methods for translating thrift exceptions to the ones defined here.
-
exception
pymapd.exceptions.
DatabaseError
¶ Raised when the database encounters an error.
-
exception
pymapd.exceptions.
Error
¶ Base class for all pymapd errors.
-
exception
pymapd.exceptions.
IntegrityError
¶ Raised when the relational integrity of the database is affected.
-
exception
pymapd.exceptions.
InterfaceError
¶ Raised whenever you use the pymapd interface incorrectly.
-
exception
pymapd.exceptions.
InternalError
¶ Raised for errors internal to the database, e.g. an invalid cursor.
-
exception
pymapd.exceptions.
NotSupportedError
¶ Raised when an API not supported by the database is used.
-
exception
pymapd.exceptions.
OperationalError
¶ Raised for non-programmer related database errors, e.g. an unexpected disconnect.
-
exception
pymapd.exceptions.
ProgrammingError
¶ Raised for programming errors, e.g. syntax errors, table already exists.
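Because Error is the base class for all pymapd errors, handlers can catch a specific subclass first and fall back to Error for everything else. A minimal sketch, assuming `con` is an open Connection; `try_execute` is a hypothetical helper, and the stand-in classes exist only so the sketch runs where pymapd is not installed.

```python
try:
    from pymapd.exceptions import Error, ProgrammingError
except ImportError:
    # Stand-ins so the sketch runs without pymapd installed (assumption:
    # ProgrammingError derives from Error, as the base-class note implies).
    class Error(Exception):
        pass

    class ProgrammingError(Error):
        pass


def try_execute(con, sql):
    """Run sql and report failures as (status, detail) instead of raising."""
    try:
        return "ok", list(con.execute(sql))
    except ProgrammingError as exc:
        # e.g. syntax errors, table already exists
        return "programming_error", str(exc)
    except Error as exc:
        # any other pymapd error
        return "error", str(exc)
```

The more specific except clause must come first; otherwise the Error handler would catch ProgrammingError as well.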