API Reference

pymapd.connect(uri=None, user=None, password=None, host=None, port=6274, dbname=None, protocol='binary', sessionid=None)

Create a new Connection.

Parameters
uri: str
user: str
password: str
host: str
port: int
dbname: str
protocol: {‘binary’, ‘http’, ‘https’}
sessionid: str
Returns
conn: Connection

Examples

You can pass a string uri, all of the individual components, or an existing sessionid (in which case user, password, and dbname are not needed).

>>> connect('mapd://admin:HyperInteractive@localhost:6274/omnisci?'
...         'protocol=binary')
Connection(mapd://admin:***@localhost:6274/omnisci?protocol=binary)
>>> connect(user='admin', password='HyperInteractive', host='localhost',
...         port=6274, dbname='omnisci')
>>> connect(sessionid='XihlkjhdasfsadSDoasdllMweieisdpo', host='localhost',
...         port=6273, protocol='http')
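As a sketch of how the URI form relates to the individual components, the helper below assembles a URI in the accepted format. The helper is illustrative and not part of pymapd:

```python
def build_uri(user, password, host, port, dbname, protocol="binary"):
    # Assemble a connection URI in the form connect() accepts.
    return f"mapd://{user}:{password}@{host}:{port}/{dbname}?protocol={protocol}"

uri = build_uri("admin", "HyperInteractive", "localhost", 6274, "omnisci")
# connect(uri=uri)  # requires a running OmniSci server
```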
class pymapd.Connection(uri=None, user=None, password=None, host=None, port=6274, dbname=None, protocol='binary', sessionid=None)

Connect to your OmniSci database.

close(self)

Disconnect from the database

commit(self)

This is a no-op, as OmniSci does not provide transactions.

Implemented to comply with the DBI specification.

create_table(self, table_name, data, preserve_index=False)

Create a table from a pandas.DataFrame

Parameters
table_name: str
data: DataFrame
preserve_index: bool, default False

Whether to create a column in the table for the DataFrame index
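Conceptually, create_table maps the DataFrame's column dtypes to OmniSci column types and issues a CREATE TABLE. The mapping table and helper below are illustrative assumptions, not pymapd's actual implementation:

```python
# Illustrative dtype-to-SQL mapping (an assumption, not pymapd's real table).
_DTYPE_TO_SQL = {
    "int64": "BIGINT",
    "float64": "DOUBLE",
    "bool": "BOOLEAN",
    "object": "TEXT ENCODING DICT",
}

def sketch_create_stmt(table_name, columns):
    # columns: list of (name, pandas-dtype-string) pairs
    cols = ", ".join(
        "{} {}".format(name, _DTYPE_TO_SQL.get(dtype, "TEXT"))
        for name, dtype in columns
    )
    return "CREATE TABLE {} ({})".format(table_name, cols)

sketch_create_stmt("stocks", [("symbol", "object"), ("qty", "float64")])
```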

cursor(self)

Create a new Cursor object attached to this connection.

deallocate_ipc(self, df, device_id=0)

Deallocate a DataFrame using CPU shared memory.

Parameters
device_id: int

GPU which contains TDataFrame

deallocate_ipc_gpu(self, df, device_id=0)

Deallocate a DataFrame using GPU memory.

Parameters
device_id: int

GPU which contains TDataFrame

duplicate_dashboard(self, dashboard_id, new_name=None, source_remap=None)

Duplicate an existing dashboard, returning the new dashboard id.

Parameters
dashboard_id: int

The id of the dashboard to duplicate

new_name: str

The name for the new dashboard

source_remap: dict

EXPERIMENTAL: A dictionary remapping table names. Keys are the old table names; each value is another dict with a ‘name’ key holding the new table name. This structure may later be extended to support changing column names.

Examples

>>> source_remap = {'oldtablename1': {'name': 'newtablename1'}, 'oldtablename2': {'name': 'newtablename2'}}
>>> newdash = con.duplicate_dashboard(12345, "new dash", source_remap)
execute(self, operation, parameters=None)

Execute a SQL statement

Parameters
operation: str

A SQL statement to execute

Returns
c: Cursor
get_dashboards(self)

List all the dashboards in the database

Examples

>>> con.get_dashboards()
get_table_details(self, table_name)

Get the column names and data types associated with a table.

Parameters
table_name: str
Returns
details: List[ColumnDetails]

Examples

>>> con.get_table_details('stocks')
[ColumnDetails(name='date_', type='STR', nullable=True, precision=0,
               scale=0, comp_param=32, encoding='DICT'),
 ColumnDetails(name='trans', type='STR', nullable=True, precision=0,
               scale=0, comp_param=32, encoding='DICT'),
 ...
]
get_tables(self)

List all the tables in the database

Examples

>>> con.get_tables()
['flights_2008_10k', 'stocks']
load_table(self, table_name, data, method='infer', preserve_index=False, create='infer')

Load data into a table

Parameters
table_name: str
data: pyarrow.Table, pandas.DataFrame, or iterable of tuples
method: {‘infer’, ‘columnar’, ‘rows’, ‘arrow’}

Method to use for loading the data. Three options are available

  1. pyarrow and Apache Arrow loader

  2. columnar loader

  3. row-wise loader

The Arrow loader is typically the fastest, followed by the columnar loader, then the row-wise loader. If a DataFrame or pyarrow.Table is passed and pyarrow is installed, the Arrow-based loader is used. If pyarrow isn’t available, the columnar loader is used. Finally, if data is an iterable of tuples, the row-wise loader is used.

preserve_index: bool, default False

Whether to keep the index when loading a pandas DataFrame

create: {“infer”, True, False}

Whether to issue a CREATE TABLE before inserting the data.

  • infer: check to see if the table already exists, and create a table if it does not

  • True: attempt to create the table, without checking if it exists

  • False: do not attempt to create the table
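The loader-selection rules above can be sketched as a small function. The type-name check is a simplification of pymapd's actual dispatch and is only illustrative:

```python
def infer_load_method(data, arrow_available):
    # Simplified sketch of how load_table picks a loader when method='infer'.
    kind = type(data).__name__
    if kind in ("DataFrame", "Table", "RecordBatch") and arrow_available:
        return "arrow"
    if kind == "DataFrame":
        return "columnar"
    return "rows"  # assume an iterable of tuples
```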

load_table_arrow(self, table_name, data, preserve_index=False)

Load a pandas.DataFrame or a pyarrow Table or RecordBatch to the database using Arrow columnar format for interchange

Parameters
table_name: str
data: pandas.DataFrame, pyarrow.RecordBatch, pyarrow.Table
preserve_index: bool, default False

Whether to include the index of a pandas DataFrame when writing.

Examples

>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ['d', 'e', 'f']})
>>> con.load_table_arrow('foo', df, preserve_index=False)
load_table_columnar(self, table_name, data, preserve_index=False, chunk_size_bytes=0, col_names_from_schema=False)

Load a pandas DataFrame to the database using OmniSci’s Thrift-based columnar format

Parameters
table_name: str
data: DataFrame
preserve_index: bool, default False

Whether to include the index of a pandas DataFrame when writing.

chunk_size_bytes: integer, default 0

Chunk the loading of columns to prevent large Thrift requests. A value of 0 means do not chunk and send the dataframe as a single request

col_names_from_schema: bool, default False

Read the existing table schema to determine the column names. This reads the schema of an existing table in OmniSci and matches those names to the column names of the dataframe. It is convenient when the source data's columns are unordered, and especially handy when a table has a large number of columns.

Notes

Use pymapd >= 0.11.0 with omnisci >= 4.6.0 to avoid loading inconsistent values into DATE columns.

Examples

>>> df = pd.DataFrame({"a": [1, 2, 3], "b": ['d', 'e', 'f']})
>>> con.load_table_columnar('foo', df, preserve_index=False)
load_table_rowwise(self, table_name, data)

Load data into a table row-wise

Parameters
table_name: str
data: Iterable of tuples

Each element of data should be a row to be inserted

Examples

>>> data = [(1, 'a'), (2, 'b'), (3, 'c')]
>>> con.load_table_rowwise('bar', data)
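If the source data is a list of dicts rather than tuples, a small helper can put it into the shape load_table_rowwise expects. The helper and column names below are hypothetical:

```python
def rows_from_records(records, columns):
    # Flatten dict records into tuples in a fixed column order.
    return [tuple(rec[col] for col in columns) for rec in records]

records = [{"id": 1, "label": "a"}, {"id": 2, "label": "b"}]
rows = rows_from_records(records, ["id", "label"])
# con.load_table_rowwise('bar', rows)  # requires a live connection
```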
render_vega(self, vega, compression_level=1)

Render vega data on the database backend, returning the image as a PNG.

Parameters
vega: dict

The vega specification to render.

compression_level: int

The level of compression for the rendered PNG. Ranges from 0 (low compression, faster) to 9 (high compression, slower).
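A minimal sketch of building a vega dict and saving the rendered PNG. The spec fields shown are illustrative fragments, not a complete specification; the table and column names are assumptions:

```python
import json

# Illustrative, incomplete spec; real specs also include "marks", "scales", etc.
vega = {
    "width": 384,
    "height": 564,
    "data": [
        {"name": "points", "sql": "SELECT lon AS x, lat AS y FROM tweets"}
    ],
}
payload = json.dumps(vega)
# png_bytes = con.render_vega(vega, compression_level=1)  # requires a live server
# open("render.png", "wb").write(png_bytes)
```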

select_ipc(self, operation, parameters=None, first_n=-1, release_memory=True)

Execute a SELECT operation using CPU shared memory

Parameters
operation: str

A SQL select statement

parameters: dict, optional

Parameters to insert for a parametrized query

first_n: int, optional

Number of records to return

release_memory: bool, optional

Call self.deallocate_ipc(df) after the DataFrame is created

Returns
df: pandas.DataFrame

Notes

This method requires the Python code to be executed on the same machine where OmniSci is running.

select_ipc_gpu(self, operation, parameters=None, device_id=0, first_n=-1, release_memory=True)

Execute a SELECT operation using GPU memory.

Parameters
operation: str

A SQL statement

parameters: dict, optional

Parameters to insert into a parametrized query

device_id: int

GPU to return results to

first_n: int, optional

Number of records to return

release_memory: bool, optional

Call self.deallocate_ipc_gpu(df) after the DataFrame is created

Returns
gdf: cudf.GpuDataFrame

Notes

This method requires cudf and libcudf to be installed. An ImportError is raised if those aren’t available.

This method requires the Python code to be executed on the same machine where OmniSci is running.
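Since select_ipc_gpu raises ImportError without cudf, code that should degrade gracefully can probe for it and fall back to the CPU path. This helper is an illustration, not part of pymapd:

```python
def gpu_select_available():
    # select_ipc_gpu needs cudf/libcudf; probe for cudf without raising.
    try:
        import cudf  # noqa: F401
        return True
    except ImportError:
        return False

# query = "SELECT qty FROM stocks"
# df = con.select_ipc_gpu(query) if gpu_select_available() else con.select_ipc(query)
```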

class pymapd.Cursor(connection)

A database cursor.

property arraysize

The number of rows to fetch at a time with fetchmany. Default 1.

See also

fetchmany
close(self)

Close this cursor.

property description

Read-only sequence describing columns of the result set. Each column is an instance of Description describing

  • name

  • type_code

  • display_size

  • internal_size

  • precision

  • scale

  • null_ok

We only use name, type_code, and null_ok; the rest are always None.

execute(self, operation, parameters=None)

Execute a SQL statement.

Parameters
operation: str

A SQL query

parameters: dict

Parameters to substitute into operation.

Returns
self: Cursor

Examples

>>> c = conn.cursor()
>>> c.execute("select symbol, qty from stocks")
>>> list(c)
[('RHAT', 100.0), ('IBM', 1000.0), ('MSFT', 1000.0), ('IBM', 500.0)]

Passing in parameters:

>>> c.execute("select symbol, qty from stocks where qty <= :max_qty",
...           parameters={"max_qty": 500})
>>> list(c)
[('RHAT', 100.0), ('IBM', 500.0)]
executemany(self, operation, parameters)

Execute a SQL statement for many sets of parameters.

Parameters
operation: str
parameters: list of dict
Returns
results: list of lists
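Parameter sets for executemany are plain dicts, one per row. A sketch of building them from raw tuples (the table and parameter names here are illustrative):

```python
raw = [("RHAT", 100), ("IBM", 500)]
params = [{"symbol": s, "qty": q} for s, q in raw]
# c.executemany("insert into stocks values (:symbol, :qty)", params)  # live connection needed
```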
fetchmany(self, size=None)

Fetch size rows from the results set.

fetchone(self)

Fetch a single row from the results set
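A common pattern is draining a cursor in batches with fetchmany rather than fetching everything at once. The generator below works against any DB-API cursor; it is demonstrated with a stand-in cursor since it needs no live server:

```python
def iterate_in_batches(cursor, size=1000):
    # Yield rows one at a time, fetching `size` rows per round trip.
    while True:
        batch = cursor.fetchmany(size)
        if not batch:
            break
        yield from batch

class FakeCursor:
    # Stand-in cursor holding rows in memory, for demonstration only.
    def __init__(self, rows):
        self._rows = list(rows)

    def fetchmany(self, size):
        out, self._rows = self._rows[:size], self._rows[size:]
        return out

rows = list(iterate_in_batches(FakeCursor([(1,), (2,), (3,)]), size=2))
```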

Exceptions

Define exceptions as specified by the DB API 2.0 spec.

Includes some helper methods for translating thrift exceptions to the ones defined here.

exception pymapd.exceptions.Error

Base class for all pymapd errors.

exception pymapd.exceptions.InterfaceError

Raised whenever you use the pymapd interface incorrectly.

exception pymapd.exceptions.DatabaseError

Raised when the database encounters an error.

exception pymapd.exceptions.OperationalError

Raised for non-programmer related database errors, e.g. an unexpected disconnect.

exception pymapd.exceptions.IntegrityError

Raised when the relational integrity of the database is affected.

exception pymapd.exceptions.InternalError

Raised for errors internal to the database, e.g. an invalid cursor.

exception pymapd.exceptions.ProgrammingError

Raised for programming errors, e.g. syntax errors, table already exists.

exception pymapd.exceptions.NotSupportedError

Raised when an API not supported by the database is used.