API
DB, Array, and Operator
Classes for connecting to SciDB and executing queries.
- class scidbpy.db.DB(scidb_url=None, scidb_auth=None, http_auth=None, verify=None, admin=False, namespace=None, use_arrow=False, result_size_limit=256, no_ops=False)
SciDB Shim connection object.
>>> DB()
DB('http://localhost:8080', None, None, None, False, None, False, 256, False)
>>> print(DB())
scidb_url = http://localhost:8080
scidb_auth = None
http_auth = None
verify = None
admin = False
namespace = None
use_arrow = False
result_size_limit = 256
no_ops = False
Constructor parameters:
- Parameters
  - scidb_url (string) – SciDB connection URL. The URL for the Shim server. If None, use the value of the SCIDB_URL environment variable, if present (default http://localhost:8080)
  - scidb_auth (tuple) – Tuple with username and password for connecting to SciDB, if password authentication method is used (default None)
  - http_auth (tuple) – Tuple with username and password for connecting to Shim, if Shim authentication is used (default None)
  - verify (bool) – If False, HTTPS certificates are not verified. This value is passed to the Python requests library. See the SSL Cert Verification section of the Python requests library documentation for details on the verify argument (default None)
  - admin (bool) – Set to True to open a higher-priority session. This is identical to the --admin flag of the iquery SciDB client; see the SciDB documentation for details (default False)
  - namespace (string) – Initial namespace for the connection. Only applicable to SciDB Enterprise Edition. The namespace can be changed at any time using the set_namespace SciDB operator (default None)
  - use_arrow (bool) – If True, download SciDB array using the Apache Arrow library. Requires accelerated_io_tools and aio enabled in Shim. If True, a Pandas DataFrame is returned (as_dataframe has no effect) and nullable types are promoted as per the Pandas promotion scheme (dataframe_promo has no effect). It can be overridden for each iquery call (default False)
  - result_size_limit (int) – Absolute limit of the output file in megabytes. Effective only when the accelerated_io_tools plug-in is installed in SciDB and aio is enabled in Shim (default 256 MB)
  - no_ops (bool) – If True, the list of operators is not fetched at this time and the connection is not implicitly verified. This speeds up the constructor call but prevents calling the SciDB operators directly from the DB instance, e.g., db.scan (default False)
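The fallback order described for scidb_url (explicit argument, then the SCIDB_URL environment variable, then the default) can be sketched in plain Python. This is a minimal illustration only: resolve_scidb_url and its environ argument are hypothetical names introduced here and are not part of scidbpy.

```python
import os

DEFAULT_URL = 'http://localhost:8080'  # default stated in the parameter list


def resolve_scidb_url(scidb_url=None, environ=None):
    """Hypothetical helper mirroring the documented fallback order."""
    if environ is None:
        environ = os.environ
    if scidb_url is not None:
        return scidb_url                          # explicit argument wins
    return environ.get('SCIDB_URL', DEFAULT_URL)  # env variable, else default


# The environ argument is passed explicitly so the sketch is reproducible.
print(resolve_scidb_url('http://example.org:8080', environ={}))
print(resolve_scidb_url(environ={'SCIDB_URL': 'http://shim:8080'}))
print(resolve_scidb_url(environ={}))
```

Passing the environment as a dictionary keeps the sketch independent of the machine it runs on; a real connection would simply be opened with DB() or DB(scidb_url=...).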
- iquery(query, fetch=False, use_arrow=None, atts_only=False, as_dataframe=True, dataframe_promo=True, schema=None, upload_data=None, upload_schema=None)
Execute query in SciDB
- Parameters
  - query (string) – SciDB AFL query to execute
  - fetch (bool) – If True, download SciDB array (default False)
  - use_arrow (bool) – If True, download SciDB array using the Apache Arrow library. Requires accelerated_io_tools and aio enabled in Shim. If True, a Pandas DataFrame is returned (as_dataframe has no effect) and nullable types are promoted as per the Pandas promotion scheme (dataframe_promo has no effect). If None, the use_arrow value set at connection time is used (default None)
  - atts_only (bool) – If True, download only SciDB array attributes without dimensions (default False)
  - as_dataframe (bool) – If True, return a Pandas DataFrame. If False, return a NumPy array (default True)
  - dataframe_promo (bool) – If True, nullable types are promoted as per the Pandas promotion scheme. If False, object records are used for nullable types (default True)
  - schema – Schema of the SciDB array to use when downloading the array. Schema is not verified. If schema is a Schema instance, it is copied. Otherwise, a Schema object is built using Schema.fromstring (default None)
>>> DB().iquery('build(<x:int64>[i=0:1; j=0:1], i + j)', fetch=True)
   i  j    x
0  0  0  0.0
1  0  1  1.0
2  1  0  1.0
3  1  1  2.0
>>> DB().iquery("input({sch}, '{fn}', 0, '{fmt}')",
...             fetch=True,
...             upload_data=numpy.arange(3, 6))
   i  x
0  0  3
1  1  4
2  2  5
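The upload example uses the placeholders {sch}, {fn}, and {fmt} in the query text. The documentation does not state how they are filled in. The sketch below assumes str.format-style substitution, and every substituted value (schema, file name, format string) is made up for illustration.

```python
# Query template as it appears in the upload example.
query = "input({sch}, '{fn}', 0, '{fmt}')"

# Hypothetical values; the real ones would be derived from upload_data
# and upload_schema, and are not documented here.
filled = query.format(
    sch='<x:int64 not null>[i]',  # assumed schema of the uploaded data
    fn='/tmp/upload_0',           # assumed server-side file name
    fmt='(int64 not null)',       # assumed binary format string
)
print(filled)
```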
- class scidbpy.db.Operator(db, name, upload_data=None, upload_schema=None, *args)
Store SciDB operator and arguments. Hungry operators (e.g., remove, store, etc.) evaluate immediately. Lazy operators evaluate on data fetch.
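The difference between hungry and lazy evaluation can be illustrated without a SciDB server. SketchOperator below is a hypothetical stand-in, not the scidbpy implementation: it takes a run callable in place of a database connection and records which queries were executed and when.

```python
class SketchOperator:
    """Hypothetical stand-in for an operator; not scidbpy's implementation."""

    HUNGRY = {'remove', 'store'}  # operators named as hungry in the text

    def __init__(self, run, name, *args):
        self.run = run    # callable that "executes" a query string
        self.name = name
        self.args = args
        if name in self.HUNGRY:
            self.run(str(self))  # hungry: evaluate immediately

    def __str__(self):
        return '{}({})'.format(self.name, ', '.join(str(a) for a in self.args))

    def fetch(self):
        return self.run(str(self))  # lazy: evaluate only on data fetch


executed = []


def run(query):
    executed.append(query)  # record the query instead of contacting SciDB
    return query


scan = SketchOperator(run, 'scan', 'foo')  # lazy: nothing executed yet
SketchOperator(run, 'remove', 'bar')       # hungry: executed at construction
print(executed)                            # ['remove(bar)']
scan.fetch()
print(executed)                            # ['remove(bar)', 'scan(foo)']
```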
- scidbpy.db.connect
alias of scidbpy.db.DB
- scidbpy.db.iquery(self, query, fetch=False, use_arrow=None, atts_only=False, as_dataframe=True, dataframe_promo=True, schema=None, upload_data=None, upload_schema=None)
Execute query in SciDB
- Parameters
  - query (string) – SciDB AFL query to execute
  - fetch (bool) – If True, download SciDB array (default False)
  - use_arrow (bool) – If True, download SciDB array using the Apache Arrow library. Requires accelerated_io_tools and aio enabled in Shim. If True, a Pandas DataFrame is returned (as_dataframe has no effect) and nullable types are promoted as per the Pandas promotion scheme (dataframe_promo has no effect). If None, the use_arrow value set at connection time is used (default None)
  - atts_only (bool) – If True, download only SciDB array attributes without dimensions (default False)
  - as_dataframe (bool) – If True, return a Pandas DataFrame. If False, return a NumPy array (default True)
  - dataframe_promo (bool) – If True, nullable types are promoted as per the Pandas promotion scheme. If False, object records are used for nullable types (default True)
  - schema – Schema of the SciDB array to use when downloading the array. Schema is not verified. If schema is a Schema instance, it is copied. Otherwise, a Schema object is built using Schema.fromstring (default None)
>>> DB().iquery('build(<x:int64>[i=0:1; j=0:1], i + j)', fetch=True)
   i  j    x
0  0  0  0.0
1  0  1  1.0
2  1  0  1.0
3  1  1  2.0
>>> DB().iquery("input({sch}, '{fn}', 0, '{fmt}')",
...             fetch=True,
...             upload_data=numpy.arange(3, 6))
   i  x
0  0  3
1  1  4
2  2  5
Attribute, Dimension, and Schema
Classes for accessing SciDB data and schemas.
- class scidbpy.schema.Attribute(name, type_name, not_null=False, default=None, compression=None)
Represent SciDB array attribute
Construct an attribute using the Attribute constructor:
>>> Attribute('foo', 'int64', not_null=True)
Attribute(name='foo', type_name='int64', not_null=True, default=None, compression=None)
>>> Attribute('foo', 'int64', default=100, compression='zlib')
Attribute(name='foo', type_name='int64', not_null=False, default=100, compression='zlib')
Construct an attribute from a string:
>>> Attribute.fromstring('foo:int64')
Attribute(name='foo', type_name='int64', not_null=False, default=None, compression=None)
>>> Attribute.fromstring(
...     "taz : string NOT null DEFAULT '' compression 'bzlib'")
Attribute(name='taz', type_name='string', not_null=True, default="''", compression='bzlib')
- class scidbpy.schema.Dimension(name, low_value=None, high_value=None, chunk_overlap=None, chunk_length=None)
Represent SciDB array dimension
Construct a dimension using the Dimension constructor:
>>> Dimension('foo')
Dimension(name='foo', low_value=None, high_value=None, chunk_overlap=None, chunk_length=None)
>>> Dimension('foo', -100, '10', '?', '1000')
Dimension(name='foo', low_value=-100, high_value=10, chunk_overlap='?', chunk_length=1000)
Construct a dimension from a string:
>>> Dimension.fromstring('foo')
Dimension(name='foo', low_value=None, high_value=None, chunk_overlap=None, chunk_length=None)
>>> Dimension.fromstring('foo=-100:*:?:10')
Dimension(name='foo', low_value=-100, high_value='*', chunk_overlap='?', chunk_length=10)
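The examples above imply the field order name=low:high:overlap:length, with numeric fields converted to integers and placeholders such as * and ? kept as strings. A rough sketch of that mapping follows. parse_dimension is a hypothetical function written for illustration; it is not the parser used by Dimension.fromstring and handles only this simple form.

```python
def parse_dimension(text):
    """Hypothetical parser for 'name=low:high:overlap:length' strings."""
    name, _, spec = text.partition('=')
    fields = spec.split(':') if spec else []
    fields += [None] * (4 - len(fields))  # missing fields stay None

    def convert(value):
        # Keep None and placeholders such as '*' or '?'; cast numbers to int.
        try:
            return int(value)
        except (TypeError, ValueError):
            return value

    keys = ('low_value', 'high_value', 'chunk_overlap', 'chunk_length')
    result = {'name': name.strip()}
    result.update(zip(keys, (convert(f) for f in fields)))
    return result


print(parse_dimension('foo=-100:*:?:10'))
print(parse_dimension('foo'))
```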
- class scidbpy.schema.Schema(name=None, atts=(), dims=())
Represent SciDB array schema
Construct a schema using Schema, Attribute, and Dimension constructors:
>>> Schema('foo', (Attribute('x', 'int64'),), (Dimension('i', 0, 10),))
Schema(name='foo',
       atts=(Attribute(name='x', type_name='int64', not_null=False, default=None, compression=None),),
       dims=(Dimension(name='i', low_value=0, high_value=10, chunk_overlap=None, chunk_length=None),))
Construct a schema using the Schema constructor and the fromstring methods of Attribute and Dimension:
>>> Schema('foo',
...        (Attribute.fromstring('x:int64'),),
...        (Dimension.fromstring('i=0:10'),))
Schema(name='foo',
       atts=(Attribute(name='x', type_name='int64', not_null=False, default=None, compression=None),),
       dims=(Dimension(name='i', low_value=0, high_value=10, chunk_overlap=None, chunk_length=None),))
Construct a schema from a string:
>>> Schema.fromstring(
...     'foo@1<x:int64 not null, y:double>[i=0:*; j=-100:0:0:10]')
Schema(name='foo@1',
       atts=(Attribute(name='x', type_name='int64', not_null=True, default=None, compression=None),
             Attribute(name='y', type_name='double', not_null=False, default=None, compression=None)),
       dims=(Dimension(name='i', low_value=0, high_value='*', chunk_overlap=None, chunk_length=None),
             Dimension(name='j', low_value=-100, high_value=0, chunk_overlap=0, chunk_length=10)))
Print a schema constructed from a string:
>>> print(Schema.fromstring('<x:int64,y:float> [i=0:2:0:1000000; j=0:*]'))
<x:int64,y:float> [i=0:2:0:1000000; j=0:*]
Format a Schema object to print only the schema part, without the array name:
>>> '{:h}'.format(Schema.fromstring('foo<x:int64>[i]'))
'<x:int64> [i]'
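A custom format code such as h is normally implemented through Python's __format__ hook. The sketch below shows the general mechanism with a hypothetical SketchSchema class that stores the array name and schema body as plain strings. It does not reproduce how Schema is actually written.

```python
class SketchSchema:
    """Hypothetical class illustrating a custom format spec."""

    def __init__(self, name, body):
        self.name = name  # array name, e.g. 'foo'
        self.body = body  # schema part, e.g. '<x:int64> [i]'

    def __str__(self):
        return '{}{}'.format(self.name or '', self.body)

    def __format__(self, spec):
        if spec == 'h':                 # 'h': schema part only, no array name
            return self.body
        return format(str(self), spec)  # otherwise defer to str formatting


s = SketchSchema('foo', '<x:int64> [i]')
print('{:h}'.format(s))  # <x:int64> [i]
print('{}'.format(s))    # foo<x:int64> [i]
```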
- make_dims_atts()
Make attributes from dimensions and prepend them to the attributes list.
>>> s = Schema(None, (Attribute('x', 'bool'),), (Dimension('i'),))
>>> print(s)
<x:bool> [i]
>>> s.make_dims_atts()
>>> print(s)
<i:int64 NOT NULL,x:bool> [i]
>>> s = Schema.fromstring('<x:bool>[i;j]')
>>> s.make_dims_atts()
>>> print(s)
<i:int64 NOT NULL,j:int64 NOT NULL,x:bool> [i; j]
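The transformation shown in these examples amounts to creating one int64 NOT NULL attribute per dimension and placing those attributes before the existing ones. A minimal sketch on plain tuples follows. dims_to_atts and the (name, type_name, not_null) layout are assumptions made for illustration, not the scidbpy data model.

```python
def dims_to_atts(atts, dims):
    """Hypothetical sketch: prepend one int64 NOT NULL attribute per dimension.

    atts is a list of (name, type_name, not_null) tuples,
    dims is a list of dimension names.
    """
    dim_atts = [(name, 'int64', True) for name in dims]
    return dim_atts + list(atts)


print(dims_to_atts([('x', 'bool', False)], ['i', 'j']))
```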
- make_unique()
Make dimension and attribute names unique within the schema. Return True if any dimension or attribute was renamed.
>>> s = Schema(None, (Attribute('i', 'bool'),), (Dimension('i'),))
>>> print(s)
<i:bool> [i]
>>> s.make_unique()
True
>>> print(s)
<i:bool> [i_1]
>>> s = Schema.fromstring('<i:bool, i:int64>[i;i_1;i]')
>>> s.make_unique()
True
>>> print(s)
<i:bool,i_2:int64> [i_3; i_1; i_4]
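One renaming rule consistent with both examples is: walk attributes first, then dimensions; keep the first occurrence of each name; give each repeat the lowest numeric suffix not used by any original or newly assigned name. The sketch below implements that rule on plain lists of names. It is a reconstruction from the examples, and unique_names is a hypothetical function, so the actual make_unique may differ in cases the examples do not cover.

```python
def unique_names(att_names, dim_names):
    """Hypothetical reconstruction of the renaming shown in the examples."""
    names = list(att_names) + list(dim_names)
    taken = set(names)  # every original name is reserved
    seen = set()
    result = []
    for name in names:
        if name not in seen:  # first occurrence keeps its name
            seen.add(name)
            result.append(name)
            continue
        suffix = 1            # repeat: find the lowest free suffix
        while '{}_{}'.format(name, suffix) in taken:
            suffix += 1
        new_name = '{}_{}'.format(name, suffix)
        taken.add(new_name)
        seen.add(new_name)
        result.append(new_name)
    n = len(att_names)
    return result != names, result[:n], result[n:]


print(unique_names(['i'], ['i']))
print(unique_names(['i', 'i'], ['i', 'i_1', 'i']))
```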
- promote(data)
Promote nullable attributes in the DataFrame to types that support null values, following the Pandas promotion scheme.