API

DB, Array, and Operator

Classes for connecting to SciDB and executing queries.

class scidbpy.db.Array(db, name, gc=False)

Access to an individual array

head(n=5, **kwargs)

Similar to pandas.DataFrame.head. Makes use of the limit operator, if available.

class scidbpy.db.ArrayExp(exp)

Access to an individual attribute or dimension

class scidbpy.db.Arrays(db)

Access to arrays available in SciDB

class scidbpy.db.DB(scidb_url=None, scidb_auth=None, http_auth=None, verify=None, admin=False, namespace=None, use_arrow=False, result_size_limit=256, no_ops=False)

SciDB Shim connection object.

>>> DB()
... 
DB('http://localhost:8080',
   None,
   None,
   None,
   False,
   None,
   False,
   256,
   False)
>>> print(DB())
scidb_url         = http://localhost:8080
scidb_auth        = None
http_auth         = None
verify            = None
admin             = False
namespace         = None
use_arrow         = False
result_size_limit = 256
no_ops            = False

Constructor parameters:

Parameters
  • scidb_url (string) – SciDB connection URL. The URL for the Shim server. If None, use the value of the SCIDB_URL environment variable, if present (default http://localhost:8080)

  • scidb_auth (tuple) – Tuple with username and password for connecting to SciDB, if password authentication method is used (default None)

  • http_auth (tuple) – Tuple with username and password for connecting to Shim, if Shim authentication is used (default None)

  • verify (bool) – If False, HTTPS certificates are not verified. This value is passed to the Python requests library. See the SSL Cert Verification section of the requests documentation for details on the verify argument (default None)

  • admin (bool) – Set to True to open a higher-priority session. This is equivalent to the --admin flag of the iquery SciDB client. See the SciDB Documentation for details (default False)

  • namespace (string) – Initial namespace for the connection. Only applicable for SciDB Enterprise Edition. The namespace can be changed at any time using the set_namespace SciDB operator (default None)

  • use_arrow (bool) – If True, download the SciDB array using the Apache Arrow library. Requires accelerated_io_tools and aio enabled in Shim. If True, a Pandas DataFrame is returned (as_dataframe has no effect) and null-able types are promoted as per the Pandas promotion scheme (dataframe_promo has no effect). It can be overridden for each iquery call (default False)

  • result_size_limit (int) – Absolute limit on the size of the output file, in megabytes. Effective only when the accelerated_io_tools plug-in is installed in SciDB and aio is enabled in Shim (default 256 MB)

  • no_ops (bool) – If True, the list of operators is not fetched at this time and the connection is not implicitly verified. This speeds up the constructor, but SciDB operators cannot then be called directly from the DB instance, e.g., db.scan (default False)
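
The fallback order described for scidb_url can be pictured with a small sketch. This is not scidbpy code: resolve_scidb_url and DEFAULT_URL are hypothetical names, written only to illustrate the documented order (explicit argument, then the SCIDB_URL environment variable, then http://localhost:8080).

```python
import os

DEFAULT_URL = "http://localhost:8080"  # default stated in the parameter list


def resolve_scidb_url(scidb_url=None, environ=None):
    """Hypothetical helper: choose a URL in the order the docs describe."""
    environ = os.environ if environ is None else environ
    if scidb_url is not None:
        return scidb_url  # an explicit argument wins
    # Otherwise use SCIDB_URL when it is set, else the default.
    return environ.get("SCIDB_URL", DEFAULT_URL)


print(resolve_scidb_url("http://scidb.example.org:8080"))
print(resolve_scidb_url(environ={"SCIDB_URL": "http://10.0.0.5:8080"}))
print(resolve_scidb_url(environ={}))
```

The environ argument exists only so the sketch can be exercised without touching the real process environment.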

iquery(query, fetch=False, use_arrow=None, atts_only=False, as_dataframe=True, dataframe_promo=True, schema=None, upload_data=None, upload_schema=None)

Execute query in SciDB

Parameters
  • query (string) – SciDB AFL query to execute

  • fetch (bool) – If True, download SciDB array (default False)

  • use_arrow (bool) – If True, download the SciDB array using the Apache Arrow library. Requires accelerated_io_tools and aio enabled in Shim. If True, a Pandas DataFrame is returned (as_dataframe has no effect) and null-able types are promoted as per the Pandas promotion scheme (dataframe_promo has no effect). If None, the use_arrow value set at connection time is used (default None)

  • atts_only (bool) – If True, download only SciDB array attributes without dimensions (default False)

  • as_dataframe (bool) – If True, return a Pandas DataFrame. If False, return a NumPy array (default True)

  • dataframe_promo (bool) – If True, null-able types are promoted as per the Pandas promotion scheme. If False, object records are used for null-able types (default True)

  • schema – Schema of the SciDB array to use when downloading the array. Schema is not verified. If schema is a Schema instance, it is copied. Otherwise, a Schema object is built using Schema.fromstring (default None)

>>> DB().iquery('build(<x:int64>[i=0:1; j=0:1], i + j)', fetch=True)
   i  j    x
0  0  0  0.0
1  0  1  1.0
2  1  0  1.0
3  1  1  2.0
>>> DB().iquery("input({sch}, '{fn}', 0, '{fmt}')",
...             fetch=True,
...             upload_data=numpy.arange(3, 6))
   i  x
0  0  3
1  1  4
2  2  5
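
The second example leaves {sch}, {fn}, and {fmt} in the query string. One plausible reading, which is an assumption and not something this page states, is that they are named str.format fields filled in with the upload schema, the uploaded file name, and the upload format. The sketch below only shows that substitution step; all three values are invented for illustration.

```python
# Query template copied from the example above.
template = "input({sch}, '{fn}', 0, '{fmt}')"

# Hypothetical values: in a real call these would come from the library,
# not from the caller.
filled = template.format(
    sch="<x:int64 NOT NULL>[i]",
    fn="/tmp/upload_buffer",
    fmt="(int64)",
)
print(filled)
```
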

iquery_readlines(query)

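Execute query in SciDB and return the output as a list. As the examples show, each entry is a string for a single-attribute array, or a list of strings when the array has several attributes.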

>>> DB().iquery_readlines('build(<x:int64>[i=0:2], i * i)')
... 
[...'0', ...'1', ...'4']
>>> DB().iquery_readlines(
...   'apply(build(<x:int64>[i=0:2], i), y, i + 10)')
... 
[[...'0', ...'10'], [...'1', ...'11'], [...'2', ...'12']]

load_ops()

Get list of operators and macros.

next_array_name()

Generate a unique array name. Keep track of these names using the _uid field and a counter.

class scidbpy.db.Operator(db, name, upload_data=None, upload_schema=None, *args)

Store SciDB operator and arguments. Hungry operators (e.g., remove and store) evaluate immediately. Lazy operators evaluate on data fetch.
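
The difference between hungry and lazy evaluation can be shown with a toy model. ToyOperator, run, and sent are hypothetical names and the class is not the scidbpy Operator; it only mimics the behavior described above, under the assumption that a lazy operator nested inside a hungry one is rendered into the outer query text.

```python
sent = []  # stands in for the queries that reach the server


def run(query):
    """Pretend to execute a query and record that it was sent."""
    sent.append(query)
    return "result of " + query


class ToyOperator:
    """Toy model only; not the scidbpy Operator class."""

    HUNGRY = {"remove", "store"}  # the two hungry operators named above

    def __init__(self, name, *args):
        self.query = "{}({})".format(name, ", ".join(str(a) for a in args))
        # Hungry operators run as soon as they are built.
        self.result = run(self.query) if name in self.HUNGRY else None

    def __str__(self):
        return self.query

    def fetch(self):
        # Lazy operators run the first time their data is requested.
        if self.result is None:
            self.result = run(self.query)
        return self.result


lazy = ToyOperator("scan", "foo")
print(sent)  # nothing has been sent yet
hungry = ToyOperator("store", lazy, "bar")
print(sent)  # the store query ran at construction time
lazy.fetch()
print(sent)  # the scan query ran only on fetch
```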

class scidbpy.db.Shim(value)

An enumeration.

scidbpy.db.connect

alias of scidbpy.db.DB

scidbpy.db.iquery(self, query, fetch=False, use_arrow=None, atts_only=False, as_dataframe=True, dataframe_promo=True, schema=None, upload_data=None, upload_schema=None)

Execute query in SciDB

Parameters
  • query (string) – SciDB AFL query to execute

  • fetch (bool) – If True, download SciDB array (default False)

  • use_arrow (bool) – If True, download the SciDB array using the Apache Arrow library. Requires accelerated_io_tools and aio enabled in Shim. If True, a Pandas DataFrame is returned (as_dataframe has no effect) and null-able types are promoted as per the Pandas promotion scheme (dataframe_promo has no effect). If None, the use_arrow value set at connection time is used (default None)

  • atts_only (bool) – If True, download only SciDB array attributes without dimensions (default False)

  • as_dataframe (bool) – If True, return a Pandas DataFrame. If False, return a NumPy array (default True)

  • dataframe_promo (bool) – If True, null-able types are promoted as per the Pandas promotion scheme. If False, object records are used for null-able types (default True)

  • schema – Schema of the SciDB array to use when downloading the array. Schema is not verified. If schema is a Schema instance, it is copied. Otherwise, a Schema object is built using Schema.fromstring (default None)

>>> DB().iquery('build(<x:int64>[i=0:1; j=0:1], i + j)', fetch=True)
   i  j    x
0  0  0  0.0
1  0  1  1.0
2  1  0  1.0
3  1  1  2.0
>>> DB().iquery("input({sch}, '{fn}', 0, '{fmt}')",
...             fetch=True,
...             upload_data=numpy.arange(3, 6))
   i  x
0  0  3
1  1  4
2  2  5

Attribute, Dimension, and Schema

Classes for accessing SciDB data and schemas.

class scidbpy.schema.Attribute(name, type_name, not_null=False, default=None, compression=None)

Represent SciDB array attribute

Construct an attribute using Attribute constructor:

>>> Attribute('foo', 'int64', not_null=True)
... 
Attribute(name='foo',
          type_name='int64',
          not_null=True,
          default=None,
          compression=None)
>>> Attribute('foo', 'int64', default=100, compression='zlib')
... 
Attribute(name='foo',
          type_name='int64',
          not_null=False,
          default=100,
          compression='zlib')

Construct an attribute from a string:

>>> Attribute.fromstring('foo:int64')
... 
Attribute(name='foo',
          type_name='int64',
          not_null=False,
          default=None,
          compression=None)
>>> Attribute.fromstring(
...     "taz : string NOT null DEFAULT '' compression 'bzlib'")
... 
Attribute(name='taz',
          type_name='string',
          not_null=True,
          default="''",
          compression='bzlib')
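
To make the string form concrete, here is a minimal regex sketch that reproduces the two fromstring results above. It is not the parser scidbpy uses: parse_attribute and ATT_RE are hypothetical names, and the pattern assumes a default value without spaces.

```python
import re

# name : type [[NOT] NULL] [DEFAULT value] [COMPRESSION 'codec']
ATT_RE = re.compile(
    r"""
    \s*(?P<name>\w+)\s*:\s*(?P<type_name>\w+)
    (?:\s+(?P<not_null>NOT)?\s*NULL)?
    (?:\s+DEFAULT\s+(?P<default>\S+))?
    (?:\s+COMPRESSION\s+'(?P<compression>\w+)')?
    \s*$
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_attribute(text):
    """Hypothetical helper: split an attribute definition into its fields."""
    match = ATT_RE.match(text)
    if match is None:
        raise ValueError("not an attribute definition: {!r}".format(text))
    fields = match.groupdict()
    # The group is set only when the NOT keyword was present.
    fields["not_null"] = fields["not_null"] is not None
    return fields


print(parse_attribute("foo:int64"))
print(parse_attribute("taz : string NOT null DEFAULT '' compression 'bzlib'"))
```
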

class scidbpy.schema.Dimension(name, low_value=None, high_value=None, chunk_overlap=None, chunk_length=None)

Represent SciDB array dimension

Construct a dimension using the Dimension constructor:

>>> Dimension('foo')
... 
Dimension(name='foo',
          low_value=None,
          high_value=None,
          chunk_overlap=None,
          chunk_length=None)
>>> Dimension('foo', -100, '10', '?', '1000')
... 
Dimension(name='foo',
          low_value=-100,
          high_value=10,
          chunk_overlap='?',
          chunk_length=1000)

Construct a dimension from a string:

>>> Dimension.fromstring('foo')
... 
Dimension(name='foo',
          low_value=None,
          high_value=None,
          chunk_overlap=None,
          chunk_length=None)
>>> Dimension.fromstring('foo=-100:*:?:10')
... 
Dimension(name='foo',
          low_value=-100,
          high_value='*',
          chunk_overlap='?',
          chunk_length=10)
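
The examples suggest the string form name=low:high:overlap:length, with numeric tokens converted to integers and symbols such as * and ? kept as strings. The sketch below follows that reading; parse_dimension and _to_int are hypothetical helpers, not scidbpy functions.

```python
def _to_int(token):
    """Return an int when the token looks like one, else the token itself."""
    try:
        return int(token)
    except ValueError:
        return token


def parse_dimension(text):
    """Hypothetical helper: split 'name=low:high:overlap:length' into fields."""
    name, _, rest = text.partition("=")
    fields = ["low_value", "high_value", "chunk_overlap", "chunk_length"]
    tokens = [t.strip() for t in rest.split(":")] if rest else []
    # Fields missing from the string are reported as None.
    values = [_to_int(t) for t in tokens] + [None] * (len(fields) - len(tokens))
    return dict(name=name.strip(), **dict(zip(fields, values)))


print(parse_dimension("foo"))
print(parse_dimension("foo=-100:*:?:10"))
```
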

class scidbpy.schema.Schema(name=None, atts=(), dims=())

Represent SciDB array schema

Construct a schema using Schema, Attribute, and Dimension constructors:

>>> Schema('foo', (Attribute('x', 'int64'),), (Dimension('i', 0, 10),))
... 
Schema(name='foo',
       atts=(Attribute(name='x',
                       type_name='int64',
                       not_null=False,
                       default=None,
                       compression=None),),
       dims=(Dimension(name='i',
                       low_value=0,
                       high_value=10,
                       chunk_overlap=None,
                       chunk_length=None),))

Construct a schema using Schema constructor and fromstring methods of Attribute and Dimension:

>>> Schema('foo',
...        (Attribute.fromstring('x:int64'),),
...        (Dimension.fromstring('i=0:10'),))
... 
Schema(name='foo',
       atts=(Attribute(name='x',
                       type_name='int64',
                       not_null=False,
                       default=None,
                       compression=None),),
       dims=(Dimension(name='i',
                       low_value=0,
                       high_value=10,
                       chunk_overlap=None,
                       chunk_length=None),))

Construct a schema from a string:

>>> Schema.fromstring(
...     'foo@1<x:int64 not null, y:double>[i=0:*; j=-100:0:0:10]')
... 
Schema(name='foo@1',
       atts=(Attribute(name='x',
                       type_name='int64',
                       not_null=True,
                       default=None,
                       compression=None),
             Attribute(name='y',
                       type_name='double',
                       not_null=False,
                       default=None,
                       compression=None)),
       dims=(Dimension(name='i',
                       low_value=0,
                       high_value='*',
                       chunk_overlap=None,
                       chunk_length=None),
             Dimension(name='j',
                       low_value=-100,
                       high_value=0,
                       chunk_overlap=0,
                       chunk_length=10)))
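
A schema string has three parts: an optional array name, attributes inside <...> separated by commas, and dimensions inside [...] separated by semicolons. The sketch below splits a string along those lines. split_schema and SCHEMA_RE are hypothetical names, and the pattern assumes that no default value contains a comma or a closing angle bracket.

```python
import re

# optional-name <attributes> [dimensions]
SCHEMA_RE = re.compile(
    r"^\s*(?P<name>[^<\s]*)\s*<(?P<atts>[^>]*)>\s*\[(?P<dims>[^\]]*)\]\s*$"
)


def split_schema(text):
    """Hypothetical helper: return (name, attribute strings, dimension strings)."""
    match = SCHEMA_RE.match(text)
    if match is None:
        raise ValueError("not a schema: {!r}".format(text))
    return (
        match.group("name") or None,  # an empty name is reported as None
        [a.strip() for a in match.group("atts").split(",")],
        [d.strip() for d in match.group("dims").split(";")],
    )


print(split_schema("foo@1<x:int64 not null, y:double>[i=0:*; j=-100:0:0:10]"))
```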

Print a schema constructed from a string:

>>> print(Schema.fromstring('<x:int64,y:float> [i=0:2:0:1000000; j=0:*]'))
... 
<x:int64,y:float> [i=0:2:0:1000000; j=0:*]

Format Schema object to only print the schema part without the array name:

>>> '{:h}'.format(Schema.fromstring('foo<x:int64>[i]'))
'<x:int64> [i]'
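
A custom format code such as h is handled in Python by defining __format__ on the class. The toy class below shows the mechanism only. ToySchema is a hypothetical stand-in that stores the schema text as a plain string, and the way it renders a named schema is an assumption.

```python
class ToySchema:
    """Toy stand-in; not scidbpy.schema.Schema."""

    def __init__(self, name, body):
        self.name = name
        self.body = body

    def __str__(self):
        return (self.name or "") + self.body

    def __format__(self, spec):
        if spec == "h":
            return self.body  # schema part only, array name dropped
        # Any other spec is applied to the full string form.
        return format(str(self), spec)


s = ToySchema("foo", "<x:int64> [i]")
print("{}".format(s))
print("{:h}".format(s))
```
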

make_dims_atts()

Make attributes from dimensions and prepend them to the attributes list.

>>> s = Schema(None, (Attribute('x', 'bool'),), (Dimension('i'),))
>>> print(s)
<x:bool> [i]
>>> s.make_dims_atts()
>>> print(s)
<i:int64 NOT NULL,x:bool> [i]
>>> s = Schema.fromstring('<x:bool>[i;j]')
>>> s.make_dims_atts()
>>> print(s)
<i:int64 NOT NULL,j:int64 NOT NULL,x:bool> [i; j]
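
The transformation can be sketched on plain tuples: each dimension becomes an int64 NOT NULL attribute placed before the existing attributes, and the dimensions themselves stay as they are. dims_to_atts and render are hypothetical helpers that reproduce the second example above, not scidbpy code.

```python
def dims_to_atts(atts, dims):
    """atts: list of (name, type_name, not_null); dims: list of names."""
    return [(d, "int64", True) for d in dims] + list(atts)


def render(atts, dims):
    """Format attributes and dimensions in the style printed above."""
    att_text = ",".join(
        "{}:{}{}".format(name, type_name, " NOT NULL" if not_null else "")
        for name, type_name, not_null in atts
    )
    return "<{}> [{}]".format(att_text, "; ".join(dims))


atts = [("x", "bool", False)]
dims = ["i", "j"]
print(render(atts, dims))
print(render(dims_to_atts(atts, dims), dims))
```
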

make_unique()

Make dimension and attribute names unique within the schema. Return True if any dimension or attribute was renamed.

>>> s = Schema(None, (Attribute('i', 'bool'),), (Dimension('i'),))
>>> print(s)
<i:bool> [i]
>>> s.make_unique()
True
>>> print(s)
<i:bool> [i_1]
>>> s = Schema.fromstring('<i:bool, i:int64>[i;i_1;i]')
>>> s.make_unique()
True
>>> print(s)
<i:bool,i_2:int64> [i_3; i_1; i_4]
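
One renaming rule that reproduces both examples is: keep the first occurrence of a name, and give each later duplicate the smallest numeric suffix that is not already in use anywhere in the schema. The sketch below applies that rule to a flat list of names (attributes first, then dimensions). make_unique_names is a hypothetical function, and the rule is inferred from the examples, so it may differ from the library in other cases.

```python
def make_unique_names(names):
    """Return (new_names, changed) for a list of attribute and dimension names."""
    taken = set(names)  # every name in use, before or after renaming
    seen = set()        # names already emitted
    result = []
    for name in names:
        if name in seen:
            suffix = 1
            while "{}_{}".format(name, suffix) in taken:
                suffix += 1
            name = "{}_{}".format(name, suffix)
            taken.add(name)
        seen.add(name)
        result.append(name)
    return result, result != list(names)


# Attributes i, i followed by dimensions i, i_1, i (second example above).
print(make_unique_names(["i", "i", "i", "i_1", "i"]))
```
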

promote(data)

Promote nullable attributes in the DataFrame to types that can represent null values, as per the Pandas promotion scheme.