API

DB, Array, and Operator

Classes for connecting to SciDB and executing queries.

class scidbpy.db.Array(db, name, gc=False)[source]

Access to individual array

head(n=5, **kwargs)[source]

Similar to pandas.DataFrame.head. Makes use of the limit operator, if available.
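
A minimal usage sketch (the array foo is hypothetical and created here only for illustration; db.arrays.<name> access follows the Arrays class below):

    from scidbpy import connect

    db = connect()                                        # defaults to http://localhost:8080
    db.iquery('store(build(<x:int64>[i=0:9], i), foo)')   # create a small array "foo"
    db.arrays.foo.head(3)                                 # first 3 cells, pandas-style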

class scidbpy.db.ArrayExp(exp)[source]

Access to individual attribute or dimension

class scidbpy.db.Arrays(db)[source]

Access to arrays available in SciDB

class scidbpy.db.DB(scidb_url=None, scidb_auth=None, http_auth=None, verify=None, admin=False, namespace=None, use_arrow=False, page_size=1000000, result_size_limit=256, no_ops=False, inactivity_timeout=None, progress_check_secs=None, progress_check_callback=None, reauth_callback=None, reauth_tries=None, backoff_fn=None)[source]

SciDB connection object.

>>> DB()
... 
DB('https://...',
   ...,
   None,
   ...,
   False,
   None,
   False,
   256,
   False)
>>> print(DB())
scidb_url         = https://...
scidb_auth        = ...
http_auth         = None
verify            = ...
admin             = False
namespace         = None
use_arrow         = False
result_size_limit = 256
no_ops            = False

Parameters:
  • scidb_url (string) – SciDB connection URL. The URL for the Shim server or for the native client API. If None, use the value of the SCIDB_URL environment variable, if present (default http://localhost:8080)

  • scidb_auth (tuple) –

    Credentials for connecting to SciDB, if SciDB is configured to use password authentication. Either a (username, password) tuple, or the path to an authentication file which can be in INI or JSON format with the structure: {"user-name": "name", "user-password": "password"}.

    If not provided, credentials are read from a file in the first location that exists among:

    • $SCIDB_AUTH_FILE

    • $XDG_CONFIG_DIR/scidb/iquery.auth

    • ~/.config/scidb/iquery.auth

  • http_auth (tuple) – Tuple with username and password for connecting to Shim, if Shim authentication is used (default None)

  • verify (bool) –

    Either a bool, or a path to a cert file.

    • If True, the HTTPS certificate is verified against the system’s trusted CA store.

    • If False, the HTTPS certificate is not verified. This will generate a warning.

    • If a string, the string must be a path to a cert or ca-cert file. The connection’s HTTPS certificate is verified against that file.

    • If omitted or None, defaults to the setting in the SCIDBPY_VERIFY_HTTPS environment variable if present, otherwise defaults to True.

    See the SSL Cert Verification section of the Python requests library documentation for details on the verify argument (default None)

  • admin (bool) – Set to True to open a higher-priority session. This is identical to the --admin flag of the iquery SciDB client; see the SciDB documentation for details (default False)

  • namespace (string) – Initial namespace for the connection. Only applicable for SciDB Enterprise Edition. The namespace can be changed at any time using the set_namespace SciDB operator (default None)

  • use_arrow (bool) – If True, download the SciDB array using the Apache Arrow library. Requires accelerated_io_tools and aio enabled in Shim. If True, a Pandas DataFrame is returned (as_dataframe has no effect) and nullable types are promoted as per the Pandas promotion scheme (dataframe_promo has no effect). It can be overridden for each iquery call (default False)

  • page_size (int) – Maximum number of cells per page of output when executing paged queries, that is, non-upload queries that save their output. Client API only; ignored for Shim. (default 1,000,000)

  • result_size_limit (int) – Absolute limit of the output file in megabytes. Effective only when the accelerated_io_tools plug-in is installed in SciDB and aio is enabled in Shim (default 256 MB)

  • inactivity_timeout (int) – Seconds until the SciDB server cancels a paged query, unless the client requests another page. This should only need to be increased for multithreaded apps where a thread holding the GIL may interfere with paged response processing. Client API only; ignored for Shim. (default 60s)

  • no_ops (bool) – If True, the list of operators is not fetched at this time and the connection is not implicitly verified. This expedites the execution of the function but prevents calling SciDB operators directly from the DB instance, e.g., db.scan (default False)

  • progress_check_secs (int) – Client API only. After every interval of this many seconds, interrupt and resume the HTTP request so the client can check on the query’s progress, and so the server can confirm that this client is still active. The progress_check_callback is called, if provided. Setting this to a lower value provides more frequent progress updates, but might have the effect of slowing the query down because it gets interrupted and resumed more often. (This “progress check” is only enabled if the query is resumable.) (Default: 60 seconds)

  • progress_check_callback (func) –

    Client API only. A callback function of the form: func(query_info, page_number, response_number, cumulative_nbytes, ...) which returns None. This is called once when the page is first requested, and again after every progress_check_secs seconds while the query is executing. If the callback raises an exception, the query gets canceled.

    The callback arguments include:

    • query_info: an object with attributes that include id (the ID of the query) and schema (the SciDB schema returned by the query)

    • page_number: the page number being fetched (1 for the first page)

    • response_number: increments by 1 each time through the callback.

    • cumulative_nbytes: the cumulative total number of content bytes received for this page, including the current response and all previous responses for the page.

    The callback MUST have *args and **kwargs arguments to allow for future changes to the callback interface. It must ignore any arguments it doesn’t understand. (This callback is only enabled if the query is resumable.)

  • reauth_callback

    An optional function matching the signature:

    Callable[[requests.HTTPError], Optional[Tuple[str, str]]]
    

    or:

    Callable[[Exception], Optional[Tuple[str, str]]]
    

    The return value should be a (username, password) tuple or None.

    This function is called whenever a request fails with a 401 “Unauthorized” or 403 “Forbidden” response. The function can provide stored credentials, display a login prompt, trigger 2FA, etc. If the function returns None, the reauthentication is canceled and the error response is returned as usual.

    The function can inspect the HTTPError argument for details in order to generate a message to show the user (e.g. “session on <host> expired”, “second factor required to log in to <domain>”, “incorrect password”, etc.).

  • reauth_tries

    The maximum number of times to call reauth_callback() if a request returns a 401/403 response and subsequent reauthentication attempts also return 401/403.

    • Set this to 0 to disable reauthentication.

    • Set this to 1 (default) if the callback returns stored credentials (i.e. if repeating the callback won’t change anything).

    • Set this to >1 if the callback prompts the user for credentials and you want to let the user try again after making a mistake.

  • backoff_fn

    A callback function that waits some number of seconds. It should have a signature like:

    def backoff_fn(err: requests.HTTPError, delay: int)
    

    It is called if the server returns a 429 “Too Many Requests” response, letting the application perform other tasks while waiting for the server to become available.

    The HTTPError argument gives the app information it can use to display a message to the user, e.g. “the server is busy processing other queries”.

    Note that backoff_fn may choose to return earlier or later than delay. To stop waiting, it can raise the HTTPError it received as its first argument. See the connection example after this parameter list.
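
A hedged connection sketch combining several of the parameters above. The host, credentials, and callback bodies are placeholders; only the parameter names and the callback signatures documented in this list are assumed.

    import getpass
    import time

    from scidbpy import connect

    def reauth(err):
        # Called on a 401/403 response; return (user, password) or None to give up.
        print('Re-authentication required: {}'.format(err))
        user = input('Username: ')
        return (user, getpass.getpass('Password: ')) if user else None

    def backoff(err, delay):
        # Called on a 429 "Too Many Requests" response; wait before retrying.
        time.sleep(delay)

    db = connect(scidb_url='https://scidb.example.com:8083',  # placeholder URL
                 scidb_auth=('scidbadmin', 'secret'),         # or a path to an auth file
                 verify=False,                                # skip HTTPS verification (warns)
                 reauth_callback=reauth,
                 reauth_tries=3,                              # allow re-typing a bad password
                 backoff_fn=backoff)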

iquery(query, fetch=False, use_arrow=None, use_arrow_stream=False, atts_only=False, as_dataframe=True, dataframe_promo=True, schema=None, page_size=1000000, upload_data=None, upload_schema=None, **kwargs)[source]

Execute query in SciDB

Parameters:
  • query (string) – SciDB AFL query to execute

  • fetch (bool) – If True, download SciDB array (default False)

  • use_arrow (bool) –

    If True, download the SciDB array using the Apache Arrow library. Requires accelerated_io_tools and aio enabled in Shim. If True, a Pandas DataFrame is returned (as_dataframe has no effect) and nullable types are promoted as per the Pandas promotion scheme (dataframe_promo has no effect). If None, the use_arrow value set at connection time is used (default None)

  • use_arrow_stream (bool) – If True, return a RecordBatchStreamReader object to the user. The user will extract the records from the stream reader. This parameter only has effect if use_arrow is set to True (default False). See the streaming example after the query examples below.

  • atts_only (bool) – If True, download only SciDB array attributes without dimensions (default False)

  • as_dataframe (bool) – If True, return a Pandas DataFrame. If False, return a NumPy array (default True)

  • dataframe_promo (bool) –

    If True, nullable types are promoted as per the Pandas promotion scheme. If False, object records are used for nullable types (default True)

  • schema – Schema of the SciDB array to use when downloading the array. The schema is not verified. If schema is a Schema instance, it is copied. Otherwise, a Schema object is built using Schema.fromstring (default None)

  • page_size (int) – Maximum number of cells per page of output when executing paged queries, that is, non-upload queries that save their output. Client API only; ignored for Shim. (default 1,000,000)

>>> DB().iquery('build(<x:int64>[i=0:1; j=0:1], i + j)', fetch=True)
   i  j    x
0  0  0  0.0
1  0  1  1.0
2  1  0  1.0
3  1  1  2.0
>>> DB().iquery("input({sch}, '{fn}', 0, '{fmt}')",
...             fetch=True,
...             upload_data=numpy.arange(3, 6))
   i  x
0  0  3
1  1  4
2  2  5
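
A hedged sketch of the Arrow streaming path. It assumes accelerated_io_tools and aio are enabled in Shim, that an array named foo exists (a placeholder), and that the returned object follows the standard pyarrow RecordBatchStreamReader API:

    from scidbpy import connect

    db = connect()
    reader = db.iquery('scan(foo)',            # "foo" is a placeholder array name
                       fetch=True,
                       use_arrow=True,
                       use_arrow_stream=True)
    table = reader.read_all()                  # pyarrow.Table; batches can also be iterated
    print(table.to_pandas().head())
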
iquery_readlines(query, page_size=1000000, **kwargs)[source]

Execute query in SciDB and return the output split into lines (one entry per cell, or a list of values per cell when there are multiple attributes).

>>> DB().iquery_readlines('build(<x:int64>[i=0:2], i * i)')
... 
[...'0', ...'1', ...'4']
>>> DB().iquery_readlines(
...   'apply(build(<x:int64>[i=0:2], i), y, i + 10)')
... 
[[...'0', ...'10'], [...'1', ...'11'], [...'2', ...'12']]
load_ops()[source]

Get list of operators and macros.
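
For example (a sketch; the availability of operator methods such as db.build after load_ops() is an assumption based on the no_ops description above):

    from scidbpy import connect

    db = connect(no_ops=True)                # skip fetching the operator list at connect time
    db.load_ops()                            # fetch operators and macros now
    ar = db.build('<x:int64>[i=0:2]', 'i')   # operator methods on the DB instance now work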

next_array_name()[source]

Generate a unique array name. Keep track of these names using the _uid field and a counter.

uses_shim()[source]

Return True if this connection goes through Shim. False means the connection uses the SciDB Client HTTP API.
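
For example:

    from scidbpy import connect

    db = connect()                           # the URL determines the transport
    if db.uses_shim():
        print('connected through Shim')
    else:
        print('connected through the SciDB client HTTP API')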

class scidbpy.db.Operator(db, name, upload_data=None, upload_schema=None, *args)[source]

Store SciDB operator and arguments. Hungry operators (e.g., remove, store) evaluate immediately; lazy operators evaluate on data fetch.
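
A hedged sketch of the distinction (assumes no_ops=False so operators are attached to the DB instance, and that fetch() on a lazy operator triggers evaluation; the array name is a placeholder):

    from scidbpy import connect

    db = connect()
    lazy = db.build('<x:int64>[i=0:4]', 'i')   # lazy: only builds the expression
    df = lazy.fetch()                          # assumed: evaluation happens on fetch
    db.remove(db.arrays.foo)                   # hungry: executes immediately ("foo" is a placeholder)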

scidbpy.db.connect

alias of DB

scidbpy.db.iquery(self, query, fetch=False, use_arrow=None, use_arrow_stream=False, atts_only=False, as_dataframe=True, dataframe_promo=True, schema=None, page_size=1000000, upload_data=None, upload_schema=None, **kwargs)

Execute query in SciDB

Parameters:
  • query (string) – SciDB AFL query to execute

  • fetch (bool) – If True, download SciDB array (default False)

  • use_arrow (bool) –

    If True, download the SciDB array using the Apache Arrow library. Requires accelerated_io_tools and aio enabled in Shim. If True, a Pandas DataFrame is returned (as_dataframe has no effect) and nullable types are promoted as per the Pandas promotion scheme (dataframe_promo has no effect). If None, the use_arrow value set at connection time is used (default None)

  • use_arrow_stream (bool) – If True, return a RecordBatchStreamReader object to the user. The user will extract the records from the stream reader. This parameter only has effect if use_arrow is set to True (default False)

  • atts_only (bool) – If True, download only SciDB array attributes without dimensions (default False)

  • as_dataframe (bool) – If True, return a Pandas DataFrame. If False, return a NumPy array (default True)

  • dataframe_promo (bool) –

    If True, nullable types are promoted as per the Pandas promotion scheme. If False, object records are used for nullable types (default True)

  • schema – Schema of the SciDB array to use when downloading the array. The schema is not verified. If schema is a Schema instance, it is copied. Otherwise, a Schema object is built using Schema.fromstring (default None)

  • page_size (int) – Maximum number of cells per page of output when executing paged queries, that is, non-upload queries that save their output. Client API only; ignored for Shim. (default 1,000,000)

>>> DB().iquery('build(<x:int64>[i=0:1; j=0:1], i + j)', fetch=True)
   i  j    x
0  0  0  0.0
1  0  1  1.0
2  1  0  1.0
3  1  1  2.0
>>> DB().iquery("input({sch}, '{fn}', 0, '{fmt}')",
...             fetch=True,
...             upload_data=numpy.arange(3, 6))
   i  x
0  0  3
1  1  4
2  2  5

Attribute, Dimension, and Schema

Classes for accessing SciDB data and schemas.

class scidbpy.schema.Attribute(name, type_name, not_null=False, default=None, compression=None)[source]

Represent SciDB array attribute

Construct an attribute using Attribute constructor:

>>> Attribute('foo', 'int64', not_null=True)
... 
Attribute(name='foo',
          type_name='int64',
          not_null=True,
          default=None,
          compression=None)
>>> Attribute('foo', 'int64', default=100, compression='zlib')
... 
Attribute(name='foo',
          type_name='int64',
          not_null=False,
          default=100,
          compression='zlib')

Construct an attribute from a string:

>>> Attribute.fromstring('foo:int64')
... 
Attribute(name='foo',
          type_name='int64',
          not_null=False,
          default=None,
          compression=None)
>>> Attribute.fromstring(
...     "taz : string NOT null DEFAULT '' compression 'bzlib'")
... 
Attribute(name='taz',
          type_name='string',
          not_null=True,
          default="''",
          compression='bzlib')
class scidbpy.schema.Dimension(name, low_value=None, high_value=None, chunk_overlap=None, chunk_length=None)[source]

Represent SciDB array dimension

Construct a dimension using the Dimension constructor:

>>> Dimension('foo')
... 
Dimension(name='foo',
          low_value=None,
          high_value=None,
          chunk_overlap=None,
          chunk_length=None)
>>> Dimension('foo', -100, '10', '?', '1000')
... 
Dimension(name='foo',
          low_value=-100,
          high_value=10,
          chunk_overlap='?',
          chunk_length=1000)

Construct a dimension from a string:

>>> Dimension.fromstring('foo')
... 
Dimension(name='foo',
          low_value=None,
          high_value=None,
          chunk_overlap=None,
          chunk_length=None)
>>> Dimension.fromstring('foo=-100:*:?:10')
... 
Dimension(name='foo',
          low_value=-100,
          high_value='*',
          chunk_overlap='?',
          chunk_length=10)
class scidbpy.schema.Schema(name=None, atts=(), dims=())[source]

Represent SciDB array schema

Construct a schema using Schema, Attribute, and Dimension constructors:

>>> Schema('foo', (Attribute('x', 'int64'),), (Dimension('i', 0, 10),))
... 
Schema(name='foo',
       atts=(Attribute(name='x',
                       type_name='int64',
                       not_null=False,
                       default=None,
                       compression=None),),
       dims=(Dimension(name='i',
                       low_value=0,
                       high_value=10,
                       chunk_overlap=None,
                       chunk_length=None),))

Construct a schema using Schema constructor and fromstring methods of Attribute and Dimension:

>>> Schema('foo',
...        (Attribute.fromstring('x:int64'),),
...        (Dimension.fromstring('i=0:10'),))
... 
Schema(name='foo',
       atts=(Attribute(name='x',
                       type_name='int64',
                       not_null=False,
                       default=None,
                       compression=None),),
       dims=(Dimension(name='i',
                       low_value=0,
                       high_value=10,
                       chunk_overlap=None,
                       chunk_length=None),))

Construct a schema from a string:

>>> Schema.fromstring(
...     'foo@1<x:int64 not null, y:double>[i=0:*; j=-100:0:0:10]')
... 
Schema(name='foo@1',
       atts=(Attribute(name='x',
                       type_name='int64',
                       not_null=True,
                       default=None,
                       compression=None),
             Attribute(name='y',
                       type_name='double',
                       not_null=False,
                       default=None,
                       compression=None)),
       dims=(Dimension(name='i',
                       low_value=0,
                       high_value='*',
                       chunk_overlap=None,
                       chunk_length=None),
             Dimension(name='j',
                       low_value=-100,
                       high_value=0,
                       chunk_overlap=0,
                       chunk_length=10)))

Print a schema constructed from a string:

>>> print(Schema.fromstring('<x:int64,y:float> [i=0:2:0:1000000; j=0:*]'))
... 
<x:int64,y:float> [i=0:2:0:1000000; j=0:*]

Format Schema object to only print the schema part without the array name:

>>> '{:h}'.format(Schema.fromstring('foo<x:int64>[i]'))
'<x:int64> [i]'
make_dims_atts()[source]

Make attributes from dimensions and prepend them to the attributes list.

>>> s = Schema(None, (Attribute('x', 'bool'),), (Dimension('i'),))
>>> print(s)
<x:bool> [i]
>>> s.make_dims_atts()
>>> print(s)
<i:int64 NOT NULL,x:bool> [i]
>>> s = Schema.fromstring('<x:bool>[i;j]')
>>> s.make_dims_atts()
>>> print(s)
<i:int64 NOT NULL,j:int64 NOT NULL,x:bool> [i; j]
make_unique()[source]

Make dimension and attribute names unique within the schema. Return True if any dimension or attribute was renamed.

>>> s = Schema(None, (Attribute('i', 'bool'),), (Dimension('i'),))
>>> print(s)
<i:bool> [i]
>>> s.make_unique()
True
>>> print(s)
<i:bool> [i_1]
>>> s = Schema.fromstring('<i:bool, i:int64>[i;i_1;i]')
>>> s.make_unique()
True
>>> print(s)
<i:bool,i_2:int64> [i_3; i_1; i_4]
promote(data)[source]

Promote nullable attributes in the DataFrame to types that support null values, as per the Pandas promotion scheme.