SciDB Operators

What is a SciDB operator?

SciDB operators map zero or more input arguments to a SciDB array. Depending on the operator, the arguments may be a mix of constant scalar values, scalar-valued functions, and SciDB arrays. All operators produce output arrays. Some operators also produce side effects, such as modifying SciDB system state or saving data to files.

SciDB operators can be composed—that is, the output of an operator may provide array input into another operator.


aggregate

aggregate(array, aggregate_expression1 [as label1] [, aggregate_expression2 [as label2], ...] [, dimension1, dimension2, ...])

Aggregate values through a SciDB aggregation function, optionally grouped by the specified dimensions. See the index for a list of available aggregation functions.

store(build(<x:double>[i=1:10,10,0,j=1:5,5,0],i*j), A)
aggregate(A, avg(x) as mean, j)
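The grouping semantics of the example can be modeled outside SciDB. The following Python sketch is only an illustration: the dict-of-coordinates representation and the helper name aggregate_avg are assumptions made for this sketch, not part of SciDB.

```python
from collections import defaultdict

def aggregate_avg(cells, group_axis):
    """Average cell values grouped by the coordinate at position group_axis.

    cells maps coordinate tuples such as (i, j) to attribute values.
    Hypothetical helper that models aggregate(A, avg(x), j).
    """
    groups = defaultdict(list)
    for coords, value in cells.items():
        groups[coords[group_axis]].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

# Model of build(<x:double>[i=1:10,10,0,j=1:5,5,0], i*j)
A = {(i, j): float(i * j) for i in range(1, 11) for j in range(1, 6)}

# Group by j (axis 1): the mean of i*j over i=1..10 is 5.5*j
mean_by_j = aggregate_avg(A, group_axis=1)
print(mean_by_j)
```

Omitting the grouping dimension in the real operator corresponds to averaging over every cell at once.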

allversions

allversions(array)

Create an array that contains all the versions of an existing array, organized along a new dimension called "VersionNo".

store(build(<x:double>[i=1:10,10,0],i), A)
store(build(<x:double>[i=1:10,10,0],2*i), A)
store(build(<x:double>[i=1:10,10,0],3*i), A)
allversions(A)

analyze

analyze(array[,attribute1, attribute2, ...])

Create an array that characterizes the contents of an existing array. Each cell in the result array includes the following attributes:

  • attribute_number: An index for the one-dimensional result array.
  • attribute_name: The name of an attribute from the source array.
  • min: The lowest value for the attribute in the source array.
  • max: The highest value for the attribute in the source array.
  • distinct_count: An estimate of the number of different values appearing in the source array.
  • non_null_count: The number of cells in the array with non-null values for the attribute.

store(build_sparse(<x:double>[i=1:10,10,0],2*i,i<5), A)
analyze(A, x)

apply

apply(source_array,new_attribute1,function1[,new_attribute2,function2]...)

Create an array with new attributes defined by scalar functions of existing attributes and/or constants.

store(build(<x:double>[i=1:10,10,0],i), A)
apply(A, y, sqrt(x) + 1)
# To see a list of available scalar functions, run the query:
list('functions')

ApproxDC

ApproxDC(array[, attribute[, dimension_1, dimension_2, ...]])

Create an array that contains an estimate of the number of distinct values of an array attribute, optionally grouped along one or more dimensions.

store(build(<x:double>[i=1:10,10,0],i/2), A)
ApproxDC(A,x)

attribute_rename

attribute_rename(array,old_attribute1,new_attribute1[, old_attribute2,new_attribute2,...])

Create a duplicate array with renamed attributes.

store(build(<x:double>[i=1:10,10,0],i/2), A)
attribute_rename(A,x,z)

attributes

attributes(array)

Create an array describing attributes of an existing named array.

store(apply(build(<x:double>[i=1:10,10,0],i/2),y,5), A)
attributes(A)

avg_rank

avg_rank(array[, attribute[, dimension1[, dimension2, ...]]])

Create an array with an attribute that ranks an existing array attribute along one or more dimensions, averaging ties. The rank attribute name is the original attribute name plus "_rank". The original array attribute is also returned in the output array.

store(build(<x:double>[i=1:4,4,0,j=1:4,4,0],10*double(random())/2147483647 + 1),A)
avg_rank(A,x,i)
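"Averaging ties" means that tied values share the mean of the rank positions they occupy. The Python sketch below illustrates that rule on a plain list. It assumes ascending order with ranks starting at 1, and the function name is hypothetical and is not the SciDB implementation.

```python
def avg_rank(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Sorted positions pos..end (0-based) share the mean of ranks pos+1..end+1
        mean_rank = (pos + 1 + end + 1) / 2
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks

ranks = avg_rank([3.0, 1.0, 3.0, 2.0])
print(ranks)  # [3.5, 1.0, 3.5, 2.0]
```

The two values equal to 3.0 occupy positions 3 and 4, so each receives rank 3.5.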

between

between(array, low_coord1[, low_coord2, ...], high_coord1[, high_coord2, ...])

Create a new sparse array of the same shape as the input array with empty cells outside the specified rectangular coordinate range and copies of the input array cells elsewhere. Compare with the subarray operator.

store(build(<x:double>[i=1:10,10,0],i), A)
between(A, 3, 7)

build

build(array | schema, expression)

Create a single-attribute array with values defined by the expression. If an array is supplied, its schema will be used (that is, the first argument is either explicitly or implicitly a schema). The schema dimensions must have explicit bounds.

build(<x:double>[i=1:10,10,0],sqrt(i))
create_array(A,<x:double>[i=1:10,10,0])
build(A,sqrt(i))

cancel

cancel(query_id)

Cancel the active query with the specified query ID.

cast

cast(array, array | schema)

Duplicate an existing array, changing its schema. The example converts a string dimension to an integer dimension. Cast can also change dimension and attribute names.

create_array(A,<i:int64>[x(string)=10,10,0])
redimension_store(build(<x:string>[i=1:10,10,0],'x'+string(i)), A)
cast(A,<i:int64>[x=0:9,10,0])

cross_join

cross_join(array1 [as label1], array2 [as label2] [, dimension1, dimension2[, dimension3, dimension4, ...]])

Create the cross-product array of two arrays with equality predicates applied to pairs of dimensions. The arrays must have int64 dimension types. Use the 'as' keyword to label arrays for reference within the query (see the example).

cross_join(build(<x:double>[i=1:10,5,0,j=0:2,1,0],1) as A, build(<y:double>[i=1:10,5,0],2) as B, A.i, B.i)

dimensions

dimensions(array)

Create an array that describes the dimensions of the specified array.

store(build(<x:double>[i=1:10,5,0,j=0:2,1,0],1), A)
dimensions(A)

filter

filter(array, boolean_expression)

Create an array of the same shape as the specified array that copies cells that meet the Boolean expression and marks cells that don't meet the condition EMPTY. The expression may use attributes and/or dimensions.

store(build(<x:double>[i=1:10,5,0,j=0:2,1,0],5), A)
filter(A, x + i < 10)
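As a rough model of the example, the Python sketch below represents the array as a dict from (i, j) coordinates to the attribute x and keeps only cells that satisfy the predicate. Leaving a key out of the dict stands in for an EMPTY cell. The representation is an assumption made for illustration.

```python
# Model of build(<x:double>[i=1:10,5,0,j=0:2,1,0], 5)
A = {(i, j): 5.0 for i in range(1, 11) for j in range(0, 3)}

# Model of filter(A, x + i < 10): the predicate mixes attribute x and dimension i
kept = {(i, j): x for (i, j), x in A.items() if x + i < 10}

# Only rows i=1..4 survive; every other cell is treated as EMPTY
print(sorted({i for (i, j) in kept}))
```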

input

input(schema, input_file, instance_id, format [, max_num_errors])

Load data from a file, creating a new array. The schema parameter defines the data schema for the input array. The input_file parameter is a file path resolvable on specified instances. The instance_id parameter controls which instance(s) load the file as follows:

  • -2: Load all data using the coordinator instance of the query.
  • -1: Initiate the load from all instances. That is, the load is distributed to all instances, and the data is loaded concurrently.
  • 0: Load all data using the specified instance ID 0.
  • 5: Load all data using the specified instance ID 5, and so on for any other non-negative instance ID.

The format parameter is a character string specifying the file format; see the full documentation for examples. Set format to an empty character string to indicate default SciDB ASCII format.

The optional max_num_errors parameter specifies the maximum number of tolerated load errors with a default value of zero.

save(build(<x:double>[i=1:3,3,0,j=1:3,3,0],i+j),'/tmp/example.bin', 0, '(double)')
input(<v:double>[m=1:3,3,0,n=1:3,3,0],'/tmp/example.bin',-2,'(double)')

insert

insert(input, array)

Create an array that inserts specified input values into an existing named array. This operator has the side effect of also updating the original named array with the inserted values. The arrays must be conformable with int64 dimension types.

store(build_sparse(<x:double>[i=1:10,5,0], sqrt(i), double(i)/2 = i/2),A)
insert(build(<x:double>[i=1:10,5,0],i),A)

join

join(array1, array2)

Create an array that joins attributes from non-empty cells of two arrays at matching dimension values. The arrays must have the same dimension starting coordinates, chunk sizes and chunk overlaps.

store(build(<x:double>[i=1:10,5,0], sqrt(i)),A)
join(build(<y:double>[i=1:50,5,0], i*i),A)
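The matching rule can be pictured with a small Python model in which each array is a dict from coordinate to attribute value and the result pairs up attributes wherever both inputs have a non-empty cell. The helper name and the dict representation are assumptions of this sketch, not SciDB behavior in every detail.

```python
import math

def join(left, right):
    """Pair attribute values at coordinates present in both arrays."""
    return {c: (left[c], right[c]) for c in left if c in right}

# Model of the two inputs in the example
Y = {i: float(i * i) for i in range(1, 51)}    # <y:double>[i=1:50,...]
A = {i: math.sqrt(i) for i in range(1, 11)}    # <x:double>[i=1:10,...]

joined = join(Y, A)
print(joined[4])  # (16.0, 2.0)
```

In this model only coordinates 1 through 10 appear in the result, because the second array has no cells beyond i=10.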

list

list(element)

Create an array that lists the requested elements in SciDB. 'element' is a string value that may be one of:

  • 'arrays'
  • 'aggregates'
  • 'functions'
  • 'operators'
  • 'types'
  • 'queries'
  • 'instances'
  • 'libraries'

Leaving element unspecified implies list('arrays').

list('arrays')

merge

merge(array1, array2)

Create an array that merges the contents of two arrays. The arrays must have the same number of attributes, attribute types, starting dimension values, chunk sizes and chunk overlaps. Non-empty cells from "array1" appear in the output, as do non-empty cells from "array2" that correspond to empty cells in "array1". Both arrays must have int64 dimension types.

store(build_sparse(<x:double>[i=1:10,5,0], sqrt(i), double(i)/2 = i/2), A)
merge(A, build(<x:double>[i=1:10,5,0], i))
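The precedence rule (array1 wins wherever it has a non-empty cell) can be sketched in Python with dicts standing in for arrays. This is a model for illustration only. It assumes the build_sparse predicate in the example keeps the even values of i, and the helper name is hypothetical.

```python
import math

def merge(first, second):
    """Take cells from first; fill its empty cells from second."""
    out = dict(second)
    out.update(first)  # cells present in first override those from second
    return out

# Sparse array: sqrt(i) at even i only (assumed reading of the predicate)
A = {i: math.sqrt(i) for i in range(1, 11) if i % 2 == 0}
# Dense array: i at every coordinate
B = {i: float(i) for i in range(1, 11)}

M = merge(A, B)
print(M[2], M[3])  # sqrt(2) from A, 3.0 from B
```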

normalize

normalize(array)

Create a new array from the one-dimensional input array with a single, numeric-valued attribute, by dividing by the square root of the sum of the squares of the attribute values in the input array.

normalize(build(<val:double>[i=0:10,10,0],cos(i/3)))
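In other words, each value is divided by the Euclidean (L2) norm of the whole vector, so the result has unit length. A minimal Python sketch of that arithmetic on a plain list, with a hypothetical function name:

```python
import math

def normalize(values):
    """Divide each value by the square root of the sum of squares."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]

unit = normalize([3.0, 4.0])
print(unit)  # [0.6, 0.8]
```

The sum of squares of the output is 1, up to floating-point rounding.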

project

project(array, attribute[, attribute[,...]])

Create a duplicate of the specified array restricted to the listed attributes.

store(apply(build(<val:double>[i=0:10,10,0],cos(i/3)), y, sin(i/3)), A)
project(A, y)

rank

rank(array[, attribute[, dimension1[, dimension2, ...]]])

Create an array that assigns rank order to elements of the specified array.

store(build(<x:double>[i=1:10,10,0], random()%7/1.0), A)
rank(A, x)

redimension

redimension(source, schema[, aggregate (source_attribute) [as result_attribute]]...)

The redimension operator is a non-storing version of redimension_store described next. It creates an array that omits attributes or dimensions, promotes attributes from "source" as dimensions, or demotes dimensions from the "source" array to attributes, according to the "schema" parameter, optionally applying aggregates as it goes.

The "schema" parameter may only contain int64 dimension types, unlike the more general redimension_store operator described next.

The example swaps the roles of the dimension and the attribute of the build array: the integer dimension i becomes an integer attribute, and the integer attribute k becomes a dimension in the output array.

redimension(build(<k:int64>[i=1:10,10,0],i-100),<i:int64>[k=-99:*,10,0])

regrid

regrid(array, grid1[, grid2[, ...]], aggregate1[, aggregate2[, ...]])

Create an array that partitions the cells in the input array into blocks and, for each block, applies an aggregate operation over the values in the block. Regrid may only be applied to int64 integer dimensions. That means each grid argument must correspond to an integer-valued (of type int64) dimension, or be 1 (implying no aggregation over that dimension). The output array dimension values begin at the same starting values as the input array, counting sequentially up to the total number of grid divisions in each dimension.


The example produces a 3x5 array (the 9x10 input divided into 3x2 blocks) that sums values and computes the maximum value over each block of the input array.

store(build(<x:double>[i=2:10,10,0,j=1:10,10,0],10*double(random())/2147483647 + 1),A)
regrid(A, 3, 2, sum(x) as sum, max(x) as max)
# Recover original dimension values along the regridded data:
apply(regrid(A, 3, 2, sum(x) as sum, max(x) as max), i_original, 2 + (i-2)*3, j_original, 1 + (j-1)*2)
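The block arithmetic can be checked with a small Python model. It substitutes the deterministic values i+j for the random values in the example and uses a dict keyed by (i, j). Both choices, and the helper name, are assumptions of this sketch. With 3x2 blocks over the 9x10 input, the model yields 3 blocks along i and 5 along j.

```python
def regrid(cells, starts, grids, agg):
    """Aggregate cells over blocks; output coordinates begin at the input starts."""
    blocks = {}
    for coords, value in cells.items():
        key = tuple(s + (c - s) // g for c, s, g in zip(coords, starts, grids))
        blocks.setdefault(key, []).append(value)
    return {key: agg(vals) for key, vals in blocks.items()}

# Input with i=2..10 and j=1..10
A = {(i, j): float(i + j) for i in range(2, 11) for j in range(1, 11)}
summed = regrid(A, starts=(2, 1), grids=(3, 2), agg=sum)

# Same mapping as the apply() step: the first original coordinate of each block
origin = {(i, j): (2 + (i - 2) * 3, 1 + (j - 1) * 2) for (i, j) in summed}

print(len(summed), summed[(2, 1)], origin[(4, 5)])
```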

rename

rename(array, new_name)

Rename an array. The new name is specified without quoting. This operator returns nothing, and is only used for its side effect.

store(build(<x:double>[i=1:10,10,0,j=1:10,10,0],i+j),A)
rename(A, A1)

repart

repart(array1, array2 | schema)

Duplicate the contents of array1, applying the schema defined in array2 or schema. The schema must match attribute and dimension names and types from array1, but is free to define chunk size, chunk overlap, and dimension upper bounds. The operator returns the duplicated array and, if an output array is specified as array2, also stores its output there.

store(build(<x:double>[i=1:10,10,0,j=1:10,10,0],i+j),A)
repart(A, <x:double>[i=1:10,5,1,j=1:10,2,0])

reshape

reshape(array1, array2 | schema)

Duplicate the contents of array1 into the schema defined by the specified output array2 or schema. The schema must match the attributes of the source array1, and its dimensions must define an array with the same number of cells as array1. The reshape operator does not work on arrays with nonzero chunk overlap.

store(build(<x:double>[i=1:10,10,0,j=1:10,10,0],i+j),A)
reshape(A, <x:double>[i=1:100,10,0])
reshape(A, <x:double>[i=1:5,5,0,j=1:5,5,0,k=1:4,4,0])

reverse

reverse(array)

Create a copy of the input array with cells reversed in each array dimension.

store(build(<x:double>[i=1:10,10,0,j=1:10,10,1],i+j),A)
reverse(A)

save

save(array, path[, instance[, format]])

Save an array to the indicated path, returning the array to the caller. The instance indicates on which SciDB instance filesystem to save the file (see list('instances'); 0 means the coordinator instance). Format is a string that indicates the saved file format. Here are some possible options (see the SciDB reference guide for more):

Format string              Description
'lcsv+'                    Dimension and attribute values separated by commas
'dcsv'                     Dimension values in braces separated by commas, then attribute values separated by commas
'(type1[, type2[, ...]])'  Attribute values in binary save format (see below)

store(build(<x:double>[i=1:3,3,0,j=1:3,3,0],i+j),A)
save(A, '/tmp/A.csv', 0, 'lcsv+')
save(A, '/tmp/A.bin', 0, '(double)')

Examples of reading the output files produced by the example using GNU command-line utilities follow:

cat /tmp/A.csv        # ASCII comma-separated value format
od -tf8 /tmp/A.bin    # Binary output format

scan

scan(array)

Identity function—return the array.

store(build(<x:double>[i=1:3,3,0,j=1:3,3,0],i+j),A)
scan(A)

show

show(array | schema | SciDB query expression[, 'afl'])

Create an array containing the schema of the specified stored array, schema or SciDB query expression. When a SciDB query is specified, use the optional 'afl' argument if the query is in AFL format. The default assumes queries are in AQL format.

show('sort(build(<x:double>[i=1:3,3,0],random()))', 'afl')

slice

slice(array, dimension1, value1[, dimension2, value2[, ...]])


Create a new array that subsets the input array along specified dimension values. The output array has fewer dimensions than the input array.

store(build(<x:double>[i=1:3,3,0,j=1:3,3,0],i+j),A)
slice(A, i, 2)
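A Python model of the example shows how fixing one dimension removes it from the result. The dict representation and the helper name are assumptions made for illustration.

```python
def slice_array(cells, axis, value):
    """Keep cells whose coordinate on axis equals value, then drop that axis."""
    return {c[:axis] + c[axis + 1:]: v
            for c, v in cells.items() if c[axis] == value}

# Model of build(<x:double>[i=1:3,3,0,j=1:3,3,0], i+j)
A = {(i, j): float(i + j) for i in range(1, 4) for j in range(1, 4)}

# Model of slice(A, i, 2): a one-dimensional result indexed by j only
row = slice_array(A, axis=0, value=2)
print(row)
```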

sort

sort(array, attribute [asc | desc][, attribute [asc | desc][, ...]][, chunk_size])

Create an array containing a sorted representation of the input array by attribute values, optionally specifying ascending or descending order and output chunk size. The output array is one-dimensional with a single signed int64 dimension "n" (don't use sort on an input array attribute named "n" or you will get a name collision!).

If you don't specify an output array chunk size, then the output will have a chunk size equal to the extent of the data or one million, whichever is smaller.

The sorted output array will have zero chunk overlap.

store(build(<x:double>[i=1:10,10,0],10*double(random())/2147483647 + 1),A)
store(apply(A, s, 'x' + string(x)), B)
sort(B, x, s)

store

store(result, array)

Store the output result of a SciDB operator to the array name specified, returning the output array for use by the caller.

sum(store(build(<x:double>[i=1:10,10,0],i), A))
store(build(<x:double>[i=1:10,10,0],i), A)
sum(A)

subarray

subarray(array, low_coord1[, low_coord2[, ...]], high_coord1[, high_coord2[, ...]])

Create an array containing a clipped, rectilinear subset of the input array. The coordinates specified must, in order, indicate the low boundaries in each array dimension, followed by the high boundaries in each array dimension. Use the value null for a low or high coordinate to use the minimum and maximum coordinate values of the array.

Integer dimension coordinates in the output array are changed to start at zero. (Mapped non-integer coordinates are not changed.) Upper dimension bounds are adjusted to contain the extent of the data. This means that each output coordinate axis of subarray is limited to a 62-bit unsigned integer range.


store(build(<x:double>[i=1:10,10,0,j=1:10,10,0],10*double(random())/2147483647 + 1),A)
subarray(A, 2, 2, 5, 5)
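The difference between subarray and the between operator is the coordinate shift: between leaves the surviving cells at their original coordinates, while subarray moves them so each dimension starts at zero. The Python sketch below models both on a dict keyed by (i, j). It assumes inclusive bounds, uses deterministic values in place of the random ones, and the helper names are hypothetical.

```python
def between(cells, low, high):
    """Keep cells inside the inclusive box [low, high]; coordinates unchanged."""
    return {c: v for c, v in cells.items()
            if all(lo <= x <= hi for x, lo, hi in zip(c, low, high))}

def subarray(cells, low, high):
    """Keep cells inside the box and shift coordinates to start at zero."""
    return {tuple(x - lo for x, lo in zip(c, low)): v
            for c, v in between(cells, low, high).items()}

A = {(i, j): 10 * i + j for i in range(1, 11) for j in range(1, 11)}

kept = between(A, (2, 2), (5, 5))      # coordinates run from (2, 2) to (5, 5)
clipped = subarray(A, (2, 2), (5, 5))  # coordinates run from (0, 0) to (3, 3)
print(min(kept), min(clipped))
```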

substitute

substitute(array1, array2[, attribute1[, attribute2[, ...]]])

Create an array that substitutes null values in the indicated attributes in array1 with non-null values from the single-attribute array2. If no attributes are specified, null values in all attributes will be substituted. The single substitute value will be taken from the attribute value of array2 at dimension coordinate zero. Both array dimension start indices must be zero. The indicated output array attributes will be set to non-nullable.

The example creates an array with two attributes x and y with identical values that are null below coordinate i=5, then substitutes 0 for the null values in the y attribute.

store(apply(build(<x:double null>[i=0:9,10,0],iif(i<5,null,i)), y, x), A)
substitute(A, build(<u:double>[j=0:0,1,0],0), y)
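The effect of the example can be modeled in Python with None standing in for null and a list of dicts standing in for the two-attribute array. This representation and the helper name are assumptions of the sketch.

```python
def substitute(cells, replacement, attributes):
    """Replace None in the named attributes with the replacement value."""
    return [{name: (replacement if name in attributes and value is None else value)
             for name, value in cell.items()}
            for cell in cells]

# x and y are identical and null (None) below i=5
A = [{"x": None if i < 5 else float(i), "y": None if i < 5 else float(i)}
     for i in range(10)]

# Only y is listed, so nulls in x are left alone
B = substitute(A, 0.0, {"y"})
print(B[0], B[7])
```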

thin

thin(array, start1, step1, start2, step2, ...)

Create an array that selects data from an array at fixed intervals along each dimension. A dimension chunk size must be evenly divisible by its step size and must evenly divide the corresponding array dimension upper bound (in the case of an explicitly bounded dimension).

store(build(<x:double>[i=1:10,10,0,j=1:10,10,0], i+j),A)
thin(A, 2, 5, 1, 2)
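To see which input cells the example selects, the Python sketch below keeps the cells whose coordinates lie at start, start+step, start+2*step, and so on along each dimension. It reports the selected input coordinates only. How the output array numbers its coordinates is not modeled, and the helper name is hypothetical.

```python
def thin_select(cells, starts, steps):
    """Keep cells lying on the grid start, start+step, ... in every dimension."""
    return {c: v for c, v in cells.items()
            if all(x >= s and (x - s) % k == 0
                   for x, s, k in zip(c, starts, steps))}

A = {(i, j): i + j for i in range(1, 11) for j in range(1, 11)}

# Model of thin(A, 2, 5, 1, 2): start 2 step 5 along i, start 1 step 2 along j
picked = thin_select(A, starts=(2, 1), steps=(5, 2))
print(sorted({i for (i, j) in picked}), sorted({j for (i, j) in picked}))
```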

unpack

unpack(array, dimension_name[, chunk_size])

Create a 1-D array version of any array, converting input array dimension values to attributes in the output array. The output array has a single int64 dimension, named according to the dimension_name argument. (The output array resembles the SciDB "lcsv+" formatted output.)

store(build(<x:double>[i=1:10,10,0,j=1:10,10,0], i+j),A)
unpack(A, k)
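A Python model of the example: each cell becomes one record holding its former coordinates and its attribute value, and the list index plays the role of the new dimension k. The row-major ordering (last dimension varying fastest) and a zero starting index are assumptions of this sketch, as is the helper name.

```python
def unpack(cells, dim_names, attr_name):
    """Flatten cells into a list of records; the list index acts as dimension k."""
    rows = []
    for coords in sorted(cells):  # row-major order (assumed)
        row = dict(zip(dim_names, coords))
        row[attr_name] = cells[coords]
        rows.append(row)
    return rows

A = {(i, j): float(i + j) for i in range(1, 11) for j in range(1, 11)}
rows = unpack(A, dim_names=("i", "j"), attr_name="x")
print(rows[0], rows[10])
```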

versions

versions(array)

Create an array that contains version information for the specified array, which must be stored in SciDB. The version information includes a sequential ID and date/time of update.

store(build_sparse(<x:double>[i=1:10,5,0], sqrt(i), double(i)/2 = i/2),A)
versions(A)
insert(build(<x:double>[i=1:10,5,0],i),A)
versions(A)

xgrid

xgrid(array, scale1[, scale2[, ...]])

Create an array that prolongs each dimension of the specified input array by a scale factor. Input array cells are replicated to fill the new array. The number of supplied scale arguments must match the number of dimensions of the array. Each scale argument must be integer-valued (type int32).

store(build_sparse(<x:double>[i=1:4,5,0], i, double(i)/2 = i/2),A)
xgrid(A, 2)

The xgrid operator in some cases can be the inverse of the regrid operator:

store(build_sparse(<x:double>[i=1:4,4,0,j=1:4,4,0], i + j, double(j)/2 = j/2 or double(i)/2 = i/2),A)
xgrid(A, 2, 2)
regrid(xgrid(A,2,2), 2, 2, max(x) as x)
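The round trip can be checked with a small Python model. Each input cell is replicated into a block of scale-by-scale cells, and a block-wise maximum then collapses each block back to the original value. The dict representation, the assumed coordinate mapping, the reading of the sparse predicate as "i even or j even", and the helper names are all assumptions of this sketch.

```python
import itertools

def xgrid(cells, starts, scales):
    """Replicate every cell into a block of scale-many cells per dimension."""
    out = {}
    for coords, value in cells.items():
        ranges = [range(s + (c - s) * k, s + (c - s) * k + k)
                  for c, s, k in zip(coords, starts, scales)]
        for target in itertools.product(*ranges):
            out[target] = value
    return out

def regrid_max(cells, starts, grids):
    """Block-wise maximum; output coordinates begin at the input starts."""
    out = {}
    for coords, value in cells.items():
        key = tuple(s + (c - s) // g for c, s, g in zip(coords, starts, grids))
        out[key] = max(value, out.get(key, value))
    return out

A = {(i, j): float(i + j) for i in range(1, 5) for j in range(1, 5)
     if i % 2 == 0 or j % 2 == 0}

big = xgrid(A, starts=(1, 1), scales=(2, 2))
back = regrid_max(big, starts=(1, 1), grids=(2, 2))
print(len(A), len(big), back == A)
```

The inverse holds here because every cell in a replicated block carries the same value, so any aggregate that returns that value (max, min, or avg) would recover the input.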





SciDB Aggregation functions

SciDB aggregates are functions of several values that produce a scalar value. They are used together with the aggregate, redimension_store, and redimension operators to compute data aggregations, optionally grouped along specified dimensions.


approxdc

approxdc(attribute values)

Approximate distinct count aggregate. The output type will be unsigned int64, and will be set nullable with a default value of null.

store(build_sparse(<x:double>[i=1:10,5,0], sqrt(i), double(i)/2 = i/2),A)
aggregate(A,approxdc(x) as distinct)

avg

avg(attribute values)

Average of the aggregated attribute values. The input type must be integer, float or double. The output type will be double and will be set nullable with a default value of null.

store(build(<x:double>[i=1:10,10,0,j=1:5,5,0],i*j), A)
aggregate(A, avg(x) as mean, j)

count

count(attribute values)

Aggregated non-empty count of cells. The output type will be unsigned int64 and set nullable with a default value of null.

store(build(<x:double>[i=1:10,10,0,j=1:5,5,0],i*j), A)
aggregate(A, count(x) as count, j)

max

max(attribute values)

Maximum of aggregated attribute values. The attribute type must be able to be ordered. The output type is the same as the input type, but will be nullable with a default value of null.

store(build(<x:double>[i=1:10,10,0,j=1:5,5,0],i*j), A)
aggregate(A, max(x) as max, j)

min

min(attribute values)

Minimum of aggregated attribute values. The attribute type must be able to be ordered. The output type is the same as the input type, but will be nullable with a default value of null.

store(build(<x:double>[i=1:10,10,0,j=1:5,5,0],i*j), A)
aggregate(A, min(x) as min, j)

stdev

stdev(attribute values)

Standard deviation of the aggregated attribute values. The attribute type must be integer, float or double. The output type will be double and will be set nullable with a default value of null.

store(build(<x:double>[i=1:10,10,0,j=1:5,5,0],i*j), A)
aggregate(A, stdev(x) as stdev, j)

sum

sum(attribute values)

Sum the aggregated attribute values. The input attribute type must be integer, float or double. The output type will be double for double-valued inputs, float for float-valued inputs, and int64 for integer inputs, and will be set nullable with a default value of null.

store(build(<x:double>[i=1:10,10,0,j=1:5,5,0],i*j), A)
aggregate(A, sum(x) as sum, j)

var

var(attribute values)

Variance of the aggregated attribute values. The attribute type must be integer, float or double. The output type will be double and will be set nullable with a default value of null.

store(build(<x:double>[i=1:10,10,0,j=1:5,5,0],i*j), A)
aggregate(A, var(x) as var, j)





Other SciDB AFL commands


cancel

cancel(query_id)

Cancel a query. Nothing is returned.

load_library

load_library('plugin_name')

Load a SciDB plugin.

load_library('dense_linear_algebra')

remove

remove(array)

Remove an array.

store(build(<x:double>[i=1:10,10,0,j=1:10,10,0],i+j),A)
remove(A)