SciDB Operators
What is a SciDB operator?
SciDB operators map zero or more input
arguments to a SciDB array. The arguments may be a mix of constant
scalar values, scalar-valued functions,
and SciDB arrays depending on the operator. All
operators produce output arrays. Some operators also produce side-effects by
modifying SciDB system state or saving data to files, etc.
SciDB operators can be composed—that is, the output of an operator
may provide array input into another operator.
aggregate
aggregate(array, aggregate_expression1 [as label1] [,aggregate_expression2 as label2, ...] [,dimension1, dimension2, ...])
Aggregate values through a SciDB aggregation function, optionally grouped
by the specified dimensions. See the index for a list of available aggregation
functions.
store(build(<x:double>[i=1:10,10,0,j=1:5,5,0],i*j), A)
aggregate(A, avg(x) as mean, j)
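As a rough cross-check of what the grouped query above computes, here is a minimal plain-Python sketch. It is not SciDB code: the dictionary representation of the array and the variable names are illustrative assumptions.

```python
from collections import defaultdict

# Model the array A from the example: x = i*j for i=1..10, j=1..5.
cells = {(i, j): float(i * j) for i in range(1, 11) for j in range(1, 6)}

# Group by dimension j and average x within each group,
# mirroring aggregate(A, avg(x) as mean, j).
groups = defaultdict(list)
for (i, j), x in cells.items():
    groups[j].append(x)

mean = {j: sum(xs) / len(xs) for j, xs in sorted(groups.items())}
print(mean)  # one mean per value of j: {1: 5.5, 2: 11.0, 3: 16.5, 4: 22.0, 5: 27.5}
```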
allversions
allversions(array)
Create an array that contains all the versions of an existing array,
organized along a new dimension called "VersionNo".
store(build(<x:double>[i=1:10,10,0],i), A)
store(build(<x:double>[i=1:10,10,0],2*i), A)
store(build(<x:double>[i=1:10,10,0],3*i), A)
allversions(A)
apply
apply(source_array,new_attribute1,function1[,new_attribute2,function2]...)
Create an array with new attributes defined by scalar functions of existing attributes and/or constants.
store(build(<x:double>[i=1:10,10,0],i), A)
apply(A, y, sqrt(x) + 1)
# To see a list of available scalar functions, run the query:
list('functions')
ApproxDC
ApproxDC(array[,attribute[,dimension_1,dimension_2,...]])
Create an array that contains an estimate of the number of distinct values of an array attribute,
optionally grouped along one or more dimensions.
store(build(<x:double>[i=1:10,10,0],i/2), A)
ApproxDC(A,x)
attribute_rename
attribute_rename(array,old_attribute1,new_attribute1[, old_attribute2,new_attribute2,...])
Create a duplicate array with renamed attributes.
store(build(<x:double>[i=1:10,10,0],i/2), A)
attribute_rename(A,x,z)
attributes
attributes(array)
Create an array describing attributes of an existing named array.
store(apply(build(<x:double>[i=1:10,10,0],i/2),y,5), A)
attributes(A)
avg_rank
avg_rank(array[, attribute[, dimension1[, dimension2, ...]]])
Create an array with an attribute that ranks an existing array attribute along one or more dimensions, averaging ties. The rank attribute name is the original attribute name plus "_rank". The original array attribute is also returned in the output array.
avg_rank(
build(<x:double>[i=1:4,4,0,j=1:4,4,0],10*double(random())/2147483647 + 1),
x, i)
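The idea of averaging ties can be illustrated outside SciDB. The sketch below assumes the usual average-rank convention, in which tied values share the mean of the positions they occupy. The function name avg_rank is reused here only for illustration and this is not the operator's implementation.

```python
def avg_rank(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(values)
    ranks = {}
    for v in set(order):
        first = order.index(v) + 1          # first 1-based position of v
        count = order.count(v)              # number of values tied with v
        ranks[v] = first + (count - 1) / 2  # mean of first..first+count-1
    return [ranks[v] for v in values]

print(avg_rank([10.0, 20.0, 20.0, 30.0]))  # [1.0, 2.5, 2.5, 4.0]
```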
between
between(array, low_coord1[, low_coord2, ...], high_coord1[, high_coord2, ...])
Create a new sparse array
of the same shape as the input array with
empty cells outside the specified rectangular coordinate range and copies of
the input array cells elsewhere.
Compare with the
subarray operator.
store(build(<x:double>[i=1:10,10,0],i), A)
between(A, 3, 7)
build
build(array | schema, expression)
Create a single-attribute array with values defined by the expression. If an array is supplied,
its schema will be used (that is, the first argument is either explicitly or implicitly a schema).
The schema dimensions must have explicit bounds indicated.
build(<x:double>[i=1:10,10,0],sqrt(i))
create_array(A,<x:double>[i=1:10,10,0])
build(A,sqrt(i))
cancel
cancel(query_id)
Cancel the active query with the specified query ID.
cast
cast(array, array | schema)
Duplicate an existing array, changing its schema. The example converts
a string dimension to an integer dimension. Cast can also change dimension
and attribute names.
create_array(A,<i:int64>[x(string)=10,10,0])
redimension_store(build(<x:string>[i=1:10,10,0],'x'+string(i)), A)
cast(A,<i:int64>[x=0:9,10,0])
cross_join
cross_join(array1 [as label1], array2 [as label2] [, dimension1, dimension2[, dimension3, dimension4, ...]])
Create the cross-product array of two arrays with equality predicates applied to
pairs of dimensions. The arrays must have int64 dimension types. Use the
'as' keyword to label arrays for reference within the query (see the example).
cross_join(build(<x:double>[i=1:10,5,0,j=0:2,1,0],1) as A,
build(<y:double>[i=1:10,5,0],2) as B, A.i, B.i)
dimensions
dimensions(array)
Create an array that describes the dimensions of the specified array.
store(build(<x:double>[i=1:10,5,0,j=0:2,1,0],1), A)
dimensions(A)
filter
filter(array, boolean_expression)
Create an array of the same shape as the specified array that copies cells that
meet the Boolean expression and marks cells that don't meet the condition as empty. The
expression may use attributes and/or dimensions.
store(build(<x:double>[i=1:10,5,0,j=0:2,1,0],5), A)
filter(A, x + i < 10)
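To see which cells survive the example predicate, the following plain-Python sketch models the array as a dictionary keyed by (i, j). That representation is an assumption made purely for illustration, not SciDB code.

```python
# Model A from the example: x = 5 everywhere, i=1..10, j=0..2.
A = {(i, j): 5.0 for i in range(1, 11) for j in range(0, 3)}

# filter(A, x + i < 10): the predicate mixes attribute x and dimension i.
# Cells that fail the test are treated as empty (dropped from the dictionary).
kept = {(i, j): x for (i, j), x in A.items() if x + i < 10}

print(len(kept))                      # 12 non-empty cells remain
print(sorted({i for i, _ in kept}))   # only i = 1..4 pass the test
```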
input
input(schema, input_file, instance_id, format [, max_num_errors])
Load data from a file, creating a new array.
The schema parameter defines the data schema for the input array.
The input_file parameter is a file path resolvable on the specified instances.
The instance_id parameter controls which instance(s) load the file as follows:
- instance_id = -2: Load all data using the coordinator instance of the query.
- instance_id = -1: Initiate the load from all instances. That is, the load is distributed to all instances, and the data is loaded concurrently.
- instance_id = 0: Load all data using the specified instance ID 0.
- instance_id = 5: Load all data using the specified instance ID 5, etc. ...
The
format parameter is a character string specifying the file format--see the full documentation for examples. Set
format to an empty character string to indicate default SciDB ASCII format.
The optional
max_num_errors parameter specifies the maximum number of tolerated load errors with a default value of zero.
save(build(<x:double>[i=1:3,3,0,j=1:3,3,0],i+j),'/tmp/example.bin', 0, '(double)')
input(<v:double>[m=1:3,3,0,n=1:3,3,0],'/tmp/example.bin',-2,'(double)')
insert
insert(input, array)
Insert specified input values into an existing named array
(the output is a new version number of the named array input).
This operator has the side effect of also updating the original named array with
the inserted values. The arrays must have identical schemas.
Insert is essentially a fast version of
store(merge(X,Y),Y).
store(build_sparse(<x:double>[i=1:10,5,0], sqrt(i), double(i)/2 = i/2),A)
insert(build(<x:double>[i=1:10,5,0],i),A)
join
join(array1, array2)
Create an array that joins attributes from non-empty cells of two arrays at matching dimension values.
The arrays must have the same dimension starting coordinates, chunk sizes and chunk overlaps.
store(build(<x:double>[i=1:10,5,0], sqrt(i)),A)
join(build(<y:double>[i=1:50,5,0], i*i),A)
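The matching rule can be sketched in plain Python as an intersection of coordinates. The dictionaries below stand in for the two arrays of the example and are illustrative assumptions, not SciDB code.

```python
import math

A = {i: math.sqrt(i) for i in range(1, 11)}   # attribute x on i=1..10
B = {i: float(i * i) for i in range(1, 51)}   # attribute y on i=1..50

# join(B, A): only coordinates that are non-empty in both inputs appear
# in the output, and each output cell carries the attributes of both arrays.
joined = {i: (B[i], A[i]) for i in B if i in A}

print(sorted(joined))  # coordinates 1 through 10
print(joined[4])       # (16.0, 2.0)
```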
list
list(element)
Create an array that lists the requested elements in SciDB. 'element' is a string value that may be one of:
- 'arrays'
- 'aggregates'
- 'functions'
- 'operators'
- 'types'
- 'queries'
- 'instances'
- 'libraries'
Leaving element unspecified implies list('arrays').
list('arrays')
merge
merge(array1, array2)
Create an array that merges the contents of two arrays. The arrays must
have the same number of attributes, attribute types, starting dimension values, chunk sizes
and chunk overlaps. Non-empty cells from "array1" appear in the output, as do
non-empty cells from "array2" that correspond to empty cells in "array1." Both arrays
must have int64 dimension types.
store(build_sparse(<x:double>[i=1:10,5,0], sqrt(i), double(i)/2 = i/2), A)
merge(A, build(<x:double>[i=1:10,5,0], i))
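The precedence rule can be modeled with two dictionaries, as in the hedged sketch below. It assumes that the build_sparse predicate in the example populates only even values of i (integer division for i/2). The names A, B and merged are illustrative.

```python
import math

# Sparse A: sqrt(i) at even i only (assumed effect of the build_sparse predicate).
A = {i: math.sqrt(i) for i in range(1, 11) if i % 2 == 0}
# Dense second argument: x = i for i=1..10.
B = {i: float(i) for i in range(1, 11)}

# merge(A, B): cells of A win; B only fills coordinates that are empty in A.
merged = {**B, **A}

print(merged[3])  # 3.0, taken from B because A is empty at i=3
print(merged[4])  # 2.0, taken from A
```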
project
project(array, attribute[, attribute[,...]])
Create a duplicate of the specified array restricted to
the listed attributes.
store(apply(build(<val:double>[i=0:10,10,0],cos(i/3)), y, sin(i/3)), A)
project(A, y)
rank
rank(array[, attribute[, dimension1[, dimension2, ...]]])
Create an array that assigns rank order to elements of the specified attribute.
The rank attribute name is the original attribute name plus "_rank". The original array attribute is also returned in the output array.
rank(
build(<x:double>[i=1:10,10,0], random()%7/1.0),
x)
redimension
redimension(source, schema[, aggregate (source_attribute) [as result_attribute]]...)
The
redimension operator is a non-storing version of
redimension_store, described next. It creates an array that omits
attributes or dimensions, promotes attributes from "source" to dimensions, or
demotes dimensions from the "source" array to attributes, according to the
"schema" parameter, optionally applying aggregates as it goes.
The "schema" parameter may only contain int64 dimension types, unlike the
more general
redimension_store operator described next.
The example turns the integer attribute k of the build array into a dimension
and the integer dimension i into an attribute of the output array.
redimension(build(<k:int64>[i=1:10,10,0],i-100),<i:int64>[k=-99:*,10,0])
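The role swap in this example can be pictured with a small plain-Python sketch. The dictionary model is an illustrative assumption and not SciDB code.

```python
# build(<k:int64>[i=1:10,10,0], i-100): dimension i, attribute k = i - 100.
source = {i: i - 100 for i in range(1, 11)}

# redimension(..., <i:int64>[k=-99:*,10,0]): the old attribute k becomes the
# dimension and the old dimension i becomes the attribute.
result = {k: i for i, k in source.items()}

print(sorted(result))  # coordinates -99 through -90
print(result[-99])     # 1
```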
regrid
regrid(array, grid_1[, grid2[, ...]], aggregate1[, aggregate2[, ...]])
Create an array that partitions the cells in the input array into blocks, and
for each block, apply an aggregate operation over the values in the block.
Regrid may only be applied to int64 integer dimensions. That means each grid
argument must correspond to an integer-valued (of type int64) dimension, or be
1 (implying no aggregation over that dimension). The output array dimension
values begin at the same starting values as the input array, counting
sequentially up to the total number of grid divisions in each dimension.
The example produces a 3x5 array (9 cells along i in blocks of 3, 10 cells along j in blocks of 2) that sums values and computes the maximum value over 3x2 blocks from the input array.
store(build(<x:double>[i=2:10,10,0,j=1:10,10,0],10*double(random())/2147483647 + 1),A)
regrid(A, 3, 2, sum(x) as sum, max(x) as max)
# Recover original dimension values along the regridded data:
apply(
regrid(A, 3, 2, sum(x) as sum, max(x) as max),
i_original, 2 + (i-2)*3,
j_original, 1 + (j-1)*2
)
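The coordinate arithmetic behind the example can be checked with a short plain-Python sketch. It follows the description above (output coordinates start at the input starting value and count blocks sequentially). The helper name regrid_coord is hypothetical.

```python
def regrid_coord(c, start, grid):
    """Output coordinate of input coordinate c on an axis beginning at start."""
    return start + (c - start) // grid

# Input dimensions from the example: i=2..10 (grid 3), j=1..10 (grid 2).
out_i = sorted({regrid_coord(i, 2, 3) for i in range(2, 11)})
out_j = sorted({regrid_coord(j, 1, 2) for j in range(1, 11)})
print(out_i, out_j)  # [2, 3, 4] [1, 2, 3, 4, 5]

# Same formulas as the apply() step: first input coordinate of each block.
i_original = [2 + (i - 2) * 3 for i in out_i]
j_original = [1 + (j - 1) * 2 for j in out_j]
print(i_original, j_original)  # [2, 5, 8] [1, 3, 5, 7, 9]
```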
rename
rename(array, new_name)
Rename an array. The new name is specified without quoting. This
operator returns nothing, and is only used for its side effect.
store(build(<x:double>[i=1:10,10,0,j=1:10,10,0],i+j),A)
rename(A, A1)
repart
repart(array1, array2 | schema)
Duplicate the contents of array1, applying the schema
defined in array2 or schema. The schema must match attribute
and dimension names and types from array1, but is free to define
chunk size, chunk overlap, and dimension upper bounds.
The operator returns the duplicated
array and, if an output array is specified as array2, also stores
its output there.
store(build(<x:double>[i=1:10,10,0,j=1:10,10,0],i+j),A)
repart(A, <x:double>[i=1:10,5,1,j=1:10,2,0])
reshape
reshape(array1, array2 | schema)
Duplicate the contents of array1 into the schema defined by
the specified output array2 or schema. The schema must match
the attributes of the source array1, and its dimensions must
define an array with the same number of cells as array1.
The reshape operator does not work on arrays with nonzero
chunk overlap.
store(build(<x:double>[i=1:10,10,0,j=1:10,10,0],i+j),A)
reshape(A, <x:double>[i=1:100,10,0])
reshape(A, <x:double>[i=1:5,5,0,j=1:5,5,0,k=1:4,4,0])
save
save(array, path[, instance[, format]])
Save an array to the indicated path, returning the array to the caller.
The instance indicates on which SciDB instance filesystem to save the
file (see
list('instances')— 0 means the coordinator instance).
Format is a string that indicates the saved file format. Here are some
possible options (see the SciDB reference guide for more):
Format string | Description |
'lcsv+' | Dimension and attribute values separated by commas |
'dcsv' | Dimension values in braces separated by commas, then attribute values separated by commas |
'(type1[, type2[, ...]])' | Attribute values in binary save format (see below). |
store(build(<x:double>[i=1:3,3,0,j=1:3,3,0],i+j),A)
save(A, '/tmp/A.csv', 0, 'lcsv+')
save(A, '/tmp/A.bin', 0, '(double)')
Examples of reading the output files produced by the example using
GNU command-line utilities follow:
cat /tmp/A.csv # ASCII comma-separated value format
od -tf8 /tmp/A.bin # Binary output format
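The binary file can also be read programmatically. The sketch below is a minimal Python reader that assumes, consistent with the od -tf8 command above, that a '(double)' file with a single non-nullable attribute is a plain sequence of 8-byte doubles in native byte order. The function name read_doubles is hypothetical.

```python
import struct

def read_doubles(path):
    """Read a file of consecutive 8-byte doubles in native byte order."""
    with open(path, "rb") as f:
        data = f.read()
    count = len(data) // 8  # ignore any trailing partial value
    return list(struct.unpack(f"={count}d", data[: count * 8]))

# Example use, once the save() query above has written the file:
#   values = read_doubles('/tmp/A.bin')
```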
scan
scan(array)
Identity function—return the array.
store(build(<x:double>[i=1:3,3,0,j=1:3,3,0],i+j),A)
scan(A)
show
show(array | schema | SciDB query expression[, 'afl'])
Create an array containing the schema of the specified stored
array, schema or SciDB query expression. When a SciDB query is
specified, use the optional 'afl' argument if the query is in
AFL format. The default assumes queries are in AQL format.
show('sort(build(<x:double>[i=1:3,3,0],random()))', 'afl')
slice
slice(array, dimension1, value1[, dimension2, value2[, ...]])
Create a new array that subsets the input array
at the specified dimension values. The output array has fewer
dimensions than the input array.
store(build(<x:double>[i=1:3,3,0,j=1:3,3,0],i+j),A)
slice(A, i, 2)
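The loss of a dimension can be seen in a plain-Python sketch of the example. The dictionary model is an illustrative assumption, not SciDB code.

```python
# Model A from the example: x = i + j for i=1..3, j=1..3.
A = {(i, j): float(i + j) for i in range(1, 4) for j in range(1, 4)}

# slice(A, i, 2): fix i at 2 and drop that dimension,
# leaving a one-dimensional array over j.
sliced = {j: x for (i, j), x in A.items() if i == 2}

print(sliced)  # {1: 3.0, 2: 4.0, 3: 5.0}
```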
sort
sort(array, attribute [asc | desc][, attribute [asc | desc][, ...]][, chunk_size])
Create an array containing a sorted representation of the input array by
attribute values, optionally specifying ascending or descending order and
output chunk size. The output array is one-dimensional with a single signed
int64 dimension "n" (don't use sort on an input array attribute named "n"
or you will get a name collision!).
If you don't specify an output array chunk size, then the output will have
a chunk size equal to the extent of the data or one million, whichever is
smaller.
The sorted output array will have zero chunk overlap.
store(build(<x:double>[i=1:10,10,0],10*double(random())/2147483647 + 1),A)
store(apply(A, s, 'x' + string(x)), B)
sort(B, x, s)
store
store(result, array)
Store the output result of a SciDB operator to the array name
specified, returning the output array for use by the caller.
sum(store(build(<x:double>[i=1:10,10,0],i), A))
store(build(<x:double>[i=1:10,10,0],i), A)
sum(A)
subarray
subarray(array, low_coord1[, low_coord2[, ...]], high_coord1[, high_coord2[, ...]])
Create an array containing a clipped, rectilinear subset of the
input array. The coordinates specified must, in order, indicate the
low boundaries in each array dimension, followed by the high boundaries
in each array dimension. Use the value null for a low or high coordinate
to use the minimum or maximum coordinate value of the array, respectively.
Integer dimension coordinates in the output array are changed to start
at zero. (Mapped non-integer coordinates are not changed.) Upper dimension
bounds are adjusted to contain the extent of the data. This means that
each output coordinate axis of subarray is limited to a 62-bit unsigned
integer range.
store(build(<x:double>[i=1:10,10,0,j=1:10,10,0],10*double(random())/2147483647 + 1),A)
subarray(A, 2, 2, 5, 5)
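The difference between subarray and between comes down to what happens to the coordinates. The plain-Python sketch below uses deterministic values (x = 10*i + j instead of random numbers) so the shift is easy to see. The helper functions are illustrative models, not SciDB code.

```python
# Deterministic stand-in for A: x = 10*i + j for i=1..10, j=1..10.
A = {(i, j): 10 * i + j for i in range(1, 11) for j in range(1, 11)}

def in_box(coord, low, high):
    return all(lo <= c <= hi for c, lo, hi in zip(coord, low, high))

def between(cells, low, high):
    """Keep cells inside the box; coordinates are left unchanged."""
    return {c: x for c, x in cells.items() if in_box(c, low, high)}

def subarray(cells, low, high):
    """Keep cells inside the box and shift coordinates so the box starts at zero."""
    return {tuple(c - lo for c, lo in zip(coord, low)): x
            for coord, x in cells.items() if in_box(coord, low, high)}

B = between(A, (2, 2), (5, 5))
S = subarray(A, (2, 2), (5, 5))
print(B[(2, 2)], S[(0, 0)])  # 22 22: same cell, different coordinates
```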
substitute
substitute(array1, array2[, attribute1[, attribute2[, ...]]])
Create an array that substitutes null values in the indicated attributes in array1 with non-null values from the single-attribute array2.
If no attributes are specified, null values in all attributes will be
substituted.
The single substitute value will be taken from the attribute value of array2 at dimension coordinate zero.
Both array dimension start indices must be zero.
The indicated output array attributes will be set to non-nullable.
The example creates an array with two attributes x and y with identical values
that are null below coordinate i=5, then substitutes 0 for the null values in
the y attribute.
store(apply(build(<x:double null>[i=0:9,10,0],iif(i<5,null,i)), y, x), A)
substitute(A, build(<u:double>[j=0:0,1,0],0), y)
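The effect on the y attribute can be sketched in plain Python, using None to stand in for a SciDB null. The variable names are illustrative assumptions.

```python
# Attribute y from the example: null for i < 5, otherwise equal to i.
y = {i: (None if i < 5 else float(i)) for i in range(10)}

# The single replacement value, taken from array2 at coordinate zero.
replacement = 0.0

# substitute(A, ..., y): every null in y is replaced; other values are kept.
y_filled = {i: (replacement if v is None else v) for i, v in y.items()}

print([y_filled[i] for i in range(10)])
# [0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```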
unpack
unpack(array, dimension_name[, chunk_size])
Create a 1-D array version of any array, converting input array dimension values to attributes in the output array. The output array has a single int64 dimension, named according to the dimension_name argument. (The output array resembles
the SciDB "lcsv+" formatted output.)
store(build(<x:double>[i=1:10,10,0,j=1:10,10,0], i+j),A)
unpack(A, k)
versions
versions(array)
Create an array that contains version information for the specified array,
which must be stored in SciDB. The version information includes a sequential
ID and date/time of update.
store(build_sparse(<x:double>[i=1:10,5,0], sqrt(i), double(i)/2 = i/2),A)
versions(A)
insert(build(<x:double>[i=1:10,5,0],i),A)
versions(A)
xgrid
xgrid(array, scale1[, scale2[, ...]])
Create an array that prolongs each dimension of the specified input array
by a scale factor. Input array cells are replicated to fill the new
array. The number of supplied scale arguments must match the number of
dimensions of the array. Each scale argument must be integer-valued
(type int32).
store(build_sparse(<x:double>[i=1:4,5,0], i, double(i)/2 = i/2),A)
xgrid(A, 2)
The xgrid operator in some cases can be the inverse of the regrid operator:
store(build_sparse(<x:double>[i=1:4,4,0,j=1:4,4,0], i + j, double(j)/2 = j/2 or double(i)/2 = i/2),A)
xgrid(A, 2, 2)
regrid(xgrid(A,2,2), 2, 2, max(x) as x)
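The round trip can be modeled in one dimension with plain Python. This sketch assumes that xgrid keeps the starting coordinate and replicates each cell into consecutive positions, which mirrors the coordinate rule given for regrid. Both helper functions are simplified illustrations, not SciDB code.

```python
def xgrid(cells, start, scale):
    """Replicate each cell scale times along a 1-D axis that begins at start."""
    return {start + (i - start) * scale + k: x
            for i, x in cells.items() for k in range(scale)}

def regrid_max(cells, start, grid):
    """Take the maximum over consecutive blocks of grid cells."""
    out = {}
    for i, x in cells.items():
        b = start + (i - start) // grid
        out[b] = max(out.get(b, x), x)
    return out

A = {2: 2.0, 4: 4.0}      # sparse 1-D array on i=1..4, even i only
fine = xgrid(A, 1, 2)     # cells at i=3,4 and i=7,8
print(sorted(fine))       # [3, 4, 7, 8]
print(regrid_max(fine, 1, 2) == A)  # True: regrid undoes xgrid here
```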
SciDB Aggregation functions
SciDB aggregates are functions of several values that produce a
scalar value. They are used together with the aggregate,
redimension_store, and
redimension operators to compute data aggregations, optionally
grouped along specified dimensions.
approxdc
approxdc(attribute values)
Approximate distinct count aggregate. The output type will be
unsigned int64, and will be set nullable with a default value of null.
store(build_sparse(<x:double>[i=1:10,5,0], sqrt(i), double(i)/2 = i/2),A)
aggregate(A,approxdc(x) as distinct)
avg
avg(attribute values)
Average values of the aggregated attribute. The input type must
be integer, float or double. The output type will be double
and will be set nullable with a default value of null.
store(build(<x:double>[i=1:10,10,0,j=1:5,5,0],i*j), A)
aggregate(A, avg(x) as mean, j)
count
count(attribute values)
Aggregated non-empty count of cells. The output type
will be unsigned int64 and set nullable with a default value of null.
store(build(<x:double>[i=1:10,10,0,j=1:5,5,0],i*j), A)
aggregate(A, count(x) as count, j)
max
max(attribute values)
Maximum of aggregated attribute values. The attribute type must
be able to be ordered. The output type is the same as the input
type, but will be nullable with a default value of null.
store(build(<x:double>[i=1:10,10,0,j=1:5,5,0],i*j), A)
aggregate(A, max(x) as max, j)
min
min(attribute values)
Minimum of aggregated attribute values. The attribute type must
be able to be ordered. The output type is the same as the input
type, but will be nullable with a default value of null.
store(build(<x:double>[i=1:10,10,0,j=1:5,5,0],i*j), A)
aggregate(A, min(x) as min, j)
stdev
stdev(attribute values)
Standard deviation of the aggregated attribute values. The attribute
type must be integer, float or double. The output type will be double
and will be set nullable with a default value of null.
store(build(<x:double>[i=1:10,10,0,j=1:5,5,0],i*j), A)
aggregate(A, stdev(x) as stdev, j)
sum
sum(attribute values)
Sum the aggregated attribute values. The input attribute
type must be integer, float or double. The output type will be double
for double-valued inputs, float for float-valued inputs, and
int64 for integer inputs,
and will be set nullable with a default value of null.
store(build(<x:double>[i=1:10,10,0,j=1:5,5,0],i*j), A)
aggregate(A, sum(x) as sum, j)
var
var(attribute values)
Variance of the aggregated attribute values. The attribute
type must be integer, float or double. The output type will be double
and will be set nullable with a default value of null.
store(build(<x:double>[i=1:10,10,0,j=1:5,5,0],i*j), A)
aggregate(A, var(x) as var, j)
Other SciDB AFL commands
cancel
cancel(query_id)
Cancel a query. Nothing is returned.
load_library
load_library('plugin_name')
Load a SciDB plugin.
load_library('dense_linear_algebra')
remove
remove(array)
Remove an array.
store(build(<x:double>[i=1:10,10,0,j=1:10,10,0],i+j),A)
remove(A)