Binding SciDB arrays and query expressions to R variables
SciDB array view objects
The scidb() function returns an object of class scidb that contains a reference to a SciDB array or query expression. The returned object is presented with a data frame-like view that lists the SciDB schema components.
x <- scidb::scidb(db, "build(<v:double>[i=1:2,2,0, j=1:3,1,0], i*j)")
x
SciDB expression build(<v:double>[i=1:2,2,0, j=1:3,1...
SciDB schema <v:double> [i=1:2:0:2; j=1:3:0:1]
variable dimension type nullable start end chunk
1 i TRUE int64 FALSE 1 2 2
2 j TRUE int64 FALSE 1 3 1
3 v FALSE double TRUE
The R variable x is a kind of SciDB array view: an unevaluated SciDB query expression. SciDB evaluates x lazily when its value is needed, or when evaluation is explicitly requested. x is an S4 R object whose “name” slot contains the AFL expression corresponding to the object.
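For reference, the values that x describes can be computed locally in base R. This sketch only shows what the build() expression denotes (one cell per (i, j) pair, with attribute v = i*j); it is not how SciDB evaluates it, and the variable name cells is illustrative.

```r
# Local, base-R illustration of the values described by
# build(<v:double>[i=1:2,2,0, j=1:3,1,0], i*j):
# one row per cell, dimensions i and j as columns, attribute v = i * j
cells <- expand.grid(i = 1:2, j = 1:3)
cells$v <- as.numeric(cells$i * cells$j)  # v is declared double in the schema
cells
```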
Storing query results in SciDB
Use the store() function to evaluate and materialize views in SciDB. The store() function stores the evaluation result in a new named SciDB array in the database and returns a new scidb R object that points to that array. The array name can be specified explicitly or generated automatically, and the result can optionally be stored as a SciDB temporary array.
y <- scidb::store(db, x, temp=TRUE)
y
SciDB expression R_arrayc22c32414c07_173133300951694...
SciDB schema <v:double> [i=1:2:0:2; j=1:3:0:1]
variable dimension type nullable start end chunk
1 i TRUE int64 FALSE 1 2 2
2 j TRUE int64 FALSE 1 3 1
3 v FALSE double TRUE
Note that the SciDB expression associated with the y R variable is now a named SciDB array (automatically named in this case); compare with the SciDB expression for x above.
Lifetime and garbage collection
By default, SciDB arrays associated with R variables are tied to R’s garbage collector (unless the argument gc=FALSE is specified): when R garbage-collects the variable’s contents, the associated SciDB array is removed.
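# observe that y's corresponding array is in the list
yname <- y@name
yname %in% scidb::ls(db)$name
[1] TRUE
# remove the R variable and run R's garbage collector
rm(y)
gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1061084 56.7 2110147 112.7 1481941 79.2
Vcells 1876948 14.4 8388608 64.0 3325121 25.4
# observe that y's corresponding array is no longer in the list
yname %in% scidb::ls(db)$name
[1] FALSE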
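The general base-R mechanism for tying cleanup of an external resource to the garbage collector is a finalizer registered with reg.finalizer(). The sketch below assumes, without confirming it from this document, that the package works along these lines; the on_collect() function and the removed log are hypothetical stand-ins for removing a SciDB array, not scidb package code.

```r
# Hypothetical log standing in for "arrays removed from SciDB"
removed <- character(0)

# Hypothetical finalizer: called by R's garbage collector once the
# environment it is registered on becomes unreachable
on_collect <- function(env) removed <<- c(removed, env$name)

# An environment holding an (illustrative) array name, with a finalizer
y <- new.env()
y$name <- "R_array_example"
invisible(reg.finalizer(y, on_collect))

y_name <- y$name
rm(y)          # drop the only reference to the environment
invisible(gc())  # a full collection runs the pending finalizer
y_name %in% removed
```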
The schema() function
Use the schema() function to display the SciDB schema of a scidb object, either verbatim or parsed into details of its attributes and dimensions.
scidb::schema(x)
[1] "<v:double> [i=1:2:0:2; j=1:3:0:1]"
scidb::schema(x, "attributes")
name type nullable
1 v double TRUE
scidb::schema(x, "dimensions")
name start end chunk overlap
1 i 1 2 2 0
2 j 1 3 1 0
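Comparing the verbatim string with the parsed dimensions shows that each dimension is written as name=start:end:overlap:chunk (i=1:2:0:2 is start 1, end 2, overlap 0, chunk 2). The hypothetical base-R helper parse_dims() below splits a schema string that way to make the correspondence explicit; it is only a sketch under that layout assumption, not part of the scidb package, which already provides scidb::schema(x, "dimensions").

```r
# Hypothetical helper (not in the scidb package): split the dimension part
# of a SciDB schema string, assuming name=start:end:overlap:chunk entries
parse_dims <- function(schema) {
  # Text between the square brackets, e.g. "i=1:2:0:2; j=1:3:0:1"
  dims <- sub(".*\\[(.*)\\].*", "\\1", schema)
  parts <- trimws(strsplit(dims, ";", fixed = TRUE)[[1]])
  name <- sub("=.*", "", parts)
  # One row per dimension: start, end, overlap, chunk
  f <- do.call(rbind, strsplit(sub(".*=", "", parts), ":", fixed = TRUE))
  # Kept as character so unbounded ends such as "*" survive
  data.frame(name = name, start = f[, 1], end = f[, 2],
             chunk = f[, 4], overlap = f[, 3], stringsAsFactors = FALSE)
}

dims <- parse_dims("<v:double> [i=1:2:0:2; j=1:3:0:1]")
dims
```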
Uploading R data frames to SciDB
The package provides limited convenience functions for converting and uploading R values to SciDB. The upload mechanism is much less efficient than available SciDB bulk load methods and should only be used for small to moderate-sized data. The following R objects are supported:
- logical, integer, numeric, and character vectors
- numeric matrices
- data frames with logical, character, integer, and numeric values (variable names may be changed to conform to SciDB naming conventions)
- numeric column-compressed sparse matrices (class “dgCMatrix”)
- raw objects
Factor values are uploaded as character values, replacing factor levels with their corresponding character strings.
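To preview locally what a data frame will look like after these conversions, the factor-to-character step and the renaming can be approximated in base R. prep_for_scidb() is a hypothetical helper, not the package's internal preprocessing; it assumes dots in names are simply replaced by underscores, which matches the Sepal_Length-style names in the downloaded output of the iris upload.

```r
# Hypothetical helper (not part of the scidb package) approximating the
# data frame conversions described above
prep_for_scidb <- function(df) {
  # Replace factor columns with their character labels
  is_fac <- vapply(df, is.factor, logical(1))
  df[is_fac] <- lapply(df[is_fac], as.character)
  # Assumption: dots in variable names become underscores
  names(df) <- gsub(".", "_", names(df), fixed = TRUE)
  df
}

prepped <- prep_for_scidb(iris)
str(prepped)
```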
The as.scidb() function returns a reference to a SciDB array containing the uploaded data. The following example uploads a data frame to SciDB, with a warning that variable names were changed because SciDB does not support dots in names, and then downloads the resulting SciDB array data back into R.
x <- scidb::as.scidb(db, iris)
Warning in .PreprocessDfTypes(X, types, use_aio_input): Attribute names have been changed
head(scidb::as.R(x))
i Sepal_Length Sepal_Width Petal_Length Petal_Width Species
1 1 5.1 3.5 1.4 0.2 setosa
2 2 4.9 3.0 1.4 0.2 setosa
3 3 4.7 3.2 1.3 0.2 setosa
4 4 4.6 3.1 1.5 0.2 setosa
5 5 5.0 3.6 1.4 0.2 setosa
6 6 5.4 3.9 1.7 0.4 setosa
Note that the SciDB dimension indices (i above) are added to the data as a column when downloaded.