
Binding SciDB arrays and query expressions to R variables

SciDB array view objects

The scidb() function returns an object of class scidb that contains a reference to a SciDB array or query expression. Printing the object shows a data frame-like view that lists the SciDB schema components.

x <- scidb::scidb(db, "build(<v:double>[i=1:2,2,0, j=1:3,1,0], i*j)")
x
SciDB expression  build(<v:double>[i=1:2,2,0, j=1:3,1...
SciDB schema  <v:double> [i=1:2:0:2; j=1:3:0:1] 
  variable dimension   type nullable start end chunk
1        i      TRUE  int64    FALSE     1   2     2
2        j      TRUE  int64    FALSE     1   3     1
3        v     FALSE double     TRUE                

The R variable x is a kind of SciDB array view: an unevaluated SciDB query expression. SciDB evaluates x either lazily, when its value is needed, or when evaluation is explicitly requested. The object is an S4 R object whose “name” slot contains the corresponding AFL expression.
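To make concrete what the unevaluated expression describes, here is a small database-free sketch in plain R. It assumes only what the build() expression above states: one cell per (i, j) coordinate with v = i * j. The row order is an artifact of expand.grid() and is not meant to match the order in which SciDB returns cells.

```r
# Local illustration of build(<v:double>[i=1:2,2,0, j=1:3,1,0], i*j):
# enumerate every (i, j) coordinate and compute the attribute v = i * j.
cells <- expand.grid(i = 1:2, j = 1:3)
cells$v <- as.double(cells$i * cells$j)
cells

# With a live connection, the expression is evaluated only on request,
# for example when the data are downloaded:
#   scidb::as.R(x)
```

Nothing is sent to SciDB until a call such as as.R() asks for the values; until then x only carries the AFL text in its name slot.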

Storing query results in SciDB

Use the store() function to evaluate a view and materialize it in SciDB. store() writes the result to a new named SciDB array in the database and returns a new scidb R object that points to that array. The array name can be specified explicitly or generated automatically, and the result can optionally be stored as a SciDB temporary array.

y <- scidb::store(db, x, temp=TRUE)
y
SciDB expression  R_arrayc22c32414c07_173133300951694...
SciDB schema  <v:double> [i=1:2:0:2; j=1:3:0:1] 
  variable dimension   type nullable start end chunk
1        i      TRUE  int64    FALSE     1   2     2
2        j      TRUE  int64    FALSE     1   3     1
3        v     FALSE double     TRUE                

Note that the SciDB expression associated with the y R variable is now a named SciDB array (automatically named in this case); compare with the SciDB expression for x above.
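The example above lets the package choose the array name. A minimal sketch of the other case follows, assuming the optional name is passed through an argument called name (check ?scidb::store for the installed version). The wrapper store_as and its parameter array_name are hypothetical names introduced only for illustration.

```r
# Hypothetical helper: materialize a scidb expression under a chosen name.
# ASSUMPTION: scidb::store() accepts the array name via a `name` argument.
store_as <- function(db, expr, array_name) {
  stopifnot(is.character(array_name), length(array_name) == 1)
  scidb::store(db, expr, name = array_name)
}

# Usage with a live connection `db` and the expression `x` from above:
#   z <- store_as(db, x, "my_named_array")
#   z@name
```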

Lifetime and garbage collection

SciDB array values associated with R variables are tied to R’s garbage collector by default (unless the argument gc=FALSE is specified). When the R variable’s contents are garbage-collected by R, the associated SciDB array is removed.

# Observe that y's corresponding array is in the list
yname <- y@name
yname %in% scidb::ls(db)$name
[1] TRUE
# Remove and garbage collect
rm(y)
gc()
          used (Mb) gc trigger  (Mb) max used (Mb)
Ncells 1061084 56.7    2110147 112.7  1481941 79.2
Vcells 1876948 14.4    8388608  64.0  3325121 25.4
# Observe that y's corresponding array is no longer in the list
yname %in% scidb::ls(db)$name  
[1] FALSE
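The general R mechanism behind this kind of cleanup can be sketched without a database. The following is an illustration using base R's reg.finalizer(), not the package's actual implementation; handle, on_collect, and removed are hypothetical names, and the finalizer only records the name where a real client would ask SciDB to remove the array.

```r
# Record of "removed" array names, standing in for the database catalog.
removed <- character(0)

# Finalizer: runs when the handle environment is garbage-collected.
on_collect <- function(e) removed <<- c(removed, e$name)

# Stand-in for an R object that references a remote array.
handle <- new.env()
handle$name <- "tmp_array_1"
reg.finalizer(handle, on_collect)

# Drop the last reference and force a collection, as in the example above.
rm(handle)
invisible(gc())
removed
```

Because the finalizer runs only when R actually collects the object, cleanup happens at the next garbage collection after the last reference disappears, which is why the example above calls gc() explicitly.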

The schema() function

Use the schema() function to display the SciDB schema of scidb objects verbatim or in parsed detail for attributes and dimensions.

scidb::schema(x)
[1] "<v:double> [i=1:2:0:2; j=1:3:0:1]"
scidb::schema(x, "attributes")
  name   type nullable
1    v double     TRUE
scidb::schema(x, "dimensions")
  name start end chunk overlap
1    i     1   2     2       0
2    j     1   3     1       0
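The parsed dimension table is convenient for simple bookkeeping. The sketch below rebuilds the table printed above by hand so that it runs without a database, then derives the number of cells and chunks. The as.numeric() conversions are a precaution under the assumption that the columns may arrive as character strings.

```r
# Hand-built copy of the schema(x, "dimensions") output shown above.
# With a live connection one would instead use:
#   dims <- scidb::schema(x, "dimensions")
dims <- data.frame(
  name    = c("i", "j"),
  start   = c(1, 1),
  end     = c(2, 3),
  chunk   = c(2, 1),
  overlap = c(0, 0),
  stringsAsFactors = FALSE
)

# Convert defensively in case the columns are character strings.
dim_start <- as.numeric(dims$start)
dim_end   <- as.numeric(dims$end)
dim_chunk <- as.numeric(dims$chunk)

extent         <- dim_end - dim_start + 1      # cells along each dimension
total_cells    <- prod(extent)                 # logical size of the array
chunks_per_dim <- ceiling(extent / dim_chunk)  # chunks along each dimension

total_cells
chunks_per_dim
```

For this 2 x 3 array the result is 6 cells, split into one chunk along i and three chunks along j.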

Uploading R data frames to SciDB

The package provides limited convenience functions for converting and uploading R values to SciDB. The upload mechanism is much less efficient than available SciDB bulk load methods and should only be used for small to moderate-sized data. The following R objects are supported:

  • logical, integer, numeric, and character vectors
  • numeric matrices
  • data frames with logical, character, integer, and numeric values (variable names may be changed to conform to SciDB naming conventions)
  • numeric column-compressed sparse matrices (class “dgCMatrix”)
  • raw objects

Factor values are uploaded as character values, replacing factor levels with their corresponding character strings.

The as.scidb() function returns a reference to a SciDB array containing the uploaded data. The following example uploads a data frame to SciDB, warning us that variable names were changed because SciDB does not support dots in names, and then downloads the resulting SciDB object data back into R.

x <- scidb::as.scidb(db, head(iris))
Warning in .PreprocessDfTypes(X, types, use_aio_input): Attribute names have been changed
scidb::as.R(x)
  i Sepal_Length Sepal_Width Petal_Length Petal_Width Species
1 1          5.1         3.5          1.4         0.2  setosa
2 2          4.9         3.0          1.4         0.2  setosa
3 3          4.7         3.2          1.3         0.2  setosa
4 4          4.6         3.1          1.5         0.2  setosa
5 5          5.0         3.6          1.4         0.2  setosa
6 6          5.4         3.9          1.7         0.4  setosa

Note that SciDB dimension indices (i above) are included as columns in the data when downloaded.
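One way to avoid surprises from the automatic conversions is to apply them in R before uploading. The following sketch assumes that the renaming amounts to replacing dots with underscores, which matches the column names in the output above but may not cover every rule the package applies. It also converts the factor column to character explicitly.

```r
# Prepare a data frame so that its names and types are already SciDB-friendly.
df <- head(iris)

# ASSUMPTION: dots are the only offending characters in these names.
names(df) <- gsub(".", "_", names(df), fixed = TRUE)

# Factors are uploaded as character values, so convert them up front.
df$Species <- as.character(df$Species)

str(df)

# With a live connection the prepared data frame can then be uploaded:
#   x <- scidb::as.scidb(db, df)
```

Renaming locally keeps the R-side and SciDB-side names identical, which makes later queries against the uploaded array easier to write.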