This vignette illustrates using SciDB from R by example. For more detailed information on the functions described in this vignette, see the manual pages in the package.
Installing the package
From CRAN (stable, but may lag features on GitHub by several months):
install.packages("scidb")
From the development repository on GitHub (stable branch):
devtools::install_github("Paradigm4/SciDBR")
“Stable” means that all CRAN checks and package unit tests pass when tested using the current SciDB release. We try to make sure that the scidb package works with all previous versions of SciDB but we only actively test against the current release version of the database. Other experimental and development branches exist; see the GitHub repository for a list.
Connecting to SciDB
The scidbconnect() function establishes a connection to SciDB, either to a simple HTTP network service called shim (https://github.com/Paradigm4/shim) running on a SciDB coordinator instance or directly to SciDB’s HTTP API (for SciDB version 23.2 and higher). The function may be safely called multiple times. The function return value contains the SciDB database connection state in an object of class afl.
The network interface optionally supports SSL encryption, and either SciDB authentication or HTTP digest user authentication.
Connect to SciDB on the default shim port and localhost
library("scidb")
db <- scidbconnect()
Connect to shim on an encrypted port 8083 with example SciDB authentication
db <- scidbconnect(port=8083, username="root", password="password", protocol = 'https')
Use encrypted sessions when communicating with SciDB over public networks. SciDB user authentication is only supported by SciDB versions 15.7 and greater and only works over encrypted connections.
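To avoid hard-coding a password in scripts like the example above, connection settings can be read from the environment. The sketch below is only an illustration: the environment variable names SCIDB_HOST, SCIDB_PORT, SCIDB_USER and SCIDB_PASSWORD and the helper scidb_conn_args() are hypothetical and not part of the scidb package. The helper only assembles arguments for scidbconnect(); the actual call needs a reachable shim or SciDB HTTP endpoint.

```r
# Hypothetical helper: build scidbconnect() arguments from environment
# variables. The variable names are illustrative, not defined by scidb.
scidb_conn_args <- function() {
  args <- list(
    host     = Sys.getenv("SCIDB_HOST", "localhost"),
    port     = as.integer(Sys.getenv("SCIDB_PORT", "8083")),
    protocol = "https"  # SciDB authentication requires an encrypted connection
  )
  user <- Sys.getenv("SCIDB_USER")
  if (nzchar(user)) {
    # Only pass credentials when a user name is configured.
    args$username <- user
    args$password <- Sys.getenv("SCIDB_PASSWORD")
  }
  args
}

# Usage (requires a running SciDB/shim endpoint):
# db <- do.call(scidb::scidbconnect, scidb_conn_args())
```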
Listing SciDB arrays and operators
The scidb::scidbconnect() function returns a SciDB connection object of class afl. In addition to storing the connection state, the returned object has a few special methods. Printing the object shows a summary of the connection state. Applying the ls() function to the object returns a list of SciDB arrays (subject to any namespace-setting prefix expression). The object itself is a list containing the SciDB AFL operator and macro functions that were available when the connection was established; apply ls.str() to the object to list all AFL operators and macros.
print(db) # summarize connection
scidb::ls(db) # quick list of arrays
ls.str(db) # quick list of AFL operators
The function ls.str(db) shows the formal AFL operator arguments for each function. These functions can be used to compose AFL expressions from R, as discussed in more detail below.
Additionally, each listed AFL operator is available as an R function in the object db. This experimental method for generating AFL operations functionally is documented in vignette("afl_generation").
NOTE The list of operators and macros is established at connection time. If the database operators change after establishing the connection, for instance by loading a new SciDB plugin, then those changes will not be shown in the database connection object. New connection objects will show the current list of operators and macros.
Running AFL queries via iquery()
The simplest way to compose and execute SciDB queries is to use the iquery() function. It directly runs arbitrary SciDB AFL queries supplied as character strings or scidb objects, optionally returning results to R as a data frame:
scidb::iquery(db, "build(<v:double>[i=1:2,2,0, j=1:3,1,0], i*j)", return=TRUE)
i j v
1 1 1 1
2 2 1 2
3 1 2 2
4 2 2 4
5 1 3 3
6 2 3 6
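The build() expression fills every cell (i, j) of the 2-by-3 array with the value i*j. As a local sanity check that needs no database, the same table can be reproduced in base R. The row order here happens to match the printed result above; in general the order in which SciDB returns cells need not follow expand.grid() order, so compare sorted results if you check a live query this way.

```r
# Reproduce locally what build(<v:double>[i=1:2,2,0, j=1:3,1,0], i*j) computes:
# one row per cell (i, j), with v = i * j.
expected <- expand.grid(i = 1:2, j = 1:3)          # i varies fastest, like the output above
expected$v <- as.numeric(expected$i * expected$j)  # v is declared as double in the schema
print(expected)
```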
Building AFL expression strings
In real usage, queries are usually not entirely literal AFL expressions but depend on parameters and variables in the R environment, such as the name of an array or one of its attributes. This simple example shows a query involving the grouped_aggregate operator applied to an array created by uploading an R data frame (see vignette("advanced")); the array's name is interpolated into the query:
x <- scidb::as.scidb(db, iris) # upload the iris data frame to SciDB
Warning in .PreprocessDfTypes(X, types, use_aio_input): Attribute names have been changed
scidb::iquery(db, paste("grouped_aggregate(", x@name, ", Species, avg(Petal_Length) as avg)"), return=TRUE)
instance_id value_no Species avg
1 0 0 setosa 1.462
2 0 1 virginica 5.552
3 2 0 versicolor 4.260
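The warning emitted by as.scidb() means that some column names were changed on upload, which is why the query refers to Petal_Length rather than R's Petal.Length. Judging only from that query, periods become underscores. The sketch below mimics just that one substitution with gsub(), under the assumption that it is the only change applied to the iris names; the package's actual renaming rules may cover other characters as well. It can help predict attribute names before interpolating them into AFL.

```r
# Predict the attribute names of the uploaded iris array, assuming the only
# change is '.' -> '_' (as Petal_Length in the query above suggests).
afl_names <- gsub(".", "_", names(iris), fixed = TRUE)
print(afl_names)
```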
Experimental methods for building AFL more programmatically are shown in vignette("afl_generation").
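For simple cases, base R string formatting is often enough. The hypothetical helper afl_group_avg() below (not part of the scidb package) wraps the grouped_aggregate query from the earlier example in sprintf(), so the array name, grouping attribute, and averaged attribute all become parameters. It only builds the query string; it does not validate or quote the names it interpolates.

```r
# Hypothetical helper: build a grouped_aggregate() AFL string that averages
# one attribute per group. Inputs are interpolated verbatim and unchecked.
afl_group_avg <- function(array_name, group_by, attribute, alias = "avg") {
  sprintf("grouped_aggregate(%s, %s, avg(%s) as %s)",
          array_name, group_by, attribute, alias)
}

# Usage with the uploaded iris array x (requires a live connection db):
# scidb::iquery(db, afl_group_avg(x@name, "Species", "Petal_Length"), return = TRUE)
```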
Using the arrow return format
When returning data from a call to iquery(), the options binary=FALSE, arrow=TRUE make the transfer and parsing of the result use the Arrow IPC format, which can perform much better when the returned array is large.
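A call requesting the arrow format might look like the following sketch. The wrapper name iquery_arrow() is hypothetical; the binary and arrow arguments are the ones named above. Because it needs a live connection db, it is shown without output.

```r
# Hypothetical convenience wrapper: run an AFL query and fetch the result
# through the Arrow IPC format instead of the default transfer format.
iquery_arrow <- function(db, query) {
  scidb::iquery(db, query, return = TRUE, binary = FALSE, arrow = TRUE)
}

# Usage (requires a connection created with scidbconnect()):
# big <- iquery_arrow(db, "build(<v:double>[i=1:1000000], i)")
```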
Advanced topics
Binding with R data frames and variables
Uploading R data frames to SciDB arrays, and conversely downloading SciDB arrays directly as data frames without the iquery() convenience function, is documented in vignette("advanced").
Composing AFL expressions
In addition to running AFL expressions directly via iquery(), the scidb package supports composing AFL expressions programmatically, either through the AFL functions of the database connection object or by mapping R expressions to AFL expressions. These methods are documented in vignette("afl_generation").
Package options
Detailed documentation of package options, especially regarding authentication, namespaces, and roles, is available in vignette("options").