This vignette illustrates using SciDB from R by example. For more detailed information on the functions described in this vignette, see the manual pages in the package.
Installing the package
From CRAN (stable, but may lag features on GitHub by several months):
install.packages("scidb")
From the development repository on GitHub (stable branch):
devtools::install_github("Paradigm4/SciDBR")
“Stable” means that all CRAN checks and package unit tests pass when tested using the current SciDB release. We try to make sure that the scidb package works with all previous versions of SciDB, but we only actively test against the current release version of the database. Other experimental and development branches exist; see the GitHub repository for a list.
Connecting to SciDB
The scidbconnect() function establishes a connection to
SciDB, either to a simple HTTP network service called shim (https://github.com/Paradigm4/shim) running on a SciDB
coordinator instance or directly to SciDB’s HTTP API (for SciDB version
23.2 and higher). The function may be safely called multiple times. It
returns the SciDB database connection state in an object of class afl.
The network interface optionally supports SSL encryption and SciDB authentication or HTTP digest user authentication.
Connect to SciDB on the default shim port and localhost
library("scidb")
db <- scidbconnect()
Connect to shim on encrypted port 8083 with example SciDB authentication
db <- scidbconnect(port = 8083, username = "root", password = "password", protocol = "https")
Use encrypted sessions when communicating with SciDB over public networks. SciDB user authentication is only supported by SciDB versions 15.7 and greater and only works over encrypted connections.
Listing SciDB arrays and operators
The scidb::scidbconnect() function returns a SciDB
connection object of class afl. In addition to storing the connection
state, the returned object has a few special methods. Printing the
object shows a summary of the connection state. Applying the
ls() function to the object returns a list of SciDB arrays
(subject to any namespace-setting prefix expression). The object itself
is a list containing the SciDB AFL operator and macro functions
available when the connection was established. Apply ls.str() to the
object to list all AFL operators and macros.
print(db)      # summarize connection
scidb::ls(db)  # quick list of arrays
ls.str(db)     # quick list of AFL operators
The function ls.str(db) shows the formal AFL operator
arguments for each function. These functions can be used to compose AFL
expressions from R, as discussed below.
Additionally, each listed AFL operator is available as an R function
in the db object. This experimental method for generating
AFL operations functionally is documented in
vignette("afl_generation").
NOTE The list of operators and macros is established at connection time. If the database operators change after establishing the connection, for instance by loading a new SciDB plugin, then those changes will not be shown in the database connection object. New connection objects will show the current list of operators and macros.
Running AFL queries via iquery()
The simplest way to compose and execute SciDB queries is to use the
iquery() function. This directly runs arbitrary SciDB AFL
queries supplied as character strings or scidb objects, optionally
returning results to R as a data frame:
scidb::iquery(db, "build(<v:double>[i=1:2,2,0, j=1:3,1,0], i*j)", return=TRUE)
  i j v
1 1 1 1
2 2 1 2
3 1 2 2
4 2 2 4
5 1 3 3
6 2 3 6
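The following base-R sketch makes the build() semantics concrete without a database. It reproduces the same cells locally: build() fills every cell of the schema with the expression i*j. It is an illustration only and does not call SciDB. The row order SciDB returns may depend on chunk layout; here it happens to match expand.grid(), which varies i fastest.

```r
# Base-R illustration only (no SciDB connection needed): reproduce the
# cells that build(<v:double>[i=1:2,2,0, j=1:3,1,0], i*j) generates.
cells <- expand.grid(i = 1:2, j = 1:3)   # all (i, j) coordinates; i varies fastest
cells$v <- as.double(cells$i * cells$j)  # each cell holds i*j as a double, like <v:double>
print(cells)
```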
Building AFL expression strings
In real usage, queries usually are not entirely literal AFL
expressions but depend on parameters and variables in the R environment,
such as the name of an array or one of its attributes. This simple
example shows a query involving the grouped_aggregate
operator applied to an array created by uploading an R data frame (see
vignette("advanced")) by interpolating its name into the
query:
x <- scidb::as.scidb(db, iris)  # upload the iris data frame to SciDB
Warning in .PreprocessDfTypes(payload, desc@attr_types, use_aio_input): Attribute names have been changed
scidb::iquery(db, paste("grouped_aggregate(", x@name, ", Species, avg(Petal_Length) as avg)"), return=TRUE)
  instance_id value_no    Species   avg
1           0        0     setosa 1.462
2           0        1  virginica 5.552
3           2        0 versicolor 4.260
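As queries grow, a small wrapper can keep the AFL template in one place. The function afl_grouped_avg() below is hypothetical and is not part of the scidb package. It uses sprintf() rather than paste(), so no stray spaces appear around the array name. It also assumes, based on the warning and query above, that the upload renamed Petal.Length to Petal_Length by replacing "." with "_". The final aggregate() call computes the same per-species averages in plain R as a sanity check on the SciDB result.

```r
# Hypothetical helper, not part of the scidb package: interpolate an array
# name, a grouping attribute, and an attribute to average into an AFL
# grouped_aggregate() query string.
afl_grouped_avg <- function(array_name, group_by, attr) {
  # Assumption: the upload renamed R columns by replacing "." with "_"
  # (e.g. Petal.Length -> Petal_Length), as the query above suggests.
  attr <- gsub(".", "_", attr, fixed = TRUE)
  sprintf("grouped_aggregate(%s, %s, avg(%s) as avg)", array_name, group_by, attr)
}

# "R_array_example" is a made-up array name; with a live upload use x@name.
query <- afl_grouped_avg("R_array_example", "Species", "Petal.Length")
print(query)

# Plain-R cross-check of the per-species averages SciDB returned above.
avg_local <- aggregate(Petal.Length ~ Species, data = iris, FUN = mean)
print(avg_local)
```

With a live connection, pass the resulting string to iquery() as before, for example scidb::iquery(db, afl_grouped_avg(x@name, "Species", "Petal.Length"), return=TRUE).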
Experimental methods for building AFL more programmatically are shown
in vignette("afl_generation").
Using the arrow return format
When returning data from a call to iquery(), the
options binary=FALSE, arrow=TRUE switch the transfer
and parsing of the result to the Apache Arrow IPC format, which can be
much faster when the returned array is large.
Advanced topics
Binding with R data frames and variables
Support for uploading R data frames to SciDB arrays, and conversely
for directly downloading SciDB arrays as data frames outside of the
iquery() convenience functions, is documented
in vignette("advanced").
Composing AFL expressions
In addition to directly running AFL expressions via
iquery(), the scidb package supports programmatically
composing AFL expressions. These composition methods include using the
database connection AFL functions and mapping R expressions to AFL
expressions, and are documented in
vignette("afl_generation").
Package options
Details of package options, especially those regarding authentication,
namespaces, and roles, are available in
vignette("options").