A connection object that talks to SciDB
The connection object creates ArrayOp instances, executes AFL queries of arrayOps, and downloads query results as R data frames.
new()
Create a new ScidbConnection instance
This function is only for package internal use.
Please call arrayop::db_connect to get a ScidbConnection object.
ScidbConnection$new()
connect()
Connect to scidb with a list of connection arguments
Calling this function updates the current connection object's internal state. If the connection has timed out, calling this function without args re-establishes the connection with the previously configured connection args.
connection_args
NULL or a list of connection args. Default NULL means using the previous connection args.
A list of connection args uses the same names as the params of db_connect.
save_token
Whether to save the token/password. Default FALSE. Do not set to TRUE in a production environment.
db
A connection object returned by scidb::scidbconnect.
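The reconnect behavior described above (NULL args reuse the previous args; secrets are kept only with save_token = TRUE) can be illustrated with a small base-R closure. This is a hypothetical sketch, not arrayop's implementation: make_connector() and the arg names token and password are assumptions chosen for illustration, and no real connection is opened.

```r
# Hypothetical sketch of the documented connect() semantics, not arrayop code.
# make_connector(), `token` and `password` are illustrative names only.
make_connector <- function() {
  saved_args <- NULL  # args remembered from the previous call
  function(connection_args = NULL, save_token = FALSE) {
    if (is.null(connection_args)) {
      if (is.null(saved_args)) stop("No previous connection args to reuse")
      connection_args <- saved_args  # re-establish with previous args
    }
    # Remember the args, dropping secrets unless save_token = TRUE
    to_save <- connection_args
    if (!save_token) to_save[c("token", "password")] <- NULL
    saved_args <<- to_save
    connection_args  # stand-in for actually opening a connection
  }
}

connect <- make_connector()
connect(list(host = "localhost", username = "alice", token = "secret"))
connect()  # reuses host and username; the token was not saved
```

Dropping secrets at save time mirrors the documented advice not to persist tokens in production: a reconnect then needs the credential supplied again.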
conn_args()
Return a list of connection args used to establish the scidb connection
Connection args use the same names as the db_connect args.
query_all()
Download a data frame of the query result with all dimensions and attributes
query()
Download a data frame of the query result with all attributes
execute()
Execute an AFL expression without returning a result
Use this for pure side effects where no result is downloaded, e.g. creating arrays, removing arrays, removing array versions, or updating arrays.
create_array()
Create a new scidb array and return the arrayOp instance for it
name
Scidb array name, e.g. 'myNamespace.myArray'. If the array name has no namespace, the array is created in the 'public' namespace.
schema_template
A scidb schema string or an arrayOp instance, used as the schema template for the newly created array.
.temp
Boolean. Whether to create the array as a scidb temporary array
execute_mquery()
Execute multiple AFL statements wrapped in one mquery with a transaction guarantee.
mquery currently only supports the top-level operators insert and delete.
array()
Get an ArrayOp instance of an existing scidb array
The scidb array denoted by array_name must exist in scidb.
array_from_schema()
Create an ArrayOp instance from array schema
Useful in creating an arrayOp as a template for other arrayOp operations. If an array name is provided in the schema_str, the array of that name does not have to exist in scidb. Obviously, you cannot download data from a non-existent array, but it can be used as a template for other operations.
schema_str
A scidb-format array schema. The array name is optional.
E.g. <fa:int32, fb:string COMPRESSION 'zlib'> [i;j] creates an arrayOp with the specified attributes and dimensions, and an empty afl string.
E.g. myArray <fa:int32, fb:string COMPRESSION 'zlib'> [i;j] creates an arrayOp with the specified attributes and dimensions, and encapsulates an afl string of "myArray".
The array named "myArray" does not need to exist in scidb, since this is done only in the local R environment without checking scidb.
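Because the optional array name is read purely locally, it can be pulled out of the schema string with a regular expression and no round trip to scidb. The helper below is a hypothetical sketch (schema_array_name() is not part of arrayop) that returns the name preceding the attribute list, or an empty string when the schema has none, matching the two examples above.

```r
# Hypothetical helper, not part of arrayop: extract the optional array name
# that precedes the '<...>' attribute list; "" means no name was given.
schema_array_name <- function(schema_str) {
  sub("^\\s*([A-Za-z0-9_.]*)\\s*<.*$", "\\1", schema_str, perl = TRUE)
}

schema_array_name("<fa:int32, fb:string COMPRESSION 'zlib'> [i;j]")          # ""
schema_array_name("myArray <fa:int32, fb:string COMPRESSION 'zlib'> [i;j]")  # "myArray"
```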
afl_expr()
Create an arrayOp instance from an AFL expression
Infers the schema using the scidb 'show' operator.
afl2()
Create an arrayOp instance from an AFL expression
When the input is a string, it is treated as raw AFL. When it is an expression, it is deparsed according to "AFL-like" syntax; see deparse_to_afl() for a definition of what this entails.
Infers the schema using the scidb 'show' operator.
array_from_df()
Get an arrayOp instance from an R data frame
Implemented by the scidb 'build' operator or the SciDBR scidb::as.scidb function, which uploads a data frame to scidb.
If the number of cells (nrow * ncol) of the data frame is smaller than build_or_upload_threshold, the 'build' operator is used to create a build literal array; otherwise, a persistent scidb array is created by uploading the R data frame into scidb.
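The build-versus-upload decision is a simple cell count. The sketch below restates the documented rule in base R; choose_df_strategy() is a hypothetical name and this is not arrayop's code. Note that "smaller than" means a data frame with exactly 8000 cells is uploaded.

```r
# Hypothetical restatement of the documented rule, not arrayop's code:
# fewer cells (nrow * ncol) than the threshold -> 'build' literal, else upload.
choose_df_strategy <- function(df, build_or_upload_threshold = 8000L) {
  n_cells <- nrow(df) * ncol(df)
  if (n_cells < build_or_upload_threshold) "build" else "upload"
}

choose_df_strategy(data.frame(a = 1:100, b = 1:100))    # 200 cells: "build"
choose_df_strategy(data.frame(a = 1:5000, b = 1:5000))  # 10000 cells: "upload"
```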
ScidbConnection$array_from_df(
df,
template = NULL,
build_or_upload_threshold = 8000L,
build_dim_spec = .random_field_name(),
as_scidb_data_frame = FALSE,
skip_scidb_schema_check = FALSE,
force_template_schema = FALSE,
...
)
df
an R data frame
template
The array schema template can be NULL, an arrayOp, or a scidb schema string. If NULL, infer scidb data types from the classes of the data frame columns. If arrayOp, use the actual scidb field types for matching columns of the R data frame. If schema string, infer field types the same way as an arrayOp instance.
build_or_upload_threshold
An integer, below which the scidb 'build' operator is used to create a build literal; otherwise, the data frame is uploaded into scidb with SciDBR's scidb::as.scidb function.
build_dim_spec
The build dimension spec if 'build' operator is chosen. Can be either a simple field name or a full dimension spec. E.g. "z", or "z=0:*:0:100"
as_scidb_data_frame
Boolean. If FALSE (default), create a scidb data frame (no explicit dimensions); otherwise, create a regular scidb array. Applicable for 'build' literal only.
skip_scidb_schema_check
Boolean. If FALSE (default), check with scidb to determine the exact schema of the result arrayOp; otherwise, infer the schema locally, which may be inaccurate but saves a round trip to the scidb server and works in most cases if the result is not used as a template.
force_template_schema
Boolean. If FALSE (default), do not change the result arrayOp schema to be compatible with the template using 'redimension' operator. If TRUE, force the result arrayOp to use the same schema as the template, which must be provided (not NULL).
...
Other params used in the upload_df function.
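When template = NULL, field types are inferred from the classes of the data frame columns. The mapping below is a hypothetical sketch of what such inference could look like; the helper names and the specific class-to-type choices (notably Date/POSIXct to datetime) are assumptions, and arrayop's actual rules may differ. It also shows how the inferred types could be rendered as the attribute part of a schema string.

```r
# Hypothetical R-class -> SciDB-type mapping; arrayop's real rules may differ.
infer_scidb_types <- function(df) {
  vapply(df, function(col) {
    if (is.factor(col) || is.character(col)) "string"
    else if (is.logical(col)) "bool"
    else if (inherits(col, c("Date", "POSIXct"))) "datetime"  # assumed mapping
    else if (is.integer(col)) "int32"   # R integers are 32-bit
    else if (is.double(col)) "double"
    else stop("No mapping for column class: ", class(col)[1])
  }, character(1))
}

# Render named types as the attribute list of a schema string.
to_attr_schema <- function(types) {
  paste0("<", paste(names(types), types, sep = ":", collapse = ", "), ">")
}

df <- data.frame(id = 1:3, score = c(1.5, 2, 3), name = c("a", "b", "c"))
to_attr_schema(infer_scidb_types(df))  # "<id:int32, score:double, name:string>"
```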
upload_df()
Get an arrayOp instance from uploaded R data frame
Implemented by the SciDBR scidb::as.scidb function, which uploads a data frame or vector(s) to scidb.
By default, the uploaded R data frame is saved to scidb as a persistent array.
If upload_by_vector = TRUE, multiple scidb arrays are created, one for each R data frame column, by uploading individual vectors; this is faster but suffers from bugs in SciDBR.
ScidbConnection$upload_df(
df,
template = NULL,
name = dbutils$random_array_name(),
force_template_schema = FALSE,
upload_by_vector = FALSE,
.use_aio_input = FALSE,
.temp = FALSE,
.gc = TRUE
)
df
an R data frame
template
The array schema template can be NULL, an arrayOp, or a scidb schema string. If NULL, infer scidb data types from the classes of the data frame columns. If arrayOp, use the actual scidb field types for matching columns of the R data frame. If schema string, infer field types the same way as an arrayOp instance.
name
A string as the uploaded scidb array name, only applicable when upload_by_vector = FALSE, in which case a single scidb array is created.
force_template_schema
Boolean. If FALSE (default), do not change the result arrayOp schema to be compatible with the template using 'redimension' operator. If TRUE, force the result arrayOp to use the same schema as the template, which must be provided (not NULL).
upload_by_vector
Boolean. If TRUE, upload the R data frame by its vectors, which is faster than uploading the data frame as a whole but suffers from unresolved SciDBR bugs. If FALSE (default), upload the R data frame as a whole as a single scidb array.
.use_aio_input
Boolean, default FALSE. Whether to use 'aio_input' to import the uploaded data frame on scidb server side. If TRUE, 'aio_input' is faster than the default 'input' operator, but suffers from some bugs in the 'aio_input' scidb plugin.
.temp
Boolean, default FALSE. Whether to save the uploaded data frame as a temporary scidb array.
.gc
Boolean, default TRUE. Whether to remove the uploaded scidb array once the encapsulating arrayOp goes out of scope in R.
compile_df()
Get an arrayOp instance by compiling an R data frame into a scidb build literal
Implemented by scidb 'build' operator. No persistent scidb array will be created.
ScidbConnection$compile_df(
df,
template = NULL,
build_dim_spec = dbutils$random_field_name(),
force_template_schema = FALSE,
as_scidb_data_frame = FALSE,
skip_scidb_schema_check = FALSE
)
df
an R data frame
template
The array schema template can be NULL, an arrayOp, or a scidb schema string. If NULL, infer scidb data types from the classes of the data frame columns. If arrayOp, use the actual scidb field types for matching columns of the R data frame. If schema string, infer field types the same way as an arrayOp instance.
build_dim_spec
The build dimension spec if 'build' operator is chosen. Can be either a simple field name or a full dimension spec. E.g. "z", or "z=0:*:0:100"
force_template_schema
Boolean. If FALSE (default), do not change the result arrayOp schema to be compatible with the template using 'redimension' operator. If TRUE, force the result arrayOp to use the same schema as the template, which must be provided (not NULL).
as_scidb_data_frame
Boolean. If FALSE (default), create a scidb data frame (no explicit dimensions); otherwise, create a regular scidb array. Applicable for 'build' literal only.
skip_scidb_schema_check
Boolean. If FALSE (default), check with scidb to determine the exact schema of the result arrayOp; otherwise, infer the schema locally, which may be inaccurate but saves a round trip to the scidb server and works in most cases if the result is not used as a template.
fread()
Get an arrayOp instance that encapsulates an array operation using 'aio_input' operator to read content from a file
Param conventions are similar to the data.table::fread function.
We can choose whether column names/types are inferred by peeking into the file by setting header = T, and nrow = 10 for how many rows to peek at for inference.
Unique to scidb, we can control how the file columns are converted and whether to use multiple scidb instances to read multiple files in parallel.
ScidbConnection$fread(
file_path,
template = NULL,
header = TRUE,
sep = "\t",
col.names = NULL,
mutate_fields = NULL,
auto_dcast = FALSE,
nrow = 10L,
instances = NULL,
.aio_settings = NULL
)
file_path
A single string or a string vector, for a local file path or a list of paths. If multiple paths are provided, the instances param must have the same length as file_path.
template
The array schema template can be NULL, an arrayOp, or a scidb schema string. If NULL, infer scidb data types by peeking into the file and reading a small data frame of nrow rows with data.table::fread. Sensible data type conversion between R and scidb will be performed.
If arrayOp, use the actual scidb field types for matching file columns.
If schema string, infer field types the same way as an arrayOp instance.
header
Boolean, default TRUE. Whether to use the first file row to infer file column names and data types.
sep
A single character string as the field delimiter, default "\t" for TSV files. Set to "," for CSV files.
col.names
NULL (default) or a string vector.
If col.names = NULL, header = T, file column names are inferred from the first file row.
If col.names = NULL, header = F, template = NULL, file column names follow the data.table::fread convention and are named V1, V2, etc.
If col.names = NULL, header = F, template = anTemplate, file columns are assumed to be in the same order as the template's dimensions + attributes.
If set to a string vector, its length must match the number of actual file columns, and the actual file column names are replaced with the provided col.names, but data types are still inferred from the actual file columns.
mutate_fields
NULL or a list of R expressions. When auto_dcast = T, this setting prevails. Similar to ArrayOpBase$mutate.
E.g. a = b + 2, name = first + "-" + last, chrom = if(chrom == 'x') 23 else if(chrom == 'y') 24 else chrom
auto_dcast
Boolean, default FALSE. If TRUE, all non-string fields are dcast'ed with dcast(ax, int64(null)), where ax is the 0-indexed mapping attribute name (e.g. a0, a1, etc.), and int64 is the template field type. If FALSE, file columns are force-coerced into scidb types for all non-string fields, e.g. double(a0), int32(a1). An error will be thrown if incompatible field content is read when the resulting arrayOp is executed, not when fread itself is called, since fread doesn't actually execute any operation.
Even with auto_dcast = T, which is useful in many cases when the file is not strictly formatted, we can still override the dcast rule by setting a mutate_fields expression list, as seen in ArrayOpBase$mutate.
nrow
An integer, default 10. How many rows to peek into the file to infer column names and data types using data.table::fread.
instances
NULL (default) or an integer vector. For single file path, set to NULL. For multiple file paths, set the same number of instances as the file paths, each reading from a file path in parallel.
.aio_settings
NULL (default) or a list of extra aio_input settings. Basic aio_input settings including path, num_attributes, and header are generated automatically and should not be manually provided. See scidb doc on extra aio_input settings.
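The auto_dcast and mutate_fields params together decide one AFL expression per field. The sketch below reproduces the documented rules as strings: string fields pass through, non-string fields are either dcast'ed or force-coerced, and explicit mutate_fields entries prevail. It is hypothetical (field_exprs() is not arrayop API), and for simplicity mutate_fields is given as AFL strings rather than R expressions.

```r
# Hypothetical sketch, not arrayop's code. Raw file columns are a0, a1, ...;
# mutate_fields is simplified to a named list of AFL strings.
field_exprs <- function(template_types, auto_dcast = FALSE, mutate_fields = list()) {
  raw <- paste0("a", seq_along(template_types) - 1L)  # 0-indexed attribute names
  converted <- if (auto_dcast) {
    sprintf("dcast(%s, %s(null))", raw, template_types)  # null on bad content
  } else {
    sprintf("%s(%s)", template_types, raw)               # forced coercion
  }
  exprs <- ifelse(template_types == "string", raw, converted)  # strings pass through
  names(exprs) <- names(template_types)
  # Explicit mutate_fields entries override the generated rule
  if (length(mutate_fields) > 0) exprs[names(mutate_fields)] <- unlist(mutate_fields)
  exprs
}

types <- c(chrom = "string", pos = "int64", score = "double")
field_exprs(types)                     # a0, int64(a1), double(a2)
field_exprs(types, auto_dcast = TRUE)  # a0, dcast(a1, int64(null)), dcast(a2, double(null))
```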
## ------------------------------------------------
## Method `ScidbConnection$execute_mquery`
## ------------------------------------------------
if (FALSE) { # \dontrun{
conn$execute_mquery(
target$filter(conc < 50)$update(target),
target$delete_cells(Plant %contains% "3", uptake > 10)
)
} # }