ArrayOp base class that encapsulates a scidb array or array operations

ArrayOp class instances denote scidb array operations and operands, hence the name.

Operands can be plain scidb array names or (potentially nested) operations on arrays.

Most ArrayOp methods return a new ArrayOp instance; the instance on which a method is invoked remains unchanged, i.e. ArrayOp instances are immutable.

Details

One ArrayOp operation may involve one or multiple scidb operators and any number of operands, and its result can in turn be used as an operand in another operation. Operands and operation results can all be denoted by ArrayOp.

Sub-classes of ArrayOpBase deal with any syntax or operator changes between SciDB versions so that the ArrayOpBase class can provide a unified API on all supported SciDB versions. Currently SciDB V18 and V19 are supported.

Users of the arrayop package do not need to be concerned with a specific sub-class, since the ScidbConnection object automatically chooses the correct class version and creates instances based on the scidb version it connects to.

Get arrayOp instances from the default ScidbConnection object. See arrayop::get_default_connection for details.
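
As a minimal sketch of that workflow, the snippet below wraps the calls in a function so that nothing is sent to SciDB until the function is invoked. It assumes a default connection has already been established elsewhere, and that conn$array_from_df accepts a plain R data frame as its first argument (the method is mentioned under persist(), but its signature is not shown on this page). The helper name demo_array_op is hypothetical.

```r
# Hypothetical helper: build an ArrayOp from a small R data frame.
# Needs the arrayop package and a live SciDB connection when called.
demo_array_op <- function() {
  conn <- arrayop::get_default_connection()
  df <- data.frame(id = 1:3, name = c("a", "b", "c"))
  # Assumption: array_from_df takes the data frame as its first argument.
  arr <- conn$array_from_df(df)
  # Inspect the result through the active bindings and to_afl().
  list(dims = arr$dims, attrs = arr$attrs, afl = arr$to_afl())
}
```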

Active bindings

dims: Dimension names
attrs: Attribute names
selected: Selected dimension and/or attribute names
dtypes: A named list, where key is dim/attr name and value is respective SciDB data type as string
raw_dtypes: A named list, where key is dim/attr name and value is first part of respective SciDB data type as string
dims_n_attrs: Dimension and attribute names
attrs_n_dims: Attribute and dimension names
is_schema_from_scidb: Whether the array schema is retrieved from SciDB or inferred locally in R
is_scidb_data_frame: Whether current array_op is a regular array or SciDB data frame (array with hidden dimensions; not to be confused with R data frames)
.private: For internal testing only. Do not access this field to avoid unintended consequences!!!

Methods

Public methods

ArrayOpBase$new()
ArrayOpBase$inherit_refs()
ArrayOpBase$filter()
ArrayOpBase$mutate()
ArrayOpBase$transmute()
ArrayOpBase$mutate_by()
ArrayOpBase$inner_join()
ArrayOpBase$left_join()
ArrayOpBase$right_join()
ArrayOpBase$full_join()
ArrayOpBase$semi_join()
ArrayOpBase$group_by()
ArrayOpBase$summarize()
ArrayOpBase$set_auto_fields()
ArrayOpBase$update()
ArrayOpBase$overwrite()
ArrayOpBase$delete_cells()
ArrayOpBase$project()
ArrayOpBase$select()
ArrayOpBase$to_df_all()
ArrayOpBase$to_df()
ArrayOpBase$execute()
ArrayOpBase$persist()
ArrayOpBase$change_schema()
ArrayOpBase$drop_dims()
ArrayOpBase$sync_schema()
ArrayOpBase$spawn()
ArrayOpBase$to_afl()
ArrayOpBase$to_schema_str()
ArrayOpBase$limit()
ArrayOpBase$cell_count()
ArrayOpBase$summarize_array()
ArrayOpBase$list_versions()
ArrayOpBase$version()
ArrayOpBase$is_persistent()
ArrayOpBase$exists_persistent_array()
ArrayOpBase$array_meta_data()
ArrayOpBase$remove_versions()
ArrayOpBase$remove_array()
ArrayOpBase$finalize()

Method `new()`

Base class initialize function, to be called in sub-class internally.

Always use ScidbConnection to get array_op instances.

Usage

ArrayOpBase$new(
  raw_afl,
  dims = as.character(c()),
  attrs = as.character(c()),
  dtypes = list(),
  dim_specs = list(),
  ...,
  meta_list
)

Arguments

raw_afl: AFL expression (array name or operations) as string
dims: A string vector used as dimension names
attrs: A string vector used as attribute names
dtypes: A named list of strings, where names are attribute names and values are full scidb data types. E.g. dtypes = list(field_str = "string NOT NULL", field_int32 = "int32")
dim_specs: A named list of strings, where names are dimension names and values are dimension specs. E.g. dim_specs = list(da = "0:*:0:*", chrom = "1:24:0:1").
...: A named list of metadata items, where names are used as keys in private$set_meta and private$get_meta functions.
meta_list: A list that stores ArrayOp metadata, e.g. field types. If provided, other regular params are not allowed.

Method `inherit_refs()`

Add the references of another ArrayOpBase object to this object

To be used when creating a new ArrayOpBase object that is derived not only from self but from multiple other objects as well

Usage

ArrayOpBase$inherit_refs(rhs)

Arguments

rhs: object from which to inherit SciDBR references

Method `filter()`

Create a new ArrayOp instance with filter expressions

Similar to dplyr::filter, fields are not quoted.

Operators for any type of fields include ==, !=, %in%, %not_in%. To test whether a field is null, use unary operators: is_null, not_null.

Special binary operators for string fields include: %contains%, %starts_with%, %ends_with%, %like%, where only %like% takes a regular expression and other operators escape any special characters in the right operand.

Operators for numeric fields include: >, <, >=, <=

Usage

ArrayOpBase$filter(
  ...,
  .expr = NULL,
  .validate_fields = TRUE,
  .regex_func = getOption("arrayop.regex_func", default = NULL),
  .ignore_case = getOption("arrayop.ignore_case", default = NULL)
)

Arguments

...: Filter expression(s) in R syntax. These expression(s) are not evaluated in R but first captured then converted to scidb expressions with appropriate syntax.
.expr: A single R expression, or a list of R exprs, or NULL. If provided, ... is ignored. Multiple exprs are joined by 'and'. This param is useful when we want to pass an already captured R expression.
.validate_fields: Boolean, default TRUE, whether to validate fields in filter expressions. Throw error if invalid fields exist when set to TRUE.
.regex_func: deprecated option
.ignore_case: deprecated option

Returns

A new arrayOp
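
The sketch below illustrates the operators listed above. The field names (age, name, email) and the helper names are hypothetical, and the calls are wrapped in functions so that they only run once an ArrayOp is supplied. It assumes that the null tests are written as function calls such as not_null(email), that several expressions in ... are combined like those in .expr, and that an expression captured with base R's quote() is an acceptable value for .expr.

```r
# Expressions are captured, not evaluated: `age` need not exist in R.
captured <- quote(age >= 18)
print(deparse(captured))

# Hypothetical helper; `arr` is an ArrayOp with fields age, name, email.
filter_adults <- function(arr) {
  arr$filter(
    age >= 18,                # numeric comparison
    name %starts_with% "A",   # string operator, right operand is escaped
    not_null(email)           # assumed call syntax for the null test
  )
}

# The same first condition, passed as an already captured expression.
filter_adults_expr <- function(arr) {
  arr$filter(.expr = captured)
}
```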

Method `mutate()`

Create a new ArrayOp instance with mutated fields

Similar to dplyr::mutate, fields of the source (self) can be removed from or added to the result arrayOp. Any fields that are not in the mutate expressions remain unchanged.

Usage

ArrayOpBase$mutate(..., .dots = NULL, .sync_schema = TRUE)

Arguments

...

Named R expressions. Names are field names in the result arrayOp and must not be empty. Set field = NULL to remove existing fields. E.g. abcd = NULL, def = def removes field 'abcd' and keeps field 'def'.

Values are R expressions similar to the filter method. E.g. a = b + 2, name = first + "-" + last, chrom = if(chrom == 'x') 23 else if(chrom == 'y') 24 else chrom

.dots

A list of SciDBR expressions, R expressions, or NULL. If provided, the ... param is ignored. Useful when a list of mutation expressions is already created and can be passed around.

.sync_schema

Whether to get the exact schema from scidb. Default TRUE will cause a scidb query to get the schema. Set to FALSE to avoid schema checking.

Returns

a new ArrayOp instance
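
A brief sketch of a mutate call, reusing the expressions from the argument description above. The helper name add_and_drop and the assumption that `arr` has fields b, first, last and abcd are hypothetical; the call is wrapped in a function so it only runs when an ArrayOp is supplied.

```r
# Hypothetical helper; `arr` is an ArrayOp with fields b, first, last, abcd.
add_and_drop <- function(arr) {
  arr$mutate(
    a = b + 2,                   # add (or replace) field 'a'
    name = first + "-" + last,   # string expression from the doc example
    abcd = NULL                  # remove the existing field 'abcd'
  )
}
```

Fields not named in the call are expected to pass through unchanged, and `arr` itself is not modified.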

Method `transmute()`

Create a new ArrayOp instance with mutated fields

Similar to dplyr::transmute, only listed fields are retained in the result arrayOp. NOTE: Any fields that are not in the mutate expressions will be discarded.

Usage

ArrayOpBase$transmute(..., .dots = NULL, .sync_schema = TRUE)

Arguments

...

R expressions. Names are optional. For each named expression, the name is used as field name in the result arrayOp. Unnamed expressions must be existing field names, unquoted, which result in unchanged source fields of self.

Values are R expressions similar to the filter method. E.g. a = b + 2, name = first + "-" + last, chrom = if(chrom == 'x') 23 else if(chrom == 'y') 24 else chrom

.dots

A named list of SciDBR expressions, R expressions, or NULL. If provided, the ... param is ignored. Useful when a list of mutation expressions is already created and can be passed around.

.sync_schema

Whether to get the exact schema from scidb. Default TRUE will cause a scidb query to get the schema. Set to FALSE to avoid schema checking.

Returns

a new ArrayOp instance
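
For contrast with mutate, a short sketch of a transmute call. The helper name keep_two and the fields chrom and b are hypothetical; the call is wrapped in a function so it only runs when an ArrayOp is supplied.

```r
# Hypothetical helper; `arr` is an ArrayOp with at least fields chrom and b.
keep_two <- function(arr) {
  arr$transmute(
    chrom,       # unnamed, unquoted: existing field kept as is
    a = b + 2    # named: new field computed from 'b'
  )
}
```

Under the description above, the result would carry only chrom and a; every other field of `arr` is discarded.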

Method `mutate_by()`

Create an ArrayOp instance with the same schema as self, but with cells taken from 'data_array' for the 'updated_fields'.

Usage

ArrayOpBase$mutate_by(
  data_array,
  keys = NULL,
  updated_fields = NULL,
  .redimension_setting = NULL,
  .join_setting = NULL
)

Arguments

data_array: An ArrayOp instance that has at least two overlapping fields with self.
keys: Field names in both self and data_array. Cell content of these fields is taken from the 'self' arrayOp rather than 'data_array'.
updated_fields: Field names in both self and data_array. Cell content of these fields is taken from 'data_array', NOT 'self'.
.redimension_setting: A list of strings used as the settings of scidb 'redimension' operator. Only applicable when a 'redimension' is needed.
.join_setting: A list of strings used as the settings of scidb 'join' operator. Only applicable when a 'join' is needed.

Returns

A new arrayOp with the same schema as self

Method `inner_join()`

Inner join two arrays: 'self' (left) and 'right'

Similar to dplyr::inner_join, the result arrayOp performs an inner join. For both left and right arrays, only selected fields are included in the result arrayOp. If no fields are selected, then all fields are treated as selected.

Usage

ArrayOpBase$inner_join(
  right,
  by.x = NULL,
  by.y = NULL,
  by = NULL,
  left_alias = "_L",
  right_alias = "_R",
  join_mode = "equi_join",
  settings = NULL
)

Arguments

right: An arrayOp instance
by.x: NULL or a string vector as join keys. If set to NULL, join keys are inferred as shared fields of 'left' and 'right'.
by.y: NULL or a string vector as join keys. If set to NULL, join keys are inferred as shared fields of 'left' and 'right'.
by: NULL or a string vector as join keys. If set to NULL, join keys are inferred as shared fields of 'left' and 'right'. If not NULL, must be fields of both operands.
left_alias: Alias for left array to resolve potential conflicting fields in result
right_alias: Alias for right array to resolve potential conflicting fields in result
join_mode: String 'equi_join', 'apply_join', or 'cross_join'. The second always replicates the right-hand array to all instances, with the benefit of non-materializing result in scidb. The third requires the join keys to all be dimensions of both operands, more stringent than 'equi_join' but again with the benefit of non-materializing result in scidb.
settings: A named list as join settings. E.g. list(algorithm = "'hash_replicate_right'")

Returns

A new arrayOp instance
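
The following sketch shows two ways the join keys might be supplied. The helper names and the field names (id, sample_id) are hypothetical. The second helper assumes that by.x names the key in the left operand and by.y the key in the right operand, which this page does not state explicitly.

```r
# Hypothetical helpers; `left` and `right` are ArrayOp instances.

# Join on a field that exists in both operands.
join_on_id <- function(left, right) {
  left$inner_join(
    right,
    by = "id",
    settings = list(algorithm = "'hash_replicate_right'")
  )
}

# Assumption: by.x is the key in `left`, by.y the key in `right`.
join_on_different_keys <- function(left, right) {
  left$inner_join(right, by.x = "sample_id", by.y = "id")
}
```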

Method `left_join()`

Left join two arrays: 'self' (left) and 'right'

Similar to dplyr::left_join, the result arrayOp performs a left join. For both left and right arrays, only selected fields are included in the result arrayOp. If no fields are selected, then all fields are treated as selected.

Usage

ArrayOpBase$left_join(
  right,
  by.x = NULL,
  by.y = NULL,
  by = NULL,
  left_alias = "_L",
  right_alias = "_R",
  join_mode = "equi_join",
  settings = NULL
)

Arguments

right: An arrayOp instance
by.x: NULL or a string vector as join keys. If set to NULL, join keys are inferred as shared fields of 'left' and 'right'.
by.y: NULL or a string vector as join keys. If set to NULL, join keys are inferred as shared fields of 'left' and 'right'.
by: NULL or a string vector as join keys. If set to NULL, join keys are inferred as shared fields of 'left' and 'right'. If not NULL, must be fields of both operands.
left_alias: Alias for left array to resolve potential conflicting fields in result
right_alias: Alias for right array to resolve potential conflicting fields in result
join_mode: String 'equi_join' or 'apply_join'. The second always replicates the right-hand array to all instances, with the benefit of non-materializing result in scidb.
settings: A named list as join settings. E.g. list(algorithm = "'hash_replicate_right'")

Returns

A new arrayOp instance

Method `right_join()`

Right join two arrays: 'self' (left) and 'right'

Similar to dplyr::right_join, the result arrayOp performs a right join. For both left and right arrays, only selected fields are included in the result arrayOp. If no fields are selected, then all fields are treated as selected.

Usage

ArrayOpBase$right_join(
  right,
  by.x = NULL,
  by.y = NULL,
  by = NULL,
  left_alias = "_L",
  right_alias = "_R",
  settings = NULL
)

Arguments

right: An arrayOp instance
by.x: NULL or a string vector as join keys. If set to NULL, join keys are inferred as shared fields of 'left' and 'right'.
by.y: NULL or a string vector as join keys. If set to NULL, join keys are inferred as shared fields of 'left' and 'right'.
by: NULL or a string vector as join keys. If set to NULL, join keys are inferred as shared fields of 'left' and 'right'. If not NULL, must be fields of both operands.
left_alias: Alias for left array to resolve potential conflicting fields in result
right_alias: Alias for right array to resolve potential conflicting fields in result
settings: A named list as join settings. E.g. list(algorithm = "'hash_replicate_right'")

Returns

A new arrayOp instance

Method `full_join()`

Full join two arrays: 'self' (left) and 'right'

Similar to dplyr::full_join, the result arrayOp performs a full join. For both left and right arrays, only selected fields are included in the result arrayOp. If no fields are selected, then all fields are treated as selected.

Usage

ArrayOpBase$full_join(
  right,
  by.x = NULL,
  by.y = NULL,
  by = NULL,
  left_alias = "_L",
  right_alias = "_R",
  settings = NULL
)

Arguments

right: An arrayOp instance
by.x: NULL or a string vector as join keys. If set to NULL, join keys are inferred as shared fields of 'left' and 'right'.
by.y: NULL or a string vector as join keys. If set to NULL, join keys are inferred as shared fields of 'left' and 'right'.
by: NULL or a string vector as join keys. If set to NULL, join keys are inferred as shared fields of 'left' and 'right'. If not NULL, must be fields of both operands.
left_alias: Alias for left array to resolve potential conflicting fields in result
right_alias: Alias for right array to resolve potential conflicting fields in result
settings: A named list as join settings. E.g. list(algorithm = "'hash_replicate_right'")

Returns

A new arrayOp instance

Method `semi_join()`

Return an arrayOp instance with same schema as self and content cells that match the cells of 'df_or_arrayop'.

Similar to dplyr::semi_join, the result has the same schema as the left operand 'self' and with content filtered by 'df_or_arrayop'.

Params field_mapping, lower_bound and upper_bound, if provided, must be named lists, where the names are from the source array (i.e. self) and the values are from the right operand df_or_arrayop.

Usage

ArrayOpBase$semi_join(
  df_or_arrayop,
  field_mapping = NULL,
  lower_bound = NULL,
  upper_bound = NULL,
  mode = "auto",
  filter_threshold = 200L,
  upload_threshold = 6000L
)

Arguments

df_or_arrayop

An R data frame or arrayOp instance.

field_mapping

NULL or a named list of strings. Only applicable when mode is 'index_lookup' or 'cross_between', ignored in other modes.

lower_bound

NULL or a named list of strings. Only applicable when mode is 'filter' or 'cross_between'. Names of the list are fields of self, and value strings are fields or columns of the df_or_arrayop which are treated as lower bound to matching fields rather than exact match.

In 'filter' mode, the self fields in lower_bound can be any numeric fields. In 'cross_between' mode, the self fields in lower_bound must be array dimensions.

upper_bound

NULL or a named list of strings. Only applicable when mode is 'filter' or 'cross_between'. Names of the list are fields of self, and value strings are fields or columns of the df_or_arrayop which are treated as upper bound to matching fields rather than exact match.

In 'filter' mode, the self fields in upper_bound can be any numeric fields. In 'cross_between' mode, the self fields in upper_bound must be array dimensions.

mode

String of 'filter', 'cross_between', 'index_lookup', 'equi_join' or 'auto'

filter_threshold

A number below which the 'filter' mode is used unless a mode other than 'auto' is provided.

upload_threshold

A number below which the 'df_or_arrayop' data frame is compiled into a build literal array; otherwise uploaded to scidb as a regular array. Only applicable when 'df_or_arrayop' is an R data frame.

Returns

An arrayOp instance
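
A sketch of two semi_join calls with an R data frame as the right operand. The helper names and the field and column names (id, pos, start, end) are hypothetical; the calls are wrapped in functions so they only run when an ArrayOp is supplied.

```r
# Hypothetical helper: keep cells of `arr` whose 'id' appears in a data frame.
keep_matching <- function(arr) {
  wanted <- data.frame(id = c(1L, 5L, 9L))
  arr$semi_join(wanted, mode = "filter")
}

# Hypothetical helper: keep cells whose numeric field 'pos' falls within a
# range given by the 'start' and 'end' columns of `ranges`.
keep_in_ranges <- function(arr, ranges) {
  arr$semi_join(
    ranges,
    lower_bound = list(pos = "start"),  # name: field of self; value: column
    upper_bound = list(pos = "end"),
    mode = "filter"
  )
}
```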

Method `group_by()`

Create a new arrayOp with 'group by' fields

The result arrayOp is identical to self except for the 'group_by' fields. When called before the summarize function, the result arrayOp will be converted into a grouped_aggregate operation.

Usage

ArrayOpBase$group_by(...)

Arguments

...: field names as strings or string vectors, which will be merged into a single string vector with c(...)

Returns

An arrayOp instance with group_by fields

Method `summarize()`

Create a new arrayOp with aggregated fields

Usage

ArrayOpBase$summarize(..., .dots = NULL)

Arguments

...: aggregation expressions in R syntax. Names of expressions are optional. If provided, names will be the fields of result arrayOp; otherwise field names are auto generated by scidb. Same syntax as ... in 'filter' and 'mutate' functions.
.dots: a list of aggregation expressions. Similar to '.dots' in 'mutate' and 'transmute'.

Returns

A new arrayOp instance
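
A sketch of group_by followed by summarize. The helper name range_by_chrom and the fields chrom and pos are hypothetical, and it is an assumption that min and max are accepted as aggregate function names here (SciDB provides aggregates of those names).

```r
# Hypothetical helper; `arr` is an ArrayOp with fields chrom and pos.
range_by_chrom <- function(arr) {
  arr$group_by("chrom")$summarize(
    min_pos = min(pos),   # named: result field 'min_pos'
    max_pos = max(pos)
  )
}
```

If the names were omitted, the result field names would be generated by scidb, as described in the arguments above.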

Method `set_auto_fields()`

Create a new ArrayOp instance that has auto incremented fields and/or anti-collision fields according to a template arrayOp

If the dimension count, attribute count and data types match between the source(self) and target, then no redimension will be performed, otherwise redimension on the source first.

Redimension mode requires that all target fields exist on the source, regardless of whether they are attributes or dimensions. Redimension mode does not check whether source data types match the target, because automatic data conversion occurs within scidb where necessary/applicable.

Usage

ArrayOpBase$set_auto_fields(
  target,
  source_auto_increment = NULL,
  target_auto_increment = NULL,
  anti_collision_field = NULL,
  join_setting = NULL,
  source_anti_collision_dim_spec = NULL
)

Arguments

target: A target ArrayOp the source data is written to.
source_auto_increment: A single named integer, a single string, or NULL. E.g. c(z=0) for field 'z' in the source (i.e. self) starting from 0; or a single string 'z', equivalent to c(z=0). If NULL, assume it to be the only dimension in self, normally an artificial dimension from a build literal or unpack operation.
target_auto_increment: A named number vector, a string vector, or NULL, where the name is a target field and the value is the starting index. E.g. c(aid=0, bid=1) means to set auto fields 'aid', 'bid' according to the target fields of the same name. If 'target' doesn't have a cell, then default values start from 0 and 1 for aid and bid, respectively. A string vector c("aid", "bid") is equivalent to c(aid=0, bid=0). NULL means treat all missing fields (absent in self but present in target) as 0-based auto increment fields. The target_auto_increment param only affects the initial load when the field is still null in the target array.
anti_collision_field: A target dimension name which exists only to resolve cell collision (i.e. cells with the same dimension coordinate).
join_setting: NULL or a named list. When not NULL, it is converted to settings for scidb equi_join operator, only applicable when anti_collision_field is not NULL.
source_anti_collision_dim_spec: NULL or a string. If NULL, the dimension spec for the anti-collision dimension in source (self) is taken from self's schema. In rare cases, we need to set the dimension spec to control the chunk size in the 'redimension' operation, e.g. source_anti_collision_dim_spec = "0:*:0:123456"

Returns

A new arrayop instance

Method `update()`

Update the target array with self's content

Similar behavior to the scidb insert operator. Requires that the numbers of attributes and dimensions of the self and target arrays match. Field names are irrelevant.

This function only returns an arrayOp with the update operation AFL encapsulated. No real action is performed in scidb until source$update(target)$execute() is called.

Usage

ArrayOpBase$update(target)

Arguments

target: An arrayOp instance where self's content is updated. Must be a persistent array, since it is meaningless to update an array operation.

Returns

A new arrayOp that encapsulates the update operation
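
The two-step pattern described above, building the operation first and executing it afterwards, can be sketched as follows. The helper name append_to_target is hypothetical; `source` and `target` stand for ArrayOp instances, with `target` a persistent array.

```r
# Hypothetical helper; nothing happens in scidb until execute() is called.
append_to_target <- function(source, target) {
  op <- source$update(target)  # only builds the AFL for the update
  op$execute()                 # runs the update in scidb
}
```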

Method `overwrite()`

Overwrite the target array with self's content

Similar behavior to the scidb store operator. Requires that the numbers of attributes and dimensions of the self and target arrays match. Field names are irrelevant.

This function only returns an arrayOp with the overwrite operation AFL encapsulated. No real action is performed in scidb until source$overwrite(target)$execute() is called.

Warning: Target's content will be erased and filled with self's content.

Usage

ArrayOpBase$overwrite(target)

Arguments

target: An arrayOp instance where self's content is written to. Must be a persistent array, which may or may not already exist in scidb.

Returns

A new arrayOp that encapsulates the overwrite operation

Method `delete_cells()`

Create a new ArrayOp instance that encapsulates a delete operation

Implemented by scidb delete operator.

Operators for any type of fields include ==, !=, %in%, %not_in%. To test whether a field is null, use unary operators: is_null, not_null.

Operators for numeric fields include: >, <, >=, <=

Usage

ArrayOpBase$delete_cells(
  ...,
  .expr = NULL,
  .regex_func = getOption("arrayop.regex_func", default = NULL),
  .ignore_case = getOption("arrayop.ignore_case", default = NULL)
)

Arguments

...: Filter expression(s) in R syntax. These expression(s) are not evaluated in R but first captured then converted to scidb expressions with appropriate syntax.
.expr: A single R expression, or a list of R exprs, or NULL. If provided, ... is ignored. Multiple exprs are joined by 'and'. This param is useful when we want to pass an already captured R expression.
.regex_func: deprecated option
.ignore_case: deprecated option

Returns

A new arrayOp

Method `project()`

Create a new ArrayOp instance with projected attributes

Usage

ArrayOpBase$project(...)

Arguments

...: Which field(s) to retain.

Returns

A new arrayOp

Method `select()`

Create a new ArrayOp instance with selected fields

NOTE: this does NOT change the to_afl output, but explicitly state which field(s) should be retained if used in a parent operation that changes its schema, e.g. inner_join, left_join, right_join and to_df.

The selected fields are passed on to derived ArrayOp instances.

In all join operations, if no field is explicitly selected, then all fields are assumed to be retained. In to_df, if no field is explicitly selected, only the attributes are retrieved as data frame columns. In to_df_all, if no field is explicitly selected, it is equivalent to selecting all dimensions and attributes.

Usage

ArrayOpBase$select(...)

Arguments

...: Which field(s) to retain.

Returns

A new arrayOp
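
To illustrate how select interacts with the download methods, here is a small sketch. The helper name download_variants and the field names chrom and pos are hypothetical, and it is an assumption that select takes field names as strings.

```r
# Hypothetical helper; `arr` is an ArrayOp with fields chrom and pos.
download_variants <- function(arr) {
  list(
    attrs_only = arr$to_df(),       # no selection: attributes only
    everything = arr$to_df_all(),   # no selection: dimensions and attributes
    # Assumption: field names are passed to select() as strings.
    chosen = arr$select("chrom", "pos")$to_df()
  )
}
```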

Method `to_df_all()`

Download query result of self's AFL string with all self's fields.

Usage

ArrayOpBase$to_df_all(arrow = NULL)

Arguments

arrow: whether to return data as an Arrow object; FALSE by default

Returns

An R data frame with columns from self's dimensions and attributes if no fields are selected, or the selected fields.

Method `to_df()`

Download query result of self's AFL string with all self's attributes.

Usage

ArrayOpBase$to_df(arrow = NULL)

Arguments

arrow: whether to return data as an Arrow object; FALSE by default

Returns

An R data frame with columns from self's attributes if no fields are selected, or the selected fields only.

Method `execute()`

Execute the AFL string for pure side effect without result returned.

Usage

ArrayOpBase$execute()

Returns

self

Method `persist()`

Persist array operation as scidb array

If self is a persistent array and no save_array_name provided, then self is returned.

Otherwise, save self's AFL as a new scidb array. This includes two cases:

self is persistent and save_array_name is provided, i.e. explicit persistence
self is array operation(s), then a new array is created regardless of save_array_name

From the user's perspective:

When we need to ensure a handle to a persistent array and do not care whether it is a new or existing array, we should leave out save_array_name to avoid unnecessary array copying. E.g. conn$array_from_df may return a build literal or uploaded persistent array, call conn$array_from_df(...)$persist() to ensure a persistent array.
When we need to backup an array, then provide a save_array_name explicitly.

Parameters .gc and .temp are only applicable when a new array is created.

Usage

ArrayOpBase$persist(save_array_name = NULL, .temp = FALSE, .gc = TRUE)

Arguments

save_array_name: NULL or String. The new array name to save self's AFL as. If NULL, the array name is randomly generated when a new array is created.
.temp: Boolean, default FALSE. Whether to create a temporary scidb array.
.gc: Boolean, default TRUE. Whether to remove the persisted scidb array once the encapsulating arrayOp goes out of scope in R. Set to FALSE if we need to keep the array indefinitely.

Returns

A new arrayOp instance or self; or throw an exception if save_array_name is provided and the array already exists
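
The two use cases described above can be sketched as a pair of helpers. The helper names are hypothetical; `arr` stands for any ArrayOp instance.

```r
# Hypothetical helper: get a handle to a persistent array without caring
# whether it is new or existing (no copy if `arr` is already persistent).
ensure_persistent <- function(arr) {
  arr$persist()
}

# Hypothetical helper: back up `arr` under an explicit name and keep the
# copy after the R object is garbage collected.
backup_array <- function(arr, name) {
  arr$persist(save_array_name = name, .gc = FALSE)
}
```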

Method `change_schema()`

Create a new ArrayOp instance whose schema is the same as the template.

This operation throws away any fields that do not exist in template while keeping the self's data of the matching fields.

Implemented by scidb redimension operator, but it allows for partial-fields match if strict=F.

Usage

ArrayOpBase$change_schema(template, strict = TRUE, .setting = NULL)

Arguments

template: an ArrayOp instance as the schema template.
strict: If TRUE (default), requires that self has all the template fields.
.setting: a string vector, where each item will be appended to the redimension operand. E.g. .setting = c('false', 'cells_per_chunk: 1234') ==> redimension(source, template, false, cells_per_chunk: 1234)

Returns

A new arrayOp instance

Method `drop_dims()`

Create a new arrayOp by dropping dimensions of 'self'.

Use mode = 'unpack' to still keep an artificial dimension in result arrayOp. The dimension is 0-based, auto-incremented up until self$cell_count() - 1. 'unpack' mode is useful in taking advantage of this artificial dimension to auto populate other fields, e.g. in set_auto_fields.

Use mode = 'flatten' to return a scidb data frame which has no explicit dimensions.

Result arrayOp in both modes has attributes of self's attributes and dimensions.

Usage

ArrayOpBase$drop_dims(
  mode = "unpack",
  .chunk_size = NULL,
  .unpack_dim = dbutils$random_field_name()
)

Arguments

mode: String 'unpack' (default) or 'flatten'.
.chunk_size: NULL or an integer. Converted to the 'chunk_size' param in 'unpack' mode; and 'cells_per_chunk' in 'flatten' mode.
.unpack_dim: NULL (default) or string as the dimension if 'unpack' mode is chosen. NULL defaults to a random field name.

Returns

A new arrayOp instance

Method `sync_schema()`

Create a new arrayOp with actual schema from SciDB or 'self' if self$is_schema_from_scidb == T.

Useful in confirming the schema of complex array operations. If the array schema is already retrieved from SciDB, then just return self.

Usage

ArrayOpBase$sync_schema()

Returns

An arrayOp instance

Method `spawn()`

Create a new ArrayOp instance using 'self' as a template

This function is mainly for array schema string generation when we want to rename, add, and/or exclude certain fields of self, but still keep other unspecified fields unchanged.

Data types and dimension specs of existing fields are inherited from 'self' unless provided explicitly. New field data types default to NAs unless provided explicitly.

This function is normally used internally for arrayOp generation.

Usage

ArrayOpBase$spawn(
  afl_str = "spawned array_op (as template only)",
  renamed = NULL,
  added = NULL,
  excluded = NULL,
  dtypes = NULL,
  dim_specs = NULL
)

Arguments

afl_str: An AFL expression. In case of using the spawned result as a schema template only, the afl_str does not need to be provided. Otherwise, it should conform with the actual resultant arrayOp instance, which is very rare.
renamed: A list of renamed fields where names are old fields and values are new field names.
added: New fields added to result arrayOp. String vector or NULL.
excluded: Fields excluded from self. String vector or NULL.
dtypes: NULL or a named list of data types for fields of the result arrayOp, where names are field names, values (strings) are data types.
dim_specs: NULL or a named list of array dimension specs, where names are dimension names, values (strings) are dimension specs in scidb format.

Returns

A new arrayOp instance

Method `to_afl()`

AFL string encapsulated by the self ArrayOp

AFL can be either a scidb array name or array operation(s) on array(s).

The ArrayOp instance may have 'selected' fields, but they are not reflected in the result. They only determine which fields are retained in to_df() calls.

Usage

ArrayOpBase$to_afl()

Returns

an AFL expression string

Method `to_schema_str()`

Return a schema representation of the ArrayOp <attr1:type1 [, attr2:type2 ...]> [dim1 [;dim2]]

Unless sync_schema() is called, the schema may be inferred locally in R to save round trips between R and the SciDB server. SciDB data frames have hidden dimensions whose names start with $.

Usage

ArrayOpBase$to_schema_str()

Returns

SciDB schema string

Method `limit()`

Create a new arrayOp that encapsulates the AFL for the first n cells of 'self'

We still need to append a to_df call to download the result as data frame.

Usage

ArrayOpBase$limit(n = 5, skip = NULL)

Arguments

n: How many cells to take
skip: How many rows to skip before taking

Returns

A new ArrayOp instance
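
Since limit only builds a new ArrayOp, a download call has to follow it. A minimal sketch, with the hypothetical helper name peek:

```r
# Hypothetical helper: download the first n cells of `arr` as a data frame.
peek <- function(arr, n = 5) {
  arr$limit(n)$to_df()
}
```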

Method `cell_count()`

Return the number of cells of 'self'

Usage

ArrayOpBase$cell_count()

Returns

The number of cells that would be returned if the AFL that 'self' encapsulates is run.

Method `summarize_array()`

Return a data frame of the summary of the 'self' array

Implemented by scidb 'summarize' operator

Usage

ArrayOpBase$summarize_array(by_attribute = FALSE, by_instance = FALSE)

Arguments

by_attribute: Summarize by array attributes
by_instance: Summarize by array scidb instances

Returns

A data frame of the 'self' array summary

Method `list_versions()`

Return a data frame of all self's versions

Usage

ArrayOpBase$list_versions()

Returns

An R data frame with columns: version_id and timestamp

Method `version()`

Get an arrayOp instance that encapsulates a version snapshot of a persistent scidb array

The function does not perform a version check in scidb. It only constructs an arrayOp locally to represent a specific version. If a non-existent version_id is later used in scidb related operations, an error will be thrown by SciDB.

Usage

ArrayOpBase$version(version_id)

Arguments

version_id: The array version_id, as a number

Returns

An arrayOp instance with the same schema as self
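
As a sketch of how list_versions() and version() might be combined, the hypothetical helper below picks the smallest version_id reported for a persistent array and returns an ArrayOp for that snapshot. It relies only on the version_id column documented for list_versions().

```r
# Hypothetical helper; `arr` is an ArrayOp for a persistent scidb array.
oldest_snapshot <- function(arr) {
  versions <- arr$list_versions()          # data frame: version_id, timestamp
  arr$version(min(versions$version_id))    # built locally, no check in scidb
}
```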

Method `is_persistent()`

Returns whether the current arrayOp instance encapsulates a persistent scidb array name that may or may not exist on the scidb server

No checking with scidb server is performed. Only validate the arrayOp's AFL with regex and see if it matches an array name. E.g. "myNamespace.myArray" or "myArrayInPublicNamespace".

Usage

ArrayOpBase$is_persistent()

Returns

TRUE or FALSE

Method `exists_persistent_array()`

Returns whether the current arrayOp instance encapsulates a persistent scidb array that exists on the scidb server

If current arrayOp encapsulates an array operation, then it returns FALSE without checking with scidb server.

Usage

ArrayOpBase$exists_persistent_array()

Returns

TRUE or FALSE

Method `array_meta_data()`

Download the array metadata as an R data frame

The array metadata is retrieved by executing the scidb 'show' operator in the array namespace and matching on the current array name. Array metadata includes the fields: "name", "uaid", "aid", "schema", "availability", "temporary", "namespace", "distribution", "etcomp"

Usage

ArrayOpBase$array_meta_data()

Returns

An R data frame

Method `remove_versions()`

Remove array versions of self

Only applicable to persistent arrays. Warning: This function will be executed effectively in scidb without extra 'execute()' and cannot be undone.

Usage

ArrayOpBase$remove_versions(version_id = NULL)

Arguments

version_id: NULL or a number. When set to NULL, all array versions are removed except for the latest one. When set to a number, it must be a valid version_id of self, in which case all versions up to the 'version_id' are removed.

Returns

NULL

Method `remove_array()`

Remove the persistent scidb array that self encapsulates

Only applicable to persistent arrays. Warning: This function will be executed effectively in scidb without extra 'execute()' and cannot be undone.

Usage

ArrayOpBase$remove_array()

Returns

NULL

Method `finalize()`

A finalize function executed when the 'self' instance is garbage collected in R

If an arrayOp is marked as .gc = T, then it will be removed from scidb when this function is executed.

We don't normally call this function except in testing.

Usage

ArrayOpBase$finalize()
