What’s new (for SciDB 15.12)

This is likely to be the last version of the shim. Bryan will be switching the R client to use JDBC, and all other clients are encouraged to migrate too.

Support for the SciDB advanced I/O toolbox (aio_tools)

Use the -a command line flag, or add the line

aio=1

to the /var/lib/shim/conf file to save data using the SciDB aio_tools plugin (the plugin is required).

SciDB native authentication

SciDB native authentication is supported only for the /execute_query and /cancel services (the only services that talk directly to SciDB). If you need to protect the other shim services, consider using digest authentication.

Streaming and compression options no longer supported

The execute_query service no longer supports either option. If you use these options, they will be ignored.

Overview

Shim is a web service that exposes a very simple API for clients to interact with SciDB over HTTP connections. The API consists of a small number of services (described in detail below), including: /new_session, /release_session, /execute_query, /cancel, /read_lines, /read_bytes, /upload_file, /upload, /version.

There are more direct ways to talk to SciDB over a network. See https://github.com/artyom-smirnov/scidb4py for a great example of direct network communication with SciDB using Python and Google protocol buffers (written by Artyom Smirnov, one of the core SciDB developers). Artyom’s approach is comparatively low-level, but possibly more efficient than shim. Shim is designed to be a convenient and easy way to talk to SciDB. Note that both shim and Artyom’s Python interface are community open source projects; they are not official components of SciDB.

Shim clients begin by requesting a session ID from the service, then running a query and releasing the session ID when done. Session IDs are distinct from SciDB query IDs–a shim session ID groups a SciDB query together with server resources for input and output to the client.

Configuration

Shim runs as a system service or can be invoked directly from the command line. See the shim manual page for command-line options (type man shim from a terminal). Service configuration is determined by the /var/lib/shim/conf configuration file. The default conf file is a sample that displays the default configuration options, which are listed as one key=value pair per line. Available options include:

#ports=8080,8083s
scidbport=1239
instance=0
tmp=/home/scidb/scidbdata/000/0/tmp
#user=root
#max_sessions=50
#timeout=60
#aio=1

Each option is described below.
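
As a minimal sketch of how a script might look up one of these options, the function below reads a key=value file and prints the value of a single key. The name shim_conf_get is hypothetical and not part of shim; the sketch assumes one pair per line and treats lines that start with # as commented out.

```shell
# Print the value of one key from a shim-style key=value config file.
# Lines beginning with '#' never match, so commented-out options count as unset.
shim_conf_get() {
  local file="$1" key="$2" line
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      "$key="*) printf '%s\n' "${line#*=}"; return 0 ;;
    esac
  done < "$file"
  return 1   # key missing or commented out
}
```

For example, shim_conf_get /var/lib/shim/conf scidbport would print 1239 for the sample file above, while a lookup of ports would return a non-zero status because that line is commented out.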

Ports and Network Interfaces

Shim listens on default ports 8080 (open, not encrypted), and 8083 (TLS encrypted) on all available network interfaces. Ports and listening interfaces are configured with the command line ‘-p’ option or with the ‘ports=’ option in the /var/lib/shim/conf file when shim is run as a service. The ports/interface specification uses the following syntax:

[address:]port[s][,[address:]port[s]][,...]

where:

  • address indicates an optional IP address associated with a specific available network device. Specify this only if you want to restrict shim to operate on a specific network device.
  • port indicates a required available port number
  • s is an optional single character ‘s’ suffix indicating that TLS/SSL should be used on that port.

Here are some examples of possible port configurations:

5555s                   Listen only on port 5555 (TLS/SSL).
127.0.0.1:8080,1234s    Listen on port 8080 but only on the local loopback interface; listen on port 1234 (TLS/SSL) on all interfaces.
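
The syntax can be illustrated with a small parser. This is only a sketch of the grammar described above, not shim's own parsing code; the function name parse_shim_ports and its output format (address, port, TLS flag) are assumptions made for illustration.

```shell
# Split a ports specification into one "address port tls" line per entry.
# An omitted address is reported as "*" (all interfaces).
parse_shim_ports() {
  local spec="$1" entry addr port tls old_ifs="$IFS"
  IFS=','
  set -- $spec          # split the specification on commas
  IFS="$old_ifs"
  for entry in "$@"; do
    case "$entry" in
      *:*) addr="${entry%:*}"; port="${entry##*:}" ;;
      *)   addr='*';           port="$entry" ;;
    esac
    case "$port" in
      *s) tls=yes; port="${port%s}" ;;   # trailing 's' requests TLS/SSL
      *)  tls=no ;;
    esac
    case "$port" in
      ''|*[!0-9]*) echo "invalid port in entry: $entry" >&2; return 1 ;;
    esac
    printf '%s %s %s\n' "$addr" "$port" "$tls"
  done
}
```

Calling parse_shim_ports '127.0.0.1:8080,1234s' prints one line for the loopback-only open port and one line for the TLS port on all interfaces.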

SciDB Port

Shim runs on the same computer as a SciDB coordinator. Set the ‘scidbport’ option to select the coordinator database port to locally connect to. The default SciDB database port value is 1239 (see the SciDB configuration manual for more information). Since any SciDB instance can act as a query coordinator, it is possible to configure multiple shim services, for example one shim service per computer IP address.

Instance

Set the SciDB instance number to use as a query coordinator. Make sure that this option is set together with the corresponding SciDB port number for the instance, and also set a corresponding temporary I/O location that the instance has read/write access to.

Temporary I/O space

Shim’s default behavior caches the output of SciDB queries in files on the SciDB server; set that file directory location with the config file tmp option or the command-line -t argument. This temporary directory is also used to upload data from clients over the http connection for input into SciDB. Select a directory that is writable by the shim user (see the user option).

If you install shim from an RPM or Debian package as a service, the package will configure shim to use a SciDB data directory for temporary storage. You can edit the config file and restart shim to change that default.

User

The user that the shim service runs under (shim can run as a non-root user).

Max sessions

Set the maximum number of concurrent shim sessions, beyond which clients receive an HTTP ‘resource unavailable’ error.

Timeout

Set the time in seconds after which an inactive session is considered timed out and a candidate for resource de-allocation. After sessions time out, their resources are not freed unless they need to be to satisfy additional session demands. See the Orphaned Sessions section below. Active sessions that are waiting on SciDB query results or transferring data are not subject to timeout and may run indefinitely.

AIO plugin

Set aio=1 in the config file to enable fast AIO save using the SciDB aio_tools plugin.

TLS/SSL Certificate

Shim supports TLS/SSL encryption. Packaged versions of shim (RPM and Debian packages) generate a self-signed certificate and 4096-bit RSA key when shim is installed. The certificate is placed in /var/lib/shim/ssl_cert.pem. If you would prefer to use a different certificate, replace the automatically generated one.

API Reference

Examples use the URL http://localhost:8080 or https://localhost:8083 (TLS) below. Parameters are required unless marked optional. All shim API services support CORS; see http://www.w3.org/TR/cors/ .

Limits

Clients must support HTTP 1.1 or later.

All HTTP query parameters are passed to the service as string values. They are limited to a maximum of 4096 characters unless otherwise indicated (a notable exception is the SciDB query string parameter, limited to 262,144 characters).

HTTP query string parameters that represent numbers have limits. Unless otherwise indicated whole-number values (session ID, number of bytes to return, etc.) are interpreted by shim as signed 32-bit integers and are generally limited to values between zero and 2147483647. Values outside that range will result in an HTTP 400 error (invalid query).
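
A client can check whole-number parameters before sending them and so avoid an HTTP 400 round trip. The helper below is a sketch of such a check; the name shim_valid_int is hypothetical.

```shell
# Succeed only if $1 is a whole number in the range 0..2147483647.
shim_valid_int() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, signed, or non-numeric
  esac
  # Reject anything longer than 10 digits before comparing numerically,
  # so the shell's own integer arithmetic cannot overflow.
  [ "${#1}" -le 10 ] && [ "$1" -le 2147483647 ]
}
```

For instance, shim_valid_int "$n" || echo "n out of range" could guard a read_lines or read_bytes request.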

Response codes

Possible responses for each URI are listed below. HTTP status code 200 always indicates success; other standard HTTP status codes indicate various errors. The returned data may be UTF-8 or binary depending on the request and is always returned using the generic application/octet-stream MIME type. Depending on the request, data may use chunked HTTP transfer encoding and may also use gzip content encoding.

Basic digest access authentication

Shim supports basic digest access authentication. (See https://en.wikipedia.org/wiki/Digest_access_authentication and the references therein for a good description of the method.) Enable digest access authentication by creating an .htpasswd file in shim’s default /var/lib/shim/wwwroot directory (the .htpasswd file must be located in shim’s wwwroot directory, which can be changed with the command line switch -r). The format of the file must be:

username1:password1
username2:password2
...

Use plain text passwords in the file, and consider changing the permissions of the file to restrict access. Delete the .htpasswd file to disable basic digest access authentication.

Basic digest authentication works on plain or TLS-encrypted connections but cannot be used in combination with SciDB authentication (see below).
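
On the client side, curl can answer a digest challenge with its --digest and -u options. The wrapper below is a sketch; the function name shim_get_digest and the SHIM_CURL override (useful for testing without a server) are assumptions, not part of shim.

```shell
# Issue a GET against shim using HTTP digest authentication.
# SHIM_CURL may name a replacement command; it defaults to the real curl.
shim_get_digest() {
  local user="$1" pass="$2" url="$3"
  "${SHIM_CURL:-curl}" -f -s --digest -u "$user:$pass" "$url"
}
```

With an .htpasswd entry such as username1:password1, a call like shim_get_digest username1 password1 http://localhost:8080/version would be expected to return the version string.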

TLS/SSL encryption

Shim optionally exposes both open and encrypted (HTTPS/TLS) services. You can provide a signed certificate in the /var/lib/shim directory. A generic self-signed certificate is automatically generated for you if you install shim using either the .deb or .rpm package installer.

SciDB authentication

See the /execute_query service documentation below.

Generic API Workflow

/new_session
/execute_query
/read_lines or /read_bytes
/release_session
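
The four steps can be strung together in one shell function. This is a sketch under stated assumptions: the function name shim_query, the SHIM_URL variable, and the SHIM_CURL override are hypothetical, and the query passed in is assumed to be URL-encoded already.

```shell
SHIM_URL="${SHIM_URL:-http://localhost:8080}"

# Run one AFL query (already URL-encoded), save it in the given format,
# print the saved output, and release the session.
shim_query() {
  local query="$1" format="$2" sid status=0
  # 1. /new_session
  sid=$("${SHIM_CURL:-curl}" -f -s "$SHIM_URL/new_session") || return 1
  # 2. /execute_query, then 3. /read_lines with n=0 (the whole buffer)
  "${SHIM_CURL:-curl}" -f -s \
    "$SHIM_URL/execute_query?id=$sid&query=$query&save=$format" >/dev/null &&
    "${SHIM_CURL:-curl}" -f -s "$SHIM_URL/read_lines?id=$sid&n=0" ||
    status=1
  # 4. /release_session; harmless if shim already dropped the session
  #    after a 500 or 503 error.
  "${SHIM_CURL:-curl}" -s "$SHIM_URL/release_session?id=$sid" >/dev/null
  return "$status"
}
```

A call such as shim_query "list('functions')" dcsv mirrors the /read_lines curl example further down.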

API Service Endpoints

The R examples below use the httr package. We try to illustrate API calls with real examples using either curl or R. See https://github.com/Paradigm4/shim/tree/master/tests for additional examples.

/version

DESCRIPTION Print the shim code version string
METHOD GET
PARAMETERS
RESPONSE Success HTTP 200 and text version string value in text/plain payload
EXAMPLE (curl)
curl -f -s http://localhost:8080/version 
v15.12-6-gbeea-dirty
EXAMPLE (R)
httr::GET("http://localhost:8080/version")
Response [http://localhost:8080/version]
  Date: 2016-05-05 09:36
  Status: 200
  Content-Type: text/plain
  Size: 20 B
No encoding supplied: defaulting to UTF-8.

/new_session

DESCRIPTION Request a new HTTP session from the service.
METHOD GET
PARAMETERS
RESPONSE
  • Success: HTTP 200 and text session ID value in text/plain payload
  • Failure (out of resources/server unavailable): HTTP 503
  • Invalid request: HTTP 400
EXAMPLE (curl)
curl -s http://localhost:8080/new_session 
31
EXAMPLE (R)
id = httr::GET("http://localhost:8080/new_session")
(id = rawToChar(id$content))
[1] "32"

/release_session

DESCRIPTION Release an HTTP session.
METHOD GET
PARAMETERS
  • id an HTTP session ID obtained from /new_session
RESPONSE
  • Success: HTTP 200
  • Failure (Session not found): HTTP 404
  • Failure (invalid http query): HTTP 400
    EXAMPLE (R)
    id = httr::GET("http://localhost:8080/new_session")
    (id = rawToChar(id$content))
    [1] "33"
    httr::GET(sprintf("http://localhost:8080/release_session?id=%s",id))
    Response [http://localhost:8080/release_session?id=33]
      Date: 2016-05-05 09:36
      Status: 200
      Content-Type: text/plain
    <EMPTY BODY>
    EXAMPLE (curl)
    s=`curl -s "http://localhost:8080/new_session"`
    curl -s "http://localhost:8080/release_session?id=${s}"

    /execute_query

    DESCRIPTION Execute a SciDB AFL query.
    METHOD GET
    PARAMETERS
    • id an HTTP session ID obtained from /new_session
    • user optional SciDB authentication user name (TLS connections only)
    • password optional encoded SciDB authentication password (TLS connections only)
    • query AFL query string, encoded for use in URL as required, limited to a maximum of 262,144 characters
    • save optional SciDB save format string, limited to a maximum of 4096 characters. Save the query output in the specified format for subsequent download by read_lines or read_bytes. If the save parameter is not specified, the query output is not saved.
    • release optional 0 or 1: if 1 then release_session as soon as query completes. The default value is 0 if not specified (see additional notes below).
    RESPONSE
  • Success: HTTP 200 text/plain (SciDB Query ID)
  • Failure (SciDB not available error): HTTP 503 text/plain (ERROR TEXT)
  • Failure (SciDB query error): HTTP 500 text/plain (SCIDB ERROR TEXT)
  • Failure (out of memory error): HTTP 507 text/plain (SCIDB ERROR TEXT)
  • Failure (Invalid session): HTTP 404
  • Failure (invalid http query): HTTP 400
  • Not authorized (encrypted only): HTTP 401
    NOTES Shim only supports AFL queries.

    Remember to URL-encode the SciDB query string.

    Specify optional user and password information for SciDB authentication. The password must be encoded as base64( sha512("plain text password") ) – authentication requires a TLS encrypted connection.

    500 and 503 errors result in removal of the web session ID and related resources (thus, release_session does not have to be called after such an error).

    This method blocks until the query completes.

    Do not specify the option release=1 when the save option is also set, or output will not be available to read_bytes or read_lines. Instead, explicitly call release_session after reading is complete.

    EXAMPLE (R)
    # Obtain a shim session ID
    id = httr::GET("http://localhost:8080/new_session")
    session = rawToChar(id$content)
    
    # Construct the query request
    query = sprintf("http://localhost:8080/execute_query?id=%s&query=consume(list())&release=1",
                    session)
    ans = httr::GET(query)
    
    # The response in this example is just the SciDB query ID:
    (rawToChar(ans$content))
    [1] "1462455412676688501"
    EXAMPLE (curl)
    s=`curl -s "http://localhost:8080/new_session"`
    curl -s "http://localhost:8080/execute_query?id=${s}&query=consume(list())&release=1"
    1462455412784226760
    EXAMPLE w/ERROR (R)
    id = httr::GET("http://localhost:8080/new_session")
    session = rawToChar(id$content)
    query = sprintf("http://localhost:8080/execute_query?id=%s&query=consume(42)&release=1",
                    session)
    httr::GET(query)
    Response [http://localhost:8080/execute_query?id=37&query=consume(42)&release=1]
      Date: 2016-05-05 09:36
      Status: 500
      Content-Type: text/html
      Size: 279 B
    No encoding supplied: defaulting to UTF-8.
    UserQueryException in file: src/query/parser/Translator.cpp function: ma...
    Error id: scidb::SCIDB_SE_QPROC::SCIDB_LE_WRONG_OPERATOR_ARGUMENT2
    Error description: Query processor error. Parameter must be array name o...
    consume(42)
    See /read_lines and /read_bytes below for running queries that return results and downloading them.
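
Because the query string must be URL-encoded, a small helper can save hand-encoding. The function below is a sketch that percent-encodes every character outside the RFC 3986 unreserved set; the name urlencode is hypothetical and the sketch assumes ASCII input.

```shell
# Percent-encode every character outside A-Z a-z 0-9 . _ ~ -
# Assumes ASCII input; non-ASCII text would need byte-wise handling.
urlencode() {
  local LC_ALL=C
  local s="$1" c
  while [ -n "$s" ]; do
    c="${s%"${s#?}"}"      # first character of the remaining string
    s="${s#?}"             # drop it from the remainder
    case "$c" in
      [A-Za-z0-9._~-]) printf '%s' "$c" ;;
      *) printf '%%%02X' "'$c" ;;   # "'c" yields the character's numeric code
    esac
  done
  printf '\n'
}
```

This encodes more characters than the hand-encoded examples in this document (parentheses, colons, and commas are escaped too), which is still a valid encoding. The result can be placed after query= in the execute_query URL.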

    /cancel

    DESCRIPTION Cancel a SciDB query associated with a shim session ID.
    METHOD GET
    PARAMETERS
    • id an HTTP session ID obtained from /new_session
    • user optional SciDB authentication user name (TLS connections only)
    • password optional encoded SciDB authentication password (TLS connections only)
    NOTES

    Specify optional user and password information for SciDB authentication. The password must be encoded as base64( sha512("plain text password") ) – authentication requires a TLS encrypted connection.

    RESPONSE
  • Success: HTTP 200
  • Failure (session not found): HTTP 404
  • Failure (invalid http query): HTTP 400
  • Not authorized (encrypted only): HTTP 401
  • Failure (could not connect to SciDB): HTTP 503
    EXAMPLE (R)
    # An example cancellation of a query associated with shim ID 19 (not run)
    httr::GET("http://localhost:8080/cancel?id=19")
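
For an authenticated cancel over TLS, the password has to be encoded first. The sketch below assumes that base64( sha512(...) ) means base64 applied to the raw binary digest (not to its hex form) and that the openssl command-line tool is available; the function names and the SHIM_CURL override are hypothetical.

```shell
# Encode a plain-text password as base64(sha512(password)).
# Assumption: base64 is applied to the binary digest, not its hex text.
shim_encode_password() {
  printf '%s' "$1" | openssl dgst -sha512 -binary | openssl base64 -A
  printf '\n'
}

# Cancel the query of session $1 as SciDB user $2 with password $3.
# curl's -G and --data-urlencode build a GET query string and escape the
# '+', '/' and '=' characters that base64 output may contain.
shim_cancel_auth() {
  local sid="$1" user="$2" pass="$3"
  "${SHIM_CURL:-curl}" -f -s -G \
    --data-urlencode "id=$sid" \
    --data-urlencode "user=$user" \
    --data-urlencode "password=$(shim_encode_password "$pass")" \
    "https://localhost:8083/cancel"
}
```

With the automatically generated self-signed certificate, curl would also need to be told to trust it, for example with its --cacert option.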

    /read_lines

    DESCRIPTION Read text lines from a query that saves its output.
    METHOD GET
    PARAMETERS
    • id an HTTP session ID obtained from /new_session
    • n the maximum number of lines to read and return between 0 and 2147483647.
    RESPONSE
  • Success: HTTP 200 followed by application/octet-stream query result (up to n lines)
  • Failure (invalid HTTP query string): HTTP 400
  • Failure (session not found): HTTP 404
  • Failure (end of file): HTTP 410
  • Failure (invalid request): HTTP 414
  • Failure (SciDB server error): HTTP 500
  • Failure (could not connect to SciDB server error): HTTP 503
  • Failure (server out of memory): HTTP 507
    NOTES Set n=0 to download the entire output buffer. You should almost always set n=0.

    Be sure to properly url-encode special characters like the plus sign (+) in the request.

    When n>0, iterative requests to read_lines are allowed, and will return at most the next n lines of output. Use the 410 error code to detect end of file output. Don’t use this option if you can avoid it.

    Note that query results are always returned as application/octet-stream.

    EXAMPLE (curl)
    s=`curl -s "http://localhost:8080/new_session"`
    curl -s "http://localhost:8080/execute_query?id=${s}&query=list('functions')&save=dcsv"
    curl -s "http://localhost:8080/read_lines?id=${s}&n=10"
    curl -s "http://localhost:8080/release_session?id=${s}"
    1462455412936613490{No} name,profile,deterministic,library
    {0} '%','double %(double,double)',true,'scidb'
    {1} '%','int16 %(int16,int16)',true,'scidb'
    {2} '%','int32 %(int32,int32)',true,'scidb'
    {3} '%','int64 %(int64,int64)',true,'scidb'
    {4} '%','int8 %(int8,int8)',true,'scidb'
    {5} '%','uint16 %(uint16,uint16)',true,'scidb'
    {6} '%','uint32 %(uint32,uint32)',true,'scidb'
    {7} '%','uint64 %(uint64,uint64)',true,'scidb'
    {8} '%','uint8 %(uint8,uint8)',true,'scidb'
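
Although n=0 is the recommended setting, the iterative mode with its 410 end-of-file signal can be sketched as follows. The function name shim_read_paged and the SHIM_CURL override are hypothetical; curl's -o and -w '%{http_code}' options are used to separate the body from the status code.

```shell
# Print the saved output of session $1 in pages of at most $2 lines,
# stopping when shim reports end of file with HTTP 410.
shim_read_paged() {
  local sid="$1" n="$2" code tmp
  tmp=$(mktemp) || return 1
  while :; do
    code=$("${SHIM_CURL:-curl}" -s -o "$tmp" -w '%{http_code}' \
             "http://localhost:8080/read_lines?id=$sid&n=$n")
    case "$code" in
      200) cat "$tmp" ;;                 # one more page of output
      410) rm -f "$tmp"; return 0 ;;     # end of file
      *)   rm -f "$tmp"
           echo "read_lines failed: HTTP $code" >&2
           return 1 ;;
    esac
  done
}
```

The session still has to be released with /release_session once reading is complete.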

    /read_bytes

    DESCRIPTION Read bytes from a query that saves its output.
    METHOD GET
    PARAMETERS
    • id an HTTP session ID obtained from /new_session
    • n the maximum number of bytes to read and return between 0 and 2147483647.
    RESPONSE
  • Success: HTTP 200 followed by application/octet-stream query result (up to n bytes)
  • Failure (invalid HTTP query string): HTTP 400
  • Failure (session not found): HTTP 404
  • Failure (end of file): HTTP 410
  • Failure (invalid request): HTTP 414
  • Failure (SciDB server error): HTTP 500
  • Failure (could not connect to SciDB server error): HTTP 503
  • Failure (server out of memory): HTTP 507
    NOTES Set n=0 to download the entire output buffer. You should almost always set n=0.

    Be sure to properly url-encode special characters like the plus sign (+) in the request.

    When n>0, iterative requests to read_bytes are allowed, and will return at most the next n bytes of output. Use the 410 error code to detect end of file output. Don’t use this option if you can avoid it.

    Note that query results are always returned as application/octet-stream.

    EXAMPLE (curl)
    # Obtain a new shim session ID
    s=`curl -s "http://localhost:8080/new_session"`
    
    # The URL-encoded SciDB query in the next line is just:
    # build(<x:double>[i=1:10,10,0],i)
    curl -s "http://localhost:8080/execute_query?id=${s}&query=build(%3Cx:double%3E%5Bi=1:10,10,0%5D,i)&save=(double)"
    
    # Pass the double-precision binary result through the `od` program to view:
    curl -s "http://localhost:8080/read_bytes?id=${s}" | od -t f8
    
    # Release the session
    curl -s "http://localhost:8080/release_session?id=${s}"
    14624554130491278940000000                        1                        2
    0000020                        3                        4
    0000040                        5                        6
    0000060                        7                        8
    0000100                        9                       10
    0000120

    /upload_file

    DESCRIPTION Upload a file to the HTTP service using a multipart/file POST method.
    METHOD POST
    PARAMETERS
    • id an HTTP session ID obtained from /new_session
    • A valid multipart/file POST body – see the example below
    RESPONSE
  • Success: HTTP 200 and the name of the file uploaded to the server in a text/plain response.
  • Failure (invalid HTTP query string): HTTP 400
  • Failure (Session not found): HTTP 404
  • Failure (Server error): HTTP 500
    NOTES Try to avoid using this method. It’s fairly slow to transfer data and difficult to get the POST body message right. Instead use the faster and simpler /upload method shown below.
    EXAMPLE (curl)
    # Upload 5 MB of random bytes
    id=$(curl -s  "http://localhost:8080/new_session")
    dd if=/dev/urandom bs=1M count=5  | \
      curl -s --form "fileupload=@-;filename=data" "http://localhost:8080/upload_file?id=${id}"
    curl -s "http://localhost:8080/release_session?id=${id}"
    ## 5+0 records in
    ## 5+0 records out
    ## 5242880 bytes (5.2 MB) copied, 0.933192 s, 5.6 MB/s
    ## /dev/shm/0/0/shim_input_buf_amMI5x

    /upload

    DESCRIPTION Upload data to the HTTP service using a basic POST method.
    METHOD POST
    PARAMETERS
    • id an HTTP session ID obtained from /new_session
    • A valid POST body – see the example below
    RESPONSE
  • Success: HTTP 200 and the name of the file uploaded to the server in a text/plain response.
  • Failure (invalid HTTP query string): HTTP 400
  • Failure (Session not found): HTTP 404
  • Failure (Server error): HTTP 500
    NOTES Use the returned server-side file name in later calls, for example to execute_query.

    This method is faster and easier to use than the older /upload_file method.

    EXAMPLE (curl)
    id=$(curl -s "http://localhost:8080/new_session")
    
    # Upload 5 MB of random bytes
    dd if=/dev/urandom bs=1M count=5  | \
      curl -s --data-binary @- "http://localhost:8080/upload?id=${id}"
    
    curl -s "http://localhost:8080/release_session?id=${id}"
    ## 5+0 records in
    ## 5+0 records out
    ## 5242880 bytes (5.2 MB) copied, 0.928488 s, 5.6 MB/s
    ## /dev/shm/0/0/shim_input_buf_TJq1G0
    EXAMPLE (R)
    # Obtain a shim session ID
    id = httr::GET("http://localhost:8080/new_session")
    session = rawToChar(id$content)
    
    # Upload a character string:
    httr::POST(sprintf("http://localhost:8080/upload?id=%s", session), body="Hello shim")
    ## Response [http://localhost:8080/upload?id=42]
    ##   Date: 2016-05-05 09:36
    ##   Status: 200
    ##   Content-Type: text/plain
    ##   Size: 34 B
    ## No encoding supplied: defaulting to UTF-8.
    # Release our session ID
    httr::GET(sprintf("http://localhost:8080/release_session?id=%s", session))
    ## Response [http://localhost:8080/release_session?id=42]
    ##   Date: 2016-05-05 09:36
    ##   Status: 200
    ##   Content-Type: text/plain
    ## <EMPTY BODY>

    Orphaned Sessions

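    Shim limits the number of simultaneous open sessions. Absent-minded or malicious clients are prevented from opening too many new sessions repeatedly without closing them (which could eventually result in denial of service). Shim uses a lazy timeout mechanism to detect unused sessions and reclaim them. It works like this: a session that has been inactive for longer than the timeout option becomes a candidate for reclamation, but its resources are only freed when a new session request cannot otherwise be satisfied. Sessions that are waiting on SciDB query results or transferring data are not subject to timeout.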

    The above scheme is called lazy because sessions are only harvested when a new session request cannot otherwise be satisfied. Until that event occurs, sessions are free to last indefinitely.
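
The lazy scheme can be illustrated with a toy simulation. This is not shim's implementation: the variable and function names are invented for the sketch, time is passed in explicitly as a number of seconds, and the protection of busy sessions is left out for brevity.

```shell
MAX_SESSIONS=2      # stands in for the max_sessions option
TIMEOUT=60          # stands in for the timeout option (seconds)
SESSIONS=""         # session table: one "id last_active" pair per line
NEXT_ID=1

# new_session NOW: allocate a session at time NOW and store its ID in NEW_ID.
# Returns 1 when the table is full and no session has timed out (the 503 case).
new_session() {
  local now="$1" count stale
  count=$(printf '%s\n' "$SESSIONS" | awk 'NF { c++ } END { print c + 0 }')
  if [ "$count" -ge "$MAX_SESSIONS" ]; then
    # Only now, when the request cannot be satisfied, look for an idle
    # session whose inactivity exceeds the timeout and harvest it.
    stale=$(printf '%s\n' "$SESSIONS" |
      awk -v now="$now" -v t="$TIMEOUT" 'NF && now - $2 > t { print $1; exit }')
    [ -n "$stale" ] || return 1
    SESSIONS=$(printf '%s\n' "$SESSIONS" | awk -v id="$stale" 'NF && $1 != id')
  fi
  NEW_ID=$NEXT_ID
  NEXT_ID=$((NEXT_ID + 1))
  SESSIONS=$(printf '%s\n%s %s\n' "$SESSIONS" "$NEW_ID" "$now")
}
```

In this model, two sessions opened at times 0 and 10 fill the table. A request at time 30 fails because neither has been idle for more than 60 seconds, while a request at time 65 succeeds by harvesting the first session.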

    Shim does not protect against uploading gigantic files nor from running many long-running SciDB queries. The service may become unavailable if too many query and/or upload operations are in flight; an HTTP 503 (Service Unavailable) error code is returned in that case.

    Copyright (C) 2015, Paradigm4.