The Shim HTTP API for SciDB

Shim is a network service that exposes a very simple API for clients to interact with SciDB over HTTP connections. The API consists of a small number of URIs (described in detail below), including: /login, /new_session, /release_session, /execute_query, /cancel, /read_lines, /read_bytes, /upload_file, /version, and /logout.

Note that there are more direct ways to talk to SciDB over a network. See https://github.com/artyom-smirnov/scidb4py for a great example of direct network communication with SciDB using Python and Google protocol buffers (written by Artyom Smirnov, one of the core SciDB developers). The protocol buffers approach is comparatively low-level, but possibly more efficient. Shim is designed to be a convenient and super easy way to talk to SciDB. Note that both shim and Artyom's Python interface are community open source projects; they are not official components of SciDB.

Shim clients begin by requesting a session ID from the service, then run a query, and release the session ID when done. Note that session IDs are distinct from SciDB query IDs: a session ID groups a SciDB query together with server resources for input and output to the client.

Configuration

Shim runs as a system service or may be invoked directly from the command line. See the shim manual page for command-line options (type 'man shim' from a terminal). Service configuration is determined by the /var/lib/shim/conf configuration file. The default conf file is a sample that displays the default configuration options, which are listed as one key=value pair per line. Available options include:

ports=8080,8083s
auth=login
scidbport=1239
user=root
tmp=/tmp

Each option is described below.
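
As a rough illustration of the one key=value pair per line format, the sketch below parses such text into a dictionary. The function name parse_shim_conf is hypothetical, and skipping blank lines and '#' comment lines is an assumption about the file format rather than documented shim behavior.

```python
def parse_shim_conf(text):
    """Parse key=value lines into a dict of strings.

    Skipping blank lines and lines starting with '#' is an assumption,
    not documented shim behavior.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {raw!r}")
        options[key.strip()] = value.strip()
    return options


# The sample options listed above.
SAMPLE = """\
ports=8080,8083s
auth=login
scidbport=1239
user=root
tmp=/tmp
"""

conf = parse_shim_conf(SAMPLE)
print(conf["ports"], conf["scidbport"])
```

All values are kept as strings; a caller would convert scidbport to an integer where needed.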

Ports and Network Interfaces

Shim listens on default ports 8080 (open, not encrypted), and 8083 (TLS encrypted) on all available network interfaces. Ports and listening interfaces are configured with the command line '-p' option or with the 'ports=' option in the /var/lib/shim/conf file when shim is run as a service. The ports/interface specification uses the following syntax:

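[address:]port[s][,[address:]port[s]][,...]

where address is an optional interface address to listen on, port is the port number, and a trailing s selects TLS/SSL for that port. Here are some examples of possible port configurations: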

5555s                 Listen only on port 5555 (TLS/SSL).
127.0.0.1:8080,1234s  Listen on port 8080 but only on the local loopback interface; listen on port 1234 (TLS/SSL) on all interfaces.
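
The specification syntax can be made concrete with a small parser. This is only a sketch: it reads the syntax as an optional address, a port number, and an optional trailing s for TLS/SSL, an interpretation based on the two examples above. The names Listener and parse_ports are hypothetical and are not part of shim.

```python
from typing import NamedTuple, Optional


class Listener(NamedTuple):
    address: Optional[str]  # None means "all interfaces"
    port: int
    tls: bool


def parse_ports(spec):
    """Parse a spec such as '127.0.0.1:8080,1234s' into Listener tuples."""
    listeners = []
    for item in spec.split(","):
        item = item.strip()
        tls = item.endswith("s")  # a trailing 's' marks a TLS/SSL port
        if tls:
            item = item[:-1]
        # Split on the last ':' so that the address part stays intact.
        address, sep, port = item.rpartition(":")
        listeners.append(Listener(address if sep else None, int(port), tls))
    return listeners


print(parse_ports("5555s"))
print(parse_ports("127.0.0.1:8080,1234s"))
```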

SciDB Port

Shim runs on the same computer as a SciDB coordinator. Set the 'scidbport' option to select the coordinator database port to locally connect to. The default SciDB database port value is 1239 (see the SciDB configuration manual for more information).

Authentication

The shim API uses PAM-based authentication for most API functions over SSL connections. Shim may use any available PAM service (usually listed in /etc/pam.d), and defaults to the login service. That means client authentication is controlled by the usernames and passwords available to the computer that shim is installed on.

Basic digest access authentication is supported too (see https://en.wikipedia.org/wiki/Digest_access_authentication). See below for examples.

User

The user that the shim service runs under. Shim can run as a non-root user, but then SSL authenticated port logins are limited to the user that shim is running under.

Temporary I/O space and streaming

Shim's default behavior caches the output of SciDB queries in files on the SciDB server; set that file directory location with the config file tmp option or the command-line -t argument. This temporary directory is also used to upload data from clients over the HTTP connection for input into SciDB. Select a directory that is writable by the shim user (see the user option).

Specify stream=1 in the /execute_query service endpoint to stream SciDB results directly to the client using the HTTP 1.1 chunked transfer encoding mechanism. With this option, a named pipe is created in the temporary I/O directory instead of a file. See the /execute_query doc below for more information.

TLS/SSL Certificate

Shim supports TLS/SSL encryption. Packaged versions of shim (RPM and deb packages) generate a self-signed certificate and 4096-bit RSA key when shim is installed. The certificate is placed in /var/lib/shim/ssl_cert.pem. If you would prefer to use a different certificate, replace the automatically generated one.
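
Because the packaged certificate is self-signed, a client has to either skip verification or trust a copy of that certificate explicitly. The sketch below shows both choices with Python's standard ssl module. The function names are hypothetical, and cert_path is assumed to be a client-side copy of the server's certificate; whether verification against it succeeds also depends on the host name recorded in the certificate.

```python
import ssl


def unverified_context():
    """TLS context that skips certificate checks.

    This mirrors wget's --no-check-certificate and is only reasonable
    for testing against a self-signed certificate.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before verify_mode
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def pinned_context(cert_path):
    """TLS context that trusts only the certificate stored at cert_path."""
    return ssl.create_default_context(cafile=cert_path)


ctx = unverified_context()
print(ctx.verify_mode)
```

Either context can be passed to urllib.request.urlopen through its context argument.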


API Reference

Examples use the URL http://localhost:8080 or https://localhost:8083 (TLS) below. Parameters are required unless marked optional. All shim API services support CORS.

Limits

Clients must support HTTP 1.1 or later.

All HTTP query parameters are passed to the service as string values. They are limited to a maximum of 4096 characters unless otherwise indicated (a notable exception is the SciDB query string parameter, limited to 1,000,000 characters).

HTTP query string parameters that represent numbers have limits. Unless otherwise indicated whole-number values (session ID, number of bytes to return, etc.) are interpreted by shim as signed 32-bit integers and are generally limited to values between zero and 2147483647. Values outside that range will result in an HTTP 400 error (invalid query).
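
A client can check these limits before sending a request and so avoid an HTTP 400 round trip. The following is a minimal sketch; the helper names are hypothetical, and since the text does not say whether shim counts characters before or after URL decoding, the length check here is applied to the unencoded value as an assumption.

```python
INT32_MAX = 2147483647       # upper bound for whole-number parameters
MAX_PARAM_CHARS = 4096       # default limit for string parameters
MAX_QUERY_CHARS = 1_000_000  # limit for the SciDB query string


def check_whole_number(name, value):
    """Return value as an int if it lies in [0, INT32_MAX], else raise."""
    n = int(value)
    if not 0 <= n <= INT32_MAX:
        raise ValueError(f"{name}={value} is outside 0..{INT32_MAX}")
    return n


def check_param_length(name, value):
    """Return value unchanged if it fits the documented length limit."""
    limit = MAX_QUERY_CHARS if name == "query" else MAX_PARAM_CHARS
    if len(value) > limit:
        raise ValueError(f"{name} is {len(value)} characters; the limit is {limit}")
    return value


print(check_whole_number("n", "20"))
print(len(check_param_length("query", "list('functions')")))
```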

Response codes

Possible responses for each URI are listed below. HTTP status code 200 always indicates success; other standard HTTP status codes indicate various errors. Returned data may be ASCII or binary depending on the request.

Authentication

Basic digest access authentication

Shim supports basic digest access authentication. (See https://en.wikipedia.org/wiki/Digest_access_authentication and the references therein for a good description of the method.) Enable digest access authentication by creating an .htpasswd file in shim's default /var/lib/shim/wwwroot directory. The .htpasswd file must be located in shim's wwwroot directory, which can be changed with the command line switch -r. The format of the file must be:

username1:password1
username2:password2
...

Use plain text passwords in the file, and consider changing the permissions of the file to restrict access. Delete the .htpasswd file to disable basic digest access authentication.
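
The sketch below shows one way to create such a file with owner-only permissions, and one way a Python client could answer the digest challenge using the standard urllib handlers. The function names are hypothetical, the demonstration writes into a temporary directory rather than the real wwwroot, and the client half assumes shim issues a standard digest challenge.

```python
import os
import tempfile
import urllib.request


def write_htpasswd(path, users):
    """Write username:password lines to path.

    The 0o600 mode applies only when the file is newly created; an
    existing file keeps its current permissions.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        for username, password in users.items():
            f.write(f"{username}:{password}\n")


def digest_opener(base_url, username, password):
    """Build a urllib opener that responds to HTTP digest challenges."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, base_url, username, password)
    handler = urllib.request.HTTPDigestAuthHandler(manager)
    return urllib.request.build_opener(handler)


# Demonstration in a throwaway directory standing in for wwwroot.
with tempfile.TemporaryDirectory() as wwwroot:
    path = os.path.join(wwwroot, ".htpasswd")
    write_htpasswd(path, {"username1": "password1", "username2": "password2"})
    with open(path) as f:
        written = f.read()

opener = digest_opener("http://localhost:8080", "username1", "password1")
print(written)
```

With shim running, opener.open("http://localhost:8080/version") would then issue an authenticated request.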

TLS/SSL encryption and PAM authentication

Shim optionally exposes both open and encrypted (SSL or TLS) services. Access to most of the API over an encrypted connection requires authentication. API URIs that require authentication include /new_session, /upload_file, /read_lines, /read_bytes, /execute_query, and /cancel.

Authentication is very simple to use:

  1. Contact the /login service over SSL with username and password query string parameters. The /login service returns an authentication token if it's successful.
  2. Use all the other URI services normally, but append the auth=<token> query string parameter, replacing <token> with the response from the /login service above.
  3. Log out of the system by contacting the /logout service with your authentication token.

Shim's authentication mechanism only works over encrypted connections. Username and password query string parameters only appear in encrypted form over the network, and are never written to the shim service log.

Here is a full example using the wget program from the command line talking to shim running on localhost, port 8083.

# Log in with username=scidb password=paradigm4 (assuming that such a user exists on the system that SciDB is running on):
wget -q --no-check-certificate -O - "https://localhost:8083/login?username=scidb&password=paradigm4"
90362228960         # (authentication token returned on success)

# As a quick check, we'll try to obtain a new session without using the auth parameter (this will return an error):
wget --no-check-certificate -O - "https://localhost:8083/new_session"
HTTP request sent, awaiting response... 400 ERROR

# Now let's try again with auth to get a new session ID, then run a simple query:
wget -q --no-check-certificate -O - "https://localhost:8083/new_session?auth=90362228960"
0    # (our session ID)
wget -q --no-check-certificate -O - "https://localhost:8083/execute_query?id=0&query=list('functions')&save=dcsv&auth=90362228960"
wget -q --no-check-certificate -O - "https://localhost:8083/read_lines?id=0&n=0&auth=90362228960" | head -n 10
{No} name,profile,deterministic,library
{0} '%','double %(double,double)',true,'scidb'
{1} '%','int16 %(int16,int16)',true,'scidb'
{2} '%','int32 %(int32,int32)',true,'scidb'
{3} '%','int64 %(int64,int64)',true,'scidb'
{4} '%','int8 %(int8,int8)',true,'scidb'
{5} '%','uint16 %(uint16,uint16)',true,'scidb'
{6} '%','uint32 %(uint32,uint32)',true,'scidb'
{7} '%','uint64 %(uint64,uint64)',true,'scidb'
{8} '%','uint8 %(uint8,uint8)',true,'scidb'
{9} '*','double *(double,double)',true,'scidb'

wget -q --no-check-certificate -O - "https://localhost:8083/release_session?id=0&auth=90362228960"

# ... do some other things, always adding auth= ...  When you're done, log out of the system:
wget -q --no-check-certificate -O - "https://localhost:8083/logout?auth=90362228960"

Again, simply append your authentication token to your HTTP requests to use the authenticated service.

Example API Workflow

  1. /new_session
  2. /execute_query
  3. /read_lines or /read_bytes
  4. /release_session
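
The four steps above can be sketched in Python roughly as follows. This is an illustration only: http_get, shim_url, and run_query are hypothetical helper names, the HTTP transport is a thin urllib wrapper that can be swapped out for testing, and parameter values are percent-encoded with urllib.parse.quote so that spaces become %20.

```python
import urllib.error
import urllib.parse
import urllib.request


def http_get(url):
    """Return (status, body_text) for a GET request."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode()


def shim_url(base, service, **params):
    """Build a service URL with percent-encoded query parameters."""
    query = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return f"{base}/{service}?{query}"


def run_query(base, query, fmt="dcsv", get=http_get):
    """Run one AFL query and return its saved text output."""
    status, body = get(f"{base}/new_session")
    if status != 200:
        raise RuntimeError(f"new_session failed: HTTP {status}")
    sid = body.strip()
    try:
        status, body = get(shim_url(base, "execute_query", id=sid, query=query, save=fmt))
        if status != 200:
            raise RuntimeError(f"execute_query failed: HTTP {status}: {body}")
        status, body = get(shim_url(base, "read_lines", id=sid, n=0))
        if status != 200:
            raise RuntimeError(f"read_lines failed: HTTP {status}")
        return body
    finally:
        # Always release; the status is ignored because the session may
        # already be gone after a query error.
        get(shim_url(base, "release_session", id=sid))
```

With shim running locally, run_query("http://localhost:8080", "list('functions')") would return the saved query output as text.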

Example Authenticated API Workflow

  1. /login
  2. /new_session
  3. /execute_query
  4. /read_lines or /read_bytes
  5. /release_session
  6. ...
  7. /logout
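
For the authenticated workflow, the main extra concern is that the token is always retired, even when a step in between fails. One way to sketch this is a context manager. The names with_params and shim_login are hypothetical, and get stands for any caller-supplied function that performs an HTTPS GET and returns a (status, body) pair.

```python
import contextlib
import urllib.parse


def with_params(base, service, **params):
    """Build a service URL with percent-encoded query parameters."""
    query = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return f"{base}/{service}?{query}"


@contextlib.contextmanager
def shim_login(base, username, password, get):
    """Log in, yield the authentication token, and always log out."""
    status, body = get(with_params(base, "login", username=username, password=password))
    if status != 200:
        raise PermissionError(f"login failed: HTTP {status}")
    token = body.strip()
    try:
        yield token
    finally:
        get(with_params(base, "logout", auth=token))
```

Inside the with block, every other request simply adds auth=token to its parameters, for example with_params(base, "new_session", auth=token).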

API Service URIs

/version

DESCRIPTION Print the shim code version string
METHOD GET
PARAMETERS
RESPONSE Success: HTTP 200 and text version string value in text/plain payload
EXAMPLE http://localhost:8080/version

HTTP/1.1 200 OK
Content-Length: 16
Cache-Control: no-cache
Access-Control-Allow-Origin: *
Content-Type: text/plain

v14.3-15-gd71f

/new_session

DESCRIPTION Request a new HTTP session from the service.
METHOD GET
PARAMETERS auth optional authentication token (required for encrypted connections).
RESPONSE Success: HTTP 200 and text session ID value in text/plain payload
Failure (out of resources/server unavailable): HTTP 503
Invalid request (encrypted only -- this means auth is missing): HTTP 400
Not authorized (encrypted only): HTTP 401
EXAMPLE http://localhost:8080/new_session

HTTP/1.0 200 OK
Content-Length: 3
Content-Type: text/plain

0

/release_session

DESCRIPTION Release an HTTP session
METHOD GET
PARAMETERS id (an HTTP session ID)
auth optional authentication token (required for encrypted connections).
RESPONSE Success: HTTP 200
Failure (Session not found): HTTP 404
Failure (invalid http query): HTTP 400
Not authorized (encrypted only): HTTP 401
EXAMPLE http://localhost:8080/release_session?id=0

HTTP/1.0 200 OK
Content-Length: 0
Content-Type: text/plain


/execute_query

DESCRIPTION Execute a SciDB AFL query
METHOD GET
PARAMETERS id (an HTTP session ID)
query (AFL query string, encoded for use in URL as required, limited to a maximum of 1,000,000 characters)
save optional (format string, limited to a maximum of 4096 characters) Save the query output in the specified format for subsequent download by read_lines or read_bytes. If the save parameter is not specified, the query output is not saved.
release optional 0 or 1: if 1 then release_session as soon as query completes. The default value is 0 if not specified (see additional notes below).
stream optional 0 or 1: if 1 then stream query result; otherwise send query result through a server-side output file (the default). stream=1 also sets release=1.
auth optional authentication token (required for encrypted connections).
RESPONSE Success: HTTP 200 text/plain (SciDB Query ID)
Failure (SciDB not available error): HTTP 503 text/plain (ERROR TEXT)
Failure (SciDB query error): HTTP 500 text/plain (SCIDB ERROR TEXT)
Failure (out of memory error): HTTP 507 text/plain (SCIDB ERROR TEXT)
Failure (Invalid session): HTTP 404
Failure (invalid http query): HTTP 400
Not authorized (encrypted only): HTTP 401
NOTES Shim only supports AFL queries. AQL support will be available by fall of 2014.

500 and 503 errors result in removal of the web session ID and related resources (thus, release_session does not have to be called after such an error).

This method blocks until the query completes unless stream=1. When stream=1, this method returns immediately with the query ID (before the query completes).

Do not specify the option release=1 when stream=0 and when the save option is also set, or output will not be available to read_bytes or read_lines. Instead, explicitly call release_session after reading is complete.

Setting stream=1 is an experimental feature added after version 14.7. Setting stream=1 sends query results through a server-side named pipe into the client without storing the query results on the server. This option uses the HTTP 1.1 chunked transfer encoding; stream termination is indicated by a chunk of length zero.

When stream=1 is set then release=1 is also automatically set. That means that clients are free to not call /release_session after receiving their stream data via /read_lines or /read_bytes; however, calling /release_session will not cause problems and clients can still do it if they wish.

NOTE: WHEN stream=1, SciDB QUERIES REMAIN ACTIVE UNTIL THE CLIENT COMPLETES RECEIVING THE RESULT DATA. IT'S UP TO THE CLIENT TO RECEIVE THE DATA TO TERMINATE THE QUERY.
EXAMPLE http://localhost:8080/execute_query?id=0&query=remove(x)&release=1

HTTP/1.0 200 OK
Content-Length: 13
Content-Type: text/plain

1100993821834
EXAMPLE (ERROR) http://localhost:8080/execute_query?id=0&query=remove(x)&release=1

HTTP/1.0 500 ERROR
Content-Length: 286
Content-Type: text/plain

UserQueryException in file: src/query/parser/ALTranslator.cpp function: createArrayReferenceParam line: 863
Error id: scidb::SCIDB_SE_QPROC::SCIDB_LE_ARRAY_DOESNT_EXIST
Error description: Query processor error. Array 'x' does not exist.
remove(x)
       ^
Failed query id: 1100994052246
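
The interaction between the save, release, and stream options described in the notes can be checked on the client side before a request is sent. The sketch below builds an /execute_query URL and rejects the one combination the notes warn against. The function name execute_query_url is hypothetical, and the check mirrors the notes rather than shim's own validation.

```python
import urllib.parse


def execute_query_url(base, session_id, query, save=None, release=0, stream=0, auth=None):
    """Build an /execute_query URL, applying the option rules from the notes."""
    if release not in (0, 1) or stream not in (0, 1):
        raise ValueError("release and stream must be 0 or 1")
    if stream == 1:
        release = 1  # shim sets this automatically; mirrored here for clarity
    elif release == 1 and save is not None:
        # With stream=0, releasing at completion would make the saved
        # output unavailable to read_lines and read_bytes.
        raise ValueError("do not combine save with release=1 when stream=0")
    params = {"id": session_id, "query": query}
    if save is not None:
        params["save"] = save
    if release:
        params["release"] = 1
    if stream:
        params["stream"] = 1
    if auth is not None:
        params["auth"] = auth
    encoded = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return f"{base}/execute_query?{encoded}"


print(execute_query_url("http://localhost:8080", 0, "remove(x)", release=1))
```

The query text is percent-encoded more aggressively than in the hand-written examples (parentheses become %28 and %29), which is equivalent after decoding.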

/cancel

DESCRIPTION Cancel a SciDB query associated with a session
METHOD GET
PARAMETERS id (an HTTP session ID)
auth optional authentication token (required for encrypted connections).
RESPONSE Success: HTTP 200
Failure (session not found): HTTP 404
Failure (invalid http query): HTTP 400
Failure (could not connect to SciDB): HTTP 503
Not authorized (encrypted only): HTTP 401
EXAMPLE http://localhost:8080/cancel?id=0

HTTP/1.0 200 OK
Content-Length: 0
Content-Type: text/plain


/read_lines

DESCRIPTION Read text lines from a query that saves its output
METHOD GET
PARAMETERS id (an HTTP session ID)
n (maximum number of lines to read and return between 0 and 2147483647)
auth optional authentication token (required for encrypted connections).
RESPONSE Success: HTTP 200 followed by text/plain query result (up to n lines)
Failure (invalid HTTP query string): HTTP 400
Failure (session not found): HTTP 404
Failure (end of file): HTTP 410
Failure (invalid request): HTTP 414
Failure (SciDB server error): HTTP 500
Failure (could not connect to SciDB server error): HTTP 503
Failure (server out of memory): HTTP 507
Not authorized (encrypted only): HTTP 401
EXAMPLE http://localhost:8080/new_session
http://localhost:8080/execute_query?id=0&query=list('functions')&save=dcsv
http://localhost:8080/read_lines?id=0&n=20

HTTP/1.0 200 OK
Content-Length: 903
Content-Type: text/plain

{No} name,profile,deterministic,library
{0} "%","double %(double,double)",true,"scidb"
{1} "%","int16 %(int16,int16)",true,"scidb"
{2} "%","int32 %(int32,int32)",true,"scidb"
{3} "%","int64 %(int64,int64)",true,"scidb"
{4} "%","int8 %(int8,int8)",true,"scidb"
{5} "%","uint16 %(uint16,uint16)",true,"scidb"
{6} "%","uint32 %(uint32,uint32)",true,"scidb"
{7} "%","uint64 %(uint64,uint64)",true,"scidb"
{8} "%","uint8 %(uint8,uint8)",true,"scidb"
{9} "*","double *(double,double)",true,"scidb"
{10} "*","float *(float,float)",true,"scidb"
{11} "*","int16 *(int16,int16)",true,"scidb"
{12} "*","int32 *(int32,int32)",true,"scidb"
{13} "*","int64 *(int64,int64)",true,"scidb"
{14} "*","int8 *(int8,int8)",true,"scidb"
{15} "*","uint16 *(uint16,uint16)",true,"scidb"
{16} "*","uint32 *(uint32,uint32)",true,"scidb"
{17} "*","uint64 *(uint64,uint64)",true,"scidb"
{18} "*","uint8 *(uint8,uint8)",true,"scidb"
NOTES
  1. Set n=0 to download the entire output buffer.
  2. Be sure to properly url-encode special characters like the plus sign (+) in the request.
  3. When n>0, iterative requests to read_lines are allowed, and will return at most the next n lines of output. Use the 410 error code to detect end of file output.
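
Note 3 describes a loop that ends on HTTP 410. A minimal sketch of that loop is shown below as a generator. The name iter_line_chunks is hypothetical, and get stands for any caller-supplied function that performs a GET and returns a (status, body) pair.

```python
def iter_line_chunks(base, session_id, n, get):
    """Yield successive blocks of at most n lines until shim answers 410."""
    if n <= 0:
        raise ValueError("use n > 0 for iterative reads; n=0 returns everything at once")
    url = f"{base}/read_lines?id={session_id}&n={n}"
    while True:
        status, body = get(url)
        if status == 410:  # end of file: no more output to read
            return
        if status != 200:
            raise RuntimeError(f"read_lines failed: HTTP {status}")
        yield body
```

A caller can then write the chunks to a file or process them one at a time without holding the whole result in memory.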

/read_bytes

DESCRIPTION Read bytes from a query that saves its output
METHOD GET
PARAMETERS id (an HTTP session ID)
n (maximum number of bytes to read and return between 0 and 2147483647)
auth optional authentication token (required for encrypted connections).
RESPONSE Success: HTTP 200 followed by application/octet-stream binary query result (up to n bytes)
Failure (invalid HTTP query string): HTTP 400
Failure (session not found): HTTP 404
Failure (end of file): HTTP 410
Failure (invalid request): HTTP 414
Failure (SciDB server error): HTTP 500
Failure (could not connect to SciDB server error): HTTP 503
Failure (server out of memory): HTTP 507
Not authorized (encrypted only): HTTP 401
EXAMPLE http://localhost:8080/new_session
http://localhost:8080/execute_query?id=0&query=build(%3Cx:double%3E%5Bi=1:10,10,0%5D,random())&save=(double)
http://localhost:8080/read_bytes?id=0&n=20

HTTP/1.0 200 OK
Content-Length: 20
Content-Type: application/octet-stream

(20 bytes of binary data)
NOTES
  1. Set n=0 to download the entire output buffer.
  2. Iterative requests to read_bytes are allowed, and will return at most the next n bytes of output. Use the 410 error code to detect the end of output.
  3. Be sure to properly url-encode special characters in the request.
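
Binary output still has to be decoded by the client. The sketch below assumes that save=(double) on a non-nullable attribute yields a plain sequence of 8-byte little-endian IEEE 754 values with no header; that layout is an assumption here, not something this page states. The name decode_doubles is hypothetical.

```python
import struct


def decode_doubles(payload):
    """Unpack a bytes object holding consecutive 8-byte little-endian doubles."""
    if len(payload) % 8:
        raise ValueError("payload length is not a multiple of 8 bytes")
    count = len(payload) // 8
    return list(struct.unpack(f"<{count}d", payload))


# Round trip on locally packed data, standing in for a read_bytes response.
sample = struct.pack("<3d", 1.0, 2.5, -3.0)
print(decode_doubles(sample))
```

Under that assumption a request with n=20 would cut the third value in half, so when reading iteratively it is safer to choose n as a multiple of the record size.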

/upload_file

DESCRIPTION Upload a file to the HTTP service
METHOD POST/GET
PARAMETERS id (an HTTP session ID)
A valid file-upload HTTP POST message.
auth optional authentication token (required for encrypted connections).
RESPONSE Success: HTTP 200 and the name of the file uploaded to the server in a text/plain response.
Failure (invalid HTTP query string): HTTP 400
Failure (Session not found): HTTP 404
Failure (Server error): HTTP 500
Not authorized (encrypted only): HTTP 401
EXAMPLE Example POST to session id=0:
POST /upload_file?id=0 HTTP/1.1
Host: localhost:8080
Accept: */*
Content-Length: 526
Expect: 100-continue
Content-Type: multipart/form-data; boundary=----------------------------d1f47951faa4

------------------------------d1f47951faa4
Content-Disposition: form-data; name="file"; filename="data.csv"
Content-Type: application/octet-stream

"","Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"
"1",5.1,3.5,1.4,0.2,"setosa"
"2",4.9,3,1.4,0.2,"setosa"
"3",4.7,3.2,1.3,0.2,"setosa"
"4",4.6,3.1,1.5,0.2,"setosa"
"5",5,3.6,1.4,0.2,"setosa"
"6",5.4,3.9,1.7,0.4,"setosa"
"7",4.6,3.4,1.4,0.3,"setosa"
"8",5,3.4,1.5,0.2,"setosa"
"9",4.4,2.9,1.4,0.2,"setosa"

------------------------------d1f47951faa4--


Example response:
HTTP/1.0 200 OK
Content-Length: 23
Content-Type: text/plain

/tmp/shim_file_Hrloh9
NOTES The file to upload can be binary. Use the returned server-side file name in a subsequent SciDB load query, for example. The file does not persist after the HTTP session is released.
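
A file-upload POST message like the one in the example can be assembled by hand with the standard library. The following is a sketch, not a complete multipart implementation: build_upload is a hypothetical name, the form field name "file" is copied from the example above, and the file name is assumed to contain no quote characters.

```python
import uuid


def build_upload(filename, data, boundary=None):
    """Return (headers, body) for a single-file multipart/form-data POST."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    body = head + data + tail  # data is bytes, so binary files work too
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = build_upload("data.csv", b'"1",5.1,3.5,1.4,0.2,"setosa"\n')
print(headers["Content-Type"])
```

The result can be sent with urllib.request.Request(url, data=body, headers=headers, method="POST"), where url points at /upload_file with the session id.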

/login

DESCRIPTION Authenticate a user with shim
METHOD GET
PARAMETERS username (the PAM username)
password (the user password)
RESPONSE Success: HTTP 200 and a text authentication token to use with the auth parameter in subsequent HTTP queries.
Failure (invalid http query): HTTP 400
Not authorized: HTTP 401
EXAMPLE https://localhost:8083/login?username=scidb&password=paradigm4

HTTP/1.0 200 OK
Content-Length: 11
Content-Type: text/plain

90362228960
NOTES The /login service URI requires a TLS or SSL encrypted connection. The password and username query parameters only appear in encrypted form over the network.

/logout

DESCRIPTION Retire an authentication token.
METHOD GET
PARAMETERS auth (authentication token from /login)
RESPONSE Success: HTTP 200 (empty response)
Failure (invalid http query): HTTP 400
EXAMPLE https://localhost:8083/logout?auth=90362228960

HTTP/1.0 200 OK
Content-Length: 0
Content-Type: text/plain


Orphaned Sessions

Shim limits the number of simultaneous open sessions. Absent-minded or malicious clients are prevented from opening too many new sessions repeatedly without closing them (which could eventually result in denial of service). Shim uses a lazy timeout mechanism to detect unused sessions and reclaim them. It works like this:

  1. The session time value is set to the current time when an API event finishes.
  2. If a new_session request fails to find any available session slots, it inspects the existing session time values for all the sessions, computing the difference between current time and the time value. If a session time difference exceeds a timeout value, then that session is harvested and returned as a new session.
  3. Operations that may take an indeterminate amount of time like file uploads or execution of SciDB queries are protected from harvesting until the associated operation completes.

The above scheme is called lazy because sessions are only harvested when a new session request cannot otherwise be satisfied. Until that event occurs, sessions are free to last indefinitely.
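
The lazy scheme can be modeled in a few lines. The class below is an illustrative model of the three steps, not shim's actual implementation; the names SessionPool, begin, and finish are hypothetical, and the clock is injectable so the behavior can be exercised without waiting.

```python
import time


class SessionPool:
    """Toy model of lazy session harvesting."""

    def __init__(self, slots, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.sessions = [None] * slots  # None marks a free slot

    def new_session(self):
        now = self.clock()
        for sid, s in enumerate(self.sessions):
            if s is None:
                self.sessions[sid] = {"time": now, "busy": False}
                return sid
        # No free slot: only now look for idle sessions past the timeout.
        for sid, s in enumerate(self.sessions):
            if not s["busy"] and now - s["time"] > self.timeout:
                self.sessions[sid] = {"time": now, "busy": False}
                return sid
        return None  # a real service would answer HTTP 503 here

    def begin(self, sid):
        """Mark a long operation (query, upload) as in progress."""
        self.sessions[sid]["busy"] = True

    def finish(self, sid):
        """Record the time at which an API event finished."""
        self.sessions[sid]["busy"] = False
        self.sessions[sid]["time"] = self.clock()

    def release(self, sid):
        self.sessions[sid] = None
```

Note that an idle session older than the timeout is left alone as long as free slots remain, which is what makes the scheme lazy.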

Shim does not protect against uploads of gigantic files or against many long-running SciDB queries. The service may become unavailable if too many query and/or upload operations are in flight; an HTTP 503 (Service Unavailable) error code is returned in that case.