Shim is a web service that exposes a very simple API for clients to interact with SciDB over HTTP connections. The API consists of a small number of URIs (described in detail below), including: /login, /new_session, /release_session, /execute_query, /cancel, /read_lines, /read_bytes, /upload_file, /version, and /logout.
There are more direct ways to talk to SciDB over a network. See https://github.com/artyom-smirnov/scidb4py for a great example of direct network communication with SciDB using Python and Google protocol buffers (written by Artyom Smirnov, one of the core SciDB developers). The protocol buffers approach is comparatively low-level, but possibly more efficient than shim. Shim is designed to be a convenient and easy way to talk to SciDB. Note that both shim and Artyom's Python interface are community open source projects; they are not official components of SciDB.
Shim clients begin by requesting a session ID from the service, then running a query and releasing the session ID when done. Session IDs are distinct from SciDB query IDs--a session ID groups a SciDB query together with server resources for input and output to the client.
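The session workflow described above can be sketched in Python with only the standard library. This is a minimal illustration, not part of shim: the base address, the helper names (shim_url, shim_get, run_query) and the choice of n=0 for reading the whole result (as the wget examples in this document do) are assumptions.

```python
import urllib.error
import urllib.parse
import urllib.request

BASE = "http://localhost:8080"  # assumed shim address (default open port)


def shim_url(endpoint, **params):
    """Build a shim request URL with percent-encoded query parameters."""
    query = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return f"{BASE}/{endpoint}" + (f"?{query}" if query else "")


def shim_get(endpoint, **params):
    """Issue a GET request and return the response body as text."""
    with urllib.request.urlopen(shim_url(endpoint, **params)) as resp:
        return resp.read().decode("utf-8")


def run_query(afl, fmt="dcsv", max_lines=0):
    """Request a session, run one query, read its output, release the session."""
    session = shim_get("new_session").strip()
    try:
        shim_get("execute_query", id=session, query=afl, save=fmt)
        return shim_get("read_lines", id=session, n=max_lines)
    finally:
        try:
            shim_get("release_session", id=session)
        except urllib.error.HTTPError:
            pass  # shim may already have removed the session after an error
```

With a shim instance running at the assumed address, run_query("list('functions')") would return the saved query output as text.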
ports=8080,8083s
auth=login
scidbport=1239
user=root
tmp=/tmp
Each option is described below.
Shim listens on default ports 8080 (open, not encrypted), and 8083 (TLS encrypted) on all available network interfaces. Ports and listening interfaces are configured with the command line '-p' option or with the 'ports=' option in the /var/lib/shim/conf file when shim is run as a service. The ports/interface specification uses the following syntax:
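[address:]port[s][,[address:]port[s]][,...]
where address is an optional interface address to listen on (all interfaces when omitted), port is a port number, and a trailing s marks the port as TLS/SSL encrypted. For example:
5555s | Listen only on port 5555 (TLS/SSL).
127.0.0.1:8080,1234s | Listen on port 8080 but only on the local loopback interface; listen on port 1234 (TLS/SSL) on all interfaces.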
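To make the port syntax concrete, here is a small Python sketch that parses a specification string into (address, port, tls) entries. It is an illustration of the syntax only and not shim's own parser; the Listener type and parse_ports function are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class Listener(NamedTuple):
    address: Optional[str]  # None means "all interfaces"
    port: int
    tls: bool               # True when the port carries the trailing "s"


def parse_ports(spec: str) -> List[Listener]:
    """Parse '[address:]port[s][,[address:]port[s]]...' into Listener entries."""
    listeners = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            raise ValueError(f"empty entry in port specification: {spec!r}")
        # Everything before the last ':' is the optional address.
        address, sep, port_text = item.rpartition(":")
        tls = port_text.endswith("s")
        if tls:
            port_text = port_text[:-1]
        if not (port_text.isascii() and port_text.isdigit()) or not 0 < int(port_text) <= 65535:
            raise ValueError(f"invalid port in entry {item!r}")
        listeners.append(Listener(address if sep else None, int(port_text), tls))
    return listeners
```

For instance, parse_ports("127.0.0.1:8080,1234s") yields one plain listener bound to 127.0.0.1 on port 8080 and one TLS listener on port 1234 on all interfaces.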
Basic digest access authentication is supported too (see https://en.wikipedia.org/wiki/Digest_access_authentication). See below for examples.
Specify stream>0 in the /execute_query service endpoint to stream SciDB results directly to the client using the HTTP 1.1 chunked transfer encoding mechanism. With this option, a named pipe is created in the temporary I/O directory instead of a file. See the /execute_query doc below for more information. This option is automatically set by the compression option below.
Specify compression=n, for values of n in 1,...,9, in /execute_query to stream gzip compressed (RFC 1952) results directly to the client using HTTP 1.1 chunked transfer encoding. The stream compression level is set by the compression query parameter in the range 1,...,9 (1 is fast but compresses poorly; 9 is slow but compresses well). It's up to the client to decompress the data. Setting this option automatically also sets the stream option to a nonzero value of 2, indicating gzip-compressed streaming.
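Because decompression is left to the client, a client needs a gzip (RFC 1952) decoder. The following Python sketch shows two ways to do that with the standard library: all at once, and incrementally as chunks arrive. The function names are illustrative; how the chunks are obtained from the HTTP response is left to the caller.

```python
import gzip
import zlib


def gunzip_bytes(payload: bytes) -> bytes:
    """Decompress a complete gzip payload held in memory."""
    return gzip.decompress(payload)


def gunzip_chunks(chunks):
    """Incrementally decompress an iterable of gzip byte chunks."""
    # 16 + MAX_WBITS tells zlib to expect the gzip container format.
    decoder = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decoder.decompress(chunk)
        if data:
            yield data
    tail = decoder.flush()
    if tail:
        yield tail
```

The incremental form avoids holding the whole compressed result in memory, which suits the streaming mode described above.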
Shim supports TLS/SSL encryption. Packaged versions of shim (RPM and deb packages) generate a self-signed certificate and 4096-bit RSA key when shim is installed. The certificate is placed in /var/lib/shim/ssl_cert.pem. If you would prefer to use a different certificate, replace the automatically generated one.
Shim requires clients that support HTTP 1.1 or greater.
All HTTP query parameters are passed to the service as string values. They are limited to a maximum of 4096 characters unless otherwise indicated (a notable exception is the SciDB query string parameter, limited to 262,144 characters).
HTTP query string parameters that represent numbers have limits. Unless otherwise indicated whole-number values (session ID, number of bytes to return, etc.) are interpreted by shim as signed 32-bit integers and are generally limited to values between zero and 2147483647. Values outside that range will result in an HTTP 400 error (invalid query).
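A client can check whole-number parameters before sending them, rather than waiting for an HTTP 400. A minimal Python sketch of such a check follows; the function name is hypothetical and the range is the one stated above.

```python
INT32_MAX = 2147483647  # largest signed 32-bit integer


def check_whole_number(name, value):
    """Return value as a query-string token, or raise if shim would reject it."""
    number = int(value)
    if not 0 <= number <= INT32_MAX:
        raise ValueError(f"{name}={number} is outside 0..{INT32_MAX}")
    return str(number)
```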
Possible responses for each URI are listed below. HTTP status code 200 always indicates success; other standard HTTP status codes indicate various errors. The returned data may be UTF-8 or binary depending on the request and is always returned using the generic application/octet-stream MIME type. Depending on the request, data may use chunked HTTP transfer encoding and may also use gzip content encoding.
Shim supports basic digest access authentication. (See https://en.wikipedia.org/wiki/Digest_access_authentication and the references therein for a good description of the method.) Enable digest access authentication by creating an .htpasswd file in shim's default /var/lib/shim/wwwroot directory. The .htpasswd file must be located in shim's wwwroot directory, which can be changed with the command line switch -r. The format of the file must be:
username1:password1
username2:password2
...
Use plain text passwords in the file, and consider changing the permissions of the file to restrict access. Delete the .htpasswd file to disable basic digest access authentication.
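On the client side, digest access authentication can be handled by Python's standard urllib handlers. The sketch below builds an opener that answers digest challenges; the helper name, base URL and credentials are placeholders matching the file format shown above, not values shim defines.

```python
import urllib.request


def digest_opener(base_url, username, password):
    """Return an opener that answers HTTP digest challenges for base_url."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # Realm None means "use these credentials for any realm at this URL".
    manager.add_password(None, base_url, username, password)
    handler = urllib.request.HTTPDigestAuthHandler(manager)
    return urllib.request.build_opener(handler)


# Example use against a running shim instance (not executed here):
# opener = digest_opener("http://localhost:8080/", "username1", "password1")
# print(opener.open("http://localhost:8080/version").read().decode())
```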
Basic digest authentication works on plain or TLS-encrypted connections and takes precedence over PAM authentication when used on TLS-encrypted connections (see below).
Shim optionally exposes both open and encrypted (TLS) services. Access to most of the API over an encrypted connection requires authentication. API URIs that require authentication include: /new_session, /upload_file, /read_lines, /read_bytes, /execute_query, and /cancel.
Authentication is very simple to use:
Here is a full example using the wget program from the command line talking to shim running on localhost, port 8083.
# Log in with username=scidb password=paradigm4 (assuming that such a user exists on the system that SciDB is running on):
wget -q --no-check-certificate -O - "https://localhost:8083/login?username=scidb&password=paradigm4"
90362228960 # (authentication token returned on success)
# As a quick check, we'll try to obtain a new session without using the auth parameter (this will return an error):
wget --no-check-certificate -O - "https://localhost:8083/new_session"
HTTP request sent, awaiting response... 400 ERROR
# Now, let's try again with auth and run a simple query and this time get a new session ID:
wget -q --no-check-certificate -O - "https://localhost:8083/new_session?auth=90362228960"
0 # (our session ID)
wget -q --no-check-certificate -O - "https://localhost:8083/execute_query?id=0&query=list('functions')&save=dcsv&auth=90362228960"
wget -q --no-check-certificate -O - "https://localhost:8083/read_lines?id=0&n=0&auth=90362228960" | head -n 10
{No} name,profile,deterministic,library
{0} '%','double %(double,double)',true,'scidb'
{1} '%','int16 %(int16,int16)',true,'scidb'
{2} '%','int32 %(int32,int32)',true,'scidb'
{3} '%','int64 %(int64,int64)',true,'scidb'
{4} '%','int8 %(int8,int8)',true,'scidb'
{5} '%','uint16 %(uint16,uint16)',true,'scidb'
{6} '%','uint32 %(uint32,uint32)',true,'scidb'
{7} '%','uint64 %(uint64,uint64)',true,'scidb'
{8} '%','uint8 %(uint8,uint8)',true,'scidb'
{9} '*','double *(double,double)',true,'scidb'
wget -q --no-check-certificate -O - "https://localhost:8083/release_session?id=0&auth=90362228960"
# ... do some other things, always adding the auth parameter. Again, simply append your authentication token to your HTTP requests to use the authenticated service.
# When you're done, log out of the system:
wget -q --no-check-certificate -O - "https://localhost:8083/logout?auth=90362228960"
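Appending the token to every request is easy to forget, so a client may want a small helper for it. The following Python sketch adds or replaces the auth parameter on any shim URL; with_auth is a hypothetical helper name, and the token value is the one from the example above.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def with_auth(url, token):
    """Return url with its auth query parameter set to token."""
    parts = urlsplit(url)
    # Drop any existing auth parameter, keep everything else in order.
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "auth"]
    params.append(("auth", token))
    return urlunsplit(parts._replace(query=urlencode(params, quote_via=quote)))
```

Note that the helper re-encodes the existing query string, so characters such as quotes and parentheses in an AFL query come out percent-encoded.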
/version
DESCRIPTION | Print the shim code version string
METHOD | GET
PARAMETERS | None
RESPONSE | Success: HTTP 200 and text version string value in text/plain payload
EXAMPLE |
http://localhost:8080/version
HTTP/1.1 200 OK
Content-Length: 16
Cache-Control: no-cache
Access-Control-Allow-Origin: *
Content-Type: text/plain

v14.3-15-gd71f
/new_session
DESCRIPTION | Request a new HTTP session from the service.
METHOD | GET
PARAMETERS | auth optional authentication token (required for encrypted connections).
RESPONSE | Success: HTTP 200 and text session ID value in text/plain payload
Failure (out of resources/server unavailable): HTTP 503
Invalid request (encrypted only -- this means auth is missing): HTTP 400
Not authorized (encrypted only): HTTP 401
EXAMPLE |
http://localhost:8080/new_session
HTTP/1.0 200 OK
Content-Length: 3
Content-Type: text/plain

0
/release_session
DESCRIPTION | Release an HTTP session
METHOD | GET
PARAMETERS |
id (an HTTP session ID)
auth optional authentication token (required for encrypted connections).
RESPONSE | Success: HTTP 200
Failure (session not found): HTTP 404
Failure (invalid HTTP query): HTTP 400
Not authorized (encrypted only): HTTP 401
EXAMPLE |
http://localhost:8080/release_session?id=0
HTTP/1.0 200 OK
Content-Length: 0
Content-Type: text/plain
/execute_query
DESCRIPTION | Execute a SciDB AFL query
METHOD | GET
PARAMETERS |
id (an HTTP session ID)
query (AFL query string, encoded for use in URL as required, limited to a maximum of 1,000,000 characters)
save optional (format string, limited to a maximum of 4096 characters): save the query output in the specified format for subsequent download by read_lines or read_bytes. If the save parameter is not specified, don't save the query output.
release optional 0 or 1: if 1 then release_session as soon as the query completes. The default value is 0 if not specified (see additional notes below).
stream optional 0, 1 or 2: if 1 then stream the query result; if 2 then stream a gzip compressed (RFC 1952) query result with an unspecified (automatic) compression level; otherwise send the query result through a server-side output file (the default). stream>0 also sets release=1.
compression optional 0 to 9: sets the streaming compression level from 0 (no compression), 1 (fast/light compression) to 9 (slow/high compression). Setting this option automatically also sets the stream option to stream=2.
auth optional authentication token (required for encrypted connections).
RESPONSE |
Success: HTTP 200 text/plain (SciDB Query ID)
Failure (SciDB not available error): HTTP 503 text/plain (ERROR TEXT)
Failure (SciDB query error): HTTP 500 text/plain (SCIDB ERROR TEXT)
Failure (out of memory error): HTTP 507 text/plain (SCIDB ERROR TEXT)
Failure (invalid session): HTTP 404
Failure (invalid HTTP query): HTTP 400
Not authorized (encrypted only): HTTP 401
NOTES |
Shim only supports AFL queries. AQL support will be available by fall of 2014.
500 and 503 errors result in removal of the web session ID and related resources (thus, release_session does not have to be called after such an error).
This method blocks until the query completes unless stream>0. When stream>0, this method returns immediately with the query ID (before the query completes).
Do not specify the option release=1 when stream=0 and when the save option is also set, or output will not be available to read_bytes or read_lines. Instead, explicitly call release_session after reading is complete.
Setting stream>0 is an experimental feature added after version 14.7. Setting stream>0 sends query results through a server-side named pipe into the client without storing the query results on the server. This option uses the HTTP 1.1 chunked transfer encoding; stream termination is indicated by a chunk of length zero.
When stream>0 is set then release=1 is also automatically set. That means that clients are free to not call /release_session after receiving their stream data via /read_lines or /read_bytes; however, calling /release_session will not cause problems and clients can still do it if they wish.
NOTE: WHEN stream>0, SciDB QUERIES REMAIN ACTIVE UNTIL THE CLIENT RECEIVES THE RESULT DATA. IT'S UP TO THE CLIENT TO RECEIVE THE DATA TO TERMINATE THE QUERY.
Set stream=2 to enable gzip-compressed streaming (RFC 1952). When stream=2 one may also set the compression level with the compression option in the range 0 to 9. If the compression query parameter is specified, then stream is automatically set to 2.
EXAMPLE |
http://localhost:8080/execute_query?id=0&query=remove(x)&release=1
HTTP/1.0 200 OK
Content-Length: 13
Content-Type: text/plain

1100993821834
EXAMPLE (ERROR) |
http://localhost:8080/execute_query?id=0&query=remove(x)&release=1
HTTP/1.0 500 ERROR
Content-Length: 286
Content-Type: text/plain

UserQueryException in file: src/query/parser/ALTranslator.cpp function: createArrayReferenceParam line: 863
Error id: scidb::SCIDB_SE_QPROC::SCIDB_LE_ARRAY_DOESNT_EXIST
Error description: Query processor error. Array 'x' does not exist.
remove(x)
^
Failed query id: 1100994052246
EXAMPLE STREAMING DATA USING wget |
# Obtain a shim session ID:
s=`wget -O - -q http://localhost:8080/new_session`
echo "session: $s"
# Run a query, requesting compressed streaming output with high compression:
wget -O - -q "http://localhost:8080/execute_query?id=${s}&query=list('functions')&save=dcsv&compression=9"
# Retrieve the results, placing them in a file locally:
wget -O - -q "http://localhost:8080/read_bytes?id=${s}&n=0" > /tmp/z.gz
# Now we can gunzip this, for example:
gunzip /tmp/z.gz && cat /tmp/z
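The parameter rules for /execute_query can be checked on the client before a request is sent. The Python sketch below builds a percent-encoded request URL and validates the documented value ranges; the function name and base address are assumptions, and the length limit is the one given in the parameter list above.

```python
from urllib.parse import quote, urlencode

BASE = "http://localhost:8080"  # assumed shim address
MAX_QUERY_CHARS = 1_000_000     # query length limit from the parameter list


def execute_query_url(session_id, afl, save=None, release=None, stream=None, compression=None):
    """Build an /execute_query URL, rejecting out-of-range option values."""
    if len(afl) > MAX_QUERY_CHARS:
        raise ValueError("query string too long")
    if release is not None and release not in (0, 1):
        raise ValueError("release must be 0 or 1")
    if stream is not None and stream not in (0, 1, 2):
        raise ValueError("stream must be 0, 1 or 2")
    if compression is not None and compression not in range(10):
        raise ValueError("compression must be between 0 and 9")
    params = {"id": session_id, "query": afl}
    optional = (("save", save), ("release", release), ("stream", stream), ("compression", compression))
    for name, value in optional:
        if value is not None:
            params[name] = value
    # quote (rather than quote_plus) encodes spaces as %20 instead of '+'.
    return f"{BASE}/execute_query?" + urlencode(params, quote_via=quote)
```

Encoding matters for AFL text containing characters such as <, >, [ and ], which are not valid unescaped in a URL query string.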
/cancel
DESCRIPTION | Cancel a SciDB query associated with a session
METHOD | GET
PARAMETERS |
id (an HTTP session ID)
auth optional authentication token (required for encrypted connections).
RESPONSE | Success: HTTP 200
Failure (session not found): HTTP 404
Failure (invalid HTTP query): HTTP 400
Failure (could not connect to SciDB): HTTP 503
Not authorized (encrypted only): HTTP 401
EXAMPLE |
http://localhost:8080/cancel?id=0
HTTP/1.0 200 OK
Content-Length: 0
Content-Type: text/plain
/read_lines
DESCRIPTION | Read text lines from a query that saves its output
METHOD | GET
PARAMETERS |
id (an HTTP session ID)
n (maximum number of lines to read and return between 0 and 2147483647)
auth optional authentication token (required for encrypted connections).
RESPONSE | Success: HTTP 200 followed by application/octet-stream query result (up to n lines)
Failure (invalid HTTP query string): HTTP 400
Failure (session not found): HTTP 404
Failure (end of file): HTTP 410
Failure (invalid request): HTTP 414
Failure (SciDB server error): HTTP 500
Failure (could not connect to SciDB server error): HTTP 503
Failure (server out of memory): HTTP 507
Not authorized (encrypted only): HTTP 401
Basic example |
s=`wget -O - -q "http://localhost:8080/new_session"`
wget -O - -q "http://localhost:8080/execute_query?id=${s}&query=list('functions')&save=dcsv"
wget -O - "http://localhost:8080/read_lines?id=${s}&n=20"
HTTP/1.0 200 OK
Content-Length: 903
Content-Type: application/octet-stream

{No} name,profile,deterministic,library
{0} "%","double %(double,double)",true,"scidb"
{1} "%","int16 %(int16,int16)",true,"scidb"
{2} "%","int32 %(int32,int32)",true,"scidb"
{3} "%","int64 %(int64,int64)",true,"scidb"
{4} "%","int8 %(int8,int8)",true,"scidb"
{5} "%","uint16 %(uint16,uint16)",true,"scidb"
{6} "%","uint32 %(uint32,uint32)",true,"scidb"
{7} "%","uint64 %(uint64,uint64)",true,"scidb"
{8} "%","uint8 %(uint8,uint8)",true,"scidb"
{9} "*","double *(double,double)",true,"scidb"
{10} "*","float *(float,float)",true,"scidb"
{11} "*","int16 *(int16,int16)",true,"scidb"
{12} "*","int32 *(int32,int32)",true,"scidb"
{13} "*","int64 *(int64,int64)",true,"scidb"
{14} "*","int8 *(int8,int8)",true,"scidb"
{15} "*","uint16 *(uint16,uint16)",true,"scidb"
{16} "*","uint32 *(uint32,uint32)",true,"scidb"
{17} "*","uint64 *(uint64,uint64)",true,"scidb"
{18} "*","uint8 *(uint8,uint8)",true,"scidb"
wget -O - "http://localhost:8080/release_session?id=${s}"
Streaming example |
s=`wget -O - -q "http://localhost:8080/new_session"`
wget -O - -q "http://localhost:8080/execute_query?id=${s}&query=list('functions')&save=dcsv&stream=1"
wget -O - "http://localhost:8080/read_lines?id=${s}" | head -n 20
HTTP/1.0 200 OK
Content-Length: 903
Content-Type: application/octet-stream

{No} name,profile,deterministic,library
{0} "%","double %(double,double)",true,"scidb"
{1} "%","int16 %(int16,int16)",true,"scidb"
{2} "%","int32 %(int32,int32)",true,"scidb"
{3} "%","int64 %(int64,int64)",true,"scidb"
{4} "%","int8 %(int8,int8)",true,"scidb"
{5} "%","uint16 %(uint16,uint16)",true,"scidb"
{6} "%","uint32 %(uint32,uint32)",true,"scidb"
{7} "%","uint64 %(uint64,uint64)",true,"scidb"
{8} "%","uint8 %(uint8,uint8)",true,"scidb"
{9} "*","double *(double,double)",true,"scidb"
{10} "*","float *(float,float)",true,"scidb"
{11} "*","int16 *(int16,int16)",true,"scidb"
{12} "*","int32 *(int32,int32)",true,"scidb"
{13} "*","int64 *(int64,int64)",true,"scidb"
{14} "*","int8 *(int8,int8)",true,"scidb"
{15} "*","uint16 *(uint16,uint16)",true,"scidb"
{16} "*","uint32 *(uint32,uint32)",true,"scidb"
{17} "*","uint64 *(uint64,uint64)",true,"scidb"
{18} "*","uint8 *(uint8,uint8)",true,"scidb"
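A client that wants the whole result in bounded pieces could call /read_lines repeatedly until the end-of-file status (HTTP 410) appears. The Python sketch below shows that loop. It assumes that successive calls on one session continue where the previous call stopped, which this document does not state explicitly, and it takes the HTTP call as a caller-supplied function (fetch_page, a hypothetical name) so the loop itself stays independent of any HTTP library.

```python
def read_all_lines(fetch_page, page_size=1000):
    """Collect lines by calling fetch_page(n) -> (status, text) until end of file.

    fetch_page is expected to wrap a GET on /read_lines?id=...&n=page_size
    and return the HTTP status code together with the decoded body.
    """
    lines = []
    while True:
        status, text = fetch_page(page_size)
        if status == 410:  # end of file, as listed in the RESPONSE table
            break
        if status != 200:
            raise RuntimeError(f"read_lines failed with HTTP {status}")
        if not text:       # defensive stop: an empty page carries no more data
            break
        lines.extend(text.splitlines())
    return lines
```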
/read_bytes
DESCRIPTION | Read bytes from a query that saves its output
METHOD | GET
PARAMETERS |
id (an HTTP session ID)
n (maximum number of bytes to read and return between 0 and 2147483647)
auth optional authentication token (required for encrypted connections).
RESPONSE | Success: HTTP 200 followed by application/octet-stream binary query result (up to n bytes)
Failure (end of file): HTTP 416
Failure (invalid HTTP query string): HTTP 400
Failure (session not found): HTTP 404
Failure (invalid request): HTTP 414
Failure (SciDB server error): HTTP 500
Failure (could not connect to SciDB server error): HTTP 503
Failure (server out of memory): HTTP 507
Not authorized (encrypted only): HTTP 401
EXAMPLE |
http://localhost:8080/new_session
http://localhost:8080/execute_query?id=0&query=build(%3Cx:double%3E%5Bi=1:10,10,0%5D,random())&save=(double)
http://localhost:8080/read_bytes?id=0&n=20
HTTP/1.0 200 OK
Content-Length: 20
Content-Type: application/octet-stream

(20 bytes of binary data)
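The bytes returned for a binary save format such as (double) have to be unpacked by the client. The Python sketch below decodes a payload into floating point values. It assumes little-endian IEEE 754 doubles with no per-value null markers (that is, a non-nullable attribute); neither assumption is stated in this document, so verify them against your SciDB version. The function name is illustrative.

```python
import struct


def decode_doubles(payload: bytes):
    """Decode as many complete 8-byte little-endian doubles as payload holds."""
    usable = len(payload) - len(payload) % 8  # ignore a trailing partial value
    count = usable // 8
    return list(struct.unpack(f"<{count}d", payload[:usable]))
```

Requesting n=20 bytes, as in the example above, would therefore yield two complete values plus four bytes of a third; choosing n as a multiple of 8 avoids the partial value.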
/upload_file
DESCRIPTION | Upload a file to the HTTP service
METHOD | POST/GET
PARAMETERS |
id (an HTTP session ID)
A valid file-upload HTTP POST message.
auth optional authentication token (required for encrypted connections).
RESPONSE | Success: HTTP 200 and the name of the file uploaded to the server in a text/plain response.
Failure (invalid HTTP query string): HTTP 400
Failure (session not found): HTTP 404
Failure (server error): HTTP 500
Not authorized (encrypted only): HTTP 401
EXAMPLE |
Example POST to session id=0:
POST /upload_file?id=0 HTTP/1.1
Host: localhost:8080
Accept: */*
Content-Length: 526
Expect: 100-continue
Content-Type: multipart/form-data; boundary=----------------------------d1f47951faa4

------------------------------d1f47951faa4
Content-Disposition: form-data; name="file"; filename="data.csv"
Content-Type: application/octet-stream

"","Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"
"1",5.1,3.5,1.4,0.2,"setosa"
"2",4.9,3,1.4,0.2,"setosa"
"3",4.7,3.2,1.3,0.2,"setosa"
"4",4.6,3.1,1.5,0.2,"setosa"
"5",5,3.6,1.4,0.2,"setosa"
"6",5.4,3.9,1.7,0.4,"setosa"
"7",4.6,3.4,1.4,0.3,"setosa"
"8",5,3.4,1.5,0.2,"setosa"
"9",4.4,2.9,1.4,0.2,"setosa"
------------------------------d1f47951faa4--
Example response:
HTTP/1.0 200 OK
Content-Length: 23
Content-Type: text/plain

/tmp/shim_file_Hrloh9
NOTES | The file to upload can be binary. Use the returned server-side file name in a subsequent SciDB load query, for example. The file does not persist after the HTTP session is released.
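The multipart/form-data message shown in the example can be assembled by hand when no HTTP helper library is available. The Python sketch below builds the Content-Type header value and request body for a single file part, mirroring the field name ("file") and part headers of the example POST. The function name is hypothetical, and whether shim accepts other field names is not covered here.

```python
import uuid


def build_upload_body(filename, data, boundary=None):
    """Return (content_type, body) for a one-file multipart/form-data POST."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", head + data + tail


# Example use against a running shim instance (not executed here):
# import urllib.request
# ctype, body = build_upload_body("data.csv", open("data.csv", "rb").read())
# req = urllib.request.Request("http://localhost:8080/upload_file?id=0",
#                              data=body, headers={"Content-Type": ctype}, method="POST")
# server_side_name = urllib.request.urlopen(req).read().decode().strip()
```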
/login
DESCRIPTION | Authenticate a user with shim
METHOD | GET
PARAMETERS |
username (the PAM username)
password (the user password)
RESPONSE | Success: HTTP 200 and a text authentication token to use with the auth parameter in subsequent HTTP queries.
Failure (invalid HTTP query): HTTP 400
Not authorized: HTTP 401
EXAMPLE |
https://localhost:8083/login?username=scidb&password=paradigm4
HTTP/1.0 200 OK
Content-Length: 11
Content-Type: text/plain

90362228960
NOTES | The /login service URI requires a TLS or SSL encrypted connection. The password and username query parameters only appear in encrypted form over the network.
/logout
DESCRIPTION | Retire an authentication token.
METHOD | GET
PARAMETERS | auth (authentication token from /login)
RESPONSE | Success: HTTP 200 (empty response)
Failure (invalid HTTP query): HTTP 400
EXAMPLE |
https://localhost:8083/logout?auth=90362228960
HTTP/1.0 200 OK
Content-Length: 0
Content-Type: text/plain
Shim limits the number of simultaneous open sessions. Absent-minded or malicious clients are prevented from opening too many new sessions repeatedly without closing them (which could eventually result in denial of service). Shim uses a lazy timeout mechanism to detect unused sessions and reclaim them. It works like this:
The above scheme is called lazy because sessions are harvested only when a new session request cannot be satisfied. Until that event occurs, sessions are free to last indefinitely.
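The idea of lazy harvesting can be illustrated with a toy session pool in Python. This is a sketch of the general technique only, not shim's implementation: the class and method names, the per-session last-activity timestamp, and the max_sessions and timeout values are all assumptions made for the example.

```python
import time


class SessionPool:
    """Fixed-size pool that reclaims idle sessions only when it is full."""

    def __init__(self, max_sessions=50, timeout=60.0, clock=time.monotonic):
        self._max = max_sessions   # hypothetical limit on open sessions
        self._timeout = timeout    # hypothetical idle time, in seconds
        self._clock = clock
        self._last_used = {}       # session id -> time of last activity

    def touch(self, session_id):
        """Record activity on a session."""
        self._last_used[session_id] = self._clock()

    def release(self, session_id):
        """Free a session explicitly."""
        self._last_used.pop(session_id, None)

    def new_session(self):
        """Return a session id, or None when nothing is free or reclaimable."""
        for session_id in range(self._max):
            if session_id not in self._last_used:
                self.touch(session_id)
                return session_id
        # No free slot: only now look for an idle session to harvest.
        now = self._clock()
        for session_id, last in self._last_used.items():
            if now - last > self._timeout:
                self._last_used[session_id] = now
                return session_id
        return None  # a server would answer HTTP 503 here
```

Nothing in the pool runs on a timer: an idle session survives indefinitely until a request for a new session finds the pool full, which is what makes the scheme lazy.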
Shim does not protect against uploads of gigantic files or against many long-running SciDB queries. The service may become unavailable if too many query and/or upload operations are in flight; an HTTP 503 (Service Unavailable) error code is returned in that case.