Shim Telemetry Services

In addition to the HTTP SciDB query API described in api.html, shim provides a simple service for collecting and distributing SciDB cluster telemetry data and statistics. The telemetry service is designed to be simple, efficient, and independent of the statistics being collected.

Shim's telemetry API consists of two services:

  1. Collecting telemetry data from nodes in a SciDB cluster
  2. Reporting a timeseries of collected data to interested clients

Shim is not aware of the structure of the collected data, although it does limit the length of each measurement. Shim serves only as a central aggregation and distribution point.

The following sections assume that shim is installed and configured; see the api.html documentation for setting up shim. The telemetry API may optionally be authenticated and TLS-encrypted. For simplicity, the examples below use unencrypted connections.

Collecting telemetry data

The /measurement service is a passive telemetry collection point: programs contact shim on the /measurement service to report new data. The service takes a single query parameter, data, which holds an arbitrary string containing the program's measurements. The format of the data string is up to the program, but it is usually a list of comma-separated text values. The data string is limited to at most 128 characters; longer data strings return an error.
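A reporting program only needs to build a GET URL with the data parameter. The sketch below is a hypothetical helper (buildMeasurementUrl and MAX_DATA_LENGTH are illustrative names, not part of shim). It assumes the 128-character limit applies to the raw, unencoded data string and that shim URL-decodes query parameters as most HTTP servers do, so the value is percent-encoded to keep spaces, '&' or '#' from breaking the request.

```javascript
// Hypothetical client-side helper for shim's /measurement service.
// Assumption: the 128-character limit applies to the unencoded data string.
const MAX_DATA_LENGTH = 128;

function buildMeasurementUrl(hostPort, data) {
  if (typeof data !== 'string' || data.length === 0) {
    // shim answers HTTP 400 when the data parameter is missing
    throw new Error('data must be a non-empty string');
  }
  // data.length counts UTF-16 code units (one per character for ASCII data)
  if (data.length > MAX_DATA_LENGTH) {
    throw new Error('data exceeds ' + MAX_DATA_LENGTH + ' characters');
  }
  // Percent-encode the value; assumes shim decodes it on arrival
  return 'http://' + hostPort + '/measurement?data=' + encodeURIComponent(data);
}
```

For the data string used in the reference example, buildMeasurementUrl('localhost:8080', 'myhostname,50.6,12.1') returns 'http://localhost:8080/measurement?data=myhostname%2C50.6%2C12.1'. Rejecting oversize strings on the client avoids a round trip that would fail anyway.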

Shim stores measurements in a circular buffer. When a new measurement is reported, shim inserts it into the buffer, prepending a timestamp in seconds (Unix time) followed by a comma. Each stored entry, including the shim-applied timestamp, the comma, and a trailing newline character, is limited to 128 bytes; longer measurements are rejected or truncated.
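The buffer behavior described above can be modeled in a few lines. This is a sketch of the idea, not shim's implementation: the MeasurementRing class and its capacity are hypothetical, and it always rejects oversize entries, whereas shim may truncate them instead.

```javascript
// Minimal model of a fixed-size measurement buffer (not shim's actual code).
// Assumption: oversize entries are rejected rather than truncated.
const MAX_ENTRY_BYTES = 128; // includes timestamp, comma and trailing newline

class MeasurementRing {
  constructor(capacity) {
    this.capacity = capacity;
    this.slots = new Array(capacity);
    this.next = 0;  // index of the slot to write (or overwrite) next
    this.count = 0; // number of occupied slots
  }

  // Store "<unix seconds>,<data>\n"; returns false if the entry is too long.
  put(data, nowSeconds = Math.floor(Date.now() / 1000)) {
    const entry = nowSeconds + ',' + data + '\n';
    if (new TextEncoder().encode(entry).length > MAX_ENTRY_BYTES) return false;
    this.slots[this.next] = entry;        // overwrites the oldest entry when full
    this.next = (this.next + 1) % this.capacity;
    this.count = Math.min(this.count + 1, this.capacity);
    return true;
  }

  // All buffered entries, oldest first, concatenated into one string.
  dump() {
    const start = (this.next - this.count + this.capacity) % this.capacity;
    let out = '';
    for (let i = 0; i < this.count; i++) {
      out += this.slots[(start + i) % this.capacity];
    }
    return out;
  }
}
```

Because the timestamp counts against the limit, the room left for the data string is smaller than 128 bytes: with a 10-digit timestamp, the prefix, comma and newline take 12 bytes, so in this model new MeasurementRing(4).put('x'.repeat(116), 1700000000) returns true and a 117-byte string returns false.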

It's up to the SciDB nodes to start programs that periodically post telemetry data to shim using this service; shim does not actively collect any data. See the example below for a simple shell script that reports free memory and load average.

/measurement

DESCRIPTION  Report measurement data
METHOD       GET
PARAMETERS   data  Free-form data string, subject to the length limits described above
RESPONSE     Success: HTTP 200
             Failure (out of resources/server unavailable): HTTP 503
             Invalid request (usually missing data parameter): HTTP 400
EXAMPLE      http://localhost:8080/measurement?data=myhostname,50.6,12.1

             HTTP/1.0 200 OK

Here is an example shell script that periodically sends free memory and load average statistics to shim using wget.

#!/bin/bash
# Collect basic node statistics and report them to shim every 30 seconds
# Usage: stats.sh SHIM_HOST:SHIM_PORT

if [ $# -ne 1 ]; then
  echo "Usage: $0 SHIM_HOST:SHIM_PORT" >&2
  exit 1
fi
SHIM="$1"

# Print the value (in kB) of one field of the saved /proc/meminfo text.
# Matching "Name:" exactly keeps "Shmem" from also matching
# "ShmemHugePages" and similar fields on newer kernels.
field() {
  echo "${M}" | awk -v key="$1:" '$1 == key {print $2}'
}

while true
do
  M=$(cat /proc/meminfo)   # one consistent snapshot per iteration
  total=$(field MemTotal)
  free=$(field MemFree)
  cached=$(field Cached)
  buffers=$(field Buffers)
  shmem=$(field Shmem)

  n=$(hostname)   # Host name
  # Percent free memory: free + page cache + buffers, minus shared memory
  # (Shmem is counted inside Cached but cannot be dropped like ordinary cache)
  mem=$(echo "scale=2; 100*($free + $cached + $buffers - $shmem)/$total" | bc)
  # One-minute load average
  load=$(cut -d ' ' -f 1 /proc/loadavg)
  msg="${n},${mem},${load}"
  wget -q --timeout=10 -O /dev/null "http://${SHIM}/measurement?data=${msg}"
  sleep 30
done
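To make the script's free-memory formula concrete, the hypothetical function below performs the same calculation on the text of /proc/meminfo (values in kB). It is an illustration only; percentFreeMemory is not part of shim. Like the awk lookup in the script, it matches field names exactly, so a ShmemHugePages line cannot be mistaken for Shmem.

```javascript
// Same formula as stats.sh: 100 * (MemFree + Cached + Buffers - Shmem) / MemTotal.
// Hypothetical helper; input is the raw text of /proc/meminfo.
function percentFreeMemory(meminfoText) {
  const kb = {};
  for (const line of meminfoText.split('\n')) {
    // "Name:   12345 kB" -> kb["Name"] = 12345 (exact field name as the key)
    const m = /^([^:]+):\s*(\d+)/.exec(line);
    if (m) kb[m[1]] = Number(m[2]);
  }
  for (const key of ['MemTotal', 'MemFree', 'Cached', 'Buffers', 'Shmem']) {
    if (!(key in kb)) throw new Error('missing ' + key + ' in meminfo');
  }
  return 100 * (kb.MemFree + kb.Cached + kb.Buffers - kb.Shmem) / kb.MemTotal;
}
```

For example, with MemTotal 16000000, MemFree 4000000, Cached 2000000, Buffers 400000 and Shmem 400000 (all kB), the result is 100 × 6000000 / 16000000 = 37.5. The script's bc call with scale=2 reports the same value as 37.50; note that bc truncates extra digits rather than rounding.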


Reporting telemetry data

Shim reports collected telemetry data over a WebSocket interface. New data are streamed incrementally to the client's websocket as they become available, which simplifies dynamic display of the data.

Clients contact shim and open a websocket connection at the /telemetry service as shown in the example below. When the websocket connection is first established, shim aggregates all the entries in its telemetry data buffer, separating them with the newline character (UTF-8/ASCII code 10), and sends the aggregated message to the client over the websocket.

As new data become available, shim aggregates only the entries that the client has not yet received, again separating them with newline characters, and sends them to the client over the websocket (usually a very small message).
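Since every message, whether the initial dump or an incremental update, is a set of newline-separated entries of the form "<unix seconds>,<data>", a client can parse both kinds with a single function. The sketch below is a hypothetical parser (parseTelemetryMessage is an illustrative name). It assumes the data part is comma-separated, which is only the usual convention, and it tolerates a trailing newline because the exact framing of the last entry is not specified here.

```javascript
// Split one /telemetry websocket message into records.
// Each non-empty line is "<unix seconds>,<data>"; splitting the data part
// on commas assumes the reporting program used comma-separated values.
function parseTelemetryMessage(message) {
  const records = [];
  for (const line of message.split('\n')) {
    if (line === '') continue;            // tolerate a trailing newline
    const comma = line.indexOf(',');
    if (comma < 0) continue;              // not "<timestamp>,<data>"; skip it
    records.push({
      time: new Date(Number(line.slice(0, comma)) * 1000),
      fields: line.slice(comma + 1).split(','),
    });
  }
  return records;
}
```

For instance, parseTelemetryMessage('1700000000,myhostname,50.6,12.1\n') yields a single record whose fields are ['myhostname', '50.6', '12.1'].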

Although shim keeps only a fixed-size circular buffer of telemetry data, clients are free to retain as much history as they like. The example below illustrates the WebSocket portion of a JavaScript client.

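// Global array of raw telemetry data, one entry ("<timestamp>,<data>") per element.
var telemetry = [];
var websocket;

window.onload = function()
{
  // Use wss:// when the page itself was served over TLS.
  var scheme = (window.location.protocol === 'https:') ? 'wss://' : 'ws://';
  var url = scheme + window.location.host + '/telemetry';
  websocket = new WebSocket(url);
  websocket.onmessage = function(ev)
  {
    // Each message holds one or more newline-separated entries.
    var lines = ev.data.split('\n').filter(function(line) { return line !== ''; });
    telemetry = telemetry.concat(lines);
  };
};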

In practice, the onmessage function would parse, process, and display the incoming data, and would likely also limit the size of the array.
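One way to limit the array is to drop the oldest lines once a cap is exceeded. The helper below is a hypothetical sketch; appendTelemetry and MAX_LINES are illustrative names, and the cap of 1000 lines is arbitrary rather than a shim setting.

```javascript
// Append the entries of one /telemetry message to history, keeping at most
// maxLines entries (the oldest are dropped first). MAX_LINES is illustrative.
const MAX_LINES = 1000;

function appendTelemetry(history, message, maxLines = MAX_LINES) {
  for (const line of message.split('\n')) {
    if (line !== '') history.push(line);
  }
  if (history.length > maxLines) {
    // Remove the excess from the front in a single splice
    history.splice(0, history.length - maxLines);
  }
  return history;
}
```

In the client, the handler body would then become appendTelemetry(telemetry, ev.data). Trimming in place with splice keeps the same array object, so other code that holds a reference to telemetry sees the update.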