Data Access from Templates (Queries)

Michal Töpfer edited this page Dec 27, 2020 · 4 revisions

Data Access from Templates

The data from the IVIS server can be retrieved using queries. The DataAccessSession (obtained through its constructor) can be used to perform the queries from a template. The supported query types are:

  • docs – retrieves the records as objects,
  • histogram – splits the signal into buckets and counts the number of records in each bucket,
  • summary – metrics such as min, max, ...,
  • aggs – aggregations (see below),
  • timeSeriesPoint – returns one point in the time series,
  • timeSeries – time series in a selected interval,
  • timeSeriesSummary – summary of the time series (minimum, maximum, average) on an interval.

A single query can be executed using the corresponding getLatest* method of DataAccessSession. Multiple queries can be executed at the same time using the getLatestMixed method, which takes an array of queries as its argument. Each query is an object of shape { type, args }, where args is an array of the arguments.
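As a rough sketch of the { type, args } shape, the snippet below builds two queries by hand. The makeQuery helper is hypothetical (it is not part of IVIS), and the getLatestMixed call is shown only as a comment because its exact return handling is an assumption here.

```javascript
// Hypothetical helper (not part of IVIS): wraps a query type and its
// positional arguments into the { type, args } shape.
function makeQuery(type, ...args) {
    return { type, args };
}

const queries = [
    // docs: sigSetCid, signals, filter, sort, limit
    makeQuery(
        "docs",
        "top:gapminder",
        ["population", "year"],
        { type: "mustExist", sigCid: "population" },
        [{ sigCid: "population", order: "desc" }],
        5
    ),
    // summary: sigSetCid, filter, summary
    makeQuery("summary", "top:gapminder", null, {
        signals: { fertility_rate: ["min", "max"] }
    })
];

// Assumed usage inside a template, where `session` is a DataAccessSession:
// const results = await session.getLatestMixed(queries);
```

The argument order inside args follows the args lists given for each query type on this page.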

For the time series queries, there are also TimeSeriesProvider, TimeSeriesSummaryProvider, TimeSeriesPointProvider and TimeSeriesLimitedPointsProvider which can be used if one does not have access to the current time interval selected in the TimeContext. The current time interval is automatically added as a filter to the queries. Results are then rendered using the renderFun property.

docs

args:

  • sigSetCid
  • signals: array of sigCids
  • filter: described below
  • sort: described below
  • limit: integer, maximum number of returned records

returns: an array of objects which have fields named by the sigCids

The docs query retrieves the records from the server.

filter

The filter is an object which has a type field. Depending on the type, other fields are required.

The "and" and "or" filters represent the boolean operations. The children property is required for them and it must contain an array of other filter objects.

The "range" filter works for a numeric or datetime signal which has to be specified using the sigCid field. Then, the lt, lte, gt, gte fields can be used to specify the desired range of the signal's values.

The "mustExist" filter returns only records for which the signal specified in sigCid is defined.

The "wildcard" filter works for a keyword signal specified in sigCid and performs a text search using a wildcard pattern specified in value. The two supported wildcard operators are ?, which matches any single character, and *, which matches zero or more characters (including the empty string).

The "terms" filter matches the exact values of the signal specified in sigCid. The record is returned if the value of the signal is equal to one of the items in the values array.

The "ids" filter returns records with IDs specified in values array.

The "function_score" filter can be used to create more advanced filters such as random data sampling used in ScatterPlot. See Elasticsearch documentation for more details.

Example: filter

filter = {
    type: "and",
    children: [
        {
            type: "range",
            sigCid: "sig1",
            lt: 100,
            gte: 0
        },
        {
            type: "wildcard",
            sigCid: "sig2",
            value: "a*b"  // matches values starting with "a" and ending with "b"
        }
    ]
};
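To make the filter semantics concrete, here is a minimal local matcher for some of the filter types. This is an illustration only: IVIS evaluates filters on the server, and the function names wildcardToRegExp and matches are made up for this sketch. The "ids" and "function_score" filters are left out.

```javascript
// Turns a wildcard pattern ("?" = one character, "*" = any run of
// characters, possibly empty) into an anchored regular expression.
function wildcardToRegExp(pattern) {
    const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    return new RegExp("^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$");
}

// Returns true if the record satisfies the filter object.
function matches(record, filter) {
    const value = record[filter.sigCid];
    switch (filter.type) {
        case "and":
            return filter.children.every(child => matches(record, child));
        case "or":
            return filter.children.some(child => matches(record, child));
        case "range":
            return value !== undefined && value !== null
                && (filter.lt === undefined || value < filter.lt)
                && (filter.lte === undefined || value <= filter.lte)
                && (filter.gt === undefined || value > filter.gt)
                && (filter.gte === undefined || value >= filter.gte);
        case "mustExist":
            return value !== undefined && value !== null;
        case "wildcard":
            return typeof value === "string" && wildcardToRegExp(filter.value).test(value);
        case "terms":
            return filter.values.includes(value);
        default:
            throw new Error("Unsupported filter type in this sketch: " + filter.type);
    }
}

const exampleFilter = {
    type: "and",
    children: [
        { type: "range", sigCid: "sig1", lt: 100, gte: 0 },
        { type: "wildcard", sigCid: "sig2", value: "a*b" }
    ]
};

const records = [
    { sig1: 5, sig2: "ab" },      // kept: in range, "*" matches zero characters
    { sig1: 150, sig2: "axxb" },  // dropped: sig1 is not below 100
    { sig1: 50, sig2: "abc" }     // dropped: sig2 does not end with "b"
];
const kept = records.filter(record => matches(record, exampleFilter));
```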

sort

The sort argument expects an array of objects of shape { sigCid, order }, where order can be "desc" or "asc". The second signal of the array is used for records with the same value of the first signal, etc.
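The tie-breaking rule can be illustrated with a local comparator. Sorting is done by the server in practice; compareBySort is a hypothetical helper and the rows are made-up sample data.

```javascript
// Builds a comparator from an array of { sigCid, order } objects.
function compareBySort(sortSpec) {
    return (a, b) => {
        for (const { sigCid, order } of sortSpec) {
            if (a[sigCid] < b[sigCid]) return order === "desc" ? 1 : -1;
            if (a[sigCid] > b[sigCid]) return order === "desc" ? -1 : 1;
            // Equal values: fall through to the next signal in the array.
        }
        return 0;
    };
}

const sortSpec = [
    { sigCid: "region", order: "asc" },
    { sigCid: "population", order: "desc" }
];
const rows = [
    { region: "europe", population: 10 },
    { region: "americas", population: 20 },
    { region: "europe", population: 30 }
];
// Copy before sorting so the input array is left untouched.
const sorted = [...rows].sort(compareBySort(sortSpec));
// Resulting order: americas 20, europe 30, europe 10
```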

Example: docs query

Query:

const signalSet = "top:gapminder";
const signals = [ "fertility_rate", "region", "population", "year" ];
const filter = {
    type: "and",
    children: [{
        type: "terms",
        sigCid: "region",
        values: ["europe", "americas"]
    },{
        type: "range",  
        sigCid: "year",
        gte: 2010,
        lt: 2011
    }]
};
const sort = [{
    sigCid: "population",
    order: "desc"
}];
const limit = 5;

const query = {
    type: "docs",
    args: [ signalSet, signals, filter, sort, limit ]
}

Response:

[
  {"fertility_rate": 1.93, "region": "americas", "population": 308641391, "year": "2010-01-01T00:00:00.000Z"},
  {"fertility_rate": 1.81, "region": "americas", "population": 196796269, "year": "2010-01-01T00:00:00.000Z"},
  {"fertility_rate": 1.57, "region": "europe",   "population": 143153869, "year": "2010-01-01T00:00:00.000Z"},
  {"fertility_rate": 2.34, "region": "americas", "population": 117318941, "year": "2010-01-01T00:00:00.000Z"},
  {"fertility_rate": 1.39, "region": "europe",   "population": 80894785,  "year": "2010-01-01T00:00:00.000Z"}
]

histogram

args:

  • sigSetCid
  • signals: array of sigCids to split into buckets
  • maxBucketCount: integer or array of integers of same length as signals (or undefined), maximum number of buckets created for each of the signals
  • minStep (or undefined): number or array of numbers of same length as signals, minimal size of the bucket
  • filter: described above in docs query
  • metrics: can be used to compute a metric for each bucket, see summary query below for list of available metrics

returns: an object with fields:

  • buckets: array of objects, each with key and count (and possibly also values with the values of the metrics)
  • step: size of each bucket (difference between successive keys)
  • agg_type: "histogram" (for numeric signals) or "terms" (for keyword signals)

The histogram query can be used to split the records into buckets by defined signals and count the number of occurrences in each bucket. If more than one signal is specified, the buckets are created along all the specified signals (see HeatmapChart for example usage of the histogram query for two signals).

The sizes of the buckets are computed on the server to be human readable (multiples of 1, 2, 5, 10, 20, ...) and to satisfy all the constraints (maxBucketCount, minStep).
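One possible way to pick such a step is sketched below. This is an assumption about the general idea, not the server's actual algorithm: chooseStep is a hypothetical function, and the minimum and maximum passed to it are illustrative values.

```javascript
// Picks the smallest step of the form 1, 2 or 5 times a power of ten that
// is at least minStep and yields at most maxBucketCount buckets.
function chooseStep(min, max, maxBucketCount, minStep = 0) {
    for (let exponent = -6; exponent <= 12; exponent++) {
        for (const multiplier of [1, 2, 5]) {
            const step = multiplier * Math.pow(10, exponent);
            if (step < minStep) continue;
            // Buckets are assumed to be aligned to multiples of the step.
            const bucketCount = Math.floor(max / step) - Math.floor(min / step) + 1;
            if (bucketCount <= maxBucketCount) return step;
        }
    }
    throw new Error("No suitable step found in the searched range");
}

// Values between 0.9 and 9.2, at most 5 buckets, step of at least 3.
const chosenStep = chooseStep(0.9, 9.2, 5, 3);
```

With these inputs the steps 1 and 2 are rejected by minStep, and 5 is the first candidate that fits within the bucket limit.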

For keyword signals, the most frequent values are used as bucket keys. Additionally, the sum_other_doc_count (number of docs which don't belong to any of the buckets) and doc_count_error_upper_bound fields are present in the returned object.

Example: histogram query

Query:

const signalSet = "top:gapminder"
const signals = [ "fertility_rate" ]
const filter = undefined;
const maxBucketCount = 5
const minStep = 3
const metrics = {
   "population": ["sum"]
}

const query = {
    type: "histogram",
    args: [ signalSet, signals, maxBucketCount, minStep, filter, metrics ]
}

Response:

{
    "buckets": [
        {"key": 0, "count":  934, "values": {"population": {"sum": 26444592416}}},
        {"key": 5, "count": 1298, "values": {"population": {"sum": 15129193455}}}
    ],
    "agg_type": "histogram",
    "step": 5
}
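A short sketch of how a template might post-process this response. It assumes that each bucket covers the half-open interval from key to key + step; the field names from, to and populationSum are chosen for this example only.

```javascript
// Response copied from the example above.
const response = {
    buckets: [
        { key: 0, count: 934, values: { population: { sum: 26444592416 } } },
        { key: 5, count: 1298, values: { population: { sum: 15129193455 } } }
    ],
    agg_type: "histogram",
    step: 5
};

// Convert each bucket into an explicit interval with its count and metric.
const intervals = response.buckets.map(bucket => ({
    from: bucket.key,
    to: bucket.key + response.step,
    count: bucket.count,
    populationSum: bucket.values.population.sum
}));

// Total number of records across all buckets.
const totalCount = intervals.reduce((sum, interval) => sum + interval.count, 0);
```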

summary

args:

  • sigSetCid
  • filter: described above in docs query
  • summary: an object of shape { signals }:
    • signals: an object of shape { [sigCid]: [<metrics>]}

supported metrics:

  • can be specified as string: "min", "max", "avg" (average), "sum"
  • or as an object with type and possibly other parameters
    • "percentiles": has other parameters percents and keyed (described below in aggs)

returns: an object with the specified sigCids as keys. The value for each sigCid is an object holding the computed metrics.

The summary query is used to compute metrics (such as min, max, average) from the data.

Example: summary query

Query (minimum and median of the fertility_rate signal without any filtering):

{
    "type": "summary",
    "args": [ "top:gapminder", null, {
        "signals": {
            "fertility_rate": [ 
                "min", 
                {"type": "percentiles", "percents": 50, "keyed": false}
            ]
        }
    }]
}

Response:

{
    "fertility_rate": {
        "min": 0.8999999761581421,
        "percentiles": [
            {"key": 50, "value": 5.5566666920979815}
        ]
    }
} 

aggs

args:

  • sigSetCid
  • filter: described above in docs query
  • aggs: an array of objects of shape { sigCid, <agg_type>, <parameters of the aggregation> }

returns: an array of objects with the same length as the aggs argument. The fields of each object depend on the aggregation type.

The aggs query is used to perform aggregations and retrieve results.

If no agg_type is specified, the default bucket aggregation (histogram) is used. In this case, the maxBucketCount or step & offset (TODO) or bucketGroup (TODO) must be specified.

Here are the currently supported agg_types:

terms

Returns the most frequent values as buckets with counts, useful for keyword signals (described earlier in the histogram section).

additional parameters:

  • maxBucketCount: default = 10

returns: { buckets, doc_count_error_upper_bound, sum_other_doc_count, aggs_type: "terms"} (described in histogram section)

percentiles

Returns the values of the signal at the given percentiles. A percentile is the value below which a given percentage of the observed values fall. For example, the 95th percentile is the value which is greater than 95% of the observed values. The 50th percentile is the median.

additional parameters:

  • percents: number or array of numbers in range 0-100
  • keyed: boolean, determines the format of the output

returns: { values, aggs_type: "percentiles"}

  • if keyed is true (default), values is an object with percents as keys (beware: the keys are strings, e.g. "50.0") and the computed percentiles as the corresponding values
  • if keyed is false, values is an array of objects of shape { key, value }
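Because the two output formats differ, a small helper can hide the difference when reading a percentile. The getPercentile function is hypothetical and the sample values are made up for illustration.

```javascript
// Reads one percentile from either output format of the percentiles
// aggregation. Returns undefined if the percentile is not present.
function getPercentile(values, percent) {
    if (Array.isArray(values)) {
        // keyed: false -> [{ key, value }, ...]
        const entry = values.find(item => item.key === percent);
        return entry === undefined ? undefined : entry.value;
    }
    // keyed: true -> { "50.0": value, ... }; the keys are strings,
    // so they are parsed before comparing.
    for (const [key, value] of Object.entries(values)) {
        if (parseFloat(key) === percent) return value;
    }
    return undefined;
}

const keyedValues = { "50.0": 5.56, "95.0": 7.4 };
const listValues = [{ key: 50, value: 5.56 }, { key: 95, value: 7.4 }];

const medianFromKeyed = getPercentile(keyedValues, 50);
const medianFromList = getPercentile(listValues, 50);
```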

timeSeriesPoint

args:

  • sigSets: object in which the keys are the sigSetCids and values have the following properties:
    • tsSigCid
    • signals: array of sigCids whose values will be returned
    • mustExist: array of sigCids which must exist (have value) in the records which are returned (optional)
    • horizon: a moment.duration which sets the length of the time interval from which the data will be returned (optional)
  • ts: timestamp for filtering
  • timeSeriesPointType: value from TimeSeriesPointType enum, specifies whether the data should be before or after the timestamp

returns: an object with sigSetCids as keys and values have

  • ts: timestamp of the data point (moment)
  • data: an object with sigCids (from signals) as keys and the corresponding values

This query returns the single data point closest to the timestamp (ts). Depending on the timeSeriesPointType, it is the last value before the timestamp (TimeSeriesPointType.LTE or TimeSeriesPointType.LT) or the first value after the timestamp (TimeSeriesPointType.GTE or TimeSeriesPointType.GT).

If the horizon parameter is specified, only that duration before or after the timestamp is considered. If there are no data points in the interval, an empty object is returned.
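The selection rule can be sketched locally as follows. This is an illustration only, since the real query runs on the server. Timestamps are plain millisecond numbers instead of moment objects, the PointType constants are stand-ins for the TimeSeriesPointType enum, pickPoint is a hypothetical function, and treating the horizon boundary as inclusive is an assumption.

```javascript
// Stand-in for the TimeSeriesPointType enum (the values are assumptions).
const PointType = { LT: "lt", LTE: "lte", GT: "gt", GTE: "gte" };

function pickPoint(points, ts, pointType, horizonMs) {
    const before = pointType === PointType.LT || pointType === PointType.LTE;
    const inclusive = pointType === PointType.LTE || pointType === PointType.GTE;

    const candidates = points.filter(point => {
        if (point.ts === ts) return inclusive;
        // Drop points on the wrong side of the timestamp.
        if (before ? point.ts > ts : point.ts < ts) return false;
        // With a horizon, only points within that distance of ts count.
        return horizonMs === undefined || Math.abs(point.ts - ts) <= horizonMs;
    });
    if (candidates.length === 0) return {};

    // Last point before ts, or first point after it.
    return candidates.reduce((best, point) =>
        (before ? point.ts > best.ts : point.ts < best.ts) ? point : best);
}

const points = [
    { ts: 1000, data: { cpu_freq: 1.5 } },
    { ts: 4000, data: { cpu_freq: 1.7 } },
    { ts: 9000, data: { cpu_freq: 1.9 } }
];

// Last point at or before 5000 within 2000 ms: the point at ts 4000.
const latest = pickPoint(points, 5000, PointType.LTE, 2000);
// First point after 5000 within 1000 ms: none, so an empty object.
const none = pickPoint(points, 5000, PointType.GT, 1000);
```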

Example: timeSeriesPoint query

Query (the latest data point from the last 10 seconds):

const sigSets = {
    "components_frequency": {
        tsSigCid: "ts",
        signals: ["cpu_freq", "gpu_freq", "online_cpus"],
        horizon: moment.duration(10, "s"),
    }
}
const ts = moment(); // now
const timeSeriesPointType = TimeSeriesPointType.LTE;

const query = {
    type: "timeSeriesPoint",
    args: [ sigSets, ts, timeSeriesPointType ]
}

Response:

{
    "components_frequency": {
        "data": {
            "cpu_freq": 1.77,
            "gpu_freq": 0.98,
            "online_cpus": 4
        },
        "ts": "2020-12-12T13:23:49.000Z" // this is a moment object
    }
}

timeSeries

TODO

timeSeriesSummary

TODO
