Data Access from Templates (Queries)
Data from the IVIS server can be retrieved using queries. A DataAccessSession (obtained through its constructor) can be used to perform the queries from a template. The supported query types are:
- docs – retrieves the records as objects,
- histogram – splits the signal into buckets and counts the number of records in each bucket,
- summary – metrics such as min, max, ...,
- aggs – aggregations (see below),
- timeSeriesPoint – returns one point in the time series,
- timeSeries – time series in a selected interval,
- timeSeriesSummary – summary of the time series (minimum, maximum, average) on an interval.
A single query can be executed using the corresponding getLatest* method of DataAccessSession. Multiple queries can be executed at once using the getLatestMixed method, which takes an array of queries as its argument. Each query has the shape { type, args }, where args is an array with the query's arguments.
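As a minimal sketch of the { type, args } shape, the hypothetical helper makeQuery below packs a query type and its positional arguments into one object and rejects types that are not in the list above. Neither makeQuery nor SUPPORTED_QUERY_TYPES is part of IVIS; the resulting array is what would be handed to getLatestMixed, whose call is only shown in a comment (assuming it returns a promise).

```javascript
// Query types listed at the top of this page.
const SUPPORTED_QUERY_TYPES = new Set([
  "docs", "histogram", "summary", "aggs",
  "timeSeriesPoint", "timeSeries", "timeSeriesSummary",
]);

// Hypothetical helper (not an IVIS API): builds a query object of shape { type, args }.
function makeQuery(type, ...args) {
  if (!SUPPORTED_QUERY_TYPES.has(type)) {
    throw new Error(`Unsupported query type: ${type}`);
  }
  return { type, args };
}

// Two queries that could be executed together.
const queries = [
  // docs args: sigSetCid, signals, filter, sort, limit
  makeQuery("docs", "top:gapminder", ["region", "population"], undefined,
    [{ sigCid: "population", order: "desc" }], 5),
  // summary args: sigSetCid, filter, summary
  makeQuery("summary", "top:gapminder", null, { signals: { population: ["max"] } }),
];

// In a template, assuming dataAccessSession is a DataAccessSession instance
// and getLatestMixed returns a promise of the results:
// const results = await dataAccessSession.getLatestMixed(queries);
```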
For the time series queries, there are also TimeSeriesProvider, TimeSeriesSummaryProvider, TimeSeriesPointProvider and TimeSeriesLimitedPointsProvider, which can be used if one does not have access to the current time interval selected in the TimeContext.
The current time interval is automatically added as a filter to the queries. The results are then rendered using the renderFun property.
docs
args:
- sigSetCid
- signals: array of sigCids
- filter: described below
- sort: described below
- limit: integer, maximum number of returned records
returns: an array of objects which have fields named by the sigCids
The docs query retrieves the records from the server.
The filter is an object which has a type field. Depending on the type, other fields are required.
The "and" and "or" filters represent the boolean operations. The children property is required for them and must contain an array of other filter objects.
The "range" filter works for a numeric or datetime signal, which has to be specified using the sigCid field. The lt, lte, gt and gte fields can then be used to specify the desired range of the signal's values.
The "mustExist" filter returns only records for which the signal given in sigCid is defined.
The "wildcard" filter works for the keyword signal set in sigCid and performs a text search using the wildcard pattern specified in value. The two supported wildcard operators are ?, which matches any single character, and *, which matches zero or more characters (so it also matches an empty string).
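The matching rules of the two wildcard operators can be tried out locally. The sketch below is a hypothetical helper that translates a wildcard pattern into an anchored JavaScript regular expression; it only illustrates the semantics described above, since the actual matching is done on the server.

```javascript
// Hypothetical helper: converts a wildcard pattern (? and *) into an anchored RegExp.
function wildcardToRegExp(pattern) {
  let source = "";
  for (const ch of pattern) {
    if (ch === "*") {
      source += ".*"; // zero or more characters
    } else if (ch === "?") {
      source += "."; // exactly one character
    } else {
      source += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  // The whole value must match, not just a substring; "s" lets . match newlines too.
  return new RegExp(`^${source}$`, "s");
}

console.log(wildcardToRegExp("a*b").test("ab"));   // true (* matches the empty string)
console.log(wildcardToRegExp("a*b").test("axyb")); // true
console.log(wildcardToRegExp("a?c").test("ac"));   // false (? needs exactly one character)
```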
The "terms" filter matches the exact values of the signal set in sigCid. A record is returned if the value of the signal is equal to one of the items in the values array.
The "ids" filter returns the records with the IDs specified in the values array.
The "function_score" filter can be used to create more advanced filters, such as the random data sampling used in ScatterPlot. See the Elasticsearch documentation for more details.
const filter = {
  type: "and",
  children: [
    {
      type: "range",
      sigCid: "sig1",
      lt: 100,
      gte: 0
    },
    {
      type: "wildcard",
      sigCid: "sig2",
      value: "a*b" // matches values starting with "a" and ending with "b"
    }
  ]
};
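To make the semantics of the filter types concrete, here is a hypothetical in-memory evaluator that checks whether a single record (an object keyed by sigCid) satisfies a filter object such as the one above. It is only a sketch of the rules described in this section: the real filtering runs on the server, "function_score" is not covered, records are assumed to carry their ID in an id field, and range bounds are assumed to be numbers.

```javascript
// Hypothetical local evaluator (not part of IVIS): does `record` satisfy `filter`?
function matches(record, filter) {
  const value = record[filter.sigCid];
  switch (filter.type) {
    case "and":
      return filter.children.every((child) => matches(record, child));
    case "or":
      return filter.children.some((child) => matches(record, child));
    case "range":
      // Every bound that is present must hold; missing bounds are ignored.
      if (value === undefined || value === null) return false;
      return (filter.lt === undefined || value < filter.lt) &&
        (filter.lte === undefined || value <= filter.lte) &&
        (filter.gt === undefined || value > filter.gt) &&
        (filter.gte === undefined || value >= filter.gte);
    case "mustExist":
      return value !== undefined && value !== null;
    case "wildcard": {
      // Escape regex metacharacters (except * and ?), then map * to .* and ? to .
      const source = filter.value
        .replace(/[.+^${}()|[\]\\]/g, "\\$&")
        .replace(/\*/g, ".*")
        .replace(/\?/g, ".");
      return typeof value === "string" && new RegExp(`^${source}$`, "s").test(value);
    }
    case "terms":
      return filter.values.includes(value);
    case "ids":
      return filter.values.includes(record.id); // assumes the ID is stored in record.id
    default:
      throw new Error(`Filter type not covered by this sketch: ${filter.type}`);
  }
}

const exampleFilter = {
  type: "and",
  children: [
    { type: "range", sigCid: "sig1", lt: 100, gte: 0 },
    { type: "wildcard", sigCid: "sig2", value: "a*b" },
  ],
};
console.log(matches({ sig1: 42, sig2: "axb" }, exampleFilter));  // true
console.log(matches({ sig1: 100, sig2: "axb" }, exampleFilter)); // false (lt: 100 excludes 100)
console.log(matches({ sig1: 42, sig2: "ba" }, exampleFilter));   // false
```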
The sort argument expects an array of objects of shape { sigCid, order }, where order can be "desc" or "asc". The second signal of the array is used to order records that have the same value of the first signal, and so on.
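The tie-breaking rule can be illustrated with a local comparator built from a sort array. This is a hypothetical sketch for sorting records that have already been fetched, not the server's implementation.

```javascript
// Hypothetical helper: builds a comparator from an array of { sigCid, order }.
function makeComparator(sort) {
  return (a, b) => {
    for (const { sigCid, order } of sort) {
      if (a[sigCid] < b[sigCid]) return order === "desc" ? 1 : -1;
      if (a[sigCid] > b[sigCid]) return order === "desc" ? -1 : 1;
      // Equal values: fall through to the next signal in the array.
    }
    return 0;
  };
}

const records = [
  { region: "europe", population: 80 },
  { region: "americas", population: 120 },
  { region: "europe", population: 140 },
];
records.sort(makeComparator([
  { sigCid: "region", order: "asc" },
  { sigCid: "population", order: "desc" },
]));
console.log(records.map((r) => `${r.region}:${r.population}`).join(", "));
// americas:120, europe:140, europe:80
```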
Query:
const signalSet = "top:gapminder";
const signals = ["fertility_rate", "region", "population", "year"];
const filter = {
  type: "and",
  children: [
    {
      type: "terms",
      sigCid: "region",
      values: ["europe", "americas"]
    },
    {
      type: "range",
      sigCid: "year",
      gte: 2010,
      lt: 2011
    }
  ]
};
const sort = [{
  sigCid: "population",
  order: "desc"
}];
const limit = 5;
const query = {
  type: "docs",
  args: [signalSet, signals, filter, sort, limit]
};
Response:
[
{"fertility_rate": 1.93, "region": "americas", "population": 308641391, "year": "2010-01-01T00:00:00.000Z"},
{"fertility_rate": 1.81, "region": "americas", "population": 196796269, "year": "2010-01-01T00:00:00.000Z"},
{"fertility_rate": 1.57, "region": "europe", "population": 143153869, "year": "2010-01-01T00:00:00.000Z"},
{"fertility_rate": 2.34, "region": "americas", "population": 117318941, "year": "2010-01-01T00:00:00.000Z"},
{"fertility_rate": 1.39, "region": "europe", "population": 80894785, "year": "2010-01-01T00:00:00.000Z"}
]
histogram
args:
- sigSetCid
- signals: array of sigCids to split into buckets
- maxBucketCount: integer or array of integers of the same length as signals (or undefined), maximum number of buckets created for each of the signals
- minStep (or undefined): number or array of numbers of the same length as signals, minimal size of the bucket
- filter: described above in docs query
- metrics: can be used to compute a metric for each bucket, see the summary query below for the list of available metrics
returns: an object with fields:
- buckets: array of objects, each with key and count (and possibly also values with the values of the metrics)
- step: size of each bucket (difference between successive keys)
- agg_type: "histogram" (for numeric signals) or "terms" (for keyword signals)
The histogram query splits the records into buckets by the specified signals and counts the number of records in each bucket. If more than one signal is specified, the buckets are created along all the specified signals (see HeatmapChart for an example of the histogram query with two signals).
The sizes of the buckets are computed on the server to be human-readable (multiples of 1, 2, 5, 10, 20, ...) and to satisfy all the constraints (maxBucketCount, minStep).
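The exact algorithm used by the server is not described here. As a rough, hypothetical sketch of the idea, the function below picks the smallest step of the form 1, 2 or 5 × 10^k that is at least minStep and yields at most maxBucketCount buckets over a known value range [min, max], with bucket keys aligned to multiples of the step. It assumes a single numeric signal and maxBucketCount ≥ 1.

```javascript
// Hypothetical sketch, not the IVIS server algorithm.
function niceStep(min, max, maxBucketCount, minStep = 0) {
  // A valid step can never be smaller than this bound.
  const lowerBound = Math.max(minStep, (max - min) / maxBucketCount);
  let exponent = lowerBound > 0 ? Math.floor(Math.log10(lowerBound)) : 0;
  for (;; exponent++) {
    for (const multiplier of [1, 2, 5]) {
      const step = multiplier * 10 ** exponent;
      if (step < minStep) continue;
      // Buckets start at the multiple of step at or below min.
      const firstKey = Math.floor(min / step) * step;
      const bucketCount = Math.floor((max - firstKey) / step) + 1;
      if (bucketCount <= maxBucketCount) return step;
    }
  }
}

console.log(niceStep(0, 100, 10));      // 20 (step 10 would need 11 buckets: 0, 10, ..., 100)
console.log(niceStep(0, 99, 10));       // 10
console.log(niceStep(0.9, 8.4, 5, 3));  // 5 (1 and 2 are below minStep)
```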
For keyword signals, the most frequent values are used as bucket keys. Additionally, the sum_other_doc_count (number of docs which don't belong to any of the buckets) and doc_count_error_upper_bound fields are present in the returned object.
Query:
const signalSet = "top:gapminder";
const signals = ["fertility_rate"];
const filter = undefined;
const maxBucketCount = 5;
const minStep = 3;
const metrics = {
  "population": ["sum"]
};
const query = {
  type: "histogram",
  args: [signalSet, signals, maxBucketCount, minStep, filter, metrics]
};
Response:
{
  "buckets": [
    {"key": 0, "count": 934, "values": {"population": {"sum": 26444592416}}},
    {"key": 5, "count": 1298, "values": {"population": {"sum": 15129193455}}}
  ],
  "agg_type": "histogram",
  "step": 5
}
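Because every bucket has the same width, the range covered by each bucket can be reconstructed from key and step. The sketch below assumes, as in Elasticsearch histogram aggregations, that key is the inclusive lower bound of the bucket; the helper name bucketsToRanges is hypothetical.

```javascript
// Hypothetical helper: turns a numeric histogram response into labelled ranges.
function bucketsToRanges(histogram) {
  return histogram.buckets.map((bucket) => ({
    from: bucket.key,                 // inclusive lower bound (assumed)
    to: bucket.key + histogram.step,  // exclusive upper bound (assumed)
    count: bucket.count,
  }));
}

// The response from the example above.
const response = {
  buckets: [
    { key: 0, count: 934, values: { population: { sum: 26444592416 } } },
    { key: 5, count: 1298, values: { population: { sum: 15129193455 } } },
  ],
  agg_type: "histogram",
  step: 5,
};

for (const { from, to, count } of bucketsToRanges(response)) {
  console.log(`[${from}, ${to}): ${count} records`);
}
// [0, 5): 934 records
// [5, 10): 1298 records
```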
summary
args:
- sigSetCid
- filter: described above in docs query
- summary: an object of shape { signals }:
  - signals: an object of shape { [sigCid]: [<metrics>] }
- supported metrics:
  - can be specified as a string: "min", "max", "avg" (average), "sum"
  - or as an object with type and possibly other parameters:
    - "percentiles": has the additional parameters percents and keyed (described below in aggs)
returns: an object which has the specified sigCids as keys and the values of the metrics as values.
The summary query is used to compute metrics (such as min, max, average) from the data.
Query (minimum and median of the fertility_rate signal without any filtering):
{
  "type": "summary",
  "args": ["top:gapminder", null, {
    "signals": {
      "fertility_rate": [
        "min",
        {"type": "percentiles", "percents": 50, "keyed": false}
      ]
    }
  }]
}
Response:
{
  "fertility_rate": {
    "min": 0.8999999761581421,
    "percentiles": [
      {"key": 50, "value": 5.5566666920979815}
    ]
  }
}
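Since metrics can be given either as strings or as objects with a type, a small validation step can catch typos before a query is sent. The hypothetical buildSummaryQuery helper below checks the metric specifications against the metrics listed above and produces the same query object as the example; it is a sketch, not an IVIS function.

```javascript
const STRING_METRICS = new Set(["min", "max", "avg", "sum"]);
const OBJECT_METRICS = new Set(["percentiles"]);

// Hypothetical helper: validates metric specs and builds a summary query.
function buildSummaryQuery(sigSetCid, filter, signals) {
  for (const [sigCid, metrics] of Object.entries(signals)) {
    for (const metric of metrics) {
      const ok = typeof metric === "string"
        ? STRING_METRICS.has(metric)
        : metric !== null && typeof metric === "object" && OBJECT_METRICS.has(metric.type);
      if (!ok) {
        throw new Error(`Unsupported metric for ${sigCid}: ${JSON.stringify(metric)}`);
      }
    }
  }
  return { type: "summary", args: [sigSetCid, filter, { signals }] };
}

// Same query as the example above.
const query = buildSummaryQuery("top:gapminder", null, {
  fertility_rate: ["min", { type: "percentiles", percents: 50, keyed: false }],
});

try {
  buildSummaryQuery("top:gapminder", null, { fertility_rate: ["median"] });
} catch (err) {
  console.log(err.message); // Unsupported metric for fertility_rate: "median"
}
```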
aggs
args:
- sigSetCid
- filter: described above in docs query
- aggs: an array of objects of shape { sigCid, <agg_type>, <parameters of the aggregation> }
returns: an array of objects with the same length as the aggs argument. The fields of each object depend on the aggregation type.
The aggs query is used to perform aggregations and retrieve their results.
If no agg_type is specified, the default bucket aggregation (histogram) is used. In this case, maxBucketCount, or step & offset (TODO), or bucketGroup (TODO) must be specified.
Here are the currently supported agg_types:
terms
Returns the most frequent values as buckets with counts; useful for keyword signals (described earlier in the histogram section).
additional parameters:
- maxBucketCount: default = 10
returns: { buckets, doc_count_error_upper_bound, sum_other_doc_count, aggs_type: "terms" } (described in the histogram section)
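Because sum_other_doc_count counts the records that fell outside the returned buckets, the total number of matching records and the share covered by the most frequent values can be derived from a terms result. The helper termsCoverage is hypothetical and the input numbers are made up purely for illustration; note that bucket counts may be approximate, as signalled by doc_count_error_upper_bound.

```javascript
// Hypothetical helper: summarizes how much of the data the returned buckets cover.
function termsCoverage(result) {
  const inBuckets = result.buckets.reduce((sum, bucket) => sum + bucket.count, 0);
  const total = inBuckets + result.sum_other_doc_count;
  return { total, inBuckets, coveredShare: total === 0 ? 0 : inBuckets / total };
}

// Made-up input for illustration only.
const result = {
  buckets: [{ key: "europe", count: 60 }, { key: "asia", count: 30 }],
  doc_count_error_upper_bound: 0,
  sum_other_doc_count: 10,
  aggs_type: "terms",
};
console.log(termsCoverage(result)); // { total: 100, inBuckets: 90, coveredShare: 0.9 }
```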
percentiles
Returns the values at the given percentiles. A percentile is the value below which a given percentage of the observed values fall. For example, the 95th percentile is the value which is greater than 95% of the observed values. The 50th percentile is the median.
additional parameters:
- percents: number or array of numbers in the range 0-100
- keyed: boolean, determines the format of the output
returns: { values, aggs_type: "percentiles" }
- if keyed is true (default), values is an object with the percents as keys (beware: the keys are strings, e.g. "50.0") and the computed percentiles as the corresponding values
- if keyed is false, values is an array of objects of shape { key, value }
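Since the two output formats differ, code that consumes percentile results may want to accept both. The hypothetical helper below converts either form of values into a Map from percent (as a number) to the computed percentile, which also sidesteps the string keys such as "50.0" in the keyed form. The sample values are made up.

```javascript
// Hypothetical helper: normalizes keyed and non-keyed percentile output.
function percentilesToMap(values) {
  const result = new Map();
  if (Array.isArray(values)) {
    // keyed: false -> [{ key, value }, ...]
    for (const { key, value } of values) result.set(Number(key), value);
  } else {
    // keyed: true -> { "50.0": value, ... } with string keys
    for (const [key, value] of Object.entries(values)) result.set(Number(key), value);
  }
  return result;
}

// Made-up values for illustration only.
const keyed = percentilesToMap({ "50.0": 5.55, "95.0": 7.1 });
const nonKeyed = percentilesToMap([{ key: 50, value: 5.55 }, { key: 95, value: 7.1 }]);
console.log(keyed.get(50), nonKeyed.get(50)); // 5.55 5.55
```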
timeSeriesPoint
args:
- sigSets: an object in which the keys are the sigSetCids and the values have the following properties:
  - tsSigCid: sigCid of the timestamp signal
  - signals: array of sigCids whose values will be returned
  - mustExist: array of sigCids which must exist (have a value) in the returned records (optional)
  - horizon: a moment.duration which sets the length of the time interval from which the data will be returned (optional)
- ts: timestamp used for filtering
- timeSeriesPointType: value from the TimeSeriesPointType enum; specifies whether the data should be before or after the timestamp
returns: an object with sigSetCids as keys, whose values have the fields:
- ts: timestamp of the data point (moment)
- data: an object with the sigCids (from signals) as keys and the corresponding values
This query returns the one data point which is closest to the timestamp (ts). Depending on timeSeriesPointType, it is the last value before the timestamp (TimeSeriesPointType.LTE or TimeSeriesPointType.LT) or the first value after the timestamp (TimeSeriesPointType.GTE or TimeSeriesPointType.GT).
If the horizon parameter is specified, only that duration before or after the timestamp is considered. If there are no data points in the interval, an empty object is returned.
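The selection rule can be sketched locally on an array of records sorted by timestamp. In the hypothetical helper pickPoint below, timestamps are plain millisecond numbers, the point type is one of the strings "LT", "LTE", "GT", "GTE" (standing in for the TimeSeriesPointType values), the horizon is a length in milliseconds with its boundary treated as inclusive (an assumption), and null stands in for the empty result.

```javascript
// Hypothetical helper: picks the point closest to ts on the requested side.
// records: array of { ts, data } sorted by ts ascending (ts in milliseconds).
function pickPoint(records, ts, pointType, horizonMs = Infinity) {
  const candidates = records.filter((record) => {
    switch (pointType) {
      case "LT": return record.ts < ts && ts - record.ts <= horizonMs;
      case "LTE": return record.ts <= ts && ts - record.ts <= horizonMs;
      case "GT": return record.ts > ts && record.ts - ts <= horizonMs;
      case "GTE": return record.ts >= ts && record.ts - ts <= horizonMs;
      default: throw new Error(`Unknown point type: ${pointType}`);
    }
  });
  if (candidates.length === 0) return null;
  // Before the timestamp: the last candidate; after it: the first one.
  return pointType.startsWith("L") ? candidates[candidates.length - 1] : candidates[0];
}

const records = [
  { ts: 1000, data: { cpu_freq: 1.2 } },
  { ts: 5000, data: { cpu_freq: 1.5 } },
  { ts: 9000, data: { cpu_freq: 1.8 } },
];
console.log(pickPoint(records, 5000, "LTE").data.cpu_freq); // 1.5
console.log(pickPoint(records, 5000, "LT").data.cpu_freq);  // 1.2
console.log(pickPoint(records, 5000, "GT").data.cpu_freq);  // 1.8
console.log(pickPoint(records, 20000, "LTE", 10000));       // null (nothing within 10 s)
```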
Query (the latest data point from the last 10 seconds):
import moment from "moment";
// TimeSeriesPointType is provided by IVIS; its import is not shown here.

const sigSets = {
  "components_frequency": {
    tsSigCid: "ts",
    signals: ["cpu_freq", "gpu_freq", "online_cpus"],
    horizon: moment.duration(10, "s")
  }
};
const ts = moment(); // now
const timeSeriesPointType = TimeSeriesPointType.LTE;
const query = {
  type: "timeSeriesPoint",
  args: [sigSets, ts, timeSeriesPointType]
};
Response:
{
  "components_frequency": {
    "data": {
      "cpu_freq": 1.77,
      "gpu_freq": 0.98,
      "online_cpus": 4
    },
    "ts": "2020-12-12T13:23:49.000Z" // this is a moment object
  }
}
timeSeries
TODO
timeSeriesSummary
TODO