[DOCS] Makes the naming convention of the DFA response objects coherent (#53172)

szabosteve authored Mar 5, 2020
1 parent d7fb641 commit 870e189
Showing 1 changed file with 107 additions and 99 deletions.
docs/reference/ml/ml-shared.asciidoc
@@ -106,12 +106,12 @@
don't support missing values; therefore, fields that have data types other than
numeric or boolean are ignored. Documents where included fields contain missing
values, null values, or an array are also ignored. Therefore, the `dest` index
may contain documents that don't have an {olscore}.
* {regression-cap} supports fields that are numeric, `boolean`, `text`,
`keyword`, and `ip`. It is also tolerant of missing values. Fields that are
supported are included in the analysis; other fields are ignored. Documents
where included fields contain an array with two or more values are also
ignored. Documents in the `dest` index that don't contain a results field are
not included in the {reganalysis}.
* {classification-cap} supports fields that are numeric, `boolean`, `text`,
`keyword`, and `ip`. It is also tolerant of missing values. Fields that are
supported are included in the analysis; other fields are ignored. Documents
@@ -144,7 +144,8 @@
of a node to run the job.
end::assignment-explanation-anomaly-jobs[]

tag::assignment-explanation-datafeeds[]
For started {dfeeds} only, contains messages relating to the selection of a
node.
end::assignment-explanation-datafeeds[]

tag::assignment-explanation-dfanalytics[]
@@ -323,10 +324,10 @@
If `true`, the feature influence calculation is enabled. Defaults to `true`.
end::compute-feature-influence[]

tag::chunking-config[]
{dfeeds-cap} might be required to search over long time periods, for several
months or years. This search is split into time chunks to manage the load on
{es}. Chunking configuration controls how the size of these time chunks is
calculated and is an advanced configuration option.
A chunking configuration object has the following properties:

`chunking_config`.`mode`:::
@@ -381,7 +382,8 @@
end::custom-rules-scope-filter-type[]
tag::custom-rules-conditions[]
An optional array of numeric conditions that determine when the rule applies. A
rule must
either have a non-empty scope or at least one condition. Multiple conditions are
combined together with a logical `AND`. A condition has the following
properties:
end::custom-rules-conditions[]

tag::custom-rules-conditions-applies-to[]
@@ -393,7 +395,8 @@
end::custom-rules-conditions-applies-to[]

tag::custom-rules-conditions-operator[]
Specifies the condition operator. The available options are `gt` (greater than),
`gte` (greater than or equals), `lt` (less than), and `lte` (less than or
equals).
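
For example, a sketch of a complete condition that uses the `lt` operator; the
`applies_to` and `value` properties are described in the surrounding sections:

[source,js]
----
{
  "applies_to": "actual", // the rule inspects the actual value
  "operator": "lt",
  "value": 5              // the rule applies when the actual value is below 5
}
----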
end::custom-rules-conditions-operator[]

tag::custom-rules-conditions-value[]
@@ -432,97 +435,91 @@
tag::data-frame-analytics[]
An array of {dfanalytics-job} resources, which are sorted by the `id` value in
ascending order.

`analysis`:::
(object) The type of analysis that is performed on the `source`.

`analyzed_fields`:::
(object) Contains `includes` and/or `excludes` patterns that select which fields
are included in the analysis.

`analyzed_fields`.`excludes`:::
(Optional, array) An array of strings that defines the fields that are excluded
from the analysis.

`analyzed_fields`.`includes`:::
(Optional, array) An array of strings that defines the fields that are included
in the analysis.

`dest`:::
(object) The destination configuration of the analysis.

`dest`.`index`:::
(string) The _destination index_ that stores the results of the
{dfanalytics-job}.

`dest`.`results_field`:::
(string) The name of the field that stores the results of the analysis. Defaults
to `ml`.

`id`:::
(string) The unique identifier of the {dfanalytics-job}.

`model_memory_limit`:::
(string) The `model_memory_limit` that has been set for the {dfanalytics-job}.

`source`:::
(object) The configuration of how the analysis data is sourced. It has an
`index` parameter and optionally a `query` and a `_source`.

`source`.`index`:::
(array) Index or indices on which to perform the analysis. It can be a single
index or index pattern as well as an array of indices or patterns.

`source`.`query`:::
(object) The query that has been specified for the {dfanalytics-job}. It is
written in the {es} query domain-specific language (<<query-dsl,DSL>>). This
value corresponds to the query object in an {es} search POST body. By default,
this property has the following value: `{"match_all": {}}`.

`source`.`_source`:::
(object) Contains the specified `includes` and/or `excludes` patterns that
select which fields are present in the destination. Fields that are excluded
cannot be included in the analysis.

`source`.`_source`.`excludes`:::
(array) An array of strings that defines the fields that are excluded from the
destination.

`source`.`_source`.`includes`:::
(array) An array of strings that defines the fields that are included in the
destination.

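As an illustrative sketch (not output from a live cluster), a
{dfanalytics-job} resource that uses the naming above might look like the
following; the job ID, index names, and memory limit are hypothetical:

[source,js]
----
{
  "id": "my-analytics-job",           // hypothetical job ID
  "source": {
    "index": ["my-source-index"],     // hypothetical source index
    "query": { "match_all": {} }      // the documented default query
  },
  "dest": {
    "index": "my-dest-index",         // hypothetical destination index
    "results_field": "ml"             // the documented default
  },
  "analysis": { "outlier_detection": {} },
  "model_memory_limit": "1gb"
}
----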
end::data-frame-analytics[]

tag::data-frame-analytics-stats[]
An array of statistics objects for {dfanalytics-jobs}, which are
sorted by the `id` value in ascending order.

`assignment_explanation`:::
(string)
For running jobs only, contains messages relating to the selection of a node to
run the job.

`id`:::
(string)
The unique identifier of the {dfanalytics-job}.

`memory_usage`:::
(Optional, object)
An object describing memory usage of the analytics. It is present only after the
job is started and memory usage is reported.

`memory_usage`.`peak_usage_bytes`:::
(long)
The number of bytes used at the highest peak of memory usage.

`memory_usage`.`timestamp`:::
(date)
The timestamp when memory usage was calculated.

`node`:::
(object)
@@ -550,10 +547,19 @@
The node name.
(string)
The host and port where transport HTTP connections are accepted.

`progress`:::
(array) The progress report of the {dfanalytics-job} by phase.

`progress`.`phase`:::
(string) Defines the phase of the {dfanalytics-job}. Possible phases:
`reindexing`, `loading_data`, `analyzing`, and `writing_results`.

`progress`.`progress_percent`:::
(integer) The progress that the {dfanalytics-job} has made, expressed as a
percentage.

`state`:::
(string) Current state of the {dfanalytics-job}.
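
Putting these properties together, a hypothetical statistics object might look
like the following sketch; all IDs and values are illustrative:

[source,js]
----
{
  "id": "my-analytics-job",        // hypothetical job ID
  "state": "started",
  "progress": [
    { "phase": "reindexing", "progress_percent": 100 },
    { "phase": "loading_data", "progress_percent": 100 },
    { "phase": "analyzing", "progress_percent": 43 },
    { "phase": "writing_results", "progress_percent": 0 }
  ],
  "memory_usage": {
    "timestamp": 1583366400000,    // illustrative epoch milliseconds
    "peak_usage_bytes": 50522222
  }
}
----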
end::data-frame-analytics-stats[]

tag::datafeed-id[]
@@ -576,8 +582,8 @@
prior training.)
end::dead-category-count[]

tag::decompress-definition[]
Specifies whether the included model definition should be returned as a JSON map
(`true`) or in a custom compressed format (`false`). Defaults to `true`.
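
For instance, the option might be supplied as a query parameter on the get
trained models API (a sketch; `my-model` is a hypothetical model ID):

[source,console]
----
GET _ml/inference/my-model?decompress_definition=false
----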
end::decompress-definition[]

tag::delayed-data-check-config[]
@@ -586,10 +592,10 @@
window. For example: `{"enabled": true, "check_window": "1h"}`.
+
--
The {dfeed} can optionally search over indices that have already been read in
an effort to determine whether any data has subsequently been added to the
index. If missing data is found, it is a good indication that the `query_delay`
option is set too low and the data is being indexed after the {dfeed} has passed
that moment in time. See
{ml-docs}/ml-delayed-data-detection.html[Working with delayed data].

This check runs only on real-time {dfeeds}.
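
For example, a sketch of a {dfeed} configuration fragment that enables the
check over a two-hour window; the {dfeed}, job, and index names are
hypothetical:

[source,js]
----
{
  "datafeed_id": "datafeed-my-job",  // hypothetical datafeed ID
  "job_id": "my-job",                // hypothetical job ID
  "indices": ["my-index"],           // hypothetical source index
  "delayed_data_check_config": {
    "enabled": true,
    "check_window": "2h"
  }
}
----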
@@ -812,7 +818,8 @@
A comma-separated list of influencer field names. Typically these can be the by,
over, or partition fields that are used in the detector configuration. You might
also want to use a field name that is not specifically named in a detector, but
is available as part of the input data. When you use multiple detectors, the use
of influencers is recommended as it aggregates results for each influencer
entity.
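
For example, a sketch of an `analysis_config` fragment that declares two
influencers; the field names are hypothetical:

[source,js]
----
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      { "function": "count", "partition_field_name": "region" }
    ],
    "influencers": ["region", "user"]  // results are aggregated per entity
  }
}
----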
end::influencers[]

tag::input-bytes[]
@@ -933,9 +940,10 @@
tag::max-empty-searches[]
If a real-time {dfeed} has never seen any data (including during any initial
training period) then it will automatically stop itself and close its associated
job after this many real-time searches that return no documents. In other words,
it will stop after `frequency` times `max_empty_searches` of real-time
operation. For example, if `frequency` is `10m` and `max_empty_searches` is
`6`, the {dfeed} stops after one hour of real-time operation during which no
documents are found. If not set, a {dfeed} with no end time that sees no data
remains started until it is explicitly stopped. By default, this setting is
not set.
end::max-empty-searches[]

tag::maximum-number-trees[]
@@ -1092,10 +1100,10 @@
example, `1575402236000`.
end::model-snapshot-id[]

tag::model-snapshot-retention-days[]
Advanced configuration option. The period of time (in days) that model snapshots
are retained. Age is calculated relative to the timestamp of the newest model
snapshot. The default value is `1`, which means snapshots that are one day
(twenty-four hours) older than the newest snapshot are deleted.
end::model-snapshot-retention-days[]

tag::model-timestamp[]
@@ -1250,10 +1258,10 @@
is `shared`, which generates an index named `.ml-anomalies-shared`.
end::results-index-name[]

tag::results-retention-days[]
Advanced configuration option. The period of time (in days) that results are
retained. Age is calculated relative to the timestamp of the latest bucket
result. If this property has a non-null value, once per day at 00:30 (server
time), results that are the specified number of days older than the latest
bucket result are deleted from {es}. The default value is null, which means all
results are retained.
end::results-retention-days[]
@@ -1353,11 +1361,11 @@
job must be opened before it can accept further data.
* `closing`: The job close action is in progress and has not yet completed. A
closing job cannot accept further data.
* `failed`: The job did not finish successfully due to an error. This situation
can occur due to invalid input data, a fatal error occurring during the
analysis, or an external interaction such as the process being killed by the
Linux out-of-memory (OOM) killer. If the job has irrevocably failed, it must be
force closed and then deleted. If the {dfeed} can be corrected, the job can be
closed and then re-opened.
* `opened`: The job is available to receive and process data.
* `opening`: The job open action is in progress and has not yet completed.
--
