[7.11] [DOCS] Adds custom feature processors description to PUT DFA API (#67424) #67679

Merged
merged 2 commits into from
Jan 19, 2021
105 changes: 105 additions & 0 deletions docs/reference/ml/df-analytics/apis/put-dfanalytics.asciidoc
@@ -118,6 +118,111 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]
`feature_processors`::::
(Optional, list)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors]
+
.Properties of `feature_processors`
[%collapsible%open]
======
`frequency_encoding`::::
(object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-frequency]
+
.Properties of `frequency_encoding`
[%collapsible%open]
=======
`feature_name`::::
(Required, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-feat-name]

`field`::::
(Required, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-field]

`frequency_map`::::
(Required, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-frequency-map]
=======

`multi_encoding`::::
(object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-multi]
+
.Properties of `multi_encoding`
[%collapsible%open]
=======
`processors`::::
(Required, array)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-multi-proc]
=======

`ngram_encoding`::::
(object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram]
+
.Properties of `ngram_encoding`
[%collapsible%open]
=======
`feature_prefix`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram-feat-pref]

`field`::::
(Required, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram-field]

`length`::::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram-length]

`n_grams`::::
(Required, array)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram-ngrams]

`start`::::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram-start]
=======

`one_hot_encoding`::::
(object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-one-hot]
+
.Properties of `one_hot_encoding`
[%collapsible%open]
=======
`field`::::
(Required, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-field]

`hot_map`::::
(Required, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-one-hot-map]
=======

`target_mean_encoding`::::
(object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-target-mean]
+
.Properties of `target_mean_encoding`
[%collapsible%open]
=======
`default_value`::::
(Required, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-target-mean-default]

`feature_name`::::
(Required, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-feat-name]

`field`::::
(Required, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-field]

`target_map`::::
(Required, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-target-mean-map]
=======

======

`gamma`::::
(Optional, double)
81 changes: 80 additions & 1 deletion docs/reference/ml/ml-shared.asciidoc
@@ -554,9 +554,88 @@ A collection of feature preprocessors that modify one or more included fields.
The analysis uses the resulting one or more features instead of the
original document field. Multiple `feature_processors` entries can refer to the
same document fields.
Note, automatic categorical {ml-docs}/ml-feature-encoding.html[feature encoding]
still occurs.
end::dfas-feature-processors[]

tag::dfas-feature-processors-feat-name[]
The resulting feature name.
end::dfas-feature-processors-feat-name[]

tag::dfas-feature-processors-field[]
The name of the field to encode.
end::dfas-feature-processors-field[]

tag::dfas-feature-processors-frequency[]
The configuration information necessary to perform frequency encoding.
end::dfas-feature-processors-frequency[]

tag::dfas-feature-processors-frequency-map[]
The resulting frequency map for the field value. If the field value is missing
from the `frequency_map`, the resulting value is `0`.
end::dfas-feature-processors-frequency-map[]
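
As an illustration of the `frequency_map` behavior described above, here is a minimal Python sketch. It is not the {es} implementation. The helper `frequency_encode` and the sample field names and values are assumptions made for this example; only the option names `field`, `feature_name`, and `frequency_map` come from the API.

```python
def frequency_encode(doc, config):
    """Encode one document with a frequency_encoding style config (sketch)."""
    value = doc.get(config["field"])
    # Values that are missing from frequency_map encode to 0.
    encoded = config["frequency_map"].get(value, 0)
    return {config["feature_name"]: encoded}


# Hypothetical configuration; the field name and frequencies are made up.
config = {
    "field": "animal",
    "feature_name": "animal_freq",
    "frequency_map": {"cat": 0.5, "dog": 0.3},
}

print(frequency_encode({"animal": "cat"}, config))  # {'animal_freq': 0.5}
print(frequency_encode({"animal": "emu"}, config))  # {'animal_freq': 0}
```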

tag::dfas-feature-processors-multi[]
The configuration information necessary to perform multi encoding. It allows
multiple processors to be chained together. This way the output of a processor
can then be passed to another as an input.
end::dfas-feature-processors-multi[]

tag::dfas-feature-processors-multi-proc[]
The ordered array of custom processors to execute. The array must contain more
than one processor.
end::dfas-feature-processors-multi-proc[]
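
The chaining idea behind `multi_encoding` can be sketched in Python as follows. This is a simplified illustration, not the {es} implementation. The function `run_multi_encoding` and the two stand-in processors are hypothetical; real entries in `processors` are processor configurations such as `ngram_encoding` or `one_hot_encoding`.

```python
def run_multi_encoding(doc, processors):
    """Apply processors in order; each one sees the features produced before it."""
    if len(processors) < 2:
        raise ValueError("multi_encoding needs more than one processor")
    features = dict(doc)
    for processor in processors:
        # The output of one processor becomes available as input to the next.
        features.update(processor(features))
    return features


def first_unigram(features):
    # Stand-in for an ngram step: emits the first character as feature "f.10".
    return {"f.10": features["text"][:1]}


def one_hot_f10(features):
    # Stand-in for a one hot step that reads the feature emitted above.
    return {"f10_is_a": 1 if features["f.10"] == "a" else 0}


result = run_multi_encoding({"text": "apple"}, [first_unigram, one_hot_f10])
print(result)  # {'text': 'apple', 'f.10': 'a', 'f10_is_a': 1}
```

The order of the array matters in this sketch: swapping the two stand-in processors would fail because `f.10` would not exist yet when the second step reads it.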

tag::dfas-feature-processors-ngram[]
The configuration information necessary to perform ngram encoding. Features
written out by this encoder have the following name format:
`<feature_prefix>.<ngram><string position>`. For example, if the
`feature_prefix` is `f`, the feature name for the second unigram in a string is
`f.11` (ngram length `1`, zero-indexed string position `1`).
end::dfas-feature-processors-ngram[]

tag::dfas-feature-processors-ngram-feat-pref[]
The feature name prefix. Defaults to `ngram_<start>_<length>`.
end::dfas-feature-processors-ngram-feat-pref[]

tag::dfas-feature-processors-ngram-field[]
The name of the text field to encode.
end::dfas-feature-processors-ngram-field[]

tag::dfas-feature-processors-ngram-length[]
Specifies the length of the ngram substring. Defaults to `50`. Must be greater
than `0`.
end::dfas-feature-processors-ngram-length[]

tag::dfas-feature-processors-ngram-ngrams[]
Specifies which ngrams to gather. It is an array of integer values, where each
value must be at least `1` and at most `5`.
end::dfas-feature-processors-ngram-ngrams[]

tag::dfas-feature-processors-ngram-start[]
Specifies the zero-indexed start of the ngram substring. Negative values are
allowed for encoding ngrams of string suffixes. Defaults to `0`.
end::dfas-feature-processors-ngram-start[]
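
To make the interaction of `start`, `length`, `n_grams`, and `feature_prefix` more concrete, here is a Python sketch of one plausible reading of the options above. It is an assumption-based illustration, not the {es} implementation: the function `ngram_encode` is hypothetical, and details such as how a negative `start` is clamped are guesses.

```python
def ngram_encode(text, n_grams, start=0, length=50, feature_prefix=None):
    """Sketch of ngram encoding over the substring selected by start and length."""
    if length <= 0:
        raise ValueError("length must be greater than 0")
    if any(n < 1 or n > 5 for n in n_grams):
        raise ValueError("n_grams values must be between 1 and 5")
    if feature_prefix is None:
        feature_prefix = f"ngram_{start}_{length}"
    # Assumption: a negative start counts back from the end of the string.
    begin = max(len(text) + start, 0) if start < 0 else start
    window = text[begin:begin + length]
    features = {}
    for n in n_grams:
        for pos in range(len(window) - n + 1):
            # Name format: <feature_prefix>.<ngram><string position>
            features[f"{feature_prefix}.{n}{pos}"] = window[pos:pos + n]
    return features


print(ngram_encode("hello", [1, 2], length=3, feature_prefix="f"))
# {'f.10': 'h', 'f.11': 'e', 'f.12': 'l', 'f.20': 'he', 'f.21': 'el'}

print(ngram_encode("hello", [2], start=-3, length=3, feature_prefix="s"))
# {'s.20': 'll', 's.21': 'lo'}
```

In the first call, `f.11` is the second unigram of the string, which matches the naming example given for `ngram_encoding`.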

tag::dfas-feature-processors-one-hot[]
The configuration information necessary to perform one hot encoding.
end::dfas-feature-processors-one-hot[]

tag::dfas-feature-processors-one-hot-map[]
The one hot map, which maps each field value to the name of its resulting
column.
end::dfas-feature-processors-one-hot-map[]
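
A short Python sketch of how a `hot_map` could be applied is shown below. It is illustrative only and not the {es} implementation. The helper `one_hot_encode`, the sample values, and the choice to emit `0` in every column for an unmapped value are assumptions.

```python
def one_hot_encode(doc, config):
    """Emit one 0/1 column per hot_map entry for a single document (sketch)."""
    value = doc.get(config["field"])
    return {
        column: 1 if value == key else 0
        for key, column in config["hot_map"].items()
    }


# Hypothetical configuration; field values map to column names.
config = {
    "field": "animal",
    "hot_map": {"cat": "animal_cat", "dog": "animal_dog"},
}

print(one_hot_encode({"animal": "dog"}, config))  # {'animal_cat': 0, 'animal_dog': 1}
```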

tag::dfas-feature-processors-target-mean[]
The configuration information necessary to perform target mean encoding.
end::dfas-feature-processors-target-mean[]

tag::dfas-feature-processors-target-mean-default[]
The default value to use if the field value is not found in the `target_map`.
end::dfas-feature-processors-target-mean-default[]

tag::dfas-feature-processors-target-mean-map[]
The map from each field value to its target mean value.
end::dfas-feature-processors-target-mean-map[]
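
The relationship between `target_map` and `default_value` can be sketched in Python as follows. This is a minimal illustration under assumed sample data, not the {es} implementation; the helper `target_mean_encode` is hypothetical.

```python
def target_mean_encode(doc, config):
    """Replace a field value with its target mean, or the default (sketch)."""
    value = doc.get(config["field"])
    # Fall back to default_value when the field value is not in target_map.
    encoded = config["target_map"].get(value, config["default_value"])
    return {config["feature_name"]: encoded}


# Hypothetical configuration; the means and default are made up.
config = {
    "field": "animal",
    "feature_name": "animal_target_mean",
    "target_map": {"cat": 2.5, "dog": 4.0},
    "default_value": 3,
}

print(target_mean_encode({"animal": "cat"}, config))  # {'animal_target_mean': 2.5}
print(target_mean_encode({"animal": "emu"}, config))  # {'animal_target_mean': 3}
```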

tag::dfas-iteration[]
The number of iterations on the analysis.
end::dfas-iteration[]