From 5806c69e523da549bb05c5df823ad6a5f26a3edd Mon Sep 17 00:00:00 2001 From: Lisa Cawley Date: Mon, 20 Jul 2020 13:04:36 -0700 Subject: [PATCH] [DOCS] Changes level offset for data frame analytics pages (#1288) (#1290) --- .../df-analytics/dfa-classification.asciidoc | 18 +++++----- .../dfa-outlier-detection.asciidoc | 6 ++-- .../ml/df-analytics/dfa-regression.asciidoc | 16 ++++----- .../dfanalytics-examples.asciidoc | 4 +-- .../df-analytics/ecommerce-outliers.asciidoc | 2 +- .../flightdata-classification.asciidoc | 10 +++--- .../flightdata-regression.asciidoc | 10 +++--- .../ml/df-analytics/hyperparameters.asciidoc | 2 +- docs/en/stack/ml/df-analytics/index.asciidoc | 34 +++++++++---------- .../ml/df-analytics/ml-dfa-concepts.asciidoc | 2 +- .../df-analytics/ml-dfa-limitations.asciidoc | 30 ++++++++-------- .../ml/df-analytics/ml-dfa-overview.asciidoc | 2 +- .../ml/df-analytics/ml-dfa-phases.asciidoc | 10 +++--- .../df-analytics/ml-dfanalytics-apis.asciidoc | 2 +- .../ml-dfanalytics-evaluate.asciidoc | 20 +++++------ .../ml-feature-importance.asciidoc | 4 +-- .../ml/df-analytics/ml-inference.asciidoc | 8 ++--- .../ml/df-analytics/ml-lang-ident.asciidoc | 8 ++--- 18 files changed, 94 insertions(+), 94 deletions(-) diff --git a/docs/en/stack/ml/df-analytics/dfa-classification.asciidoc b/docs/en/stack/ml/df-analytics/dfa-classification.asciidoc index 77632fdc5..3c561647f 100644 --- a/docs/en/stack/ml/df-analytics/dfa-classification.asciidoc +++ b/docs/en/stack/ml/df-analytics/dfa-classification.asciidoc @@ -1,6 +1,6 @@ [role="xpack"] [[dfa-classification]] -== {classification-cap} += {classification-cap} experimental[] @@ -28,7 +28,7 @@ about field selection, see the {ref}/explain-dfanalytics.html[explain data frame analytics API]. [[dfa-classification-supervised]] -=== Training the {classification} model +== Training the {classification} model {classification-cap} – just like {regression} – is a supervised {ml} process. 
When you create the {dfanalytics-job}, you must provide a data set that contains @@ -64,7 +64,7 @@ have a similar number of data points for each class. //// [[dfa-classification-algorithm]] -==== {classification-cap} algorithms +=== {classification-cap} algorithms //tag::classification-algorithms[] {classanalysis-cap} uses an ensemble algorithm that is a type of boosting called @@ -77,7 +77,7 @@ previous tree. //end::classification-algorithms[] [[dfa-classification-performance]] -=== {classification-cap} performance +== {classification-cap} performance As a rule of thumb, a {classanalysis} with many classes takes more time to run than a binary {classification} process when there are only two classes. The @@ -98,13 +98,13 @@ fields that are not relevant from the analysis by specifying `excludes` patterns in the `analyzed_fields` object when configuring the {dfanalytics-job}. [[dfa-classification-interpret]] -=== Interpreting {classification} results +== Interpreting {classification} results The following sections help you understand and interpret the results of a {classanalysis}. [[dfa-classification-class-probability]] -==== `class_probability` +=== `class_probability` The value of `class_probability` shows how likely it is that a given data point belongs to a certain class. It is a value between 0 and 1. The higher the @@ -115,7 +115,7 @@ in your destination index. See the section in the {classification} example. [[dfa-classification-class-score]] -==== `class_score` +=== `class_score` The value of `class_score` controls the probability at which a class label is assigned to a data point. In normal case – that you maximize the number of @@ -142,14 +142,14 @@ actual `class 0` predicted `class 1` errors, or in other words, a slight degradation of the overall accuracy. 
[[dfa-classification-feature-importance]] -==== {feat-imp-cap} +=== {feat-imp-cap} {feat-imp-cap} provides further information about the results of an analysis and helps to interpret the results in a more subtle way. If you want to learn more about {feat-imp}, <>. [[dfa-classification-evaluation]] -=== Measuring model performance +== Measuring model performance You can measure how well the model has performed on your data set by using the `classification` evaluation type of the diff --git a/docs/en/stack/ml/df-analytics/dfa-outlier-detection.asciidoc b/docs/en/stack/ml/df-analytics/dfa-outlier-detection.asciidoc index 81ce67fe6..b07a74f8a 100644 --- a/docs/en/stack/ml/df-analytics/dfa-outlier-detection.asciidoc +++ b/docs/en/stack/ml/df-analytics/dfa-outlier-detection.asciidoc @@ -1,6 +1,6 @@ [role="xpack"] [[dfa-outlier-detection]] -== {oldetection-cap} += {oldetection-cap} experimental[] @@ -17,7 +17,7 @@ You can create {oldetection} {dfanalytics-jobs} in {kib} or by using the {ref}/put-dfanalytics.html[create {dfanalytics-jobs} API]. [[dfa-outlier-algorithms]] -=== {oldetection-cap} algorithms +== {oldetection-cap} algorithms //tag::outlier-detection-algorithms[] In the {stack}, we use an ensemble of four different distance and density based @@ -82,7 +82,7 @@ once. If new data comes into the index, you need to do the analysis again on the altered data. [[dfa-feature-influence]] -=== Feature influence +== Feature influence Besides the {olscore}, another value is calculated during {oldetection}: the feature influence score. 
As we mentioned, there are multiple features of a diff --git a/docs/en/stack/ml/df-analytics/dfa-regression.asciidoc b/docs/en/stack/ml/df-analytics/dfa-regression.asciidoc index 025cd4bfe..bca17863e 100644 --- a/docs/en/stack/ml/df-analytics/dfa-regression.asciidoc +++ b/docs/en/stack/ml/df-analytics/dfa-regression.asciidoc @@ -1,6 +1,6 @@ [role="xpack"] [[dfa-regression]] -== {regression-cap} += {regression-cap} experimental[] @@ -32,7 +32,7 @@ on. All of these factors can be considered _features_; they are measurable properties or characteristics of the phenomenon we're studying. [[dfa-regression-features]] -=== {feature-vars-cap} +== {feature-vars-cap} When you perform {reganalysis}, you must identify a subset of fields that you want to use to create a model for predicting other fields. We refer to these @@ -52,7 +52,7 @@ algorithm: Arrays are not supported. [[dfa-regression-supervised]] -=== Training the {regression} model +== Training the {regression} model {regression-cap} is a supervised {ml} method, which means that you need to supply a labeled training data set that has some {feature-vars} and a {depvar}. @@ -73,7 +73,7 @@ predictions are combined. you must restart the {dfanalytics-job}. [[dfa-regression-algorithm]] -==== {regression-cap} algorithms +=== {regression-cap} algorithms //tag::regression-algorithms[] The ensemble learning technique that we use in the {stack} is a type of boosting @@ -82,7 +82,7 @@ gradient boosting methodologies. //end::regression-algorithms[] [[dfa-regression-lossfunction]] -==== Loss functions for {regression} analyses +=== Loss functions for {regression} analyses A loss function measures how well a given {ml} model fits the specific data set. It boils down all the different under- and overestimations of the model to a @@ -124,14 +124,14 @@ the impact of the different loss function parameters. 
[[dfa-regression-feature-importance]] -=== {feat-imp-cap} +== {feat-imp-cap} {feat-imp-cap} provides further information about the results of an analysis and helps to interpret the results in a more subtle way. If you want to learn more about {feat-imp}, <>. [[dfa-regression-evaluation]] -=== Measuring model performance +== Measuring model performance You can measure how well the model has performed on your training data set by using the `regression` evaluation type of the @@ -160,7 +160,7 @@ data set. For more information about the evaluation metrics, see <>. [[dfa-regression-readings]] -=== Further readings +== Further readings * https://github.com/elastic/examples/tree/master/Machine%20Learning/Feature%20Importance[Feature importance for {dfanalytics} (Jupyter notebook)] diff --git a/docs/en/stack/ml/df-analytics/dfanalytics-examples.asciidoc b/docs/en/stack/ml/df-analytics/dfanalytics-examples.asciidoc index 65afb7c18..5e02d91d6 100644 --- a/docs/en/stack/ml/df-analytics/dfanalytics-examples.asciidoc +++ b/docs/en/stack/ml/df-analytics/dfanalytics-examples.asciidoc @@ -1,7 +1,7 @@ [role="xpack"] [testenv="platinum"] [[dfanalytics-examples]] -== {dfanalytics-cap} examples += {dfanalytics-cap} examples ++++ Examples ++++ @@ -21,7 +21,7 @@ from your data. [discrete] [[dfanalytics-examples-blog-posts]] -=== {dfanalytics-cap} examples in blog posts +== {dfanalytics-cap} examples in blog posts The blog posts listed below show how to get the most out of Elastic {ml} {dfanalytics}. 
diff --git a/docs/en/stack/ml/df-analytics/ecommerce-outliers.asciidoc b/docs/en/stack/ml/df-analytics/ecommerce-outliers.asciidoc index 95b18d12a..ef9922b38 100644 --- a/docs/en/stack/ml/df-analytics/ecommerce-outliers.asciidoc +++ b/docs/en/stack/ml/df-analytics/ecommerce-outliers.asciidoc @@ -1,7 +1,7 @@ [role="xpack"] [testenv="platinum"] [[ecommerce-outliers]] -== Finding outliers in the eCommerce sample data += Finding outliers in the eCommerce sample data beta[] diff --git a/docs/en/stack/ml/df-analytics/flightdata-classification.asciidoc b/docs/en/stack/ml/df-analytics/flightdata-classification.asciidoc index 1aab7ad74..4d2306d09 100644 --- a/docs/en/stack/ml/df-analytics/flightdata-classification.asciidoc +++ b/docs/en/stack/ml/df-analytics/flightdata-classification.asciidoc @@ -1,7 +1,7 @@ [role="xpack"] [testenv="platinum"] [[flightdata-classification]] -== Predicting delayed flights with {classanalysis} += Predicting delayed flights with {classanalysis} Let's try to predict whether a flight will be delayed or not by using the {kibana-ref}/add-sample-data.html[sample flight data]. The data set contains @@ -16,7 +16,7 @@ TIP: If you want to view this example in a Jupyter notebook, https://github.com/elastic/examples/tree/master/Machine%20Learning/Analytics%20Jupyter%20Notebooks[click here]. [[flightdata-classification-data]] -=== Preparing your data +== Preparing your data Each document in the sample flight data set contains details for a single flight, so this data is ready for analysis; it is already in a two-dimensional @@ -88,7 +88,7 @@ a good reminder that the quality of your input data affects the quality of your results. 
[[flightdata-classification-model]] -=== Creating a {classification} model +== Creating a {classification} model To predict whether a specific flight is delayed: @@ -280,7 +280,7 @@ The API call returns the following response: -- [[flightdata-classification-results]] -=== Viewing {classification} results +== Viewing {classification} results Now you have a new index that contains a copy of your source data with predictions for your dependent variable. @@ -372,7 +372,7 @@ any class. ==== [[flightdata-classification-evaluate]] -=== Evaluating {classification} results +== Evaluating {classification} results Though you can look at individual results and compare the predicted value (`ml.FlightDelay_prediction`) to the actual value (`FlightDelay`), you diff --git a/docs/en/stack/ml/df-analytics/flightdata-regression.asciidoc b/docs/en/stack/ml/df-analytics/flightdata-regression.asciidoc index 35c00359f..aa7d508b7 100644 --- a/docs/en/stack/ml/df-analytics/flightdata-regression.asciidoc +++ b/docs/en/stack/ml/df-analytics/flightdata-regression.asciidoc @@ -1,7 +1,7 @@ [role="xpack"] [testenv="platinum"] [[flightdata-regression]] -== Predicting flight delays with {reganalysis} += Predicting flight delays with {reganalysis} Let's try to predict flight delays by using the {kibana-ref}/add-sample-data.html[sample flight data]. The data set contains @@ -13,7 +13,7 @@ _dependent variable_, which in this case is the numeric `FlightDelayMins` field. For an overview of these concepts, see <>. [[flightdata-regression-data]] -=== Preparing your data +== Preparing your data Each document in the data set contains details for a single flight, so this data is ready for analysis; it is already in a two-dimensional entity-based data @@ -85,7 +85,7 @@ a good reminder that the quality of your input data affects the quality of your results. 
[[flightdata-regression-model]] -=== Creating a {regression} model +== Creating a {regression} model To predict the number of minutes delayed for each flight: @@ -285,7 +285,7 @@ The API call returns the following response: -- [[flightdata-regression-results]] -=== Viewing {regression} results +== Viewing {regression} results Now you have a new index that contains a copy of your source data with predictions for your dependent variable. @@ -334,7 +334,7 @@ The snippet below shows a part of a document with the annotated results: ==== [[flightdata-regression-evaluate]] -=== Evaluating {regression} results +== Evaluating {regression} results Though you can look at individual results and compare the predicted value (`ml.FlightDelayMin_prediction`) to the actual value (`FlightDelayMins`), you diff --git a/docs/en/stack/ml/df-analytics/hyperparameters.asciidoc b/docs/en/stack/ml/df-analytics/hyperparameters.asciidoc index fa2163d18..91a19a460 100644 --- a/docs/en/stack/ml/df-analytics/hyperparameters.asciidoc +++ b/docs/en/stack/ml/df-analytics/hyperparameters.asciidoc @@ -1,6 +1,6 @@ [role="xpack"] [[hyperparameters]] -== Hyperparameter optimization += Hyperparameter optimization experimental[] diff --git a/docs/en/stack/ml/df-analytics/index.asciidoc b/docs/en/stack/ml/df-analytics/index.asciidoc index 56b86fa49..121ee420a 100644 --- a/docs/en/stack/ml/df-analytics/index.asciidoc +++ b/docs/en/stack/ml/df-analytics/index.asciidoc @@ -1,23 +1,23 @@ include::ml-dfanalytics.asciidoc[] -include::ml-dfa-overview.asciidoc[] -include::ml-dfa-phases.asciidoc[leveloffset=+1] +include::ml-dfa-overview.asciidoc[leveloffset=+1] +include::ml-dfa-phases.asciidoc[leveloffset=+2] -include::ml-dfa-concepts.asciidoc[] -include::dfa-outlier-detection.asciidoc[leveloffset=+1] -include::dfa-regression.asciidoc[leveloffset=+1] -include::dfa-classification.asciidoc[leveloffset=+1] -include::ml-inference.asciidoc[leveloffset=+1] -include::ml-dfanalytics-evaluate.asciidoc[leveloffset=+1] 
-include::ml-feature-importance.asciidoc[leveloffset=+1] -include::hyperparameters.asciidoc[leveloffset=+1] +include::ml-dfa-concepts.asciidoc[leveloffset=+1] +include::dfa-outlier-detection.asciidoc[leveloffset=+2] +include::dfa-regression.asciidoc[leveloffset=+2] +include::dfa-classification.asciidoc[leveloffset=+2] +include::ml-inference.asciidoc[leveloffset=+2] +include::ml-dfanalytics-evaluate.asciidoc[leveloffset=+2] +include::ml-feature-importance.asciidoc[leveloffset=+2] +include::hyperparameters.asciidoc[leveloffset=+2] -include::ml-dfanalytics-apis.asciidoc[] +include::ml-dfanalytics-apis.asciidoc[leveloffset=+1] -include::dfanalytics-examples.asciidoc[] -include::ecommerce-outliers.asciidoc[leveloffset=+1] -include::flightdata-regression.asciidoc[leveloffset=+1] -include::flightdata-classification.asciidoc[leveloffset=+1] -include::ml-lang-ident.asciidoc[leveloffset=+1] +include::dfanalytics-examples.asciidoc[leveloffset=+1] +include::ecommerce-outliers.asciidoc[leveloffset=+2] +include::flightdata-regression.asciidoc[leveloffset=+2] +include::flightdata-classification.asciidoc[leveloffset=+2] +include::ml-lang-ident.asciidoc[leveloffset=+2] -include::ml-dfa-limitations.asciidoc[] \ No newline at end of file +include::ml-dfa-limitations.asciidoc[leveloffset=+1] \ No newline at end of file diff --git a/docs/en/stack/ml/df-analytics/ml-dfa-concepts.asciidoc b/docs/en/stack/ml/df-analytics/ml-dfa-concepts.asciidoc index 021428c40..227f1f060 100644 --- a/docs/en/stack/ml/df-analytics/ml-dfa-concepts.asciidoc +++ b/docs/en/stack/ml/df-analytics/ml-dfa-concepts.asciidoc @@ -1,6 +1,6 @@ [role="xpack"] [[ml-dfa-concepts]] -== Concepts += Concepts This section explains the fundamental concepts of the Elastic {ml} {dfanalytics} feature and the corresponding {evaluatedf-api}. 
diff --git a/docs/en/stack/ml/df-analytics/ml-dfa-limitations.asciidoc b/docs/en/stack/ml/df-analytics/ml-dfa-limitations.asciidoc index 5972a65bd..db69b8a15 100644 --- a/docs/en/stack/ml/df-analytics/ml-dfa-limitations.asciidoc +++ b/docs/en/stack/ml/df-analytics/ml-dfa-limitations.asciidoc @@ -1,6 +1,6 @@ [role="xpack"] [[ml-dfa-limitations]] -== {dfanalytics-cap} limitations += {dfanalytics-cap} limitations [subs="attributes"] ++++ Limitations @@ -13,13 +13,13 @@ the Elastic {dfanalytics} feature: [float] [[dfa-ccs-limitations]] -=== {ccs-cap} is not supported +== {ccs-cap} is not supported {ccs-cap} is not supported for {dfanalytics}. [float] [[dfa-deletion-limitations]] -=== Deleting a {dfanalytics-job} does not delete the destination index +== Deleting a {dfanalytics-job} does not delete the destination index The {ref}/delete-dfanalytics.html[delete {dfanalytics-job} API] does not delete the destination index that contains the annotated data of the {dfanalytics}. @@ -27,14 +27,14 @@ That index must be deleted separately. [float] [[dfa-update-limitations]] -=== {dfanalytics-jobs-cap} cannot be updated +== {dfanalytics-jobs-cap} cannot be updated You cannot update {dfanalytics} configurations. Instead, delete the {dfanalytics-job} and create a new one. [float] [[dfa-dataframe-size-limitations]] -=== {dfanalytics-cap} memory limitation +== {dfanalytics-cap} memory limitation {dfanalytics-cap} can only perform analyses that fit into the memory available for {ml}. Overspill to disk is not currently possible. For general {ml} @@ -42,7 +42,7 @@ settings, see {ref}/ml-settings.html[{ml-cap} settings in {es}]. [float] [[dfa-time-limitations]] -=== {dfanalytics-jobs-cap} runtime may vary +== {dfanalytics-jobs-cap} runtime may vary The runtime of {dfanalytics-jobs} depends on numerous factors, such as the number of data points in the data set, the type of analytics, the number of @@ -60,7 +60,7 @@ with an increased training percentage. 
[float] [[dfa-missing-fields-limitations]] -=== Documents with missing values in analyzed fields are skipped +== Documents with missing values in analyzed fields are skipped If there are missing values in feature fields (fields that are subjects of the {dfanalytics}), the document that contains these fields is skipped @@ -68,7 +68,7 @@ during the analysis. [float] [[dfa-od-field-type-docs-limitations]] -=== {oldetection-cap} field types +== {oldetection-cap} field types {oldetection-cap} requires numeric or boolean data to analyze. The algorithms don't support missing values (see also <>), @@ -81,7 +81,7 @@ therefore no {olscore} is computed. [float] [[dfa-regression-field-type-docs-limitations]] -=== {regression-cap} field types +== {regression-cap} field types {regression-cap} supports fields that are numeric, boolean, text, keyword and ip. It is also tolerant of missing values. Fields that are supported are @@ -91,7 +91,7 @@ that don't contain a results field are not included in the {reganalysis}. [float] [[dfa-classification-field-type-docs-limitations]] -=== {classification-cap} field types +== {classification-cap} field types {classification-cap} supports fields that have numeric, boolean, text, keyword, or ip data types. It is also tolerant of missing values. Fields that are @@ -102,7 +102,7 @@ destination index that don't contain a results field are not included in the [float] [[dfa-classification-imbalanced-classes]] -=== Imbalanced class sizes affect {classification} performance +== Imbalanced class sizes affect {classification} performance If your training data is very imbalanced, {classanalysis} may not provide good predictions. Try to avoid highly imbalanced situations. We recommend having @@ -113,7 +113,7 @@ minority class, or gathering more data. 
[float] [[dfa-inference-nested-limitation]] -=== Deeply nested objects affect {infer} performance +== Deeply nested objects affect {infer} performance If the data that you run inference against contains documents that have a series of combinations of dot delimited and nested fields (for example: @@ -123,7 +123,7 @@ performance profile. [float] [[dfa-feature-importance-limitation]] -=== Analytics runtime performance may significantly slow down with feature importance computation +== Analytics runtime performance may significantly slow down with feature importance computation For complex models (such as those with many deep trees), the calculation of feature importance takes significantly more time. Feature importance is @@ -138,7 +138,7 @@ values, or only selecting fields that are relevant for analysis. [float] [[dfa-inference-bwc]] -=== {infer-cap} trained models created in 7.8 are not backwards compatible +== {infer-cap} trained models created in 7.8 are not backwards compatible Inference models created in version 7.8.0 are not backwards compatible with older node versions. In a mixed cluster environment, all nodes must be at @@ -146,7 +146,7 @@ least 7.8.0 to use a model created on a 7.8.0 node. 
[discrete] [[dfa-scheduling-priority]] -=== CPU scheduling improvements apply to Linux and MacOS only +== CPU scheduling improvements apply to Linux and MacOS only When there are many {ml} jobs running at the same time and there are insufficient CPU resources, the JVM performance must be prioritized so search and indexing diff --git a/docs/en/stack/ml/df-analytics/ml-dfa-overview.asciidoc b/docs/en/stack/ml/df-analytics/ml-dfa-overview.asciidoc index a58778f29..76343b063 100644 --- a/docs/en/stack/ml/df-analytics/ml-dfa-overview.asciidoc +++ b/docs/en/stack/ml/df-analytics/ml-dfa-overview.asciidoc @@ -1,6 +1,6 @@ [role="xpack"] [[ml-dfa-overview]] -== Overview += Overview {dfanalytics-cap} enable you to perform different analyses of your data and annotate it with the results. By doing this, it provides additional insights diff --git a/docs/en/stack/ml/df-analytics/ml-dfa-phases.asciidoc b/docs/en/stack/ml/df-analytics/ml-dfa-phases.asciidoc index 9b54aea26..42c696270 100644 --- a/docs/en/stack/ml/df-analytics/ml-dfa-phases.asciidoc +++ b/docs/en/stack/ml/df-analytics/ml-dfa-phases.asciidoc @@ -1,6 +1,6 @@ [role="xpack"] [[ml-dfa-phases]] -== How a {dfanalytics-job} works += How a {dfanalytics-job} works A {dfanalytics-job} is essentially a persistent {es} task. During its life cycle, it goes through four main phases: @@ -13,7 +13,7 @@ cycle, it goes through four main phases: Let's take a look at the phases one-by-one. [[ml-dfa-phases-reindex]] -=== Reindexing +== Reindexing During the reindexing phase the documents from the source index or indices are copied to the destination index. If you want to define settings or mappings, @@ -24,14 +24,14 @@ Once the destination index is built, the {dfanalytics-job} task calls the {es} {ref}/docs-reindex.html[Reindex API] to launch the reindexing task. [[ml-dfa-phases-load]] -=== Loading data +== Loading data After the reindexing is finished, the job fetches the needed data from the destination index. 
It converts the data into the format that the analysis process expects, then sends it to the analysis process. [[ml-dfa-phases-analyze]] -=== Analyzing +== Analyzing In this phase, the job generates a {ml} model for analyzing the data. The specific phases of analysis vary depending on the type of {dfanalytics-job}. @@ -49,7 +49,7 @@ in which they identify outliers in the data. . `final_training`: Trains the {ml} model. [[ml-dfa-phases-write]] -=== Writing results +== Writing results After the loaded data is analyzed, the analysis process sends back the results. Only the additional fields that the analysis calculated are written back, the diff --git a/docs/en/stack/ml/df-analytics/ml-dfanalytics-apis.asciidoc b/docs/en/stack/ml/df-analytics/ml-dfanalytics-apis.asciidoc index 5c9df8398..ee73d7983 100644 --- a/docs/en/stack/ml/df-analytics/ml-dfanalytics-apis.asciidoc +++ b/docs/en/stack/ml/df-analytics/ml-dfanalytics-apis.asciidoc @@ -1,6 +1,6 @@ [role="xpack"] [[ml-dfanalytics-apis]] -== API quick reference += API quick reference All {dfanalytics} endpoints have the following base: diff --git a/docs/en/stack/ml/df-analytics/ml-dfanalytics-evaluate.asciidoc b/docs/en/stack/ml/df-analytics/ml-dfanalytics-evaluate.asciidoc index c1a3f5c56..2d9085c49 100644 --- a/docs/en/stack/ml/df-analytics/ml-dfanalytics-evaluate.asciidoc +++ b/docs/en/stack/ml/df-analytics/ml-dfanalytics-evaluate.asciidoc @@ -1,6 +1,6 @@ [role="xpack"] [[ml-dfanalytics-evaluate]] -== Evaluating {dfanalytics} += Evaluating {dfanalytics} experimental[] @@ -25,7 +25,7 @@ not. The {evaluatedf-api} evaluates the performance of the {dfanalytics} against this manually provided ground truth. [[ml-dfanalytics-binary-soft-classification]] -=== {binarysc-cap} evaluation +== {binarysc-cap} evaluation This evaluation type is suitable for analyses which calculate a probability that each data point in a data set is a member of a class or not. 
The {binarysc} @@ -37,7 +37,7 @@ evaluation type offers the following metrics to evaluate the model performance: * receiver operating characteristic (ROC) curve. [[ml-dfanalytics-confusion-matrix]] -==== Confusion matrix +=== Confusion matrix A confusion matrix provides four measures of how well the {dfanalytics} worked on your data set: @@ -65,7 +65,7 @@ To take this complexity into account, the {evaluatedf-api} returns the confusion matrix at different thresholds (by default, 0.25, 0.5, and 0.75). [[ml-dfanalytics-precision-recall]] -==== Precision and recall +=== Precision and recall A confusion matrix is a useful measure, but it could be hard to compare the results across the different algorithms. Precision and recall values @@ -85,7 +85,7 @@ As was the case for the confusion matrix, you also need to define different threshold levels for computing precision and recall. [[ml-dfanalytics-roc]] -==== Receiver operating characteristic curve +=== Receiver operating characteristic curve The receiver operating characteristic (ROC) curve is a plot that represents the performance of the binary classification process at different thresholds. It @@ -99,7 +99,7 @@ positive rate (`tpr`) at the different threshold levels, so you can visualize the algorithm performance by using these values. [[ml-dfanalytics-regression-evaluation]] -=== {regression-cap} evaluation +== {regression-cap} evaluation This evaluation type is suitable for evaluating {regression} models. The {regression} evaluation type offers the following metrics to evaluate the model @@ -109,7 +109,7 @@ performance: * R-squared (R^2^) [[ml-dfanalytics-mse]] -==== Mean squared error +=== Mean squared error The API provides a MSE by computing the average squared sum of the difference between the true value and the value that the {regression} model predicted. @@ -117,7 +117,7 @@ between the true value and the value that the {regression} model predicted. the {reganalysis} model is performing. 
[[ml-dfanalytics-r-sqared]] -==== R-squared +=== R-squared Another evaluation metrics for {reganalysis} is R-squared (R^2^). It represents the goodness of fit and measures how much of the variation in the data the @@ -128,7 +128,7 @@ value of 0.5 for R^2^ would indicate that, the predictions are 1 - 0.5^(1/2)^ (about 30%) closer to true values than their mean. [[ml-dfanalytics-classification]] -=== {classification-cap} evaluation +== {classification-cap} evaluation This evaluation type is suitable for evaluating {classification} models. The {classification} evaluation offers the following metrics to evaluate the model @@ -137,7 +137,7 @@ performance: * Multiclass confusion matrix [[ml-dfanalytics-mccm]] -==== Multiclass confusion matrix +=== Multiclass confusion matrix The multiclass confusion matrix provides a summary of the performance of the {classanalysis}. It contains the number of occurrences where the analysis diff --git a/docs/en/stack/ml/df-analytics/ml-feature-importance.asciidoc b/docs/en/stack/ml/df-analytics/ml-feature-importance.asciidoc index a3a0241bb..b5cea3abc 100644 --- a/docs/en/stack/ml/df-analytics/ml-feature-importance.asciidoc +++ b/docs/en/stack/ml/df-analytics/ml-feature-importance.asciidoc @@ -1,6 +1,6 @@ [role="xpack"] [[ml-feature-importance]] -== {feat-imp-cap} += {feat-imp-cap} {feat-imp-cap} values indicate which fields had the biggest impact on each prediction that is generated by <> or @@ -30,6 +30,6 @@ NOTE: The number of {feat-imp} values for each document might be less than the features that had a positive or negative effect on the prediction. 
[[ml-feature-importance-readings]] -=== Further readings +== Further readings https://www.elastic.co/blog/feature-importance-for-data-frame-analytics-with-elastic-machine-learning[{feat-imp-cap} for {dfanalytics} with Elastic {ml}] diff --git a/docs/en/stack/ml/df-analytics/ml-inference.asciidoc b/docs/en/stack/ml/df-analytics/ml-inference.asciidoc index ebe849d93..316553e73 100644 --- a/docs/en/stack/ml/df-analytics/ml-inference.asciidoc +++ b/docs/en/stack/ml/df-analytics/ml-inference.asciidoc @@ -1,6 +1,6 @@ [role="xpack"] [[ml-inference]] -== {infer-cap} += {infer-cap} experimental[] @@ -21,7 +21,7 @@ Let's take a closer look at the machinery behind {infer}. [[ml-inference-models]] -=== Trained {ml} models as functions +== Trained {ml} models as functions When you create a {dfanalytics-job} that executes a supervised process, you need to train a {ml} model on a training dataset to be able to make predictions on @@ -36,7 +36,7 @@ more information and configuration details, check the <> page. [[ml-inference-processor]] -=== {infer-cap} processor +== {infer-cap} processor {infer-cap} can be used as a processor specified in an {ref}/pipeline.html[ingest pipeline]. It uses a stored {dfanalytics} model to @@ -52,7 +52,7 @@ learn more about the feature. [[ml-inference-aggregation]] -=== {infer-cap} aggregation +== {infer-cap} aggregation {infer-cap} can also be used as a pipeline aggregation. You can reference a pre-trained {dfanalytics} model in the aggregation to infer on the result field diff --git a/docs/en/stack/ml/df-analytics/ml-lang-ident.asciidoc b/docs/en/stack/ml/df-analytics/ml-lang-ident.asciidoc index 961c6477c..d8715c280 100644 --- a/docs/en/stack/ml/df-analytics/ml-lang-ident.asciidoc +++ b/docs/en/stack/ml/df-analytics/ml-lang-ident.asciidoc @@ -1,6 +1,6 @@ [role="xpack"] [[ml-lang-ident]] -== {lang-ident-cap} += {lang-ident-cap} experimental[] @@ -26,7 +26,7 @@ languages table (see below) with the `Latn` subtag. 
{lang-ident-cap} supports Unicode input. [[ml-lang-ident-supported-languages]] -=== Supported languages +== Supported languages The table below contains the ISO codes and the English names of the languages that {lang-ident} supports. If a language has a 2-letter `ISO 639-1` code, the @@ -78,7 +78,7 @@ script. |=== [[ml-lang-ident-example]] -=== Example of {lang-ident} +== Example of {lang-ident} In the following example, we feed the {lang-ident} trained model a short Hungarian text that contains diacritics and a couple of English words. The @@ -186,6 +186,6 @@ The request returns the following response: <2> The ISO identifier of the language with the highest probability. [[ml-lang-ident-readings]] -=== Further readings +== Further readings https://www.elastic.co/blog/multilingual-search-using-language-identification-in-elasticsearch[Multilingual search using language identification in Elasticsearch]
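For context on the pattern this patch applies throughout: AsciiDoc's `leveloffset` attribute on an `include::` directive shifts every section title in the included file by the given amount at build time. That lets each page declare self-contained heading levels starting at `=`, while `index.asciidoc` decides where the page nests in the book. A minimal sketch (file names taken from this patch; heading text illustrative):

```asciidoc
// dfa-classification.asciidoc — the page now starts at level 0,
// with its subsections one level down:
= {classification-cap}

== Training the {classification} model

// index.asciidoc — the include shifts all headings in the file
// down by 2, so the page renders as a level-2 section (===)
// with level-3 subsections (====) in the assembled book:
include::dfa-classification.asciidoc[leveloffset=+2]
```

This is why every `-== Title` / `+= Title` pair in the diff is matched by a `leveloffset` increase in `index.asciidoc`: the rendered output is unchanged, but each source file is now independently valid.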