[DOCS] Adds NLP folders to ML Guide #1903

Merged: 2 commits, Dec 6, 2021
1 change: 0 additions & 1 deletion docs/en/stack/ml/df-analytics/index.asciidoc
Original file line number Diff line number Diff line change
@@ -4,7 +4,6 @@ include::ml-dfa-overview.asciidoc[leveloffset=+1]
include::ml-dfa-outlier-detection.asciidoc[leveloffset=+1]
include::ml-dfa-regression.asciidoc[leveloffset=+1]
include::ml-dfa-classification.asciidoc[leveloffset=+1]
include::ml-dfa-lang-ident.asciidoc[leveloffset=+1]

include::ml-dfa-concepts.asciidoc[leveloffset=+1]
include::ml-how-dfa-works.asciidoc[leveloffset=+2]
20 changes: 3 additions & 17 deletions docs/en/stack/ml/df-analytics/ml-dfanalytics-apis.asciidoc
@@ -18,40 +18,26 @@ The evaluation API endpoint has the following base:
----
// NOTCONSOLE

All the trained models endpoints have the following base:

[source,js]
----
/_ml/trained_models/
----
// NOTCONSOLE

// CREATE
* {ref}/put-dfanalytics.html[Create {dfanalytics-jobs}]
* {ref}/put-trained-models-aliases.html[Create trained model aliases]
* {ref}/put-trained-model-definition-part.html[Create trained model definition part]
* {ref}/put-trained-models.html[Create trained models]
// DELETE
* {ref}/delete-dfanalytics.html[Delete {dfanalytics-jobs}]
* {ref}/delete-trained-models.html[Delete trained models]
// EVALUATE
* {ref}/evaluate-dfanalytics.html[Evaluate {dfanalytics}]
// EXPLAIN
* {ref}/explain-dfanalytics.html[Explain {dfanalytics}]
// GET
* {ref}/get-dfanalytics.html[Get {dfanalytics-jobs} info]
* {ref}/get-dfanalytics-stats.html[Get {dfanalytics-jobs} statistics]
* {ref}/get-trained-models.html[Get trained models]
* {ref}/get-trained-models-stats.html[Get trained models statistics]
// INFER
* {ref}/infer-trained-model-deployment.html[Infer trained model deployment]
// PREVIEW
* {ref}/preview-dfanalytics.html[Preview {dfanalytics}]
// START
* {ref}/start-dfanalytics.html[Start {dfanalytics-jobs}]
// STOP
* {ref}/stop-dfanalytics.html[Stop {dfanalytics-jobs}]
* {ref}/stop-trained-model-deployment.html[Stop trained model deployment]
// UPDATE
* {ref}/update-dfanalytics.html[Update {dfanalytics-jobs}]

For information about the APIs related to trained models, refer to
<<ml-nlp-apis>>.

2 changes: 2 additions & 0 deletions docs/en/stack/ml/index.asciidoc
@@ -18,4 +18,6 @@ include::anomaly-detection/index.asciidoc[]

include::df-analytics/index.asciidoc[]

include::nlp/index.asciidoc[]

include::redirects.asciidoc[]
5 changes: 5 additions & 0 deletions docs/en/stack/ml/nlp/index.asciidoc
@@ -0,0 +1,5 @@
include::ml-nlp.asciidoc[]
include::ml-nlp-overview.asciidoc[leveloffset=+1]
include::ml-nlp-lang-ident.asciidoc[leveloffset=+2]
include::ml-nlp-apis.asciidoc[leveloffset=+1]

29 changes: 29 additions & 0 deletions docs/en/stack/ml/nlp/ml-nlp-apis.asciidoc
@@ -0,0 +1,29 @@
[[ml-nlp-apis]]
= API quick reference

All the trained models endpoints have the following base:

[source,js]
----
/_ml/trained_models/
----
// NOTCONSOLE

// CREATE
* {ref}/put-trained-models-aliases.html[Create trained model aliases]
* {ref}/put-trained-model-definition-part.html[Create trained model definition part]
* {ref}/put-trained-models.html[Create trained models]
// DELETE
* {ref}/delete-trained-models.html[Delete trained models]
// GET
* {ref}/get-trained-models.html[Get trained models]
* {ref}/get-trained-models-stats.html[Get trained models statistics]
// INFER
* {ref}/infer-trained-model-deployment.html[Infer trained model deployment]
// START
* {ref}/start-trained-model-deployment.html[Start trained model deployment]
// STOP
* {ref}/stop-trained-model-deployment.html[Stop trained model deployment]
// UPDATE
* {ref}/put-trained-models-aliases.html[Update trained model aliases]
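All of the endpoints above share the `/_ml/trained_models/` base. As a rough illustration (a sketch for this guide, not part of the PR or of any Elastic client library; the helper name and model ID are hypothetical), a client can compose the REST paths like this:

```python
# Hypothetical sketch: composing REST paths for the trained-models APIs
# under the /_ml/trained_models/ base shown above.

TRAINED_MODELS_BASE = "/_ml/trained_models"

def model_path(model_id: str, suffix: str = "") -> str:
    """Build a trained-models endpoint path for a given model ID."""
    path = f"{TRAINED_MODELS_BASE}/{model_id}"
    return f"{path}/{suffix}" if suffix else path

# A few of the endpoints listed above, expressed as (method, path) pairs:
create_model = ("PUT", model_path("my_model"))            # Create trained models
get_stats    = ("GET", model_path("my_model", "_stats"))  # Get trained models statistics
delete_model = ("DELETE", model_path("my_model"))         # Delete trained models

print(create_model[1])  # /_ml/trained_models/my_model
```

The point is only that every operation in the list addresses a model resource under the same base path; refer to the linked reference pages for the authoritative request and body formats.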

@@ -1,5 +1,4 @@
[role="xpack"]
[[ml-dfa-lang-ident]]
[[ml-nlp-lang-ident]]
= {lang-ident-cap}

:keywords: {ml-init}, {stack}, {dfanalytics}, {lang-ident}
44 changes: 44 additions & 0 deletions docs/en/stack/ml/nlp/ml-nlp-overview.asciidoc
@@ -0,0 +1,44 @@
[[ml-nlp-overview]]
= Overview

{nlp-cap} (NLP) refers to the way in which we can use software to understand
natural language in spoken word or written text.

Classically, NLP was performed using linguistic rules, dictionaries, regular
expressions, and {ml} for specific tasks such as automatic categorization or
summarization of text. In recent years, however, deep learning techniques have
taken over much of the NLP landscape. Deep learning capitalizes on the
availability of large-scale data sets, cheap computation, and techniques for
learning at scale with less human involvement. Pre-trained language models that
use a transformer architecture have been particularly successful. For example,
BERT is a pre-trained language model that was released by Google in 2018. Since
that time, it has become the inspiration for most of today’s modern NLP
techniques. The {stack} {ml} features are structured around BERT and
transformer models. These features support BERT’s tokenization scheme (called
WordPiece) and transformer models that conform to the standard BERT model
interface.

To incorporate transformer models and make predictions, {es} uses libtorch,
which is an underlying native library for PyTorch. Trained models must be in a
TorchScript representation for use with {stack} {ml} features.
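As a minimal illustration of that requirement (a sketch assuming a local PyTorch install; the toy model below is not from the docs and stands in for a real transformer), a model can be converted to TorchScript with `torch.jit.trace` before it is made available to the cluster:

```python
import torch
import torch.nn as nn

# A toy model standing in for a real transformer. The conversion step is
# what matters here: the {stack} {ml} features consume the serialized
# TorchScript representation, not the original Python module.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)

model = TinyModel().eval()
example_input = torch.zeros(1, 4)

# Trace the model into TorchScript and serialize it to disk.
traced = torch.jit.trace(model, example_input)
traced.save("tiny_model.pt")
```

Real transformer models typically need model-specific tracing wrappers; the snippet only shows the TorchScript round trip itself.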

As in the cases of <<ml-dfa-classification,classification>> and
<<ml-dfa-regression,regression>>, after you deploy a model to your cluster, you
can use it to make predictions (also known as _inference_) against incoming data.
You can perform the following NLP tasks:

Extract information::
* _Named entity recognition (NER)_ enables you to identify and categorize entities
in your text.
* _Fill masks_ enable you to predict missing words in text sequences.

Categorize text::
* <<ml-nlp-lang-ident,Language identification>> enables you to determine the
language of text.
* _Text classification_ enables you to classify input text.
* _Zero-shot text classification_ performs classification without requiring a
specialized model.

Search and compare text::
* _Text embedding_ turns content into vectors, which enables you to compare text
by using mathematical functions.
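To make the inference step above concrete, the sketch below builds a request for a deployed model (the model ID, field name, and exact endpoint shape are hypothetical here and depend on the {es} version; see the API quick reference for the authoritative format):

```python
import json

# Hypothetical sketch of an inference request against a deployed model.
# The model ID, text, and body shape are placeholders for illustration.
model_id = "my_ner_model"
endpoint = f"/_ml/trained_models/{model_id}/_infer"

body = {
    "docs": [
        {"text_field": "Elastic was founded in Amsterdam."}
    ]
}

request = {"method": "POST", "path": endpoint, "body": json.dumps(body)}
print(request["path"])  # /_ml/trained_models/my_ner_model/_infer
```

For an NER task, the response would attach recognized entities to the input document; each task type returns results in its own shape.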
16 changes: 16 additions & 0 deletions docs/en/stack/ml/nlp/ml-nlp.asciidoc
@@ -0,0 +1,16 @@
[[ml-nlp]]
= {nlp-cap}

:keywords: {ml-init}, {stack}, {nlp}, overview
:description: An introduction to {ml} {nlp} features.

[partintro]
--

You can use {stack-ml-features} to analyze natural language data and make
predictions.

* <<ml-nlp-overview>>
* <<ml-nlp-apis>>

--
5 changes: 5 additions & 0 deletions docs/en/stack/ml/redirects.asciidoc
@@ -148,3 +148,8 @@ This content has moved. See <<sample-data-forecasts>>.
=== Next steps

This content has moved. See <<sample-data-next>>.

[role="exclude",id="ml-dfa-lang-ident"]
=== Language identification

This content has moved. See <<ml-nlp-lang-ident>>.