Add getting started content #6834
Conversation
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Well done! Clear, simple instructions that cover the key concepts and information users need. It clarified my understanding of OpenSearch terms and use cases too :)
_getting-started/communicate.md
Outdated
You interact with OpenSearch clusters using the REST API, which offers a lot of flexibility. Through the REST API, you can change most OpenSearch settings, modify indexes, check the health of the cluster, get statistics---almost everything. You can use clients like [cURL](https://curl.se/) or any programming language that can send HTTP requests.

You can send HTTP requests in your terminal or in the Dev Tools console in OpenSearch Dashboards.
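As a concrete sketch of such a request (assuming a local cluster listening on `https://localhost:9200` secured with the demo self-signed certificate and an admin user; the password is a placeholder you would substitute):

```shell
# Check cluster health with cURL. -k skips certificate verification for
# the demo self-signed certificate, -u supplies credentials, and the
# ?pretty parameter formats the JSON response for readability.
curl -X GET "https://localhost:9200/_cluster/health?pretty" \
  -ku admin:<custom-admin-password>
```

In the Dev Tools console, the same request is simply `GET _cluster/health`.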
Should "Dev Tools console" hyperlink to the documentation? {{site.url}}{{site.baseurl}}/dashboards/dev-tools/index-dev/
_getting-started/communicate.md
Outdated
For more information about `pretty` and other useful query parameters, see [Common REST parameters]({{site.url}}{{site.baseurl}}/opensearch/common-parameters/).

For requests that contain a body, specify the `Content-Type` header and provide the request payload in the `-d` (data) oprion:
Suggested change:
- For requests that contain a body, specify the `Content-Type` header and provide the request payload in the `-d` (data) oprion:
+ For requests that contain a body, specify the `Content-Type` header and provide the request payload in the `-d` (data) option:
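For illustration, a request with a body might look like the following (a sketch assuming a local cluster and a hypothetical `students` index; the field names are made up for the example):

```shell
# Index a document. The JSON payload is passed via -d, and the
# Content-Type header tells OpenSearch how to parse the body.
curl -X PUT "https://localhost:9200/students/_doc/1" \
  -H 'Content-Type: application/json' \
  -d '{"name": "John Doe", "gpa": 3.89, "grad_year": 2022}' \
  -ku admin:<custom-admin-password>
```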
Co-authored-by: Melissa Vagi <[email protected]> Signed-off-by: kolchfa-aws <[email protected]>
_getting-started/search-data.md
Outdated
Both `John Doe` and `Jane Doe` matched the word `doe`, but `John Doe` is scored higher because it also matched `john`.
It may be worth mentioning that the `match` query type uses `OR` as an operator by default, so the query is functionally `doe OR john`.
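To sketch the point (assuming the tutorial's `students` index on a local cluster), the default behavior and an explicit `and` operator would look like:

```shell
# Default: terms are combined with OR, so this matches "doe" OR "john".
curl -X GET "https://localhost:9200/students/_search" \
  -H 'Content-Type: application/json' \
  -d '{"query": {"match": {"name": "john doe"}}}' \
  -ku admin:<custom-admin-password>

# Explicit AND: only documents matching both terms are returned.
curl -X GET "https://localhost:9200/students/_search" \
  -H 'Content-Type: application/json' \
  -d '{"query": {"match": {"name": {"query": "john doe", "operator": "and"}}}}' \
  -ku admin:<custom-admin-password>
```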
_getting-started/search-data.md
Outdated
## Search methods

Along with the traditional BM25 search described in this tutorial, OpenSearch supports a range of machine learning (ML)-powered search methods, including k-NN, semantic, multimodal, sparse, hybrid, and conversational search. For information about all search methods, see [Search]({{site.url}}{{site.baseurl}}/search-plugins/).
It looks like the latest commit removed the description of BM25. (I think this is the only mention of BM25 in the page now.)
Maybe "Along with the traditional full-text search described..." ?
_getting-started/intro.md
Outdated
Any index changes, such as document indexing or deletion, are written to disk during a Lucene commit. However, Lucene commits are expensive operations, so they cannot be performed after every change to the index. Instead, each shard records every indexing operation in a transaction log called _translog_. When a document is indexed, it is added to the memory buffer and recorded in the translog. After a process or host restart, any data in the in-memory buffer is lost. Recording the document in the translog ensures durability because the translog is written to disk.

Frequent refresh operations write the documents in the memory buffer to a segment and then clear the memory buffer. Periodically, a [flush](#flush) performs a Lucene commit, which includes writing the segments to disk using `fsync`, purging the old translog, and starting a new translog. Thus, a translog contains all operations that have not yet been flushed.
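Both operations can also be triggered manually through the REST API (a sketch assuming a local cluster and an index named `students`; in normal operation you would rely on the automatic refresh and flush schedules instead):

```shell
# Force a refresh: buffered documents become searchable as new segments.
curl -X POST "https://localhost:9200/students/_refresh" \
  -ku admin:<custom-admin-password>

# Force a flush: performs a Lucene commit and purges the old translog.
curl -X POST "https://localhost:9200/students/_flush" \
  -ku admin:<custom-admin-password>
```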
I feel like this might be getting too detailed while muddying the key thing that users might need to know ("When is my data durable? When is my data searchable?").
I think we could say something like:
An indexing or bulk call responds when the documents have been written to the translog and the translog is flushed to disk, so the updates are durable. The updates will be visible from search requests until after a refresh operation (see below).
I almost feel like it would help to document these as steps in the lifecycle of an update, like:

- An update is received by a primary shard and gets written to the shard's transaction log, which is flushed to disk (followed by an `fsync`) before the update is acknowledged. This guarantees durability.
- The update is also passed to the Lucene index writer, which adds it to an in-memory buffer.
- On refresh, the Lucene index writer flushes the in-memory buffers to disk (with each buffer becoming a new Lucene segment), and a new index reader is opened over the resulting segment files. The updates are now visible for search.
- On a flush operation, the shard `fsync`s the Lucene segments. Since the segment files are a durable representation of the updates, the translog is no longer needed to provide durability, so the updates can be purged from the translog.
If the OpenSearch process is terminated between the end of step 1 (when the update has been acknowledged) and the end of step 4 (when the updated Lucene segments have been flushed to disk), the updates will be replayed from the translog when the process restarts.
@smacrakis -- we talked briefly about this content. Is the above clearer or still too in-the-weeds?
_getting-started/communicate.md
Outdated
You cannot change the mappings once the index is created.
There are some mapping changes that are allowed. For example, new fields can be added. I believe you can change the search analyzer associated with a field.
Maybe "You cannot change the type of a field once it is created" ?
Changed to your suggestion. Also added "Changing a field type requires deleting the index and recreating it with the new mappings."
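A sketch of the allowed case (adding a new field to an existing index; assumes a local cluster and the tutorial's `students` index, with `email` as a made-up field name for the example):

```shell
# Adding a new field to existing mappings is allowed; changing the type
# of an existing field through this API is rejected.
curl -X PUT "https://localhost:9200/students/_mapping" \
  -H 'Content-Type: application/json' \
  -d '{"properties": {"email": {"type": "keyword"}}}' \
  -ku admin:<custom-admin-password>
```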
@kolchfa-aws Great job putting this all together 😄. Please see my comments and changes and let me know if you have any questions. Thanks!
_getting-started/search-data.md
Outdated
This request returns no hits because the `keyword` fields must be matched exactly.
Suggested change:
- This request returns no hits because the `keyword` fields must be matched exactly.
+ Then the request returns no hits because the `keyword` fields must exactly match.
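An exact-match request against the `keyword` subfield could be sketched with a `term` query (assumes the tutorial's `students` index on a local cluster):

```shell
# term queries bypass analysis, so the value must match the stored
# keyword exactly, including case: "John Doe", not "john doe".
curl -X GET "https://localhost:9200/students/_search" \
  -H 'Content-Type: application/json' \
  -d '{"query": {"term": {"name.keyword": {"value": "John Doe"}}}}' \
  -ku admin:<custom-admin-password>
```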
_getting-started/search-data.md
Outdated
This request returns no hits because the `keyword` fields must be matched exactly.

However, you can search for the exact text `John Doe`:
Same comment. Would this be better structured as "However, if you search for the exact text `John Doe`: [Example] Then OpenSearch returns..."?
_getting-started/search-data.md
Outdated
### Filters

You can add a filter clause to your query for fields with exact values using a Boolean query.
The syntax here is slightly confusing. Do we mean "Using a Boolean query, you can add a filter clause to your query for fields with exact values"?
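For reference, the kind of query being described could be sketched as follows (assumes the tutorial's `students` index; `grad_year` is a made-up field for the example):

```shell
# A Boolean query whose filter clause requires an exact value. Filter
# clauses do not affect relevance scoring and can be cached.
curl -X GET "https://localhost:9200/students/_search" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": {
      "bool": {
        "must": {"match": {"name": "doe"}},
        "filter": {"term": {"grad_year": 2022}}
      }
    }
  }' -ku admin:<custom-admin-password>
```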
_getting-started/search-data.md
Outdated
Range filters support specifying a range of values. For example, the following Boolean query searches for students whose GPA is greater than 3.6:
Either "Range filters specify a range of values", "Range filters allow you to specify a range of values", or "With range filters, you can support a range of values".
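The GPA example under discussion could be sketched as follows (assumes the tutorial's `students` index with a numeric `gpa` field):

```shell
# A range filter: gt/gte/lt/lte bound the accepted values.
curl -X GET "https://localhost:9200/students/_search" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": {
      "bool": {
        "must": {"match": {"name": "doe"}},
        "filter": {"range": {"gpa": {"gt": 3.6}}}
      }
    }
  }' -ku admin:<custom-admin-password>
```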
_getting-started/search-data.md
Outdated
## Search methods

Along with the traditional full-text search described in this tutorial, OpenSearch supports a range of machine learning (ML)-powered search methods, including k-NN, semantic, multimodal, sparse, hybrid, and conversational search. For information about all search methods, see [Search]({{site.url}}{{site.baseurl}}/search-plugins/).
Suggested change:
- Along with the traditional full-text search described in this tutorial, OpenSearch supports a range of machine learning (ML)-powered search methods, including k-NN, semantic, multimodal, sparse, hybrid, and conversational search. For information about all search methods, see [Search]({{site.url}}{{site.baseurl}}/search-plugins/).
+ Along with the traditional full-text search described in this tutorial, OpenSearch supports a range of machine learning (ML)-powered search methods, including k-NN, semantic, multimodal, sparse, hybrid, and conversational search. For information about all OpenSearch-supported search methods, see [Search]({{site.url}}{{site.baseurl}}/search-plugins/).
_getting-started/intro.md
Outdated
- In a database of students, a document might represent one student.
- When you search for information, OpenSearch returns documents related to your search.
- If you're familiar with traditional databases, a document represents a row.
Suggested change:
- If you're familiar with traditional databases, a document represents a row.
+ A document represents a row in a traditional database.
_getting-started/intro.md
Outdated
You can think of an index in several ways:

- If you have a collection of encyclopedia articles, an index represents the whole collection.
Suggested change:
- If you have a collection of encyclopedia articles, an index represents the whole collection.
+ In a database of students, an index represents all students in the database.
_getting-started/intro.md
Outdated
- If you have a collection of encyclopedia articles, an index represents the whole collection.
- When you search for information, you query data contained in an index.
- If you're familiar with traditional databases, a document represents a database table.
Suggested change:
- If you're familiar with traditional databases, a document represents a database table.
+ An index represents a database table in a traditional database.
- When you search for information, you query data contained in an index.
- If you're familiar with traditional databases, a document represents a database table.

For example, in a school database, an index might contain all students in the school.
Suggested change:
- For example, in a school database, an index might contain all students in the school.
+ For example, in a school database, an index might contain information about all students in the school.
_getting-started/intro.md
Outdated
## Clusters and nodes

OpenSearch is designed to be a distributed search engine. OpenSearch can run on one or more _nodes_---servers that store your data and process search requests. An OpenSearch *cluster* is a collection of nodes.
Suggested change:
- OpenSearch is designed to be a distributed search engine. OpenSearch can run on one or more _nodes_---servers that store your data and process search requests. An OpenSearch *cluster* is a collection of nodes.
+ OpenSearch is designed to be a distributed search engine, meaning that it can run on one or more _nodes_---servers that store your data and process search requests. An OpenSearch *cluster* is a collection of nodes.
_getting-started/intro.md
Outdated
You can run OpenSearch locally on a laptop---its system requirements are minimal---but you can also scale a single cluster to hundreds of powerful machines in a data center.

In a single-node cluster, such as a laptop, one machine has to do everything: manage the state of the cluster, index and search data, and perform any preprocessing of data prior to indexing it. As a cluster grows, however, you can subdivide responsibilities. Nodes with fast disks and plenty of RAM might be great at indexing and searching data, whereas a node with plenty of CPU power and a tiny disk could manage cluster state.
Suggested change:
- In a single-node cluster, such as a laptop, one machine has to do everything: manage the state of the cluster, index and search data, and perform any preprocessing of data prior to indexing it. As a cluster grows, however, you can subdivide responsibilities. Nodes with fast disks and plenty of RAM might be great at indexing and searching data, whereas a node with plenty of CPU power and a tiny disk could manage cluster state.
+ In a single-node cluster, such as one deployed on a laptop, one machine has to perform every task: manage the state of the cluster, index and search data, and perform any preprocessing of data prior to indexing it. As a cluster grows, however, you can subdivide responsibilities. Nodes with fast disks and plenty of RAM might perform well when indexing and searching data, whereas a node with plenty of CPU power and a tiny disk could manage cluster state.
_getting-started/search-data.md
Outdated
### Full-text search

You can run a full-text search on fields mapped as `text`. By default, text fields are analyzed by the `default` analyzer. The analyzer splits text into terms and makes it lowercase. For more information about OpenSearch analyzers, see [Analyzers]({{site.url}}{{site.baseurl}}/analyzers/).
Suggested change:
- You can run a full-text search on fields mapped as `text`. By default, text fields are analyzed by the `default` analyzer. The analyzer splits text into terms and makes it lowercase. For more information about OpenSearch analyzers, see [Analyzers]({{site.url}}{{site.baseurl}}/analyzers/).
+ You can run a full-text search on fields mapped as `text`. By default, text fields are analyzed by the `default` analyzer. The analyzer splits text into terms and changes it to lowercase. For more information about OpenSearch analyzers, see [Analyzers]({{site.url}}{{site.baseurl}}/analyzers/).
_getting-started/search-data.md
Outdated
### Keyword search

The `name` field contains the `name.keyword` subfield, which was added by OpenSearch automatically. You can try to search the `name.keyword` field in a manner similar to the previous request:
Suggested change:
- The `name` field contains the `name.keyword` subfield, which was added by OpenSearch automatically. You can try to search the `name.keyword` field in a manner similar to the previous request:
+ The `name` field contains the `name.keyword` subfield, which is added by OpenSearch automatically. If you search the `name.keyword` field in a manner similar to the previous request:
_getting-started/search-data.md
Outdated
This request returns no hits because the `keyword` fields must be matched exactly.

However, you can search for the exact text `John Doe`:
Suggested change:
- However, you can search for the exact text `John Doe`:
+ However, if you search for the exact text `John Doe`:
_getting-started/search-data.md
Outdated
### Filters

You can add a filter clause to your query for fields with exact values using a Boolean query.
Suggested change:
- You can add a filter clause to your query for fields with exact values using a Boolean query.
+ Using a Boolean query, you can add a filter clause to your query for fields with exact values.
_getting-started/search-data.md
Outdated
Range filters support specifying a range of values. For example, the following Boolean query searches for students whose GPA is greater than 3.6:
Suggested change:
- Range filters support specifying a range of values. For example, the following Boolean query searches for students whose GPA is greater than 3.6:
+ With range filters, you can specify a range of values. For example, the following Boolean query searches for students whose GPA is greater than 3.6:
Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: kolchfa-aws <[email protected]>
@kolchfa-aws Just a few minor comments/changes.
1. An update is received by a primary shard and is written to the shard's transaction log ([translog](#translog)). The translog is flushed to disk (followed by an fsync) before the update is acknowledged. This guarantees durability.
1. The update is also passed to the Lucene index writer, which adds it to an in-memory buffer.
1. On a [refresh operation](#refresh), the Lucene index writer flushes the in-memory buffers to disk (with each buffer becoming a new Lucene segment), and a new index reader is opened over the resulting segment files. The updates are now visible for search.
Is "over" the right preposition here?
It's the word that I've heard Lucene developers use, because an IndexReader is like a moving window providing a view "over" a set of segments.
Maybe "with" would make more sense to a casual reader? That doesn't sound quite right, though...
Thanks! I'll keep "over"
_getting-started/intro.md
Outdated
### Translog

An indexing or bulk call responds when the documents have been written to the translog and the translog is flushed to disk, so the updates are durable. The updates will be visible from search requests until after a [refresh operation](#refresh).
Is "from" the right preposition here?
"to" is probably better
Also, the word "not" is missing -- The updates will not be visible to search requests until after a refresh operation.
Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: kolchfa-aws <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Commits in this PR:

* First iteration
* Add shard and node info
* Communicate section additions
* Change examples
* Remove extraneous files
* Update _getting-started/communicate.md
* Update _getting-started/intro.md
* Update _getting-started/intro.md
* Update _getting-started/intro.md
* Update _getting-started/search-data.md
* Apply suggestions from code review (Co-authored-by: Melissa Vagi <[email protected]>)
* Apply suggestions from code review (Co-authored-by: Melissa Vagi <[email protected]>)
* Tech review comments
* Add link to compound query section
* Added install types section
* Remove further reading suggestions
* Reorder sections
* Apply suggestions from code review (Co-authored-by: Nathan Bower <[email protected]>)
* Update _getting-started/intro.md (Co-authored-by: Nathan Bower <[email protected]>)
* Fix links
* Reword
* Reword
* Update _getting-started/intro.md

All commits signed off by Fanit Kolchina <[email protected]> and kolchfa-aws <[email protected]>.
Signed-off-by: Fanit Kolchina <[email protected]> Signed-off-by: kolchfa-aws <[email protected]> Co-authored-by: Melissa Vagi <[email protected]> Co-authored-by: Nathan Bower <[email protected]> (cherry picked from commit 246bb44) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Closes #6533
Checklist
For more information on following Developer Certificate of Origin and signing off your commits, please check here.