diff --git a/_about/intro.md b/_about/intro.md deleted file mode 100644 index ef1dc4977f..0000000000 --- a/_about/intro.md +++ /dev/null @@ -1,112 +0,0 @@ ---- -layout: default -title: Intro to OpenSearch -nav_order: 2 -permalink: /intro/ ---- - -# Introduction to OpenSearch - -OpenSearch is a distributed search and analytics engine based on [Apache Lucene](https://lucene.apache.org/). After adding your data to OpenSearch, you can perform full-text searches on it with all of the features you might expect: search by field, search multiple indexes, boost fields, rank results by score, sort results by field, and aggregate results. - -Unsurprisingly, people often use search engines like OpenSearch as the backend for a search application---think [Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:FAQ/Technical#What_software_is_used_to_run_Wikipedia?) or an online store. It offers excellent performance and can scale up and down as the needs of the application grow or shrink. - -An equally popular, but less obvious use case is log analytics, in which you take the logs from an application, feed them into OpenSearch, and use the rich search and visualization functionality to identify issues. For example, a malfunctioning web server might throw a 500 error 0.5% of the time, which can be hard to notice unless you have a real-time graph of all HTTP status codes that the server has thrown in the past four hours. You can use [OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/index/) to build these sorts of visualizations from data in OpenSearch. - - -## Clusters and nodes - -Its distributed design means that you interact with OpenSearch *clusters*. Each cluster is a collection of one or more *nodes*, servers that store your data and process search requests. - -You can run OpenSearch locally on a laptop---its system requirements are minimal---but you can also scale a single cluster to hundreds of powerful machines in a data center. 
- -In a single node cluster, such as a laptop, one machine has to do everything: manage the state of the cluster, index and search data, and perform any preprocessing of data prior to indexing it. As a cluster grows, however, you can subdivide responsibilities. Nodes with fast disks and plenty of RAM might be great at indexing and searching data, whereas a node with plenty of CPU power and a tiny disk could manage cluster state. For more information on setting node types, see [Cluster formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/). - - -## Indexes and documents - -OpenSearch organizes data into *indexes*. Each index is a collection of JSON *documents*. If you have a set of raw encyclopedia articles or log lines that you want to add to OpenSearch, you must first convert them to [JSON](https://www.json.org/). A simple JSON document for a movie might look like this: - -```json -{ - "title": "The Wind Rises", - "release_date": "2013-07-20" -} -``` - -When you add the document to an index, OpenSearch adds some metadata, such as the unique document *ID*: - -```json -{ - "_index": "", - "_type": "_doc", - "_id": "", - "_version": 1, - "_source": { - "title": "The Wind Rises", - "release_date": "2013-07-20" - } -} -``` - -Indexes also contain mappings and settings: - -- A *mapping* is the collection of *fields* that documents in the index have. In this case, those fields are `title` and `release_date`. -- Settings include data like the index name, creation date, and number of shards. - -## Primary and replica shards - -OpenSearch splits indexes into *shards* for even distribution across nodes in a cluster. For example, a 400 GB index might be too large for any single node in your cluster to handle, but split into ten shards, each one 40 GB, OpenSearch can distribute the shards across ten nodes and work with each shard individually. - -By default, OpenSearch creates a *replica* shard for each *primary* shard. 
If you split your index into ten shards, for example, OpenSearch also creates ten replica shards. These replica shards act as backups in the event of a node failure---OpenSearch distributes replica shards to different nodes than their corresponding primary shards---but they also improve the speed and rate at which the cluster can process search requests. You might specify more than one replica per index for a search-heavy workload. - -Despite being a piece of an OpenSearch index, each shard is actually a full Lucene index---confusing, we know. This detail is important, though, because each instance of Lucene is a running process that consumes CPU and memory. More shards is not necessarily better. Splitting a 400 GB index into 1,000 shards, for example, would place needless strain on your cluster. A good rule of thumb is to keep shard size between 10--50 GB. - - -## REST API - -You interact with OpenSearch clusters using the REST API, which offers a lot of flexibility. You can use clients like [curl](https://curl.se/) or any programming language that can send HTTP requests. To add a JSON document to an OpenSearch index (i.e. index a document), you send an HTTP request: - -```json -PUT https://://_doc/ -{ - "title": "The Wind Rises", - "release_date": "2013-07-20" -} -``` - -To run a search for the document: - -```json -GET https://://_search?q=wind -``` - -To delete the document: - -```json -DELETE https://://_doc/ -``` - -You can change most OpenSearch settings using the REST API, modify indexes, check the health of the cluster, get statistics---almost everything. - -## Advanced concepts - -The following section describes more advanced OpenSearch concepts. - -### Translog - -Any index changes, such as document indexing or deletion, are written to disk during a Lucene commit. However, Lucene commits are expensive operations, so they cannot be performed after every change to the index. 
Instead, each shard records every indexing operation in a transaction log called _translog_. When a document is indexed, it is added to the memory buffer and recorded in the translog. After a process or host restart, any data in the in-memory buffer is lost. Recording the document in the translog ensures durability because the translog is written to disk. - -Frequent refresh operations write the documents in the memory buffer to a segment and then clear the memory buffer. Periodically, a [flush](#flush) performs a Lucene commit, which includes writing the segments to disk using `fsync`, purging the old translog, and starting a new translog. Thus, a translog contains all operations that have not yet been flushed. - -### Refresh - -Periodically, OpenSearch performs a _refresh_ operation, which writes the documents from the in-memory Lucene index to files. These files are not guaranteed to be durable because an `fsync` is not performed. A refresh makes documents available for search. - -### Flush - -A _flush_ operation persists the files to disk using `fsync`, ensuring durability. Flushing ensures that the data stored only in the translog is recorded in the Lucene index. OpenSearch performs a flush as needed to ensure that the translog does not grow too large. - -### Merge - -In OpenSearch, a shard is a Lucene index, which consists of _segments_ (or segment files). Segments store the indexed data and are immutable. Periodically, smaller segments are merged into larger ones. Merging reduces the overall number of segments on each shard, frees up disk space, and improves search performance. Eventually, segments reach a maximum size specified in the merge policy and are no longer merged into larger segments. The merge policy also specifies how often merges are performed. 
\ No newline at end of file diff --git a/_config.yml b/_config.yml index d9c0ee823f..6d6be4cd89 100644 --- a/_config.yml +++ b/_config.yml @@ -118,6 +118,9 @@ collections: dashboards-assistant: permalink: /:collection/:path/ output: true + getting-started: + permalink: /:collection/:path/ + output: true opensearch_collection: # Define the collections used in the theme @@ -125,6 +128,9 @@ opensearch_collection: about: name: About OpenSearch nav_fold: true + getting-started: + name: Getting started + nav_fold: true install-and-configure: name: Install and upgrade nav_fold: true @@ -196,6 +202,7 @@ opensearch_collection: developer-documentation: name: Developer documentation nav_fold: true + clients_collection: collections: diff --git a/_getting-started/communicate.md b/_getting-started/communicate.md new file mode 100644 index 0000000000..391bc9bef0 --- /dev/null +++ b/_getting-started/communicate.md @@ -0,0 +1,320 @@ +--- +layout: default +title: Communicate with OpenSearch +nav_order: 30 +--- + +# Communicate with OpenSearch + +You can communicate with OpenSearch using the REST API or one of the OpenSearch language clients. This page introduces the OpenSearch REST API. If you need to communicate with OpenSearch in your programming language, see the [Clients]({{site.url}}{{site.baseurl}}/clients/) section for a list of available clients. + +## OpenSearch REST API + +You interact with OpenSearch clusters using the REST API, which offers a lot of flexibility. Through the REST API, you can change most OpenSearch settings, modify indexes, check cluster health, get statistics---almost everything. You can use clients like [cURL](https://curl.se/) or any programming language that can send HTTP requests. + +You can send HTTP requests in your terminal or in the [Dev Tools console]({{site.url}}{{site.baseurl}}/dashboards/dev-tools/index-dev/) in OpenSearch Dashboards. 
+
+### Sending requests in a terminal
+
+When sending cURL requests in a terminal, the request format varies depending on whether you're using the Security plugin. As an example, consider a request to the Cluster Health API.
+
+If you're not using the Security plugin, send the following request:
+
+```bash
+curl -X GET "http://localhost:9200/_cluster/health"
+```
+{% include copy.html %}
+
+If you're using the Security plugin, provide the username and password in the request:
+
+```bash
+curl -X GET "http://localhost:9200/_cluster/health" -ku admin:<custom-admin-password>
+```
+{% include copy.html %}
+
+The default username is `admin`, and the password is set in your `docker-compose.yml` file in the `OPENSEARCH_INITIAL_ADMIN_PASSWORD=<custom-admin-password>` setting.
+
+By default, OpenSearch returns responses in a flat JSON format. For a human-readable response body, provide the `pretty` query parameter:
+
+```bash
+curl -X GET "http://localhost:9200/_cluster/health?pretty"
+```
+{% include copy.html %}
+
+For more information about `pretty` and other useful query parameters, see [Common REST parameters]({{site.url}}{{site.baseurl}}/opensearch/common-parameters/).
+
+For requests that contain a body, specify the `Content-Type` header and provide the request payload in the `-d` (data) option:
+
+```bash
+curl -X GET "http://localhost:9200/students/_search?pretty" -H 'Content-Type: application/json' -d'
+{
+  "query": {
+    "match_all": {}
+  }
+}'
+```
+{% include copy.html %}
+
+### Sending requests in Dev Tools
+
+The Dev Tools console in OpenSearch Dashboards uses a simpler request syntax than cURL. To send requests in Dev Tools, use the following steps:
+
+1. Access OpenSearch Dashboards by opening `http://localhost:5601/` in a web browser on the same host that is running your OpenSearch cluster. The default username is `admin`, and the password is set in your `docker-compose.yml` file in the `OPENSEARCH_INITIAL_ADMIN_PASSWORD=<custom-admin-password>` setting.
+1. On the top menu bar, go to **Management > Dev Tools**.
+1. In the left pane of the console, enter the following request:
+    ```json
+    GET _cluster/health
+    ```
+    {% include copy-curl.html %}
+1. Choose the triangle icon on the upper right of the request to submit the query. You can also submit the request by pressing `Ctrl+Enter` (or `Cmd+Enter` for Mac users). To learn more about using the OpenSearch Dashboards console for submitting queries, see [Running queries in the console]({{site.url}}{{site.baseurl}}/dashboards/run-queries/).
+
+In the following sections, and in most of the OpenSearch documentation, requests are presented in the Dev Tools console format.
+
+## Indexing documents
+
+To add a JSON document to an OpenSearch index (that is, to _index_ a document), you send an HTTP request in the following format:
+
+```json
+PUT https://<host>:<port>/<index-name>/_doc/<document-id>
+```
+
+For example, to index a document representing a student, you can send the following request:
+
+```json
+PUT /students/_doc/1
+{
+  "name": "John Doe",
+  "gpa": 3.89,
+  "grad_year": 2022
+}
+```
+{% include copy-curl.html %}
+
+Once you send the preceding request, OpenSearch creates an index called `students` and stores the ingested document in the index. If you don't provide an ID for your document, OpenSearch generates a document ID. In the preceding request, the document ID is specified as the student ID (`1`).
+
+To learn more about indexing, see [Managing indexes]({{site.url}}{{site.baseurl}}/im-plugin/).
+
+## Dynamic mapping
+
+When you index a document, OpenSearch infers the field types from the JSON types submitted in the document. This process is called _dynamic mapping_. For more information, see [Dynamic mapping]({{site.url}}{{site.baseurl}}/field-types/#dynamic-mapping).
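The default type inference can be illustrated with a short Python sketch. This is an approximation of the rules described here, not OpenSearch's actual implementation:

```python
# Conceptual sketch of dynamic mapping: infer an approximate OpenSearch
# field type from each JSON value. Not OpenSearch's real implementation.

def infer_field_type(value):
    """Map a Python/JSON value to an approximate OpenSearch field type."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"            # strings default to text (plus a keyword subfield)
    raise TypeError(f"unsupported JSON type: {type(value).__name__}")

document = {"name": "John Doe", "gpa": 3.89, "grad_year": 2022}
mapping = {field: infer_field_type(value) for field, value in document.items()}
print(mapping)  # {'name': 'text', 'gpa': 'float', 'grad_year': 'long'}
```

Note that the sketch omits details such as the `keyword` subfield that OpenSearch adds to dynamically mapped strings.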
+ +To view the inferred field types, send a request to the `_mapping` endpoint: + +```json +GET /students/_mapping +``` +{% include copy-curl.html %} + +OpenSearch responds with the field `type` for each field: + +```json +{ + "students": { + "mappings": { + "properties": { + "gpa": { + "type": "float" + }, + "grad_year": { + "type": "long" + }, + "name": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword", + "ignore_above": 256 + } + } + } + } + } + } +} +``` + +OpenSearch mapped the numeric fields to the `float` and `long` types. Notice that OpenSearch mapped the `name` text field to `text` and added a `name.keyword` subfield mapped to `keyword`. Fields mapped to `text` are analyzed (lowercased and split into terms) and can be used for full-text search. Fields mapped to `keyword` are used for exact term search. + +OpenSearch mapped the `grad_year` field to `long`. If you want to map it to the `date` type instead, you need to [delete the index](#deleting-an-index) and then recreate it, explicitly specifying the mappings. For instructions on how to explicitly specify mappings, see [Index settings and mappings](#index-mappings-and-settings). + +## Searching for documents + +To run a search for the document, specify the index that you're searching and a query that will be used to match documents. 
The simplest query is the `match_all` query, which matches all documents in an index:
+
+```json
+GET /students/_search
+{
+  "query": {
+    "match_all": {}
+  }
+}
+```
+{% include copy-curl.html %}
+
+OpenSearch returns a response containing the indexed document:
+
+```json
+{
+  "took": 12,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "students",
+        "_id": "1",
+        "_score": 1,
+        "_source": {
+          "name": "John Doe",
+          "gpa": 3.89,
+          "grad_year": 2022
+        }
+      }
+    ]
+  }
+}
+```
+
+For more information about search, see [Search your data]({{site.url}}{{site.baseurl}}/getting-started/search-data/).
+
+## Updating documents
+
+In OpenSearch, documents are immutable. However, you can update a document by retrieving it, updating its information, and reindexing it. You can update an entire document using the Index Document API, providing values for all existing and added fields in the document. For example, to update the `gpa` field and add an `address` field to the previously indexed document, send the following request:
+
+```json
+PUT /students/_doc/1
+{
+  "name": "John Doe",
+  "gpa": 3.91,
+  "grad_year": 2022,
+  "address": "123 Main St."
+}
+```
+{% include copy-curl.html %}
+
+Alternatively, you can update parts of a document by calling the Update Document API:
+
+```json
+POST /students/_update/1
+{
+  "doc": {
+    "gpa": 3.91,
+    "address": "123 Main St."
+  }
+}
+```
+{% include copy-curl.html %}
+
+For more information about partial document updates, see [Update Document API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/update-document/).
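For top-level fields, the partial update above can be pictured as a shallow merge of the `doc` object into the stored `_source`. The following Python sketch is a conceptual model, not the server-side implementation:

```python
# Conceptual sketch of a partial update: the fields under "doc" are
# shallow-merged into the document's existing _source. This models only
# top-level field replacement, not nested or scripted updates.

def partial_update(source, doc):
    """Return a new _source with the fields in `doc` applied."""
    updated = dict(source)  # copy so the original is left untouched
    updated.update(doc)     # overwrite existing fields, add new ones
    return updated

source = {"name": "John Doe", "gpa": 3.89, "grad_year": 2022}
doc = {"gpa": 3.91, "address": "123 Main St."}
print(partial_update(source, doc))
# {'name': 'John Doe', 'gpa': 3.91, 'grad_year': 2022, 'address': '123 Main St.'}
```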
+ +## Deleting a document + +To delete a document, send a delete request and provide the document ID: + +```json +DELETE /students/_doc/1 +``` +{% include copy-curl.html %} + +## Deleting an index + +To delete an index, send the following request: + +```json +DELETE /students +``` +{% include copy-curl.html %} + +## Index mappings and settings + +OpenSearch indexes are configured with mappings and settings: + +- A _mapping_ is a collection of fields and the types of those fields. For more information, see [Mappings and field types]({{site.url}}{{site.baseurl}}/field-types/). +- _Settings_ include index data like the index name, creation date, and number of shards. For more information, see [Configuring OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/). + +You can specify the mappings and settings in one request. For example, the following request specifies the number of index shards and maps the `name` field to `text` and the `grad_year` field to `date`: + +```json +PUT /students +{ + "settings": { + "index.number_of_shards": 1 + }, + "mappings": { + "properties": { + "name": { + "type": "text" + }, + "grad_year": { + "type": "date" + } + } + } +} +``` +{% include copy-curl.html %} + +Now you can index the same document that you indexed in the previous section: + +```json +PUT /students/_doc/1 +{ + "name": "John Doe", + "gpa": 3.89, + "grad_year": 2022 +} +``` +{% include copy-curl.html %} + +To view the mappings for the index fields, send the following request: + +```json +GET /students/_mapping +``` +{% include copy-curl.html %} + +OpenSearch mapped the `name` and `grad_year` fields according to the specified types and inferred the field type for the `gpa` field: + +```json +{ + "students": { + "mappings": { + "properties": { + "gpa": { + "type": "float" + }, + "grad_year": { + "type": "date" + }, + "name": { + "type": "text" + } + } + } + } +} +``` + +Once a field is created, you cannot change its type. 
Changing a field type requires deleting the index and recreating it with the new mappings. +{: .note} + +## Further reading + +- For information about the OpenSearch REST API, see the [REST API reference]({{site.url}}{{site.baseurl}}/api-reference/). +- For information about OpenSearch language clients, see [Clients]({{site.url}}{{site.baseurl}}/clients/). +- For information about mappings, see [Mappings and field types]({{site.url}}{{site.baseurl}}/field-types/). +- For information about settings, see [Configuring OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/). + +## Next steps + +- See [Ingest data into OpenSearch]({{site.url}}{{site.baseurl}}/getting-started/ingest-data/) to learn about ingestion options. \ No newline at end of file diff --git a/_getting-started/index.md b/_getting-started/index.md new file mode 100644 index 0000000000..b25587c522 --- /dev/null +++ b/_getting-started/index.md @@ -0,0 +1,38 @@ +--- +layout: default +title: Getting started +nav_order: 1 +has_children: true +has_toc: false +nav_exclude: true +permalink: /getting-started/ +--- + +# Getting started + +OpenSearch is a distributed search and analytics engine based on [Apache Lucene](https://lucene.apache.org/). After adding your data to OpenSearch, you can perform full-text searches on it with all of the features you might expect: search by field, search multiple indexes, boost fields, rank results by score, sort results by field, and aggregate results. + +Unsurprisingly, builders often use a search engine like OpenSearch as the backend for a search application---think [Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:FAQ/Technical#What_software_is_used_to_run_Wikipedia?) or an online store. It offers excellent performance and can scale up or down as the needs of the application grow or shrink. 
+ +An equally popular, but less obvious use case is log analytics, in which you take the logs from an application, feed them into OpenSearch, and use the rich search and visualization functionality to identify issues. For example, a malfunctioning web server might throw a 500 error 0.5% of the time, which can be hard to notice unless you have a real-time graph of all HTTP status codes that the server has thrown in the past four hours. You can use [OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/index/) to build these sorts of visualizations from data in OpenSearch. + +## Components + +OpenSearch is more than just the core engine. It also includes the following components: + +- [OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/index/): The OpenSearch data visualization UI. +- [Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/): A server-side data collector capable of filtering, enriching, transforming, normalizing, and aggregating data for downstream analysis and visualization. +- [Clients]({{site.url}}{{site.baseurl}}/clients/): Language APIs that let you communicate with OpenSearch in several popular programming languages. + +## Use cases + +OpenSearch supports a variety of use cases, for example: + +- [Observability]({{site.url}}{{site.baseurl}}/observing-your-data/): Visualize data-driven events by using Piped Processing Language (PPL) to explore, discover, and query data stored in OpenSearch. +- [Search]({{site.url}}{{site.baseurl}}/search-plugins/): Choose the best search method for your application, from regular lexical search to conversational search powered by machine learning (ML). +- [Machine learning]({{site.url}}{{site.baseurl}}/ml-commons-plugin/): Integrate ML models into your OpenSearch application. +- [Security analytics]({{site.url}}{{site.baseurl}}/security-analytics/): Investigate, detect, analyze, and respond to security threats that can jeopardize organizational success and online operations. 
+ +## Next steps + +- See [Introduction to OpenSearch]({{site.url}}{{site.baseurl}}/getting-started/intro/) to learn about essential OpenSearch concepts. \ No newline at end of file diff --git a/_getting-started/ingest-data.md b/_getting-started/ingest-data.md new file mode 100644 index 0000000000..73cf1502f7 --- /dev/null +++ b/_getting-started/ingest-data.md @@ -0,0 +1,111 @@ +--- +layout: default +title: Ingest data +nav_order: 40 +--- + +# Ingest your data into OpenSearch + +There are several ways to ingest data into OpenSearch: + +- Ingest individual documents. For more information, see [Indexing documents]({{site.url}}{{site.baseurl}}/getting-started/communicate/#indexing-documents). +- Index multiple documents in bulk. For more information, see [Bulk indexing](#bulk-indexing). +- Use Data Prepper---an OpenSearch server-side data collector that can enrich data for downstream analysis and visualization. For more information, see [Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/). +- Use other ingestion tools. For more information, see [OpenSearch tools]({{site.url}}{{site.baseurl}}/tools/). + +## Bulk indexing + +To index documents in bulk, you can use the [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/). For example, if you want to index several documents into the `students` index, send the following request: + +```json +POST _bulk +{ "create": { "_index": "students", "_id": "2" } } +{ "name": "Jonathan Powers", "gpa": 3.85, "grad_year": 2025 } +{ "create": { "_index": "students", "_id": "3" } } +{ "name": "Jane Doe", "gpa": 3.52, "grad_year": 2024 } +``` +{% include copy-curl.html %} + +## Experiment with sample data + +OpenSearch provides a fictitious e-commerce dataset that you can use to experiment with REST API requests and OpenSearch Dashboards visualizations. You can create an index and define field mappings by downloading the corresponding dataset and mapping files. 
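Note the shape of the Bulk API body shown in the preceding section: it is NDJSON, in which each action line is followed by a document line, and the body ends with a newline. A Python sketch of assembling such a payload (the helper name is illustrative):

```python
import json

# Sketch: assemble an NDJSON body for the Bulk API, pairing each document
# with a "create" action line, as in the students example above.

def build_bulk_body(index, docs):
    """Return an NDJSON string: one action line and one document line per doc."""
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"create": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the Bulk API requires a trailing newline

body = build_bulk_body("students", [
    ("2", {"name": "Jonathan Powers", "gpa": 3.85, "grad_year": 2025}),
    ("3", {"name": "Jane Doe", "gpa": 3.52, "grad_year": 2024}),
])
print(body)
```

When sending such a body over HTTP, use the `Content-Type: application/x-ndjson` header.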
+ +### Create a sample index + +Use the following steps to create a sample index and define field mappings for the document fields: + +1. Download [ecommerce-field_mappings.json](https://github.com/opensearch-project/documentation-website/blob/{{site.opensearch_major_minor_version}}/assets/examples/ecommerce-field_mappings.json). This file defines a [mapping]({{site.url}}{{site.baseurl}}/opensearch/mappings/) for the sample data you will use. + + To use cURL, send the following request: + + ```bash + curl -O https://raw.githubusercontent.com/opensearch-project/documentation-website/{{site.opensearch_major_minor_version}}/assets/examples/ecommerce-field_mappings.json + ``` + {% include copy.html %} + + To use wget, send the following request: + + ``` + wget https://raw.githubusercontent.com/opensearch-project/documentation-website/{{site.opensearch_major_minor_version}}/assets/examples/ecommerce-field_mappings.json + ``` + {% include copy.html %} + +1. Download [ecommerce.json](https://github.com/opensearch-project/documentation-website/blob/{{site.opensearch_major_minor_version}}/assets/examples/ecommerce.json). This file contains the index data formatted so that it can be ingested by the Bulk API: + + To use cURL, send the following request: + + ```bash + curl -O https://raw.githubusercontent.com/opensearch-project/documentation-website/{{site.opensearch_major_minor_version}}/assets/examples/ecommerce.json + ``` + {% include copy.html %} + + To use wget, send the following request: + + ``` + wget https://raw.githubusercontent.com/opensearch-project/documentation-website/{{site.opensearch_major_minor_version}}/assets/examples/ecommerce.json + ``` + {% include copy.html %} + +1. Define the field mappings provided in the mapping file: + ```bash + curl -H "Content-Type: application/x-ndjson" -X PUT "https://localhost:9200/ecommerce" -ku admin: --data-binary "@ecommerce-field_mappings.json" + ``` + {% include copy.html %} + +1. 
Upload the documents using the Bulk API: + + ```bash + curl -H "Content-Type: application/x-ndjson" -X PUT "https://localhost:9200/ecommerce/_bulk" -ku admin: --data-binary "@ecommerce.json" + ``` + {% include copy.html %} + +### Query the data + +Query the data using the Search API. The following query searches for documents in which `customer_first_name` is `Sonya`: + +```json +GET ecommerce/_search +{ + "query": { + "match": { + "customer_first_name": "Sonya" + } + } +} +``` +{% include copy-curl.html %} + +### Visualize the data + +To learn how to use OpenSearch Dashboards to visualize the data, see the [OpenSearch Dashboards quickstart guide]({{site.url}}{{site.baseurl}}/dashboards/quickstart/). + +## Further reading + +- For information about Data Prepper, see [Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/). +- For information about ingestion tools, see [OpenSearch tools]({{site.url}}{{site.baseurl}}/tools/). +- For information about OpenSearch Dashboards, see [OpenSearch Dashboards quickstart guide]({{site.url}}{{site.baseurl}}/dashboards/quickstart/). +- For information about bulk indexing, see [Bulk API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/). + +## Next steps + +- See [Search your data]({{site.url}}{{site.baseurl}}/getting-started/search-data/) to learn about search options. \ No newline at end of file diff --git a/_getting-started/intro.md b/_getting-started/intro.md new file mode 100644 index 0000000000..272d8d6981 --- /dev/null +++ b/_getting-started/intro.md @@ -0,0 +1,161 @@ +--- +layout: default +title: Intro to OpenSearch +nav_order: 2 +has_math: true +redirect_from: + - /intro/ +--- + +# Introduction to OpenSearch + +OpenSearch is a distributed search and analytics engine that supports various use cases, from implementing a search box on a website to analyzing security data for threat detection. The term _distributed_ means that you can run OpenSearch on multiple computers. 
_Search and analytics_ means that you can search and analyze your data once you ingest it into OpenSearch. No matter your type of data, you can store and analyze it using OpenSearch. + +## Document + +A _document_ is a unit that stores information (text or structured data). In OpenSearch, documents are stored in [JSON](https://www.json.org/) format. + +You can think of a document in several ways: + +- In a database of students, a document might represent one student. +- When you search for information, OpenSearch returns documents related to your search. +- A document represents a row in a traditional database. + +For example, in a school database, a document might represent one student and contain the following data. + +ID | Name | GPA | Graduation year | +:--- | :--- | :--- | :--- | +1 | John Doe | 3.89 | 2022 | + +Here is what this document looks like in JSON format: + +```json +{ + "name": "John Doe", + "gpa": 3.89, + "grad_year": 2022 +} +``` + +You'll learn about how document IDs are assigned in [Indexing documents]({{site.url}}{{site.baseurl}}/getting-started/communicate/#indexing-documents). + +## Index + +An _index_ is a collection of documents. + +You can think of an index in several ways: + +- In a database of students, an index represents all students in the database. +- When you search for information, you query data contained in an index. +- An index represents a database table in a traditional database. + +For example, in a school database, an index might contain all students in the school. + +ID | Name | GPA | Graduation year +:--- | :--- | :--- | :--- +1 | John Doe | 3.89 | 2022 +2 | Jonathan Powers | 3.85 | 2025 +3 | Jane Doe | 3.52 | 2024 + +## Clusters and nodes + +OpenSearch is designed to be a distributed search engine, meaning that it can run on one or more _nodes_---servers that store your data and process search requests. An OpenSearch *cluster* is a collection of nodes. 
+ +You can run OpenSearch locally on a laptop---its system requirements are minimal---but you can also scale a single cluster to hundreds of powerful machines in a data center. + +In a single-node cluster, such as one deployed on a laptop, one machine has to perform every task: manage the state of the cluster, index and search data, and perform any preprocessing of data prior to indexing it. As a cluster grows, however, you can subdivide responsibilities. Nodes with fast disks and plenty of RAM might perform well when indexing and searching data, whereas a node with plenty of CPU power and a tiny disk could manage cluster state. + +In each cluster, there is an elected _cluster manager_ node, which orchestrates cluster-level operations, such as creating an index. Nodes communicate with each other, so if your request is routed to a node, that node sends requests to other nodes, gathers the nodes' responses, and returns the final response. + +For more information about other node types, see [Cluster formation]({{site.url}}{{site.baseurl}}/opensearch/cluster/). + +## Shards + +OpenSearch splits indexes into _shards_. Each shard stores a subset of all documents in an index, as shown in the following image. + +An index is split into shards + +Shards are used for even distribution across nodes in a cluster. For example, a 400 GB index might be too large for any single node in your cluster to handle, but split into 10 shards of 40 GB each, OpenSearch can distribute the shards across 10 nodes and manage each shard individually. Consider a cluster with 2 indexes: index 1 and index 2. Index 1 is split into 2 shards, and index 2 is split into 4 shards. The shards are distributed across nodes 1 and 2, as shown in the following image. + +A cluster containing two indexes and two nodes + +Despite being one piece of an OpenSearch index, each shard is actually a full Lucene index. 
This detail is important because each instance of Lucene is a running process that consumes CPU and memory. More shards is not necessarily better. Splitting a 400 GB index into 1,000 shards, for example, would unnecessarily strain your cluster. A good rule of thumb is to limit shard size to 10--50 GB. + +## Primary and replica shards + +In OpenSearch, a shard may be either a _primary_ (original) shard or a _replica_ (copy) shard. By default, OpenSearch creates a replica shard for each primary shard. Thus, if you split your index into 10 shards, OpenSearch creates 10 replica shards. For example, consider the cluster described in the previous section. If you add 1 replica for each shard of each index in the cluster, your cluster will contain a total of 2 shards and 2 replicas for index 1 and 4 shards and 4 replicas for index 2, as shown in the following image. + +A cluster containing two indexes with one replica shard for each shard in the index + +These replica shards act as backups in the event of a node failure---OpenSearch distributes replica shards to different nodes than their corresponding primary shards---but they also improve the speed at which the cluster processes search requests. You might specify more than one replica per index for a search-heavy workload. + +## Inverted index + +An OpenSearch index uses a data structure called an _inverted index_. An inverted index maps words to the documents in which they occur. 
For example, consider an index containing the following two documents: + +- Document 1: "Beauty is in the eye of the beholder" +- Document 2: "Beauty and the beast" + +The inverted index for these documents maps each word to the documents in which it occurs: + +Word | Document +:--- | :--- +beauty | 1, 2 +is | 1 +in | 1 +the | 1, 2 +eye | 1 +of | 1 +beholder | 1 +and | 2 +beast | 2 + +In addition to the document ID, OpenSearch stores the position of the word within the document for running phrase queries, where words must appear next to each other. + +## Relevance + +When you search for a document, OpenSearch matches the words in the query to the words in the documents. For example, if you search the index described in the previous section for the word `beauty`, OpenSearch will return documents 1 and 2. Each document is assigned a _relevance score_ that tells you how well the document matched the query. + +Individual words in a search query are called search _terms_. Each search term is scored according to the following rules: + +1. A search term that occurs more frequently in a document will tend to be scored higher. A document about dogs that uses the word `dog` many times is likely more relevant than a document that contains the word `dog` fewer times. This is the _term frequency_ component of the score. + +1. A search term that occurs in more documents will tend to be scored lower. A query for the terms `blue` and `axolotl` should prefer documents that contain `axolotl` over the likely more common word `blue`. This is the _inverse document frequency_ component of the score. + +1. A match on a longer document should tend to be scored lower than a match on a shorter document. A document that contains a full dictionary would match on any word but is not very relevant to any particular word. This corresponds to the _length normalization_ component of the score.
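The inverted index above can be sketched in a few lines of Python. This is a toy illustration, not OpenSearch's actual implementation; real Lucene postings lists also store term positions and frequencies, which feed the scoring rules described above.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased word to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

docs = {
    1: "Beauty is in the eye of the beholder",
    2: "Beauty and the beast",
}
index = build_inverted_index(docs)
print(index["beauty"])    # [1, 2]
print(index["beholder"])  # [1]
```

Note that each word appears only once in the index, however many times it occurs in a document: the posting for `the` is `[1, 2]`, matching the table above.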
+ +OpenSearch uses the BM25 ranking algorithm to calculate document relevance scores and then returns the results sorted by relevance. To learn more, see [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25). + +## Advanced concepts + +The following section describes more advanced OpenSearch concepts. + +### Update lifecycle + +The lifecycle of an update operation consists of the following steps: + +1. An update is received by a primary shard and is written to the shard's transaction log ([translog](#translog)). The translog is flushed to disk (followed by an fsync) before the update is acknowledged. This guarantees durability. +1. The update is also passed to the Lucene index writer, which adds it to an in-memory buffer. +1. On a [refresh operation](#refresh), the Lucene index writer flushes the in-memory buffers to disk (with each buffer becoming a new Lucene segment), and a new index reader is opened over the resulting segment files. The updates are now visible for search. +1. On a [flush operation](#flush), the shard fsyncs the Lucene segments. Because the segment files are a durable representation of the updates, the translog is no longer needed to provide durability, so the updates can be purged from the translog. + +### Translog + +An indexing or bulk call responds when the documents have been written to the translog and the translog is flushed to disk, so the updates are durable. The updates will not be visible to search requests until after a [refresh operation](#refresh). + +### Refresh + +Periodically, OpenSearch performs a _refresh_ operation, which writes the documents from the in-memory Lucene index to files. These files are not guaranteed to be durable because an `fsync` is not performed. A refresh makes documents available for search. + +### Flush + +A _flush_ operation persists the files to disk using `fsync`, ensuring durability. Flushing ensures that the data stored only in the translog is recorded in the Lucene index. 
OpenSearch performs a flush as needed to ensure that the translog does not grow too large. + +### Merge + +In OpenSearch, a shard is a Lucene index, which consists of _segments_ (or segment files). Segments store the indexed data and are immutable. Periodically, smaller segments are merged into larger ones. Merging reduces the overall number of segments on each shard, frees up disk space, and improves search performance. Eventually, segments reach a maximum size specified in the merge policy and are no longer merged into larger segments. The merge policy also specifies how often merges are performed. + +## Next steps + +- Learn how to install OpenSearch within minutes in [Installation quickstart]({{site.url}}{{site.baseurl}}/getting-started/quickstart/). \ No newline at end of file diff --git a/_about/quickstart.md b/_getting-started/quickstart.md similarity index 58% rename from _about/quickstart.md rename to _getting-started/quickstart.md index 5c7da2950e..5ef783959a 100644 --- a/_about/quickstart.md +++ b/_getting-started/quickstart.md @@ -1,13 +1,13 @@ --- layout: default -title: Quickstart +title: Installation quickstart nav_order: 3 -permalink: /quickstart/ redirect_from: - /opensearch/install/quickstart/ + - /quickstart/ --- -# Quickstart +# Installation quickstart Get started using OpenSearch and OpenSearch Dashboards by deploying your containers with [Docker](https://www.docker.com/). Before proceeding, you need to [get Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://github.com/docker/compose) installed on your local machine. @@ -18,33 +18,63 @@ The Docker Compose commands used in this guide are written with a hyphen (for ex You'll need a special file, called a Compose file, that Docker Compose uses to define and create the containers in your cluster. The OpenSearch Project provides a sample Compose file that you can use to get started. 
Learn more about working with Compose files by reviewing the official [Compose specification](https://docs.docker.com/compose/compose-file/). -1. Before running OpenSearch on your machine, you should disable memory paging and swapping performance on the host to improve performance and increase the number of memory maps available to OpenSearch. See [important system settings]({{site.url}}{{site.baseurl}}/opensearch/install/important-settings/) for more information. +1. Before running OpenSearch on your machine, you should disable memory paging and swapping on the host to improve performance and increase the number of memory maps available to OpenSearch. + + Disable memory paging and swapping: + ```bash - # Disable memory paging and swapping. sudo swapoff -a + ``` + {% include copy.html %} + + Edit the sysctl config file that defines the host's max map count: - # Edit the sysctl config file that defines the host's max map count. + ```bash sudo vi /etc/sysctl.conf + ``` + {% include copy.html %} - # Set max map count to the recommended value of 262144. + Set max map count to the recommended value of `262144`: + + ```bash vm.max_map_count=262144 + ``` + {% include copy.html %} - # Reload the kernel parameters. + Reload the kernel parameters: + + ```bash sudo sysctl -p ``` + {% include copy.html %} + + For more information, see [important system settings]({{site.url}}{{site.baseurl}}/opensearch/install/important-settings/). + 1. Download the sample Compose file to your host. You can download the file with command line utilities like `curl` and `wget`, or you can manually copy [docker-compose.yml](https://github.com/opensearch-project/documentation-website/blob/{{site.opensearch_major_minor_version}}/assets/examples/docker-compose.yml) from the OpenSearch Project documentation-website repository using a web browser.
+ + To use cURL, send the following request: + ```bash - # Using cURL: curl -O https://raw.githubusercontent.com/opensearch-project/documentation-website/{{site.opensearch_major_minor_version}}/assets/examples/docker-compose.yml + ``` + {% include copy.html %} + + To use wget, send the following request: - # Using wget: + ``` wget https://raw.githubusercontent.com/opensearch-project/documentation-website/{{site.opensearch_major_minor_version}}/assets/examples/docker-compose.yml ``` -1. In your terminal application, navigate to the directory containing the `docker-compose.yml` file you just downloaded, and run the following command to create and start the cluster as a background process. + {% include copy.html %} + +1. In your terminal application, navigate to the directory containing the `docker-compose.yml` file you downloaded and run the following command to create and start the cluster as a background process: + ```bash docker-compose up -d ``` + {% include copy.html %} + 1. Confirm that the containers are running with the command `docker-compose ps`. You should see an output like the following: + ```bash $ docker-compose ps NAME COMMAND SERVICE STATUS PORTS @@ -52,11 +82,16 @@ You'll need a special file, called a Compose file, that Docker Compose uses to d opensearch-node1 "./opensearch-docker…" opensearch-node1 running 0.0.0.0:9200->9200/tcp, 9300/tcp, 0.0.0.0:9600->9600/tcp, 9650/tcp opensearch-node2 "./opensearch-docker…" opensearch-node2 running 9200/tcp, 9300/tcp, 9600/tcp, 9650/tcp ``` -1. Query the OpenSearch REST API to verify that the service is running. You should use `-k` (also written as `--insecure`) to disable hostname checking because the default security configuration uses demo certificates. Use `-u` to pass the default username and password (`admin:`). + +1. Query the OpenSearch REST API to verify that the service is running. 
You should use `-k` (also written as `--insecure`) to disable hostname checking because the default security configuration uses demo certificates. Use `-u` to pass the default username and password (`admin:`): + ```bash curl https://localhost:9200 -ku admin: ``` - Sample response: + {% include copy.html %} + + The response confirms that the installation was successful: + ```json { "name" : "opensearch-node1", @@ -78,64 +113,6 @@ You'll need a special file, called a Compose file, that Docker Compose uses to d ``` 1. Explore OpenSearch Dashboards by opening `http://localhost:5601/` in a web browser on the same host that is running your OpenSearch cluster. The default username is `admin` and the default password is set in your `docker-compose.yml` file in the `OPENSEARCH_INITIAL_ADMIN_PASSWORD=` setting. -## Create an index and field mappings using sample data - -Create an index and define field mappings using a dataset provided by the OpenSearch Project. The same fictitious e-commerce data is also used for sample visualizations in OpenSearch Dashboards. To learn more, see [Getting started with OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/index/). - -1. Download [ecommerce-field_mappings.json](https://github.com/opensearch-project/documentation-website/blob/{{site.opensearch_major_minor_version}}/assets/examples/ecommerce-field_mappings.json). This file defines a [mapping]({{site.url}}{{site.baseurl}}/opensearch/mappings/) for the sample data you will use. - ```bash - # Using cURL: - curl -O https://raw.githubusercontent.com/opensearch-project/documentation-website/{{site.opensearch_major_minor_version}}/assets/examples/ecommerce-field_mappings.json - - # Using wget: - wget https://raw.githubusercontent.com/opensearch-project/documentation-website/{{site.opensearch_major_minor_version}}/assets/examples/ecommerce-field_mappings.json - ``` -1. 
Download [ecommerce.json](https://github.com/opensearch-project/documentation-website/blob/{{site.opensearch_major_minor_version}}/assets/examples/ecommerce.json). This file contains the index data formatted so that it can be ingested by the bulk API. To learn more, see [index data]({{site.url}}{{site.baseurl}}/opensearch/index-data/) and [Bulk]({{site.url}}{{site.baseurl}}/api-reference/document-apis/bulk/). - ```bash - # Using cURL: - curl -O https://raw.githubusercontent.com/opensearch-project/documentation-website/{{site.opensearch_major_minor_version}}/assets/examples/ecommerce.json - - # Using wget: - wget https://raw.githubusercontent.com/opensearch-project/documentation-website/{{site.opensearch_major_minor_version}}/assets/examples/ecommerce.json - ``` -1. Define the field mappings with the mapping file. - ```bash - curl -H "Content-Type: application/x-ndjson" -X PUT "https://localhost:9200/ecommerce" -ku admin: --data-binary "@ecommerce-field_mappings.json" - ``` -1. Upload the index to the bulk API. - ```bash - curl -H "Content-Type: application/x-ndjson" -X PUT "https://localhost:9200/ecommerce/_bulk" -ku admin: --data-binary "@ecommerce.json" - ``` -1. Query the data using the search API. The following command submits a query that will return documents where `customer_first_name` is `Sonya`. - ```bash - curl -H 'Content-Type: application/json' -X GET "https://localhost:9200/ecommerce/_search?pretty=true" -ku admin: -d' {"query":{"match":{"customer_first_name":"Sonya"}}}' - ``` - Queries submitted to the OpenSearch REST API will generally return a flat JSON by default. For a human readable response body, use the query parameter `pretty=true`. For more information about `pretty` and other useful query parameters, see [Common REST parameters]({{site.url}}{{site.baseurl}}/opensearch/common-parameters/). -1. Access OpenSearch Dashboards by opening `http://localhost:5601/` in a web browser on the same host that is running your OpenSearch cluster. 
The default username is `admin` and the password is set in your `docker-compose.yml` file in the `OPENSEARCH_INITIAL_ADMIN_PASSWORD=` setting. -1. On the top menu bar, go to **Management > Dev Tools**. -1. In the left pane of the console, enter the following: - ```json - GET ecommerce/_search - { - "query": { - "match": { - "customer_first_name": "Sonya" - } - } - } - ``` -1. Choose the triangle icon at the top right of the request to submit the query. You can also submit the request by pressing `Ctrl+Enter` (or `Cmd+Enter` for Mac users). To learn more about using the OpenSearch Dashboards console for submitting queries, see [Running queries in the console]({{site.url}}{{site.baseurl}}/dashboards/run-queries/). - -## Next steps - -You successfully deployed your own OpenSearch cluster with OpenSearch Dashboards and added some sample data. Now you're ready to learn about configuration and functionality in more detail. Here are a few recommendations on where to begin: -- [About the Security plugin]({{site.url}}{{site.baseurl}}/security/index/) -- [OpenSearch configuration]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/) -- [OpenSearch plugin installation]({{site.url}}{{site.baseurl}}/opensearch/install/plugins/) -- [Getting started with OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/index/) -- [OpenSearch tools]({{site.url}}{{site.baseurl}}/tools/index/) -- [Index APIs]({{site.url}}{{site.baseurl}}/api-reference/index-apis/index/) - ## Common issues Review these common issues and suggested solutions if your containers fail to start or exit unexpectedly. 
@@ -163,3 +140,18 @@ opensearch-node1 | ERROR: [1] bootstrap checks failed opensearch-node1 | [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144] opensearch-node1 | ERROR: OpenSearch did not exit normally - check the logs at /usr/share/opensearch/logs/opensearch-cluster.log ``` + +## Other installation types + +In addition to Docker, you can install OpenSearch on various Linux distributions and on Windows. For all available installation guides, see [Install and upgrade OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/). + +## Further reading + +You successfully deployed your own OpenSearch cluster with OpenSearch Dashboards and added some sample data. Now you're ready to learn about configuration and functionality in more detail. Here are a few recommendations on where to begin: +- [About the Security plugin]({{site.url}}{{site.baseurl}}/security/index/) +- [OpenSearch configuration]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/) +- [OpenSearch plugin installation]({{site.url}}{{site.baseurl}}/opensearch/install/plugins/) + +## Next steps + +- See [Communicate with OpenSearch]({{site.url}}{{site.baseurl}}/getting-started/communicate/) to learn about how to send requests to OpenSearch. diff --git a/_getting-started/search-data.md b/_getting-started/search-data.md new file mode 100644 index 0000000000..c6970e7e7b --- /dev/null +++ b/_getting-started/search-data.md @@ -0,0 +1,446 @@ +--- +layout: default +title: Search your data +nav_order: 50 +--- + +# Search your data + +In OpenSearch, there are several ways to search data: + +- [Query domain-specific language (DSL)]({{site.url}}{{site.baseurl}}/query-dsl/index/): The primary OpenSearch query language, which you can use to create complex, fully customizable queries. 
+- [Query string query language]({{site.url}}{{site.baseurl}}/query-dsl/full-text/query-string/): A scaled-down query language that you can use in a query parameter of a search request or in OpenSearch Dashboards. +- [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/sql/index/): A traditional query language that bridges the gap between traditional relational database concepts and the flexibility of OpenSearch’s document-oriented data storage. +- [Piped Processing Language (PPL)]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index/): The primary language used for observability in OpenSearch. PPL uses a pipe syntax that chains commands into a query. +- [Dashboards Query Language (DQL)]({{site.url}}{{site.baseurl}}/dashboards/dql/): A simple text-based query language for filtering data in OpenSearch Dashboards. + +## Prepare the data + +For this tutorial, you'll need to index student data if you haven't done so already. You can start by deleting the `students` index (`DELETE /students`) and then sending the following bulk request: + +```json +POST _bulk +{ "create": { "_index": "students", "_id": "1" } } +{ "name": "John Doe", "gpa": 3.89, "grad_year": 2022} +{ "create": { "_index": "students", "_id": "2" } } +{ "name": "Jonathan Powers", "gpa": 3.85, "grad_year": 2025 } +{ "create": { "_index": "students", "_id": "3" } } +{ "name": "Jane Doe", "gpa": 3.52, "grad_year": 2024 } +``` +{% include copy-curl.html %} + +## Retrieve all documents in an index + +To retrieve all documents in an index, send the following request: + +```json +GET /students/_search +``` +{% include copy-curl.html %} + +The preceding request is equivalent to the `match_all` query, which matches all documents in an index: + +```json +GET /students/_search +{ + "query": { + "match_all": {} + } +} +``` +{% include copy-curl.html %} + +OpenSearch returns the matching documents: + +```json +{ + "took": 12, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + 
"failed": 0 + }, + "hits": { + "total": { + "value": 3, + "relation": "eq" + }, + "max_score": 1, + "hits": [ + { + "_index": "students", + "_id": "1", + "_score": 1, + "_source": { + "name": "John Doe", + "gpa": 3.89, + "grad_year": 2022 + } + }, + { + "_index": "students", + "_id": "2", + "_score": 1, + "_source": { + "name": "Jonathan Powers", + "gpa": 3.85, + "grad_year": 2025 + } + }, + { + "_index": "students", + "_id": "3", + "_score": 1, + "_source": { + "name": "Jane Doe", + "gpa": 3.52, + "grad_year": 2024 + } + } + ] + } +} +``` + +## Response fields + +The preceding response contains the following fields. + + +### took + + +The `took` field contains the amount of time the query took to run, in milliseconds. + + +### timed_out + + +This field indicates whether the request timed out. If a request timed out, then OpenSearch returns the results that were gathered before the timeout. You can set the desired timeout value by providing the `timeout` query parameter: + +```json +GET /students/_search?timeout=20ms +``` +{% include copy-curl.html %} + + +### _shards + + +The `_shards` object specifies the total number of shards on which the query ran as well as the number of shards that succeeded or failed. A shard may fail if the shard itself and all its replicas are unavailable. If any of the involved shards fail, OpenSearch continues to run the query on the remaining shards. + + +### hits + + +The `hits` object contains the total number of matching documents and the documents themselves (listed in the `hits` array). Each matching document contains the `_index` and `_id` fields as well as the `_source` field, which contains the complete originally indexed document. + +Each document is given a relevance score in the `_score` field. Because you ran a `match_all` search, all document scores are set to `1` (there is no difference in their relevance). The `max_score` field contains the highest score of any matching document. 
+ +## Query string queries + +Query string queries are lightweight but powerful. You can send a query string query as a `q` query parameter. For example, the following query searches for students with the name `john`: + +```json +GET /students/_search?q=name:john +``` +{% include copy-curl.html %} + +OpenSearch returns the matching document: + +```json +{ + "took": 18, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 0.9808291, + "hits": [ + { + "_index": "students", + "_id": "1", + "_score": 0.9808291, + "_source": { + "name": "John Doe", + "grade": 12, + "gpa": 3.89, + "grad_year": 2022, + "future_plans": "John plans to be a computer science major" + } + } + ] + } +} +``` + +For more information about query string syntax, see [Query string query language]({{site.url}}{{site.baseurl}}/query-dsl/full-text/query-string/). + +## Query DSL + +Using Query DSL, you can create more complex and customized queries. + +### Full-text search + +You can run a full-text search on fields mapped as `text`. By default, text fields are analyzed by the `default` analyzer. The analyzer splits text into terms and changes it to lowercase. For more information about OpenSearch analyzers, see [Analyzers]({{site.url}}{{site.baseurl}}/analyzers/). 
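The default analyzer's splitting and lowercasing can be approximated in a few lines of Python. This is a simplification for intuition only; the real analyzer also handles Unicode word boundaries, punctuation, and more:

```python
import re

def analyze(text):
    """Roughly mimic the default analyzer: split on non-word characters, lowercase each term."""
    return [term.lower() for term in re.findall(r"\w+", text)]

# The stored field value and the query text analyze to overlapping terms,
# which is why the lowercase query "john" still matches "John Doe".
print(analyze("John Doe"))  # ['john', 'doe']
print(analyze("john"))      # ['john']
```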
+ +For example, the following query searches for students with the name `john`: + +```json +GET /students/_search +{ + "query": { + "match": { + "name": "john" + } + } +} +``` +{% include copy-curl.html %} + +The response contains the matching document: + +```json +{ + "took": 13, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 0.9808291, + "hits": [ + { + "_index": "students", + "_id": "1", + "_score": 0.9808291, + "_source": { + "name": "John Doe", + "gpa": 3.89, + "grad_year": 2022 + } + } + ] + } +} +``` + +Notice that the query text is lowercase while the text in the field is not, but the query still returns the matching document. + +You can reorder the terms in the search string. For example, the following query searches for `doe john`: + +```json +GET /students/_search +{ + "query": { + "match": { + "name": "doe john" + } + } +} +``` +{% include copy-curl.html %} + +The response contains two matching documents: + +```json +{ + "took": 14, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 2, + "relation": "eq" + }, + "max_score": 1.4508327, + "hits": [ + { + "_index": "students", + "_id": "1", + "_score": 1.4508327, + "_source": { + "name": "John Doe", + "gpa": 3.89, + "grad_year": 2022 + } + }, + { + "_index": "students", + "_id": "3", + "_score": 0.4700036, + "_source": { + "name": "Jane Doe", + "gpa": 3.52, + "grad_year": 2024 + } + } + ] + } +} +``` + +The match query type uses `OR` as an operator by default, so the query is functionally `doe OR john`. Both `John Doe` and `Jane Doe` matched the word `doe`, but `John Doe` is scored higher because it also matched `john`. + +### Keyword search + +The `name` field contains the `name.keyword` subfield, which is added by OpenSearch automatically. 
If you search the `name.keyword` field in a manner similar to the previous request: + +```json +GET /students/_search +{ + "query": { + "match": { + "name.keyword": "john" + } + } +} +``` +{% include copy-curl.html %} + +Then the request returns no hits because `keyword` fields require an exact match. + +However, if you search for the exact text `John Doe`: + +```json +GET /students/_search +{ + "query": { + "match": { + "name.keyword": "John Doe" + } + } +} +``` +{% include copy-curl.html %} + +OpenSearch returns the matching document: + +```json +{ + "took": 19, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 0.9808291, + "hits": [ + { + "_index": "students", + "_id": "1", + "_score": 0.9808291, + "_source": { + "name": "John Doe", + "gpa": 3.89, + "grad_year": 2022 + } + } + ] + } +} +``` + +### Filters + +Using a Boolean query, you can add a filter clause to your query for fields with exact values. + +Term filters match specific terms. For example, the following Boolean query searches for students whose graduation year is 2022: + +```json +GET students/_search +{ + "query": { + "bool": { + "filter": [ + { "term": { "grad_year": 2022 }} + ] + } + } +} +``` +{% include copy-curl.html %} + +With range filters, you can specify a range of values. For example, the following Boolean query searches for students whose GPA is greater than 3.6: + +```json +GET students/_search +{ + "query": { + "bool": { + "filter": [ + { "range": { "gpa": { "gt": 3.6 }}} + ] + } + } +} +``` +{% include copy-curl.html %} + +For more information about filters, see [Query and filter context]({{site.url}}{{site.baseurl}}/query-dsl/query-filter-context/). + +### Compound queries + +A compound query lets you combine multiple query or filter clauses. A Boolean query is an example of a compound query.
+ +For example, to search for students whose name matches `doe` and filter by graduation year and GPA, use the following request: + +```json +GET students/_search +{ + "query": { + "bool": { + "must": [ + { + "match": { + "name": "doe" + } + }, + { "range": { "gpa": { "gte": 3.6, "lte": 3.9 } } }, + { "term": { "grad_year": 2022 }} + ] + } + } +} +``` +{% include copy-curl.html %} + +For more information about Boolean and other compound queries, see [Compound queries]({{site.url}}{{site.baseurl}}/query-dsl/compound/index/). + +## Search methods + +Along with the traditional full-text search described in this tutorial, OpenSearch supports a range of machine learning (ML)-powered search methods, including k-NN, semantic, multimodal, sparse, hybrid, and conversational search. For information about all OpenSearch-supported search methods, see [Search]({{site.url}}{{site.baseurl}}/search-plugins/). + +## Next steps + +- For information about available query types, see [Query DSL]({{site.url}}{{site.baseurl}}/query-dsl/index/). +- For information about available search methods, see [Search]({{site.url}}{{site.baseurl}}/search-plugins/). \ No newline at end of file diff --git a/images/intro/cluster-replicas.png b/images/intro/cluster-replicas.png new file mode 100644 index 0000000000..3462406b98 Binary files /dev/null and b/images/intro/cluster-replicas.png differ diff --git a/images/intro/cluster.png b/images/intro/cluster.png new file mode 100644 index 0000000000..300cf41ecc Binary files /dev/null and b/images/intro/cluster.png differ diff --git a/images/intro/index-shard.png b/images/intro/index-shard.png new file mode 100644 index 0000000000..f2663d2e95 Binary files /dev/null and b/images/intro/index-shard.png differ