Add Blocks API and add additional Get document descriptions. #7836

Merged
merged 10 commits into from
Aug 8, 2024
94 changes: 85 additions & 9 deletions _api-reference/document-apis/get-documents.md
**Introduced 1.0**
{: .label .label-purple }

After adding a JSON document to your index, you can use the Get Document API operation to retrieve the document's information and data.

## Example

```json
GET sample-index1/_doc/1
```
{% include copy-curl.html %}

## Path and HTTP methods

Use the GET method to retrieve a document and its source or stored fields from a particular index. Use the HEAD method to verify that a document exists.

```
GET <index>/_doc/<_id>
HEAD <index>/_doc/<_id>
```

Use `_source` to retrieve the document source or verify that it exists.

```
GET <index>/_source/<_id>
HEAD <index>/_source/<_id>
```
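
To make the four path forms concrete, the following Python sketch builds them with the standard library. The helper names `document_path` and `document_request` and the `http://localhost:9200` address are hypothetical, and the sketch only constructs unsent `urllib.request.Request` objects rather than contacting a cluster:

```python
from urllib.parse import quote
from urllib.request import Request


def document_path(index: str, doc_id: str, source_only: bool = False) -> str:
    """Return <index>/_doc/<_id>, or <index>/_source/<_id> when source_only is True."""
    endpoint = "_source" if source_only else "_doc"
    # Percent-encode each segment so an ID such as "a/b" stays a single path segment.
    return f"/{quote(index, safe='')}/{endpoint}/{quote(doc_id, safe='')}"


def document_request(base_url: str, index: str, doc_id: str,
                     check_only: bool = False, source_only: bool = False) -> Request:
    """Build (but do not send) a HEAD request to check existence or a GET request to fetch."""
    method = "HEAD" if check_only else "GET"
    return Request(base_url + document_path(index, doc_id, source_only), method=method)


# Hypothetical local cluster address, used for illustration only.
req = document_request("http://localhost:9200", "sample-index1", "1", check_only=True)
print(req.get_method(), req.full_url)  # HEAD http://localhost:9200/sample-index1/_doc/1
```

Using HEAD for the existence check avoids transferring the document body when you only need to know whether it exists.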

## Query parameters

All query parameters are optional.

Parameter | Type | Description
:--- | :--- | :---
version | Integer | The version of the document to return, which must match the current version of the document.
version_type | Enum | Retrieves a specifically typed document. Available options are `external` (retrieve the document if the specified version number is greater than the document's current version) and `external_gte` (retrieve the document if the specified version number is greater than or equal to the document's current version). For example, to retrieve version 3 of a document, use `/_doc/1?version=3&version_type=external`.

### Real time

By default, the Get Document API operates in real time: it retrieves the latest version of the document regardless of the index's refresh rate (the rate at which new data becomes searchable). However, if you request stored fields (using the `stored_fields` parameter) for a document that has been updated but not yet refreshed, the Get Document API parses and analyzes the document's source to extract those stored fields.

To disable the real-time behavior and retrieve the document based on the last refreshed state of the index, set the `realtime` parameter to `false`.
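
The difference between real-time and refreshed reads can be pictured with a toy model. The following Python sketch is a deliberate simplification, not how OpenSearch stores data internally: the hypothetical `ToyShard` class keeps every indexed document in one dictionary and copies it into a second, searchable dictionary only when `refresh()` is called:

```python
class ToyShard:
    """Simplified model (not OpenSearch internals) of real-time vs. refreshed reads."""

    def __init__(self):
        self._latest = {}      # every indexed document, visible to real-time gets
        self._searchable = {}  # snapshot taken at the last refresh

    def index(self, doc_id, source):
        self._latest[doc_id] = source

    def refresh(self):
        # A refresh makes all writes so far visible to non-real-time reads.
        self._searchable = dict(self._latest)

    def get(self, doc_id, realtime=True):
        # realtime=True reads the latest write; realtime=False reads the last refreshed state.
        store = self._latest if realtime else self._searchable
        return store.get(doc_id)


shard = ToyShard()
shard.index("1", {"title": "v1"})
print(shard.get("1"))                  # {'title': 'v1'}
print(shard.get("1", realtime=False))  # None
shard.refresh()
print(shard.get("1", realtime=False))  # {'title': 'v1'}
```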

### Source filtering

By default, the Get Document API returns the entire contents of the `_source` field for the requested document. To exclude the `_source` field from the response, set the `_source` query parameter to `false`, as shown in the following example:

```json
GET test-index/_doc/0?_source=false
```

#### Source includes and excludes


If you only want to retrieve specific fields from the source, use the `_source_includes` or `_source_excludes` parameter to include or exclude particular fields, respectively. This is useful for large documents because retrieving only the required fields reduces network overhead.


Both parameters accept a comma-separated list of fields and wildcard expressions. The following example includes any source fields that match `*.play` in the response and excludes the `entities` field:

```json
GET test-index/_doc/0?_source_includes=*.play&_source_excludes=entities
```

#### Shorter notation

If you only want to include certain fields and don't need to exclude any, you can use a shorter notation by specifying the desired fields directly in the `_source` parameter:

```json
GET test-index/_doc/0?_source=*.id
```
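
The following Python sketch assembles the source filtering query parameters described in this section, switching to the shorter `_source` notation when only includes are given. The `source_params` helper is hypothetical; it only builds a query string and leaves `*` and `,` unencoded so that the output reads like the preceding examples:

```python
from typing import Sequence
from urllib.parse import urlencode


def source_params(includes: Sequence[str] = (), excludes: Sequence[str] = (),
                  fetch_source: bool = True) -> str:
    """Build the source filtering part of a Get Document query string."""
    if not fetch_source:
        params = {"_source": "false"}
    elif includes and not excludes:
        # Shorter notation: list the wanted fields directly in _source.
        params = {"_source": ",".join(includes)}
    else:
        params = {}
        if includes:
            params["_source_includes"] = ",".join(includes)
        if excludes:
            params["_source_excludes"] = ",".join(excludes)
    # Keep wildcards and commas readable instead of percent-encoding them.
    return urlencode(params, safe="*,")


print(source_params(["*.play"], ["entities"]))  # _source_includes=*.play&_source_excludes=entities
print(source_params(["*.id"]))                  # _source=*.id
```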

### Routing

When indexing documents in OpenSearch, you can specify a `routing` value to control the shard assignment for the documents. If routing was used during indexing, you must provide the same routing value when retrieving the document using the Get Document API, as shown in the following example:

```json
GET test-index/_doc/1?routing=user1
```

This request retrieves the document with the ID `1` and uses the routing value `user1` to determine the shard where the document is stored. If you don't specify the correct routing value, the Get Document API cannot locate and fetch the requested document.
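
The following Python sketch illustrates why the routing value must match. It is not OpenSearch's routing algorithm: OpenSearch uses a Murmur3-based hash and takes additional index settings into account, while this sketch substitutes `zlib.crc32` modulo the shard count, so its shard numbers will not match a real cluster. The helper names `shard_for` and `target_shard` are hypothetical:

```python
import zlib
from typing import Optional


def shard_for(routing_value: str, number_of_shards: int) -> int:
    """Stand-in hash (crc32, not OpenSearch's Murmur3-based function) mapped to a shard."""
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards


def target_shard(doc_id: str, number_of_shards: int, routing: Optional[str] = None) -> int:
    # Without an explicit routing value, the document ID acts as the routing value.
    return shard_for(routing if routing is not None else doc_id, number_of_shards)


stored_on = target_shard("1", 5, routing="user1")          # where indexing placed the document
print(target_shard("1", 5, routing="user1") == stored_on)  # True
# Omitting routing hashes "1" instead of "user1", which may select a different shard.
print(target_shard("1", 5))
```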

### Preference

You can control which shard replica handles a Get Document API request. By default, requests are distributed randomly across the available shard replicas.

To influence replica selection, set the `preference` query parameter to one of the following values:

- `_local`: The operation tries to execute on a locally allocated shard replica, if possible. This can improve performance by reducing network overhead.
- Custom (string) value: Specifying a custom string value ensures that requests with the same value are routed to the same set of shards. This consistency can be beneficial when dealing with shards in different refresh states, as it prevents "jumping values" that may occur when hitting shards with varying data visibility. A common practice is to use a web session ID or a user name as the custom value.
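
The following Python sketch models these preference options in a simplified way. It is not OpenSearch's replica selection logic: copies are represented as node names, a custom string is mapped to a copy with `zlib.crc32`, and the helper name `pick_copy` and the session ID value are hypothetical:

```python
import random
import zlib
from typing import List, Optional


def pick_copy(copies: List[str], local_node: str,
              preference: Optional[str] = None,
              rng: Optional[random.Random] = None) -> str:
    """Toy replica selection; each entry in copies names a node holding a copy of the shard."""
    chooser = rng or random.Random()
    if preference == "_local":
        # Prefer a copy allocated on the node that received the request.
        local = [node for node in copies if node == local_node]
        return local[0] if local else chooser.choice(copies)
    if preference:
        # The same custom string always hashes to the same copy.
        return copies[zlib.crc32(preference.encode("utf-8")) % len(copies)]
    # No preference: spread requests randomly across copies.
    return chooser.choice(copies)


copies = ["node-a", "node-b", "node-c"]
print(pick_copy(copies, local_node="node-b", preference="_local"))  # node-b
# Hypothetical web session ID used as a custom preference value.
first = pick_copy(copies, local_node="node-a", preference="session-42")
second = pick_copy(copies, local_node="node-c", preference="session-42")
print(first == second)  # True
```

Because the custom value, not the receiving node, decides the copy, a user's repeated reads keep hitting the same copy and see a consistent refresh state.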


### Refresh

Set the `refresh` parameter to `true` to refresh the relevant shard before running the Get Document API operation. This ensures that the latest data changes are searchable and visible to the API. However, use this option sparingly: refreshing can put a heavy load on the system and slow down indexing. Weigh the need for up-to-date data against the added system load before enabling the `refresh` parameter.
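
As a sketch of how a client might pass these flags, the following Python helper builds a Get Document URL with optional `refresh` and `realtime` query parameters. The `get_document_url` name and the cluster address are hypothetical. Booleans are written as lowercase `true` and `false` because Python's `str(True)` yields `True`:

```python
from typing import Optional
from urllib.parse import urlencode


def get_document_url(base_url: str, index: str, doc_id: str,
                     refresh: Optional[bool] = None,
                     realtime: Optional[bool] = None) -> str:
    """Append only the flags the caller sets explicitly; omitted flags keep server defaults."""
    params = {}
    if refresh is not None:
        params["refresh"] = "true" if refresh else "false"
    if realtime is not None:
        params["realtime"] = "true" if realtime else "false"
    url = f"{base_url}/{index}/_doc/{doc_id}"
    return f"{url}?{urlencode(params)}" if params else url


# Hypothetical local cluster address. Request a refresh only when fresh data is required.
print(get_document_url("http://localhost:9200", "test-index", "1", refresh=True))
# http://localhost:9200/test-index/_doc/1?refresh=true
```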

### Distributed

When running the Get Document API, OpenSearch first calculates a hash value based on the document ID (or on the `routing` value, if one is provided), which determines the ID of the shard where the document resides. The operation is then redirected to one of the copies of that shard (the primary shard or one of its replica shards), and the result is returned from that copy.

Adding replicas improves the scalability and throughput of Get Document API requests because the load can be distributed across more shard copies.
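
The following Python sketch models the two steps described in this section: hashing the document ID to pick a shard, then choosing one copy of that shard at random. It is a simplification under stated assumptions: `zlib.crc32` stands in for OpenSearch's own hash function, copy `0` represents the primary and higher numbers represent replicas, and the function name `route_get` is hypothetical:

```python
import random
import zlib
from collections import Counter
from typing import Tuple


def route_get(doc_id: str, number_of_shards: int, copies_per_shard: int,
              rng: random.Random) -> Tuple[int, int]:
    """Return (shard, copy) for a GET; copy 0 is the primary, 1 and above are replicas."""
    # Step 1: the document ID decides the shard (stand-in hash, not OpenSearch's).
    shard = zlib.crc32(doc_id.encode("utf-8")) % number_of_shards
    # Step 2: any copy of that shard can serve the read.
    copy = rng.randrange(copies_per_shard)
    return shard, copy


rng = random.Random(7)
# One primary plus two replicas gives three copies to share 3,000 reads of document "1".
load = Counter(route_get("1", 3, 3, rng)[1] for _ in range(3000))
print(sorted(load.items()))  # requests per copy, spread roughly evenly
```

With only a primary, every read of document `1` would land on a single copy; each added replica gives the same reads another place to go.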

### Versioning support

Use the `version` parameter to retrieve a document only if its current version matches the specified version number. This can be useful for ensuring data consistency and preventing conflicts when working with versioned documents.

Internally, when a document is updated in OpenSearch, the original version is marked as deleted, and a new version of the document is added. The original version doesn't disappear from the system immediately. Although you can't access it through the Get Document API, OpenSearch cleans up deleted document versions in the background as you continue indexing new data.
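
The following Python sketch is a toy model of the `version` check, not OpenSearch's implementation. It stores each document as a `(version, source)` pair, increments the version on every update, and raises a hypothetical `VersionConflict` exception when the requested version doesn't match the current one. In OpenSearch itself, a mismatched version produces a version conflict error:

```python
from typing import Any, Dict, Optional, Tuple

Store = Dict[str, Tuple[int, Any]]


class VersionConflict(Exception):
    """Raised when the requested version differs from the current version."""


def update(store: Store, doc_id: str, source: Any) -> int:
    """Replace the document and bump its version; the old version is no longer readable."""
    current = store[doc_id][0] if doc_id in store else 0
    store[doc_id] = (current + 1, source)
    return current + 1


def get_with_version(store: Store, doc_id: str, version: Optional[int] = None) -> Any:
    current, source = store[doc_id]
    if version is not None and version != current:
        raise VersionConflict(f"requested version {version}, current version {current}")
    return source


store: Store = {}
update(store, "1", {"title": "draft"})
update(store, "1", {"title": "final"})
print(get_with_version(store, "1", version=2))  # {'title': 'final'}
```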

## Example request

The following example requests information about the document with the ID `1`:

```json
GET sample-index1/_doc/1
```
{% include copy-curl.html %}


## Example response
```json
59 changes: 59 additions & 0 deletions _api-reference/index-apis/blocks.md
---
layout: default
title: Blocks
parent: Index APIs
nav_order: 6
---

# Blocks
**Introduced 1.0**
{: .label .label-purple }

Use the Blocks API to limit certain operations on a specified index. Different types of blocks restrict write, read, or metadata operations on an index.

For example, when you add a `write` block through the API, OpenSearch ensures that all shards of the index have properly accounted for the block before returning a successful response. Any in-flight write operations to the index must be completed before the `write` block takes effect.

## Path and HTTP methods

```json
PUT /<index>/_block/<block>
```
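
The following Python sketch builds the Blocks API path and rejects block types other than the four valid values listed under the path parameters. The `block_path` helper is hypothetical and only constructs the path; it doesn't send the `PUT` request:

```python
from typing import Iterable
from urllib.parse import quote

VALID_BLOCKS = {"metadata", "read", "read_only", "write"}


def block_path(indexes: Iterable[str], block: str) -> str:
    """Return /<index>/_block/<block> for a comma-delimited list of indexes."""
    if block not in VALID_BLOCKS:
        raise ValueError(f"unsupported block type: {block!r}")
    # Keep wildcard expressions such as "logs-*" readable instead of percent-encoding "*".
    index_part = ",".join(quote(name, safe="*") for name in indexes)
    return f"/{index_part}/_block/{block}"


print(block_path(["test-index"], "write"))  # /test-index/_block/write
```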

## Path parameters

| Parameter | Data type | Description |
| :--- | :--- | :--- |
| `<index>` | String | A comma-delimited list of index names. Wildcard expressions (`*`) are supported. To target all data streams and indexes in a cluster, use `_all` or `*`. Optional. |
| `<block>` | String | Specifies the type of block to apply to the index. Valid values are: <br> `metadata`: Disables all metadata changes, such as closing the index. <br> `read`: Disables any read operations. <br> `read_only`: Disables any write operations and metadata changes. <br> `write`: Disables write operations. However, metadata changes are still allowed. |
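
One way to read the block types in the preceding table is as a mapping from each block to the operation categories it disables. The following Python sketch encodes that simplified reading; the category names `read`, `write`, and `metadata` and the helper `is_allowed` are hypothetical and don't correspond to OpenSearch internals:

```python
from typing import Dict, Iterable, Set

# Operation categories that each block disables, following the table above.
BLOCKED_OPERATIONS: Dict[str, Set[str]] = {
    "metadata": {"metadata"},
    "read": {"read"},
    "read_only": {"write", "metadata"},
    "write": {"write"},
}


def is_allowed(active_blocks: Iterable[str], operation: str) -> bool:
    """An operation is allowed only if no active block disables it."""
    return all(operation not in BLOCKED_OPERATIONS[block] for block in active_blocks)


print(is_allowed({"write"}, "metadata"))      # True
print(is_allowed({"read_only"}, "metadata"))  # False
```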

## Query parameters

The following table lists the available query parameters. All query parameters are optional.

| Parameter | Data type | Description |
| :--- | :--- | :--- |
| `ignore_unavailable` | Boolean | When `false`, the request returns an error when it targets a missing or closed index. Default is `false`. |
| `allow_no_indices` | Boolean | When `false`, the request returns an error if any wildcard expression, index alias, or `_all` value targets only missing or closed indexes, even if the request also targets other open indexes. Default is `true`. |
| `expand_wildcards` | String | The type of index that the wildcard patterns can match. If the request targets data streams, this argument determines whether the wildcard expressions match any hidden data streams. Supports comma-separated values, such as `open,hidden`. Valid values are `all`, `open`, `closed`, `hidden`, and `none`. |
| `cluster_manager_timeout` | Time | The amount of time to wait for a connection to the cluster manager node. Default is `30s`. |
| `timeout` | Time | The amount of time to wait for the request to return. Default is `30s`. |
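
The following Python sketch serializes these optional parameters into a query string for a Blocks API request. The `block_query` helper is hypothetical; it validates `expand_wildcards` against the values listed in the table and writes booleans in lowercase:

```python
from typing import Iterable, Optional
from urllib.parse import urlencode

VALID_WILDCARDS = {"all", "open", "closed", "hidden", "none"}


def block_query(expand_wildcards: Iterable[str] = (),
                ignore_unavailable: Optional[bool] = None,
                allow_no_indices: Optional[bool] = None,
                cluster_manager_timeout: Optional[str] = None,
                timeout: Optional[str] = None) -> str:
    """Serialize only the parameters that were set; omitted ones keep their defaults."""
    params = {}
    wildcards = list(expand_wildcards)
    unknown = set(wildcards) - VALID_WILDCARDS
    if unknown:
        raise ValueError(f"unsupported expand_wildcards values: {sorted(unknown)}")
    if wildcards:
        params["expand_wildcards"] = ",".join(wildcards)
    for name, flag in (("ignore_unavailable", ignore_unavailable),
                       ("allow_no_indices", allow_no_indices)):
        if flag is not None:
            params[name] = "true" if flag else "false"
    for name, duration in (("cluster_manager_timeout", cluster_manager_timeout),
                           ("timeout", timeout)):
        if duration is not None:
            params[name] = duration
    # Leave commas unencoded so that multi-value parameters read like "open,hidden".
    return urlencode(params, safe=",")


print(block_query(expand_wildcards=["open", "hidden"], timeout="1m"))
# expand_wildcards=open,hidden&timeout=1m
```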

## Example request

The following example request disables write operations on the `test-index` index:

```json
PUT /test-index/_block/write
```

## Example response

```json
{
"acknowledged" : true,
"shards_acknowledged" : true,
"indices" : [ {
"name" : "test-index",
"blocked" : true
} ]
}
```
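
To check the outcome programmatically, a client can parse the response body. The following Python sketch, built around a hypothetical `all_blocked` helper, treats the request as successful only when `acknowledged` and `shards_acknowledged` are both `true` and every listed index reports `"blocked" : true`. This is one reasonable check, not a rule defined by the API:

```python
import json

example_response = """
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "indices" : [ {
    "name" : "test-index",
    "blocked" : true
  } ]
}
"""


def all_blocked(body: str) -> bool:
    """Return True only if the request was acknowledged and every index reports blocked."""
    response = json.loads(body)
    indices = response.get("indices", [])
    return (bool(response.get("acknowledged"))
            and bool(response.get("shards_acknowledged"))
            and bool(indices)
            and all(index.get("blocked") is True for index in indices))


print(all_blocked(example_response))  # True
```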