Stateless real-time GET #93976

Merged
merged 5 commits into from
May 25, 2023

Conversation

pxsalehi
Member

@pxsalehi pxsalehi commented Feb 21, 2023

For real-time GET on Stateless, we first need to ask the indexing shard whether it has the document in its Translog. If it does not, we might have to wait on the search shard and then handle the GET locally.
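
A minimal sketch of the flow described above, not the code in this PR. `RealTimeGetSketch`, `IndexingShard`, `SearchShard`, and `realTimeGet` are hypothetical stand-ins that model each shard as an in-memory map; the step where the search shard may have to wait before serving the read is only marked with a comment.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model, not Elasticsearch code: each shard is just an in-memory map.
class RealTimeGetSketch {

    /** Stand-in for the indexing shard: holds recent writes that are not yet searchable. */
    static final class IndexingShard {
        final Map<String, String> translog = new ConcurrentHashMap<>();

        void index(String id, String source) {
            translog.put(id, source);
        }

        Optional<String> getFromTranslog(String id) {
            return Optional.ofNullable(translog.get(id));
        }
    }

    /** Stand-in for the search shard: only sees documents that have been refreshed. */
    static final class SearchShard {
        final Map<String, String> searchable = new ConcurrentHashMap<>();

        // Simulates a refresh: everything in the translog becomes searchable.
        void refreshFrom(IndexingShard indexing) {
            searchable.putAll(indexing.translog);
            indexing.translog.clear();
        }

        Optional<String> getLocally(String id) {
            return Optional.ofNullable(searchable.get(id));
        }
    }

    /** Step 1: translog lookup on the indexing shard. Step 2: local GET on the search shard. */
    static Optional<String> realTimeGet(String id, IndexingShard indexing, SearchShard search) {
        Optional<String> fromTranslog = indexing.getFromTranslog(id);
        if (fromTranslog.isPresent()) {
            return fromTranslog;
        }
        // Assumption: the real flow may need to wait here until the search shard
        // has caught up. That wait is omitted in this sketch.
        return search.getLocally(id);
    }
}
```

In this model, a document indexed but not yet refreshed is answered from the translog, and after `refreshFrom` the same call is answered by the search shard's local lookup.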

Relates ES-5537

@pxsalehi pxsalehi added WIP :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. labels Feb 21, 2023
@pxsalehi pxsalehi changed the title Defer real-time GET to unpromotable shard copies if necessary Defer real-time GET to promotable shard copies if necessary Feb 21, 2023
@pxsalehi pxsalehi added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Feb 21, 2023
@pxsalehi pxsalehi added WIP and removed WIP labels Feb 21, 2023
@henningandersen

This comment was marked as outdated.

@pxsalehi
Member Author

Thanks for the feedback @henningandersen.

I think there is no way around contacting the indexing shard for real-time GET.

So as long as the GET request has its realtime field set to true (which is the default) we'd forward it to the indexing shard?

@pxsalehi
Member Author

> So as long as the GET request has its realtime field set to true (which is the default) we'd forward it to the indexing shard?

We discussed this on another channel with Henning. The disadvantage here would be that this might cause a lot of load on the indexing shards. We'd rather only contact the indexing shards for a Translog lookup.

@pxsalehi

This comment was marked as outdated.

@pxsalehi pxsalehi changed the title Defer real-time GET to promotable shard copies if necessary Stateless real-time GET Mar 3, 2023
@pxsalehi pxsalehi removed the WIP label Mar 6, 2023
@pxsalehi pxsalehi marked this pull request as ready for review March 6, 2023 10:59
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@pxsalehi pxsalehi requested review from henningandersen and removed request for DaveCTurner March 6, 2023 13:10
@pxsalehi
Member Author

pxsalehi commented Mar 7, 2023

@elasticmachine update branch

@pxsalehi pxsalehi requested a review from idegtiarenko March 8, 2023 09:36
@pxsalehi
Member Author

pxsalehi commented Mar 8, 2023

This is ready for review.

elasticsearchmachine pushed a commit that referenced this pull request May 16, 2023
This is a change broken off from the ongoing real-time GET PR (#93976). It is just an action that can be used to invoke the new `ShardGetService.getFromTranslog` in #95736. It will be used on the search shards as a first step to handle a real-time get.

Relates #93976, ES-5537
@pxsalehi pxsalehi force-pushed the ps230220-realTimeGet branch from f53961b to 943a811 Compare May 23, 2023 14:27
@pxsalehi
Member Author

@elasticmachine test this please

hmm... seems like an infra issue!

@pxsalehi pxsalehi force-pushed the ps230220-realTimeGet branch from 943a811 to 6367839 Compare May 23, 2023 16:37
@pxsalehi pxsalehi requested review from tlrx and henningandersen and removed request for tlrx, idegtiarenko, Tim-Brooks and henningandersen May 23, 2023 19:11
Contributor

@henningandersen henningandersen left a comment

LGTM.

@@ -91,6 +110,16 @@ protected void resolveRequest(ClusterState state, InternalRequest request) {
    protected void asyncShardOperation(GetRequest request, ShardId shardId, ActionListener<GetResponse> listener) throws IOException {
        IndexService indexService = indicesService.indexServiceSafe(shardId.getIndex());
        IndexShard indexShard = indexService.getShard(shardId.id());
        if (indexShard.routingEntry() == null) {
Contributor

Can this happen? It looks like we initialize it in the IndexShard constructor and then only update it to non-null values?

Member Author

I'm not entirely sure. I vaguely remember running into an issue in the beginning when I was working on this, and adding this helped. Maybe the issue was something else. I do see that similar checks are done in a couple of other transport actions, e.g., TransportIndicesStatsAction and DataStreamsStatsTransportAction.

Contributor

Yeah, but it looks like both come from TransportIndicesStatsAction, which was written in 2014 and thus on a very different code base. I think we should assume this does not happen, since we assume so in other actions, for instance TransportGetFromTranslogAction, PostWriteRefresh and many others. Unless you know where it can happen, I suggest removing the new null check; we do not want null checks all over for things that cannot be null.

Member Author

I agree!

    if (shardRoutingTable.primaryShard() == null || shardRoutingTable.primaryShard().active() == false) {
        throw new NoShardAvailableActionException(shardId, "primary shard is not active");
    }
    DiscoveryNode node = clusterService.state().nodes().get(shardRoutingTable.primaryShard().currentNodeId());
Contributor

Can we use the same ClusterState instance here rather than asking for it again? That way, the null check below can turn into an assertion (assert node != null).
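
To illustrate the suggestion above, here is a rough sketch that does not use the Elasticsearch types. `ClusterStateSnapshotSketch`, `State`, and `resolvePrimaryNode` are hypothetical names; the snapshot is modelled as an immutable object holding both the primary's node id and the node map. The idea, as the comment suggests, is that once routing and nodes are read from one snapshot, a missing node is an invariant violation rather than a case to handle at runtime.

```java
import java.util.Map;

// Hypothetical model of reading routing and nodes from one immutable snapshot.
class ClusterStateSnapshotSketch {

    /** Stand-in for a cluster state snapshot taken once at the start of the operation. */
    static final class State {
        final String primaryNodeId;      // null when no active primary is assigned
        final Map<String, String> nodes; // node id -> node name

        State(String primaryNodeId, Map<String, String> nodes) {
            this.primaryNodeId = primaryNodeId;
            this.nodes = Map.copyOf(nodes);
        }
    }

    /** Resolves the primary's node using only the snapshot that was passed in. */
    static String resolvePrimaryNode(State state) {
        if (state.primaryNodeId == null) {
            throw new IllegalStateException("primary shard is not active");
        }
        String node = state.nodes.get(state.primaryNodeId);
        // Both values come from the same snapshot, so this is asserted rather than
        // checked at runtime (assertions only fire when the JVM runs with -ea).
        assert node != null : "node of an active primary must be in the same snapshot";
        return node;
    }
}
```

Fetching the state a second time, as in the hunk above, could observe a newer snapshot in which the node has already left, which is why a runtime null check would be needed in that version.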

@pxsalehi pxsalehi added auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) and removed auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) labels May 24, 2023
@pxsalehi
Member Author

@elasticmachine update branch

@pxsalehi pxsalehi added auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) and removed auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) labels May 24, 2023
@pxsalehi
Member Author

@elasticmachine update branch

@pxsalehi pxsalehi added auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) and removed auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) labels May 25, 2023
@pxsalehi pxsalehi merged commit 1762733 into elastic:main May 25, 2023
elasticsearchmachine pushed a commit that referenced this pull request Jun 15, 2023
The mget counterpart of #93976.

Relates ES-5677
Labels
:Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >non-issue Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.9.0

7 participants