Implement basic search with point in time and search after #2847

Merged
graytaylor0 merged 2 commits into opensearch-project:main on Jun 9, 2023

Conversation

graytaylor0
Member

@graytaylor0 graytaylor0 commented Jun 8, 2023

Description

Implements searching via point in time with search after. Results are paginated using the batch_size from the opensearch config (defaults to 1000).

Limitations of this PR:

  • If the buffer is full and Events time out while being written to it, they are currently dropped (data loss). A follow-up PR will handle this robustly by backing off and retrying.
  • The query passed in the config is currently unused, because the opensearch java client does not support passing query strings or query maps directly to search requests ([FEATURE] Send search request query DSL json directly to search request opensearch-java#525). For now, the only supported behavior is processing all documents via a match_all query with the default sort in ascending order.
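
The pagination described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `Hit`, `PitPaginationSketch`, `searchPage`, and `readAll` are hypothetical names, and an in-memory list sorted in ascending order stands in for an index pinned by a point in time. Each request returns up to `batchSize` hits, the sort value of the last hit becomes the search-after value for the next request, and the loop stops on an empty page. The sketch assumes unique sort values and a `batchSize` of at least 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a search hit: a document id plus its sort value.
final class Hit {
    final String id;
    final long sortValue;

    Hit(String id, long sortValue) {
        this.id = id;
        this.sortValue = sortValue;
    }
}

final class PitPaginationSketch {
    // Simulates one search request against a fixed snapshot (the "point in time").
    // Returns up to batchSize hits whose sort value is strictly greater than searchAfter.
    // Assumes the snapshot is already sorted ascending by sortValue and batchSize >= 1.
    static List<Hit> searchPage(List<Hit> snapshot, Long searchAfter, int batchSize) {
        List<Hit> page = new ArrayList<>();
        for (Hit hit : snapshot) {
            if (searchAfter != null && hit.sortValue <= searchAfter) {
                continue; // already returned by an earlier page
            }
            page.add(hit);
            if (page.size() == batchSize) {
                break;
            }
        }
        return page;
    }

    // Pages through the whole snapshot, feeding the last hit's sort value
    // of each page back in as the next searchAfter.
    static List<String> readAll(List<Hit> snapshot, int batchSize) {
        List<String> ids = new ArrayList<>();
        Long searchAfter = null; // the first request has no search-after value
        while (true) {
            List<Hit> page = searchPage(snapshot, searchAfter, batchSize);
            if (page.isEmpty()) {
                break; // final empty page: nothing left to read
            }
            for (Hit hit : page) {
                ids.add(hit.id);
            }
            searchAfter = page.get(page.size() - 1).sortValue;
        }
        return ids;
    }
}
```

With five documents and a batch size of 2, `readAll` issues four requests: three that return hits and a final one that returns an empty page and ends the loop.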

Issues Resolved

Related to #1985

Check List

  • New functionality includes testing.
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


import java.util.List;

public class SearchPointInTimeResponse {
Member

I'd perhaps rename this. At first glance it looks like it is intended to be similar to the opensearch-java APIs. But, it is quite different.

Perhaps: SearchPointInTimeResults or SearchPointInTimeOutput?


return SearchPointInTimeResponse.builder()
.withDocuments(documents)
.withNextSearchAfter(searchResponse.hits().hits().get(searchResponse.hits().hits().size() - 1).sort())
Member

@dlvenable dlvenable Jun 8, 2023

I'm not sure I follow how you get the next search after. Is there no order from the OpenSearch response that could help here?

Also, you have an inline sort(). That should probably be pulled out of this line, as it is somewhat significant.

Member

I see, this is the OpenSearch "sort" model, not a collection sort.


return SearchPointInTimeResponse.builder()
.withDocuments(documents)
.withNextSearchAfter(searchResponse.hits().hits().get(searchResponse.hits().hits().size() - 1).sort())
Member

Can the size be 0? What if your paging results in a final empty page?

Member Author

Yes, it can be. Good catch.
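
One possible way to guard against the empty final page raised in this thread is to resolve the next search-after value through a small helper instead of indexing the last hit inline. The sketch below is hypothetical: `SearchAfterResolver` and `nextSearchAfter` are illustrative names, and the generic list of per-hit sort values stands in for what `searchResponse.hits().hits()` and each hit's `sort()` provide. It is not the change that was merged.

```java
import java.util.List;
import java.util.Optional;

final class SearchAfterResolver {
    // Returns the sort value(s) of the last hit on the page,
    // or Optional.empty() when the page has no hits (e.g. the final empty page).
    static <S> Optional<S> nextSearchAfter(List<S> sortValuesPerHit) {
        if (sortValuesPerHit == null || sortValuesPerHit.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(sortValuesPerHit.get(sortValuesPerHit.size() - 1));
    }
}
```

A caller can then treat an empty `Optional` as the signal to stop paging, which also addresses the earlier remark about pulling the `sort()` lookup out of the builder line.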

@graytaylor0 graytaylor0 mentioned this pull request Jun 8, 2023
4 tasks
kkondaka
kkondaka previously approved these changes Jun 8, 2023
@@ -20,10 +20,10 @@ public class SearchConfiguration {
private static final Logger LOG = LoggerFactory.getLogger(SearchConfiguration.class);

@JsonProperty("batch_size")
- private Integer batchSize;
+ private Integer batchSize = 1000;
Collaborator

nit: use a named constant, e.g.
DEFAULT_BATCH_SIZE = 1000
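
A minimal sketch of what this nit suggests, assuming the constant lives on the configuration class itself. `SearchConfigurationSketch` and `getBatchSize` are hypothetical names, and the Jackson annotation from the diff is shown only as a comment so that the snippet compiles without extra dependencies.

```java
final class SearchConfigurationSketch {
    // Named constant in place of the inline literal 1000.
    static final int DEFAULT_BATCH_SIZE = 1000;

    // In the class under review, this field carries @JsonProperty("batch_size").
    private Integer batchSize = DEFAULT_BATCH_SIZE;

    Integer getBatchSize() {
        return batchSize;
    }
}
```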

Signed-off-by: Taylor Gray <[email protected]>
@dlvenable
Member

I'd also recommend renaming SearchPointInTimeRequest to clearly disambiguate it from the opensearch-java client pattern. But you could easily do this in another PR.

@graytaylor0 graytaylor0 requested a review from kkondaka June 9, 2023 14:52
@graytaylor0 graytaylor0 merged commit 2f979d5 into opensearch-project:main Jun 9, 2023
MaGonzalMayedo pushed a commit to MaGonzalMayedo/data-prepper that referenced this pull request Jun 21, 2023
…h-project#2847)

Implement basic search with point in time and search after

Signed-off-by: Taylor Gray <[email protected]>
Signed-off-by: Marcos_Gonzalez_Mayedo <[email protected]>