Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert to Elasticsearch streaming_bulk helper #2492

Merged
merged 7 commits into from
Oct 21, 2021

Conversation

dbutenhof
Copy link
Member

@dbutenhof dbutenhof commented Oct 12, 2021

This creates a new ElasticBulkBase class providing support for Elasticsearch bulk operations including publish and delete, and which currently shares the query_apis __init__.py with the ElasticBase class.

Resolves #2490

@dbutenhof dbutenhof added bug Server Indexing API Of and relating to application programming interfaces to services and functions labels Oct 12, 2021
@dbutenhof dbutenhof added this to the v0.72 milestone Oct 12, 2021
@dbutenhof dbutenhof self-assigned this Oct 12, 2021
@dbutenhof dbutenhof marked this pull request as ready for review October 13, 2021 21:23
npalaska
npalaska previously approved these changes Oct 14, 2021
Copy link
Member

@npalaska npalaska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

Copy link
Member

@webbnh webbnh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should update the PR description (e.g., remove the references to it being a draft), and the branch needs a rebase...but, this is great stuff!

Of course, I have the usual pile of comments. The principal theme running through them is that this code is evidently evolutionary: there are vestiges of the original code in the new code, and there are vestiges of the subclass in the superclass; I've pointed out the ones that I found. Most of my concerns are centered around documentation and comments, particularly around the json_data parameter, which turns out to be a bit of a swiss-army knife. (It's really more like a "context" parameter.)

So, really, it just needs a little polish and then it will be good to go!

lib/pbench/server/api/resources/query_apis/__init__.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/query_apis/__init__.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/query_apis/__init__.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/query_apis/__init__.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/query_apis/__init__.py Outdated Show resolved Hide resolved
Copy link
Member

@portante portante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it looks good, the details of the bulk handling is being repeated from py-es-bulk. By the time we are done here, there won't be much difference.

webbnh
webbnh previously approved these changes Oct 18, 2021
Copy link
Member

@webbnh webbnh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good: you still need to update the PR description; otherwise, I have just a couple of nits for you.

lib/pbench/server/api/resources/query_apis/__init__.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/query_apis/__init__.py Outdated Show resolved Hide resolved
@dbutenhof dbutenhof dismissed stale reviews from portante and webbnh via 45b2f9b October 20, 2021 11:20
@dbutenhof dbutenhof requested review from portante and webbnh October 20, 2021 11:21
Simplify the pre-update info log to speed things up and avoid
an unnecessary pass through the MAP; the document count will
appear in the post-update message anyway.

Add a clarification that the bulk update operation only changes
the one specified field.
Elasticsearch bulk "action" rather than assuming "update" in
the base class.
@dbutenhof
Copy link
Member Author

@webbnh still has an unresolved conversation.

@portante portante modified the milestones: v0.72, v0.71 Oct 20, 2021
@dbutenhof dbutenhof merged commit 91eefd2 into distributed-system-analysis:main Oct 21, 2021
@dbutenhof dbutenhof deleted the bulk branch October 21, 2021 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Of and relating to application programming interfaces to services and functions bug Indexing Server
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A large dataset cannot be marked as "published"
4 participants