bulk insert batch requests #3628

Merged


michaelcheah
Contributor

What this PR does / why we need it:
The previous implementation loops over the items in a batch request and sends each one to Elasticsearch as an individual update. This PR instead inserts batch requests using bulk updates, which significantly improves performance.
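As a rough illustration of the change described above, the sketch below builds one bulk action per batch item with a generator and hands the whole stream to a bulk function in one call. The names `gen_actions` and `bulk_insert`, the `"{request_id}-item-{num}"` id scheme, and the `"instance"` field layout are assumptions made for this example and are not taken from the PR's code. The bulk function is injected so the sketch runs without a cluster; with the Python client it would be `elasticsearch.helpers.bulk`.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List


def gen_actions(
    index_name: str,
    doc_body: Dict[str, Any],
    instances: List[Any],
    request_id: str,
) -> Iterator[Dict[str, Any]]:
    """Yield one bulk action per item of a batch request (hypothetical layout)."""
    for num, item in enumerate(instances):
        # Shallow copy so each action gets its own top-level dict.
        item_body = dict(doc_body)
        item_body["instance"] = item
        yield {
            "_index": index_name,
            "_id": f"{request_id}-item-{num}",  # assumed id scheme, for illustration
            "_source": item_body,
        }


def bulk_insert(
    bulk_fn: Callable[[Any, Iterable[Dict[str, Any]]], Any],
    client: Any,
    actions: Iterable[Dict[str, Any]],
) -> Any:
    """Send all actions through a single bulk call instead of one call per item."""
    return bulk_fn(client, actions)
```

With a real client this might be called as `bulk_insert(helpers.bulk, es, gen_actions(...))` after `from elasticsearch import helpers`; `helpers.bulk` returns a tuple of the success count and the errors rather than the per-document API responses.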

Special notes for your reviewer:

  • The content of the docs created in Elasticsearch is unchanged.
  • One breaking change is introduced: the response from the request logger is now empty. Previously, a POST request to the request logger returned a response body with the following content (the Elasticsearch API response):
    [{"_id": "abc123", "_index": "inference-log-seldon-seldon-iris-default", "_primary_term": 1, 
    "_seq_no": 2, "_shards": {"failed": 0, "successful": 1, "total": 2}, "_type": "_doc", 
    "_version": 1, "forced_refresh": true, "result": "created"}]
    Now an empty list is returned. A similar-looking response could be constructed by hand, but the Python elasticsearch client does not provide any way to return the same content for bulk updates.
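To make the "constructing it" option above concrete, here is a minimal sketch of building a similar-looking response from the ids that were submitted. The function name `build_bulk_response` and the `"submitted"` result value are hypothetical and are not Elasticsearch output. Fields such as `_seq_no`, `_version` and `_shards` are left out because they cannot be known without the per-document API response.

```python
from typing import Dict, Iterable, List


def build_bulk_response(index_name: str, item_ids: Iterable[str]) -> List[Dict[str, str]]:
    """Construct one response entry per submitted document id.

    This only echoes what the caller already knows; it is not the
    Elasticsearch API response.
    """
    return [
        {"_id": item_id, "_index": index_name, "result": "submitted"}
        for item_id in item_ids
    ]
```

A caller that depends on the old response body could use a list like this to recover the document ids, but should not treat the entries as confirmation that each document was indexed.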

Does this PR introduce a user-facing change?:

The seldon-request-logger POST response for a request with batch data no longer includes the Elasticsearch API response; it returns an empty list.

@seldondev
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign jklaise
You can assign the PR to them by writing /assign @jklaise in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


def gen_data():
    for num, item in enumerate(new_content_part["instance"], start=0):
        print(f"bulk inserting item {num}")
Contributor

Print the item_request_id that is yielded.

def gen_data():
    for num, item in enumerate(new_content_part["instance"], start=0):
        print(f"bulk inserting item {num}")
        item_body = doc_body.copy()
Contributor

is this redundant?


yield {
    "_index": index_name,
    "_type": "_doc",
Contributor

use helper function log_helper.DOC_TYPE_NAME

@axsaucedo axsaucedo merged commit 62d9b55 into SeldonIO:master Oct 2, 2021
seldondev pushed a commit that referenced this pull request Oct 2, 2021
* bulk insert batch requests

* address PR comments
@axsaucedo axsaucedo mentioned this pull request Oct 2, 2021
4 tasks
stephen37 pushed a commit to stephen37/seldon-core that referenced this pull request Dec 21, 2021
* bulk insert batch requests

* address PR comments