Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sagemaker): support batch-transform #6055

Merged
merged 9 commits into from
Sep 28, 2023
Merged

Conversation

deepankarm
Copy link
Contributor

@deepankarm deepankarm commented Sep 25, 2023

Goals

PR in continuation to #6046.

Sagemaker still calls the POST /invocations route during batch-transform but with different content-type & body. To enable both application/json (for inference) and text/csv (for batch-transform, I removed the pydantic model from the path signature, rather use fastapi's Request class and evaluate the body according to the content-type.

For batch-transform,

  • Expect text/csv or application/csv content type.
  • Validate if the format is correct.
  • Convert each row in the csv to input_model (user-defined model in the Executor route).
  • Invoke the route for all inputs together and return the response. This makes sure the values appended to the output csv by sagemaker has a similar format as inference.

Gotchas

  • Sagemaker doesn't support CSV with headers. We follow the same restriction.

  • Rows in the input csv should have all fields defined in the input type (including optional fields like id)

    from docarray import BaseDoc
    
    class TextDoc(BaseDoc):
        text: str

    Following inputs are valid

    1,abcd
    2,efgh
    3,ijkl
    4,mnop
    5,qrst
    6,uvwx
    7,yzab
    8,cdef
    9,ghij
    10,klmn
    

    Following CSV is invalid.

    abcd
    efgh
    ijkl
    mnop
    qrst
    uvwx
    yzab
    cdef
    ghij
    klmn
    

End-to-end tests

I've manually tested the Executor after pushing a model to S3, the Executor image to ECR & running batch-transform jobs for both SingleLine and MultiLine records.


  • check and update documentation. See guide and ask the team.

@github-actions github-actions bot added size/M area/core This issue/PR affects the core codebase area/testing This issue/PR affects testing labels Sep 25, 2023
@deepankarm deepankarm marked this pull request as draft September 25, 2023 14:02
@codecov
Copy link

codecov bot commented Sep 25, 2023

Codecov Report

Attention: 46 lines in your changes are missing coverage. Please review.

Comparison is base (4ea8bb5) 76.93% compared to head (9ff2d57) 76.85%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6055      +/-   ##
==========================================
- Coverage   76.93%   76.85%   -0.09%     
==========================================
  Files         145      145              
  Lines       13889    13916      +27     
==========================================
+ Hits        10686    10695       +9     
- Misses       3203     3221      +18     
Flag Coverage Δ
jina 76.85% <11.53%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
jina/orchestrate/deployments/__init__.py 81.67% <37.50%> (-0.05%) ⬇️
jina/serve/runtimes/worker/http_sagemaker_app.py 0.00% <0.00%> (ø)

... and 7 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@JoanFM JoanFM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u add tests with docker?

@deepankarm
Copy link
Contributor Author

can u add tests with docker?

Added - a7faa6f

@deepankarm deepankarm requested a review from JoanFM September 28, 2023 08:26
@deepankarm deepankarm marked this pull request as ready for review September 28, 2023 08:26
JoanFM
JoanFM previously approved these changes Sep 28, 2023
@JoanFM JoanFM merged commit 67c83c2 into master Sep 28, 2023
@JoanFM JoanFM deleted the feat-sagemaker-batch branch September 28, 2023 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core This issue/PR affects the core codebase area/testing This issue/PR affects testing size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants