
re-enable http generate endpoint #38

Closed

dtrifiro commented Jan 9, 2024

How Has This Been Tested?

Spin up the text-generation server:

text-generation-launcher --model-name ./flan-t5-small-caikit/artifacts

Using curl

curl --header "Content-Type: application/json"  --data '{"inputs": "complete this text"}' localhost:3000/generate

Result:

[{"generated_text":"a sluggish sluggish sluggish sl"}]
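The curl call above can be wrapped in a small script so that other prompts can be sent without hand-editing the JSON body. This is a minimal sketch and is not part of the PR: the `build_payload` and `generate` function names and the `HOST`/`PORT` variables are hypothetical, the defaults simply mirror the localhost:3000 example, and the escaping only handles backslashes and double quotes.

```shell
#!/bin/sh
# Hypothetical helper for manual testing of the /generate endpoint.
# HOST and PORT are assumptions; defaults match the localhost:3000 example above.
HOST="${HOST:-localhost}"
PORT="${PORT:-3000}"

# Build the JSON request body for a prompt.
# Only backslashes and double quotes are escaped; control characters
# such as newlines are NOT handled (a deliberate simplification).
build_payload() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"inputs": "%s"}' "$escaped"
}

# POST the prompt to the server, same headers as the curl example above.
generate() {
  curl --silent --header "Content-Type: application/json" \
       --data "$(build_payload "$1")" \
       "http://${HOST}:${PORT}/generate"
}
```

With the server running, sourcing the script and calling `generate "complete this text"` should send the same request as the curl command above.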

Using httpie

Hit the server using httpie:

http :3000/generate inputs="complete this text"

Result:

HTTP/1.1 200 OK
content-length: 54
content-type: application/json
date: Tue, 09 Jan 2024 16:01:31 GMT
x-inference-time: 175
x-queue-time: 0
x-time-per-token: 8
x-total-time: 175
x-validation-time: 0

[
    {
        "generated_text": "a sluggish sluggish sluggish sl"
    }
]


openshift-ci bot commented Jan 9, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dtrifiro
Once this PR has been reviewed and has the lgtm label, please assign anishasthana for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

dtrifiro (Author) commented Jan 9, 2024

@heyselbi

dtrifiro force-pushed the enable-http-generate-endpoint branch 3 times, most recently from af4ab75 to 1c771cc, January 9, 2024 16:11

dtrifiro (Author) commented Jan 10, 2024

Upstream PR: IBM#22

Signed-off-by: Daniele Trifirò <[email protected]>
openshift-merge-bot bot pushed a commit that referenced this pull request Feb 29, 2024
Removing flash-attention v1 from the server-release image, to speed up build times.

Modifications:
- remove flash-att-v1 build stage from Dockerfile
- remove server/Makefile-flash-att
- create GitHub package for cache image on GitHub container registry
- push full cache image to ghcr.io on push to main (PR merged)
- use cache image from ghcr.io for PR builds
- replace build stages/steps with single `build-and-push` action
- temporarily build dropout_layer_norm and rotary_emb from flash-attention v2

---------

Signed-off-by: Christian Kadner <[email protected]>
dtrifiro closed this May 29, 2024