Sync main and release branches #58
Conversation
This handles the OOM problem with large prefixes by:
- Taking the max prefix cache size into account when running the memory usage estimator, so that a full prefix cache cannot cause an OOM
- Taking the prefix length into consideration when deciding whether a request fits into a batch, to avoid large prefixes triggering unexpectedly large memory allocations (see the sketch after this message)

This includes an API-breaking change to the config: the prefix cache will not be enabled unless a user explicitly sets PREFIX_STORE_PATH to a non-empty value.

Signed-off-by: Joe Runde <[email protected]>
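A minimal sketch of the batch-fit check described above. All names here (`Request`, `fits_in_batch`, `memory_budget_bytes`, `bytes_per_token`) are hypothetical and only illustrate the idea of counting prefix tokens toward a request's memory footprint; they are not the actual TGIS implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Request:
    input_length: int    # tokens in the user prompt
    prefix_length: int   # tokens in the attached prompt prefix (0 if none)

def fits_in_batch(batch: List[Request], new: Request,
                  memory_budget_bytes: int, bytes_per_token: int) -> bool:
    """Count prefix tokens toward each request's footprint so a large
    prefix cannot push the batch past the memory budget."""
    total_tokens = sum(r.input_length + r.prefix_length for r in batch)
    total_tokens += new.input_length + new.prefix_length
    return total_tokens * bytes_per_token <= memory_budget_bytes
```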
Removing flash-attention v1 from the server-release image to speed up build times.

Modifications:
- remove the flash-att-v1 build stage from the Dockerfile
- remove server/Makefile-flash-att
- create a GitHub package for the cache image on the GitHub container registry
- push the full cache image to ghcr.io on push to main (PR merged)
- use the cache image from ghcr.io for PR builds
- replace the build stages/steps with a single `build-and-push` action
- temporarily build dropout_layer_norm and rotary_emb from flash-attention v2

Signed-off-by: Christian Kadner <[email protected]>
Every successful push to main produces new build cache images on ghcr.io, which currently have to be cleaned up manually. Add actions/delete-package-versions after each successful push of a new cache image (implicit).

Signed-off-by: Christian Kadner <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
A previous OOM fix mistakenly left model.word_embeddings unset whenever the prompt cache was disabled, causing inference without a prompt cache to fail. Our tests always set up a prompt cache, so they did not catch this case.

Fix: always invoke self._setup_prompt_encoder() in model.py again (see the sketch after this message).

Result: inference without a prompt cache works.

Signed-off-by: Joe Runde <[email protected]>
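A simplified sketch of the fix, assuming a `Model` class shaped roughly like the one in model.py; the helper names below are illustrative, not the real code.

```python
class Model:
    def __init__(self, config, prompt_cache_enabled: bool = False):
        self.prompt_cache_enabled = prompt_cache_enabled
        self.word_embeddings = None
        # Before the fix, this call was skipped when the prompt cache was
        # disabled, leaving word_embeddings unset and breaking inference.
        # The fix is to always run the prompt-encoder setup.
        self._setup_prompt_encoder()

    def _setup_prompt_encoder(self):
        # Illustrative: load the embedding layer used to embed prompt tokens
        # (and prompt-cache prefixes when a cache is configured).
        self.word_embeddings = self._load_embedding_layer()

    def _load_embedding_layer(self):
        # Placeholder standing in for loading the real model weights.
        return object()
```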
[pull] main from IBM:main
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: heyselbi

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment
/override Build
@heyselbi: /override requires failed status contexts, check run or a prowjob name to operate on.
Only the following failed contexts/checkruns were expected:
If you are trying to override a checkrun that has a space in it, you must put a double quote on the context. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/override build
@heyselbi: Overrode contexts on behalf of heyselbi: build In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/override branch-ci-opendatahub-io-text-generation-inference-main-fast-image-mirror branch-ci-opendatahub-io-text-generation-inference-main-images ci/prow/images ci/prow/pr-image-mirror
@heyselbi: Overrode contexts on behalf of heyselbi: branch-ci-opendatahub-io-text-generation-inference-main-fast-image-mirror, branch-ci-opendatahub-io-text-generation-inference-main-images, ci/prow/images, ci/prow/pr-image-mirror In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/lgtm
#### Motivation

The ibm_fms engine is being too strict about the "variant" selection. The FMS get_model call requires the variant, but also supports overriding configuration parameters via kwargs.

#### Modifications

Allow support for additional FMS Calico models that may differ from the default variant config, such as in the size of the vocab (see the sketch after this message).

Signed-off-by: Travis Johnson <[email protected]>
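A rough sketch of the call pattern this change enables. The `get_model` signature, the architecture and variant names, and the `src_vocab_size` field are assumptions about the ibm-fms API, used here for illustration only.

```python
from fms.models import get_model

# Start from the closest default variant, then override the config fields
# that differ for this particular Calico checkpoint (e.g. a larger vocab).
model = get_model(
    architecture="llama",       # illustrative architecture name
    variant="7b",               # closest default variant
    src_vocab_size=49152,       # hypothetical override; depends on the model
)
```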
Motivation
[Describe why this change is needed]
Modifications
[Describe the code changes]
Result
[Describe how the changes affects existing behavior and how to test it]
Related Issues
[Resolves #123]