Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync main and release branches #58

Merged
merged 6 commits into from
Feb 29, 2024
Merged

Sync main and release branches #58

merged 6 commits into from
Feb 29, 2024

Conversation

heyselbi
Copy link

Motivation

[Describe why this change is needed]

Modifications

[Describe the code changes]

Result

[Describe how the changes affects existing behavior and how to test it]

Related Issues

[Resolves #123]

joerunde and others added 6 commits February 27, 2024 15:13
This handles the OOM problem with large prefixes by both:

- Taking the max prefix cache size into account when running the memory usage estimator, to ensure a full prefix cache does not cause an OOM
- Taking the prefix length into consideration when deciding if a request will fit into a batch, to avoid large prefixes causing unexpected large memory allocations

This includes an api breaking change to the config, as the prefix cache will not be enabled unless a user explicitly sets PREFIX_STORE_PATH to some non-empty value.

Signed-off-by: Joe Runde <[email protected]>
Removing flash-attention v1 from the server-release image, to speed up build times.

Modifications:
- remove flash-att-v1 build stage from Dockerfile
- remove server/Makefile-flash-att
- create GitHub package for cache image on GitHub container registry
- push full cache image to ghcr,io on push to main (PR merged) 
- use cache image from ghcr.io for PR builds
- replace build stages/step with single `build-and-push` action
- temporarily build dropout_layer_norm and rotary_emb from flash-attention v2

---------

Signed-off-by: Christian Kadner <[email protected]>
Every successful push to main produces new build cache images on ghcr.io
which currently have to be cleaned up manually.

Add actions/delete-package-versions after successful push of a new
cache image (implicit).

---------

Signed-off-by: Christian Kadner <[email protected]>
A previous OOM fix mistakenly left model.word_embeddings always unset
when the prompt cache was disabled. This causes inference without a
prompt cache to fail. Our tests always set up a prompt cache so they did
not catch this case.

Fix: Always invoke self._setup_prompt_encoder() in model.py again.

Result: Inference without a prompt cache works.

-----

Signed-off-by: Joe Runde <[email protected]>
[pull] main from IBM:main
Copy link

openshift-ci bot commented Feb 29, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: heyselbi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@heyselbi
Copy link
Author

/override Build

Copy link

openshift-ci bot commented Feb 29, 2024

@heyselbi: /override requires failed status contexts, check run or a prowjob name to operate on.
The following unknown contexts/checkruns were given:

  • Build

Only the following failed contexts/checkruns were expected:

  • branch-ci-opendatahub-io-text-generation-inference-main-fast-image-mirror
  • branch-ci-opendatahub-io-text-generation-inference-main-images
  • build
  • ci/prow/images
  • ci/prow/pr-image-mirror
  • integration-tests
  • pull-ci-opendatahub-io-text-generation-inference-main-images
  • pull-ci-opendatahub-io-text-generation-inference-main-pr-image-mirror
  • test-python
  • tide

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.

In response to this:

/override Build

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@heyselbi
Copy link
Author

/override build

Copy link

openshift-ci bot commented Feb 29, 2024

@heyselbi: Overrode contexts on behalf of heyselbi: build

In response to this:

/override build

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@heyselbi
Copy link
Author

/override branch-ci-opendatahub-io-text-generation-inference-main-fast-image-mirror branch-ci-opendatahub-io-text-generation-inference-main-images ci/prow/images ci/prow/pr-image-mirror

Copy link

openshift-ci bot commented Feb 29, 2024

@heyselbi: Overrode contexts on behalf of heyselbi: branch-ci-opendatahub-io-text-generation-inference-main-fast-image-mirror, branch-ci-opendatahub-io-text-generation-inference-main-images, ci/prow/images, ci/prow/pr-image-mirror

In response to this:

/override branch-ci-opendatahub-io-text-generation-inference-main-fast-image-mirror branch-ci-opendatahub-io-text-generation-inference-main-images ci/prow/images ci/prow/pr-image-mirror

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Xaenalt
Copy link
Member

Xaenalt commented Feb 29, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Feb 29, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit d8c81a3 into release Feb 29, 2024
8 of 10 checks passed
openshift-merge-bot bot pushed a commit that referenced this pull request Mar 21, 2024
#### Motivation

The ibm_fms engine is being too strict on the "variant" selection. The
FMS get_model call requires the variant, but also supports overriding
configuration parameters via kwargs.

#### Modifications

Allows for support for additional FMS Calico models that may have
differences from the default variant config, such as the size of the
vocab.

Signed-off-by: Travis Johnson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants