Sync main and release branches #39

Merged
merged 42 commits into release from main
Jan 12, 2024

Conversation


@heyselbi commented Jan 9, 2024

Description

Sync the main and release branches so that the release branch carries the most up-to-date stable image.

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that they work.

tjohnson31415 and others added 30 commits November 14, 2023 17:47
Addresses CVE-2023-45803

Also fixes a breakage due to the removed tests dir in newer miniconda
When using the download-weights CLI command and specifying a single extension.

This is used for slow tokenizers, which can subsequently be converted to fast tokenizers.
This runs a series of tests to ensure consistency of output when the same input is included in a (padded) batch, as well as when batches are modified via pruning and concatenation operations while requests are in progress.
Adapted from corresponding changes to HF TGI (pre license-change)

Co-authored-by: Nicolas Patry <[email protected]>
Co-authored-by: Jamie Yang <[email protected]>
Co-authored-by: Travis Johnson <[email protected]>
Addresses vulnerability WS-2023-0366
Reported in Twistlock scan
Advisory: GHSA-v8gr-m533-ghj9

Co-authored-by: Nick Hill <[email protected]>
This PR adds exllamav2 kernels.

The added changes are adapted from two open source repositories:
- https://github.com/turboderp/exllamav2
- https://github.com/PanQiWei/AutoGPTQ

Co-authored-by: Nick Hill <[email protected]>
This pull request (mostly) ports the heterogeneous next token chooser, which is used for flash models in TGI, into Causal LM.

Co-authored-by: Alex Brooks <[email protected]>
Co-authored-by: Travis Johnson <[email protected]>
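
As a rough illustration of the heterogeneous idea ported here, below is a minimal sketch of a per-request logits warper in PyTorch. The class name and construction are assumptions for illustration only, not the actual TGI or Causal LM implementation:

```python
import torch

class HeterogeneousTemperature:
    """Minimal sketch: each request (row) in the batch gets its own temperature."""

    def __init__(self, temperatures, dtype=torch.float32, device="cpu"):
        # Shape [batch, 1] so it broadcasts across the vocabulary dimension.
        self.temperature = torch.tensor(
            temperatures, dtype=dtype, device=device
        ).unsqueeze(1)

    def __call__(self, scores: torch.Tensor) -> torch.Tensor:
        # scores: [batch, vocab]; divide each row by that request's temperature.
        return scores / self.temperature

# Usage: apply per-request temperatures of 0.7, 1.0, and 1.3 to a batch of 3:
# warped = HeterogeneousTemperature([0.7, 1.0, 1.3])(logits)
```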
Also updated patched transformers files with upstream updates
Inadvertently moved within the gptq-only block
From recent code observations
To avoid CPU-intensive tokenization on async event loop.

Determine thread pool size based on number of CPU cores and shard processes.

Also validate stop sequence lengths based on number of bytes rather than number of tokens (the latter doesn't make sense since we don't do token-based matching).

And add a couple of integration tests.
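
A minimal sketch of the approach described in this commit message; the helper names (`make_tokenize_pool`, `tokenize_async`, `validate_stop_sequence`), the pool-sizing heuristic, and the byte limit are illustrative assumptions, not the server's actual code:

```python
import asyncio
import os
from concurrent.futures import ThreadPoolExecutor

def make_tokenize_pool(shard_count: int) -> ThreadPoolExecutor:
    # Illustrative sizing: split the available CPU cores across shard processes.
    cpu_count = os.cpu_count() or 1
    workers = max(1, cpu_count // max(1, shard_count))
    return ThreadPoolExecutor(max_workers=workers)

async def tokenize_async(pool: ThreadPoolExecutor, tokenizer, text: str):
    # Run CPU-intensive tokenization off the asyncio event loop so it cannot
    # block concurrent request handling.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, tokenizer.encode, text)

def validate_stop_sequence(seq: str, max_bytes: int = 240) -> None:
    # Validate by byte length rather than token count, since stop sequences are
    # matched against decoded text, not tokens.
    if len(seq.encode("utf-8")) > max_bytes:
        raise ValueError(f"stop sequence exceeds {max_bytes} bytes")
```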
njhill and others added 10 commits November 14, 2023 18:35
Inadvertently broken by new dtype positional arg added to Batch.from_pb()
Since the decoding vectorization changes, pad tokens are also passed to the repetition penalty processor, which produces incorrect penalties in the case where the pad token id is equal to the EOS token id.

This bug was found when testing with the `EleutherAI/gpt-neox-20b` model in TGIS. Having pad token id == eos token id does not seem to be that common, but it is also the fallback if the pad token cannot be found another way.

There's also a small optimization in this PR: passing a view over `all_input_ids_tensor` into `next_token_chooser` to avoid processing all of the pre-allocated output slots that contain the pad token.

Signed-off-by: Travis Johnson <[email protected]>
Signed-off-by: Vaibhav Jain <[email protected]>
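
To illustrate the fix, here is a minimal, standalone sketch of a repetition penalty that excludes pad slots from the penalty; the real change passes a view over `all_input_ids_tensor` instead, so this function is an approximation of the idea, not the actual code:

```python
import torch

def apply_repetition_penalty(scores, input_ids, penalty, pad_token_id):
    # scores: [batch, vocab] next-token logits.
    # input_ids: [batch, seq] token history, including pre-allocated pad slots.
    # Count real (non-pad) occurrences of each vocab id per row, so pad slots
    # never contribute to the penalty even when pad_token_id == eos_token_id.
    real = (input_ids != pad_token_id).to(scores.dtype)
    counts = torch.zeros_like(scores)
    counts.scatter_add_(1, input_ids, real)
    seen = counts > 0
    penalized = torch.where(scores < 0, scores * penalty, scores / penalty)
    return torch.where(seen, penalized, scores)
```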

openshift-ci bot commented Jan 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: heyselbi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot added the approved label Jan 9, 2024
@dtrifiro

Needs #40 to build successfully

deps: bump optimum to 1.16.1
@dtrifiro merged commit 23ac6c1 into release Jan 12, 2024
4 of 5 checks passed
openshift-merge-bot bot pushed a commit that referenced this pull request Feb 29, 2024
This handles the OOM problem with large prefixes by both:

- Taking the max prefix cache size into account when running the memory usage estimator, to ensure a full prefix cache does not cause an OOM
- Taking the prefix length into consideration when deciding if a request will fit into a batch, to avoid large prefixes causing unexpected large memory allocations

This includes an API-breaking change to the config: the prefix cache will not be enabled unless a user explicitly sets PREFIX_STORE_PATH to a non-empty value.

Signed-off-by: Joe Runde <[email protected]>
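
A minimal sketch of the admission logic described above, under stated assumptions; the function name, the token-budget arithmetic, and the way PREFIX_STORE_PATH is read are illustrative, not the actual implementation:

```python
import os

# The prefix cache is only enabled when PREFIX_STORE_PATH is explicitly set
# to a non-empty value.
PREFIX_CACHE_ENABLED = bool(os.getenv("PREFIX_STORE_PATH", ""))

def request_fits(batch_token_budget: int, used_tokens: int,
                 input_len: int, max_new_tokens: int, prefix_len: int) -> bool:
    # Count the attached prefix tokens toward the request's footprint so a
    # large prefix cannot trigger an unexpectedly large memory allocation.
    needed = input_len + max_new_tokens
    if PREFIX_CACHE_ENABLED:
        needed += prefix_len
    return used_tokens + needed <= batch_token_budget
```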