[Frontend] API support for beam search for MQLLMEngine #9117

Merged
merged 13 commits into from
Oct 8, 2024

LunrEclipse
Contributor

Adds support for running beam search as a higher-level layer through the OpenAI API server when frontend multiprocessing is enabled.

Manual testing was conducted to verify that requests still run in parallel and that the output is correct.

server side:

$ vllm serve meta-llama/Meta-Llama-3-8B

client side:

Completion

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="key123",
)

prompt = "Capital of France is"

try:
    completion = client.completions.create(
        model="meta-llama/Meta-Llama-3-8B",
        prompt=prompt,
        max_tokens=4,
        extra_body={'use_beam_search': True, 'best_of': 3}
    )
    print(completion.choices[0].text)
except Exception as e:
    print(e)

Chat

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="key123",
)

prompt = "Capital of France is"

try:
    completion = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3-8B",
        messages=[
            {"role": "system", "content": "You are a helpful AI assistant."},
            {"role": "user", "content": prompt}
        ],
        max_tokens=10,
        extra_body={'use_beam_search': True, 'best_of': 3, 'temperature': 0}
    )
    print(completion)
    print(completion.choices[0].message.content)
except Exception as e:
    print(e)

@LunrEclipse LunrEclipse marked this pull request as ready for review October 7, 2024 19:24
@youkaichao
Member

and also update

except BadRequestError as e:

Member

@youkaichao youkaichao left a comment

this is great! left some comments.

elif isinstance(self.engine_client, MQLLMEngineClient):
    generator = self.engine_client.beam_search(
        prompt_inputs["prompt_token_ids"], request_id_item,
        sampling_params, lora_request)
Collaborator

Is beam search supported with LoRA?

It seems like it does not work through the AsyncLLMEngine, since we aren't passing the lora_request parameter there. So I think you could collapse these two cases, since lora_request will always be None when passed to MQLLMEngine.

prompt_inputs["prompt_token_ids"], request_id_item,
sampling_params)
" with AsyncLLMEngine and MQLLMEngineClient."
" please add `--disable-frontend-multiprocessing`"
Collaborator

--disable-frontend-multiprocessing will no longer resolve this error. There also is no other case here so this could be an assert.

" please add `--disable-frontend-multiprocessing`"
" to use beam search.")
assert isinstance(self.engine_client,
(AsyncLLMEngine,
Collaborator

On second thought, you don't actually need the assert.

You can add beam_search to the EngineClientProtocol.

Contributor Author

@LunrEclipse LunrEclipse Oct 7, 2024

I don't think the base EngineClientProtocol has a beam_search function. I can add it in though.

Collaborator

It does not. But the EngineClientProtocol defines the behavior of AsyncLLMEngine and MQLLMEngine, so now that both support it, you can expand EngineClientProtocol to include the beam_search API.

Member

I cannot find EngineClientProtocol. Does it exist now? @robertgshaw2-neuralmagic

Contributor Author

I think it's this?

Member

Okay, it should be the EngineClient class. But MQLLMEngineClient does not inherit from EngineClient. We can make it a future step to absorb the beam search implementation into EngineClient.

Collaborator

EngineClient is a protocol. MQLLMEngine should inherit from this. If it doesn’t, I’ll submit a PR to make it (since we support the full API). On train so AFK

Either way, we are about to collapse MQLLMEngine and AsyncLLMEngine once we have PP working, so the concept of an EngineClient will be removed once this is done

Member

Sounds good. I'll go ahead and merge this PR after it is ready. After you make MQLLMEngine inherit from EngineClient, we can merge the separate beam search implementations into one place.

Member

Given that it's currently a Protocol, MQLLMEngine technically doesn't need to subclass it directly. But it would probably be good to anyhow and actually we could consider changing it to an ABC instead.

I agree with @robertgshaw2-neuralmagic that this method should just be added to EngineClient though and we should not need these type assertions.

Not directly related to this PR but I also think we should consider renaming it to something like AsyncEngineClient, and have a way to obtain an instance of an AsyncEngineClient which doesn't involve explicit construction. And that would replace explicit use of AsyncLLMEngine.
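
The point about Protocols can be illustrated with a minimal sketch. The names `EngineClientSketch` and `FakeMQClient`, and the simplified `beam_search` signature, are hypothetical and are not vLLM's actual API; the sketch only shows that a class can satisfy a `typing.Protocol` structurally without subclassing it, so the caller needs no `isinstance` checks against concrete engine classes.

```python
import asyncio
from typing import AsyncIterator, List, Protocol, runtime_checkable


@runtime_checkable
class EngineClientSketch(Protocol):
    """Hypothetical stand-in for the EngineClient protocol discussed above."""

    def beam_search(self, prompt_token_ids: List[int], request_id: str,
                    beam_width: int) -> AsyncIterator[List[int]]:
        ...


class FakeMQClient:
    """Does not subclass the protocol; it only matches its shape."""

    async def beam_search(self, prompt_token_ids: List[int], request_id: str,
                          beam_width: int) -> AsyncIterator[List[int]]:
        # Toy behaviour: yield the prompt extended by one dummy token per beam.
        for beam in range(beam_width):
            yield prompt_token_ids + [beam]


async def run_beam_search(client: EngineClientSketch) -> List[List[int]]:
    # The caller depends only on the protocol, not on a concrete engine class.
    return [out async for out in client.beam_search([1, 2], "req-0", 2)]


results = asyncio.run(run_beam_search(FakeMQClient()))
print(results)
```

Switching the protocol to an ABC, as suggested above, would instead require `FakeMQClient` to inherit from it explicitly.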

Member

changes are welcome on this part!

length_penalty)

- tokenizer = await self.get_tokenizer(lora_request)
+ tokenizer = await self.get_tokenizer(None)
Collaborator

I would add some sort of kwarg to show why it is None here

vllm/utils.py Outdated
@@ -1380,3 +1412,12 @@ def get_beam_search_score(
        seq_len -= 1

    return cumulative_logprob / (seq_len**length_penalty)


def create_sort_beams_key_function(tokenizer, length_penalty):
Member

Suggested change
- def create_sort_beams_key_function(tokenizer, length_penalty):
+ def create_sort_beams_key_function(eos_token_id: int, length_penalty: float):
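
As a rough sketch of what the suggested signature enables: the key function only needs the EOS token ID, not the whole tokenizer. The `Beam` class is a hypothetical stand-in for `BeamSearchSequence`, and the `if` guarding `seq_len -= 1` is an assumption, since that condition is not visible in the hunk above. Only the final `return` expression is taken from the diff.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Beam:
    # Hypothetical stand-in for BeamSearchSequence.
    tokens: List[int]
    cum_logprob: float = 0.0


def get_beam_search_score(tokens: List[int], cumulative_logprob: float,
                          eos_token_id: int,
                          length_penalty: float = 1.0) -> float:
    seq_len = len(tokens)
    # Assumed condition: a trailing EOS token does not count towards length.
    if tokens[-1] == eos_token_id:
        seq_len -= 1
    return cumulative_logprob / (seq_len**length_penalty)


def create_sort_beams_key_function(
        eos_token_id: int, length_penalty: float) -> Callable[[Beam], float]:

    def sort_beams_key(beam: Beam) -> float:
        return get_beam_search_score(beam.tokens, beam.cum_logprob,
                                     eos_token_id, length_penalty)

    return sort_beams_key


EOS = 2  # assumed EOS id for this toy example
beams = [Beam([5, 6, 7, EOS], -3.0), Beam([5, 6, 8, 9], -2.0)]
ranked = sorted(beams,
                key=create_sort_beams_key_function(EOS, 1.0),
                reverse=True)
print([b.tokens for b in ranked])
```

Passing a plain `int` keeps the helper free of any tokenizer dependency, which also makes it easier to move between modules.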

prompt_inputs["prompt_token_ids"], request_id_item,
sampling_params)
prompt_inputs["prompt_token_ids"],
request_id,
Member

What's the difference between request_id and request_id_item?

Contributor Author

Ahh yeah, it's supposed to be request_id_item. I was a bit careless when I was doing some refactoring. I think completion supports multiple prompts, so each prompt has its own request_id_item under a general request_id.
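
The relationship between the two IDs can be sketched as follows. The helper name `assign_request_ids` and the `-{index}` suffix are assumptions made for illustration; the thread only states that each prompt gets its own `request_id_item` under one `request_id`.

```python
from typing import List, Tuple


def assign_request_ids(request_id: str,
                       prompts: List[str]) -> List[Tuple[str, str]]:
    # One sub-ID per prompt; the "-{index}" suffix is an assumed naming scheme.
    return [(f"{request_id}-{i}", prompt) for i, prompt in enumerate(prompts)]


items = assign_request_ids("cmpl-abc", ["Capital of France is", "2 + 2 ="])
for request_id_item, prompt in items:
    print(request_id_item, "->", prompt)
```

Using the shared `request_id` for every prompt would make the per-prompt generators indistinguishable to the engine, which is why the per-item ID matters here.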

vllm/utils.py Outdated
Comment on lines 1367 to 1397
@dataclass
class BeamSearchSequence:
    """A sequence for beam search.
    It keeps track of the tokens and the log probability of the sequence.
    The text field is optional and will only be filled when the sequence is
    about to be returned to the user.
    """
    # The tokens includes the prompt.
    tokens: List[int]
    cum_logprob: float = 0.0
    text: Optional[str] = None


@dataclass
class BeamSearchOutput:
    """The output of beam search.
    It contains the list of the best beam search sequences.
    The length of the list is equal to the beam width.
    """
    sequences: List[BeamSearchSequence]


class BeamSearchInstance:

    def __init__(self, prompt_tokens: List[int]):
        self.beams: List[BeamSearchSequence] = [
            BeamSearchSequence(tokens=prompt_tokens)
        ]
        self.completed: List[BeamSearchSequence] = []
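
To show how these classes might be driven, here is a simplified sketch of one expansion step. The `toy_next_token_logprobs` function is a hypothetical stand-in for a model call, and ranking by raw cumulative log probability is a simplification; it is not the PR's actual loop, which goes through the engine and a length-penalised score.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BeamSearchSequence:
    tokens: List[int]
    cum_logprob: float = 0.0
    text: Optional[str] = None


class BeamSearchInstance:

    def __init__(self, prompt_tokens: List[int]):
        self.beams: List[BeamSearchSequence] = [
            BeamSearchSequence(tokens=prompt_tokens)
        ]
        self.completed: List[BeamSearchSequence] = []


def toy_next_token_logprobs(tokens: List[int]) -> Dict[int, float]:
    # Hypothetical model: a fixed distribution over tokens 0 (EOS), 1 and 2.
    return {0: math.log(0.2), 1: math.log(0.5), 2: math.log(0.3)}


def step(instance: BeamSearchInstance, beam_width: int,
         eos_token_id: int = 0) -> None:
    candidates: List[BeamSearchSequence] = []
    for beam in instance.beams:
        for token, logprob in toy_next_token_logprobs(beam.tokens).items():
            seq = BeamSearchSequence(tokens=beam.tokens + [token],
                                     cum_logprob=beam.cum_logprob + logprob)
            # Finished sequences leave the active set.
            if token == eos_token_id:
                instance.completed.append(seq)
            else:
                candidates.append(seq)
    # Keep only the best `beam_width` active beams.
    candidates.sort(key=lambda s: s.cum_logprob, reverse=True)
    instance.beams = candidates[:beam_width]


instance = BeamSearchInstance(prompt_tokens=[7])
step(instance, beam_width=2)
step(instance, beam_width=2)
print(instance.beams[0].tokens, len(instance.completed))
```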


Member

Move them to vllm/sequence.py?

Contributor Author

There's a circular import error if I put these classes in vllm/sequence.py: BeamSearchSequence is needed in vllm/utils.py, but vllm/sequence.py indirectly imports from vllm/utils.py.

Member

Logically, vllm/utils.py should not import vllm/sequence.py. We should change the code if this is the case.

Contributor Author

@LunrEclipse LunrEclipse Oct 7, 2024

Should I also move create_sort_beams_key_function to vllm/sequence.py? That will solve the issue.

Member

sure, go ahead!

Member

I'm not sure whether these should go in sequence.py, since that holds "internal" data structures used within the scheduler etc. and, if I understand correctly, BeamSearchSequence etc. are only used in the outer layer(s). Maybe better to have a dedicated file in an appropriate place in the tree for this?

Member

> Maybe better to have a dedicated file in an appropriate place in the tree for this?

Makes sense. How about vllm/beam_search.py?

@youkaichao youkaichao added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 7, 2024
@youkaichao youkaichao enabled auto-merge (squash) October 7, 2024 22:54
@youkaichao
Member

code LGTM, and we can merge after all tests pass. thanks again for the great work @LunrEclipse !

@njhill
Member

njhill commented Oct 8, 2024

I added some comments, might be good to address those first (unless you guys disagree with them).

@njhill njhill disabled auto-merge October 8, 2024 00:49
@youkaichao
Member

After a DM with @njhill, this is good to merge now. We can also add a follow-up PR if needed.

@youkaichao youkaichao enabled auto-merge (squash) October 8, 2024 05:06
@youkaichao youkaichao merged commit 8c74622 into vllm-project:main Oct 8, 2024
59 checks passed
@LunrEclipse LunrEclipse deleted the beam-search-mq-refactor branch October 8, 2024 05:52
shajrawi pushed a commit to ROCm/vllm that referenced this pull request Oct 9, 2024
@nFunctor
Contributor

nFunctor commented Oct 10, 2024

Hi, I'm currently testing this new beam search with the nightly build.

  • The logs in the server are quite extensive, is this intended? They are not suppressed by --disable-log-requests
  • If I understand correctly, the num_beams is now the n of completion request, and not the "best_of" that used to be in extra-body? Your example uses best_of of the extra body but when I read the code for BeamSearchParams it seems to rely on n.

Sorry if those are to be addressed anyway, I am trying to figure out how it works before we get a new vllm release. Thanks!

PS: I should probably have asked this in #9087, sorry. Still, that's your PR as well.

@LunrEclipse LunrEclipse restored the beam-search-mq-refactor branch October 10, 2024 21:15
@youkaichao
Member

> The logs in the server are quite extensive, is this intended? They are not suppressed by --disable-log-requests

Every beam search request will join and leave the request queue again and again, so this is expected. Using --disable-log-requests is recommended.

> If I understand correctly, the num_beams is now the n of completion request, and not the "best_of" that used to be in extra-body?

I think so. @LunrEclipse can you confirm it?
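
If the reading above is right, a client would set the beam width through `n` rather than `best_of`. The following is only a sketch under that assumption, which the thread does not confirm. The helper name `beam_search_request_kwargs` is hypothetical, and which parameter a given vLLM version actually honours should be checked against that version.

```python
from typing import Any, Dict


def beam_search_request_kwargs(prompt: str, beam_width: int,
                               max_tokens: int = 4) -> Dict[str, Any]:
    # Assumption from the discussion above: the beam width is read from `n`.
    return {
        "model": "meta-llama/Meta-Llama-3-8B",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "n": beam_width,
        "extra_body": {"use_beam_search": True},
    }


kwargs = beam_search_request_kwargs("Capital of France is", beam_width=3)
print(kwargs)
# With a server running, these could be passed on as:
#   client.completions.create(**kwargs)
```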

@nFunctor
Contributor

@youkaichao well, what I meant is that --disable-log-requests does not help at all. I think what I see is logging.info(output) that is placed after output = [x[0] for x in output] in the code.

I am trying to make a PR for stop logic where I suppress this logging in passing.

opus24 added a commit to Hyper-Accel/vllm that referenced this pull request Oct 11, 2024
Author: Simon Mo <[email protected]>
Date:   Thu Oct 10 12:29:24 2024 -0700

    Suggest codeowners for the core componenets (#9210)

commit 21efb603f5f88a0d78ad11e4fbc6e18fe83916d4
Author: jordanyono <[email protected]>
Date:   Thu Oct 10 14:18:18 2024 -0400

    [CI/Build] Make the `Dockerfile.cpu` file's  `PIP_EXTRA_INDEX_URL` Configurable as a Build Argument (#9252)

commit 055f3270d40bbc492630d0f2c96ec8b64823ba34
Author: Rafael Vasquez <[email protected]>
Date:   Thu Oct 10 13:48:51 2024 -0400

    [Doc] Improve debugging documentation (#9204)

    Signed-off-by: Rafael Vasquez <[email protected]>

commit 18511aeda64b473314bb7727a97a220565e0af41
Author: Lucas Wilkinson <[email protected]>
Date:   Thu Oct 10 13:39:56 2024 -0400

    [Bugfix] Fix Machete unittests failing with `NotImplementedError` (#9218)

commit 83ea5c72b9a287b65c9f7b95fbd868b3f613e6f5
Author: Ilya Lavrenov <[email protected]>
Date:   Thu Oct 10 21:18:58 2024 +0400

    [OpenVINO] Use torch 2.4.0 and newer optimim version (#9121)

    Co-authored-by: DarkLight1337 <[email protected]>

commit 04de9057ab8099291e66ad876e78693c7c2f2ce5
Author: whyiug <[email protected]>
Date:   Thu Oct 10 23:00:47 2024 +0800

    [Model] support input image embedding for minicpmv (#9237)

commit 07c11cf4d4b9a913fa52142fe134849f1e25e393
Author: Isotr0py <[email protected]>
Date:   Thu Oct 10 21:11:56 2024 +0800

    [Bugfix] Fix lm_head weights tying with lora for llama (#9227)

commit f3a507f1d31e13a99c4fc8ac02738a73c3e3136f
Author: sroy745 <[email protected]>
Date:   Wed Oct 9 23:17:17 2024 -0700

    [Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149)

commit a64e7b940734b68d849ed2b07ca1bc3824713555
Author: Lucas Wilkinson <[email protected]>
Date:   Thu Oct 10 02:16:17 2024 -0400

    [Bugfix] Machete garbage results for some models (large K dim) (#9212)

commit ce00231a8bfb5eae85167b5a3def1b7304c723b6
Author: Michael Goin <[email protected]>
Date:   Thu Oct 10 02:15:40 2024 -0400

    [Bugfix] Fix Weight Loading Multiple GPU Test - Large Models (#9213)

commit de895f1697d22ea19a5a4d4ab3dc17037a3e9af3
Author: youkaichao <[email protected]>
Date:   Wed Oct 9 21:58:27 2024 -0700

    [misc] improve model support check in another process (#9208)

commit cf25b93bddb607077e52cbe4681332ca61aff189
Author: Russell Bryant <[email protected]>
Date:   Thu Oct 10 00:10:09 2024 -0400

    [Core] Fix invalid args to _process_request (#9201)

    Signed-off-by: Russell Bryant <[email protected]>

commit d5fbb8706d2c7fd00b64cff2efbe7c771fe82c3c
Author: Michael Goin <[email protected]>
Date:   Wed Oct 9 14:51:47 2024 -0400

    [CI/Build] Update Dockerfile install+deploy image to ubuntu 22.04 (#9130)

    Co-authored-by: DarkLight1337 <[email protected]>

commit cdca8994bd856a234112875a92746c5782837768
Author: Russell Bryant <[email protected]>
Date:   Wed Oct 9 13:15:28 2024 -0400

    [CI/Build] mypy: check vllm/entrypoints (#9194)

    Signed-off-by: Russell Bryant <[email protected]>

commit ca77dd7a44f2bc103c668560818918ac0335835a
Author: Li, Jiang <[email protected]>
Date:   Thu Oct 10 00:28:08 2024 +0800

    [Hardware][CPU] Support AWQ for CPU backend (#7515)

commit 7dea289066eaed35538e74dfadafd1fea1dbe05d
Author: Ewout ter Hoeven <[email protected]>
Date:   Wed Oct 9 17:16:26 2024 +0200

    Add Dependabot configuration for GitHub Actions updates (#1217)

    Co-authored-by: DarkLight1337 <[email protected]>

commit cfaa6008e666d4e9bb5131ece68f8609b6f94ee4
Author: Cyrus Leung <[email protected]>
Date:   Wed Oct 9 22:59:57 2024 +0800

    [Bugfix] Access `get_vocab` instead of `vocab` in tool parsers (#9188)

commit 21906a6f50ee0edf49ede856a82e8840bab41471
Author: Ahmad Fahadh Ilyas <[email protected]>
Date:   Wed Oct 9 05:10:44 2024 -0700

    [Bugfix] Fix lora loading for Compressed Tensors in #9120 (#9179)

commit dc4aea677ab0520d91ff4979e80340cb5a090095
Author: Jiangtao Hu <[email protected]>
Date:   Wed Oct 9 16:59:42 2024 +0800

    [Doc] Fix VLM prompt placeholder sample bug (#9170)

commit c8627cd41b10747da393b76c382de5ef0eb635a2
Author: youkaichao <[email protected]>
Date:   Wed Oct 9 00:38:40 2024 -0700

    [ci][test] use load dummy for testing (#9165)

commit 8bfaa4e31eb63d41499fec933e68969ebbedb01f
Author: Cyrus Leung <[email protected]>
Date:   Wed Oct 9 15:36:55 2024 +0800

    [Bugfix] fix composite weight loading and EAGLE weight loading (#9160)

commit 0b5b5d767e7fdc0b1070b37319de749e46a4d42a
Author: AlpinDale <[email protected]>
Date:   Wed Oct 9 07:03:14 2024 +0000

    [Frontend] Log the maximum supported concurrency (#8831)

commit cdc72e3c80b7029c49de9667150f68481f386956
Author: Hui Liu <[email protected]>
Date:   Tue Oct 8 23:43:06 2024 -0700

    [Model] Remap FP8 kv_scale in CommandR and DBRX (#9174)

commit 7627172bf42b9cd628402c98845c6ac3de80859a
Author: Joe Rowell <[email protected]>
Date:   Wed Oct 9 06:43:34 2024 +0100

    [Bugfix][Doc] Report neuron error in output (#9159)

commit 480b7f40cfa9a900e03ea4e825abc1a46b5d085b
Author: Travis Johnson <[email protected]>
Date:   Tue Oct 8 22:54:48 2024 -0600

    [Misc] Improve validation errors around best_of and n (#9167)

    Signed-off-by: Travis Johnson <[email protected]>

commit acce7630c1dd655ca95a9f1abff23d92ef76262c
Author: Yuan Tang <[email protected]>
Date:   Tue Oct 8 23:58:49 2024 -0400

    Update link to KServe deployment guide (#9173)

commit ffc4b27ea8924b4b5add13552063c93d0a14fb85
Author: Yuan Tang <[email protected]>
Date:   Tue Oct 8 22:30:48 2024 -0400

    Add classifiers in setup.py (#9171)

commit 2f4117c38e101ee63b65521c93b22efe3526f77e
Author: chenqianfzh <[email protected]>
Date:   Tue Oct 8 18:52:19 2024 -0700

    support bitsandbytes quantization with more models (#9148)

commit 9ba0bd6aa6a9a3cefa5c320800ea736a0abbaf36
Author: Michael Goin <[email protected]>
Date:   Tue Oct 8 21:22:31 2024 -0400

    Add `lm-eval` directly to requirements-test.txt (#9161)

commit 2a131965a8144d571a4a211a44d1fc32e202ae10
Author: Russell Bryant <[email protected]>
Date:   Tue Oct 8 18:08:22 2024 -0400

    mypy: check additional directories (#9162)

    Signed-off-by: Russell Bryant <[email protected]>

commit bd37b9fbe274e28e12c0687cb9a8111dda270936
Author: bnellnm <[email protected]>
Date:   Tue Oct 8 17:28:12 2024 -0400

    [Bugfix] Try to handle older versions of pytorch (#9086)

commit de24046fcd24e8faa81de34b17351887bcdfbe51
Author: Rafael Vasquez <[email protected]>
Date:   Tue Oct 8 16:22:08 2024 -0400

    [Doc] Improve contributing and installation documentation (#9132)

    Signed-off-by: Rafael Vasquez <[email protected]>

commit 1874c6a1b0ae0f9eb2b485653b4e17ed1d861a32
Author: Sayak Paul <[email protected]>
Date:   Tue Oct 8 23:42:29 2024 +0530

    [Doc] Update vlm.rst to include an example on videos (#9155)

    Co-authored-by: Cyrus Leung <[email protected]>

commit 9a94ca4a5d31c0ba57ca67fc1c252233d3284012
Author: Daniele <[email protected]>
Date:   Tue Oct 8 18:38:40 2024 +0200

    [Bugfix] fix OpenAI API server startup with --disable-frontend-multiprocessing (#8537)

commit cfba685bd462f360994da7ac0d33f9759589506e
Author: Peter Pan <[email protected]>
Date:   Wed Oct 9 00:37:34 2024 +0800

    [CI/Build] Add examples folder into Docker image so that we can leverage the templates*.jinja when serving models (#8758)

    Signed-off-by: Peter Pan <[email protected]>

commit 069d3bd8d01a72e93c0a5b51f8b567e8aaddc6e9
Author: Alex Brooks <[email protected]>
Date:   Tue Oct 8 08:31:26 2024 -0600

    [Frontend] Add Early Validation For Chat Template / Tool Call Parser (#9151)

    Signed-off-by: Alex-Brooks <[email protected]>

commit a3691b6b5eb7e60039a8ff34550be5a7e8365394
Author: Alex Brooks <[email protected]>
Date:   Tue Oct 8 08:12:56 2024 -0600

    [Core][Frontend] Add Support for Inference Time mm_processor_kwargs (#9131)

    Signed-off-by: Alex-Brooks <[email protected]>

commit 8c746226c956f7c8a4672689fee91c7d22befed6
Author: Brendan Wong <[email protected]>
Date:   Mon Oct 7 22:51:43 2024 -0700

    [Frontend] API support for beam search for MQLLMEngine (#9117)

commit e1faa2a59876bba99d804c0a94d427cee87b0995
Author: youkaichao <[email protected]>
Date:   Mon Oct 7 22:26:25 2024 -0700

    [misc] improve ux on readme (#9147)

commit 80b57f00d554db8a2126d351bb5374c190b56699
Author: Kunshang Ji <[email protected]>
Date:   Tue Oct 8 11:51:14 2024 +0800

    [Intel GPU] Fix xpu decode input  (#9145)

commit 04c12f81572be22c819018c2fcbddac5f08715d0
Author: youkaichao <[email protected]>
Date:   Mon Oct 7 19:51:49 2024 -0700

    [misc] update utils to support comparing multiple settings (#9140)

commit 8eeb85708428b7735bbd1156c81692431fd5ff34
Author: Simon Mo <[email protected]>
Date:   Mon Oct 7 17:06:21 2024 -0700

    Add Slack to README (#9137)

commit fa45513a5189b3a9f73a59730c9ac65d061e1311
Author: youkaichao <[email protected]>
Date:   Mon Oct 7 16:07:05 2024 -0700

    [misc] fix comment and variable name (#9139)

commit c0d9a98d0c7182b73c2e7f88508e690a186bf0e3
Author: Kuntai Du <[email protected]>
Date:   Mon Oct 7 15:04:06 2024 -0700

    [Doc] Include performance benchmark in README (#9135)

commit e0dbdb013dfe5cdbe044317b4d7d55644d6399b3
Author: Russell Bryant <[email protected]>
Date:   Mon Oct 7 17:18:10 2024 -0400

    [CI/Build] Add linting for github actions workflows (#7876)

    Signed-off-by: Russell Bryant <[email protected]>

commit 93cf74a8a7b0b483becdba95e3056adbf201b7b2
Author: TimWang <[email protected]>
Date:   Tue Oct 8 04:31:45 2024 +0800

    [Doc]: Add deploying_with_k8s guide (#8451)

commit 151ef4efd2fb52554f4d30408aca619e181ea751
Author: Cyrus Leung <[email protected]>
Date:   Mon Oct 7 19:55:12 2024 +0800

    [Model] Support NVLM-D and fix QK Norm in InternViT (#9045)

    Co-authored-by: Roger Wang <[email protected]>
    Co-authored-by: Isotr0py <[email protected]>

commit f19da64871065510691cd4fcaa5f4096b661dcec
Author: Isotr0py <[email protected]>
Date:   Mon Oct 7 18:01:46 2024 +0800

    [Core] Refactor GGUF parameters packing and forwarding (#8859)

commit 4f95ffee6f40198911ee824ed06d645fe9678511
Author: Isotr0py <[email protected]>
Date:   Mon Oct 7 14:50:35 2024 +0800

    [Hardware][CPU] Cross-attention and Encoder-Decoder models support on CPU backend (#9089)

commit 8c6de96ea1e6e51e49a170c28ad3efc16db9413e
Author: Cyrus Leung <[email protected]>
Date:   Mon Oct 7 14:10:35 2024 +0800

    [Model] Explicit interface for vLLM models and support OOT embedding models (#9108)

commit 18b296fdb2248e8a65bf005e7193ebd523b875b6
Author: youkaichao <[email protected]>
Date:   Sun Oct 6 22:47:04 2024 -0700

    [core] remove beam search from the core (#9105)

commit c8f26bb63694adb4202ab275efb0759c13edcaa8
Author: sroy745 <[email protected]>
Date:   Sun Oct 6 20:52:42 2024 -0700

    [BugFix][Core] Fix BlockManagerV2 when Encoder Input is None (#9103)

commit 487678d046fe56560ff5dc6c91c3f3c31af7de6f
Author: Isotr0py <[email protected]>
Date:   Mon Oct 7 10:14:27 2024 +0800

    [Bugfix][Hardware][CPU] Fix CPU model input for decode (#9044)

commit cb3b2b9ba4a95c413a879e30e2b8674187519a93
Author: Varun Sundar Rabindranath <[email protected]>
Date:   Sun Oct 6 15:48:11 2024 -0400

    [Bugfix] Fix incorrect updates to num_computed_tokens in multi-step scheduling (#9038)

    Co-authored-by: Varun Sundar Rabindranath <[email protected]>

commit fdf59d30eaf1a62979b2a13016b4f47f28f12f88
Author: Yanyi Liu <[email protected]>
Date:   Sun Oct 6 20:51:08 2024 +0800

    [Bugfix] fix tool_parser error handling when serve a model not support it (#8709)

commit b22b79847153ae10710523cdb4a5fb98ac864cf4
Author: Cyrus Leung <[email protected]>
Date:   Sun Oct 6 16:35:27 2024 +0800

    [Model] PP support for embedding models and update docs (#9090)

    Co-authored-by: Roger Wang <[email protected]>

commit f22619fe96c842ee2406638678d2b60009d8ff14
Author: Cyrus Leung <[email protected]>
Date:   Sun Oct 6 16:33:52 2024 +0800

    [Misc] Remove user-facing error for removed VLM args (#9104)

commit 168cab6bbfb733f97defc8c1aa13df90c5319f19
Author: Brendan Wong <[email protected]>
Date:   Sat Oct 5 23:39:03 2024 -0700

    [Frontend] API support for beam search (#9087)

    Co-authored-by: youkaichao <[email protected]>

commit 23fea8714a1e90f018163e0eee59d73bc5a500e7
Author: TJian <[email protected]>
Date:   Sat Oct 5 22:00:04 2024 -0700

    [Bugfix] Fix try-catch conditions to import correct Flash Attention Backend in Draft Model (#9101)

commit f4dd830e0945300dbe2039af79d1994f074ffcbb
Author: youkaichao <[email protected]>
Date:   Sat Oct 5 19:37:31 2024 -0700

    [core] use forward context for flash infer (#9097)

commit 5df183489537a155bbaad9232f25b8e57694d7b8
Author: Andy Dai <[email protected]>
Date:   Sat Oct 5 10:35:11 2024 -0700

    [Bugfix] Fix order of arguments matters in config.yaml (#8960)

commit cfadb9c68798c0cc4d674de19970a8e3b5ea1273
Author: Chen Zhang <[email protected]>
Date:   Sat Oct 5 06:56:40 2024 -0700

    [Bugfix] Deprecate registration of custom configs to huggingface (#9083)

commit 15986f598c7b1f2969918c92f5c4cf7e28d5c0df
Author: Xin Yang <[email protected]>
Date:   Fri Oct 4 23:57:05 2024 -0700

    [Model] Support Gemma2 embedding model (#9004)

commit 53b3a330273967a3c4124cbfef2cacac92f553ba
Author: hhzhang16 <[email protected]>
Date:   Fri Oct 4 22:05:37 2024 -0700

    [Bugfix] Fixes Phi3v & Ultravox Multimodal EmbeddingInputs (#8979)

commit dac914b0d6bc36de4eb4bf70a9d20954560893ea
Author: Chen Zhang <[email protected]>
Date:   Fri Oct 4 21:45:38 2024 -0700

    [Bugfix] use blockmanagerv1 for encoder-decoder (#9084)

    Co-authored-by: Roger Wang <[email protected]>

commit a95354a36ee65523a499b3eb42f70a4a0ea4322d
Author: Zhuohan Li <[email protected]>
Date:   Fri Oct 4 19:54:45 2024 -0700

    [Doc] Update README.md with Ray summit slides (#9088)

commit 663874e048d88aa7bf087628430d50f9f5245175
Author: youkaichao <[email protected]>
Date:   Fri Oct 4 16:43:50 2024 -0700

    [torch.compile] improve allreduce registration (#9061)

commit cc90419e89c358f906e17a5ec484fbe04092c277
Author: Chongming Ni <[email protected]>
Date:   Fri Oct 4 16:42:20 2024 -0700

    [Hardware][Neuron] Add on-device sampling support for Neuron (#8746)

    Co-authored-by: Ashraf Mahgoub <[email protected]>

commit 27302dd5841d4b0fa4788076ad9ff2993e133409
Author: Cody Yu <[email protected]>
Date:   Fri Oct 4 16:07:54 2024 -0700

    [Misc] Fix CI lint (#9085)

commit 0cc566ca8fd2d21a94f3a8e48bf5c5b60d42b59f
Author: Andy Dai <[email protected]>
Date:   Fri Oct 4 14:58:57 2024 -0700

    [Misc] Add random seed for prefix cache benchmark (#9081)

commit 05c531be476e8a864a1ab83a65f7e056315ea1fc
Author: Andy Dai <[email protected]>
Date:   Fri Oct 4 14:38:42 2024 -0700

    [Misc] Improved prefix cache example (#9077)

commit fbb74420e7018bf0cc1bc81e6fd71a2392347227
Author: Kuntai Du <[email protected]>
Date:   Fri Oct 4 14:01:44 2024 -0700

    [CI] Update performance benchmark: upgrade trt-llm to r24.07, and add SGLang (#7412)

commit 05d686432f2e13296127962861b21c25cdcdfc8b
Author: ElizaWszola <[email protected]>
Date:   Fri Oct 4 20:34:44 2024 +0200

    [Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE (#8973)

    Co-authored-by: Dipika <[email protected]>
    Co-authored-by: Dipika Sikka <[email protected]>

commit 0dcc8cbe5abd4f2fafd495bd1c65fdd75d8dd919
Author: Flávia Béo <[email protected]>
Date:   Fri Oct 4 15:31:40 2024 -0300

    Adds truncate_prompt_tokens param for embeddings creation (#8999)

    Signed-off-by: Flavia Beo <[email protected]>

commit 26aa325f4ffe8bf1d9b921535cc02fb31d80a96d
Author: Roger Wang <[email protected]>
Date:   Fri Oct 4 10:38:25 2024 -0700

    [Core][VLM] Test registration for OOT multimodal models (#8717)

    Co-authored-by: DarkLight1337 <[email protected]>

commit e5dc713c2343b3549b43d6e2764a1036e4052bf8
Author: Varad Ahirwadkar <[email protected]>
Date:   Fri Oct 4 22:54:42 2024 +0530

    [Hardware][PowerPC] Make oneDNN dependency optional for Power (#9039)

    Signed-off-by: Varad Ahirwadkar <[email protected]>

commit 36eecfbddb9ac2c491174c86b28ee83c4773eb5e
Author: Simon Mo <[email protected]>
Date:   Fri Oct 4 10:17:16 2024 -0700

    Remove AMD Ray Summit Banner (#9075)

commit 9ade8bbc8dc63c03b9399f05e85a0d0ddc6f5788
Author: Prashant Gupta <[email protected]>
Date:   Fri Oct 4 09:24:40 2024 -0700

    [Model] add a bunch of supported lora modules for mixtral (#9008)

    Signed-off-by: Prashant Gupta <[email protected]>

commit 22482e495e00d409c9b5c78dade6e672ddf7fbc2
Author: Lucas Wilkinson <[email protected]>
Date:   Fri Oct 4 11:43:15 2024 -0400

    [Bugfix] Flash attention arches not getting set properly (#9062)

commit 3d826d2c52242f4f78789adcb7c02938c84ed18b
Author: whyiug <[email protected]>
Date:   Fri Oct 4 22:34:58 2024 +0800

    [Bugfix] Reshape the dimensions of the input image embeddings in Qwen2VL (#9071)

commit 0e36fd4909780392a9c5d0e367b0a84250d55fa8
Author: Cyrus Leung <[email protected]>
Date:   Fri Oct 4 18:01:37 2024 +0800

    [Misc] Move registry to its own file (#9064)

commit 0f6d7a9a347944bffd2204cbf9686299e9dd6557
Author: Murali Andoorveedu <[email protected]>
Date:   Thu Oct 3 19:56:58 2024 -0700

    [Models] Add remaining model PP support (#7168)

    Signed-off-by: Muralidhar Andoorveedu <[email protected]>
    Signed-off-by: Murali Andoorveedu <[email protected]>
    Co-authored-by: DarkLight1337 <[email protected]>

commit 303d44790a2ccab86257f1b6097e67795f0845d4
Author: Michael Goin <[email protected]>
Date:   Thu Oct 3 22:55:42 2024 -0400

    [Misc] Enable multi-step output streaming by default (#9047)

commit aeb37c2a725554791ff6f258b1e18830867a3ab9
Author: Lucas Wilkinson <[email protected]>
Date:   Thu Oct 3 22:55:25 2024 -0400

    [CI/Build] Per file CUDA Archs (improve wheel size and dev build times) (#8845)

commit 3dbb215b38c010c050f7fde3528fe2c6673f7a07
Author: 代君 <[email protected]>
Date:   Fri Oct 4 10:36:39 2024 +0800

    [Frontend][Feature] support tool calling for internlm/internlm2_5-7b-chat model (#8405)

commit 2838d6b38e1e37b303b01f2af0a9ddee2dd66f39
Author: Domen Vreš <[email protected]>
Date:   Fri Oct 4 01:53:29 2024 +0200

    [Bugfix] Weight loading fix for OPT model (#9042)

    Co-authored-by: dvres <[email protected]>

commit 91add85ec409a3628d01a1e4d4b3230e0fd3aa3f
Author: sroy745 <[email protected]>
Date:   Thu Oct 3 16:07:29 2024 -0700

    Fix failing spec decode test (#9054)

commit 9aaf14c62e16a7c74b5192a44d01a78125dab2fc
Author: youkaichao <[email protected]>
Date:   Thu Oct 3 12:09:42 2024 -0700

    [misc] add forward context for attention (#9029)

commit 63e39937f990818e2f22a9b821a4aa22387057a7
Author: xendo <[email protected]>
Date:   Thu Oct 3 20:02:07 2024 +0200

    [Frontend] [Neuron] Parse literals out of override-neuron-config (#8959)

    Co-authored-by: Jerzy Zagorski <[email protected]>

commit f5d72b2fc6771de19c351945f1fbbb0198d53b8e
Author: sroy745 <[email protected]>
Date:   Thu Oct 3 09:44:21 2024 -0700

    [Core] Make BlockSpaceManagerV2 the default BlockManager to use. (#8678)

commit 83caf35e082b2657dce5f71ff965a13653a763b0
Author: Guillaume Calmettes <[email protected]>
Date:   Thu Oct 3 10:44:52 2024 +0200

    [BugFix] Enforce Mistral ToolCall id constraint when using the Mistral tool call parser (#9020)

commit 01843c89b8ddae00d4a0f0f56b8aa7fbaa3efc42
Author: Divakar Verma <[email protected]>
Date:   Wed Oct 2 23:31:07 2024 -0500

    [Misc] log when using default MoE config (#8971)

commit 19a4dd09904975d121a10e5e3f707927f3e09faa
Author: Travis Johnson <[email protected]>
Date:   Wed Oct 2 21:04:17 2024 -0600

    [Bugfix] example template should not add parallel_tool_prompt if tools is none (#9007)

commit 18c2e30c5754dc83f86d9b8c75af0499a77e4b3f
Author: Nick Hill <[email protected]>
Date:   Thu Oct 3 03:42:24 2024 +0100

    [Doc] Update Granite model docs (#9025)

commit 19f0d2579695e518c9bfc166544cf23775772bf8
Author: Shawn Tan <[email protected]>
Date:   Wed Oct 2 21:33:57 2024 -0400

    [Model]  Adding Granite MoE. (#8206)

    Co-authored-by: Nick Hill <[email protected]>

commit f58d4fccc9b270838be438f5f0db71bea156a56d
Author: Sergey Shlyapnikov <[email protected]>
Date:   Thu Oct 3 01:50:01 2024 +0400

    [OpenVINO] Enable GPU support for OpenVINO vLLM backend (#8192)

commit afb050b29d0cac27c32c19c8206a9ac2a4662de2
Author: Varun Sundar Rabindranath <[email protected]>
Date:   Wed Oct 2 15:44:39 2024 -0400

    [Core] CUDA Graphs for Multi-Step + Chunked-Prefill (#8645)

    Co-authored-by: Varun Sundar Rabindranath <[email protected]>

commit 7f60520deb05d2e097b408e3310f1d383fbf1de6
Author: Alex Brooks <[email protected]>
Date:   Wed Oct 2 05:44:38 2024 -0600

    [Misc] Update Default Image Mapper Error Log (#8977)

    Signed-off-by: Alex-Brooks <[email protected]>
    Co-authored-by: Roger Wang <[email protected]>

commit 563649aafe7d4b9cb0047bba60d6f58efa53fd28
Author: afeldman-nm <[email protected]>
Date:   Wed Oct 2 03:52:20 2024 -0400

    [Core] Combined support for multi-step scheduling, chunked prefill & prefix caching (#8804)

    Co-authored-by: Varun Sundar Rabindranath <[email protected]>
    Co-authored-by: Andrew Feldman <[email protected]>

commit 15702038642192002cd8973cf8948751b750fd07
Author: Lily Liu <[email protected]>
Date:   Tue Oct 1 16:04:42 2024 -0700

    [Spec Decode] (1/2) Remove batch expansion (#8839)

commit 22f5851b807376a836eb3551903c7fc6c81eaa9b
Author: vlsav <[email protected]>
Date:   Tue Oct 1 21:07:06 2024 +0300

    Update benchmark_serving.py to read and write json-datasets, results in UTF8, for better compatibility with Windows (#8997)

commit 4f341bd4bf35c5b431dc523bab86e4ae210baaf8
Author: Cyrus Leung <[email protected]>
Date:   Wed Oct 2 00:35:39 2024 +0800

    [Doc] Update list of supported models (#8987)

commit 35bd2151684ffb20cdad825abe33e0e6f0cc005a
Author: Sebastian Schoennenbeck <[email protected]>
Date:   Tue Oct 1 11:58:06 2024 +0200

    [Core] [Frontend] Priority scheduling for embeddings and in the OpenAI-API (#8965)

commit 1fe0a4264aa94ceeccc7e8d99ac0d72f0560f541
Author: Alex Brooks <[email protected]>
Date:   Tue Oct 1 03:52:44 2024 -0600

    [Bugfix] Fix Token IDs Reference for MiniCPM-V When Images are Provided With No Placeholders (#8991)

    Signed-off-by: Alex-Brooks <[email protected]>

commit bc4eb65b5492b4f84a1b714bfc14bcff73d401f1
Author: Isotr0py <[email protected]>
Date:   Tue Oct 1 17:51:41 2024 +0800

    [Bugfix] Fix Fuyu tensor parallel inference (#8986)

commit 82f3937e599a4f088a62e59abe81d51e11bb8f83
Author: Divakar Verma <[email protected]>
Date:   Mon Sep 30 22:46:41 2024 -0500

    [Misc] add process_weights_after_loading for DummyLoader (#8969)

commit 7da2487591888da043254f8c7045a48d5dbcc753
Author: youkaichao <[email protected]>
Date:   Mon Sep 30 20:40:48 2024 -0700

    [torch.compile] fix tensor alias (#8982)

commit aaccca2b4d3895d64d34b123e61731404c8fc2c0
Author: Kevin H. Luu <[email protected]>
Date:   Mon Sep 30 20:33:12 2024 -0700

    [CI/Build] Fix machete generated kernel files ordering (#8976)

    Signed-off-by: kevin <[email protected]>
    Co-authored-by: Cody Yu <[email protected]>

commit 062c89e7c9c6fa9fd7fb2d28fd50321c6f78f389
Author: Joe Runde <[email protected]>
Date:   Mon Sep 30 19:34:25 2024 -0600

    [Frontend][Core] Move guided decoding params into sampling params (#8252)

    Signed-off-by: Joe Runde <[email protected]>
    Co-authored-by: Nick Hill <[email protected]>

commit bce324487a8e36140143ea37f4b27d273a0fd661
Author: Lily Liu <[email protected]>
Date:   Mon Sep 30 17:51:40 2024 -0700

    [CI][SpecDecode] Fix spec decode tests, use flash attention backend for spec decode CI tests. (#8975)

commit 1425a1bcf9c53e24fe5f4812acc5b656f2aa02f3
Author: Kevin H. Luu <[email protected]>
Date:   Mon Sep 30 17:47:08 2024 -0700

    [ci] Add CODEOWNERS for test directories  (#8795)

    Signed-off-by: kevin <[email protected]>

commit 1cabfcefb64a489c8ff9dcb289b4dd47cf8f89cf
Author: Jee Jee Li <[email protected]>
Date:   Mon Sep 30 20:57:39 2024 +0800

    [Misc] Adjust max_position_embeddings for LoRA compatibility (#8957)

commit be76e5aabf8c026e1a82028ad70167e8c652cee9
Author: Sebastian Schoennenbeck <[email protected]>
Date:   Mon Sep 30 14:28:44 2024 +0200

    [Core] Make scheduling policy settable via EngineArgs (#8956)

commit 2ae25f79cf1e8d21f7bcba097e4c039463c22be4
Author: Isotr0py <[email protected]>
Date:   Mon Sep 30 13:01:20 2024 +0800

    [Model] Expose InternVL2 max_dynamic_patch as a mm_processor_kwarg (#8946)

commit 8e60afa15eb9a0540ce6c453b974a945adff3320
Author: Jee Jee Li <[email protected]>
Date:   Mon Sep 30 12:31:55 2024 +0800

    [Model][LoRA]LoRA support added for MiniCPMV2.6 (#8943)

    Co-authored-by: DarkLight1337 <[email protected]>

commit b6d7392579286b6dbd8ca96c0bcb4cc6f7c3c4a0
Author: Roger Wang <[email protected]>
Date:   Sun Sep 29 21:28:26 2024 -0700

    [Misc][CI/Build] Include `cv2` via `mistral_common[opencv]`  (#8951)

commit e01ab595d897698c9a5fe9eaebd983eb3e23470a
Author: whyiug <[email protected]>
Date:   Mon Sep 30 11:16:10 2024 +0800

    [Model] support input embeddings for qwen2vl (#8856)

commit f13a07b1f8c11ddbdc53b40f1fbb24bf3166b900
Author: Mor Zusman <[email protected]>
Date:   Mon Sep 30 00:35:58 2024 +0300

    [Kernel][Model] Varlen prefill + Prefill chunking support for mamba kernels and Jamba model (#8533)

commit 6c9ba48fdebe2f44c82eabfe136dc8dc6ad6f4ed
Author: danieljannai21 <[email protected]>
Date:   Sun Sep 29 20:59:47 2024 +0300

    [Frontend] Added support for HF's new `continue_final_message` parameter (#8942)

commit 1fb9c1b0bf8e65e6576ff4c45f5623d233d7194b
Author: juncheoll <[email protected]>
Date:   Mon Sep 30 00:05:54 2024 +0900

    [Misc] Fix typo in BlockSpaceManagerV1 (#8944)

commit 31f46a0d35da80118bac5f80c533019cd50ddd9a
Author: Nick Hill <[email protected]>
Date:   Sun Sep 29 10:43:14 2024 +0100

    [BugFix] Fix seeded random sampling with encoder-decoder models (#8870)

    Co-authored-by: Roger Wang <[email protected]>

commit 3d49776bbb25927abf91bb7c5537e0006c199c16
Author: Jee Jee Li <[email protected]>
Date:   Sun Sep 29 14:59:45 2024 +0800

    [Model][LoRA]LoRA support added for MiniCPMV2.5 (#7199)

commit bc2ef1f77c1578612198f60ec392731efb3847c5
Author: Zilin Zhu <[email protected]>
Date:   Sun Sep 29 12:19:39 2024 +0800

    [Model] Support Qwen2.5-Math-RM-72B (#8896)

commit 2e7fe7e79f41e294eeed2f484eeb791284ec48a2
Author: Tyler Michael Smith <[email protected]>
Date:   Sat Sep 28 23:13:01 2024 -0400

    [Build/CI] Set FETCHCONTENT_BASE_DIR to one location for better caching (#8930)

commit 26a68d5d7e7dd47c7d8538a326493c8a171f5016
Author: Cyrus Leung <[email protected]>
Date:   Sun Sep 29 10:50:51 2024 +0800

    [CI/Build] Add test decorator for minimum GPU memory (#8925)

commit d081da0064b5cda9e344f0fd519d67523a437a39
Author: ElizaWszola <[email protected]>
Date:   Sun Sep 29 03:19:40 2024 +0200

    [Bugfix] Fix Marlin MoE act order when is_k_full == False (#8741)

    Co-authored-by: Tyler Michael Smith <[email protected]>

commit 5bf8789b2a28df1305f92b9999fe60264f839caa
Author: sroy745 <[email protected]>
Date:   Sat Sep 28 18:17:45 2024 -0700

    [Bugfix] Block manager v2 with preemption and lookahead slots (#8824)

commit d1537039ce7e6018db510d0c0d9b0c0fccb62b63
Author: Russell Bryant <[email protected]>
Date:   Sat Sep 28 21:17:07 2024 -0400

    [Core] Improve choice of Python multiprocessing method (#8823)

    Signed-off-by: Russell Bryant <[email protected]>
    Co-authored-by: youkaichao <[email protected]>

commit cc276443b5ac0732b00a88472f4bc4330aa14606
Author: youkaichao <[email protected]>
Date:   Sat Sep 28 17:48:41 2024 -0700

    [doc] organize installation doc and expose per-commit docker (#8931)

commit e585b583a92903c9a5cc8055a444a208f4387891
Author: Chen Zhang <[email protected]>
Date:   Sat Sep 28 11:51:22 2024 -0700

    [Bugfix] Support testing prefill throughput with benchmark_serving.py --hf-output-len 1 (#8891)

commit 090e945e36cfe849b484db5414f64df96e97d678
Author: Edouard B. <[email protected]>
Date:   Sat Sep 28 20:30:21 2024 +0200

    [Frontend] Make beam search emulator temperature modifiable (#8928)

    Co-authored-by: Eduard Balzin <[email protected]>

commit e1a3f5e831a467b2867a66e0e56ac0f70ed44394
Author: Cyrus Leung <[email protected]>
Date:   Sun Sep 29 00:54:35 2024 +0800

    [CI/Build] Update models tests & examples (#8874)

    Co-authored-by: Roger Wang <[email protected]>

commit 19d02ff93812fb6a28f0f1a0a0f9233e9388d616
Author: Varun Sundar Rabindranath <[email protected]>
Date:   Sat Sep 28 11:52:46 2024 -0400

    [Bugfix] Fix PP for Multi-Step (#8887)

commit 39d3f8d94fd2691b70ee809e7565402f8a061c6b
Author: tastelikefeet <[email protected]>
Date:   Sat Sep 28 23:24:12 2024 +0800

    [Bugfix] Fix code for downloading models from modelscope (#8443)

commit b0298aa8cc4a54bde659e57271778630785abc9b
Author: Cyrus Leung <[email protected]>
Date:   Sat Sep 28 16:11:25 2024 +0800

    [Misc] Remove vLLM patch of `BaichuanTokenizer` (#8921)

commit 260024a3749fb6856625dfee28560a98a92dd339
Author: Tyler Titsworth <[email protected]>
Date:   Fri Sep 27 23:45:50 2024 -0700

    [Bugfix][Intel] Fix XPU Dockerfile Build (#7824)

    Signed-off-by: tylertitsworth <[email protected]>
    Co-authored-by: youkaichao <[email protected]>

commit d86f6b2afb006ea4b4b14a49a58f64bf3b952de6
Author: youkaichao <[email protected]>
Date:   Fri Sep 27 22:10:44 2024 -0700

    [misc] fix wheel name (#8919)

commit bd429f2b75f3622fabaf9c9470ca2e921f6f56ca
Author: Sebastian Schoennenbeck <[email protected]>
Date:   Sat Sep 28 00:07:10 2024 +0200

    [Core] Priority-based scheduling in async engine (#8850)

commit 18e60d7d1394541b48bf48b0a57a546a93607ac2
Author: youkaichao <[email protected]>
Date:   Fri Sep 27 14:27:56 2024 -0700

    [misc][distributed] add VLLM_SKIP_P2P_CHECK flag (#8911)

commit c2ec430ab5713d0626c1a7809718ef6c4eebf389
Author: Varun Sundar Rabindranath <[email protected]>
Date:   Fri Sep 27 16:32:07 2024 -0400

    [Core] Multi-Step + Single Step Prefills via Chunked Prefill code path (#8378)

    Co-authored-by: Varun Sundar Rabindranath <[email protected]>

commit c5d55356f9d2b2075ac53cf20453358c1e2b7bde
Author: Lucas Wilkinson <[email protected]>
Date:   Fri Sep 27 15:12:34 2024 -0400

    [Bugfix] fix for deepseek w4a16 (#8906)

    Co-authored-by: mgoin <[email protected]>

commit 172d1cd27634e9e7adc9cb9feac73552cfae1b24
Author: Luka Govedič <[email protected]>
Date:   Fri Sep 27 14:25:10 2024 -0400

    [Kernel] AQ AZP 4/4: Integrate asymmetric quantization to linear method (#7271)

commit a9b15c606fea67a072416ea0ea115261a2756058
Author: youkaichao <[email protected]>
Date:   Fri Sep 27 08:11:32 2024 -0700

    [torch.compile] use empty tensor instead of None for profiling (#8875)

commit 8df2dc3c8812c0abb97ce3e2913411d88524e59f
Author: Brittany <[email protected]>
Date:   Fri Sep 27 01:16:55 2024 -0700

    [TPU] Update pallas.py to support trillium (#8871)

commit 6d792d2f31b2cfb335d1a4a7c45fe4ce143c203a
Author: Isotr0py <[email protected]>
Date:   Fri Sep 27 16:15:58 2024 +0800

    [Bugfix][VLM] Fix Fuyu batching inference with `max_num_seqs>1` (#8892)

commit 0e088750af2e8035c07d356b56c03393cfb56004
Author: Peter Pan <[email protected]>
Date:   Fri Sep 27 16:13:25 2024 +0800

    [MISC] Fix invalid escape sequence '\' (#8830)

    Signed-off-by: Peter Pan <[email protected]>

commit dc4e3df5c23282b2ebaead95f179c25c9d7ec4d8
Author: youkaichao <[email protected]>
Date:   Fri Sep 27 00:26:38 2024 -0700

    [misc] fix collect env (#8894)

commit 3b00b9c26c91e9f9ada12975b613555698054e39
Author: Cyrus Leung <[email protected]>
Date:   Fri Sep 27 11:35:15 2024 +0800

    [Core] Rename `PromptInputs` and `inputs` (#8876)

commit 344cd2b6f4c22bf278cff96066001d216ec1fe82
Author: Maximilien de Bayser <[email protected]>
Date:   Thu Sep 26 21:01:42 2024 -0300

    [Feature] Add support for Llama 3.1 and 3.2 tool use (#8343)

    Signed-off-by: Max de Bayser <[email protected]>

commit 1b49148e474d4d18731e159ea0460145ae52e220
Author: Cyrus Leung <[email protected]>
Date:   Fri Sep 27 07:54:09 2024 +0800

    [Installation] Allow lower versions of FastAPI to maintain Ray 2.9 compatibility (#8764)

commit 4b377d6febed7ddd964f1b96079d7e78c231325e
Author: Nick Hill <[email protected]>
Date:   Fri Sep 27 00:46:43 2024 +0100

    [BugFix] Fix test breakages from transformers 4.45 upgrade (#8829)

commit 71d21c73abfb9b12ea402ce6b11c1b8e31eddf4c
Author: Tyler Michael Smith <[email protected]>
Date:   Thu Sep 26 19:23:45 2024 -0400

    [Bugfix] Fixup advance_step.cu warning (#8815)

commit ee2da3e9efb38add804e2023d47e9f42f38bd638
Author: Chirag Jain <[email protected]>
Date:   Fri Sep 27 04:53:17 2024 +0530

    fix validation: Only set tool_choice `auto` if at least one tool is provided (#8568)

commit e2f6f26e8636b8a23e5c0cda533a70c40ade01ec
Author: Tyler Michael Smith <[email protected]>
Date:   Thu Sep 26 19:18:26 2024 -0400

    [Bugfix] Fix print_warning_once's line info (#8867)

commit b28d2104dea6ba80c0f1f6c4596b5703d7ef923d
Author: Michael Goin <[email protected]>
Date:   Thu Sep 26 19:18:14 2024 -0400

    [Misc] Change dummy profiling and BOS fallback warns to log once (#8820)

commit 93d364da3406f5523e5e4772ffbc3c72dac7bbf4
Author: Pernekhan Utemuratov <[email protected]>
Date:   Thu Sep 26 15:47:00 2024 -0700

    [Bugfix] Include encoder prompts len to non-stream api usage response (#8861)

commit d9cfbc891e2e1d62d74c7aae93bde436a29bd574
Author: Kevin H. Luu <[email protected]>
Date:   Thu Sep 26 15:02:16 2024 -0700

    [ci] Soft fail Entrypoints, Samplers, LoRA, Decoder-only VLM (#8872)

    Signed-off-by: kevin <[email protected]>

commit 70de39f6b46f6b90aecba52358825127a50b3921
Author: youkaichao <[email protected]>
Date:   Thu Sep 26 13:19:04 2024 -0700

    [misc][installation] build from source without compilation (#8818)

commit 68988d4e0d8765901c51f07f9bfbda58f35f6f63
Author: fyuan1316 <[email protected]>
Date:   Fri Sep 27 02:04:39 2024 +0800

    [CI/Build] Fix missing ci dependencies (#8834)

commit 520db4dbc10cfc60be65e85ff4ef3a6aeeeb7836
Author: Michael Goin <[email protected]>
Date:   Thu Sep 26 14:02:52 2024 -0400

    [Docs] Add README to the build docker image (#8825)

commit f70bccac75a0aecc0a5fc934859158a3e1f019a5
Author: Tyler Michael Smith <[email protected]>
Date:   Thu Sep 26 13:07:18 2024 -0400

    [Build/CI] Upgrade to gcc 10 in the base build Docker image (#8814)

commit 4bb98f2190aaf408cb063df5184829fb54ee5f81
Author: Roger Wang <[email protected]>
Date:   Thu Sep 26 07:45:30 2024 -0700

    [Misc] Update config loading for Qwen2-VL and remove Granite (#8837)

commit 7193774b1ff8603ad5bf4598e5efba0d9a39b436
Author: Michael Goin <[email protected]>
Date:   Wed Sep 25 17:46:22 2024 -0400

    [Misc] Support quantization of MllamaForCausalLM (#8822)

commit e2c6e0a8291126c868b669f631837c7781646fdc
Author: Roger Wang <[email protected]>
Date:   Wed Sep 25 13:29:48 2024 -0700

    [Doc] Update doc for Transformers 4.45 (#8817)

commit 770ec6024fc00cd696899f5c6fdc53b7148876e6
Author: Chen Zhang <[email protected]>
Date:   Wed Sep 25 13:29:32 2024 -0700

    [Model] Add support for the multi-modal Llama 3.2 model (#8811)

    Co-authored-by: simon-mo <[email protected]>
    Co-authored-by: Chang Su <[email protected]>
    Co-authored-by: Simon Mo <[email protected]>
    Co-authored-by: Roger Wang <[email protected]>
    Co-authored-by: Roger Wang <[email protected]>

commit 4f1ba0844b83b4e7d0ff1672b7ba502ce8732f95
Author: Simon Mo <[email protected]>
Date:   Wed Sep 25 10:36:26 2024 -0700

    Revert "rename PromptInputs and inputs with backward compatibility (#8760)" (#8810)

commit 873edda6cf8a2902e8b08eea0bf8f8f6d73704a8
Author: Michael Goin <[email protected]>
Date:   Wed Sep 25 12:43:36 2024 -0400

    [Misc] Support FP8 MoE for compressed-tensors (#8588)

commit 64840dfae48621c5c2004eb8f1cb7fba49f9b24e
Author: 科英 <[email protected]>
Date:   Thu Sep 26 00:37:41 2024 +0800

    [Frontend] MQLLMEngine supports profiling. (#8761)

commit 28e1299e60e565a56a2db41396380f74b8d29e57
Author: Cyrus Leung <[email protected]>
Date:   Thu Sep 26 00:36:47 2024 +0800

    rename PromptInputs and inputs with backward compatibility (#8760)

commit 0c4d2ad5e641de145682674066a84ffc632e714e
Author: DefTruth <[email protected]>
Date:   Thu Sep 26 00:35:53 2024 +0800

    [VLM][Bugfix] internvl with num_scheduler_steps > 1 (#8614)

commit c6f2485c823b5cd76cca70798e653c6eadb811de
Author: Jee Jee Li <[email protected]>
Date:   Thu Sep 26 00:35:23 2024 +0800

    [Misc] Add extra deps for openai server image (#8792)

commit 300da09177477d0a4d2b55790addefd971f52ae0
Author: bnellnm <[email protected]>
Date:   Wed Sep 25 10:35:52 2024 -0400

    [Kernel] Fullgraph and opcheck tests (#8479)

commit 1c046447a6d1ac3c99b9f453796f0d355d673deb
Author: Hongxia Yang <[email protected]>
Date:   Wed Sep 25 10:26:37 2024 -0400

    [CI/Build][Bugfix][Doc][ROCm] CI fix and doc update after ROCm 6.2 upgrade (#8777)

commit 8fae5ed7f6bfd63b81310fcb24b310d9205c9687
Author: Woo-Yeon Lee <[email protected]>
Date:   Wed Sep 25 16:53:03 2024 +0900

    [Misc] Fix minor typo in scheduler (#8765)

commit 3368c3ab36436af1342a3156971412e9efdb6419
Author: David Newman <[email protected]>
Date:   Wed Sep 25 17:52:26 2024 +1000

    [Bugfix] Ray 2.9.x doesn't expose available_resources_per_node (#8767)

    Signed-off-by: darthhexx <[email protected]>

commit 1ac3de09cd87290f7494ce6337623d6edd3f8667
Author: Adam Tilghman <[email protected]>
Date:   Wed Sep 25 00:49:26 2024 -0700

    [Frontend] OpenAI server: propagate usage accounting to FastAPI middleware layer (#8672)

commit 3e073e66f1790f7ce339dad71514983e6e402f30
Author: sohamparikh <[email protected]>
Date:   Wed Sep 25 02:16:30 2024 -0400

    [Bugfix] load fc bias from config for eagle (#8790)

commit c23953675f78bc85045d66fa98aea7d0581c2167
Author: Isotr0py <[email protected]>
Date:   Wed Sep 25 14:16:11 2024 +0800

    [Hardware][CPU] Enable mrope and support Qwen2-VL on CPU backend (#8770)

commit e3dd0692fa2c803cd6f59a88d2fdf8bca26d8d96
Author: zifeitong <[email protected]>
Date:   Tue Sep 24 22:53:43 2024 -0700

    [BugFix] Propagate 'trust_remote_code' setting in internvl and minicpmv (#8250)

commit fc3afc20df410dd523f94967b98836084f561ab7
Author: sroy745 <[email protected]>
Date:   Tue Sep 24 21:26:36 2024 -0700

    Fix tests in test_chunked_prefill_scheduler which fail with BlockManager V2 (#8752)

commit b4522474a32b6e0bf5573a9b6a6830cb787dfb63
Author: sasha0552 <[email protected]>
Date:   Wed Sep 25 04:26:33 2024 +0000

    [Bugfix][Kernel] Implement acquire/release polyfill for Pascal (#8776)

commit ee777d9c30418ffa9d98f98dd27c0ddea346c49c
Author: sroy745 <[email protected]>
Date:   Tue Sep 24 21:26:18 2024 -0700

    Fix test_schedule_swapped_simple in test_scheduler.py (#8780)

commit 6e0c9d6bd07464b311eb098e2dac8196eed16721
Author: Joe Runde <[email protected]>
Date:   Tue Sep 24 21:37:38 2024 -0600

    [Bugfix] Use heartbeats instead of health checks (#8583)

commit 6da1ab6b4134d76391a0c31a048e5d04b6283769
Author: Archit Patke <[email protected]>
Date:   Tue Sep 24 21:50:50 2024 -0500

    [Core] Adding Priority Scheduling (#5958)

commit 01b6f9e1f0530a7cb81486ff34d3d935e4f75d28
Author: Travis Johnson <[email protected]>
Date:   Tue Sep 24 18:29:56 2024 -0600

    [Core][Bugfix] Support prompt_logprobs returned with speculative decoding (#8047)

    Signed-off-by: Travis Johnson <[email protected]>

commit 13f9f7a3d0373421ee9fd7498e450214e134aa6c
Author: Jee Jee Li <[email protected]>
Date:   Wed Sep 25 08:08:55 2024 +0800

    [Misc] Upgrade bitsandbytes to the latest version 0.44.0 (#8768)

commit 1e7d5c01f5c35424eede1bbe6f723dd8781120f0
Author: youkaichao <[email protected]>
Date:   Tue Sep 24 15:48:39 2024 -0700

    [misc] soft drop beam search (#8763)

commit 2467b642dd9bde32a334fe5967efd78a53aa49da
Author: Daniele <[email protected]>
Date:   Tue Sep 24 21:38:12 2024 +0200

    [CI/Build] fix setuptools-scm usage (#8771)

commit 72fc97a0f100b92f1ff6c6a16e27d12f1c7569aa
Author: Lucas Wilkinson <[email protected]>
Date:   Tue Sep 24 14:33:21 2024 -0400

    [Bugfix] Fix torch dynamo fixes caused by `replace_parameters` (#8748)

commit 2529d09b5a4a124a316b6976e7d782f54e0bddde
Author: Andy <[email protected]>
Date:   Tue Sep 24 12:44:11 2024 -0400

    [Frontend] Batch inference for llm.chat() API (#8648)

    Co-authored-by: Cyrus Leung <[email protected]>
    Co-authored-by: Cyrus Leung <[email protected]>
    Co-authored-by: Roger Wang <[email protected]>
    Co-authored-by: Roger Wang <[email protected]>

commit a928ded99519f803d4cf6389df6acc707239a5cc
Author: ElizaWszola <[email protected]>
Date:   Tue Sep 24 18:31:42 2024 +0200

    [Kernel] Split Marlin MoE kernels into multiple files (#8661)

    Co-authored-by: mgoin <[email protected]>

commit cc4325b66ac49e403ed9e1a8c38156a5324e1174
Author: Hanzhi Zhou <[email protected]>
Date:   Tue Sep 24 01:08:14 2024 -0700

    [Bugfix] Fix potentially unsafe custom allreduce synchronization (#8558)

commit 8ff7ced996d5dc8b682913471f36c9fefb0e843f
Author: Alex Brooks <[email protected]>
Date:   Tue Sep 24 01:36:46 2024 -0600

    [Model] Expose Phi3v num_crops as a mm_processor_kwarg (#8658)

    Signed-off-by: Alex-Brooks <[email protected]>
    Co-authored-by: Cyrus Leung <[email protected]>
    Co-authored-by: DarkLight1337 <[email protected]>

commit 3f06bae9079ee495a34cfadcd9c1ef2a23636084
Author: Peter Salas <[email protected]>
Date:   Tue Sep 24 00:14:15 2024 -0700

    [Core][Model] Support loading weights by ID within models (#7931)

commit b8747e8a7c318ab774862f94ccbdbba5b7d9dd4a
Author: Cody Yu <[email protected]>
Date:   Mon Sep 23 23:10:03 2024 -0700

    [MISC] Skip dumping inputs when unpicklable (#8744)

commit 3185fb0ccae73816018d0936c03171b7cf1ba2f8
Author: Simon Mo <[email protected]>
Date:   Mon Sep 23 22:45:20 2024 -0700

    Revert "[Core] Rename `PromptInputs` to `PromptType`, and `inputs` to `prompt`" (#8750)

commit 0250dd68c5df12ead29d2ec7d922855c9a257b06
Author: youkaichao <[email protected]>
Date:   Mon Sep 23 22:08:12 2024 -0700

    re-implement beam search on top of vllm core (#8726)

    Co-authored-by: Brendan Wong <[email protected]>

commit 88577ac92808cfd9468e4b54b757d5fcbe9aa486
Author: sroy745 <[email protected]>
Date:   Mon Sep 23 21:43:13 2024 -0700

    Fix tests in test_scheduler.py that fail with BlockManager V2 (#8728)

commit 530821d00cb2beeb8dc62f74f0e4e0003868dc93
Author: Hongxia Yang <[email protected]>
Date:   Mon Sep 23 21:52:39 2024 -0400

    [Hardware][AMD] ROCm6.2 upgrade (#8674)

commit 1a2aef3e59f5429299618bd3b242833cb377f554
Author: Alexander Matveev <[email protected]>
Date:   Mon Sep 23 18:38:04 2024 -0400

    Add output streaming support to multi-step + async while ensuring RequestOutput obj reuse (#8335)

commit 5f7bb584272ee15147a411b887e7ababd6b9b9d0
Author: jiqing-feng <[email protected]>
Date:   Tue Sep 24 03:32:27 2024 +0800

    Fix typical acceptance sampler with correct recovered token ids (#8562)

commit b05f5c9238c3e0c3a98080b4ffc90acfa33f9e1f
Author: Russell Bryant <[email protected]>
Date:   Mon Sep 23 15:15:41 2024 -0400

    [Core] Allow IPv6 in VLLM_HOST_IP with zmq (#8575)

    Signed-off-by: Russell Bryant <[email protected]>

commit 9b0e3ec970f6a19427be358848a2ed663fd735e1
Author: Jee Jee Li <[email protected]>
Date:   Tue Sep 24 02:57:42 2024 +0800

    [Kernel][LoRA] Add assertion for punica sgmv kernels (#7585)

commit 86e9c8df29a954a7a2fc46e9985fecc2a2e15ae8
Author: Lucas Wilkinson <[email protected]>
Date:   Mon Sep 23 13:46:26 2024 -0400

    [Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)

    Co-authored-by: mgoin <[email protected]>
    Co-authored-by: Divakar Verma <[email protected]>
    Co-authored-by: Tyler Michael Smith <[email protected]>

commit ee5f34b1c2c71b2d56054a5ca23fe1c50c1458bb
Author: Daniele <[email protected]>
Date:   Mon Sep 23 18:44:26 2024 +0200

    [CI/Build] use setuptools-scm to set __version__ (#4738)

    Co-authored-by: youkaichao <[email protected]>

commit f2bd246c17ba67d7749a2560a30711f74cd19177
Author: Jani Monoses <[email protected]>
Date:   Mon Sep 23 17:43:09 2024 +0300

    [VLM] Fix paligemma, fuyu and persimmon with transformers 4.45 : use config.text_config.vocab_size (#8707)

commit a79e5229843e2800956956d0668b1b4858dbb61e
Author: Yanyi Liu <[email protected]>
Date:   Mon Sep 23 21:46:59 2024 +0800

    [Model] Support pp for qwen2-vl (#8696)

commit 3e83c12b5caa466bf533b144a9ec7944a9ce9d49
Author: Li, Jiang <[email protected]>
Date:   Mon Sep 23 21:15:16 2024 +0800

    [Bugfix][CPU] fix missing input intermediate_tensors in the cpu_model_runner (#8733)

commit e551ca1555b64ba1ecb2310ea658f3e25c62571d
Author: Isotr0py <[email protected]>
Date:   Mon Sep 23 20:12:20 2024 +0800

    [Hardware][CPU] Refactor CPU model runner (#8729)

commit 9b8c8ba1198cbcd311d28b7647f0f8d5dcdc9212
Author: Alex Brooks <[email protected]>
Date:   Mon Sep 23 01:44:48 2024 -0600

    [Core][Frontend] Support Passing Multimodal Processor Kwargs (#8657)

    Signed-off-by: Alex-Brooks <[email protected]>

commit d23679eb9960ad2a876b88ebd0028dbe55c3172a
Author: Yan Ma <[email protected]>
Date:   Mon Sep 23 13:54:18 2024 +0800

    [Bugfix] fix docker build for xpu (#8652)

commit 57a0702e63d9dc477ab7a82e686a30d14fb6c69d
Author: Luka Govedič <[email protected]>
Date:   Sun Sep 22 23:40:46 2024 -0400

    [Bugfix] Fix CPU CMake build (#8723)

    Co-authored-by: Yuan <[email protected]>

commit 3dda7c22502033854e963fef3826c1f64627e33b
Author: Tyler Michael Smith <[email protected]>
Date:   Sun Sep 22 22:24:59 2024 -0400

    [Bugfix] Avoid some bogus messages RE CUTLASS's revision when building (#8702)

commit 92ba7e7477619ec81464ccb64a17226f3d5047bb
Author: youkaichao <[email protected]>
Date:   Sun Sep 22 15:41:59 2024 -0700

    [misc] upgrade mistral-common (#8715)

commit d4a2ac830291305f202a85e157bff3a07b58e616
Author: youkaichao <[email protected]>
Date:   Sun Sep 22 12:47:54 2024 -0700

    [build] enable existing pytorch (for GH200, aarch64, nightly) (#8713)

commit c6bd70d7728b50f358cb5cb6e66e02b75aeb3d20
Author: Lily Liu <[email protected]>
Date:   Sun Sep 22 12:34:14 2024 -0700

    [SpecDec][Misc] Cleanup, remove bonus token logic. (#8701)

commit 5b59532760c82a9d91f65a3e227524da2af7d4ef
Author: litianjian <[email protected]>
Date:   Mon Sep 23 01:51:44 2024 +0800

    [Model][VLM] Add LLaVA-Onevision model support (#8486)

    Co-authored-by: litianjian <[email protected]>
    Co-authored-by: Cyrus Leung <[email protected]>
    Co-authored-by: Roger Wang <[email protected]>
    Co-authored-by: DarkLight1337 <[email protected]>

commit ca2b628b3c25b014b9951731c0331b75262a59e0
Author: Huazhong Ji <[email protected]>
Date:   Mon Sep 23 01:44:09 2024 +0800

    [MISC] rename CudaMemoryProfiler to DeviceMemoryProfiler (#8703)

commit 8ca5051b9afb6f8d2b3ae1b71d45d84e5d1c6f57
Author: Alex Brooks <[email protected]>
Date:   Sun Sep 22 06:56:20 2024 -0600

    [Misc] Use NamedTuple in Multi-image example (#8705)

    Signed-off-by: Alex-Brooks <[email protected]>

commit 06ed2815e2be50e527839c7ab09ce2639b7910b6
Author: Cyrus Leung <[email protected]>
Date:   Sun Sep 22 20:24:21 2024 +0800

    [Model] Refactor BLIP/BLIP-2 to support composite model loading (#8407)

commit 0e40ac9b7b5d953dfe38933bc7d2fb0a6c8da53c
Author: youkaichao <[email protected]>
Date:   Sat Sep 21 23:24:58 2024 -0700

    [ci][build] fix vllm-flash-attn (#8699)

commit 13d88d4137f97b8cf3c79f39d7df5e4c8348603a
Author: Isotr0py <[email protected]>
Date:   Sun Sep 22 12:33:27 2024 +0800

    [Bugfix] Refactor composite weight loading logic (#8656)

commit d66ac62854e04c8fda83506dc93ef7971ebf593a
Author: Tyler Michael Smith <[email protected]>
Date:   Sat Sep 21 19:45:02 2024 -0400

    [Kernel][Bugfix] Delete some more useless code in marlin_moe_ops.cu (#8643)

commit 9dc7c6c7f332ac6c08311c7a946c6945e0782701
Author: Divakar Verma <[email protected]>
Date:   Sat Sep 21 16:09:39 2024 -0500

    [dbrx] refactor dbrx experts to extend FusedMoe class (#8518)

commit ec4aaad8124baadc7954e30c612ca9444b22d7e7
Author: rasmith <[email protected]>
Date:   Sat Sep 21 04:20:54 2024 -0500

    [Kernel][Triton][AMD] Remove tl.atomic_add from awq_gemm_kernel, 2-5x speedup MI300, minor improvement for MI250 (#8646)

commit 4dfdf4319676c3dca72cdfba20470ac76d0cadf4
Author: Andy Dai <[email protected]>
Date:   Sat Sep 21 00:24:12 2024 -0700

    [Doc] Fix typo in AMD installation guide (#8689)

commit 5e85f4f82a5b6eaad6869198d6ac76a0c12cf6d0
Author: Cyrus Leung <[email protected]>
Date:   Sat Sep 21 14:28:56 2024 +0800

    [VLM] Use `SequenceData.from_token_counts` to create dummy data (#8687)

commit 71c60491f287d8a23bed1743513b4b3e7927c69e
Author: Luka Govedič <[email protected]>
Date:   Sat Sep 21 02:27:10 2024 -0400

    [Kernel] Build flash-attn from source (#8245)

commit 0faab90eb006c677add65cd4c2d0f740a63e064d
Author: youkaichao <[email protected]>
Date:   Fri Sep 20 19:55:33 2024 -0700

    [beam search] add output for manually checking the correctness (#8684)

commit 0455c46ed434d70f0a6219204e89ee04f1d01336
Author: Cyrus Leung <[email protected]>
Date:   Sat Sep 21 10:30:39 2024 +0800

    [Core] Factor out common code in `SequenceData` and `Sequence` (#8675)

commit d4bf085ad064ba68a77862e2022f37c33a66e94a
Author: Kunshang Ji <[email protected]>
Date:   Sat Sep 21 10:03:55 2024 +0800

    [MISC] add support custom_op check (#8557)

    Co-authored-by: youkaichao <[email protected]>

commit 0057894ef7f8db0d51385aa7254219d7fbd6c784
Author: Cyrus Leung <[email protected]>
Date:   Sat Sep 21 10:00:54 2024 +0800

    [Core] Rename `PromptInputs` and `inputs` (#8673)

commit 0f961b3ce9ac3d3fd13e201c4358884bc094905e
Author: zyddnys <[email protected]>
Date:   Fri Sep 20 18:48:32 2024 -0400

    [Bugfix] Fix incorrect llava next feature size calculation (#8496)

commit 7f9c8902e3d50a9d715b38e0531280a58d2bbe14
Author: omrishiv <[email protected]>
Date:   Fri Sep 20 15:19:44 2024 -0700

    [Hardware][AWS] update neuron to 2.20 (#8676)

    Signed-off-by: omrishiv <[email protected]>

commit 7c8566aa4ff16b79a576436fbb50f03643febf07
Author: omrishiv <[email protected]>
Date:   Fri Sep 20 15:04:37 2024 -0700

    [Doc] neuron documentation update (#8671)

    Signed-off-by: omrishiv <[email protected]>

commit b4e4eda92e1d3a013fc4007db64b69d8604264ff
Author: Patrick von Platen <[email protected]>
Date:   Fri Sep 20 23:33:03 2024 +0200

    [Bugfix][Core] Fix tekken edge case for mistral tokenizer (#8640)

commit 2874bac618052a079efd837fc82cf3f3519079c7
Author: Pastel! <[email protected]>
Date:   Sat Sep 21 05:00:45 2024 +0800

    [Bugfix] Config got an unexpected keyword argument 'engine' (#8556)

commit 035fa895ecedea87810889aabbe50ba8a2ad7d5d
Author: Cyrus Leung <[email protected]>
Date:   Sat Sep 21 04:52:19 2024 +0800

    [Misc] Show AMD GPU topology in `collect_env.py` (#8649)

commit b28298f2f4bd4ec6d1020c10b923a9eb7993dc89
Author: saumya-saran <[email protected]>
Date:   Fri Sep 20 12:46:02 2024 -0700

    [Bugfix] Validate SamplingParam n is an int (#8548)

commit 2940afa04e39fa9f248c565687d9a2acf7401355
Author: Alexey Kondratiev(AMD) <[email protected]>
Date:   Fri Sep 20 13:27:44 2024 -0400

    [CI/Build] Removing entrypoints/openai/test_embedding.py test from ROCm build (#8670)

commit 3b63de9353ce51ba6c1c167ae8d4b87b8bcf9c9e
Author: Niklas Muennighoff <[email protected]>
Date:   Fri Sep 20 09:31:41 2024 -0700

    [Model] Add OLMoE (#7922)

commit 260d40b5ea48df9421325388abcc8d907a560fc5
Author: Jiaxin Shan <[email protected]>
Date:   Thu Sep 19 23:20:56 2024 -0700

    [Core] Support Lora lineage and base model metadata management (#6315)

commit 9e5ec35b1f8239453b1aaab28e7a02307db4ab1f
Author: William Lin <[email protected]>
Date:   Thu Sep 19 20:49:54 2024 -0700

    [bugfix] [AMD] add multi-step advance_step to ROCmFlashAttentionMetadata (#8474)

commit 18ae428a0d8792d160d811a9cd5bb004d68ea8bd
Author: Amit Garg <[email protected]>
Date:   Thu Sep 19 17:54:02 2024 -0700

    [Bugfix] Fix Phi3.5 mini and MoE LoRA inference (#8571)

commit de6f90a13d7b98c4958ba107ec16cb6f95efb10f
Author: bnellnm <[email protected]>
Date:   Thu Sep 19 18:36:30 2024 -0400

    [Misc] guard against change in cuda library name (#8609)

commit 6cb748e190a94e20987314025614b8bd806602f2
Author: Alexey Kondratiev(AMD) <[email protected]>
Date:   Thu Sep 19 16:06:32 2024 -0400

    [CI/Build] Re-enabling Entrypoints tests on ROCm, excluding ones that fail (#8551)

commit 9e99407e3ccbb290bae77af230da38c70a52a055
Author: Simon Mo <[email protected]>
Date:   Thu Sep 19 12:16:28 2024 -0700

    Create SECURITY.md (#8642)

commit ea4647b7d77c4738c5ed2ab77a2c9f5ad335f6fb
Author: Isotr0py <[email protected]>
Date:   Fri Sep 20 03:15:55 2024 +0800

    [Doc] Add documentation for GGUF quantization (#8618)

commit e42c634acbd1b86b5becca51e8b8108a32a438d5
Author: 盏一 <[email protected]>
Date:   Fri Sep 20 02:28:25 2024 +0800

    [Core] simplify logits resort in _apply_top_k_top_p (#8619)

commit 9cc373f39036af789fb1ffc1e06b23766996d3f4
Author: Charlie Fu <[email protected]>
Date:   Thu Sep 19 12:37:57 2024 -0500

    [Kernel][Amd] Add fp8 kv cache support for rocm custom paged attention (#8577)

commit 76515f303b44cb3ffc6de63c49148d5081a77119
Author: Nick Hill <[email protected]>
Date:   Thu Sep 19 17:51:06 2024 +0100

    [Frontend] Use MQLLMEngine for embeddings models too (#8584)

commit 855c8ae2c9a4085b1ebd66d9a978fb23f47f822c
Author: Kunshang Ji <[email protected]>
Date:   Thu Sep 19 13:33:20 2024 +0800

    [MISC] remove engine_use_ray in benchmark_throughput.py (#8615)

commit c52ec5f03471008fa1312d82fb17d40b95a3ca5d
Author: Kuntai Du <[email protected]>
Date:   Wed Sep 18 22:24:24 2024 -0700

    [Bugfix] fixing sonnet benchmark bug in benchmark_serving.py (#8616)

commit 02c9afa2d04a85269faa2760e9af30527a61d7f6
Author: Roger Wang <[email protected]>
Date:   Wed Sep 18 21:14:28 2024 -0700

    Revert "[Misc][Bugfix] Disable guided decoding for mistral tokenizer" (#8593)

commit 3118f63385c0d767fba8b6d2039fc35440678da9
Author: sroy745 <[email protected]>
Date:   Wed Sep 18 19:24:15 2024 -0700

    [Bugfix] [Encoder-Decoder] Bugfix for encoder-specific metadata construction during decode of encoder-decoder models (#8545)

commit 4c34ce8916da0e4967eadefcb7f91eb58dd7ac61
Author: Tyler Michael Smith <[email protected]>
Date:   Wed Sep 18 21:42:49 2024 -0400

    [Kernel] Remove marlin moe templating on thread_m_blocks (#8573)

    Co-authored-by: [email protected]

commit 0d47bf3bf40edfe9fcfd7e5cd909388497535bc5
Author: Joe Runde <[email protected]>
Date:   Wed Sep 18 16:10:01 2024 -0600

    [Bugfix] add `dead_error` property to engine client (#8574)

    Signed-off-by: Joe Runde <[email protected]>

commit d9cd78eb718c233ebc5b84377fc2226af7ef0fa2
Author: Nick Hill <[email protected]>
Date:   Wed Sep 18 21:17:55 2024 +0100

    [BugFix] Nonzero exit code if MQLLMEngine startup fails (#8572)

commit db9120cdedba5033037432775417df0b6117495d
Author: Tyler Michael Smith <[email protected]>
Date:   Wed Sep 18 16:05:06 2024 -0400

    [Kernel] Change interface to Mamba selective_state_update for continuous batching (#8039)

commit b3195bc9e4d57b6107af2222afea26c51475e262
Author: Gregory Shtrasberg <[email protected]>
Date:   Wed Sep 18 13:41:08 2024 -0400

    [AMD][ROCm]Quantization methods on ROCm; Fix _scaled_mm call (#8380)

    Co-authored-by: Alexei-V-Ivanov-AMD <[email protected]>
    Co-authored-by: Michael Goin <[email protected]>

commit e18749ff09c277f7cdab278895ebdd9b1041b6e8
Author: Geun, Lim <[email protected]>
Date:   Thu Sep 19 02:04:00 2024 +0900

    [Model] Support Solar Model (#8386)

    Co-authored-by: Michael Goin <[email protected]>

commit d65798f78c76f03f068fc2f69a68cff430ee6b6f
Author: Russell Bryant <[email protected]>
Date:   Wed Sep 18 12:10:27 2024 -0400

    [Core] zmq: bind only to 127.0.0.1 for local-only usage (#8543)

    Signed-off-by: Russell Bryant <[email protected]>

commit a8c1d161a7d87dbc6c7cccfce303dcbe2e4ed6be
Author: afeldman-nm <[email protected]>
Date:   Wed Sep 18 11:38:43 2024 -0400

    [Core] *Prompt* logprobs support in Multi-step (#8199)

commit 7c7714d856eee6fa94aade729b67f00584f72a4c
Author: Alexander Matveev <[email protected]>
Date:   Wed Sep 18 09:56:58 2024 -0400

    [Core][Bugfix][Perf] Introduce `MQLLMEngine` to avoid `asyncio` OH (#8157)

    Co-authored-by: Nick Hill <[email protected]>
    Co-authored-by: [email protected] <[email protected]>
    Co-authored-by: Robert Shaw <[email protected]>
    Co-authored-by: Simon Mo <[email protected]>

commit 9d104b5beb7bbb51c64b680e007f39169489ea86
Author: Aaron Pham <[email protected]>
Date:   Wed Sep 18 07:00:56 2024 -0400

    [CI/Build] Update Ruff version (#8469)

    Signed-off-by: Aaron Pham <[email protected]>
    Co-authored-by: Cyrus Leung <[email protected]>

commit 6ffa3f314c59e42238f1c5f923ff2839e0af9698
Author: Cyrus Leung <[email protected]>
Date:   Wed Sep 18 18:38:11 2024 +0800

    [CI/Build] Avoid CUDA initialization (#8534)

commit e351572900f7d87e14fe203ea3a49c1c7ddae0d6
Author: Jiaxin Shan <[email protected]>
Date:   Wed Sep 18 02:51:59 2024 -0700

    [Misc] Add argument to disable FastAPI docs (#8554)

commit 95965d31b6ac2c9557816a6ffabe4a3117a5ccb2
Author: Daniele <[email protected]>
Date:   Wed Sep 18 04:49:53 2024 +0200

    [CI/Build] fix Dockerfile.cpu on podman (#8540)

commit 8110e44529f431d54b02060528601c0d3e3f7d02
Author: Tyler Michael Smith <[email protected]>
Date:   Tue Sep 17 19:44:27 2024 -0400

    [Kernel] Change interface to Mamba causal_conv1d_update for continuous batching (#8012)

commit 09deb4721f830602d0417604c7e18b7e384f9594
Author: Alexey Kondratiev(AMD) <[email protected]>
Date:   Tue Sep 17 19:40:29 2024 -0400

    [CI/Build] Excluding kernels/test_gguf.py from ROCm (#8520)

commit fa0c114fad4e2b807503e78d5110558cfee92ba4
Author: youkaichao <[email protected]>
Date:   Tue Sep 17 16:24:06 2024 -0700

    [doc] improve installation doc (#8550)

    Co-authored-by: Andy Dai <[email protected]>

commit 98f9713399bd602ff954a83e6e6abcb4cf8b8864
Author: Joe Runde <[email protected]>
Date:   Tue Sep 17 17:17:08 2024 -0600

    [Bugfix] Fix TP > 1 for new granite (#8544)

    Signed-off-by: Joe Runde <[email protected]>

commit 56c3de018c35580fd088655c2f9951cd4da5335d
Author: Nick Hill <[email protected]>
Date:   Tue Sep 17 20:24:29 2024 +0100

    [Misc] Don't dump contents of kvcache tensors on errors (#8527)

commit a54ed8024953dc6b59906072a7a89cd4791ec4f0
Author: Patrick von Platen <[email protected]>
Date:   Tue Sep 17 19:50:37 2024 +0200

    [Model] Add mistral function calling format to all models loaded with "mistral" format (#8515)

    Co-authored-by: Cyrus Leung <[email protected]>

commit 9855b99502c7537db5ef018129e603650800ac46
Author: chenqianfzh <[email protected]>
Date:   Tue Sep 17 08:09:12 2024 -0700

    [Feature][kernel] tensor parallelism with bitsandbytes quantization (#8434)

commit 1009e93c5d634c724eeff3d4e453369337f502d4
Author: sroy745 <[email protected]>
Date:   Tue Sep 17 07:35:01 2024 -0700

    [Encoder decoder] Add cuda graph support during decoding for encoder-decoder models (#7631)

commit 1b6de8352b878348974b3f117cbb68ed18daa609
Author: Isotr0py <[email protected]>
Date:   Tue Sep 17 15:34:27 2024 +0800

    [Benchmark] Support sample from HF datasets and image input for benchmark_serving (#8495)

commit cbdb25225914a04d94e8830f4e739faca8ff3b9d
Author: Rui Qiao <[email protected]>
Date:   Tue Sep 17 00:06:26 2024 -0700

    [Misc] Limit to ray[adag] 2.35 to avoid backward incompatible change…
maxdebayser added a commit to maxdebayser/vllm that referenced this pull request Oct 11, 2024
commit 579337372e694d900a4e01899b81fe0afcf82c10
Merge: e7044a61c 30b0f2156
Author: Max de Bayser <[email protected]>
Date:   Tue Oct 8 10:34:17 2024 -0300

    Merge branch 'bert' into roberta_embedding

    Signed-off-by: Max de Bayser <[email protected]>

commit 30b0f2156bccbfb11def0d7902acb8b56d24a98a
Merge: 80c18855f 8c746226c
Author: Max de Bayser <[email protected]>
Date:   Tue Oct 8 10:33:05 2024 -0300

    Merge branch 'upstream_main' into bert

commit 8c746226c956f7c8a4672689fee91c7d22befed6
Author: Brendan Wong <[email protected]>
Date:   Mon Oct 7 22:51:43 2024 -0700

    [Frontend] API support for beam search for MQLLMEngine (#9117)

commit 80c18855fcff195175b7046923c4b0c3815f141a
Author: laishzh <[email protected]>
Date:   Mon Oct 7 12:04:34 2024 +0800

    feat: update with origin/main

commit 6440795f407c652ecdb045d1b141913afdb8b5e1
Merge: 04b0bc6ff 487678d04
Author: laishzh <[email protected]>
Date:   Mon Oct 7 11:28:19 2024 +0800

    Merge branch 'origin/main'

commit 04b0bc6ff534495a9627f5548767f5bfb95268e8
Author: laishzh <[email protected]>
Date:   Mon Oct 7 02:54:55 2024 +0800

    feat: revert embedding_block_manager

commit 352d8b2641d11ffa0e153462fd89b54525998843
Merge: 3fbfdf429 107d9c207
Author: laishzh <[email protected]>
Date:   Mon Oct 7 00:45:52 2024 +0800

    Merge remote-tracking branch 'maxdebayser/bert'

commit e7044a61cebf6b9229a50a8396fdef104e799a9e
Merge: a14b4e39d 107d9c207
Author: Max de Bayser <[email protected]>
Date:   Wed Oct 2 18:04:38 2024 -0300

    Merge branch 'bert' into roberta_embedding

commit 107d9c207808c6f070ef086e3ea748cecbc9d809
Merge: 57bdd6049 7f60520de
Author: Max de Bayser <[email protected]>
Date:   Wed Oct 2 17:52:52 2024 -0300

    Merge branch 'upstream_main' into bert

    Signed-off-by: Max de Bayser <[email protected]>

commit a14b4e39d26eb953c569ebb219aa3cb7203699ec
Merge: 08f1781d6 57bdd6049
Author: Max de Bayser <[email protected]>
Date:   Thu Sep 26 17:25:28 2024 -0300

    Merge branch 'bert' into roberta_embedding

    Signed-off-by: Max de Bayser <[email protected]>

commit 57bdd6049129b43244d3c70ea876e784762e96e9
Merge: 2c8a5b922 7193774b1
Author: Max de Bayser <[email protected]>
Date:   Thu Sep 26 17:15:18 2024 -0300

    Merge branch 'upstream_main' into bert

    Signed-off-by: Max de Bayser <[email protected]>

commit 3fbfdf42966c7324466e266dc6d4b5c26131aee5
Merge: 2c8a5b922 873edda6c
Author: laishzh <[email protected]>
Date:   Thu Sep 26 23:23:39 2024 +0800

    Merge remote-tracking branch 'origin/main'

    # Conflicts:
    #	vllm/inputs/data.py

commit 08f1781d6bd49653bd62ffdfde4f86d903f0c65a
Author: Max de Bayser <[email protected]>
Date:   Mon Sep 23 17:04:35 2024 -0300

    add head size 32

    Signed-off-by: Max de Bayser <[email protected]>

commit 2c8a5b9224ce9e26b2e43bb2312be91e2c74de9c
Merge: 15be7fa8b f2bd246c1
Author: Max de Bayser <[email protected]>
Date:   Mon Sep 23 13:48:10 2024 -0300

    Merge branch 'main' into bert

    Signed-off-by: Max de Bayser <[email protected]>

commit 30c875e9e61f1e9e4d556014f49362adff76269a
Merge: afd997ba9 464a90f4e
Author: Max de Bayser <[email protected]>
Date:   Mon Sep 23 13:59:23 2024 -0300

    Merge branch 'bert' into roberta_embedding

commit 464a90f4e09165ab724de26b35e9d7913c5d6560
Merge: 15be7fa8b f2bd246c1
Author: Max de Bayser <[email protected]>
Date:   Mon Sep 23 13:48:10 2024 -0300

    Merge branch 'main' into bert

    Signed-off-by: Max de Bayser <[email protected]>

commit afd997ba9f6ec2513145c0ca469a15783e0c96e5
Merge: 7d0ecb90c 15be7fa8b
Author: Max de Bayser <[email protected]>
Date:   Mon Sep 23 13:14:29 2024 -0300

    Merge branch '5447' into roberta_embedding

commit 15be7fa8bce185f64fafecaabdb8c828e83f4ad8
Author: laishzh <[email protected]>
Date:   Mon Sep 9 23:04:44 2024 +0800

    feat: fix lint

commit 0ea4da1c549bf35c8456c47729da46dd33481cac
Author: laishzh <[email protected]>
Date:   Mon Sep 9 23:01:22 2024 +0800

    feat: fix lint

commit 776dcbdae9d693dbd6546b7784712c06e6ef473c
Merge: 3ff2d3637 4ef41b847
Author: laishzh <[email protected]>
Date:   Mon Sep 9 10:32:46 2024 +0800

    Merge branch 'main' of https://github.com/vllm-project/vllm

    # Conflicts:
    #	vllm/core/embedding_model_block_manager.py

commit 3ff2d36375d9560f87c56860ffff8a774a217cf9
Author: laishzh <[email protected]>
Date:   Mon Sep 9 10:29:01 2024 +0800

    feat: some changes on test_embedding.py

commit e351bfd0febe4bbf8030fcd07f39eef5cce97641
Author: laishzh <[email protected]>
Date:   Sun Sep 8 23:50:18 2024 +0800

    feat: bert embedding implemented, but still have some bugs with mistral,

commit 7d0ecb90c5034d41f0d9b38eede25f50bf941e3d
Author: Max de Bayser <[email protected]>
Date:   Wed Aug 28 16:35:25 2024 -0300

    Add support for Roberta embedding models

    It's almost identical to the Bert models

    Signed-off-by: Max de Bayser <[email protected]>

commit 612cf1a969fa46105c3685b2eb025cde6416747d
Author: laishzh <[email protected]>
Date:   Tue Aug 27 15:19:50 2024 +0800

    feat: modify test_embedding

commit fc1f2b7ceb69f9588799820831145babf29aaa64
Author: laishzh <[email protected]>
Date:   Mon Aug 19 15:39:33 2024 +0800

    chore: fix lint

commit d09860763500b85193230588386f0e3d515e231c
Author: laishzh <[email protected]>
Date:   Mon Aug 19 15:24:51 2024 +0800

    feat: remove embedding_model_block_manager.py

commit 37f698b4241a42c9634030e372e419b47e2a1e9c
Author: laishzh <[email protected]>
Date:   Mon Aug 19 15:16:34 2024 +0800

    feat: move BertEmbeddingModel to the end of file

commit 6f006f5ad698d76599e0b005520e65921042d07b
Author: laishzh <[email protected]>
Date:   Mon Aug 19 15:06:21 2024 +0800

    chore: fix lint

commit bfd7ec9e043cf304e6dea024912eb2a18c786bd6
Author: laishzh <[email protected]>
Date:   Mon Aug 19 14:59:06 2024 +0800

    feat: model input

commit 8b107a24a4ef9abb194686066c3bebc6923c6876
Author: laishzh <[email protected]>
Date:   Mon Aug 19 13:41:49 2024 +0800

    feat: fix lint

commit e15d0cce60e3f39f2aaf8c3f62314a6d6b4ea091
Merge: b76da51c0 f710fb526
Author: laishzh <[email protected]>
Date:   Mon Aug 19 12:45:26 2024 +0800

    Merge branch 'main' into main

commit b76da51c0d9ba1b4e39d432b8fb557ed8319034f
Author: laishzh <[email protected]>
Date:   Mon Aug 19 11:35:22 2024 +0800

    feat: enc_dec_runner base

commit b99d783bd852eb4cae228fcd8faf3344cd9a6fed
Author: laishzh <[email protected]>
Date:   Sun Aug 18 00:49:57 2024 +0800

    feat: remove embedding block space manager

commit 7e1196d25054d76d92b3777bc077d3cffd742599
Author: laishzh <[email protected]>
Date:   Sat Aug 17 14:43:32 2024 +0800

    fix: fix hint

commit ce9a599194dbc3a208a6a4a21fdccaaa5c26ece8
Author: laishzh <[email protected]>
Date:   Sat Aug 17 02:18:54 2024 +0800

    feat: bos_token_id

commit 275f49de32136eb9e4298d42aa85a1e2dc56924c
Author: laishzh <[email protected]>
Date:   Sat Aug 17 01:03:55 2024 +0800

    feat: embedding model prompt

commit 0b3f55c66e5eb40808f46ebde3c38213478050c7
Author: laishzh <[email protected]>
Date:   Fri Aug 16 15:12:51 2024 +0800

    feat: fix lint

commit 91e23d8ad2b45790590889d6ee437702f5003792
Author: laishzh <[email protected]>
Date:   Fri Aug 16 15:04:30 2024 +0800

    feat: fix lint

commit 7657af3f49cdb567bc96b44157c89f18cc4d0a22
Author: laishzh <[email protected]>
Date:   Fri Aug 16 15:01:26 2024 +0800

    feat: fix lint

commit f2158848b9abd839c515c568acd592d0416c6682
Author: laishzh <[email protected]>
Date:   Fri Aug 16 11:21:54 2024 +0800

    chore: recover

commit a0ad0df28c9de89bdd66b587502f6af9265065be
Author: laishzh <[email protected]>
Date:   Fri Aug 16 11:15:28 2024 +0800

    chore: recover unchanged files

commit 872e79531b39d1bf12ea81ddcd5bf919dd97265d
Author: laishzh <[email protected]>
Date:   Thu Aug 15 21:40:55 2024 +0800

    feat: embedding model forward

commit 682c455bb0b8c950e1e00b43a6841f433f62db97
Author: laishzh <[email protected]>
Date:   Thu Aug 15 14:36:40 2024 +0800

    feat: recover sequence

commit aca786e4359ef55d0af006199728c8b941558579
Author: laishzh <[email protected]>
Date:   Thu Aug 15 13:44:03 2024 +0800

    feat: default bos_token_id of encoder model

commit 76b47fb1b7920fb50a889f19e1c1421e4385d1ca
Author: laishzh <[email protected]>
Date:   Thu Aug 15 13:18:53 2024 +0800

    chore: recover

commit 37bcba01408d37b192063e2ee2b9ac1c3087393c
Author: laishzh <[email protected]>
Date:   Wed Aug 14 17:47:05 2024 +0800

    feat: full pipeline

commit 63fb7a582cef08ec29a8b30024a01602dc5ee636
Author: laishzh <[email protected]>
Date:   Wed Aug 14 02:39:31 2024 +0800

    WIP: bert embedding

commit 53c5148e9f5024f2eb6a83bbf7af191dc88fe555
Author: laishzh <[email protected]>
Date:   Tue Aug 13 16:11:53 2024 +0800

    (WIP)feat: EmbeddingModelRunner support encoder model

commit 12a9869b5324fa9a4f7090eb8967c81f47f87f75
Merge: 59bf8c44d 97a6be95b
Author: laishzh <[email protected]>
Date:   Tue Aug 13 11:22:44 2024 +0800

    Merge remote-tracking branch 'origin/main'

    # Conflicts:
    #	.buildkite/test-pipeline.yaml
    #	examples/offline_inference_encoder_decoder.py
    #	tests/conftest.py
    #	tests/core/test_scheduler_encoder_decoder.py
    #	tests/kernels/test_encoder_decoder_attn.py
    #	tests/models/test_bart.py
    #	tests/worker/test_encoder_decoder_model_runner.py
    #	vllm/core/scheduler.py
    #	vllm/engine/llm_engine.py
    #	vllm/inputs/__init__.py
    #	vllm/inputs/data.py
    #	vllm/model_executor/models/bart.py
    #	vllm/sequence.py
    #	vllm/utils.py
    #	vllm/worker/enc_dec_model_runner.py
    #	vllm/worker/worker.py

commit 59bf8c44dd79c832a37949d0698bacef6ecc2136
Merge: a40828921 a936faa57
Author: laishzh <[email protected]>
Date:   Thu Jul 25 23:02:34 2024 +0800

    Merge remote-tracking branch 'bert_deps/afeldman-nm/infra_enc_dec_model_runner'

commit a936faa57000aca5be159de260fae8c8849148b6
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 10:52:50 2024 -0400

    removed prefix caching from enc/dec modelrunner

commit 4bb7fc442f67dd162a001900e485d02d64fa24ed
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 10:45:03 2024 -0400

    removed chunked prefill logic/docstring text from enc/dec modelrunner

commit f0abcc27e642dda6371eb1440de519166642a9e7
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 10:37:45 2024 -0400

    format

commit d1751db42bac1baf50b5fa542c770fbab13ba9ff
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 10:35:45 2024 -0400

    removed flashinfer references from enc/dec modelrunner

commit 64685acfe52177d1e01362ece71d3faab73e8e45
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 10:13:44 2024 -0400

    Sequence docstring

commit 035d90dfc21bbc12d12d2368a2d5d5175ead31ca
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 10:01:31 2024 -0400

    updated RequestOutput docstring

commit 1bb7ad9f2f5e4c84e283c5c0c59006d817440609
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 09:59:34 2024 -0400

    updated RequestOutput docstring

commit 47c5548936cd7bfe476d31e8248e3208a8a663d1
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 09:53:23 2024 -0400

    checked out examples/offline_inference.py from main

commit 3327e5be3b07bc35a607a1f4fa1fba2fc4f5904e
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 09:49:44 2024 -0400

    removed lora & vision & mm code from enc/dec modelrunner

commit 175ea95baf0537209a8aa0e9c94f711f794f0f51
Merge: c2cc010ac 316a41ac1
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 09:25:53 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit c2cc010acc1bb632bb7297da970ff865b22c7f27
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 01:33:04 2024 -0400

    Removed lora from enc/dec model runner

commit fb5a2bcb2baa984b884ba8bdd6293dd06cb8756b
Merge: 393515eb0 9e169a4c6
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 00:52:21 2024 -0400

    upstream merge

commit 393515eb07a84c3d1604f0c0bc52eb2d8f7c5ae0
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 00:50:27 2024 -0400

    formatting

commit 47b4eb2a06bf0811f143668fbfe1f8c2caedc827
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 00:50:08 2024 -0400

    fixed bug caused by upstream refactoring

commit bed9bcd356c3526f5697ddfc2052d5bfca5fa9d2
Merge: 0af58ec10 740374d45
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 21:04:09 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 0af58ec10ac6eb9cab3f78abfa62390ade9ca64c
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 05:10:20 2024 -0400

    responses to feedback

commit d82b27346b444778eeba42e015ac716883c37f76
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 05:01:27 2024 -0400

    enc/dec example comments

commit 4b5b2cf956141e3adbc22a7a2aa2ebbb9bad8979
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 04:51:48 2024 -0400

    removed unnecessary argument reordering

commit ed4a56b9ca31cdf06033611887114920318ad397
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 04:46:49 2024 -0400

    formatting

commit 5a270ff49f3ebafecf8fb45e090f08d705aa416a
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 04:46:32 2024 -0400

    refactoring

commit 02114bdcd5a832c3610318a8d0b8cfb26070f3ef
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 04:31:32 2024 -0400

    _free_seq_group() -> _free_seq_group_cross_attn_blocks()

commit be58d8ab92fd4ddab1f48b246a5233ee3a71bcf0
Merge: c493d4029 ccc4a7325
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 04:20:18 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit c493d402929d023a0924018a928502cb05605a2f
Merge: f36ffb569 5e8ca973e
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 00:34:07 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit f36ffb5695b0694947f4ae9e7417cc1afa85e19c
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 00:33:47 2024 -0400

    example includes prompt zipper

commit 61d2ad2cc7791b6e32c8678b8e88ed99bbab4118
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 00:28:20 2024 -0400

    fixed bugs in handling non-text formats for individual prompts

commit dd784b5423ba21fc6b8188908df417d128376a1f
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 21:37:19 2024 -0400

    typing fix

commit 0b29fd27f17f2751550262f218e6ef1afbef7087
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 21:35:25 2024 -0400

    enc/dec handles empty str and None decoder prompts correctly

commit aa01d71f90f0c3cda8a7ea419ff4f1fb6dc9d13c
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 20:56:51 2024 -0400

    empty-string decoder input is now handled for encoder/decoder

commit 4a6e39e67c2bb4c2d685df9031cbf64956be4255
Merge: 7e7bbd9e1 87525fab9
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 20:16:21 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 7e7bbd9e16900449e350bf8634d584e4b1a5c2f0
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 16:57:41 2024 -0400

    deleted unnecessary dependency

commit 229847b431469bd17b2d13f3651b322c7b280274
Merge: 059273f3c 1bedf210e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 16:56:27 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 059273f3ca43947413572a0014c1437a53e33b8a
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 16:56:07 2024 -0400

    wip

commit b283544d820bfd96ac80845d2ddd7ad057cca6e9
Merge: 48a742d41 b01937f0c
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 04:15:18 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_correctness' into infra_enc_dec_model_runner_reviews

commit 48a742d4155cba0ffc7effb1c9fdad0706493c43
Merge: 427032a08 bb2fc0807
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 04:15:03 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit b01937f0ce29bc9e417e85cb4dd18ddb47a98e3b
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 04:14:06 2024 -0400

    set up None/empty str tests which are not passing

commit c51a1682be7443ec7d32062491868bd49c631eb8
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 01:47:43 2024 -0400

    fixed bug in how conftest was handling HF encoder/decoder outputs; disabled HF n-gram repeat checks

commit 427032a085cd48701f7abf64518563929a844d6c
Merge: 14831b09d fea59c771
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 17:14:13 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 14831b09da05f6d8e689568c77f7dfc5c33895ab
Merge: c43a6ed19 b90b6b6ff
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 13:52:34 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_model_runner

commit b90b6b6ffb4417ec64b382e9211273bca1eebbb7
Merge: b174c7ab2 739b61a34
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 13:51:35 2024 -0400

    upstream merge

commit a40828921c18faf70f4239d90e599da4311b284e
Merge: 7ace684da c43a6ed19
Author: laishzh <[email protected]>
Date:   Mon Jul 22 19:00:06 2024 +0800

    Merge remote-tracking branch 'bert_deps/afeldman-nm/infra_enc_dec_model_runner'

commit c43a6ed191e76f81bfd27f25e2ca8bac1fc01bcc
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 04:03:59 2024 -0400

    commented out BART TP=4

commit b174c7ab2da60e24a2ca576eccee671541ae142a
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 04:02:56 2024 -0400

    bart is parallelized, modulo an unfortunate hack for QKVParallelLinear in cross-attention

commit 3551b6bf56ab74228c923b698e59a88b06bac81c
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 03:59:22 2024 -0400

    fixed bug where underlying Attention was constructed using full head-count

commit fdf71de8557d588ff3b5767e96df09de4e9278d5
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 03:48:35 2024 -0400

    parallelized enc/dec cross-attention, using a slight hack

commit 9bbed43ab159063a8dff27587dae909b11e1a703
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 03:20:20 2024 -0400

    parallelized LM head

commit 74abe22287374c9dd801ef059692016ef09777cb
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 03:01:07 2024 -0400

    encoder attention & decoder self-attention parallelized

commit e5bb9de596bd7f4b5d85ab6d0a2440cae06f982a
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 02:33:02 2024 -0400

    all attention layer output linears are parallelized

commit fb3227f68714ba6ed00e67e8a242db88288cdb8e
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 02:25:12 2024 -0400

    parallelized BART learned positional embedding

commit 00198a633605b786c5f1fdef007c965d6284b39b
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 02:22:01 2024 -0400

    BART MLPs parallelized

commit abbb42749a628f5d199b62046200a6eb85025ab8
Merge: a33b50171 a16cabb90
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 01:54:59 2024 -0400

    Merge branch 'infra_enc_dec_model_runner' into infra_enc_dec_model_runner_parallel_bart

commit a16cabb9029d86221a69975935622dd53084a554
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 01:54:22 2024 -0400

    equalized some generation/sampling config settings between enc/dec HF and vLLM; nonetheless still not a perfect match

commit a33b50171b6147ad1ff3db16adef4bb3a7819b33
Merge: 584c01e87 32967c1ca
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 01:35:22 2024 -0400

    Merge branch 'infra_enc_dec_model_runner' into infra_enc_dec_model_runner_parallel_bart

commit 32967c1ca7d706f1e59cbd604b58588210aeeee3
Merge: c00e0a8b5 89c1c6a19
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 01:30:53 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit c00e0a8b561a8243080ef40b1c1b8f0b8257d026
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 00:28:29 2024 -0400

    CommonMetadataBuilder sets block_tables constructor arg of metadata

commit a22f56c8bbb1dde2bd3a440bb0c037ed73ca17e1
Merge: ffa99b2dd 42de2cefc
Author: Andrew Feldman <[email protected]>
Date:   Sun Jul 21 22:28:38 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit ffa99b2dd61cfe21222a98ed2f95d608d6f6a8a2
Merge: 41ccf0c8c 9364f74ee
Author: Andrew Feldman <[email protected]>
Date:   Sat Jul 20 16:08:20 2024 -0400

    additional merge

commit 41ccf0c8ce9079a89ace594a3a0f2eb573c2d6c0
Merge: 9fdd04705 a5314e869
Author: Andrew Feldman <[email protected]>
Date:   Sat Jul 20 16:06:16 2024 -0400

    wip merge

commit 7ace684da139b43f38a4ebc328e17056ebfbe18a
Merge: fe7786c8a c092ed476
Author: laishzh <[email protected]>
Date:   Fri Jul 19 00:27:56 2024 +0800

    Merge remote-tracking branch 'bert_deps/afeldman-nm/infra_enc_dec_model_runner'

commit 584c01e875e12d870312ab210dec809325482ae3
Merge: 69f0379d2 9fdd04705
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 16:59:40 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_model_runner_parallel_bart

commit 9fdd0470597025057a473eb8e20946f71db54daf
Merge: c092ed476 5f0b9933e
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 16:59:18 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 69f0379d24323958dd9b332884f7c57a222acfc6
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 13:23:42 2024 -0400

    wip:

commit d7bd617c84880f477a0ce7ae3d1de1215e26748f
Merge: 31e335fd2 c092ed476
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 13:13:04 2024 -0400

    Merge branch 'infra_enc_dec_model_runner' into infra_enc_dec_model_runner_parallel_bart

commit c092ed47621f9061395ea3e89386c997f856c6b3
Merge: 949ac02c5 2fa4623d9
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 13:09:14 2024 -0400

    merged in upstream changes; left some formatting issues which I expect to be fixed upstream

commit 31e335fd206985f5b3791b6a3cfaa021d21d3629
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 13:03:58 2024 -0400

    wip activation parallelization

commit 88c058e8fe5ae00b39f88f57be745d1b819dbca5
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 12:23:31 2024 -0400

    wip parallelizing BART

commit 949ac02c5694069edf3338b2202717dffda276e6
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 11:18:01 2024 -0400

    formatting

commit 6c940f886950ba0ae77ccb9002a161cf95b686ad
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 11:00:34 2024 -0400

    modified HF behavior in BART test to be truly greedy

commit f15eacf140810512335a7ac422b09788a1c1964e
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 10:55:46 2024 -0400

    wip

commit 180884605ffd911c553c6b2585c2993204e4a629
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 09:34:42 2024 -0400

    formatting

commit 1f8c52fac27ed8f10b94a3ecb08e15c1118c186a
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 09:34:29 2024 -0400

    tweaks to enc/dec example

commit 9da8fb3ef77b64c0152e3699513053e1ea4e21a4
Merge: 94c904fb5 a9a2e74d2
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 09:24:19 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 94c904fb5ff01f7e1c93b8d4a5f195ca2bea5bc0
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 08:43:16 2024 -0400

    wip parallel bart but encountering GPU count issue

commit 9f5a02c21e785704114f8c15bb829f4fe4cded55
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 08:27:53 2024 -0400

    RequestOutput & SequenceGroup now include encoder prompt in output, as does encoder/decoder example.

commit 597a07da54fa4c399e42bccbb4a14957d782e37c
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 07:59:42 2024 -0400

    refactor

commit f54f2762f4b4d14165371e3dfc300f1ef3afa9b6
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 07:53:12 2024 -0400

    wip refactoring

commit cac6283f60f1edc55950eaae54e74db0902ebfd8
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 07:25:58 2024 -0400

    added encoder/decoder example to examples test

commit b277180575d7d9c85708e2622cc6c32afbc0a383
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 07:17:40 2024 -0400

    formatting

commit 50ad5ffc753d1e7b39dfd55822ac0e405533168d
Merge: ef9462321 e09ce759a
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 07:16:28 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit ef94623218a718a437526917a8c95e933d614ee9
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 07:16:10 2024 -0400

    added examples utils w/ context manager for backend override; applied to enc/dec example to force XFormers

commit aee5f1615347dcfe2acea9abe16ac61df3404a99
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 06:14:51 2024 -0400

    fixed sequence bug

commit 3656dc6c843cbf41b99ab4b0c88a974d1cedba2e
Merge: 0cc14abc5 5fa6e9876
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 05:23:05 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 0cc14abc5a5569c6ae641c5d3efc0251fd946507
Merge: 1c6e06d0b 10383887e
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 02:10:34 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 1c6e06d0be66bf8cbf98cc8429a060b60bb65700
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 02:10:12 2024 -0400

    bugfix

commit 31127faf0c4637c6b80540c9693c7d5f135416d5
Merge: c2ff615de 1d094fd7c
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 00:48:22 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit c2ff615deebea4457721a457103d8e405346b1a5
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 00:44:16 2024 -0400

    format

commit f8dd4a5955ec478720531c47945ddc26e450f743
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 00:43:52 2024 -0400

    fixed scheduler bug

commit ef80c85f7dd3febc9c76c793427c444f9e62caa6
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 00:35:57 2024 -0400

    wip

commit 03aea187652fc0418d9a66f7eb5af6bc53c9e535
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 00:34:45 2024 -0400

    wip

commit 16c9aa2278e7f9d9b5f5ccffb085b0142a7e20ec
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 22:36:44 2024 -0400

    bugfix

commit 159c7bcf47aa86e4abbd88ad72a34e196c56626e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 21:58:15 2024 -0400

    fixed decoder-only bug

commit aea8d34385a64d6e6efa87729fee8fa4c4f15818
Merge: 713d095b4 7f62077af
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 21:09:06 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 713d095b4036404f4580225720da17d7d4e431cb
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 14:49:17 2024 -0400

    incorporated encoder sequence into request-add functionality

commit 87ed3b6fe380f75ebdafd3bc4da003b42802c18c
Merge: 97d81f0a5 94162beb9
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 14:17:29 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 97d81f0a53506cf6292f24117e8ecbfca5803805
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 14:17:09 2024 -0400

    encoder/decoder input processing; formatting

commit e534ffc156479d1b4dbec905ccc0877b746cc068
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 13:25:27 2024 -0400

    wip

commit 3c7e19d3d0e4c53ca363f40712fe2df160be1d9e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 10:44:23 2024 -0400

    zip enc/dec prompts; formatting

commit 850a97e812662645452989341eb44b79aa4b3276
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 10:25:38 2024 -0400

    bart parallel vocab

commit 42ac66b469891ba3085eaa1265c2bd9d445e0839
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:59:04 2024 -0400

    VllmRunner encoder/decoder methods

commit 796d7a3e7f8a67b644f6a88446e4162a09a1fbac
Merge: 374880f71 7508a3dc3
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:55:37 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 374880f71d6f81bd2a933b237ff6fa46e0324e6b
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:49:30 2024 -0400

    input preparation now includes encoder-oriented input setup:

commit c5846ac9b31777d131bb0e3af2ad62a74eab1978
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:40:46 2024 -0400

    HfRunner greedy logprobs limit

commit 92d9f486b2455ff5ea5215eb61b9cb1e375b17ff
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:33:41 2024 -0400

    conftest: encoder/decoder example prompts

commit 54ff1420cac3edccff6c751e4930f7fa1b3be247
Merge: ddaf0ade2 7a3d2a5b9
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:28:46 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit ddaf0ade21142daafc504df83e15d31911dee497
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:28:21 2024 -0400

    wip

commit 914134749aee12e273f38273ed4cfda866ec837f
Merge: 251f899ea ec9933f4a
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 16:33:24 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 251f899ea158af33ffe1367c57137ac9ed9212ad
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 16:33:10 2024 -0400

    wip

commit f85997b4bb63352fc1bad72b54eea358f89ec5b0
Merge: 46397c74e 64fdc08c7
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 13:30:57 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 46397c74e7c094d86d4f49fc3230cb92985d5fc5
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 13:30:21 2024 -0400

    wip

commit 336a77d62d2d31a2ed6c9eba9e36190b50cca713
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 09:34:47 2024 -0400

    formatting

commit 8dccaa510a67e8de71811c13371468024843b71d
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 09:34:14 2024 -0400

    correctly constructing enc/dec sequences

commit dd4031c8e3201ee2e874e40df69c1bd52e7c54be
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 09:11:34 2024 -0400

    wip but having vllm.commit_id error

commit 552551137b19a9e9c2ebc13856c8e5a22834ae1b
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 08:51:18 2024 -0400

    Sequence may be constructed with encoder/decoder LLMInput configurations

commit 7b0803b1bb9fbf222be2b719729b3494ade79087
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:41:25 2024 -0400

    formatting?

commit 304caed04dcbc25b76d8e80321da00414ac7dc17
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:36:33 2024 -0400

    formatting

commit 6c953808f11122a0c5482786b41825a79788a9a4
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:25:01 2024 -0400

    wip engine is_encoder_decoder() setting

commit 78d3d3c00d30af324dbd1ca0973c1dd68d4cdb5b
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:20:50 2024 -0400

    modified LLM.generate() error message

commit 10ed7145053546d2112ed98252dc46f782a04b72
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:18:13 2024 -0400

    Format

commit 83c5c43dd6e06d13d9d05c01882b6d705a5aefaa
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:14:34 2024 -0400

    prompt type checks

commit 94c083cabff971da175eca616ff4b2c94299573b
Merge: 64d71980c 0cca1646d
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:00:30 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_model_runner

commit 0cca1646dce64fbdf2419b7f075e15da6264ee84
Merge: db5539a85 6ae1597dd
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:00:07 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 64d71980c823c167239d5c7338096a144586b7f3
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 06:59:49 2024 -0400

    wip

commit ff940f7adf771465e92a6fad350fb2f1aca4f694
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 06:18:58 2024 -0400

    formatting

commit 8b8d9812f7b7317448d4872db32cffcb45444c02
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 06:17:41 2024 -0400

    refactored AttentionType and related imports; skip BART test definitions entirely if on vllm CPU version (to avoid xformers import)

commit 590a240fe53dd78e62c78f7ac0263b0c3fda6949
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 06:05:18 2024 -0400

    Formatting

commit 760355bfeea93c7b85cf440f597485e11a7357b1
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 06:04:43 2024 -0400

    bart test skipped on CPU version of vllm

commit db5539a85f83ceaa929e2c02129a1a174fa29424
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 05:00:25 2024 -0400

    format

commit 3d5bb888cfc10c835ff17c18ca367c930a335785
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 04:48:48 2024 -0400

    EncoderDecoderModelInput correctly handles encoder token/position fields

commit 447a5c7e10b09c1e5cff95e907198d6d050f1ffa
Merge: 9ce2da454 22e79ee8f
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 04:29:30 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 9ce2da45412de77bb358c2ce97521fa6a8b7990d
Merge: c5ceb2348 eeceadaec
Author: Andrew Feldman <[email protected]>
Date:   Sat Jul 13 19:26:27 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit c5ceb23486c3f3ddd15faf8fcf06fcc1ba722fe1
Merge: 196f30cd7 41708e503
Author: Andrew Feldman <[email protected]>
Date:   Sat Jul 13 02:18:32 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 196f30cd7f25a682dc3d2320d994f949b00084a2
Author: Andrew Feldman <[email protected]>
Date:   Fri Jul 12 11:15:56 2024 -0400

    enc/dec decoder test working, sans sampling check

commit 9c898f5b28113ea53758c447175fd9cfd67b2066
Merge: 685604cfc f7160d946
Author: Andrew Feldman <[email protected]>
Date:   Fri Jul 12 09:41:15 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 685604cfcb90b6e74e37dbf5b5ee478e157f8191
Author: Andrew Feldman <[email protected]>
Date:   Fri Jul 12 09:40:42 2024 -0400

    wip modelrunner

commit f6499442e7c434c3ce4a187b344481988f106471
Merge: 9a63f51bd b422d4961
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 10 12:51:51 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner

commit 9a63f51bde8059fc361cc7abb2249ce1efb54163
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 10 12:50:40 2024 -0400

    wip model runner

commit fe7786c8a510d2280f3e25a8461474bb17ab8e11
Merge: 26b6271ca a5c28fca8
Author: laishzh <[email protected]>
Date:   Thu Jul 11 00:27:08 2024 +0800

    Merge remote-tracking branch 'bert_deps/afeldman-nm/infra_enc_dec_model_runner'

commit 6a71f8f4359dab04b9811b84d338db40dafa72bc
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 17:23:01 2024 -0400

    formatting

commit b4a461d983ed0215777c89f6b2ecbaa754422d4e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 17:18:56 2024 -0400

    formatting

commit d1343aac0fe6c0063f950e3600f9264aacb0836d
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 17:07:43 2024 -0400

    scheduler test passes

commit c95adf50adcdc315f63b276f52ac9a6a2d35b5fa
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 16:49:34 2024 -0400

    scheduler supports encoder-/cross-attention & passes existing scheduler tests, but needs new encoder/decoder-specific tests

commit 4c01f1300161bb4a16fdc27612cdace516aedebc
Merge: 2c80185fb 4d6ada947
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 16:38:22 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 2c80185fb81602a9a39afe4137bc5f59bcb69f57
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 16:36:11 2024 -0400

    formatting

commit bd14d29177dda7bd1f2ddd41ccba71703dbaa07d
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 16:17:24 2024 -0400

    wip scheduler

commit c90140fba9d3ec2ee8a065a267aef571e93c64db
Merge: 88e284a53 4f0e0ea13
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 17:55:07 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner2

commit 88e284a5344699e099e5510e5a353b9c5a54d0c7
Merge: db49d48f2 543aa4857
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 13:26:10 2024 -0400

    merge from main

commit db49d48f2a0913251385e324b28af06bd81cc121
Merge: 22d013c1d 6cd595c3c
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 11:15:43 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner2

commit 6cd595c3c879d4ee603bb6a5bc0f1724647a5135
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:47:20 2024 -0400

    formatting

commit 5df73fc708bf3370a5f6d7f85cce4772d5c679b5
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:47:04 2024 -0400

    xformers backend cleanup

commit d8a692b7dde0656696b726497030970aac0b53d3
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:39:37 2024 -0400

    cleaning up a number of backends & backends utils.py

commit 097aff2029e4560ae28bd7a7acf0f20509f803fe
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:36:05 2024 -0400

    vllm/attention/backends/flash_attn.py cleanup

commit 45fc9f71641bdd17c67997598463f12ead3998b2
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:35:00 2024 -0400

    vllm/attention/backends/blocksparse_attn.py cleanup

commit 5ee30fed1d27dbef98dc3e4512741c9ca301197c
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:31:09 2024 -0400

    vllm/attention/backends/abstract.py cleanup

commit 4f27946dcfb73f0a60420eb3ca6c9a74f6c6d3d1
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:27:35 2024 -0400

    tests/kernels/utils.py cleanup

commit a1bf65212cab0933b2520d8557a9d9132fff8c3d
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:17:04 2024 -0400

    test_encoder_decoder_attn.py cleanup

commit 9ae6728ecfe48769f578b0fad3f8e3950daa683d
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 09:46:58 2024 -0400

    fixed specific point-changes requested by woosuk

commit 7ce9a51d4fb3e286fdaa3a3ba12e60d0908d2d64
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 09:38:03 2024 -0400

    merged in first pieces of woosuk feedback & latest main; formatting

commit e837a73be0b61434116d1f332a84266d05cd61fc
Merge: 07df0e158 7e0bc5725
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 09:36:30 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn_reviews' into infra_enc_dec_cross_attn

commit 7e0bc572541e6018a7cfcebd16ea08b26826b975
Merge: 13f5b5078 717f4bcea
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 09:35:30 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 07df0e158a60b7d2a90407eecc868eaa10a58180
Author: afeldman-nm <[email protected]>
Date:   Mon Jul 8 09:33:03 2024 -0400

    Update vllm/attention/layer.py

    Co-authored-by: Woosuk Kwon <[email protected]>

commit 5dbebbc6f3aafe706a5555119fefa519b71c4634
Author: afeldman-nm <[email protected]>
Date:   Mon Jul 8 09:32:43 2024 -0400

    Update vllm/attention/backends/torch_sdpa.py

    nit: This will reduce the number of line changes and make the code look better.

    Co-authored-by: Woosuk Kwon <[email protected]>

commit 13f5b5078cdd81f58ed88a653ecc8ddc0968c073
Merge: d81662c57 abad5746a
Author: Andrew Feldman <[email protected]>
Date:   Fri Jul 5 15:07:21 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 22d013c1de08aa8bc5747c513b12e0c3dd59d144
Merge: ba09fbcd6 d81662c57
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 4 00:24:29 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner2

commit d81662c572948ca9e01db21ec5f14f71c9fd1764
Merge: 2f0eb9b59 3dd507083
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 3 22:59:32 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 2f0eb9b591f298879df48be6d0a74196cf32a5cf
Merge: 65e47db5a 966fe7214
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 3 18:58:24 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit ba09fbcd6b7efff359b1a0cef47c385d130b777d
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 3 11:32:18 2024 -0400

    refactored where a number of constants are stored, primarily constants related to encoder/decoder

commit b085795eefcf31303c5e38bd734544664b5757c5
Merge: 44c62708f 65e47db5a
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 3 11:22:23 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner2

commit 44c62708f3645f8a82b17a63849c1822a2dca645
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 3 10:15:57 2024 -0400

    manually merged BART code in from previous modelrunner attempt, it won't work tho until new modelrunner is finished

commit 65e47db5a59087af005e97df20f9d1a5be466a3c
Merge: 2828aa793 7cd2ebb02
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 3 07:52:12 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 2828aa7936adab0d2ee3b49ffb0cfd01848581ab
Merge: 5ff9c7686 af9ad46fc
Author: Andrew Feldman <[email protected]>
Date:   Sun Jun 30 20:16:34 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 5ff9c7686339f8d5f8e42060c1772f43468f2459
Merge: 8d36458fb 7836fdcc1
Author: Andrew Feldman <[email protected]>
Date:   Sun Jun 30 18:21:25 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 8d36458fb640e61fd70844739d107f41c0f3e631
Merge: 64981b535 75aa1442d
Author: Andrew Feldman <[email protected]>
Date:   Sat Jun 29 14:15:30 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 64981b535c557ada816b338f83cccf8c11ad0f83
Merge: 83d474e93 2cd402e16
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 28 15:37:00 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 83d474e93559ebbaf51194ef818f2308fd16ef1a
Merge: a5018499e 57f09a419
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 28 10:18:17 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit a5018499e3b8475749a8d1af80e14c8d172cf2c7
Author: Andrew Feldman <[email protected]>
Date:   Thu Jun 27 18:57:56 2024 -0400

    reverted unnecessary vllm/utils.py changes

commit c8f8d59d4ce7e1a3c104bd417f256e9b8f954815
Merge: bcccc3486 c3dde367f
Author: Andrew Feldman <[email protected]>
Date:   Thu Jun 27 17:34:16 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit bcccc34863f5864307ef9c781471cef4e5d38ba8
Merge: 75756b91e 3fd02bda5
Author: Andrew Feldman <[email protected]>
Date:   Thu Jun 27 13:59:00 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 75756b91e3753a9c2a60dbae42b2e46d3612ece5
Author: Andrew Feldman <[email protected]>
Date:   Thu Jun 27 11:28:19 2024 -0400

    removed redundant elif

commit c24697fe82c844e13c820db916efef0a6b789374
Merge: 7ca0d7a39 e9d32d077
Author: Andrew Feldman <[email protected]>
Date:   Thu Jun 27 11:23:21 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 7ca0d7a399da475099cf501b1f4981a7dffc067a
Merge: 4dabe1974 294104c3f
Author: Andrew Feldman <[email protected]>
Date:   Wed Jun 26 19:37:30 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit a5c28fca8f5e21653c6e5874719467e08d3d8503
Merge: ba4e2c12e 4dabe1974
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 15:52:22 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit 4dabe1974766c6db8fd6ce8b6688c25bbd85b633
Merge: e2a46e3b7 dd248f767
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 15:48:31 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit ba4e2c12e6f1a03e3381cabda8902d55df9a292e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 04:05:23 2024 -0400

    Removed unnecessary position arguments from BART routine; formatting

commit 41e31e861b01896a99fba2f2ea44b717164c4398
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 03:59:48 2024 -0400

    BART with new explanatory comments & passing formatting tests

commit e61385d90e40b423e1e5d98839413a76ffcd11fb
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 03:49:18 2024 -0400

    fixed bug caused by overzealous refactoring

commit 4400d7733f7dca2acffac916a00f5edc6a89e14e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 03:36:28 2024 -0400

    some reformatting

commit 5169a2a6518d5ae338001eae0eae6dad64bf52eb
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 03:25:40 2024 -0400

    removed unnecessary positions arguments from BART encoder, decoder forward()

commit d43141f20514e77963e1c13ba857b1d3cb71c210
Merge: 753bab068 e2a46e3b7
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 03:16:19 2024 -0400

    merge; a lot of formatting fixes to bart code but not fully passing

commit e2a46e3b7b9f9d1a9cc751046c3cddd1522620ed
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:53:35 2024 -0400

    formatting

commit 1a6e5a31846e2ef886b66e9cc9216ffe983d0ec0
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:52:04 2024 -0400

    moved make_tensor_with_pad() helper function back to vllm.utils

commit d23c28466765496049a1696d0a053a0a2505ce9a
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:38:08 2024 -0400

    typing and formatting; fixed escape sequences in comments

commit 2f0b05bb805513e73eb0609ea87b6367ec9d4803
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:35:34 2024 -0400

    typing and formatting

commit 47c9f396fdcd40895597423ebfefe585b014c2f3
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:32:52 2024 -0400

    removed attention_type

commit 06c7f7500140c574d20a12079dbd1ef83db29688
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:28:42 2024 -0400

    reorganized helper functions that were only being used for testing into tests/kernels/utils.py from vllm/utils.py

commit a178b7a8c9838665ee7e169471206b70d62e1b71
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:20:00 2024 -0400

    changed nested if/else to elif/else in xformers mask computation code

commit 597526a49e041ec99329add79ef272ce6e457b9e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:18:02 2024 -0400

    removed extra line

commit 125e5dc46724155f5d81e93a7644a3889e864a2f
Merge: 5ce2dd083 e9de9dd55
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:16:21 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 753bab06880a05726b2b8274a20d8f9d179c9576
Merge: 919bf88f8 e9de9dd55
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:14:20 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 919bf88f8925b2e60c765f309df655318c392c2e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:13:52 2024 -0400

    BART e2e test runs but does not pass

commit b7ff75fc3d3cb5d447503daa8a4a78aa6bf1a18d
Merge: 2d8429e1b ba991d5c8
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 19:25:24 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 2d8429e1b0002eccb7deaa805d25ebb6d5616187
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 18:47:19 2024 -0400

    fixed a number of bugs related to BART decode-phase; added support for the particular architecture alias used by bart-large-cnn

commit 8f9ee625557ec34ec29787b6b66ec760ff390e77
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 18:06:10 2024 -0400

    wip bart-cnn summarization example

commit d58e8c8464d5bcf41121a582b035f5f290658657
Merge: 6fd4c020a 1744cc99b
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 15:50:28 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 6fd4c020a9c5ee8ecbf6e086d8b9dfefb3f8396f
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 15:42:09 2024 -0400

    fixed prompt processing bug that was preventing inference from starting

commit 7d2fcf90a6516be432ffd39f4571ed0a524438b2
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 15:39:07 2024 -0400

    BART passes profile run

commit 3b95225850af9b81a15142344c4c8bae7257a519
Merge: 8b8c40943 b8d5637c5
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 13:19:42 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_bart' into infra_enc_dec_model_runner_reviews

commit 8b8c40943e2e0a4b104ca65c76441d3db03a017d
Merge: 42c364439 5ce2dd083
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 13:04:54 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit 5ce2dd08345da9e5a19a913214e5a73ed4923c8d
Merge: ce88fa36e c24621295
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 12:55:03 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit b8d5637c510b42a6503d9b0c4d810fe3568314dd
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 12:50:25 2024 -0400

    wip bart

commit 59caabecf2666c33306625843908b1d9dc2ffa8b
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 21:42:39 2024 -0400

    BART almost passing profile_run()

commit f2dac1ce0ae1033b5143b8f1cd234e1eee5e67ee
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 20:13:05 2024 -0400

    wip

commit 082be510533d1e39008db19ca8754a91aa4879d3
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 19:36:46 2024 -0400

    loading tied weights

commit 42c36443981dd89c9defaf2f51c1481ddb0a5751
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 16:24:26 2024 -0400

    encoder decoder model runner fails for unsupported scenarios

commit 9ad5143ab290419d27fcde1287d9bea853a58be3
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 16:00:15 2024 -0400

    refactored backend constants

commit 001cb185141278b6ea3a2fbbf6200032104229e0
Merge: 6219d9590 ce88fa36e
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 15:40:19 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit ce88fa36e6cdbe0352348207a6a4dc405fcd9d76
Merge: ca68c63db f1e72cc19
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 15:39:06 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 6219d9590dfae14c574d598ce879af58fe97177f
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 15:36:36 2024 -0400

    Formatting

commit 576c26c86a9b210fcca29180ed20fd15770f2660
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 15:35:11 2024 -0400

    first pass at BART load_weights; probably not handling qkv correctly

commit c11db0fd30e326d2273da95439c5087e83725b04
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 15:21:15 2024 -0400

    integrating BART weight loading code

commit 2123517ef5fc8a5593e693b7d28d8c217c729282
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 15:13:36 2024 -0400

    formatting

commit 97cad4b875ee09ebeff455a20fdf351eef9d2f16
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 14:40:40 2024 -0400

    wip BART model cleanup

commit 45a53877dc815398f1f190fa7e7d513db7928b6f
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 14:28:59 2024 -0400

    pruning out training functionality & unnecessary code from BART

commit 30becae9d35d4b994bcd995c81603a97b93d0e3d
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 13:45:48 2024 -0400

    profiling fix; wip bart

commit d2ad2328e41ad7a8898ddbb37db8c1bfaf2ae803
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 13:37:27 2024 -0400

    wip bart integration

commit ed610b0b9a6abcdaf874d16225a441509a207076
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 12:09:51 2024 -0400

    pulled in bart model code

commit 28f0d2fff6752a90227aa8aa07ca32e43bee395d
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 12:06:56 2024 -0400

    pulled in bart code

commit 213dc597274da4c963510b1d72166d0a8eddbc7b
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 12:03:50 2024 -0400

    test_bart.py

commit 49c7162d70441963ec6c26430a8e36426fbfe1aa
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 12:01:59 2024 -0400

    formatting

commit 84c0dcc5fe2b653cb0517df523504a107055061a
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 11:58:45 2024 -0400

    scheduler tests

commit c15731710bd5c317638fef4d861567031d6126b8
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 11:30:25 2024 -0400

    free sequence groups

commit 614de4e13869f1b2938d1f30369bbb98752a20c6
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 10:54:25 2024 -0400

    formatting

commit b6d4383e141e1fc23ee0c8c6bb9a7d172949266a
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 10:46:15 2024 -0400

    enc/dec integrated in Scheduler.schedule()

commit 89b0e445bb32bbd5758bdcc05cd1bb869101029e
Merge: beec4f571 ca68c63db
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 10:27:42 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit ca68c63db6ef8b9fcd132e84ffc6db1b7c7f618f
Merge: e9d7ede3b bd620b01f
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 10:26:54 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit beec4f5717d5c8193d70449c066f2aa469bf50b0
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 10:24:50 2024 -0400

    enc/dec support in LLMEngine._add_processed_request()

commit a1ab7a110c334f54dc451f1b273c3b0f0345332e
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 09:50:37 2024 -0400

    removing BART test

commit 7000573396666a58cf5ca06d626f2b4c2e4f8bb2
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 09:49:37 2024 -0400

    temporarily removing BART work

commit 1bd916c2f91f7b8d755a9142ee3daeb7d5e489cb
Merge: 2b2d2e9df bd620b01f
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 09:38:05 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 2b2d2e9df2b1535883e36b8353a26d52200f7783
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 08:55:19 2024 -0400

    wip encoder/decoder API integration; WIP BART integration; WIP BART example

commit e9ecd25cb733b220785611056295ea9787b1ce47
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 05:48:50 2024 -0400

    added unoptimized BART example

commit 2fccd1832a0933dca8537e436449dad4d52fa0c3
Merge: de967174d 0f645112d
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 02:28:07 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_model_runner_bart

commit 0f645112de4e1784cd43be505e659f3d3bd56581
Merge: 58139e380 e9d7ede3b
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 02:27:25 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit e9d7ede3bfef92527a643809f4beb20cb780e7c0
Merge: 67ed41961 d9a252bc8
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 02:26:01 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit de967174dcbbdb5e81d975edf158416bcbeb74cd
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 02:25:36 2024 -0400

    wip bart test

commit 58139e3808060c550264c800e605129d0082af5c
Merge: f8569facd d9a252bc8
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 01:55:08 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit f8569facd10b0cbf05689bfc364831a37bb48b45
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 00:35:24 2024 -0400

    formatting

commit eb5819be6025f0e598831e7e13c0656e184e9524
Merge: a0068fc91 1f5674218
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 00:23:07 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit a0068fc9112c5acefe69f5a8e30470c73a90a039
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 00:21:05 2024 -0400

    Encoder/decoder model runner passes prefill/decode/empty-SG tests

commit f0094bd8a90cc26325f1ea7ca1506fc459a312c9
Author: Andrew Feldman <[email protected]>
Date:   Thu Jun 20 10:59:52 2024 -0400

    wip enc/dec modelrunner prepare_prompt test

commit 736cf45223517f5720aedc53b65258ee8a75a25c
Merge: 1581eb7f9 f9f9ae39e
Author: Andrew Feldman <[email protected]>
Date:   Wed Jun 19 22:56:31 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_model_runner_bart

commit f9f9ae39eea1dd6367cec3b2e878e1d2f3bef4ad
Merge: a8a52d293 67ed41961
Author: Andrew Feldman <[email protected]>
Date:   Wed Jun 19 22:31:41 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit 67ed419619301a39c04417b29c90822a837e6362
Merge: ea37e17ab 3730a1c83
Author: Andrew Feldman <[email protected]>
Date:   Wed Jun 19 22:29:04 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 1581eb7f978a83690e0aaa2b390be491b42ffb15
Author: Andrew Feldman <[email protected]>
Date:   Wed Jun 19 22:28:28 2024 -0400

    wip

commit fbec309f0cc8d94df6ba7ab3f71f172d30f73531
Author: Andrew Feldman <[email protected]>
Date:   Wed Jun 19 01:14:35 2024 -0400

    moved enc/dec error strings to top-level vllm utils

commit a8a52d2935d5a2ab969c05d498ec2423ae19507b
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 23:39:15 2024 -0400

    some formatting fixes

commit 37aeed66141b10b0d43c8e6d56613806dc7108ff
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 23:35:11 2024 -0400

    enc dec model runner testable if only for encoder decoder model

commit e3ba61e368f0085fe64e8dae3d80494f5254164c
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 22:44:23 2024 -0400

    wip

commit 3311aac9bddd474d0a7037b53c53dfc515df0bcc
Merge: f9314fd7d 59a1eb59c
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 22:43:23 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit f9314fd7d1ae0d3146d7456eb41e6885f0055a5d
Merge: 89fdb8116 ea37e17ab
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 22:43:07 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit ea37e17ab5ad7c084c13bf8e8492039d6a9bcdbf
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 19:16:38 2024 -0400

    merge conflict; typing; formatting

commit 91cbaa63d35e72ed0c14b65ed7f79bffdda2da97
Merge: 525303c7c 2bd231a7b
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 19:15:10 2024 -0400

    merge; resolve conflicts

commit 525303c7c61127900680ff06b6cc09610001b71e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 18:06:33 2024 -0400

    num encoder tokens

commit 5f8c7f6cd6776cbda8289a5cee28e5cd8b858f4d
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 11:26:24 2024 -0400

    Moved attention type for attn_metadata to attention forward(); added NotImplement failures to backends in non-decoder-only scenarios

commit c3f7da7620921e14e6c7efabeb0c54fd3d08b30b
Merge: 7b9cb7f43 13db4369d
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 11:01:28 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 7b9cb7f4339364b66180bf5cf7015f8fea67479d
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 11:01:05 2024 -0400

    Replace attn_metadata.attention_type and attn_metadata._attn_type with attn_type argument to forward()

commit d0fd9e10ff13157183fc24dfcb558f83c716ead6
Merge: addde7d22 4ad7b53e5
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 09:58:57 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 26b6271caa9b776b0093b874ab94dc8df0bb36b9
Merge: 3ea38598e db5ec52ad
Author: laishzh <[email protected]>
Date:   Tue Jun 18 17:49:40 2024 +0800

    Merge branch 'vllm-project:main' into main

commit addde7d22cda9ab0d006538ec0f900ac593c9292
Merge: 47586807a 114d7270f
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 00:53:01 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 89fdb811629bfe86ce5aaf85e078ce953e03e700
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 00:52:29 2024 -0400

    first pass at _prepare_encoder_model_input()

commit c7bf81228dc06a1ed2c9d7e7e6f0d61e476e7e7b
Merge: 830a05126 47586807a
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 17 10:37:42 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit 47586807a3e8e75c6e9c27d1d17aeb22b0dff63d
Merge: 90aec385a e2b85cf86
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 17 10:35:45 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 830a051267732f60b04b99a15552ea984b9f43f8
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 17 01:16:25 2024 -0400

    format

commit e5c029926043518e63b85739d369b6cbbb9eddda
Merge: 9cb8ee685 90aec385a
Author: Andrew Feldman <[email protected]>
Date:   Sun Jun 16 22:59:32 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit 90aec385a0e77574f5b575257e29b194f6974521
Merge: e229e0018 845a3f26f
Author: Andrew Feldman <[email protected]>
Date:   Sun Jun 16 22:50:21 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit e229e0018138698bf13135f067eaf32a8cbf9167
Author: Andrew Feldman <[email protected]>
Date:   Sun Jun 16 22:47:04 2024 -0400

    format

commit 4dccd51c91fd3c1ae3a9ecea4baa46cad2a5f7dd
Merge: b3c3411e2 f07d51332
Author: Andrew Feldman <[email protected]>
Date:   Sun Jun 16 20:26:41 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit b3c3411e26b7cf6f27604825d99a920c34605c9c
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 14 16:39:35 2024 -0400

    formatting

commit f06c6873d77962c7b27fc7f0c29397381dd0a5be
Merge: 708a4b39a e2afb03c9
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 14 16:38:18 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_encoder_only

commit 708a4b39a73c48e…
maxdebayser added a commit to maxdebayser/vllm that referenced this pull request Oct 11, 2024
commit 935c58d9e70ed6e84559e95f696c65dfb282e422
Author: Max de Bayser <[email protected]>
Date:   Fri Oct 11 14:28:57 2024 -0300

    add registry of encoder-only models

    Signed-off-by: Max de Bayser <[email protected]>

commit 579337372e694d900a4e01899b81fe0afcf82c10
Merge: e7044a61c 30b0f2156
Author: Max de Bayser <[email protected]>
Date:   Tue Oct 8 10:34:17 2024 -0300

    Merge branch 'bert' into roberta_embedding

    Signed-off-by: Max de Bayser <[email protected]>

commit 30b0f2156bccbfb11def0d7902acb8b56d24a98a
Merge: 80c18855f 8c746226c
Author: Max de Bayser <[email protected]>
Date:   Tue Oct 8 10:33:05 2024 -0300

    Merge branch 'upstream_main' into bert

commit 8c746226c956f7c8a4672689fee91c7d22befed6
Author: Brendan Wong <[email protected]>
Date:   Mon Oct 7 22:51:43 2024 -0700

    [Frontend] API support for beam search for MQLLMEngine (#9117)

commit 80c18855fcff195175b7046923c4b0c3815f141a
Author: laishzh <[email protected]>
Date:   Mon Oct 7 12:04:34 2024 +0800

    feat: update with origin/main

commit 6440795f407c652ecdb045d1b141913afdb8b5e1
Merge: 04b0bc6ff 487678d04
Author: laishzh <[email protected]>
Date:   Mon Oct 7 11:28:19 2024 +0800

    Merge branch 'origin/main'

commit 04b0bc6ff534495a9627f5548767f5bfb95268e8
Author: laishzh <[email protected]>
Date:   Mon Oct 7 02:54:55 2024 +0800

    feat: revert embedding_block_manager

commit 352d8b2641d11ffa0e153462fd89b54525998843
Merge: 3fbfdf429 107d9c207
Author: laishzh <[email protected]>
Date:   Mon Oct 7 00:45:52 2024 +0800

    Merge remote-tracking branch 'maxdebayser/bert'

commit e7044a61cebf6b9229a50a8396fdef104e799a9e
Merge: a14b4e39d 107d9c207
Author: Max de Bayser <[email protected]>
Date:   Wed Oct 2 18:04:38 2024 -0300

    Merge branch 'bert' into roberta_embedding

commit 107d9c207808c6f070ef086e3ea748cecbc9d809
Merge: 57bdd6049 7f60520de
Author: Max de Bayser <[email protected]>
Date:   Wed Oct 2 17:52:52 2024 -0300

    Merge branch 'upstream_main' into bert

    Signed-off-by: Max de Bayser <[email protected]>

commit a14b4e39d26eb953c569ebb219aa3cb7203699ec
Merge: 08f1781d6 57bdd6049
Author: Max de Bayser <[email protected]>
Date:   Thu Sep 26 17:25:28 2024 -0300

    Merge branch 'bert' into roberta_embedding

    Signed-off-by: Max de Bayser <[email protected]>

commit 57bdd6049129b43244d3c70ea876e784762e96e9
Merge: 2c8a5b922 7193774b1
Author: Max de Bayser <[email protected]>
Date:   Thu Sep 26 17:15:18 2024 -0300

    Merge branch 'upstream_main' into bert

    Signed-off-by: Max de Bayser <[email protected]>

commit 3fbfdf42966c7324466e266dc6d4b5c26131aee5
Merge: 2c8a5b922 873edda6c
Author: laishzh <[email protected]>
Date:   Thu Sep 26 23:23:39 2024 +0800

    Merge remote-tracking branch 'origin/main'

    # Conflicts:
    #	vllm/inputs/data.py

commit 08f1781d6bd49653bd62ffdfde4f86d903f0c65a
Author: Max de Bayser <[email protected]>
Date:   Mon Sep 23 17:04:35 2024 -0300

    add head size 32

    Signed-off-by: Max de Bayser <[email protected]>

commit 2c8a5b9224ce9e26b2e43bb2312be91e2c74de9c
Merge: 15be7fa8b f2bd246c1
Author: Max de Bayser <[email protected]>
Date:   Mon Sep 23 13:48:10 2024 -0300

    Merge branch 'main' into bert

    Signed-off-by: Max de Bayser <[email protected]>

commit 30c875e9e61f1e9e4d556014f49362adff76269a
Merge: afd997ba9 464a90f4e
Author: Max de Bayser <[email protected]>
Date:   Mon Sep 23 13:59:23 2024 -0300

    Merge branch 'bert' into roberta_embedding

commit 464a90f4e09165ab724de26b35e9d7913c5d6560
Merge: 15be7fa8b f2bd246c1
Author: Max de Bayser <[email protected]>
Date:   Mon Sep 23 13:48:10 2024 -0300

    Merge branch 'main' into bert

    Signed-off-by: Max de Bayser <[email protected]>

commit afd997ba9f6ec2513145c0ca469a15783e0c96e5
Merge: 7d0ecb90c 15be7fa8b
Author: Max de Bayser <[email protected]>
Date:   Mon Sep 23 13:14:29 2024 -0300

    Merge branch '5447' into roberta_embedding

commit 15be7fa8bce185f64fafecaabdb8c828e83f4ad8
Author: laishzh <[email protected]>
Date:   Mon Sep 9 23:04:44 2024 +0800

    feat: fix lint

commit 0ea4da1c549bf35c8456c47729da46dd33481cac
Author: laishzh <[email protected]>
Date:   Mon Sep 9 23:01:22 2024 +0800

    feat: fix lint

commit 776dcbdae9d693dbd6546b7784712c06e6ef473c
Merge: 3ff2d3637 4ef41b847
Author: laishzh <[email protected]>
Date:   Mon Sep 9 10:32:46 2024 +0800

    Merge branch 'main' of https://github.com/vllm-project/vllm

    # Conflicts:
    #	vllm/core/embedding_model_block_manager.py

commit 3ff2d36375d9560f87c56860ffff8a774a217cf9
Author: laishzh <[email protected]>
Date:   Mon Sep 9 10:29:01 2024 +0800

    feat: some changes on test_embedding.py

commit e351bfd0febe4bbf8030fcd07f39eef5cce97641
Author: laishzh <[email protected]>
Date:   Sun Sep 8 23:50:18 2024 +0800

    feat: bert embedding implemented, but still has some bugs with mistral

commit 7d0ecb90c5034d41f0d9b38eede25f50bf941e3d
Author: Max de Bayser <[email protected]>
Date:   Wed Aug 28 16:35:25 2024 -0300

    Add support for Roberta embedding models

    It's almost identical to the Bert models

    Signed-off-by: Max de Bayser <[email protected]>

commit 612cf1a969fa46105c3685b2eb025cde6416747d
Author: laishzh <[email protected]>
Date:   Tue Aug 27 15:19:50 2024 +0800

    feat: modify test_embedding

commit fc1f2b7ceb69f9588799820831145babf29aaa64
Author: laishzh <[email protected]>
Date:   Mon Aug 19 15:39:33 2024 +0800

    chore: fix lint

commit d09860763500b85193230588386f0e3d515e231c
Author: laishzh <[email protected]>
Date:   Mon Aug 19 15:24:51 2024 +0800

    feat: remove embedding_model_block_manager.py

commit 37f698b4241a42c9634030e372e419b47e2a1e9c
Author: laishzh <[email protected]>
Date:   Mon Aug 19 15:16:34 2024 +0800

    feat: move BertEmbeddingModel to the end of file

commit 6f006f5ad698d76599e0b005520e65921042d07b
Author: laishzh <[email protected]>
Date:   Mon Aug 19 15:06:21 2024 +0800

    chore: fix lint

commit bfd7ec9e043cf304e6dea024912eb2a18c786bd6
Author: laishzh <[email protected]>
Date:   Mon Aug 19 14:59:06 2024 +0800

    feat: model input

commit 8b107a24a4ef9abb194686066c3bebc6923c6876
Author: laishzh <[email protected]>
Date:   Mon Aug 19 13:41:49 2024 +0800

    feat: fix lint

commit e15d0cce60e3f39f2aaf8c3f62314a6d6b4ea091
Merge: b76da51c0 f710fb526
Author: laishzh <[email protected]>
Date:   Mon Aug 19 12:45:26 2024 +0800

    Merge branch 'main' into main

commit b76da51c0d9ba1b4e39d432b8fb557ed8319034f
Author: laishzh <[email protected]>
Date:   Mon Aug 19 11:35:22 2024 +0800

    feat: enc_dec_runner base

commit b99d783bd852eb4cae228fcd8faf3344cd9a6fed
Author: laishzh <[email protected]>
Date:   Sun Aug 18 00:49:57 2024 +0800

    feat: remove embedding block space manager

commit 7e1196d25054d76d92b3777bc077d3cffd742599
Author: laishzh <[email protected]>
Date:   Sat Aug 17 14:43:32 2024 +0800

    fix: fix hint

commit ce9a599194dbc3a208a6a4a21fdccaaa5c26ece8
Author: laishzh <[email protected]>
Date:   Sat Aug 17 02:18:54 2024 +0800

    feat: bos_token_id

commit 275f49de32136eb9e4298d42aa85a1e2dc56924c
Author: laishzh <[email protected]>
Date:   Sat Aug 17 01:03:55 2024 +0800

    feat: embedding model prompt

commit 0b3f55c66e5eb40808f46ebde3c38213478050c7
Author: laishzh <[email protected]>
Date:   Fri Aug 16 15:12:51 2024 +0800

    feat: fix lint

commit 91e23d8ad2b45790590889d6ee437702f5003792
Author: laishzh <[email protected]>
Date:   Fri Aug 16 15:04:30 2024 +0800

    feat: fix lint

commit 7657af3f49cdb567bc96b44157c89f18cc4d0a22
Author: laishzh <[email protected]>
Date:   Fri Aug 16 15:01:26 2024 +0800

    feat: fix lint

commit f2158848b9abd839c515c568acd592d0416c6682
Author: laishzh <[email protected]>
Date:   Fri Aug 16 11:21:54 2024 +0800

    chore: recover

commit a0ad0df28c9de89bdd66b587502f6af9265065be
Author: laishzh <[email protected]>
Date:   Fri Aug 16 11:15:28 2024 +0800

    chore: recover unchanged files

commit 872e79531b39d1bf12ea81ddcd5bf919dd97265d
Author: laishzh <[email protected]>
Date:   Thu Aug 15 21:40:55 2024 +0800

    feat: embedding model forward

commit 682c455bb0b8c950e1e00b43a6841f433f62db97
Author: laishzh <[email protected]>
Date:   Thu Aug 15 14:36:40 2024 +0800

    feat: recover sequence

commit aca786e4359ef55d0af006199728c8b941558579
Author: laishzh <[email protected]>
Date:   Thu Aug 15 13:44:03 2024 +0800

    feat: default bos_token_id of encoder model

commit 76b47fb1b7920fb50a889f19e1c1421e4385d1ca
Author: laishzh <[email protected]>
Date:   Thu Aug 15 13:18:53 2024 +0800

    chore: recover

commit 37bcba01408d37b192063e2ee2b9ac1c3087393c
Author: laishzh <[email protected]>
Date:   Wed Aug 14 17:47:05 2024 +0800

    feat: full pipeline

commit 63fb7a582cef08ec29a8b30024a01602dc5ee636
Author: laishzh <[email protected]>
Date:   Wed Aug 14 02:39:31 2024 +0800

    WIP: bert embedding

commit 53c5148e9f5024f2eb6a83bbf7af191dc88fe555
Author: laishzh <[email protected]>
Date:   Tue Aug 13 16:11:53 2024 +0800

    (WIP)feat: EmbeddingModelRunner support encoder model

commit 12a9869b5324fa9a4f7090eb8967c81f47f87f75
Merge: 59bf8c44d 97a6be95b
Author: laishzh <[email protected]>
Date:   Tue Aug 13 11:22:44 2024 +0800

    Merge remote-tracking branch 'origin/main'

    # Conflicts:
    #	.buildkite/test-pipeline.yaml
    #	examples/offline_inference_encoder_decoder.py
    #	tests/conftest.py
    #	tests/core/test_scheduler_encoder_decoder.py
    #	tests/kernels/test_encoder_decoder_attn.py
    #	tests/models/test_bart.py
    #	tests/worker/test_encoder_decoder_model_runner.py
    #	vllm/core/scheduler.py
    #	vllm/engine/llm_engine.py
    #	vllm/inputs/__init__.py
    #	vllm/inputs/data.py
    #	vllm/model_executor/models/bart.py
    #	vllm/sequence.py
    #	vllm/utils.py
    #	vllm/worker/enc_dec_model_runner.py
    #	vllm/worker/worker.py

commit 59bf8c44dd79c832a37949d0698bacef6ecc2136
Merge: a40828921 a936faa57
Author: laishzh <[email protected]>
Date:   Thu Jul 25 23:02:34 2024 +0800

    Merge remote-tracking branch 'bert_deps/afeldman-nm/infra_enc_dec_model_runner'

commit a936faa57000aca5be159de260fae8c8849148b6
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 10:52:50 2024 -0400

    removed prefix caching from enc/dec modelrunner

commit 4bb7fc442f67dd162a001900e485d02d64fa24ed
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 10:45:03 2024 -0400

    removed chunked prefill logic/docstring text from enc/dec modelrunner

commit f0abcc27e642dda6371eb1440de519166642a9e7
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 10:37:45 2024 -0400

    format

commit d1751db42bac1baf50b5fa542c770fbab13ba9ff
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 10:35:45 2024 -0400

    removed flashinfer references from enc/dec modelrunner

commit 64685acfe52177d1e01362ece71d3faab73e8e45
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 10:13:44 2024 -0400

    Sequence docstring

commit 035d90dfc21bbc12d12d2368a2d5d5175ead31ca
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 10:01:31 2024 -0400

    updated RequestOutput docstring

commit 1bb7ad9f2f5e4c84e283c5c0c59006d817440609
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 09:59:34 2024 -0400

    updated RequestOutput docstring

commit 47c5548936cd7bfe476d31e8248e3208a8a663d1
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 09:53:23 2024 -0400

    checked out examples/offline_inference.py from main

commit 3327e5be3b07bc35a607a1f4fa1fba2fc4f5904e
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 09:49:44 2024 -0400

    removed lora & vision & mm code from enc/dec modelrunner

commit 175ea95baf0537209a8aa0e9c94f711f794f0f51
Merge: c2cc010ac 316a41ac1
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 09:25:53 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit c2cc010acc1bb632bb7297da970ff865b22c7f27
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 01:33:04 2024 -0400

    Removed lora from enc/dec model runner

commit fb5a2bcb2baa984b884ba8bdd6293dd06cb8756b
Merge: 393515eb0 9e169a4c6
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 00:52:21 2024 -0400

    upstream merge

commit 393515eb07a84c3d1604f0c0bc52eb2d8f7c5ae0
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 00:50:27 2024 -0400

    formatting

commit 47b4eb2a06bf0811f143668fbfe1f8c2caedc827
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 25 00:50:08 2024 -0400

    fixed bug caused by upstream refactoring

commit bed9bcd356c3526f5697ddfc2052d5bfca5fa9d2
Merge: 0af58ec10 740374d45
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 21:04:09 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 0af58ec10ac6eb9cab3f78abfa62390ade9ca64c
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 05:10:20 2024 -0400

    responses to feedback

commit d82b27346b444778eeba42e015ac716883c37f76
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 05:01:27 2024 -0400

    enc/dec example comments

commit 4b5b2cf956141e3adbc22a7a2aa2ebbb9bad8979
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 04:51:48 2024 -0400

    removed unnecessary argument reordering

commit ed4a56b9ca31cdf06033611887114920318ad397
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 04:46:49 2024 -0400

    formatting

commit 5a270ff49f3ebafecf8fb45e090f08d705aa416a
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 04:46:32 2024 -0400

    refactoring

commit 02114bdcd5a832c3610318a8d0b8cfb26070f3ef
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 04:31:32 2024 -0400

    _free_seq_group() -> _free_seq_group_cross_attn_blocks()

commit be58d8ab92fd4ddab1f48b246a5233ee3a71bcf0
Merge: c493d4029 ccc4a7325
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 04:20:18 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit c493d402929d023a0924018a928502cb05605a2f
Merge: f36ffb569 5e8ca973e
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 00:34:07 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit f36ffb5695b0694947f4ae9e7417cc1afa85e19c
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 00:33:47 2024 -0400

    example includes prompt zipper

commit 61d2ad2cc7791b6e32c8678b8e88ed99bbab4118
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 24 00:28:20 2024 -0400

    fixed bugs in handling non-text formats for individual prompts

commit dd784b5423ba21fc6b8188908df417d128376a1f
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 21:37:19 2024 -0400

    typing fix

commit 0b29fd27f17f2751550262f218e6ef1afbef7087
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 21:35:25 2024 -0400

    enc/dec handles empty str and None decoder prompts correctly

commit aa01d71f90f0c3cda8a7ea419ff4f1fb6dc9d13c
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 20:56:51 2024 -0400

    empty-string decoder input is now handled for encoder/decoder

commit 4a6e39e67c2bb4c2d685df9031cbf64956be4255
Merge: 7e7bbd9e1 87525fab9
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 20:16:21 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 7e7bbd9e16900449e350bf8634d584e4b1a5c2f0
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 16:57:41 2024 -0400

    deleted unnecessary dependency

commit 229847b431469bd17b2d13f3651b322c7b280274
Merge: 059273f3c 1bedf210e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 16:56:27 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 059273f3ca43947413572a0014c1437a53e33b8a
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 16:56:07 2024 -0400

    wip

commit b283544d820bfd96ac80845d2ddd7ad057cca6e9
Merge: 48a742d41 b01937f0c
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 04:15:18 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_correctness' into infra_enc_dec_model_runner_reviews

commit 48a742d4155cba0ffc7effb1c9fdad0706493c43
Merge: 427032a08 bb2fc0807
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 04:15:03 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit b01937f0ce29bc9e417e85cb4dd18ddb47a98e3b
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 04:14:06 2024 -0400

    set up None/empty str tests which are not passing

commit c51a1682be7443ec7d32062491868bd49c631eb8
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 23 01:47:43 2024 -0400

    fixed bug in how conftest was handling HF encoder/decoder outputs; disabled HF n-gram repeat checks

commit 427032a085cd48701f7abf64518563929a844d6c
Merge: 14831b09d fea59c771
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 17:14:13 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 14831b09da05f6d8e689568c77f7dfc5c33895ab
Merge: c43a6ed19 b90b6b6ff
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 13:52:34 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_model_runner

commit b90b6b6ffb4417ec64b382e9211273bca1eebbb7
Merge: b174c7ab2 739b61a34
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 13:51:35 2024 -0400

    upstream merge

commit a40828921c18faf70f4239d90e599da4311b284e
Merge: 7ace684da c43a6ed19
Author: laishzh <[email protected]>
Date:   Mon Jul 22 19:00:06 2024 +0800

    Merge remote-tracking branch 'bert_deps/afeldman-nm/infra_enc_dec_model_runner'

commit c43a6ed191e76f81bfd27f25e2ca8bac1fc01bcc
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 04:03:59 2024 -0400

    commented out BART TP=4

commit b174c7ab2da60e24a2ca576eccee671541ae142a
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 04:02:56 2024 -0400

    bart is parallelized, modulo an unfortunate hack for QKVParallelLinear in cross-attention

commit 3551b6bf56ab74228c923b698e59a88b06bac81c
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 03:59:22 2024 -0400

    fixed bug where underlying Attention was constructed using full head-count

commit fdf71de8557d588ff3b5767e96df09de4e9278d5
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 03:48:35 2024 -0400

    parallelized enc/dec cross-attention, using a slight hack

commit 9bbed43ab159063a8dff27587dae909b11e1a703
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 03:20:20 2024 -0400

    parallelized LM head

commit 74abe22287374c9dd801ef059692016ef09777cb
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 03:01:07 2024 -0400

    encoder attention & decoder self-attention parallelized

commit e5bb9de596bd7f4b5d85ab6d0a2440cae06f982a
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 02:33:02 2024 -0400

    all attention layer output linears are parallelized

commit fb3227f68714ba6ed00e67e8a242db88288cdb8e
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 02:25:12 2024 -0400

    parallelized BART learned positional embedding

commit 00198a633605b786c5f1fdef007c965d6284b39b
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 02:22:01 2024 -0400

    BART MLPs parallelized

commit abbb42749a628f5d199b62046200a6eb85025ab8
Merge: a33b50171 a16cabb90
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 01:54:59 2024 -0400

    Merge branch 'infra_enc_dec_model_runner' into infra_enc_dec_model_runner_parallel_bart

commit a16cabb9029d86221a69975935622dd53084a554
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 01:54:22 2024 -0400

    equalized some generation/sampling config settings between enc/dec HF and vLLM; outputs are still not a perfect match

commit a33b50171b6147ad1ff3db16adef4bb3a7819b33
Merge: 584c01e87 32967c1ca
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 01:35:22 2024 -0400

    Merge branch 'infra_enc_dec_model_runner' into infra_enc_dec_model_runner_parallel_bart

commit 32967c1ca7d706f1e59cbd604b58588210aeeee3
Merge: c00e0a8b5 89c1c6a19
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 01:30:53 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit c00e0a8b561a8243080ef40b1c1b8f0b8257d026
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 22 00:28:29 2024 -0400

    CommonMetadataBuilder sets block_tables constructor arg of metadata

commit a22f56c8bbb1dde2bd3a440bb0c037ed73ca17e1
Merge: ffa99b2dd 42de2cefc
Author: Andrew Feldman <[email protected]>
Date:   Sun Jul 21 22:28:38 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit ffa99b2dd61cfe21222a98ed2f95d608d6f6a8a2
Merge: 41ccf0c8c 9364f74ee
Author: Andrew Feldman <[email protected]>
Date:   Sat Jul 20 16:08:20 2024 -0400

    additional merge

commit 41ccf0c8ce9079a89ace594a3a0f2eb573c2d6c0
Merge: 9fdd04705 a5314e869
Author: Andrew Feldman <[email protected]>
Date:   Sat Jul 20 16:06:16 2024 -0400

    wip merge

commit 7ace684da139b43f38a4ebc328e17056ebfbe18a
Merge: fe7786c8a c092ed476
Author: laishzh <[email protected]>
Date:   Fri Jul 19 00:27:56 2024 +0800

    Merge remote-tracking branch 'bert_deps/afeldman-nm/infra_enc_dec_model_runner'

commit 584c01e875e12d870312ab210dec809325482ae3
Merge: 69f0379d2 9fdd04705
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 16:59:40 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_model_runner_parallel_bart

commit 9fdd0470597025057a473eb8e20946f71db54daf
Merge: c092ed476 5f0b9933e
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 16:59:18 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 69f0379d24323958dd9b332884f7c57a222acfc6
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 13:23:42 2024 -0400

    wip

commit d7bd617c84880f477a0ce7ae3d1de1215e26748f
Merge: 31e335fd2 c092ed476
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 13:13:04 2024 -0400

    Merge branch 'infra_enc_dec_model_runner' into infra_enc_dec_model_runner_parallel_bart

commit c092ed47621f9061395ea3e89386c997f856c6b3
Merge: 949ac02c5 2fa4623d9
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 13:09:14 2024 -0400

    merged in upstream changes; left some formatting issues which I expect to be fixed upstream

commit 31e335fd206985f5b3791b6a3cfaa021d21d3629
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 13:03:58 2024 -0400

    wip activation parallelization

commit 88c058e8fe5ae00b39f88f57be745d1b819dbca5
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 12:23:31 2024 -0400

    wip parallelizing BART

commit 949ac02c5694069edf3338b2202717dffda276e6
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 11:18:01 2024 -0400

    formatting

commit 6c940f886950ba0ae77ccb9002a161cf95b686ad
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 11:00:34 2024 -0400

    modified HF behavior in BART test to be truly greedy

commit f15eacf140810512335a7ac422b09788a1c1964e
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 10:55:46 2024 -0400

    wip

commit 180884605ffd911c553c6b2585c2993204e4a629
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 09:34:42 2024 -0400

    formatting

commit 1f8c52fac27ed8f10b94a3ecb08e15c1118c186a
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 09:34:29 2024 -0400

    tweaks to enc/dec example

commit 9da8fb3ef77b64c0152e3699513053e1ea4e21a4
Merge: 94c904fb5 a9a2e74d2
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 09:24:19 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 94c904fb5ff01f7e1c93b8d4a5f195ca2bea5bc0
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 08:43:16 2024 -0400

    wip parallel bart but encountering GPU count issue

commit 9f5a02c21e785704114f8c15bb829f4fe4cded55
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 08:27:53 2024 -0400

    RequestOutput & SequenceGroup now include encoder prompt in output, as does encoder/decoder example.

commit 597a07da54fa4c399e42bccbb4a14957d782e37c
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 07:59:42 2024 -0400

    refactor

commit f54f2762f4b4d14165371e3dfc300f1ef3afa9b6
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 07:53:12 2024 -0400

    wip refactoring

commit cac6283f60f1edc55950eaae54e74db0902ebfd8
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 07:25:58 2024 -0400

    added encoder/decoder example to examples test

commit b277180575d7d9c85708e2622cc6c32afbc0a383
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 07:17:40 2024 -0400

    formatting

commit 50ad5ffc753d1e7b39dfd55822ac0e405533168d
Merge: ef9462321 e09ce759a
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 07:16:28 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit ef94623218a718a437526917a8c95e933d614ee9
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 07:16:10 2024 -0400

    added examples utils w/ context manager for backend override; applied to enc/dec example to force XFormers

commit aee5f1615347dcfe2acea9abe16ac61df3404a99
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 06:14:51 2024 -0400

    fixed sequence bug

commit 3656dc6c843cbf41b99ab4b0c88a974d1cedba2e
Merge: 0cc14abc5 5fa6e9876
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 05:23:05 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 0cc14abc5a5569c6ae641c5d3efc0251fd946507
Merge: 1c6e06d0b 10383887e
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 02:10:34 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 1c6e06d0be66bf8cbf98cc8429a060b60bb65700
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 02:10:12 2024 -0400

    bugfix

commit 31127faf0c4637c6b80540c9693c7d5f135416d5
Merge: c2ff615de 1d094fd7c
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 00:48:22 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit c2ff615deebea4457721a457103d8e405346b1a5
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 00:44:16 2024 -0400

    format

commit f8dd4a5955ec478720531c47945ddc26e450f743
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 00:43:52 2024 -0400

    fixed scheduler bug

commit ef80c85f7dd3febc9c76c793427c444f9e62caa6
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 00:35:57 2024 -0400

    wip

commit 03aea187652fc0418d9a66f7eb5af6bc53c9e535
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 17 00:34:45 2024 -0400

    wip

commit 16c9aa2278e7f9d9b5f5ccffb085b0142a7e20ec
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 22:36:44 2024 -0400

    bugfix

commit 159c7bcf47aa86e4abbd88ad72a34e196c56626e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 21:58:15 2024 -0400

    fixed decoder-only bug

commit aea8d34385a64d6e6efa87729fee8fa4c4f15818
Merge: 713d095b4 7f62077af
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 21:09:06 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 713d095b4036404f4580225720da17d7d4e431cb
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 14:49:17 2024 -0400

    incorporated encoder sequence into request-add functionality

commit 87ed3b6fe380f75ebdafd3bc4da003b42802c18c
Merge: 97d81f0a5 94162beb9
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 14:17:29 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 97d81f0a53506cf6292f24117e8ecbfca5803805
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 14:17:09 2024 -0400

    encoder/decoder input processing; formatting

commit e534ffc156479d1b4dbec905ccc0877b746cc068
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 13:25:27 2024 -0400

    wip

commit 3c7e19d3d0e4c53ca363f40712fe2df160be1d9e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 10:44:23 2024 -0400

    zip enc/dec prompts; formatting

commit 850a97e812662645452989341eb44b79aa4b3276
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 10:25:38 2024 -0400

    bart parallel vocab

commit 42ac66b469891ba3085eaa1265c2bd9d445e0839
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:59:04 2024 -0400

    VllmRunner encoder/decoder methods

commit 796d7a3e7f8a67b644f6a88446e4162a09a1fbac
Merge: 374880f71 7508a3dc3
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:55:37 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 374880f71d6f81bd2a933b237ff6fa46e0324e6b
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:49:30 2024 -0400

    input preparation now includes encoder-oriented input setup

commit c5846ac9b31777d131bb0e3af2ad62a74eab1978
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:40:46 2024 -0400

    HfRunner greedy logprobs limit

commit 92d9f486b2455ff5ea5215eb61b9cb1e375b17ff
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:33:41 2024 -0400

    conftest: encoder/decoder example prompts

commit 54ff1420cac3edccff6c751e4930f7fa1b3be247
Merge: ddaf0ade2 7a3d2a5b9
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:28:46 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit ddaf0ade21142daafc504df83e15d31911dee497
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 16 09:28:21 2024 -0400

    wip

commit 914134749aee12e273f38273ed4cfda866ec837f
Merge: 251f899ea ec9933f4a
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 16:33:24 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 251f899ea158af33ffe1367c57137ac9ed9212ad
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 16:33:10 2024 -0400

    wip

commit f85997b4bb63352fc1bad72b54eea358f89ec5b0
Merge: 46397c74e 64fdc08c7
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 13:30:57 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 46397c74e7c094d86d4f49fc3230cb92985d5fc5
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 13:30:21 2024 -0400

    wip

commit 336a77d62d2d31a2ed6c9eba9e36190b50cca713
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 09:34:47 2024 -0400

    formatting

commit 8dccaa510a67e8de71811c13371468024843b71d
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 09:34:14 2024 -0400

    correctly constructing enc/dec sequences

commit dd4031c8e3201ee2e874e40df69c1bd52e7c54be
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 09:11:34 2024 -0400

    wip but having vllm.commit_id error

commit 552551137b19a9e9c2ebc13856c8e5a22834ae1b
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 08:51:18 2024 -0400

    Sequence may be constructed with encoder/decoder LLMInput configurations

commit 7b0803b1bb9fbf222be2b719729b3494ade79087
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:41:25 2024 -0400

    formatting?

commit 304caed04dcbc25b76d8e80321da00414ac7dc17
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:36:33 2024 -0400

    formatting

commit 6c953808f11122a0c5482786b41825a79788a9a4
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:25:01 2024 -0400

    wip engine is_encoder_decoder() setting

commit 78d3d3c00d30af324dbd1ca0973c1dd68d4cdb5b
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:20:50 2024 -0400

    modified LLM.generate() error message

commit 10ed7145053546d2112ed98252dc46f782a04b72
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:18:13 2024 -0400

    Format

commit 83c5c43dd6e06d13d9d05c01882b6d705a5aefaa
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:14:34 2024 -0400

    prompt type checks

commit 94c083cabff971da175eca616ff4b2c94299573b
Merge: 64d71980c 0cca1646d
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:00:30 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_model_runner

commit 0cca1646dce64fbdf2419b7f075e15da6264ee84
Merge: db5539a85 6ae1597dd
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 07:00:07 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 64d71980c823c167239d5c7338096a144586b7f3
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 06:59:49 2024 -0400

    wip

commit ff940f7adf771465e92a6fad350fb2f1aca4f694
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 06:18:58 2024 -0400

    formatting

commit 8b8d9812f7b7317448d4872db32cffcb45444c02
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 06:17:41 2024 -0400

    refactored AttentionType and related imports; skip BART test definitions entirely if on vllm CPU version (to avoid xformers import)

commit 590a240fe53dd78e62c78f7ac0263b0c3fda6949
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 06:05:18 2024 -0400

    Formatting

commit 760355bfeea93c7b85cf440f597485e11a7357b1
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 06:04:43 2024 -0400

    bart test skipped on CPU version of vllm

commit db5539a85f83ceaa929e2c02129a1a174fa29424
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 05:00:25 2024 -0400

    format

commit 3d5bb888cfc10c835ff17c18ca367c930a335785
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 04:48:48 2024 -0400

    EncoderDecoderModelInput correctly handles encoder token/position fields

commit 447a5c7e10b09c1e5cff95e907198d6d050f1ffa
Merge: 9ce2da454 22e79ee8f
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 15 04:29:30 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 9ce2da45412de77bb358c2ce97521fa6a8b7990d
Merge: c5ceb2348 eeceadaec
Author: Andrew Feldman <[email protected]>
Date:   Sat Jul 13 19:26:27 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit c5ceb23486c3f3ddd15faf8fcf06fcc1ba722fe1
Merge: 196f30cd7 41708e503
Author: Andrew Feldman <[email protected]>
Date:   Sat Jul 13 02:18:32 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 196f30cd7f25a682dc3d2320d994f949b00084a2
Author: Andrew Feldman <[email protected]>
Date:   Fri Jul 12 11:15:56 2024 -0400

    enc/dec decoder test working, sans sampling check

commit 9c898f5b28113ea53758c447175fd9cfd67b2066
Merge: 685604cfc f7160d946
Author: Andrew Feldman <[email protected]>
Date:   Fri Jul 12 09:41:15 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 685604cfcb90b6e74e37dbf5b5ee478e157f8191
Author: Andrew Feldman <[email protected]>
Date:   Fri Jul 12 09:40:42 2024 -0400

    wip modelrunner

commit f6499442e7c434c3ce4a187b344481988f106471
Merge: 9a63f51bd b422d4961
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 10 12:51:51 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner

commit 9a63f51bde8059fc361cc7abb2249ce1efb54163
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 10 12:50:40 2024 -0400

    wip model runner

commit fe7786c8a510d2280f3e25a8461474bb17ab8e11
Merge: 26b6271ca a5c28fca8
Author: laishzh <[email protected]>
Date:   Thu Jul 11 00:27:08 2024 +0800

    Merge remote-tracking branch 'bert_deps/afeldman-nm/infra_enc_dec_model_runner'

commit 6a71f8f4359dab04b9811b84d338db40dafa72bc
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 17:23:01 2024 -0400

    formatting

commit b4a461d983ed0215777c89f6b2ecbaa754422d4e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 17:18:56 2024 -0400

    formatting

commit d1343aac0fe6c0063f950e3600f9264aacb0836d
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 17:07:43 2024 -0400

    scheduler test passes

commit c95adf50adcdc315f63b276f52ac9a6a2d35b5fa
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 16:49:34 2024 -0400

    scheduler supports encoder-/cross-attention & passes existing scheduler tests, but needs new encoder/decoder-specific tests

commit 4c01f1300161bb4a16fdc27612cdace516aedebc
Merge: 2c80185fb 4d6ada947
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 16:38:22 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 2c80185fb81602a9a39afe4137bc5f59bcb69f57
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 16:36:11 2024 -0400

    formatting

commit bd14d29177dda7bd1f2ddd41ccba71703dbaa07d
Author: Andrew Feldman <[email protected]>
Date:   Tue Jul 9 16:17:24 2024 -0400

    wip scheduler

commit c90140fba9d3ec2ee8a065a267aef571e93c64db
Merge: 88e284a53 4f0e0ea13
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 17:55:07 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner2

commit 88e284a5344699e099e5510e5a353b9c5a54d0c7
Merge: db49d48f2 543aa4857
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 13:26:10 2024 -0400

    merge from main

commit db49d48f2a0913251385e324b28af06bd81cc121
Merge: 22d013c1d 6cd595c3c
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 11:15:43 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner2

commit 6cd595c3c879d4ee603bb6a5bc0f1724647a5135
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:47:20 2024 -0400

    formatting

commit 5df73fc708bf3370a5f6d7f85cce4772d5c679b5
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:47:04 2024 -0400

    xformers backend cleanup

commit d8a692b7dde0656696b726497030970aac0b53d3
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:39:37 2024 -0400

    cleaning up a number of backends & backends utils.py

commit 097aff2029e4560ae28bd7a7acf0f20509f803fe
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:36:05 2024 -0400

    vllm/attention/backends/flash_attn.py cleanup

commit 45fc9f71641bdd17c67997598463f12ead3998b2
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:35:00 2024 -0400

    vllm/attention/backends/blocksparse_attn.py cleanup

commit 5ee30fed1d27dbef98dc3e4512741c9ca301197c
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:31:09 2024 -0400

    vllm/attention/backends/abstract.py cleanup

commit 4f27946dcfb73f0a60420eb3ca6c9a74f6c6d3d1
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:27:35 2024 -0400

    tests/kernels/utils.py cleanup

commit a1bf65212cab0933b2520d8557a9d9132fff8c3d
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 10:17:04 2024 -0400

    test_encoder_decoder_attn.py cleanup

commit 9ae6728ecfe48769f578b0fad3f8e3950daa683d
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 09:46:58 2024 -0400

    fixed specific point-changes requested by woosuk

commit 7ce9a51d4fb3e286fdaa3a3ba12e60d0908d2d64
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 09:38:03 2024 -0400

    merged in first pieces of woosuk feedback & latest main; formatting

commit e837a73be0b61434116d1f332a84266d05cd61fc
Merge: 07df0e158 7e0bc5725
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 09:36:30 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn_reviews' into infra_enc_dec_cross_attn

commit 7e0bc572541e6018a7cfcebd16ea08b26826b975
Merge: 13f5b5078 717f4bcea
Author: Andrew Feldman <[email protected]>
Date:   Mon Jul 8 09:35:30 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 07df0e158a60b7d2a90407eecc868eaa10a58180
Author: afeldman-nm <[email protected]>
Date:   Mon Jul 8 09:33:03 2024 -0400

    Update vllm/attention/layer.py

    Co-authored-by: Woosuk Kwon <[email protected]>

commit 5dbebbc6f3aafe706a5555119fefa519b71c4634
Author: afeldman-nm <[email protected]>
Date:   Mon Jul 8 09:32:43 2024 -0400

    Update vllm/attention/backends/torch_sdpa.py

    nit: This will reduce the number of line changes and make the code look better.

    Co-authored-by: Woosuk Kwon <[email protected]>

commit 13f5b5078cdd81f58ed88a653ecc8ddc0968c073
Merge: d81662c57 abad5746a
Author: Andrew Feldman <[email protected]>
Date:   Fri Jul 5 15:07:21 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 22d013c1de08aa8bc5747c513b12e0c3dd59d144
Merge: ba09fbcd6 d81662c57
Author: Andrew Feldman <[email protected]>
Date:   Thu Jul 4 00:24:29 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner2

commit d81662c572948ca9e01db21ec5f14f71c9fd1764
Merge: 2f0eb9b59 3dd507083
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 3 22:59:32 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 2f0eb9b591f298879df48be6d0a74196cf32a5cf
Merge: 65e47db5a 966fe7214
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 3 18:58:24 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit ba09fbcd6b7efff359b1a0cef47c385d130b777d
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 3 11:32:18 2024 -0400

    refactored where a number of constants are stored, primarily constants related to encoder/decoder

commit b085795eefcf31303c5e38bd734544664b5757c5
Merge: 44c62708f 65e47db5a
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 3 11:22:23 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner2

commit 44c62708f3645f8a82b17a63849c1822a2dca645
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 3 10:15:57 2024 -0400

    manually merged BART code in from previous modelrunner attempt, it won't work tho until new modelrunner is finished

commit 65e47db5a59087af005e97df20f9d1a5be466a3c
Merge: 2828aa793 7cd2ebb02
Author: Andrew Feldman <[email protected]>
Date:   Wed Jul 3 07:52:12 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 2828aa7936adab0d2ee3b49ffb0cfd01848581ab
Merge: 5ff9c7686 af9ad46fc
Author: Andrew Feldman <[email protected]>
Date:   Sun Jun 30 20:16:34 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 5ff9c7686339f8d5f8e42060c1772f43468f2459
Merge: 8d36458fb 7836fdcc1
Author: Andrew Feldman <[email protected]>
Date:   Sun Jun 30 18:21:25 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 8d36458fb640e61fd70844739d107f41c0f3e631
Merge: 64981b535 75aa1442d
Author: Andrew Feldman <[email protected]>
Date:   Sat Jun 29 14:15:30 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 64981b535c557ada816b338f83cccf8c11ad0f83
Merge: 83d474e93 2cd402e16
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 28 15:37:00 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 83d474e93559ebbaf51194ef818f2308fd16ef1a
Merge: a5018499e 57f09a419
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 28 10:18:17 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit a5018499e3b8475749a8d1af80e14c8d172cf2c7
Author: Andrew Feldman <[email protected]>
Date:   Thu Jun 27 18:57:56 2024 -0400

    reverted unnecessarily vllm/utils.py changes

commit c8f8d59d4ce7e1a3c104bd417f256e9b8f954815
Merge: bcccc3486 c3dde367f
Author: Andrew Feldman <[email protected]>
Date:   Thu Jun 27 17:34:16 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit bcccc34863f5864307ef9c781471cef4e5d38ba8
Merge: 75756b91e 3fd02bda5
Author: Andrew Feldman <[email protected]>
Date:   Thu Jun 27 13:59:00 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 75756b91e3753a9c2a60dbae42b2e46d3612ece5
Author: Andrew Feldman <[email protected]>
Date:   Thu Jun 27 11:28:19 2024 -0400

    removed redundant elif

commit c24697fe82c844e13c820db916efef0a6b789374
Merge: 7ca0d7a39 e9d32d077
Author: Andrew Feldman <[email protected]>
Date:   Thu Jun 27 11:23:21 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 7ca0d7a399da475099cf501b1f4981a7dffc067a
Merge: 4dabe1974 294104c3f
Author: Andrew Feldman <[email protected]>
Date:   Wed Jun 26 19:37:30 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit a5c28fca8f5e21653c6e5874719467e08d3d8503
Merge: ba4e2c12e 4dabe1974
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 15:52:22 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit 4dabe1974766c6db8fd6ce8b6688c25bbd85b633
Merge: e2a46e3b7 dd248f767
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 15:48:31 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit ba4e2c12e6f1a03e3381cabda8902d55df9a292e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 04:05:23 2024 -0400

    Removed unnecessary position arguments from BART routine; formatting

commit 41e31e861b01896a99fba2f2ea44b717164c4398
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 03:59:48 2024 -0400

    BART with new explanatory comments & passing formatting tests

commit e61385d90e40b423e1e5d98839413a76ffcd11fb
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 03:49:18 2024 -0400

    fixed bug caused by overzealous refactoring

commit 4400d7733f7dca2acffac916a00f5edc6a89e14e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 03:36:28 2024 -0400

    some reformatting

commit 5169a2a6518d5ae338001eae0eae6dad64bf52eb
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 03:25:40 2024 -0400

    removed unnecessary positions arguments from BART encoder, decoder forward()

commit d43141f20514e77963e1c13ba857b1d3cb71c210
Merge: 753bab068 e2a46e3b7
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 03:16:19 2024 -0400

    merge; a lot of formatting fixes to bart code but not fully passing

commit e2a46e3b7b9f9d1a9cc751046c3cddd1522620ed
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:53:35 2024 -0400

    formatting

commit 1a6e5a31846e2ef886b66e9cc9216ffe983d0ec0
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:52:04 2024 -0400

    moved make_tensor_with_pad() helper function back to vllm.utils

commit d23c28466765496049a1696d0a053a0a2505ce9a
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:38:08 2024 -0400

    typing and formatting; fixed escape sequences in comments

commit 2f0b05bb805513e73eb0609ea87b6367ec9d4803
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:35:34 2024 -0400

    typing and formatting

commit 47c9f396fdcd40895597423ebfefe585b014c2f3
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:32:52 2024 -0400

    removed attention_type

commit 06c7f7500140c574d20a12079dbd1ef83db29688
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:28:42 2024 -0400

    reorganized helper functions that were only being used for testing into tests/kernels/utils.py from vllm/utils.py

commit a178b7a8c9838665ee7e169471206b70d62e1b71
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:20:00 2024 -0400

    changed nested if/else to elif/else in xformers mask computation code

commit 597526a49e041ec99329add79ef272ce6e457b9e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:18:02 2024 -0400

    removed extra line

commit 125e5dc46724155f5d81e93a7644a3889e864a2f
Merge: 5ce2dd083 e9de9dd55
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:16:21 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 753bab06880a05726b2b8274a20d8f9d179c9576
Merge: 919bf88f8 e9de9dd55
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:14:20 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 919bf88f8925b2e60c765f309df655318c392c2e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 25 02:13:52 2024 -0400

    BART e2e test runs but does not pass

commit b7ff75fc3d3cb5d447503daa8a4a78aa6bf1a18d
Merge: 2d8429e1b ba991d5c8
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 19:25:24 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 2d8429e1b0002eccb7deaa805d25ebb6d5616187
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 18:47:19 2024 -0400

    fixed a number of bugs related to BART decode-phase; added support for the particular architecture alias used by bart-large-cnn

commit 8f9ee625557ec34ec29787b6b66ec760ff390e77
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 18:06:10 2024 -0400

    wip bart-cnn summarization example

commit d58e8c8464d5bcf41121a582b035f5f290658657
Merge: 6fd4c020a 1744cc99b
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 15:50:28 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 6fd4c020a9c5ee8ecbf6e086d8b9dfefb3f8396f
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 15:42:09 2024 -0400

    fixed prompt processing bug that was preventing inference from starting

commit 7d2fcf90a6516be432ffd39f4571ed0a524438b2
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 15:39:07 2024 -0400

    BART passes profile run

commit 3b95225850af9b81a15142344c4c8bae7257a519
Merge: 8b8c40943 b8d5637c5
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 13:19:42 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_bart' into infra_enc_dec_model_runner_reviews

commit 8b8c40943e2e0a4b104ca65c76441d3db03a017d
Merge: 42c364439 5ce2dd083
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 13:04:54 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit 5ce2dd08345da9e5a19a913214e5a73ed4923c8d
Merge: ce88fa36e c24621295
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 12:55:03 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit b8d5637c510b42a6503d9b0c4d810fe3568314dd
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 24 12:50:25 2024 -0400

    wip bart

commit 59caabecf2666c33306625843908b1d9dc2ffa8b
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 21:42:39 2024 -0400

    BART almost passing profile_run()

commit f2dac1ce0ae1033b5143b8f1cd234e1eee5e67ee
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 20:13:05 2024 -0400

    wip

commit 082be510533d1e39008db19ca8754a91aa4879d3
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 19:36:46 2024 -0400

    loading tied weights

commit 42c36443981dd89c9defaf2f51c1481ddb0a5751
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 16:24:26 2024 -0400

    encoder decoder model runner fails for unsupported scenarios

commit 9ad5143ab290419d27fcde1287d9bea853a58be3
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 16:00:15 2024 -0400

    refactored backend constants

commit 001cb185141278b6ea3a2fbbf6200032104229e0
Merge: 6219d9590 ce88fa36e
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 15:40:19 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit ce88fa36e6cdbe0352348207a6a4dc405fcd9d76
Merge: ca68c63db f1e72cc19
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 15:39:06 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 6219d9590dfae14c574d598ce879af58fe97177f
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 15:36:36 2024 -0400

    Formatting

commit 576c26c86a9b210fcca29180ed20fd15770f2660
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 15:35:11 2024 -0400

    first pass a BART load_weights; probably not handling qkv correctly

commit c11db0fd30e326d2273da95439c5087e83725b04
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 15:21:15 2024 -0400

    integrating BART weight loading code

commit 2123517ef5fc8a5593e693b7d28d8c217c729282
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 15:13:36 2024 -0400

    formatting

commit 97cad4b875ee09ebeff455a20fdf351eef9d2f16
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 14:40:40 2024 -0400

    wip BART model cleanup

commit 45a53877dc815398f1f190fa7e7d513db7928b6f
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 14:28:59 2024 -0400

    pruning out training functionality & unnecessary code from BART

commit 30becae9d35d4b994bcd995c81603a97b93d0e3d
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 13:45:48 2024 -0400

    profiling fix; wip bart

commit d2ad2328e41ad7a8898ddbb37db8c1bfaf2ae803
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 13:37:27 2024 -0400

    wip bart integration

commit ed610b0b9a6abcdaf874d16225a441509a207076
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 12:09:51 2024 -0400

    pulled in bart model code

commit 28f0d2fff6752a90227aa8aa07ca32e43bee395d
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 12:06:56 2024 -0400

    pulled in bart code

commit 213dc597274da4c963510b1d72166d0a8eddbc7b
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 12:03:50 2024 -0400

    test_bart.py

commit 49c7162d70441963ec6c26430a8e36426fbfe1aa
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 12:01:59 2024 -0400

    formatting

commit 84c0dcc5fe2b653cb0517df523504a107055061a
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 11:58:45 2024 -0400

    scheduler tests

commit c15731710bd5c317638fef4d861567031d6126b8
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 11:30:25 2024 -0400

    free sequence groups

commit 614de4e13869f1b2938d1f30369bbb98752a20c6
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 10:54:25 2024 -0400

    formatting

commit b6d4383e141e1fc23ee0c8c6bb9a7d172949266a
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 10:46:15 2024 -0400

    enc/dec integrated in Scheduler.schedule()

commit 89b0e445bb32bbd5758bdcc05cd1bb869101029e
Merge: beec4f571 ca68c63db
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 10:27:42 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit ca68c63db6ef8b9fcd132e84ffc6db1b7c7f618f
Merge: e9d7ede3b bd620b01f
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 10:26:54 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit beec4f5717d5c8193d70449c066f2aa469bf50b0
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 10:24:50 2024 -0400

    enc/dec support in LLMEngine._add_processed_request()

commit a1ab7a110c334f54dc451f1b273c3b0f0345332e
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 09:50:37 2024 -0400

    removing BART test

commit 7000573396666a58cf5ca06d626f2b4c2e4f8bb2
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 09:49:37 2024 -0400

    temporarily removing BART work

commit 1bd916c2f91f7b8d755a9142ee3daeb7d5e489cb
Merge: 2b2d2e9df bd620b01f
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 09:38:05 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit 2b2d2e9df2b1535883e36b8353a26d52200f7783
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 08:55:19 2024 -0400

    wip encoder/decoder API integration; WIP BART integration; WIP BART example

commit e9ecd25cb733b220785611056295ea9787b1ce47
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 05:48:50 2024 -0400

    added unoptimized BART example

commit 2fccd1832a0933dca8537e436449dad4d52fa0c3
Merge: de967174d 0f645112d
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 02:28:07 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_model_runner_bart

commit 0f645112de4e1784cd43be505e659f3d3bd56581
Merge: 58139e380 e9d7ede3b
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 02:27:25 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit e9d7ede3bfef92527a643809f4beb20cb780e7c0
Merge: 67ed41961 d9a252bc8
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 02:26:01 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit de967174dcbbdb5e81d975edf158416bcbeb74cd
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 02:25:36 2024 -0400

    wip bart test

commit 58139e3808060c550264c800e605129d0082af5c
Merge: f8569facd d9a252bc8
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 01:55:08 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit f8569facd10b0cbf05689bfc364831a37bb48b45
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 00:35:24 2024 -0400

    formatting

commit eb5819be6025f0e598831e7e13c0656e184e9524
Merge: a0068fc91 1f5674218
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 00:23:07 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit a0068fc9112c5acefe69f5a8e30470c73a90a039
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 21 00:21:05 2024 -0400

    Encoder/decoder model runner passes prefill/decode/empty-SG tests

commit f0094bd8a90cc26325f1ea7ca1506fc459a312c9
Author: Andrew Feldman <[email protected]>
Date:   Thu Jun 20 10:59:52 2024 -0400

    wip enc/dec modelrunner prepare_prompt test

commit 736cf45223517f5720aedc53b65258ee8a75a25c
Merge: 1581eb7f9 f9f9ae39e
Author: Andrew Feldman <[email protected]>
Date:   Wed Jun 19 22:56:31 2024 -0400

    Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_model_runner_bart

commit f9f9ae39eea1dd6367cec3b2e878e1d2f3bef4ad
Merge: a8a52d293 67ed41961
Author: Andrew Feldman <[email protected]>
Date:   Wed Jun 19 22:31:41 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit 67ed419619301a39c04417b29c90822a837e6362
Merge: ea37e17ab 3730a1c83
Author: Andrew Feldman <[email protected]>
Date:   Wed Jun 19 22:29:04 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 1581eb7f978a83690e0aaa2b390be491b42ffb15
Author: Andrew Feldman <[email protected]>
Date:   Wed Jun 19 22:28:28 2024 -0400

    wip

commit fbec309f0cc8d94df6ba7ab3f71f172d30f73531
Author: Andrew Feldman <[email protected]>
Date:   Wed Jun 19 01:14:35 2024 -0400

    moved enc/dec error strings to top-level vllm utils

commit a8a52d2935d5a2ab969c05d498ec2423ae19507b
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 23:39:15 2024 -0400

    some formatting fixes

commit 37aeed66141b10b0d43c8e6d56613806dc7108ff
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 23:35:11 2024 -0400

    enc dec model runner testable if only for encoder decoder model

commit e3ba61e368f0085fe64e8dae3d80494f5254164c
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 22:44:23 2024 -0400

    wip

commit 3311aac9bddd474d0a7037b53c53dfc515df0bcc
Merge: f9314fd7d 59a1eb59c
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 22:43:23 2024 -0400

    Merge branch 'main' into infra_enc_dec_model_runner_reviews

commit f9314fd7d1ae0d3146d7456eb41e6885f0055a5d
Merge: 89fdb8116 ea37e17ab
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 22:43:07 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit ea37e17ab5ad7c084c13bf8e8492039d6a9bcdbf
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 19:16:38 2024 -0400

    merge conflict; typing; formatting

commit 91cbaa63d35e72ed0c14b65ed7f79bffdda2da97
Merge: 525303c7c 2bd231a7b
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 19:15:10 2024 -0400

    merge; resolve conflicts

commit 525303c7c61127900680ff06b6cc09610001b71e
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 18:06:33 2024 -0400

    num encoder tokens

commit 5f8c7f6cd6776cbda8289a5cee28e5cd8b858f4d
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 11:26:24 2024 -0400

    Moved attention type for attn_metadata to attention forward(); added NotImplement failures to backends in non-decoder-only scenarios

commit c3f7da7620921e14e6c7efabeb0c54fd3d08b30b
Merge: 7b9cb7f43 13db4369d
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 11:01:28 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 7b9cb7f4339364b66180bf5cf7015f8fea67479d
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 11:01:05 2024 -0400

    Replace attn_metadata.attention_type and attn_metadata._attn_type with attn_type argument to forward()

commit d0fd9e10ff13157183fc24dfcb558f83c716ead6
Merge: addde7d22 4ad7b53e5
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 09:58:57 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 26b6271caa9b776b0093b874ab94dc8df0bb36b9
Merge: 3ea38598e db5ec52ad
Author: laishzh <[email protected]>
Date:   Tue Jun 18 17:49:40 2024 +0800

    Merge branch 'vllm-project:main' into main

commit addde7d22cda9ab0d006538ec0f900ac593c9292
Merge: 47586807a 114d7270f
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 00:53:01 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 89fdb811629bfe86ce5aaf85e078ce953e03e700
Author: Andrew Feldman <[email protected]>
Date:   Tue Jun 18 00:52:29 2024 -0400

    first pass at _prepare_encoder_model_input()

commit c7bf81228dc06a1ed2c9d7e7e6f0d61e476e7e7b
Merge: 830a05126 47586807a
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 17 10:37:42 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit 47586807a3e8e75c6e9c27d1d17aeb22b0dff63d
Merge: 90aec385a e2b85cf86
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 17 10:35:45 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit 830a051267732f60b04b99a15552ea984b9f43f8
Author: Andrew Feldman <[email protected]>
Date:   Mon Jun 17 01:16:25 2024 -0400

    format

commit e5c029926043518e63b85739d369b6cbbb9eddda
Merge: 9cb8ee685 90aec385a
Author: Andrew Feldman <[email protected]>
Date:   Sun Jun 16 22:59:32 2024 -0400

    Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner_reviews

commit 90aec385a0e77574f5b575257e29b194f6974521
Merge: e229e0018 845a3f26f
Author: Andrew Feldman <[email protected]>
Date:   Sun Jun 16 22:50:21 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit e229e0018138698bf13135f067eaf32a8cbf9167
Author: Andrew Feldman <[email protected]>
Date:   Sun Jun 16 22:47:04 2024 -0400

    format

commit 4dccd51c91fd3c1ae3a9ecea4baa46cad2a5f7dd
Merge: b3c3411e2 f07d51332
Author: Andrew Feldman <[email protected]>
Date:   Sun Jun 16 20:26:41 2024 -0400

    Merge branch 'main' into infra_enc_dec_cross_attn_reviews

commit b3c3411e26b7cf6f27604825d99a920c34605c9c
Author: Andrew Feldman <[email protected]>
Date:   Fri Jun 14 16:39:35 2024 -0400

    formatting

commit f06c6873d77962c7b27fc…
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
garg-amit pushed a commit to garg-amit/vllm that referenced this pull request Oct 28, 2024
sumitd2 pushed a commit to sumitd2/vllm that referenced this pull request Nov 14, 2024
Labels
ready ONLY add when PR is ready to merge/full CI is needed
5 participants