
Align with huggingface Top K sampling #753

Merged: 3 commits into vllm-project:main, Aug 15, 2023

Conversation

Abraham-Xu (Contributor)

main modifications
All modifications are in vllm/model_executor/layers/sampler.py; a sketch of the resulting flow follows the list.

  1. Reverse the order of the softmax and the _apply_top_p_top_k call in the forward function of the Sampler class, so that _apply_top_p_top_k takes logits as input instead of probabilities.
  2. In the _apply_top_p_top_k function, first compute temporary probabilities (the softmax of the logits) so the top_p step can locate the indices whose cumulative probability exceeds p. (ATTENTION: the output of _apply_top_p_top_k is still logits, not probabilities. This is the main difference from the original code.)
  3. Change the value used to mask out top_p- and top_k-filtered entries from 0 to -float("Inf"), so the masked entries get zero probability after the softmax.
  4. In the _sample_from_prompt function, call torch.multinomial without the replacement=True argument.
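A minimal sketch of the flow described above (the function and tensor names here are illustrative, not the exact code in sampler.py):

import torch

def filter_logits(logits, top_k, top_p):
    # Top-k: keep the k largest logits and mask the rest with -inf.
    if top_k > 0:
        kth_value = torch.topk(logits, top_k, dim=-1).values[..., -1, None]
        logits = logits.masked_fill(logits < kth_value, -float("inf"))
    # Top-p: softmax temporarily to find where the cumulative
    # probability exceeds p, then mask the corresponding logits.
    if top_p < 1.0:
        sorted_logits, sorted_indices = torch.sort(logits, descending=True, dim=-1)
        cumulative_probs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
        sorted_mask = cumulative_probs > top_p
        # Shift right so the first token that crosses p is kept.
        sorted_mask[..., 1:] = sorted_mask[..., :-1].clone()
        sorted_mask[..., 0] = False
        mask = sorted_mask.scatter(-1, sorted_indices, sorted_mask)
        logits = logits.masked_fill(mask, -float("inf"))
    # The output is still logits; the softmax happens afterwards.
    return logits

# Softmax only after filtering, then sample without replacement.
logits = torch.randn(1, 32000)
probs = torch.softmax(filter_logits(logits, top_k=50, top_p=0.9), dim=-1)
next_token = torch.multinomial(probs, num_samples=1)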

tested result
The input probability distribution fed to torch.multinomial for the first token matches that of huggingface/transformers under the same weights and input sentence.
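For completeness, one hypothetical way to check this claim (the dump files and tolerance are illustrative; in practice one would print or save the probs tensor at the torch.multinomial call site in each framework):

import torch

# Hypothetical: probability tensors saved from each framework at the
# torch.multinomial call site during the runs below.
hf_probs = torch.load("hf_probs.pt")
vllm_probs = torch.load("vllm_probs.pt")

print(torch.allclose(hf_probs, vllm_probs, atol=1e-5))
# Side-by-side view of the most likely tokens.
print(torch.topk(hf_probs, 10))
print(torch.topk(vllm_probs, 10))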

test code:
huggingface/transformers

from transformers import AutoTokenizer, AutoModelForCausalLM

PATH_TO_CONVERTED_WEIGHTS = "/data/xutianci/llama_hf/"
PATH_TO_CONVERTED_TOKENIZER = "/data/xutianci/vllm/llama-tokenizer/"

model = AutoModelForCausalLM.from_pretrained(PATH_TO_CONVERTED_WEIGHTS)
tokenizer = AutoTokenizer.from_pretrained(PATH_TO_CONVERTED_TOKENIZER)

prompt = "Hello, my name is"
inputs = tokenizer(prompt, return_tensors="pt")
print(f'inputs={inputs}')

# Generate
generate_ids = model.generate(inputs.input_ids, do_sample=True, max_new_tokens=1)
print(f'generate_ids={generate_ids}')
output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(f'output={output}')
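Note: with do_sample=True and no explicit overrides, transformers samples with its default generation config (top_k=50, top_p=1.0, temperature=1.0), which is what the explicit top_k=50 in the vllm run below is matching.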

vllm

from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
]
sampling_params = SamplingParams(top_k=50, max_tokens=1)

llm = LLM(model="/data/xutianci/llama_hf/", tokenizer="/data/xutianci/vllm/llama-tokenizer/")

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

related issue
https://github.com/vllm-project/vllm/issues/718

Abraham-Xu changed the title from "Align with huggingface greedy search" to "Align with huggingface Top K sampling" on Aug 13, 2023
zhuohan123 (Member) left a comment


LGTM! Thanks for the fix! Committed some code to make sure the format is correct and to rename some variables.

zhuohan123 merged commit d174437 into vllm-project:main on Aug 15, 2023
2 checks passed
Abraham-Xu (Contributor, Author)

LGTM! Thanks for the fix! Committed some code to make sure the format is correct and to rename some variables.

Glad to see it merged. Thanks for checking and revising! The renaming makes the algorithm more self-explanatory!

randxie pushed a commit to randxie/vllm that referenced this pull request Aug 29, 2023
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
sjchoi1 pushed a commit to casys-kaist-internal/vllm that referenced this pull request May 7, 2024