Inferring logits from model.forward for the entire batch instead of the last forward's output #73
Comments
Thanks for reporting @michaelfeil - we'll get back to you soon. |
@micwade-aws Thanks, very much looking forward to your answer. FYI @jimburtoft, re our discussion today. |
+1 on this thread. Furthermore, is there any way to get the hidden states of the last layer? |
@michaelfeil Here is one thing you could try: to return model forward scores during inference, you can use the `HuggingFaceGenerationModelAdapter` wrapper. Here is an example of how to use this wrapper to access the model forward scores:

```python
from transformers_neuronx.generation_utils import HuggingFaceGenerationModelAdapter

# Model config object
config = ...
# Create your Neuron model
neuron_model = ...
# Compile your Neuron model
neuron_model.to_neuron()
# Create the Hugging Face wrapper model
neuron = HuggingFaceGenerationModelAdapter(config, neuron_model)
# Run inference using the Hugging Face generate API
# Pass in `output_scores=True, return_dict_in_generate=True` to return the scores
result = neuron.generate(inputs, ..., output_scores=True, return_dict_in_generate=True)
# Retrieve the tokens
tokens = result.sequences
# Retrieve the scores
scores = result.scores
```

For additional information about the `generate` API, see the Hugging Face documentation. Let me know if this solves the original issue. |
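For reference, `result.scores` from the Hugging Face `generate` API is a tuple with one logits tensor per generated step. A minimal sketch of turning it into per-token log-probabilities (shapes assume greedy or sampling decoding, not beam search):

```python
import torch

# result.scores is a tuple with one entry per generated token, each of
# shape (batch_size, vocab_size) holding that step's logits.
step_logits = torch.stack(result.scores, dim=1)       # (batch, new_tokens, vocab)
log_probs = torch.log_softmax(step_logits, dim=-1)

# result.sequences includes the prompt, so keep only the generated tail
# to line tokens up with their per-step scores.
generated = result.sequences[:, -step_logits.shape[1]:]
token_log_probs = log_probs.gather(-1, generated.unsqueeze(-1)).squeeze(-1)
```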
@jluntamazon Thanks for your response! My issue was more about getting the logits for the whole sequence, specifically to estimate the metrics for lm-eval-harness. |
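For context, lm-eval-harness scores a (context, continuation) pair by summing the log-probabilities the model assigns to the continuation tokens, which requires logits at every input position, not just the last one. A minimal sketch of that computation, assuming a forward pass that returns full-sequence logits:

```python
import torch

def continuation_loglikelihood(logits, input_ids, ctx_len):
    """Sum of log p(token_t | tokens_<t) over the continuation.

    logits:    (batch, seq_len, vocab) from a full-sequence forward pass
    input_ids: (batch, seq_len), the concatenated context + continuation
    ctx_len:   number of context tokens preceding the continuation
    """
    log_probs = torch.log_softmax(logits, dim=-1)
    # Logits at position t predict the token at position t + 1.
    pred = log_probs[:, ctx_len - 1:-1, :]
    target = input_ids[:, ctx_len:]
    token_ll = pred.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    return token_ll.sum(dim=-1)
```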
Hi @michaelfeil: We added the ability to return all input prompt context encoding logits in the 2.19 release. This is enabled by setting `output_all_logits=True` in the `NeuronConfig`. Please note that with `output_all_logits` the returned logits tensor covers every input position, so outputs are substantially larger than the default last-token logits. Here is an example of how to use it:
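A minimal sketch of how this could look end to end, assuming the `output_all_logits` flag described above and the standard `transformers-neuronx` Llama entry points (checkpoint and token ids are illustrative; exact argument names may differ between releases):

```python
import torch
from transformers_neuronx.config import NeuronConfig
from transformers_neuronx.llama.model import LlamaForSampling

# ASSUMPTION: `output_all_logits` is the NeuronConfig flag referenced in
# the 2.19 release notes; verify the exact name against your SDK version.
neuron_config = NeuronConfig(output_all_logits=True)

model = LlamaForSampling.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # illustrative checkpoint
    tp_degree=2,
    neuron_config=neuron_config,
)
model.to_neuron()

input_ids = torch.tensor([[1, 15043, 3186]])  # illustrative token ids
# With the flag set, the context-encoding forward pass is expected to
# return logits for every prompt position, not just the last one.
logits = model(input_ids)  # expected shape: (batch, seq_len, vocab)
```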
Please let us know if this provides the behavior you are looking for. |
I am trying to retrieve the logits from the model, to use https://github.com/EleutherAI/lm-evaluation-harness/blob/692e0f83b5341b543fa288f84289617f793e4e93/lm_eval/models/huggingface.py#L972 with Huggingface `transformers` models. In `transformers` I can get the logits from the forward pass for the entire batch; in `transformers-neuronx` I only get the last forward's output. In simple PyTorch words, I want something like the following.
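A sketch of that behavior with the standard `transformers` API (checkpoint name is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Plain Hugging Face `transformers`: a single forward pass returns logits
# for every position of every sequence in the batch.
name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(name)

batch = tokenizer(["Hello world", "Neuron logits"], return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**batch)
print(out.logits.shape)  # (batch_size, seq_len, vocab_size)
```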
Update: 1/11
I got this to work. However, I am using the undocumented `cache_ids` feature. The output seems correct, but the code is terribly slow: my local laptop GPU (RTX 3060M) runs TinyLlama-1.1B around 25x faster.
Using `LlamaForSampling`, an inf2.8xlarge instance, tp_degree=2, Neuron 2.15.9.
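For completeness, a sketch of what the `cache_ids` workaround could look like: context tokens are fed through the model one step at a time with explicit KV-cache positions, so a logit is produced for every position. This assumes the `forward(input_ids, cache_ids)` calling convention of `transformers-neuronx` sampling models; exact shapes and dtypes may vary by release.

```python
import torch

def forward_all_logits(neuron_model, input_ids):
    """Collect per-position logits by stepping through the KV cache.

    `neuron_model` is e.g. a compiled LlamaForSampling. One device
    round-trip per token is what makes this slow, consistent with the
    ~25x slowdown versus a single batched GPU forward pass noted above.
    """
    batch_size, seq_len = input_ids.shape
    all_logits = []
    for pos in range(seq_len):
        tokens = input_ids[:, pos:pos + 1]                     # one token per step
        cache_ids = torch.as_tensor([pos], dtype=torch.int32)  # KV-cache position
        logits = neuron_model(tokens, cache_ids)               # roughly (batch, vocab)
        all_logits.append(logits)
    return torch.stack(all_logits, dim=1)                      # (batch, seq_len, vocab)
```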