return correct finish reasons in generate_text_func #210

Merged: 4 commits merged into caikit:main on Oct 31, 2023

Conversation

dtrifiro
Contributor

@dtrifiro dtrifiro commented Sep 28, 2023

The finish reason in `generate_text_func` is currently broken (it always returns OTHER): the code compares a tensor element (an integer token id) with a string (`eos_token`) without converting it using the tokenizer.

This PR fixes the tokenization issue and adds a few type hints.
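To illustrate the mismatch, here is a minimal sketch (using the variable names from the snippet quoted later in this thread, not the exact module code):

# Broken: generate_ids holds integer token ids, while eos_token here is the
# string form (e.g. "</s>"), so the comparison can never be true and the code
# always falls through to finish_reason = "OTHER".
if generate_ids[0][-1].item() == eos_token:
    finish_reason = "EOS_TOKEN"

# Fixed: compare against the tokenizer's integer id instead of the string.
if generate_ids[0][-1].item() == tokenizer.eos_token_id:
    finish_reason = "EOS_TOKEN"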

Full reproduction script below.

Reproduction script

#!/bin/sh
set -exu

tempdir="$(mktemp -d)"
cd "${tempdir}"

python -m venv .venv 
source .venv/bin/activate
pip install git+https://github.com/caikit/caikit-nlp

grpcurl="${tempdir}/grpcurl"

# this requires git-lfs to be installed and `git lfs install` to have been run
if ! command -v git-lfs &>/dev/null; then echo "this requires git-lfs"; exit 1; fi

git clone https://huggingface.co/google/flan-t5-small

# convert the model for caikit
python -c 'import caikit_nlp; model = caikit_nlp.text_generation.TextGeneration.bootstrap("flan-t5-small"); model.save("flan-t5-small-caikit")'

CAIKIT_LOG_FILE=$(mktemp)

export TRANSFORMERS_CACHE="$PWD"
export RUNTIME_LIBRARY=caikit_nlp
export RUNTIME_LOCAL_MODELS_DIR="$PWD"

python -m caikit.runtime.grpc_server &>$CAIKIT_LOG_FILE &

# wait for caikit to properly start
timeout --signal=SIGINT 30 bash -c "grep -m1 'Caikit Runtime is serving on port' <(tail -f $CAIKIT_LOG_FILE)"

cat $CAIKIT_LOG_FILE

# install grpcurl to actually query the endpoint
curl -sL https://github.com/fullstorydev/grpcurl/releases/download/v1.8.7/grpcurl_1.8.7_linux_x86_64.tar.gz | tar zxvf - grpcurl
./grpcurl -plaintext -d '{"text": "At what temperature does liquid Nitrogen boil?"}' -H "mm-model-id: flan-t5-small-caikit" localhost:8085 caikit.runtime.Nlp.NlpService/TextGenerationTaskPredict
rc=$?

kill %1
exit $rc

@dtrifiro
Contributor Author

Tagging @Xaenalt for visibility

@dtrifiro
Contributor Author

dtrifiro commented Sep 28, 2023

A small sidenote here:

if generate_ids[0][-1].item() == eos_token:
    finish_reason = "EOS_TOKEN"
elif generate_ids.size(1) - 1 == max_new_tokens:
    finish_reason = "MAX_TOKENS"
else:
    finish_reason = "OTHER"

If finish_reason is set to OTHER, caikit currently fails to serialize the response because that's not defined in the FinishReason enum in caikit:
https://github.com/caikit/caikit/blob/4d01c9b4caded5492ca5772875f0d8778143a657/caikit/interfaces/nlp/data_model/text_generation.py#L35-L44

Would it make sense to add it there?

@Xaenalt
Contributor

Xaenalt commented Sep 28, 2023

A small sidenote here:

if generate_ids[0][-1].item() == eos_token:
    finish_reason = "EOS_TOKEN"
elif generate_ids.size(1) - 1 == max_new_tokens:
    finish_reason = "MAX_TOKENS"
else:
    finish_reason = "OTHER"

If finish_reason is set to OTHER, caikit currently fails to serialize the response because that's not defined in the FinishReason enum in caikit: https://github.com/caikit/caikit/blob/4d01c9b4caded5492ca5772875f0d8778143a657/caikit/interfaces/nlp/data_model/text_generation.py#L35-L44

Would it make sense to add it there?

I think it does make sense to add it there, especially if it's being referenced here. @gabe-l-hart is this intended?

@gabe-l-hart
Contributor

Hm, that does seem troubling. @gkumbhat I'll defer to you on this, but I think we probably do need to add OTHER to the enum in caikit/interfaces

Collaborator

@gkumbhat gkumbhat left a comment


Just left a small comment, but otherwise looks good. Thanks for catching this and contributing the fix.

@@ -36,6 +36,11 @@
# Local
from ...data_model import ExponentialDecayLengthPenalty

if TYPE_CHECKING:
    # Third Party
    from transformers import AutoModel, AutoTokenizer
Collaborator


Let's include this as a dependency, since all these modules need to work with transformers anyway.

Contributor Author

@dtrifiro dtrifiro Sep 29, 2023


Not sure what you mean here: get rid of `if TYPE_CHECKING`? `transformers` is already a dependency in `pyproject.toml`.

Collaborator


@dtrifiro yep, that's exactly what I was suggesting.

Contributor Author


Done

@gkumbhat
Collaborator

Actually, OTHER here represents a lot of possible scenarios; I think we need to map it back to the proper reasons. The enum we have in caikit is aligned with HF and with TGIS here, so adding OTHER to the caikit interfaces would cause confusion and would not be informative. 🤔

@dtrifiro
Contributor Author

dtrifiro commented Sep 29, 2023

Thinking about it some more, the only other finish reason that should be returned by this function is STOP_SEQUENCE.
From a quick glance at the code, I'm not sure that scenario works right now. I propose opening a new issue and tackling it in another PR.

I think the overall objective should be to get rid of OTHER here, since it breaks the protobuf protocol anyway.

EDIT: I ended up adding some code to handle the STOP_SEQUENCE finish reason; a rough sketch of the idea is below.
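For illustration only (the `stop_sequences` name and the decode step are assumptions, not necessarily what the PR's code looks like):

# Decode the generated ids and look for any user-supplied stop sequence; if one
# is present, generation was stopped by it rather than by EOS or the token budget.
generated_text = tokenizer.decode(generate_ids[0], skip_special_tokens=True)
if stop_sequences and any(seq in generated_text for seq in stop_sequences):
    finish_reason = "STOP_SEQUENCE"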

@dtrifiro
Contributor Author

dtrifiro commented Oct 7, 2023

@gkumbhat Another fix was required in order for CI to pass with the new exception instead of OTHER as the finish reason: the finish_reason logic for MAX_TOKENS had to be fixed as well, since it was broken and caused most tests to fail. See the last commit.

@@ -237,7 +237,9 @@ def generate_text_func(
     generate_ids[0, -1] == tokenizer.eos_token_id
 ):
     finish_reason = "EOS_TOKEN"
-elif generate_ids.size(1) - 1 == max_new_tokens:
+elif (generate_ids.size(1) - 1 == max_new_tokens) or (
+    generate_ids.size(1) - inputs["input_ids"].size(1) == max_new_tokens
Collaborator


Hmm, technically the input will only be in the output for causal-lm type models, so there can be some side effects in this calculation 🤔 Maybe we need to check whether the input is actually in the output and then calculate max_new_tokens accordingly.
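To make the concern concrete, one way the check could look (a sketch assuming `inputs` is the tokenized prompt and `generate_ids` is the output of `model.generate()`; this is not the PR's actual code):

import torch

# Causal-LM models echo the prompt at the start of the generated sequence,
# while encoder-decoder models (e.g. flan-t5) return only the new tokens, so
# the number of newly generated tokens must be computed differently.
input_len = inputs["input_ids"].size(1)
prompt_echoed = generate_ids.size(1) >= input_len and torch.equal(
    generate_ids[0, :input_len], inputs["input_ids"][0]
)
if prompt_echoed:
    new_tokens = generate_ids.size(1) - input_len  # causal-LM case
else:
    new_tokens = generate_ids.size(1) - 1  # encoder-decoder case
if new_tokens == max_new_tokens:
    finish_reason = "MAX_TOKENS"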

Contributor Author

@dtrifiro dtrifiro Oct 27, 2023


@gkumbhat I agree. I gave it some thought and ended up with the logic in the latest push. The idea is the following:

  1. Check if the last token is eos_token; if so, finish_reason = "EOS_TOKEN"
  2. Check if the stop sequence is in the generated tokens; if so, finish_reason = "STOP_SEQUENCE"
  3. If none of the above conditions are true, then finish_reason = "MAX_TOKENS" (see the sketch after this list)
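Putting the three rules together, the decision roughly looks like this (a sketch of the ordering described above; `contains_stop_sequence` is an illustrative helper, not something defined in caikit-nlp):

# Rule 1: generation ended on the EOS token.
if generate_ids[0, -1].item() == tokenizer.eos_token_id:
    finish_reason = "EOS_TOKEN"
# Rule 2: a stop sequence occurs in the generated text (illustrative helper).
elif contains_stop_sequence(generate_ids, stop_sequences, tokenizer):
    finish_reason = "STOP_SEQUENCE"
# Rule 3: otherwise generation must have stopped because it hit the token budget.
else:
    finish_reason = "MAX_TOKENS"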

My reasoning for always returning MAX_TOKENS comes from looking at caikit.interfaces.nlp.data_model.text_generation.FinishReason. Here's the code with some comments I added

@dataobject(package=NLP_PACKAGE)
class FinishReason(Enum):
    NOT_FINISHED = 0   # should only be returned by streaming implementations
    MAX_TOKENS = 1     # matches rule #3 above (the fallback)
    EOS_TOKEN = 2      # matches rule #1 above
    CANCELLED = 3      # should not be set by generate_text_func
    TIME_LIMIT = 4     # should not be set by generate_text_func
    STOP_SEQUENCE = 5  # matches rule #2 above
    TOKEN_LIMIT = 6    # not sure about this
    ERROR = 7          # fairly generic

I'm unsure about TOKEN_LIMIT (how is this different from MAX_TOKENS?), but I think none of the other enum values apply to generate_text_func.

Collaborator


TOKEN_LIMIT refers to the maximum number of tokens defined by the model, whereas MAX_TOKENS refers to the maximum number defined by the user. So one can reach TOKEN_LIMIT before MAX_TOKENS.
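To make the distinction concrete (a hedged sketch; tokenizer.model_max_length is the usual transformers attribute for the model-defined ceiling and is an assumption here, not something this PR reads):

# TOKEN_LIMIT: the model's own maximum sequence length was hit.
# MAX_TOKENS: the caller-supplied max_new_tokens budget was hit.
if generate_ids.size(1) >= tokenizer.model_max_length:
    finish_reason = "TOKEN_LIMIT"
elif new_tokens == max_new_tokens:
    finish_reason = "MAX_TOKENS"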

Contributor Author


@gkumbhat Ok, so that means we also need to check whether we reached the token limit for the model, although I'm not sure how we can get that information. Can we open an issue for adding TOKEN_LIMIT support so we can work on it in another PR?

Contributor Author


Created #253

@dtrifiro dtrifiro changed the title from "return correct finish reason in generate_text_func" to "return correct finish reasons in generate_text_func" on Oct 27, 2023
…text_func

"OTHER" is an invalid value for caikit.interfaces.nlp.data_model.text_generation.FinishReason,
resulting in failed serialization of responses when querying the text
generation endpoint.
For `generate_text_func`, it is reasonable to assume that if the finish
reason is not `EOS_TOKEN` or `STOP_SEQUENCE`, it must be `MAX_TOKENS`.

Signed-off-by: Daniele Trifirò <[email protected]>
Collaborator

@gkumbhat gkumbhat left a comment


LGTM

@dtrifiro
Contributor Author

Hey @gkumbhat, is there anything I can do to get this merged?

@gkumbhat gkumbhat merged commit c5ab581 into caikit:main Oct 31, 2023
4 checks passed
@dtrifiro dtrifiro deleted the fix-finish-reason branch October 31, 2023 15:57