Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove token functions with context args in favor of model #3720

Merged
merged 10 commits into from
Oct 23, 2023
Merged

remove token functions with context args in favor of model #3720

merged 10 commits into from
Oct 23, 2023

Conversation

MarcusDunn
Copy link
Contributor

@MarcusDunn MarcusDunn commented Oct 21, 2023

changed the following to take in llama_model instead of llama_context

  • llama_token_get_text
  • llama_token_get_score
  • llama_token_get_type
  • llama_token_bos
  • llama_token_eos
  • llama_token_nl
  • llama_token_prefix
  • llama_token_middle
  • llama_token_suffix
  • llama_token_eot

Special tokens are a property of the model - not the context (which is how it is currently expressed in llama_token_bos and co).

As a model is always attainable when one has context (via llama_get_model) these new variants supersede the old ones.

Copy link
Owner

@ggerganov ggerganov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some recollection that we had these functions before, but I decided to remove them for some reason. However, now I don't see what could have been the reason.

Will leave this open for a day or two and if we don't see a problem - we can merge and potentially deprecate the context alternatives

llama.h Outdated Show resolved Hide resolved
@ggerganov ggerganov added the need feedback Testing and feedback with results are needed label Oct 22, 2023
@slaren
Copy link
Collaborator

slaren commented Oct 22, 2023

In #3301 I moved the tokenization functions to take a llama_model. Leaving these in llama_context was an oversight. My preferred solution would be to remove the functions take a llama_context entirely, and change them to take a llama_model instead. There is no need to duplicate them.

MarcusDunn and others added 4 commits October 23, 2023 09:08
Co-authored-by: Georgi Gerganov <[email protected]>
- `llama_token_get_text`
- `llama_token_get_score`
- `llama_token_get_type`
@MarcusDunn MarcusDunn changed the title added llama_model_token_* variants to all the llama_token_* functions that takes in model instead of model_context remove token functions that with context args in favor of model Oct 23, 2023
@MarcusDunn MarcusDunn changed the title remove token functions that with context args in favor of model remove token functions with context args in favor of model Oct 23, 2023
@MarcusDunn MarcusDunn requested a review from ggerganov October 23, 2023 16:44
@MarcusDunn
Copy link
Contributor Author

MarcusDunn commented Oct 23, 2023

after @slaren's suggestions - there are small changes to nearly all the examples and this becomes an API breaking change - is there any extra work that should be done to accommodate that? (doc updates? / release notes? / hot topics?)

@ggerganov ggerganov merged commit 5be6c80 into ggerganov:master Oct 23, 2023
32 checks passed
mattgauf added a commit to mattgauf/llama.cpp that referenced this pull request Oct 27, 2023
* master: (350 commits)
  speculative : ensure draft and target model vocab matches (ggerganov#3812)
  llama : correctly report GGUFv3 format (ggerganov#3818)
  simple : fix batch handling (ggerganov#3803)
  cuda : improve text-generation and batched decoding performance (ggerganov#3776)
  server : do not release slot on image input (ggerganov#3798)
  batched-bench : print params at start
  log : disable pid in log filenames
  server : add parameter -tb N, --threads-batch N (ggerganov#3584) (ggerganov#3768)
  server : do not block system prompt update (ggerganov#3767)
  sync : ggml (conv ops + cuda MSVC fixes) (ggerganov#3765)
  cmake : add missed dependencies (ggerganov#3763)
  cuda : add batched cuBLAS GEMM for faster attention (ggerganov#3749)
  Add more tokenizer tests (ggerganov#3742)
  metal : handle ggml_scale for n%4 != 0 (close ggerganov#3754)
  Revert "make : add optional CUDA_NATIVE_ARCH (ggerganov#2482)"
  issues : separate bug and enhancement template + no default title (ggerganov#3748)
  Update special token handling in conversion scripts for gpt2 derived tokenizers (ggerganov#3746)
  llama : remove token functions with `context` args in favor of `model` (ggerganov#3720)
  Fix baichuan convert script not detecing model (ggerganov#3739)
  make : add optional CUDA_NATIVE_ARCH (ggerganov#2482)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need feedback Testing and feedback with results are needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants