advancements in n_batch size #6672
wiseman-timelord started this conversation in Ideas
In short: I advise raising the default max n_batch to 4096.
The max n_batch is currently 2048 tokens, but I wasn't satisfied with it for short-story generation, so I hacked the files to use a max n_batch of 8192 on a Llama 3.2 3B GGUF-based model. I have been using this for a day and it works fine, outputting ~2300 characters of acceptable quality and context, but I saw no extra characters from raising it higher. For the record, the prompt is multi-section, multi-line, and multi-format, at 3850 characters, which is a fair test on a 3B. I advise raising the max n_batch to 4096 tokens, along with the other applicable related (possibly hidden) max values, so that it is utilized correctly and not limited elsewhere. I imagine Llama 3.2 onwards will be good for such use (story output).
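For anyone unfamiliar with what the limit affects: n_batch caps how many prompt tokens are fed through the model per decode call, so a long multi-section prompt gets split into n_batch-sized chunks. A rough illustrative sketch of that chunking (the helper below is my own illustration, not llama.cpp's actual code, and the token counts are stand-ins):

```python
# Illustrative sketch: prompt tokens are processed in chunks of at most
# n_batch per decode call. This helper mimics that chunking.
def chunk_prompt(tokens, n_batch):
    """Split a token list into batches of at most n_batch tokens."""
    return [tokens[i:i + n_batch] for i in range(0, len(tokens), n_batch)]

# Stand-in for a 5000-token prompt: with the current default max n_batch
# of 2048 it needs three decode passes; with 4096 it would need two.
prompt = list(range(5000))
print([len(b) for b in chunk_prompt(prompt, 2048)])  # -> [2048, 2048, 904]
print([len(b) for b in chunk_prompt(prompt, 4096)])  # -> [4096, 904]
```

A larger n_batch mainly reduces the number of passes needed to ingest a long prompt; it does not by itself lengthen the generated output.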