This methodology has been superseded by Batched Multi-Contextual Token Sampling.
A novel methodology for improving the needle-in-a-haystack retrieval capabilities of chat SLMs (small language models).
- Each context window of previous messages is treated as if it were the sole context
- The highest-probability next token across all context windows is sampled
- Each context window is then concatenated with the same sampled token
- Only tokens that have appeared in previous chat logs can be used in the response
- This gives the agent the ability to adapt to the user's vocabulary and conversational style over time (see the sketch after this list)
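A minimal sketch of this sampling loop, assuming a Hugging Face causal LM checkpoint (the model name `HuggingFaceTB/SmolLM2-135M-Instruct`, the greedy per-window selection, and the helper names are illustrative, not part of this methodology's reference implementation):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HuggingFaceTB/SmolLM2-135M-Instruct"  # illustrative SLM checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def sample_step(context_windows: list[list[int]]) -> int:
    """Return the highest-probability next token across all context windows."""
    best_token, best_prob = None, -1.0
    with torch.no_grad():
        for window in context_windows:
            input_ids = torch.tensor([window])
            logits = model(input_ids).logits[0, -1]   # next-token logits for this window alone
            probs = torch.softmax(logits, dim=-1)
            prob, token = probs.max(dim=-1)
            if prob.item() > best_prob:
                best_prob, best_token = prob.item(), token.item()
    return best_token

def generate(context_windows: list[list[int]], max_new_tokens: int = 32) -> list[int]:
    """Generate a response, appending each sampled token to every context window."""
    response = []
    for _ in range(max_new_tokens):
        token = sample_step(context_windows)
        if token == tokenizer.eos_token_id:
            break
        response.append(token)
        for window in context_windows:
            window.append(token)                      # same token concatenated to each window
    return response
```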
Small Language Models struggle to generate accurate responses in long-context settings such as chat modelling. With this methodology, the next-token logits are computed for each chunk of context separately, and the most probable token across chunks is selected. Masking tokens that have not appeared in the chat history increases accuracy further, particularly for information such as dates and times.
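A hedged sketch of the masking step under the same assumptions: `chat_log_ids` (the tokenized chat history) is an assumed input, and the function name is illustrative. Every vocabulary entry not seen in the chat log has its logit set to negative infinity before softmax, so only previously seen tokens can be sampled.

```python
import torch

def mask_unseen_tokens(logits: torch.Tensor, chat_log_ids: list[int]) -> torch.Tensor:
    """Keep logits only for token IDs present in the chat history; mask the rest to -inf."""
    allowed = torch.tensor(sorted(set(chat_log_ids)))
    masked = torch.full_like(logits, float("-inf"))
    masked[allowed] = logits[allowed]
    return masked
```

In this sketch the mask would be applied to the per-window logits before the softmax in `sample_step`, so tokens absent from the chat log can never win the cross-window comparison.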