sampling : add XTC sampler #9742

Merged (49 commits) on Oct 15, 2024
Conversation

@MaggotHATE (Contributor) commented Oct 4, 2024

This is my implementation of the XTC sampler as described by @p-e-w in oobabooga/text-generation-webui#6335; the main difference is the threshold_max parameter, which seems to help with finetunes. An additional "thank you" goes to @LostRuins, who implemented XTC a bit earlier than me (the probabilities + logits system is a bit confusing at first).

XTC is a novel sampler that turns truncation on its head: Instead of pruning the least likely tokens, under certain circumstances, it removes the most likely tokens from consideration.

Technically, this sampler:

  • rolls a random check against probability xtc_p (if the check fails, nothing is removed)
  • searches for all tokens with probabilities above threshold xtc_t and below upper threshold xtc_t_max
  • penalizes and removes all such tokens besides the least probable of them.

This means that at least one "most probable" token is always left untouched, which should keep things stable. For more details and explanations it is better to read @p-e-w's original PR, as it is very thorough and comprehensive.
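As a rough illustration of this logic, here is a standalone sketch (not the code from this PR; the `candidate` struct, function name, and RNG handling are simplified placeholders, and candidates are assumed to be sorted by probability in descending order):

    #include <random>
    #include <vector>

    struct candidate {
        int   id; // token id
        float p;  // probability after softmax
    };

    // Sketch of the XTC step described above.
    //   xtc_p     - chance that the sampler acts at all (rolled once per step)
    //   xtc_t     - lower probability threshold
    //   xtc_t_max - upper probability threshold (1.0 reproduces the original XTC)
    static void xtc_sketch(std::vector<candidate> & cands, float xtc_p, float xtc_t, float xtc_t_max, std::mt19937 & rng) {
        std::uniform_real_distribution<float> dist(0.0f, 1.0f);
        if (dist(rng) >= xtc_p) {
            return; // chance check failed: leave the distribution untouched
        }

        // count candidates whose probability falls inside [xtc_t, xtc_t_max]
        size_t in_range = 0;
        for (const auto & c : cands) {
            if (c.p >= xtc_t && c.p <= xtc_t_max) {
                ++in_range;
            }
        }
        if (in_range < 2) {
            return; // nothing to do: at least one token in range must survive
        }

        // remove all in-range candidates except the least probable one;
        // iterating from the front removes the most probable ones first
        size_t removed = 0;
        for (size_t i = 0; i < cands.size() && removed + 1 < in_range; ) {
            if (cands[i].p >= xtc_t && cands[i].p <= xtc_t_max) {
                cands.erase(cands.begin() + i);
                ++removed;
            } else {
                ++i;
            }
        }
    }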

Additionally, I have not encountered the problems with EOS tokens that were described in discussions of the original implementation, so I have not added any checks for them.

Being a unique sampler, XTC is not included in the queue by default (like the Mirostat samplers).

In my experience, the effect varies between models and even between sizes of the same model. XTC improves creativity greatly, but may break models in specific cases (for example, the multi-language capabilities of Nemo can be slightly degraded for inflective languages). With larger models XTC may require a very careful choice of parameters - I have seen more refusals in Mistral Large 123b with default XTC settings, for example (although not with my implementation, but through Kobold.cpp).

In general, however, XTC is a really interesting idea and a sampler worth testing.

Adds the XTC sampler: not activated by default, but with recommended settings as defaults.
To be more in line with the original implementation, chance is calculated once at the beginning.
@pnb (Contributor) left a comment

I'm not very familiar with the code base, but XTC is a very interesting idea to me so I took a read through to see how you implemented it, and left a few comments in case they are helpful. In case not, feel free to ignore. :)

@MaggotHATE marked this pull request as draft October 4, 2024 15:19
@David-AU-github

I have tested this with a number of models of different sizes and types via Text Generation WebUI (oobabooga) on the day of release.
This works very well and is a strong addition to llama.cpp.

@p-e-w commented Oct 5, 2024

What is the theory behind the xtc_t_max parameter? It seems to me that in many cases, that parameter would do the opposite of what XTC is supposed to do: Reduce creativity compared to not using XTC at all.

Here's an example: Imagine the input is ... shiver down his, and the predictions from the model are

spine 0.65
back 0.15
throat 0.12
...

With the original XTC algorithm and xtc_t set to 0.1, spine and back will be eliminated, leaving throat and the rest of the distribution. The behavior is more creative because the most clichéd continuations are gone.

Now let's say you set xtc_t to 0.1 again, and you also set the xtc_t_max parameter from this PR to 0.6. In that case, the algorithm will eliminate only back, leaving spine because it is above the upper threshold. The probability mass of back is distributed to the remaining tokens after renormalization, and the most clichéd continuation (spine), the one we want to avoid above all else, is now even more likely than it would have been if we had done nothing at all.
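(Checking the arithmetic on those example numbers: once back's 0.15 is removed and the remaining mass renormalized,

$$P(\text{spine}) = \frac{0.65}{1 - 0.15} \approx 0.76 > 0.65,$$

so the most clichéd continuation actually gains probability.)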

That doesn't seem right.

@MaggotHATE (Contributor, Author)

> Here's an example: Imagine the input is ... shiver down his

This example and explanation are true for baseline models, but may not hold for finetuned models that were trained against clichéd phrases.

Consider a model which, as a result of finetuning, is now less likely to output the phrase in your example: less likely doesn't mean "erased entirely", and such models may still be able to generate it, but with lower probability. In such a case XTC will remove a more probable (supposedly less clichéd) variant and increase the chance of choosing a previously less probable cliché. In the long run it can steer such a finetuned model back to its baseline, "undoing" the effect of finetuning to some extent.

At the same time XTC may still be useful for such a model if a middle range is carefully set. I would assume that it is possible to cut out a range where penalized clichés may still appear, further improving the model's capabilities. In practice, I have tested only a few such models, but I was getting better (as in, less typical) responses with a lowered upper threshold.

You can look at various benchmarks that estimate creativity and, more relevant in our case, so-called "slop", which is exactly the type of phrasing we want to avoid by using XTC. It's worth testing XTC with such models too.

As a note, while testing the new fixed random seeding, I've encountered refusals on mini-magnum-12b-v1.1.Q8_0 where they never happened even on baseline Nemo Instruct, with parameters M=0.020->xtc=0.20-0.99 (99% chance) (and for statistics I was counting tokens that came through the XTC sampler and were removed: 611/37275 (1.64%)). While this is an extreme example (99% chance), it shows that it is possible to get an undesirable result in some corner cases. The addition of the upper threshold simply makes XTC a bit more flexible without breaking it (the default value is still 1.0).

Shifts tokens to remove the penalized ones, then puts the penalized at the back. Should make `min_keep` still viable.
@p-e-w commented Oct 6, 2024

> Consider a model which, as a result of finetuning, is now less likely to output the phrase in your example: less likely doesn't mean "erased entirely", and such models may still be able to generate it, but with lower probability. In such a case XTC will remove a more probable (supposedly less clichéd) variant and increase the chance of choosing a previously less probable cliché. In the long run it can steer such a finetuned model back to its baseline, "undoing" the effect of finetuning to some extent.

So the purpose of xtc_t_max is to act as a countermeasure so that the finetune's anti-cliché countermeasures don't get overridden by XTC, which is itself an anti-cliché countermeasure? That's highly convoluted.

I suspect that this parameter is useless in most cases, actively harmful unless extremely carefully calibrated for each model and input, and above all else, confusing and very difficult to interpret for most users. The standard XTC threshold gives you a straightforward knob to tweak, with a clearly defined meaning: The probability that you consider "viable". By contrast, with xtc_t_max, you have to guess to what degree the finetune has already suppressed clichés, and that degree might not even be uniform across all clichés. Essentially, xtc_t_max means nothing at all. It's just a value that you can fiddle with in the hope that you will like the output better.

IMO, xtc_t_max is a philosophical mismatch for XTC, injecting complexity and volatility into an otherwise simple and straightforward sampler. I considered several additional parameters while designing XTC, but in the end decided against including any of them, because simplicity is king, and interpretability is what makes parameters useful in the first place.

@MaggotHATE (Contributor, Author)

> So the purpose of xtc_t_max is to act as a countermeasure so that the finetune's anti-cliché countermeasures don't get overridden by XTC, which is itself an anti-cliché countermeasure?

No, it is a measure of a model's perceived creativity. If a model is already creative enough, clichés will be less probable, and the range of tokens we perceive as undesirable will have a lower ceiling.

Look at it from a broader perspective: every model needs to be steered toward the output a user wants. To achieve that, we can do internal work and external work on that model.

Internal work is training/finetuning: it affects the internal data of the model, and the results of such work are "baked" into the model itself. Although it is not usually run this way, any well-made model can produce coherent output (almost) without samplers.

External work covers sampling algorithms, RAG, etc. They are used to redirect the model towards a more desirable output for every input a user might have. We can also say that prompt engineering falls into this category (as opposed to creating training data pairs, for example, which would be internal work).

Now, let's look at XTC again: the initial idea was to remove the most "boring", repetitive tokens from consideration. This means we consider the internal work done on the model "incomplete" with respect to what we see as the model's "creativity". To make it "complete", we use XTC and get exactly (or almost exactly) what we need.

However, if the internal work done on a model has already taken that incomplete creativity into account, and further training was performed, that model by itself will be closer to what we want to achieve. We see the base output of such a model as "more creative" in comparison to another model (usually the base/instruct version). This also means that undesirable tokens will have lower probabilities during sampling.

With the original XTC, however, we will still penalize top tokens only. There is no mechanism to control its behavior toward the uppermost tokens, because raising the bar affects the lower top tokens, and changing the probabilities affects all top tokens. This means that we don't differentiate between models that are more creative and models that are less creative: the lower threshold cannot reflect that, as it controls the lower top tokens, not the upper ones.

By introducing xtc_t_max we give a user more control over the perceived creativity, making it easier to find per-model settings. This has always been the case with presets, which may be sufficient for one model but not for another. Considering the large number of models nowadays, xtc_t_max is an important addition and a useful piece of flexibility.

> because simplicity is king

I would argue that functionality is king: giving users more options and more control over the tools they use is more important than keeping algorithms simple. KISS is important, but it should not hurt usability. In this case: the original XTC works only with a range of tokens at the top; with xtc_t_max it can work with any range, at the top or in the middle. Users who don't need to fiddle with it will keep it at the default, while those who want to experiment will be able to try middle ranges. The possibilities are doubled.

1. Scan tokens from the top until the first non-penalizable one
2. Remove the last captured token (the least probable above the threshold)
3. Shift all tokens to override the remaining penalizable ones
4. Penalize them and put them at the bottom.
@MaggotHATE marked this pull request as ready for review October 6, 2024 11:27
@MaggotHATE (Contributor, Author)
I have reworked the algorithm, given that std::sort is no longer used. It should be ready now.

@MaggotHATE requested a review from slaren October 6, 2024 11:31
@MaggotHATE (Contributor, Author) commented Oct 7, 2024

After tinkering with the algorithm more I've tested two options:

The more elegant one is to iterate through all tokens from back to front, which makes capturing the last top token easier:

    std::vector<llama_token_data> tokens_left;
    bool last_captured = false;

    for (int i = (cur_p->size - 1); i >= 0; --i) {
        if (cur_p->data[i].p < ctx->threshold || cur_p->data[i].p > ctx->threshold_max) {
            tokens_left.emplace_back(cur_p->data[i]);
        } else if (!last_captured) {
            tokens_left.emplace_back(cur_p->data[i]);
            last_captured = true;
        }
    }

    if (last_captured) {
        int to_remove = cur_p->size - tokens_left.size();

        if (to_remove >= 1 && tokens_left.size() >= ctx->min_keep) {
            memcpy(cur_p->data, tokens_left.data(), tokens_left.size()*sizeof(llama_token_data));

            cur_p->size = tokens_left.size();
            // since tokens are in reverse order, we'll need to sort them later
            cur_p->sorted = false;
        }
    }

The problem is that it requires sorting afterwards, which will be performed by softmax. That is expensive, and we can avoid it by going from the front:

    std::vector<llama_token_data> tokens_left;
    // must start as false: otherwise the last token above the threshold is never kept below
    bool last_captured = false;

    for (size_t i = 0; i < cur_p->size; ++i) {
        if (cur_p->data[i].p < ctx->threshold || cur_p->data[i].p > ctx->threshold_max) {
            // capturing the last top token
            if (!last_captured && i > 0 && cur_p->data[i-1].p >= ctx->threshold) {
                tokens_left.emplace_back(cur_p->data[i-1]);
                last_captured = true;
            }

            tokens_left.emplace_back(cur_p->data[i]);
        }
    }

    // in case all candidates are penalizable
    if (cur_p->data[cur_p->size - 1].p >= ctx->threshold) tokens_left.emplace_back(cur_p->data[cur_p->size - 1]);

    if (tokens_left.size() < ctx->min_keep) return;

    size_t to_remove = cur_p->size - tokens_left.size();

    if (to_remove >= 1) {
        memcpy(cur_p->data, tokens_left.data(), tokens_left.size()*sizeof(llama_token_data));
        cur_p->size = tokens_left.size();
    }

However, the second one feels bulky. Which one is more acceptable? @slaren can I request your help here?

@slaren (Collaborator) commented Oct 7, 2024

If I understand the algorithm correctly, this implementation seems too complicated. Since the tokens are sorted by probability, you should only need to first find the index of the last token to remove, and then move the remaining tokens to the top of the list. The implementation in textgen-webui also seems to completely disable itself if any special tokens would be removed, which I imagine is important.
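A minimal sketch of that suggestion (ignoring the upper threshold and the probability roll, and assuming the same `cur_p`/`ctx` fields used in the snippets above; candidates are sorted by probability in descending order):

    // 1) find the index of the last token at or above the threshold
    size_t pos_last = 0;
    for (size_t i = 0; i < cur_p->size; ++i) {
        if (cur_p->data[i].p >= ctx->threshold) {
            pos_last = i;
        } else {
            break;
        }
    }

    // 2) drop everything before that index, i.e. remove all "top" tokens
    //    except the least probable one, simply by advancing the data pointer
    if (pos_last > 0 && cur_p->size - pos_last >= ctx->min_keep) {
        cur_p->data += pos_last;
        cur_p->size -= pos_last;
    }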

@MaggotHATE (Contributor, Author) commented Oct 7, 2024

> you should only need to first find the index of the last token to remove, and then move the remaining tokens to the top of the list

It will be slightly more complex due to the addition of the max threshold, but yes, I understand. I have also added XTC to test-sampling as a self-check.

> disable itself if any special tokens would be removed

It's not a part of the original idea, and I haven't encountered any problems with special tokens in llama.cpp specifically - that's why I left it out for now. If any problems are found later, I'll definitely add it. Thank you!

@pnb (Contributor) commented Oct 7, 2024

> ... The implementation in textgen-webui also seems to completely disable itself if any special tokens would be removed, which I imagine is important.

Wouldn't doing so bias the responses to be shorter, on average? If EOT is a special token and can be elevated in probability by XTC when other (more likely) tokens are removed, but never decreased in probability by itself being removed, it seems like responses will be shorter.

I've tried this implementation a bit now and it does occasionally ramble a bit at the end of a response, especially when generating code (not a great use for XTC but entertaining at least). That might be caused by removing EOT, so perhaps there's good reason to avoid it. But I would definitely prefer an occasional ramble over shortened responses.

@p-e-w commented Oct 12, 2024

@slaren:

> I don't think it is strictly necessary to clone the exact same behavior as the original implementation, the idea can be improved, but in that case out of respect for the original author and to avoid confusion, it may be better to use a different name for the sampler. Unless that's ok with @p-e-w.

As I have stated above, the plain implementation from this PR (without special treatment for \n and EOS) is the original design. The special tokens hack was added by the maintainer against my objections because some users were complaining loudly about the behavior with certain large finetunes. As far as I'm concerned, that hack doesn't belong in XTC, and the other implementations (Kobold and ExLlamaV2) thankfully don't have it.

What is not part of the original XTC design is the xtc_t_max parameter introduced by this PR. No other implementation has that parameter, and as I have explained in detail above, I consider adding that parameter a bad idea because it complicates the user interface with an extremely capricious knob that can actually do the opposite of what XTC is designed for.

@MaggotHATE (Contributor, Author)

> What is not part of the original XTC design is the xtc_t_max parameter introduced by this PR. No other implementation has that parameter, and as I have explained in detail above, I consider adding that parameter a bad idea because it complicates the user interface with an extremely capricious knob that can actually do the opposite of what XTC is designed for.

If a frontend doesn't implement xtc_t_max in its UI, nothing will break, though. XTC will work exactly as the original if the upper threshold is left untouched.

@p-e-w commented Oct 12, 2024

If frontends don't expose xtc_t_max then the parameter is useless. If frontends do expose it, then users will play with it, the output will become less creative rather than more creative, and people will conclude "XTC doesn't work".

I maintain that xtc_t_max doesn't belong in XTC. If you believe otherwise, there is a very simple way to resolve this: Submit that parameter in a separate PR, so that it may be evaluated on its own merit.

But please don't use the popularity that XTC has gained to try to shoehorn your own ideas into the sampler by coupling the two, and then submitting them under the title "add XTC sampler". There is nothing wrong in principle with adding parameters to a sampler, including XTC, but that is not the way to do it.

@MaggotHATE (Contributor, Author) commented Oct 12, 2024

@p-e-w

> to try to shoehorn your own ideas into the sampler

That was never the intention; I simply wanted it to be more flexible after experiencing more clichéd responses and refusals on finetuned models, that is all. But I understand what you mean.

> There is nothing wrong in principle with adding parameters to a sampler, including XTC

On that note, as I was testing this XTC implementation in the server UI, I looked at candidates and saw no consistency in the result any range has on a finetuned model (still using mini-magnum for testing). At times XTC pulls out clichés on default settings, but different clichés appear with different ranges. "Dimly lit" is probably the most elusive one from my current tests. However, tokens that usually lead to clichéd phrases have lower probabilities most of the time and, more importantly, they are more mixed with the tokens that would give less clichéd results.

As such, while I'm still sure that there should be a mechanism for more control, I am no longer sure that the top range is the best one. Alternatives I tried in the past were:

1. Penalizing each token above threshold regardless of their number (i.e. even if there's only one of them)
2. Calculating probability for each token

UPD: After testing more, I think I've found what to work with: in mini-magnum at least, penalizing tokens at around 50 to 60% results in more clichéd phrases, and the whole style becomes more of a typical "purple prose". In the original XTC this is partly offset by chance and by penalizing tokens with higher probabilities. At the same time, penalizing all tokens above ~70% gives less clichéd writing overall. This can be done by penalizing exactly one top token if the threshold is above 0.5; I will add it as a separate PR once this gets merged.

@p-e-w commented Oct 14, 2024

> This can be done by penalizing exactly one top token if the threshold is above 0.5; I will add it as a separate PR once this gets merged.

That's an interesting proposal, but it sounds like a completely new sampler, rather than an addition to XTC. I recommend submitting that mechanism as a separate sampler under a new name. It's pretty much the inverse of XTC in the way it removes tokens.

@MaggotHATE (Contributor, Author) commented Oct 14, 2024

> I recommend submitting that mechanism as a separate sampler under a new name.

Yes, I tested more in the meantime and see potential in two ideas:

  1. A simple range-based removal would allow penalizing from 1.0 down to 0.8, for example, doing exactly that.
  2. A "top tokens disposition" would check whether the difference between the first two tokens is higher than X (in this case, for example, 0.65) and, if true, remove the top token. This would allow removing only the most "dominating" tokens while keeping those with more variety (for example, candidates at 40%, 40%, 10%, etc.) - or vice versa if, again, a range is defined (a rough sketch of this check follows below).
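To make the second idea concrete, a minimal sketch (illustrative only; `X` is a placeholder for the difference threshold mentioned above, and `cur_p` is assumed to be sorted by probability in descending order as elsewhere in this PR):

    // idea 2: remove the single top token only when it "dominates" the distribution,
    // i.e. when the gap between the first two candidates exceeds X
    if (cur_p->size >= 2 && cur_p->data[0].p - cur_p->data[1].p > X) {
        cur_p->data += 1;
        cur_p->size -= 1;
    }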


By removing top tokens XTC can improve the variety of answers, break writing clichés and inhibit repetition, since clichés and repeated phrases are usually more likely to appear. By keeping the last token above the threshold, XTC ensures that the answer is still coherent. XTC is meant to be used for creative tasks, but feel free to experiment with different settings for different models.

Being experimental and unique, XTC is disabled by default. The recommended combination of samplers is Min-P followed by XTC on its default settings: `--sampling-seq mx --min-p 0.02 -xtc-p 0.5`.
Review comment:
Is it still necessary to pass an explicit sampler chain via --sampling-seq in order to activate XTC? I thought that it is now in the sampler chain by default, and disabled by having xtc_probability set to 0.

@MaggotHATE (Contributor, Author):

@p-e-w While XTC is included in the sampler queue by default, it is placed after all other truncating samplers. As such, the recommended combination of samplers, as per your words in oobabooga/text-generation-webui#6335, requires passing the sampler chain explicitly.

Review comment:

I don't get it. Seems like the order is correct by default (XTC after truncation, which Min-P is). And all samplers are set to neutral (off) by default, right? So what does --sampling-seq mx do that wouldn't happen otherwise?

@MaggotHATE (Contributor, Author):

@p-e-w Not all samplers - Top K is 40 by default and Top P is 0.95. So, either they need to be set to 0 and 1.0 respectively, or (which is easier and more logical) the sampling queue should be limited to only the samplers we need.

Review comment:

😮 I had no idea! So llama.cpp users are getting crap samplers from the Stone Age without even realizing it. That's terrible. I would have expected a clean slate that samples from the raw model distribution unless parameters are set explicitly.

Anyway, this means your example command is correct, of course.

@MaggotHATE (Contributor, Author):

If I remember correctly, this topic has already been brought up some time ago in other PRs. However, since in most cases llama.cpp is used as a library through another app or as a server, this issue is mostly related to llama-cli users. You can look at index-new.html (new server UI): it has different "default" values with top_k and top_p turned off, and I assume any other frontend will have a payload with all parameters set as needed.

But yes, this is an issue.

@strawberrymelonpanda (Contributor) commented Oct 15, 2024

Llama-cli user here via scripts... thanks for bringing this up at any rate; it improved my (custom) benchmark scores just by adjusting settings to disable Top K and Top P.

I've seen the default params adjusted a few times for llama-cli; feels like this would be a good change.

src/llama-sampling.cpp:
delete (llama_sampler_xtc *) smpl->ctx;
}

static void llama_sampler_xtc_reset(struct llama_sampler * smpl) {
Review comment:

What is the purpose of this function?

@MaggotHATE (Contributor, Author):

AFAIK this is necessary to properly reset the seed and maintain repeatability, as recommended by @slaren earlier.

src/llama-sampling.cpp:
    int pos_last = 0;

    for (size_t i = 0; i < cur_p->size; ++i) {
        if (cur_p->data[i].p - ctx->threshold >= -1e-5) {
Review comment:

Why this epsilon instead of a regular comparison?

@MaggotHATE (Contributor, Author):

I added it after running tests - they were failing due to a precision problem.

Review comment:

Ugh, no. Never change correct code to satisfy tests. If the code is semantically correct, and the tests don't pass, the tests need to be adapted to account for things such as floating point shenanigans. But changing the code itself is always wrong, unless of course the code has a bug.

The correct way to express this condition is

if (cur_p->data[i].p >= ctx->threshold) {

Nothing else will do. And the tests need to work with that, or the tests are wrong.

@MaggotHATE (Contributor, Author):

I'm not sure it should be considered a problem with the tests (the precision problem is a wider topic), but alright.

Review comment:

My basic point is that the tests serve the code, not the other way round. Code is only ever changed in response to failing tests if the code is found to have a bug.

I had a similar problem with tests for an experimental sampler I wrote for llama.cpp a while ago, and I was able to work around it by using a different set of token probabilities in the tests that I constructed specifically so that after probability renormalization, the resulting values were exactly representable in floating point.
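For example (a hypothetical illustration of the idea, ignoring any rounding introduced by the softmax itself): if the test distribution consists of negative powers of two and the sampler removes the top token,

$$\{0.5,\ 0.25,\ 0.125,\ 0.125\} \;\longrightarrow\; \{0.5,\ 0.25,\ 0.25\}\ \text{after renormalization},$$

so every expected value is an exact binary fraction and strict comparisons can hold.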

@p-e-w left a comment

I did another full review of the code and while some parts are quite subtle, I'm now pretty confident that the implementation is correct (though I did not run it myself yet).

@slaren: This is ready to go as far as I'm concerned.

src/llama-sampling.cpp:
    }

    if (cur_p->size - pos_last >= ctx->min_keep && pos_last > 0) {
        cur_p->data += pos_last;
Review comment (Collaborator):

This may potentially break 3rd party code that expects this pointer to be unchanged (eg. to free it after sampling). I don't think this is necessarily a problem, but we should make it clear that this pointer may be changed by the samplers, and applications should not rely on it being unchanged.
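A sketch of the caller-side pattern this implies (an illustration, assuming the current `llama_token_data_array` layout and `llama_sampler_apply()` from llama.h; `build_candidates`, `logits`, `n_vocab`, and `chain` are hypothetical placeholders):

    // the application owns the candidate buffer
    std::vector<llama_token_data> candidates = build_candidates(logits, n_vocab); // hypothetical helper

    llama_token_data_array cur_p = {
        /* .data     = */ candidates.data(),
        /* .size     = */ candidates.size(),
        /* .selected = */ -1,
        /* .sorted   = */ false,
    };

    llama_sampler_apply(chain, &cur_p); // samplers such as XTC may advance cur_p.data

    // read results through cur_p (data/size may have changed), but keep ownership
    // tied to `candidates`; never free or reallocate through cur_p.data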

Labels: examples, server, testing

9 participants