-
Notifications
You must be signed in to change notification settings - Fork 9.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[User] The server crashes when a wrong n_keep
is set
#3550
Comments
I can reproduce the problem. Near the top of for (size_t i = params.n_keep + 1 + n_discard; i < embd.size(); i++)
{
printf("--> embd[%zu] = embd[%zu] (%d, %d, %zu)\n", i - n_discard, i, params.n_keep, n_discard, embd.size());
embd[i - n_discard] = embd[i];
} With
Since edit: Pretty sure something really serious is going wrong, as far as I can see |
This actually weirder than just the size. Using the example with everything 32, there has to be at least 28 whitespace characters at the start of the prompt. For example I couldn't find anywhere in the code where it seemed to be doing whitespace trimming so I really have no idea of what's going on, but I'm pretty sure it's not just a problem with the calculating but something else affected by the actual prompt content that's causing this. |
Yes, the bug happens only with a specific set of arguments and prompt length. I stumbled upon it by pure chance. The other weird thing is that the bug does not happen after any other successful request is performed. Maybe it has something to do with the caching. Also, the prompt doesn't have to be whitespace-only or even have any whitespace to begin with. Initially I had this crash on a long random text (1000+ tokens, but the context window was still 32), where if you remove one word, the crash disappears. It also crashes on a sequence of ":"
Again, remove or add any ":" and the crash does not happen. Initially I thought that the crash solely depends on the number of tokens, however it doesn't seem to be the case. My first example with newlines has 33 tokens (according to the |
Really weird. I may look into this more if I get a chance but I'm not really too familiar with the server (I've actually barely even used it). Maybe someone who knows more about that will respond, the bug label may make it more visible. This definitely seems like a bug, it's possible it may affect the other examples as well since they use similar logic. |
Going out of memory bounds in C/CPP doesn't immediately guarantee "crash", as memory blocks aren't fenced exactly on the boundary It's rather "normal" (typical) to have to overflow in a specific and nondeterministic way in order to cause hard crash. Many many years ago, there was |
I think you can get the same bounds checking with address sanitizer. |
I actually did my tests with address sanitizing turned on so I'm pretty confident this isn't the symptom of earlier memory corruption or something like that. The initial calculation that results in negative values is definitely wrong, but the first actual invalid memory access was when it accessed embd 1 past its valid range. quick edit: So when I first managed to trigger it, I meant when the sanitizer saw the invalid access. After that, I actually copied out the |
I looked into In particular I believe the logic for trimming the prompt in case it is too big for the context is flawed and leads to situations where there is no space for the next token which causes the segfault you mentioned, because Also, the I created a pull request to fix in #3639 Give it a try. |
Can confirm that with the patch in #3639, the server does not crash anymore on the provided examples. |
I did another attempt at fixing this in #3661 after @z80maniac found a performance regression in my patch #3639. |
This was fixed. Not sure in which commit though. |
Prerequisites
Expected Behavior
Running the
server
with a specific context window and then passing a specific wrongn_keep
with a specific number of tokens. Since I'm passing a wrong parameter, I expect the server to either give an error or just generate something anyway.Current Behavior
The server crashes.
Environment and Context
Commit: db3abcc
OS: Kubuntu 23.04
Steps to Reproduce
I used this model:
https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/blob/main/llama-2-13b-chat.Q4_K_M.gguf
(but it also crashes with other models too)
The server is built with just
make
, no other params. But it also crashes withLLAMA_CUBLAS=1
.Start the server with the context window of 32:
startup log
n_keep = 32
and a specific number of tokens inprompt
(not quite sure how many):Failure Logs
The server just crashes with this:
Failure Information (for bugs)
The crash happens only with the parameters described above. For example, I tried other values other than 32, but with them the server works as usual. Maybe it crashes with any other combination of parameters, but I can only reproduce it like this.
I assume that
n_keep
is wrong because it's the same size as the context, which I assume makes no sense. However, if theprompt
is one token longer, the server actually generates something.1 more token in the prompt
Also, the server crashes only on the very first request. If the abovementioned request is done after some other request, the server does not crash.
The text was updated successfully, but these errors were encountered: