Frequent errors with webgui LLM answers when JSON decoding fails #177
Comments
Same with LM Studio and also the dolphin-2.1-mistral-7b model.
To answer my own question partially: I found what provides the parameters for the query. So I tried to change […]. I also tried adding […]. Maybe adding a […] would help. Besides possibly fixing some of the JSON parsing problems, one may also want to change some of the parameters to shape the model's answers.

P.S.: I also think the program should just tell you that it failed, and not crash, when JSON parsing fails or no answer is generated.
Just to clarify what's going on here (just in case it's not clear, sorry if I'm explaining what you already know):
The raw output string from the LLM is two JSON objects, whereas it should just be one:
We could potentially get around this by running a JSON parser that checks for matching open/close braces. Also, just looking at this output it seems like there's another error (a semantic one): the bot should have added […].

tl;dr: there are lots of MemGPT-specific parsing hacks we can use here to make MemGPT perform better.
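For illustration, here is a minimal sketch of that brace-matching idea (not the parser MemGPT actually ships): a hypothetical helper that keeps the first balanced JSON object and discards anything the model appended after it.

```python
import json

def extract_first_json_object(raw: str) -> dict:
    """Return the first balanced {...} object found in `raw`.

    Rough illustration of the brace-matching idea discussed above (not
    MemGPT's actual parser). Braces inside string literals are ignored
    by tracking quote/escape state.
    """
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found in LLM output")

    depth = 0
    in_string = False
    escaped = False
    for i, ch in enumerate(raw[start:], start):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Anything after this index (e.g. a second JSON object)
                # is dropped instead of breaking json.loads().
                return json.loads(raw[start:i + 1])

    raise ValueError("unbalanced braces in LLM output")

# Example: double-JSON output that would normally fail to decode as one object.
raw_output = '{"function": "send_message"} {"function": "send_message"}'
print(extract_first_json_object(raw_output))  # only the first object survives
```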
Another JSON decoding error from bebo on Discord: https://pastebin.com/nJeAxHvZ
I am using LM Studio and have tried every model under the sun, and I always get Exception: Failed to decode JSON from LLM output every time I try to run MemGPT. I have no problems having a chat in LM Studio with any of the models. Currently I am testing TheBloke's zephyr beta 7B q5_k_m GGUF. It seems to be a very capable model when chatting in LM Studio. But once again, as soon as I start my LM Studio server, run through the setup using Anaconda Prompt, and set all of my variables with […], it thinks for several minutes and then always returns the same result: Exception: Failed to decode JSON from LLM output.

I can include all of the information from my LM Studio and from MemGPT, but I am not sure exactly what is needed to assess the issue. I am running a Windows laptop, an Aspire 5, not great for this, but I just need a bit more patience waiting for the responses, which is not a problem for me.
@Rivelyn were you able to get this working? This seems like it might be a model error, but it's really hard to check without exactly matching the model settings we've tested. If you're on Discord, ping me and I'll help you get set up. If you're not on Discord, can you try running a dolphin-2.1 model instead and report back? E.g., try to make your LM Studio settings look exactly like this (maybe with a lower quantization if the model doesn't fit on your computer): […]
For anyone reading this here and not on Discord, I will update what is happening so far. As of yet I have not changed to dolphin-2.1. I can, but I have been impressed with the Zephyr model so far, so I was hoping to get it operational. LM Studio is the updated version from yesterday, 0.2.8.

Thinking ran for a while, then either LM Studio or MemGPT went into a loop on startup and finally gave me this error: Exception: Hit first message retry limit (10)

My environment is Windows 11 on an Aspire 5 16GB laptop. I have no GPU, and the chat with the Zephyr model is good, just for anyone wondering. I actually tried the larger version of Zephyr and didn't really notice any difference in my testing. Using Anaconda I created my environment and cloned the repo, then set my commands as above.
What about a stop sequence like "\n }\n}"? It works for me in my tests with dolphin and openchat3.5 using koboldcpp.
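As a rough sketch of that suggestion (the field names below follow koboldcpp's KoboldAI-compatible /api/v1/generate endpoint as I understand it; double-check them against your koboldcpp version), the stop sequence can be sent with the request so generation halts right after the outer object closes:

```python
import requests

# Hedged sketch: payload fields assume koboldcpp's KoboldAI-compatible API;
# verify names, port, and response shape against your koboldcpp version.
payload = {
    "prompt": "<your MemGPT prompt here>",
    "max_length": 512,
    "temperature": 0.7,
    # Stop as soon as the outer JSON object is closed, so the model
    # cannot append a second object or run on past the closing brace.
    "stop_sequence": ["\n }\n}"],
}

resp = requests.post("http://localhost:5001/api/v1/generate", json=payload)
print(resp.json()["results"][0]["text"])
```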
Hey @oderwat @Rivelyn @Drake-AI, we just added some extra JSON parsing code in #269, which hopefully should fix many of the common issues like double-JSON and run-on JSON (e.g. because of missed stop tokens). In the PR's top comment you can see an example of how the new parsing hacks prevent the common bad JSON outputs that people have been experiencing. Hope this helps!

Closing this particular issue because it's about double-JSON, which should be fixed now, but feel free to re-open or continue commenting.
Looks like the new PR solves a few things, good job. For dolphin I am using a modified wrapper, just like airoboros but with some things changed, and it works well.
@Drake-AI I guess you do not use the original dolphin wrapper either. I am on legacy code and don't plan to change to the current main, because it still seems to break my models. They just act very differently, and I have no fun debugging that.
@oderwat That's weird; the model depends on the prompt and parameters sent to webui, so it should not change if the prompt is the same. I tried both and never noticed that, but I use koboldcpp as the endpoint; it is pretty much like webui but just for GGUF models. For dolphin I use a modified wrapper based on the airoboros wrapper; it works well even with inner thoughts, and with the new version of dolphin 2.2.1. But now I'm trying openchat3.5, and so far it is better than dolphin, but it needs a custom wrapper too.
@Drake-AI I am not sure what is happening with the new code. It ignored all persona information. It also seemed that the inner dialog is not working the same way: I got a lot of replies that had basically a very good answer as the inner dialog, but the actual message was a bad version of that. But that may already be fixed, or it was user error. I just don't have the time to check it out again. My PRs for the "retry / rethink / rewrite / dump message" functionalities are also just "stuck". My timezone PR got a "do it yourself, we don't care". I have more important stuff to do.
@oderwat It may be fixed, try again when you have time. I tried your retry command and it works well, thanks. For the timezone I modified it and just use datetime.now(), so it takes the time from the system.
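For reference, a minimal sketch of that change, assuming a local-time helper along these lines (the actual function in the codebase may be named and formatted differently):

```python
from datetime import datetime

# Illustrative only: take the timestamp from the system clock
# instead of a configured timezone.
def get_local_time() -> str:
    return datetime.now().strftime("%Y-%m-%d %I:%M:%S %p")

print(get_local_time())  # e.g. "2023-11-10 03:42:17 PM"
```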
I found that most longer answers from the webgui backend fail to JSON-decode. From the data I get to see, it looks as if the LLM returned an incomplete message. I wonder if one could make the LLM "continue" the message when decoding fails. Maybe also try to repair a JSON answer if it is "just missing" the proper ending? I also wonder if one could give the backend a grammar or a maximum new-token length, or how to set any parameters at all. Is that done inside webgui in that case?
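As a hedged sketch of the "repair a JSON answer that is just missing the proper ending" idea (illustrative only, not something MemGPT necessarily does), one could count the unclosed braces and brackets and append them before retrying the decode:

```python
import json

def repair_truncated_json(raw: str) -> dict:
    """Best-effort repair for output that was cut off mid-object.

    Illustrative sketch only: close any string that was left open, then
    append the missing closing braces/brackets and retry json.loads().
    May still raise if the truncation point cannot be repaired.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass

    stack = []          # unclosed '{' / '[' in order of opening
    in_string = False
    escaped = False
    for ch in raw:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append(ch)
        elif ch in "}]":
            if stack:
                stack.pop()

    fixed = raw + ('"' if in_string else "")
    for opener in reversed(stack):
        fixed += "}" if opener == "{" else "]"
    return json.loads(fixed)

# Example: a reply that stopped before the closing braces.
print(repair_truncated_json('{"function": "send_message", "params": {"message": "Hi"'))
```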
When using "--model dolphin-2.1-mistral-7b" with the same model loaded into webgui I also get this kind of malformed JSON when trying to store into memory
> Enter your message: my name is Horst and I am 42 years old. save that in your memory.