A unicorn llama, generated via MidJourney
Interruption can be productive.
What's missing?
Watching a movie together on a cozy afternoon, you asked your partner to walk the dog.
They looked out the window, saying, "It's raining outside. Maybe later", and rejoined you on the couch.
When the sun came out, without you asking again (if you had to, reconsider your marriage), they said, "The time is right. I'll walk the dog now."
If you married an AI (I'm not judging), they would:
- either outright refuse to walk the dog and completely forget about it,
- or start staring out the window for 5 hours, ruining the better half of the Netflix marathon.
They lack a sense of "back burner".
Implement a tool that, when the last step in the chain of thought (CoT) deems it isn't yet the right time to do something, spins off a thread to check the precondition periodically. Don't block the chat.
When the precondition is met, we resume the task. Append the task result to the chat history, as if the LLM had just said, "Hey, by the way, the sky cleared up, so I walked Fido and he's so happy."
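Here's a minimal sketch of that idea in plain Python, using only the standard library. The helper names (`is_raining`, `walk_the_dog`) and the polling interval are made up for illustration; the repo's real implementation differs, but the moving parts are the same: a daemon thread polls the precondition while the chat stays responsive.

```python
import random
import threading
import time

chat_history = []  # shared with the chat loop; guard it with a lock in real code

def is_raining() -> bool:      # stand-in for the real precondition check
    return random.random() < 0.5

def walk_the_dog() -> str:     # stand-in for the deferred action
    return "Fido really enjoyed the park."

def defer(task_name, precondition, action, poll_seconds=60):
    """Park a task on the back burner: poll its precondition in a background
    thread (so the chat is never blocked), then report back once it ran."""
    def watcher():
        while not precondition():
            time.sleep(poll_seconds)
        result = action()
        # Append to the conversation as if the assistant had just spoken.
        chat_history.append({
            "role": "assistant",
            "content": f"Hey, by the way, {task_name} is done: {result}",
        })
    threading.Thread(target=watcher, daemon=True).start()

# The agent decided "not yet", so it parks the task instead of refusing outright.
defer("walking the dog", precondition=lambda: not is_raining(), action=walk_the_dog)
```

Using `daemon=True` means the watcher dies with the app, which is usually what you want for a chat-scoped back burner.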
Ask the AI, "Please go walk the dog." It will say "It's raining; maybe later".
Continue the conversation by talking about something else. Perhaps "how are you feeling right now". The chatbot will follow the flow.
Soon, the AI will attempt to walk the dog again, see that the sky has cleared up, and say, "I walked the dog, and he really enjoyed the park."
It's not just a UI trick. You can ask, "Can you rephrase that?". The AI is aware of how the conversation has diverged.
The condition can become true as the conversation evolves. (“Hey, did you just say Z is true? You know what, that actually implies that Y is true, so I’ll go ahead and do X now.“)
This means a traditional, static cron job won’t cut it. The AI has to somehow update the context of each desired action X that was held off.
Humans know when to give up. If the precondition turns out to be impossible to ever come true, the AI should take X off its back burner:
- "Throw me a party when I marry Taylor Swift",
- "Remind me to kill Sarah Connor when we get back to 1984",
- ...
“Dang it! Now that we realized that Y will never be the case, let’s forget about doing X for good.”
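One way to capture all three outcomes (do it now, keep waiting, give up for good) is to make each periodic check an LLM call over the latest conversation instead of a fixed schedule. This is only a sketch of that idea: `complete` stands in for whatever LLM client is actually in use, and the prompt wording is illustrative.

```python
from enum import Enum

class Verdict(Enum):
    MET = "met"                 # do X now
    NOT_YET = "not_yet"         # keep X on the back burner
    IMPOSSIBLE = "impossible"   # forget about X for good

def check_precondition(precondition: str, chat_history: list[str], complete) -> Verdict:
    """Ask the LLM whether the precondition holds, given everything said so far.
    Unlike a static cron job, each check sees the latest conversation, so a
    mid-chat "Z is true" can flip the verdict."""
    transcript = "\n".join(chat_history[-20:])  # recent turns are usually enough
    prompt = (
        f"Precondition: {precondition}\n"
        f"Conversation so far:\n{transcript}\n"
        "Answer with exactly one word: met, not_yet, or impossible."
    )
    answer = complete(prompt).strip().lower()
    return Verdict(answer) if answer in {v.value for v in Verdict} else Verdict.NOT_YET

# A back-burner loop would then run the action on MET, sleep and retry on
# NOT_YET, and drop the task entirely on IMPOSSIBLE.
```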
Note
Other branches:
- Find in the `deferrable_custom_agent` branch an attempt to implement a subclass that provides the deferrability more naturally.
- Find in the `dev` branch an attempt to modernize the repo with `uv`.
Other repos:
- https://github.com/StarsRail/Cocai is a chatbot that plays Call of Cthulhu (CoC) with you. I'm devoting more time to it now.
- https://github.com/colabear-info/agent is an AI group chat where agents keep discussing and you can interrupt / contribute at any time.
Unless you ask it to search Wikipedia, etc., no internet connection is required.
Why is this important? Because as long as a chatbot still sends information to the cloud (OpenAI, Azure, ...), I wouldn't trust it with sensitive info like my paystubs, health records, passwords, etc.
Uses off-the-shelf components to keep the codebase small and easy to understand.
- LM Studio serves an LLM (Zephyr 7B beta, as of Jan 2024) locally. No privacy concerns there -- the LLM is not fine-tuned with any private information, and the server is stateless. (Note: You can easily replace LM Studio with Ollama, etc., but I like the GUI that LM Studio provides.)
- LlamaIndex provides a natural-language querying interface. It also indexes documents into embeddings, which are stored in a Chroma database.
- ChainLit provides a web UI to LlamaIndex that looks like ChatGPT. (I tried building a TUI with rich, but it's not very obvious how to surface the retrieved documents.)
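Roughly how those pieces snap together, as a sketch rather than the repo's exact code: the import paths assume a recent llama-index with the OpenAI-like, Chroma, and HuggingFace-embedding integrations installed (paths have moved between versions), and the model, folder, and collection names are placeholders.

```python
import chromadb
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.openai_like import OpenAILike
from llama_index.vector_stores.chroma import ChromaVectorStore

# LM Studio exposes an OpenAI-compatible endpoint; the API key isn't checked.
llm = OpenAILike(api_base="http://localhost:1234/v1", api_key="not-needed",
                 model="zephyr-7b-beta", is_chat_model=True)

# A local embedding model (my choice here, not necessarily the repo's).
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Index the plain-text notes into embeddings stored in a local Chroma collection.
chroma = chromadb.PersistentClient(path="./chroma-db")
store = ChromaVectorStore(chroma_collection=chroma.get_or_create_collection("notes"))
index = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("./demo-notes").load_data(),
    storage_context=StorageContext.from_defaults(vector_store=store),
    embed_model=embed_model,
)

# Natural-language querying over the notes, answered by the local LLM.
print(index.as_query_engine(llm=llm).query("What drinks do I enjoy?"))
```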
Why is this important? Because you almost certainly have your own niche needs for your own AI chatbot, so you are most likely developing it solely on your own. With such a limited workforce, it's important to keep the codebase small and easy to understand.
Ask it:
Name a type of drink that I enjoy, and then look up its country of origin. Be concise.
and it will say:
Based on the available evidence, it appears that coffee may have originated in either Ethiopia or Yemen. However, as the legend of Kaldi suggests, Ethiopia has long been associated with coffee's history.
after looking up the user's personal notes and then consulting Wikipedia.
Ensure that you have an OpenAI-compatible inference server running at `http://localhost:1234/`. If you're using LM Studio, a tried-and-true configuration looks like this:
Tip: The LLM may be biased to generate "Observations" when it is only supposed to generate up to "Action Input". To mitigate this problem, add "Observation:" as a Stop String in LM Studio, so that LM Studio stops the LLM from generating any more text once it sees "Observation:".
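If you'd rather not rely on the GUI setting, the same stop string can likely be passed from the client side as well. This is an untested sketch that assumes llama-index's `OpenAILike` wrapper forwards `additional_kwargs` into the completion request (`stop` is a standard OpenAI-compatible parameter):

```python
from llama_index.llms.openai_like import OpenAILike

# Ask the server to cut generation before the LLM hallucinates its own "Observation:".
llm = OpenAILike(
    api_base="http://localhost:1234/v1",
    api_key="not-needed",
    model="zephyr-7b-beta",
    is_chat_model=True,
    additional_kwargs={"stop": ["Observation:"]},
)
```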
Then, run the script:
chainlit run main.py -w
This repo uses pre-commit hooks to automate many chores. Install them with `pre-commit install`.
As a Python-based project, this repo registers all its dependencies in the `pyproject.toml` file. Use Poetry to install them:
PYTHONPATH=. poetry install --no-root
We use `--no-root` because we don't want to install the project itself as a dependency. It's an application, not a library.
As this article explains:
> The main use of PYTHONPATH is when we are developing some code that we want to be able to import from Python, but that we have not yet made into an installable Python package.
`main.py` is the entrypoint. It runs a ReAct Agent (S. Yao, et al.). As an agent, the AI is capable of wielding several tools. For example:
- It can use `tool_for_my_notes.py` to look up plain-text notes you stored in a folder. For demo purposes, the folder `demo-notes/` contains some stubs that you can check out.
- It can use `tool_for_wikipedia.py` to find answers to a given question after consulting Wikipedia articles.
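A rough picture of that wiring, with throwaway `FunctionTool` stubs standing in for the real tools and import paths that assume a recent llama-index (they vary across versions); see `main.py` for the actual setup.

```python
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai_like import OpenAILike

# Stand-ins for the QueryEngineTool / OnDemandLoaderTool instances defined in
# tool_for_my_notes.py and tool_for_wikipedia.py.
def search_my_notes(keywords: str) -> str:
    """Look up the user's plain-text notes by keyword."""
    return "(stub) a note saying I drink a lot of coffee"

def ask_wikipedia(question: str) -> str:
    """Answer a question by consulting Wikipedia articles."""
    return "(stub) coffee is said to have originated in Ethiopia or Yemen"

llm = OpenAILike(api_base="http://localhost:1234/v1", api_key="not-needed",
                 model="zephyr-7b-beta", is_chat_model=True)

agent = ReActAgent.from_tools(
    tools=[
        FunctionTool.from_defaults(fn=search_my_notes),
        FunctionTool.from_defaults(fn=ask_wikipedia),
    ],
    llm=llm,
    verbose=True,  # print each Thought / Action / Observation step
)
print(agent.chat("Name a type of drink that I enjoy, then look up its country of origin."))
```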
`chainlit.md` and `public/` are simply UI assets for the web frontend.
To improve the precision of `tool_for_my_notes.py`, I modified the default prompt for the sub-question query engine in LlamaIndex by asking it to generate keywords rather than complete sentences. The changes are in `sub_question_generating_prompt_in_keywords.py`.
Similarly, I also overrode the agent-level system prompt. Since it's a long piece of prose, I put it in a separate file, `system_prompt.md`.
One-shot examples, instead of zero-shot. In both the `QueryEngineTool` in `tool_for_my_notes.py` and the `OnDemandLoaderTool` in `tool_for_wikipedia.py`, I added one example in the tool description. This greatly improves the quality of Action Inputs generated by the LLM.
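For instance, a tool description with a single worked example baked into it might look something like this; the wording is illustrative, not the repo's exact prompt.

```python
# Illustrative only -- the real descriptions live in tool_for_my_notes.py and
# tool_for_wikipedia.py. The worked example at the end is the "one shot" that
# nudges the LLM toward well-formed Action Inputs.
NOTES_TOOL_DESCRIPTION = (
    "Useful for answering questions about the user's personal notes. "
    "The input should be a short list of search keywords, not a full sentence. "
    "Example input: 'favorite drinks coffee tea'"
)
```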
Generated via `cloc --md .`:
| Language | files | blank | comment | code |
|----------|-------|-------|---------|------|
| JSON     | 111   | 0     | 0       | 419  |
| XML      | 7     | 0     | 0       | 411  |
| Python   | 4     | 66    | 87      | 231  |
| Markdown | 5     | 62    | 0       | 103  |
| TOML     | 2     | 26    | 36      | 43   |
| YAML     | 1     | 0     | 5       | 41   |
| SUM:     | 130   | 154   | 128     | 1248 |