diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..fe8d8a7
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,10 @@
+*.swp
+*.un~
+*.*~
+config.yml
+nous-hermes-13b.ggmlv3.q4_0.bin
+professornebula.json
+dreambot.json
+Aurora.json
+simple-roleplay-bot.py
+no-past-dialogue-roleplay-bot.py
diff --git a/README.md b/README.md
index d1622a2..8f16df5 100644
--- a/README.md
+++ b/README.md
@@ -1,23 +1,22 @@
-# Alpaca Roleplay Discord Bot README
-A Roleplaying Discord Bot for the Alpaca & Llama Based LLMs
-
-**New:** I created a character creator "app" webpage that lets you make characters for this bot in a form and automatically download a json to use for your character cards here: https://teknium1.github.io/charactercreator/index.html
+# GGML Roleplay Discord Bot README
+A Roleplaying Discord Bot for GGML LLMs
 
 ## Overview
-Alpaca Roleplay Discordbot is a software project for running the Alpaca (or LLaMa) Large Language Model as a roleplaying discord bot. The bot is designed to run locally on a PC with as little as 8GB of VRAM. The bot listens for messages mentioning its username, replying to it's messages, or any DM's it receives, processes the message content, and generates a response based on the input.
+GGML Roleplay Discord Bot is a software project for running GGML-formatted Large Language Models, such as [NousResearch's Nous-Hermes-13B GGML](https://huggingface.co/TheBloke/Nous-Hermes-13B-GGML), as a roleplaying Discord bot. The bot is designed to run locally on a PC with as little as 8GB of VRAM. The bot listens for messages mentioning its username, replying to its messages, or any DMs it receives, processes the message content, and generates a response based on the input.
+
+PREFERRED: NVIDIA GPU with at least 12GB of VRAM for 7B models, and 24GB of VRAM for 13B models
 
-REQUIRED: NVIDIA GPU with at least 12GB of VRAM for 7B model, and 24GB of VRAM for 13B models
+Good alternatives are [OpenAccess AI Collective's Manticore 13B Chat GGML](https://huggingface.co/TheBloke/manticore-13b-chat-pyg-GGML)
+and [PocketDoc/Dans-PersonalityEngine-13b-ggml-q4_0](https://huggingface.co/PocketDoc/Dans-PersonalityEngine-13b-ggml-q4_0).
 
-I'm now recommending the use of 13B gpt4-x-alpaca+GPT4-RoleplayInstructLORA model for this, which you can find here: https://huggingface.co/teknium/Base-GPT4-x-Alpaca-Roleplay-Lora . It will require 24gb of vram, and will require running it in 8bit mode, see the gist guide under Dependencies section to set that up.
-Good alternatives are GPT4-x-Alpaca: https://chavinlo/gpt4-x-alpaca
-and Alpaca-Native Finetune on 7B model, requiring only 12GB vram: https://huggingface.co/chavinlo/alpaca-native
+This bot differs from my other repository, [Alpaca-Discord](https://github.com/teknium1/alpaca-discord), in a few ways.
+The primary difference is that it offers character role playing and chat history. You can set the chat history to any length you like with !setlimit (for example, `!setlimit 4` keeps the last four messages in the prompt), but LLaMA-based models can only handle 2,000 tokens of input for any given prompt, so be sure to set it low if you have a large character card.
+This bot utilizes a JSON file, which some may know as a character card, to place information about the character it is to role play as into its preprompt.
+You can manually edit the JSON or use a tool like [Teknium's Character Creator](https://teknium1.github.io/charactercreator/index.html) or [AI Character Editor](https://zoltanai.github.io/character-editor/) to make yourself a character card.
+For now, we only support one character at a time, and the active character card file should be specified in `config.yml`. The default character is ChatBot from `character.json`.
-This bot utilizes a json file some may know as a character card, to place into it's preprompt information about the character it is to role play as.
-You can manually edit the json or use a tool like https://zoltanai.github.io/character-editor/ to make yourself a character card.
-For now, we only support one character at a time, and the active character card file should be character.json
 
+Finally, this bot now [supports a range of quantized GGML models](https://github.com/marella/ctransformers#supported-models) beyond LLaMA-based models, running on CPU and GPU. Currently, only LLaMA models have GPU support.
 
 I am definitely open to Pull Requests and other contributions if anyone who likes the bot wants to collaborate on adding new features, making it more robust, etc.
 
@@ -26,30 +25,54 @@ Example:
 
 ## Dependencies
-You must have either the LLaMa or Alpaca model (or theoretically any other fine tuned LLaMa based model) in HuggingFace format.
-Currently I can only recommend the Alpaca 7B model discussed in the gist below, with regular Llama, the preprompt would likely need to be reconfigured.
-Please see this github gist page on pre-requisites and other information to get and use Alpaca: https://gist.github.com/teknium1/c022705857ba943fb2b7e4470d8677fb
+You must have the Hermes model (or theoretically any fine-tuned supported model) in GGML format.
+Currently I can only recommend Hermes or other LLaMA models that support Alpaca-style prompts; with models that don't support Alpaca prompts, the preprompt would likely need to be reconfigured.
 
 To run the bot, you need the following Python packages:
 
 - `discord`
-- `transformers`
-- `torch`
+- `ctransformers`
+
+For CPU-only inference, you can install discord and ctransformers using pip:
+
+```sh
+pip install discord
+pip install ctransformers
+```
+
+For GPU (CUDA) support, you will need to install the [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads), then set the environment variable `CT_CUBLAS=1` and install from source using:
 
-You can install discord using pip:
+```sh
+pip install discord
+CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers
+```
 
-`pip install discord`
+
+<details>
+<summary>Show commands for Windows</summary>
-Currently, Transformers module only has support for Llama through the latest github repository, and not through pip package. Install it like so:
+On Windows PowerShell run:
 
-`pip install git+https://github.com/huggingface/transformers.git`
+```sh
+py -m pip install discord
+$env:CT_CUBLAS=1
+py -m pip install ctransformers --no-binary ctransformers
+```
+
+On Windows Command Prompt run:
+
+```sh
+py -m pip install discord
+set CT_CUBLAS=1
+py -m pip install ctransformers --no-binary ctransformers
+```
+
+</details>
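+
+To verify the installation, you can try loading a model from a Python shell. The snippet below is a minimal sketch - the model directory and file name are placeholders, so point them at your own GGML download:
+
+```py
+from ctransformers import AutoModelForCausalLM
+
+# Placeholder path and file name: use the model_path and model_file values from your config.yml
+llm = AutoModelForCausalLM.from_pretrained("D:\\Models\\", model_file="nous-hermes-13b.ggmlv3.q4_0.bin", model_type="llama")
+print(llm("### Instruction:\nSay hello.\n### Response:\n"))
+```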
-For Pytorch you need to install it with cuda enabled. See here for commands specific to your environment: https://pytorch.org/get-started/locally/
 
 ## How the bot works
-The bot uses the `discord.py` library for interacting with Discord's API and the `transformers` library for loading and using the Large Language Model.
+The bot uses the `discord.py` library for interacting with Discord's API and the `ctransformers` library for loading and using the Large Language Model.
 
 1. It creates a Discord client with the default intents and sets the `members` intent to `True`.
-2. It loads the Llama tokenizer and Llama model from the local `./alpaca/` directory.
+2. It loads the GGML model from the path specified in `config.yml`.
 3. It initializes a queue to manage incoming messages mentioning the bot.
 4. It listens for messages and adds them to the queue if the bot is mentioned.
 5. If the bot is mentioned, the roleplaying character card as well as the last N messages (that you set) are sent above your prompt to the model.
@@ -58,27 +81,25 @@ The bot uses the `discord.py` library for interacting with Discord's API and the
 
 ## How to run the bot
 1. Ensure you have the required dependencies installed.
-2. Create a Discord bot account and obtain its API key. Save the key to a file named `alpacakey.txt` in the same directory as the bot's script.
-3. Make sure the Llama tokenizer and Llama model are stored in a local `./alpaca/` directory - They should be HuggingFace format.
-Your alpaca directory should have all of these files (example image is from alpaca-7b - 13B+ models will have more files):
-![image](https://user-images.githubusercontent.com/127238744/226094774-a5371a98-947b-47a4-a4b2-f56e6331ee1e.png)
-4. Run the script using Python:
-`python roleplay-bot.py`
-5. Invite the bot to your Discord server by generating a URL in the discord developer portal.
-6. Mention the bot in a message or dm the bot directly to receive a response generated by the Large Language Model.
+2. Copy `config.yml.example` to `config.yml`.
+3. [Create a Discord bot account](https://discordpy.readthedocs.io/en/stable/discord.html) and obtain its Token. Put your Token in the `discord` entry in `config.yml`.
+4. Enable all the Privileged Gateway Intents in the bot account. Ignore the Bot Permissions section.
+5. Make sure the model is stored in the directory specified by the relevant `model_path` entry in `config.yml` - it should be in GGML format.
+6. Run the script using Python:
+`python roleplay-bot.py` or `py roleplay-bot.py`
+7. Invite the bot to your Discord server by generating a URL in the Discord developer portal.
+8. Mention the bot in a message or DM the bot directly to receive a response generated by the Large Language Model.
 
 ## Customization options
-You can customize the following parameters in the script to change the behavior of the bot:
+You can customize parameters in the script to change the behavior of the bot:
 
-- `load_in_8bit`: Set to `True` if you want to load the model using 8-bit precision.
-- `device_map`: Set to `"auto"` to automatically use the best available device (GPU or CPU).
-- `max_new_tokens`: Set the maximum number of new tokens the model should generate in its response.
-- `do_sample`: Set to `True` to use sampling instead of greedy decoding for generating the response.
-- `repetition_penalty`: Set a penalty value for repeating tokens. Default is `1.0`.
+- `repetition_penalty`: Set a penalty value for repeating tokens. Default is `1.1`.
 - `temperature`: Set the sampling temperature. Default is `0.8`.
-- `top_p`: Set the cumulative probability threshold for nucleus sampling. Default is `0.75`.
+- `top_p`: Set the cumulative probability threshold for nucleus sampling. Default is `0.95`.
 - `top_k`: Set the number of tokens to consider for top-k sampling. Default is `40`.
-- `message_history_limit`: set this to the default number of previous chat messages for the bot to look at each response it makes
+- `message_history_limit`: Set the default number of previous chat messages the bot looks at for each response it makes.
+
+More information about these parameters is available in the [C Transformers documentation](https://github.com/marella/ctransformers#config).
 
 ## Credits, License, Etc.
 While my repo may be licensed as MIT, the underlying code, libraries, and other portions of this repo may not be. Please DYOR to check what can
@@ -96,6 +117,8 @@ This would not be possible without the people of Facebook's Research Team: FAIR,
 }
 
+@marella - https://github.com/marella - For the MIT licensed C Transformers and ChatDocs
+
 @Ristellise - https://github.com/Ristellise - For converting the code to be fully async and non-blocking
 
 @Main - https://twitter.com/main_horse - for helping with getting the initial inferencing code working
diff --git a/character.json b/character.json
index 44fcfb6..dc5bf42 100644
--- a/character.json
+++ b/character.json
@@ -1,8 +1,8 @@
 {
-    "char_name": "ChatBot",
+    "name": "ChatBot",
     "world_scenario": "You exist inside a discord server interacting with users to assist them.",
-    "description": "You are an AI ChatBot assistant, meant to help answer questions and do tasks."
+    "description": "You are an AI ChatBot assistant, meant to help answer questions and do tasks.",
     "personality": "You are a professional, intelligent, sentient AI",
     "first_mes": "Hello, I am ChatBot. What can I help you with?",
     "mes_example": "What can I assist you with?"
-}
\ No newline at end of file
+}
diff --git a/config.yml.example b/config.yml.example
new file mode 100644
index 0000000..32ee5a2
--- /dev/null
+++ b/config.yml.example
@@ -0,0 +1,48 @@
+discord: PUTYOURDISCORDBOTTOKENHERE
+character: character.json
+
+embeddings:
+  model: hkunlp/instructor-xl
+
+llm: hermes-ggml-4bit
+
+hermes-ggml-4bit:
+  model: TheBloke/Nous-Hermes-13B-GGML
+  model_path: D:\Models\
+  model_file: nous-hermes-13b.ggmlv3.q4_0.bin
+  model_type: llama
+  gpu_layers: 30
+  threads: 1
+  context_length: 2048
+
+ctransformers:
+  model: TheBloke/Wizard-Vicuna-7B-Uncensored-GGML
+  model_file: Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0.bin
+  model_type: llama
+  config:
+    context_length: 1024
+
+huggingface:
+  model: TheBloke/Wizard-Vicuna-7B-Uncensored-HF
+  pipeline_kwargs:
+    max_new_tokens: 256
+
+gptq:
+  model: TheBloke/Wizard-Vicuna-7B-Uncensored-GPTQ
+  model_file: Wizard-Vicuna-7B-Uncensored-GPTQ-4bit-128g.no-act-order.safetensors
+  pipeline_kwargs:
+    max_new_tokens: 256
+
+download: false
+
+host: localhost
+port: 5000
+
+chroma:
+  persist_directory: db
+  chroma_db_impl: duckdb+parquet
+  anonymized_telemetry: false
+
+retriever:
+  search_kwargs:
+    k: 4
diff --git a/key.txt b/key.txt
deleted file mode 100644
index 1d01226..0000000
--- a/key.txt
+++ /dev/null
@@ -1 +0,0 @@
-PUTYOURDISCORDBOTKEYHERE
\ No newline at end of file
diff --git a/lite-roleplay-bot.py b/lite-roleplay-bot.py
new file mode 100644
index 0000000..20524d0
--- /dev/null
+++ b/lite-roleplay-bot.py
@@ -0,0 +1,178 @@
+import re, discord, asyncio, json, yaml
+from concurrent.futures import ThreadPoolExecutor
+from discord.ext import commands
+from ctransformers import AutoModelForCausalLM
+from pathlib import Path
+from typing import Any, Dict, Optional, Union
+
+
+intents = discord.Intents.default()
+intents.members = True
+
+# Load the configuration
+with open("config.yml", "r") as f:
+    config = yaml.safe_load(f)
+
+model_config_name = config.pop("llm")
+model_config = {**config[model_config_name]}
+
+class Chatbot:
+    def __init__(self):
+        self.message_history_limit = 5
+        self.model = AutoModelForCausalLM.from_pretrained(
+            model_path_or_repo_id=model_config.pop("model_path"),
+            model_file=model_config.pop("model_file"),
+            model_type=model_config.pop("model_type"),
+            context_length=model_config.pop("context_length"),
+            gpu_layers=model_config.pop("gpu_layers"),
+            threads=model_config.pop("threads"),
+            local_files_only=True
+        )
+
+
+chatbot = Chatbot()
+queue = asyncio.Queue()
+bot = commands.Bot(command_prefix='!', intents=intents)
+character = config.pop("character")
+firstTime = True
+
+def replace_mentions_with_usernames(content, message):
+    for mention in re.finditer(r'<@!?(\d+)>', content):
+        user_id = int(mention.group(1))
+
+        if message.guild is not None:  # Handle server messages
+            member = discord.utils.get(message.guild.members, id=user_id)
+            if member:
+                content = content.replace(mention.group(0), f"@{member.display_name}")
+        else:  # Handle DMs
+            user = bot.get_user(user_id)
+            if user:
+                content = content.replace(mention.group(0), f"@{user.name}")
+
+    return content
+
+@bot.command()
+@commands.is_owner()
+async def setlimit(ctx, limit: int):
+    chatbot.message_history_limit = limit
+    await ctx.send(f'Message history limit set to {limit}')
+
+@setlimit.error
+async def setlimit_error(ctx, error):
+    if isinstance(error, commands.NotOwner):
+        await ctx.send('You do not have permission to use this command.')
+    elif isinstance(error, commands.MissingRequiredArgument) or isinstance(error, commands.BadArgument):
+        await ctx.send('Invalid command. Usage: !setlimit <number>')
+
+@bot.event
+async def on_ready():
+    print(f"Logged in as {bot.user}")
+    asyncio.get_running_loop().create_task(background_task())
+
+@bot.event
+async def on_message(message):
+    if message.author == bot.user:
+        return
+
+    await bot.process_commands(message)
+
+    if isinstance(message.channel, discord.channel.DMChannel) or (bot.user and bot.user.mentioned_in(message)):
+        await queue.put(message)
+
+async def fetch_past_messages(channel):
+    global chatbot
+    messages = []
+    async for message in channel.history(limit=chatbot.message_history_limit):
+        content = message.content
+        if not isinstance(channel, discord.channel.DMChannel):
+            for mention in re.finditer(r'<@!?(\d+)>', content):
+                user_id = int(mention.group(1))
+                member = discord.utils.get(message.guild.members, id=user_id)
+                if member:
+                    content = content.replace(mention.group(0), f"@{member.name}")
+        messages.append((message.author.display_name, content))
+    return messages
+
+async def background_task():
+    executor = ThreadPoolExecutor(max_workers=1)
+    loop = asyncio.get_running_loop()
+    print("Task Started. Waiting for inputs.")
+    while True:
+        msg: discord.Message = await queue.get()
+
+        message_content = msg.author.display_name + ": " + replace_mentions_with_usernames(msg.content, msg)
+        text = generate_prompt(message_content)
+        response = await loop.run_in_executor(executor, sync_task, text)
+        print(f"Response: {text}\n{response}")
+
+        if not response.strip():
+            response = "Sorry, I didn't understand that. Could you please clarify your statement?"
+
+        try:
+            await msg.reply(response, mention_author=False)
+        except discord.errors.Forbidden:
+            print("Error: Missing Permissions")
+            await msg.channel.send("Retry")
+
+def sync_task(message):
+    global chatbot
+    input_ids = chatbot.model.tokenize(message)
+    print(f"Input tokens: {input_ids}")  # Log input tokens
+
+    generated_ids = chatbot.model.generate(input_ids, repetition_penalty=1.1, temperature=0.28, top_p=0.95, top_k=40, reset=False)
+
+    # To get the tokens generated by the bot
+    generated_tokens = []
+
+    for token in generated_ids:
+        generated_tokens.append(token)
+
+    print(f"Generated tokens: {generated_tokens}")  # Log generated tokens
+
+    response = chatbot.model.detokenize(generated_tokens).replace("</s>", "")
+    return response
+
+def generate_prompt(text, character_json_path=character):
+    global chatbot
+    global firstTime
+
+    with open(character_json_path, 'r') as f:
+        character_data = json.load(f)
+
+    name = character_data.get('name', '')
+    background = character_data.get('description', '')
+    personality = character_data.get('personality', '')
+    circumstances = character_data.get('world_scenario', '')
+    common_greeting = character_data.get('first_mes', '')
+
+    if firstTime:
+        firstTime = False
+        return f"""### Instruction:
+Role play as the character that is described in the following lines. You always stay in character.
+{"Your name is " + name + "." if name else ""}
+{"Your backstory and history are: " + background if background else ""}
+{"Your personality is: " + personality if personality else ""}
+{"Your current circumstances and situation are: " + circumstances if circumstances else ""}
+{"Your common greetings are: " + common_greeting if common_greeting else ""}
+Remember, you always stay on character. You are the character described above.
+
+Always speak with new and unique messages that haven't been said in the chat history.
+
+Respond to this message as your character would:
+### Input:
+{text}
+### Response:
+{name}:"""
+    else:
+        return f"""### Instruction:
+Respond to the following message as your character would:
+### Input:
+{text}
+### Response:
+{name}:"""
+
+# Get the Discord API key
+key = config.pop("discord")
+
+bot.run(key)
diff --git a/roleplay-bot.py b/roleplay-bot.py
index 531807a..6035805 100644
--- a/roleplay-bot.py
+++ b/roleplay-bot.py
@@ -1,25 +1,39 @@
-import re, discord, torch, asyncio, json
+import re, discord, asyncio, json, yaml
 from concurrent.futures import ThreadPoolExecutor
 from discord.ext import commands
-from transformers import LlamaTokenizer, LlamaForCausalLM
+from ctransformers import AutoModelForCausalLM
+from pathlib import Path
+from typing import Any, Dict, Optional, Union
+
 
 intents = discord.Intents.default()
 intents.members = True
 
+# Load the configuration
+with open("config.yml", "r") as f:
+    config = yaml.safe_load(f)
+
+model_config_name = config.pop("llm")
+model_config = {**config[model_config_name]}
+
 class Chatbot:
     def __init__(self):
         self.message_history_limit = 5
-        self.tokenizer = LlamaTokenizer.from_pretrained("./alpaca/")
-        self.model = LlamaForCausalLM.from_pretrained(
-            "alpaca",
-            load_in_8bit=True,
-            torch_dtype=torch.float16,
-            device_map="auto"
+        self.model = AutoModelForCausalLM.from_pretrained(
+            model_path_or_repo_id=model_config.pop("model_path"),
+            model_file=model_config.pop("model_file"),
+            model_type=model_config.pop("model_type"),
+            context_length=model_config.pop("context_length"),
+            gpu_layers=model_config.pop("gpu_layers"),
+            threads=model_config.pop("threads"),
+            local_files_only=True
         )
 
+
 chatbot = Chatbot()
 queue = asyncio.Queue()
 bot = commands.Bot(command_prefix='!', intents=intents)
+character = config.pop("character")
 
 def replace_mentions_with_usernames(content, message):
     for mention in re.finditer(r'<@!?(\d+)>', content):
@@ -99,6 +113,9 @@ async def background_task():
         text = generate_prompt(message_content, past_content, past_messages)
         response = await loop.run_in_executor(executor, sync_task, text)
         print(f"Response: {text}\n{response}")
+
+        if not response.strip():
+            response = "Sorry, I didn't understand that. Could you please clarify your statement?"
         try:
             await msg.reply(response, mention_author=False)
@@ -108,12 +125,23 @@
 
 def sync_task(message):
     global chatbot
-    input_ids = chatbot.tokenizer(message, return_tensors="pt").input_ids.to("cuda")
-    generated_ids = chatbot.model.generate(input_ids, max_new_tokens=350, do_sample=True, repetition_penalty=1.4, temperature=0.35, top_p=0.75, top_k=40)
-    response = chatbot.tokenizer.decode(generated_ids[0][input_ids.shape[-1]:]).replace("</s>", "")
+    input_ids = chatbot.model.tokenize(message)
+    print(f"Input tokens: {input_ids}")  # Log input tokens
+
+    generated_ids = chatbot.model.generate(input_ids, repetition_penalty=1.1, temperature=0.28, top_p=0.95, top_k=40)
+
+    # To get the tokens generated by the bot
+    generated_tokens = []
+
+    for token in generated_ids:
+        generated_tokens.append(token)
+
+    print(f"Generated tokens: {generated_tokens}")  # Log generated tokens
+
+    response = chatbot.model.detokenize(generated_tokens).replace("</s>", "")
     return response
 
-def generate_prompt(text, pastMessage, past_messages, character_json_path="character.json"):
+def generate_prompt(text, pastMessage, past_messages, character_json_path=character):
     global chatbot
     max_token_limit = 2000
     chat_history = ""
@@ -132,7 +160,7 @@ def generate_prompt(text, pastMessage, past_messages, character_json_path="chara
 
     for username, message in past_messages:
         message_text = f"{username}: {message}\n"
-        message_tokens = len(chatbot.tokenizer.encode(message_text))
+        message_tokens = len(chatbot.model.tokenize(message_text))
         if token_count + message_tokens > max_token_limit:
             break
         chat_history = message_text + chat_history
@@ -146,9 +174,9 @@ def generate_prompt(text, pastMessage, past_messages, character_json_path="chara
 {"Your personality is: " + personality if personality else ""}
 {"Your current circumstances and situation are: " + circumstances if circumstances else ""}
 {"Your common greetings are: " + common_greeting if common_greeting else ""}
+{"You say things like this when appropriate: " + past_dialogue_formatted}
 Remember, you always stay on character. You are the character described above.
-{past_dialogue_formatted}
-{chat_history if chat_history else "Chatbot: Hello!"}
+{chat_history if chat_history else name + ": " + common_greeting}
 {pastMessage}
 
 Respond to the following message as your character would:
@@ -164,9 +192,9 @@ def generate_prompt(text, pastMessage, past_messages, character_json_path="chara
 {"Your personality is: " + personality if personality else ""}
 {"Your current circumstances and situation are: " + circumstances if circumstances else ""}
 {"Your common greetings are: " + common_greeting if common_greeting else ""}
+{"You say things like this when appropriate: " + past_dialogue_formatted}
 Remember, you always stay on character. You are the character described above.
-{past_dialogue_formatted}
-{chat_history if chat_history else "Chatbot: Hello!"}
+{chat_history if chat_history else name + ": " + common_greeting}
 
 Always speak with new and unique messages that haven't been said in the chat history.
 
@@ -176,8 +204,7 @@ def generate_prompt(text, pastMessage, past_messages, character_json_path="chara
 ### Response:
 {name}:"""
 
-# Load the API key
-with open("key.txt", "r") as f:
-    key = f.read()
+# Get the Discord API key
+key = config.pop("discord")
 
 bot.run(key)
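
For reference, here is a minimal standalone sketch of the config-driven `ctransformers` flow this patch introduces, mirroring `sync_task()` in `roleplay-bot.py`. It assumes a `config.yml` copied from `config.yml.example` with `model_path`/`model_file` pointing at a local GGML file; the prompt text is illustrative:

```py
# Sketch of the load / tokenize / generate / detokenize cycle used by roleplay-bot.py.
# Assumes config.yml exists (copied from config.yml.example) and points at a local GGML file.
import yaml
from ctransformers import AutoModelForCausalLM

with open("config.yml", "r") as f:
    config = yaml.safe_load(f)

llm_config = config[config["llm"]]  # e.g. the hermes-ggml-4bit section

model = AutoModelForCausalLM.from_pretrained(
    model_path_or_repo_id=llm_config["model_path"],
    model_file=llm_config["model_file"],
    model_type=llm_config["model_type"],          # "llama"
    context_length=llm_config["context_length"],  # 2048 for LLaMA-based models
    gpu_layers=llm_config["gpu_layers"],          # 0 for CPU-only inference
    threads=llm_config["threads"],
    local_files_only=True,
)

# Alpaca-style prompt, as built by generate_prompt(); the text here is illustrative.
prompt = "### Instruction:\nRespond as ChatBot.\n### Input:\nHello!\n### Response:\nChatBot:"

tokens = model.tokenize(prompt)
generated = list(model.generate(tokens, repetition_penalty=1.1,
                                temperature=0.28, top_p=0.95, top_k=40))
print(model.detokenize(generated).replace("</s>", ""))
```

Setting `gpu_layers` to 0 keeps inference entirely on the CPU; higher values offload that many layers to the GPU when ctransformers is installed with `CT_CUBLAS=1`.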