chore: Cleaning up code and documentation
kurisu committed Oct 26, 2024
1 parent 2cf100b commit b6bf8bb
Showing 3 changed files with 179 additions and 97 deletions.
43 changes: 24 additions & 19 deletions README.md
@@ -12,6 +12,8 @@ python_version: 3.11.9

The project is built using Transformers Agents 2.0, and uses the Stanford SQuAD dataset for training. The chatbot is designed to answer questions about the dataset, while also incorporating conversational context and various tools to provide a more natural and engaging conversational experience.

At the time of writing, the project is available on [Hugging Face Spaces](https://huggingface.co/spaces/kaiokendall/SQuAD_Agent_Experiment).

## Getting Started

1. Install dependencies:
@@ -23,13 +25,16 @@ pip install -r pre-requirements.txt
pip install -r requirements.txt
```

-1. Set up required keys:
+2. Set up required keys:

Create a `.env` file and set the following environment variables:

```bash
HF_TOKEN=<your token>
OPENAI_API_KEY=<your key>
```
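
These variables can be loaded into the process environment at startup. A minimal, dependency-free sketch (the project itself may instead rely on a library such as python-dotenv; that is an assumption, not something stated in this README):

```python
import os

def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines from a .env file and export them to os.environ."""
    loaded = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            loaded[key.strip()] = value.strip()
    os.environ.update(loaded)
    return loaded
```

Calling `load_env_file()` before the app starts makes keys like `HF_TOKEN` available through `os.environ`.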

-1. Run the app:
+3. Run the app:

```bash
python app.py
@@ -39,37 +44,37 @@

1. SQuAD Dataset: The dataset used for training the chatbot is the Stanford SQuAD dataset, which contains over 100,000 questions and answers extracted from 500+ articles.
2. RAG: RAG is a technique used to improve the accuracy of chatbots by using a custom knowledge base. In this project, the Stanford SQuAD dataset is used as the knowledge base.
-3. Llama 3.1: Llama 3.1 is a large language model used to generate responses to user questions. It is used in this project to generate responses to user questions, while also incorporating conversational context.
-4. Transformers Agents 2.0: Transformers Agents 2.0 is a framework for building conversational AI systems. It is used in this project to build the chatbot.
-5. Created a SquadRetrieverTool to integrate a fine-tuned BERT model into the agent, along with a TextToImageTool for a playful way to engage with the question-answering agent.
+3. Transformers Agents 2.0: Transformers Agents 2.0 is a framework for building conversational AI systems. It is used in this project to build the chatbot.
+4. SquadRetrieverTool: A custom tool that integrates a fine-tuned BERT model into the agent, alongside a TextToImageTool for a playful way to engage with the question-answering agent.
+5. Gradio: Gradio is used to create the chatbot interface, in `app.py`.
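
The custom tools above follow the Transformers Agents 2.0 tool interface (`name`, `description`, `inputs`, `output_type`, and a `forward` method). The sketch below shows the general shape with a stand-in base class so it runs standalone; the attribute names mirror the `transformers.agents.Tool` API, but the retrieval logic is a hypothetical placeholder, not the project's actual SquadRetrieverTool:

```python
class Tool:
    """Stand-in for transformers.agents.Tool so this sketch is self-contained."""
    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)

class SquadRetrieverTool(Tool):
    # The agent's LLM engine reads these attributes to decide when to call the tool.
    name = "squad_retriever"
    description = "Answers a question using a model fine-tuned on the SQuAD dataset."
    inputs = {"question": {"type": "string", "description": "The question to answer"}}
    output_type = "string"

    def __init__(self, qa_model=None):
        # In the real project this would wrap the fine-tuned BERT model; a
        # trivial callable stands in here so the example can run.
        self.qa_model = qa_model or (lambda q: f"[best SQuAD match for: {q}]")

    def forward(self, question: str) -> str:
        return self.qa_model(question)
```

An agent receives such a tool in its toolbox and invokes it like `SquadRetrieverTool()("What is on top of the Notre Dame building?")`.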

## Evaluation

* [Agent Reasoning Benchmark](https://github.com/aymeric-roucher/agent_reasoning_benchmark)
* [Hugging Face Blog: Open Source LLMs as Agents](https://huggingface.co/blog/open-source-llms-as-agents)
* [Benchmarking Transformers Agents](https://github.com/aymeric-roucher/agent_reasoning_benchmark/blob/main/benchmark_transformers_agents.ipynb)
SemScore is used in this project to evaluate the chatbot's responses in the notebook `benchmarking.ipynb`.
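
At its core, SemScore embeds the model response and the reference answer with a sentence-transformer and reports their cosine similarity. A dependency-free sketch of that computation over precomputed embedding vectors (the real `semscore.py` obtains the vectors from an embedding model; which model this project uses is not stated here):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def mean_semscore(pred_embeddings, ref_embeddings):
    """Mean response/reference similarity over a benchmark set."""
    sims = [cosine_similarity(p, r) for p, r in zip(pred_embeddings, ref_embeddings)]
    return sum(sims) / len(sims)
```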

## Results
See [SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity](https://doi.org/10.48550/arXiv.2401.17072)

TBD
In this experiment, the agent is evaluated with 3 different system prompting approaches:

## Limitations
1. The default prompting approach, which is just the default system prompt used in Hugging Face Transformers Agents 2.0, with only an example of using the `squad_retriever` tool added.
2. A succinct prompting approach, which guides the agent to be concise if possible while still answering the question.
3. A focused prompting approach, which reframes the entire chatbot's purpose to focus more on the specific task of answering questions about the SQuAD dataset, while still being open to exploring other topics.
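
In code, comparing the three approaches amounts to running the same question set under each system prompt. A sketch (the prompt texts and the `ask` callable below are hypothetical stand-ins; the actual comparison lives in `benchmarking.ipynb`):

```python
# Hypothetical abbreviations of the three system prompts under test.
SYSTEM_PROMPTS = {
    "default": "Default Agents 2.0 prompt, plus one squad_retriever example.",
    "succinct": "Answer concisely when possible, while still answering fully.",
    "focused": "You answer questions about the SQuAD dataset first and foremost.",
}

def compare_prompts(ask, questions):
    """Run every question under each prompting approach; return answers per approach."""
    return {
        name: [ask(prompt, q) for q in questions]
        for name, prompt in SYSTEM_PROMPTS.items()
    }
```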

## Results

TBD

## Related Research

* [RETRO: Improving Language Models by Retrieving from Trillions of Tokens](https://arxiv.org/abs/2112.04426)
* [RETRO-pytorch](https://github.com/lucidrains/RETRO-pytorch)
* [Why isn't Retro mainstream? State-of-the-art within reach](https://www.reddit.com/r/MachineLearning/comments/1cffgkt/d_why_isnt_retro_mainstream_stateoftheart_within/)
## Limitations

TBD
* This experiment is not designed for multiple users. While it has in-session memory, simply refreshing the browser will reset the chat history, which is convenient for experimentation.
* Some of the agent's underlying engines, models, and tools use keys that have usage limits, so the app may not work if those limits have been reached.
* It is recommended to clone the repo and run the code using your own keys, to avoid running into those limits.

## Acknowledgments

* [Agents 2.0](https://github.com/huggingface/transformers/tree/main/src/transformers/agents)
* [Hugging Face Transformers Agents 2.0](https://huggingface.co/docs/transformers/en/main_classes/agent)
* [SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity](https://arxiv.org/abs/2401.17072)
* `semscore.py` from [geronimi73/semscore](https://github.com/geronimi73/semscore/blob/main/semscore.py)
* [SemScore](https://huggingface.co/blog/g-ronimo/semscore)
* [Stanford SQuAD](https://rajpurkar.github.io/SQuAD-explorer/)
* [Llama 3.1](https://github.com/meta-llama/Meta-Llama)
* [Gradio](https://www.gradio.app/)
20 changes: 15 additions & 5 deletions app.py
@@ -40,6 +40,12 @@
else "http://localhost:1234/v1"
)

class FixImageQuestionAnsweringTool(ImageQuestionAnsweringTool):
    """
    The ImageQuestionAnsweringTool from Transformers Agents 2.0 has a bug:
    its documentation says it accepts the path to an image, but it does not.
    This class uses the adapter pattern to fix the issue, in a way that should
    stay compatible with future versions of the tool even if the bug is fixed.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
@@ -49,6 +55,13 @@ def encode(self, image: "Image | str", question: str):
        # Accept a filesystem path as well as an already-opened image.
        if isinstance(image, str):
            image = Image.open(image)
        return super().encode(image, question)

# The app version of the agent has access to additional tools that are not
# available during benchmarking. We chose this approach to focus benchmarking on
# the agent's ability to solve questions about the SQuAD dataset, without the
# help of general knowledge available on the web. The demo app gets the extra
# tools to provide a more interactive and engaging experience.
ADDITIONAL_TOOLS = [
DuckDuckGoSearchTool(),
VisitWebpageTool(),
@@ -62,7 +75,7 @@
# Add image tools to the default task solving toolbox, for a more visually interactive experience
TASK_SOLVING_TOOLBOX = DEFAULT_TASK_SOLVING_TOOLBOX + ADDITIONAL_TOOLS

-# system_prompt = DEFAULT_SQUAD_REACT_CODE_SYSTEM_PROMPT
+# Using the focused prompt, which was the top-performing prompt during benchmarking
system_prompt = FOCUSED_SQUAD_REACT_CODE_SYSTEM_PROMPT

agent = get_agent(
@@ -72,9 +85,6 @@
use_openai=True, # Use OpenAI instead of a local or HF model as the base LLM engine
)

-app = None


def append_example_message(x: gr.SelectData, messages):
if x.value["text"] is not None:
message = x.value["text"]
@@ -197,7 +207,7 @@ def _postprocess_content(
"text": "What is on top of the Notre Dame building?",
},
{
-    "text": "Tell me what's on top of the Notre Dame building, and draw a picture of it.",
+    "text": "What is the Olympic Torch made of?",
},
{
"text": "Draw a picture of whatever is on top of the Notre Dame building.",
213 changes: 140 additions & 73 deletions benchmarking.ipynb

Large diffs are not rendered by default.
