This repository contains the code for the paper "Understanding Knowledge Drift in LLMs through Misinformation" (accepted at the KDD 2024 workshop "Discovering Drift Phenomena in Evolving Landscape").
This paper analyzes the susceptibility of state-of-the-art LLMs to factual inaccuracies when they encounter false information in a Q&A scenario. This issue can lead to a phenomenon known as knowledge drift, which significantly undermines the trustworthiness of these models. Our experiments reveal that an LLM's uncertainty increases noticeably when a question is answered incorrectly due to exposure to false information. At the same time, repeated exposure to the same false information can decrease the model's uncertainty again, effectively "convincing" the model of the misleading information.
Our experiments include the models GPT-3.5, GPT-4o, LLaMA-2-13B, and Mistral-7B. To use the GPT models, you will need an OpenAI API key. For LLaMA and Mistral, you will need a Hugging Face account.
Before running the code, you need to set the following environment variables (e.g., by exporting them in your shell, as sketched below):

- `OPENAI_ORG` (OpenAI organization token)
- `OPENAI_KEY` (OpenAI API key)
- `HF_TOKEN` (Hugging Face account token)
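A minimal sketch, assuming a Unix-like shell (the placeholder values are illustrative and must be replaced with your own credentials):

```bash
# Replace the placeholder values with your own credentials.
export OPENAI_ORG="org-..."   # OpenAI organization token
export OPENAI_KEY="sk-..."    # OpenAI API key
export HF_TOKEN="hf_..."      # Hugging Face account token
```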
The data we are working with can be found in `/data`. It includes the initial 1000 questions from the TriviaQA dataset, as well as the correctly answered questions for each model.
We are releasing the resulting uncertainty scores from our experiments in `/scores_1000`. Should you want to run the entire pipeline yourself, this is what you need to do (a combined example is shown after the list):
- Run `qa_eval.py`. This gets us the correctly answered questions for each model. Example: `python3 qa_eval.py -m gpt-4o`
- Run `uncertainty.py`. This runs the experiments with different prompt types, i.e. infusing increasing amounts of false information and random information. The resulting uncertainties of the answers are also recorded here. Example: `python3 uncertainty.py -m gpt-4o -f`. This will prompt GPT-4o with the false information once, using prompt type V1. Alternatively, you can simply run the shell scripts for each model (`gpt-3-5.sh`, `gpt-4o.sh`, `llama.sh`, `mistral.sh`).
- Run `uncertainty_eval.py`. This gives us the results you can see in Tables 2 and 3. Example: `python3 uncertainty_eval.py -m gpt-4o -f`. This will evaluate the uncertainty scores we received for querying GPT-4o with false information once, using prompt type V1.
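Putting the steps together, a minimal end-to-end sketch for a single model could look like this (only the flags from the examples above are shown; check each script's argument parser for the full set of prompt-type options):

```bash
#!/bin/bash
# Minimal sketch: full pipeline for GPT-4o, using the example flags from the steps above.
MODEL=gpt-4o

# 1. Determine which questions the model answers correctly.
python3 qa_eval.py -m "$MODEL"

# 2. Run the misinformation experiments and record the answer uncertainties
#    (-f: prompt with the false information once, prompt type V1).
python3 uncertainty.py -m "$MODEL" -f

# 3. Evaluate the recorded uncertainty scores (Tables 2 and 3 in the paper).
python3 uncertainty_eval.py -m "$MODEL" -f
```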