🔥 Remember to ⭐ this repo if you find it useful, and cite our work if you use it in your own work! 🔥

🔥 If you have any questions or concerns, please create an issue 📝! 🔥

LLMs-and-Probabilistic-Reasoning

This repository contains the data and software artifacts for the EMNLP 2024 (Main) paper "What Are the Odds? Language Models Are Capable of Probabilistic Reasoning".

📓 Overview

This repository is organized into several key components:

  1. Generation:

    • idealized_generation/: Contains scripts for generating idealized distributions and prompts (see the illustrative sketch after this list).
    • real_world_generation/: Contains scripts for generating distributions and prompts based on real-world data.
  2. Sample Results from Paper:

    • sample_results_from_paper/: Contains the sample results presented in the paper, organized by experiment type.
  3. Templates:

    • Contains template code used for generating idealized and real-world distributions.
  4. Notebook:

    • EMNLP_2024_tutorial.ipynb: A Jupyter notebook that provides an interactive tutorial on using the provided scripts to generate datasets and on loading the paper's results (found in sample_results_from_paper/).
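
As a flavor of what the generation scripts produce, here is a minimal sketch, not taken from the repo's code: it builds an idealized normal distribution and formats a zero-shot percentile question of the kind the paper evaluates. The distribution parameters and prompt wording below are invented for illustration.

```python
# Illustrative sketch only -- not the repo's actual generation code.
# Build an idealized normal distribution and format a zero-shot
# percentile question, along with the ground-truth answer for scoring.
from scipy.stats import norm

mean, std = 120.0, 15.0  # hypothetical distribution parameters
value = 135.0            # hypothetical value the model is asked to place

prompt = (
    f"Consider a normal distribution with mean {mean} and standard "
    f"deviation {std}. What percentile does the value {value} fall "
    f"into? Respond with a single number between 0 and 100."
)
print(prompt)

# Ground-truth percentile from the normal CDF, for scoring a response.
true_percentile = 100 * norm.cdf(value, loc=mean, scale=std)
print(f"Ground truth: {true_percentile:.1f}")
```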

🔧 Setup and Usage

This section is a work in progress. For the time being, please refer to the tutorial notebook (EMNLP_2024_tutorial.ipynb).

⚡ Additional Results

Additional per-model results are provided below for Tables 1 and 2 from the paper.

Aggregated zero-shot task performance across different LMs (Table 1):

| Model | Percentiles (%) | Sampling (K-S) | Probabilities (%) |
| --- | --- | --- | --- |
| Llama3-70B | 26.6 ± 3.76 | 0.63 ± 0.07 | 32.5 ± 2.33 |
| GPT3.5-Turbo | 25.7 ± 3.11 | 0.73 ± 0.07 | 32.7 ± 2.38 |
| GPT4-Turbo | 14.9 ± 2.39 | 0.59 ± 0.08 | 21.0 ± 2.11 |
| Gemini 1.0 Ultra | 16.5 ± 2.67 | 0.76 ± 0.09 | 19.4 ± 2.26 |
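
The Sampling column reports a Kolmogorov-Smirnov (K-S) statistic comparing model-drawn samples against the target distribution (0 means identical empirical CDFs, 1 maximally different). Below is a rough, self-contained sketch of how such a score can be computed; this is not the repo's evaluation code, and the distributions and sample sizes are invented.

```python
# Rough sketch (not the repo's evaluation code): score a set of
# model-drawn samples against ground-truth samples with the
# two-sample Kolmogorov-Smirnov statistic.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Stand-ins: ground-truth samples from an idealized distribution,
# and samples parsed out of a model's response.
reference_samples = rng.normal(loc=100.0, scale=15.0, size=1_000)
model_samples = rng.normal(loc=105.0, scale=20.0, size=250)

# statistic is the max distance between the two empirical CDFs.
result = ks_2samp(model_samples, reference_samples)
print(f"K-S statistic: {result.statistic:.3f}")
```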

Zero-shot performance by domain and context category across different LMs (Table 2):

| Model | Health Idealized | Health Real World Con. | Health Norm. Approx. | Finance Idealized | Finance Real World Con. | Finance Norm. Approx. | Climate Idealized | Climate Real World Con. | Climate Norm. Approx. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama3-8B | 20.50 ± 5.56 | 19.40 ± 1.33 | 17.98 ± 1.02 | 26.05 ± 1.97 | 20.49 ± 0.83 | 24.43 ± 1.70 | 26.94 ± 2.16 | 15.63 ± 2.53 | 13.72 ± 2.10 |
| Llama3-70B | 14.8 ± 6.01 | 15.3 ± 4.03 | 8.61 ± 1.97 | 23.9 ± 4.02 | 19.8 ± 6.56 | 6.24 ± 0.78 | 23.5 ± 5.71 | 20.2 ± 5.29 | 8.87 ± 0.99 |
| Gemma2 9B | 16.14 ± 5.70 | 19.08 ± 8.07 | 18.97 ± 7.69 | 27.05 ± 5.74 | 7.36 ± 0.73 | 7.59 ± 0.99 | 25.09 ± 4.49 | 7.55 ± 0.76 | 9.26 ± 1.41 |
| Gemma2 27B | 13.28 ± 5.68 | 5.02 ± 0.56 | 5.09 ± 0.51 | 16.08 ± 6.32 | 7.90 ± 1.20 | 7.74 ± 1.16 | 11.84 ± 0.85 | 5.82 ± 1.08 | 5.10 ± 1.10 |
| Mistral-8x7B | 15.13 ± 3.96 | 11.22 ± 1.64 | 9.64 ± 1.55 | 21.63 ± 2.31 | 11.30 ± 2.63 | 12.28 ± 4.09 | 26.05 ± 5.21 | 11.29 ± 1.94 | 10.90 ± 1.82 |
| GPT3.5-Turbo | 20.5 ± 9.62 | 20.3 ± 8.51 | 6.81 ± 0.68 | 17.7 ± 4.54 | 20.4 ± 2.88 | 7.55 ± 0.77 | 22.7 ± 6.88 | 25.7 ± 6.32 | 7.90 ± 0.22 |
| GPT4-Turbo | 11.0 ± 4.94 | 4.92 ± 3.18 | 3.15 ± 0.76 | 8.99 ± 1.18 | 10.7 ± 3.24 | 5.50 ± 0.48 | 18.5 ± 6.53 | 15.2 ± 5.13 | 4.94 ± 0.58 |
| Gemini Pro | 25.30 ± 8.41 | 11.51 ± 1.06 | 10.42 ± 1.32 | 29.35 ± 3.72 | 11.77 ± 0.92 | 10.10 ± 1.01 | 26.20 ± 5.44 | 18.67 ± 2.01 | 16.53 ± 1.94 |
| Gemini Ultra | 12.8 ± 4.43 | 10.3 ± 2.49 | 5.89 ± 1.09 | 14.0 ± 4.47 | 10.5 ± 2.75 | 7.62 ± 1.06 | 16.9 ± 3.86 | 10.5 ± 0.79 | 7.43 ± 1.11 |

📜 Citation

If you find our paper or any code in this repo useful, please cite our work:

@article{paruchuri2024odds,
  title={What Are the Odds? Language Models Are Capable of Probabilistic Reasoning},
  author={Paruchuri, Akshay and Garrison, Jake and Liao, Shun and Hernandez, John and Sunshine, Jacob and Althoff, Tim and Liu, Xin and McDuff, Daniel},
  journal={arXiv preprint arXiv:2406.12830},
  year={2024}
}
