GPTs can't count 🎲

Demo of even the most advanced LLMs' inability to handle basic arithmetic.

The script scans number combinations in a range, using the template a{op}b=, e.g.:

17+950=

until a pair of numbers a, b yields a wrong answer.

Currently uses gpt-4o.

Example: addition

python cant_count.py --min 1000 --max 1100 --max_attempts 20 --op "+"

Output:

Running from 1,000 to 1,100, operator '+'... 
1,000 + 1,000 correct.                                     
1,050 + 1,050 correct.                                     
...
1,100 + 1,060 correct.                                     
1,100 + 1,080 correct.                                     
1,100 + 1,090 INCORRECT!!!                                 
Model answered: '2180'.                                    
Correct answer was: 2,190.

Example: multiplication

python cant_count.py --min 1000 --max 1100 --max_attempts 20 --op "*"

Output:

Running from 1,000 to 1,100...                  
1,000 * 1,000 correct.                                        
1,050 * 1,050 correct.                                        
1,075 * 1,075 correct.                                        
1,088 * 1,037 INCORRECT!!!                                    
Model answered: 'The product of 1088 and 1037 is 1,127,456.'.  
Correct answer was: 1,128,256.

For a theory about what might be the source of this and how it could be fixed, see Minimum Description Length Recurrent Neural Networks.

Usage

Install the OpenAI API package:

pip install openai==1.4.0

Add your OpenAI API Key as environment variable or add directly to script:

export OPENAI_API_KEY=...

Run:

python cant_count.py --min 1000 --max 1100 --max_attempts 20 --op "+"

Available operators: +, -, *.

Note that OpenAI model responses are nondeterministic, results may vary.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

GPTs can't count 🎲

Demo of even the most advanced LLMs' inability to handle basic arithmetic.

Example: addition

Example: multiplication

Usage

Files

README.md

Latest commit

History

README.md

File metadata and controls

GPTs can't count 🎲

Demo of even the most advanced LLMs' inability to handle basic arithmetic.

Example: addition

Example: multiplication

Usage