The purpose of this library is to score predictions.
Some of the code in this library comes from my work at Empiricast, a forecasting startup I co-founded.
For a thorough introduction to scoring rules, see Calibration Scoring Rules for Practical Prediction Training by Spencer Greenberg.
- 4 scoring rules for choice predictions:
  - Brier
  - Logarithmic
  - Practical
  - Quadratic
- Fully type hinted
- 100% test coverage
Python Prediction Scorer is also available as a REST API. This is useful if you are not using Python, or if you are using a Python version we don't support. The documentation is available at https://predictionscorer.herokuapp.com/docs.
```
pip install predictionscorer
```
Python Prediction Scorer requires Python 3.8.
For choice predictions, the forecaster assigns probabilities to different answers. As an example, let’s say that George and Kramer made the following forecasts for the result of a game where the home team ended up winning:
| Result | George | Kramer | Correct |
| --- | --- | --- | --- |
| Home team wins | 40% | 65% | Yes |
| Tie | 30% | 10% | No |
| Away team wins | 30% | 25% | No |
Kramer assigned a higher probability to the correct answer than George did, so his forecast was better. But how much better? In order to find out, we must quantify the quality of their predictions. That’s what this library does.
We have four scoring rules to determine this:
- Brier
- Logarithmic
- Practical
- Quadratic
Let us look at each of them.
Brier scores range from 0 (best) to 2 (worst):
```python
from predictionscorer.rules import brier_score

george_probability = 0.4
kramer_probability = 0.65

george_score = brier_score(george_probability)  # 0.72
kramer_score = brier_score(kramer_probability)  # 0.2450
```
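These values appear consistent with the two-outcome Brier formulation, where the single probability is treated as the forecast for the correct answer and the remainder as the forecast for everything else. The sketch below only illustrates that formulation for reference; it is not the library's implementation:

```python
# Illustration only: the two-outcome Brier formulation, which reproduces the
# values shown above. The library itself may compute the score differently.
def brier_sketch(probability: float) -> float:
    # Squared error on the correct outcome (true value 1) plus squared error
    # on the incorrect outcome (true value 0, forecast 1 - probability).
    return (1 - probability) ** 2 + ((1 - probability) - 0) ** 2


print(brier_sketch(0.4))   # ≈ 0.72
print(brier_sketch(0.65))  # ≈ 0.245
```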
Logarithmic scores range from approaching infinity (worst) to 0 (best):
```python
from predictionscorer.rules import logarithmic_score

george_probability = 0.4
kramer_probability = 0.65

george_score = logarithmic_score(george_probability)  # 1.32
kramer_score = logarithmic_score(kramer_probability)  # 0.62
```
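These values appear consistent with the negative base-2 logarithm of the probability assigned to the correct answer. A minimal sketch under that assumption (not the library's implementation):

```python
import math


# Illustration only: negative base-2 log of the probability assigned to the
# correct answer, which reproduces the values shown above.
def logarithmic_sketch(probability: float) -> float:
    return -math.log2(probability)


print(logarithmic_sketch(0.4))   # ≈ 1.32
print(logarithmic_sketch(0.65))  # ≈ 0.62
```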
Practical scores range from approaching negative infinity (worst) to a configurable maximum (we use 2):

```python
from predictionscorer.rules import practical_score

george_probability = 0.4
kramer_probability = 0.65

george_score = practical_score(george_probability)  # -0.64
kramer_score = practical_score(kramer_probability)  # 0.76
```
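These values appear consistent with a score of roughly `max_practical_score * log2(2 * probability)`, in the spirit of Greenberg's paper. A rough sketch under that assumption, ignoring the `max_probability` cap covered below (the library's exact formula may differ):

```python
import math


# Illustration only: assumes practical score ≈ 2 * log2(2 * probability),
# which reproduces the values shown above; the library may differ in detail,
# for example in how it caps probabilities close to 1.
def practical_sketch(probability: float, max_practical_score: float = 2.0) -> float:
    return max_practical_score * math.log2(2 * probability)


print(practical_sketch(0.4))   # ≈ -0.64
print(practical_sketch(0.65))  # ≈ 0.76
```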
`practical_score` accepts two optional parameters:

| Name | Default |
| --- | --- |
| `max_practical_score` | `Decimal(2)` |
| `max_probability` | `Decimal("0.9999")` |
Quadratic scores range from -1 (worst) to 1 (best):
```python
from predictionscorer.rules import quadratic_score

george_probability = 0.4
kramer_probability = 0.65

george_score = quadratic_score(george_probability)  # 0.28
kramer_score = quadratic_score(kramer_probability)  # 0.76
```
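These values appear consistent with the two-outcome quadratic score, `2 * p - p ** 2 - (1 - p) ** 2`. A minimal sketch under that assumption (not the library's implementation):

```python
# Illustration only: the two-outcome quadratic score, which reproduces the
# values shown above. The library itself may compute the score differently.
def quadratic_sketch(probability: float) -> float:
    return 2 * probability - probability ** 2 - (1 - probability) ** 2


print(quadratic_sketch(0.4))   # ≈ 0.28
print(quadratic_sketch(0.65))  # ≈ 0.76
```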
See CHANGELOG.md.
Please open an issue on GitHub if you discover any problems or see potential for improvement. Issues are very welcome, and comments on the API design are especially useful at this point.
Also, see CONTRIBUTING.md.