Feature Brier score quiz to evaluate estimates on an answer #8

Open
mathdesc opened this issue Nov 22, 2024 · 1 comment


mathdesc commented Nov 22, 2024

What is unfair and inaccurate about quizzes

A quiz is a nice way to get feedback on how the audience took in the slides' content, during or preferably at the end of a talk or lecture, ideally before the closing Q&A so that the audience can focus its questions on the quiz results.

But it is very binary (ticked or not), even with multiple answers. Either you opt in fully by selecting one or a few options from the answer set, or you opt out entirely. This cannot express a preference (as a percentage, for example), nor can the audience self-assess its certainty about a piece of knowledge it was meant to take away.

This uncertainty metric is also key for the presenter to evaluate the reasoning behind the audience's choices.

Rationale

Let's imagine a talk on birds that stressed that penguins are found only in the southern hemisphere, along with many details such as their social behaviours and the number of species (which is 18). At the end a quiz is held and, among others, this question is asked:

Question: How many species of northern penguins are there?
Answer A - 22 : 20 %
Answer B - 18 : 60 %
Answer C - 16 : 20 %
Answer D - 5 : 0 %
Answer E - none : 0 % (the only correct answer)

This was a sanity question to test the audience's attention. The percentages above represent the answer of someone who cannot remember the exact number but is still fairly confident in the right one (Answer B: 18), even while falling into the "trap": Answer E is the only correct one, because the question asks about "northern" species and all penguins are southern, so there are none. This confidence level is informative for the presenter and, in this case, both a warning and a relief for the unfortunate people who answered likewise.
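As a rough illustration, the sum of squared differences used later in this issue can be applied to this single-answer question too. This is only a sketch: the assumption (not stated in the text above) is that the reference is 1 for the only correct option and 0 for every other one, and the names `brier_sum`, `estimates` and `reference` are made up for the example.

```python
# Estimates from the example answer above, as fractions of 1.
estimates = {"A": 0.20, "B": 0.60, "C": 0.20, "D": 0.00, "E": 0.00}

# Assumption: reference is 1 for the only correct option (E), 0 elsewhere.
reference = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0, "E": 1.0}


def brier_sum(estimates, reference):
    """Sum of squared differences between each estimate and its reference."""
    return sum((estimates[key] - reference[key]) ** 2 for key in reference)


penguin_score = brier_sum(estimates, reference)
print(round(penguin_score, 2))
```

Under these assumptions the score is 1.44, where 0 would be a perfect answer (100 % on E) and 2 the worst possible one (100 % on a single wrong option). A plain tick on B would score exactly 2, so the hedged answer is penalized less than the fully confident wrong one.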

The real deal now

A more representative example of the idea is a question that requires multiple answers, each with its own truth estimate, shown here alongside the reference estimate and its justification:

Question: What makes mostly all those species so social? (230 pts to allocate)
A - They move in single file during their journey: 100% (100% true, say they all do)
B - They fish in hunting groups: 30% (100% true, say they all do)
C - They share nests between fishing parties: 60% (10% true, rarely reported & attested)
D - They look like us humans in so many aspects: 20% (NA, maybe, but cannot be evaluated in an objective, rational context)
E - They regurgitate food even for others' youngsters: 80% (say 50% true, not for all; depends on social status, food supply, etc.)
F - They gather in clusters during winter, cycling warmer core individuals to the cluster's outer perimeter where it's colder: 100% (70% true, because say 30% live in warmer latitudes and rely on it less)

The Brier score here is: sum (estimate - reference)² (an NA reference is set equal to the given estimate, so its term cancels out)
(1 - 1)² + (0.3 - 1)² + (0.6 - 0.1)² + (0.2 - 0.2)² + (0.8 - 0.5)² + (1 - 0.7)² = 0 + 0.49 + 0.25 + 0 + 0.09 + 0.09 = 0.92
This yields a skill score of 60 % = 1 - (0.92 / 2.30)
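The calculation above could be sketched in code as follows. This is a minimal sketch, not a proposed implementation: the function names `brier_score` and `skill_score` are hypothetical, an NA reference is represented as `None`, and the baseline of 2.30 is simply taken from the formula above as a parameter.

```python
from typing import Optional


def brier_score(estimates: list[float], references: list[Optional[float]]) -> float:
    """Sum of squared differences between estimates and reference estimates."""
    total = 0.0
    for estimate, reference in zip(estimates, references):
        if reference is None:
            # NA reference: it mirrors the estimate, so the term is zero.
            continue
        total += (estimate - reference) ** 2
    return total


def skill_score(score: float, baseline: float) -> float:
    """Skill relative to a baseline score: 1 is perfect, 0 matches the baseline."""
    return 1.0 - score / baseline


# Options A to F from the question above, as fractions of 1.
estimates = [1.0, 0.3, 0.6, 0.2, 0.8, 1.0]
references = [1.0, 1.0, 0.1, None, 0.5, 0.7]

score = brier_score(estimates, references)
skill = skill_score(score, baseline=2.30)  # 2.30 as used in the formula above
print(f"Brier score: {score:.2f}, skill score: {skill:.0%}")
```

Keeping the baseline as an explicit argument leaves open how it should be derived for a given question.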

Takeaways from this scoring method for the quiz:
On B, the estimate was really too low (too much hesitation/speculation).
On C, there is likely a generalization bias, a perception error.
On D, there is anthropomorphism, the "old" naturalist error.
On E, this is an overshoot, maybe due to a lack of contextualization or details.
On F, again this is an overshoot, due to a reasoning error: the question asked about "all those species", so even those living in, e.g., Australia.
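The overshoot/undershoot reading of each option can be derived from the sign of estimate minus reference. A small sketch, assuming a hypothetical `classify` helper and an arbitrary tolerance of 0.05 (neither comes from the discussion above); the interpretation of why an estimate is off (bias, anthropomorphism, etc.) still has to come from the presenter's rationale:

```python
def classify(estimate, reference, tolerance=0.05):
    """Label an estimate against its reference (tolerance is an arbitrary choice)."""
    if reference is None:
        return "not evaluated"  # NA reference, e.g. option D
    difference = estimate - reference
    if difference > tolerance:
        return "overshoot"
    if difference < -tolerance:
        return "undershoot"
    return "on target"


# (estimate, reference) pairs for options A to F, as fractions of 1.
answers = {
    "A": (1.0, 1.0),
    "B": (0.3, 1.0),
    "C": (0.6, 0.1),
    "D": (0.2, None),
    "E": (0.8, 0.5),
    "F": (1.0, 0.7),
}

diagnosis = {key: classify(est, ref) for key, (est, ref) in answers.items()}
for key, label in diagnosis.items():
    print(key, label)
```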

@mathdesc
Author

The first steps to take would be, IMO:

  • enable a number or range input for the estimates
  • enable a JSON source for the quiz question, answer options, reference estimates and rationales
  • change the results of the Brier quiz to show percentages like the poll does, but based on the mean or median of all participants' estimates, with a threshold point representing the reference estimate
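To make the last two points more concrete, here is one possible shape for the JSON source and the aggregation, as a sketch only. The field names (`question`, `options`, `id`, `text`, `reference`, `rationale`), the `aggregate` function and the three participant responses are all invented for illustration; nothing here reflects an existing format in the project.

```python
import json
from statistics import mean, median

# Hypothetical JSON layout: one question, options with reference and rationale.
QUIZ_JSON = """
{
  "question": "What makes mostly all those species so social?",
  "options": [
    {"id": "A", "text": "They move in single file during their journey",
     "reference": 1.0, "rationale": "say they all do"},
    {"id": "B", "text": "They fish in hunting groups",
     "reference": 1.0, "rationale": "say they all do"},
    {"id": "D", "text": "They look like us humans in so many aspects",
     "reference": null, "rationale": "cannot be evaluated objectively"}
  ]
}
"""


def aggregate(quiz, responses, use_median=False):
    """Combine all participants' estimates per option (mean or median)."""
    combine = median if use_median else mean
    rows = []
    for option in quiz["options"]:
        values = [response[option["id"]] for response in responses]
        rows.append({
            "id": option["id"],
            "estimate": combine(values),       # bar shown like a poll result
            "reference": option["reference"],  # threshold point (None if NA)
        })
    return rows


quiz = json.loads(QUIZ_JSON)

# Made-up responses from three participants, as fractions of 1.
responses = [
    {"A": 1.0, "B": 0.3, "D": 0.2},
    {"A": 0.8, "B": 0.5, "D": 0.4},
    {"A": 0.9, "B": 1.0, "D": 0.0},
]

mean_rows = aggregate(quiz, responses)
median_rows = aggregate(quiz, responses, use_median=True)
for row in mean_rows:
    print(row["id"], f"{row['estimate']:.0%}", row["reference"])
```

The median is less sensitive to a single extreme estimate than the mean (see option B in the made-up data), which may matter with small audiences.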
