Evaluation, Reproducibility, Benchmarks Meeting 14
Date: 24th November 2021
Present: Nicola, Jorge, Lena
- Conflict of interest discussion
- Initial draft put up for discussion by Lena:
1. Raw metric analyses
- Analyzing per class
- Making sample dependencies transparent
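A minimal sketch of what such a raw-metric table could look like, assuming per-case Dice scores are already computed; the column names and toy values are purely illustrative. Keeping one row per (case, class) pair, together with a patient identifier, makes dependencies between samples transparent:

```python
import pandas as pd

# One row per (case, class) pair; the `patient` column makes sample
# dependencies (several cases from the same patient) explicit.
raw = pd.DataFrame({
    "patient": ["p1", "p1", "p2", "p2"],
    "case":    ["p1_s1", "p1_s2", "p2_s1", "p2_s2"],
    "class":   ["tumor", "edema", "tumor", "edema"],
    "dice":    [0.91, 0.78, 0.85, 0.80],
})

# Per-class summaries are derived from the raw table rather than
# replacing it, so individual values remain inspectable.
print(raw.groupby("class")["dice"].describe())
```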
2. Aggregation
- Respecting hierarchical data structure
- Handling missing values
- Dealing with unequal class importance and frequency
- Considering dependencies between classes
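A hedged sketch of such an aggregation: the two-stage grouping respects the patient-to-case hierarchy, dropping NaNs stands in for one possible missing-value policy, and the class weights are invented for illustration, not prescribed by the group:

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "patient": ["p1", "p1", "p2", "p2"],
    "class":   ["tumor", "edema", "tumor", "edema"],
    "dice":    [0.91, np.nan, 0.85, 0.80],  # NaN: class missing in the reference
})

def aggregate(scores: pd.DataFrame, class_weights: dict) -> float:
    # Average within each patient first so patients contributing many
    # cases do not dominate the result (hierarchical data structure).
    per_patient = (scores.dropna(subset=["dice"])  # one possible NaN policy
                         .groupby(["patient", "class"])["dice"].mean())
    per_class = per_patient.groupby(level="class").mean()
    # A weighted mean over classes encodes unequal class importance.
    weights = pd.Series(class_weights).reindex(per_class.index)
    return float(np.average(per_class, weights=weights))

print(aggregate(raw, {"tumor": 0.7, "edema": 0.3}))
```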
3. Stratification
- Handling biases
- Leveraging metadata
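One way this could look in code, as a sketch only: raw scores are joined with per-patient metadata and summarized per stratum, which exposes biases (e.g. systematically worse scores for one scanner vendor) that a pooled mean would hide. All column names and values here are hypothetical:

```python
import pandas as pd

raw = pd.DataFrame({
    "patient": ["p1", "p1", "p2", "p2"],
    "class":   ["tumor", "edema", "tumor", "edema"],
    "dice":    [0.91, 0.78, 0.85, 0.80],
})
meta = pd.DataFrame({
    "patient": ["p1", "p2"],
    "scanner": ["vendor_A", "vendor_B"],
})

# Per-stratum summaries reveal performance differences hidden by pooling.
stratified = raw.merge(meta, on="patient")
print(stratified.groupby(["scanner", "class"])["dice"].agg(["mean", "count"]))
```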
4. Comparing and ranking (optional)
- Respecting relations between metrics
- Performing statistical tests
- Performing uncertainty-aware rankings
- Considering non-determinism of ML
- Considering clinical relevance
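A sketch of the statistical-test and uncertainty-aware-ranking points, assuming paired per-case scores for two algorithms on the same cases; the Wilcoxon signed-rank test and the bootstrap over cases are one common choice, not the method fixed by the group, and the data are simulated:

```python
import numpy as np
from scipy.stats import rankdata, wilcoxon

rng = np.random.default_rng(0)
# Simulated per-case scores; both algorithms evaluated on the same 50 cases.
scores = {"algo_A": rng.normal(0.85, 0.05, 50),
          "algo_B": rng.normal(0.82, 0.05, 50)}

# Paired test on the per-case differences.
stat, p = wilcoxon(scores["algo_A"], scores["algo_B"])
print(f"Wilcoxon p-value: {p:.3f}")

# Uncertainty-aware ranking: re-rank on bootstrap resamples of the cases
# and report how often each algorithm ends up ranked first.
names = list(scores)
mat = np.column_stack([scores[n] for n in names])
wins = np.zeros(len(names))
for _ in range(1000):
    idx = rng.integers(0, mat.shape[0], mat.shape[0])
    ranks = rankdata(-mat[idx].mean(axis=0))  # rank 1 = best mean score
    wins[ranks == 1] += 1
print(dict(zip(names, wins / 1000)))
```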
5. Reporting
- Choosing reporting precision
- Reporting on the quality of the reference
- Extrapolating results
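A small sketch of the reporting-precision point: the number of digits quoted should be justified by the metric's uncertainty. Here a percentile bootstrap (an assumption for illustration, not a prescribed method) yields a confidence interval that supports two decimals at most:

```python
import numpy as np

rng = np.random.default_rng(1)
dice = rng.normal(0.85, 0.05, 50)  # simulated per-case scores

# Percentile-bootstrap 95% CI for the mean Dice score.
boot = [rng.choice(dice, dice.size).mean() for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])

# Quoting more decimals than the CI supports would be spurious precision.
print(f"Dice: {dice.mean():.2f} (95% CI [{lo:.2f}, {hi:.2f}])")
```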
1. Raw metric analyses (discussion)
- Jorge: need to distinguish between requirements and recommendations
- High risk -> requirement; otherwise recommendation
- Can we provide a self-assessment form so that people can assess their risks and thus decide between requirements and recommendations?
- Open question: how to consider clinical relevance