This repository contains the evaluation codebase, results, and reports for the AAPB-CLAMS collaboration project. Evaluations are run on a single CLAMS App, or on a pipeline/group of CLAMS Apps, that produces an evaluable result for a given video-metadata-extraction task.
Each subdirectory of the repository is an evaluation task within the project. Each contains its own set of the following kinds of files:
- golds - The gold-standard, human-annotated files against which apps are evaluated for predictive ability.
  - synonymous with "ref", "reference", "groundtruth", or "goldstandard"
  - often `.tsv`, `.csv`, or `.txt` files.
- predictions - The app-predicted files containing the predicted annotations of the phenomena to be evaluated (e.g. time durations for slate detection).
  - synonymous with "pred", "test", "system", or "output"
  - each preds directory represents a batch, named `preds@<APP_NAME><APP_VER>@<BATCH_NAME>`
  - always `.mmif` files with app views (see the sketch after this list).
- results - The system output of a finished evaluation, i.e. the evaluation numbers.
  - often a `results.txt` file. This should be renamed according to the conventions currently listed here.
  - This term was previously used to describe machine-output prediction "results"; it no longer refers to that.
  - There might be results per GUID, or a summary.
- reports - Reports are more formal documents that describe the results, intended for business intelligence.
  - There are plans to automate some of the report generation from the results, which may require automation scripts; however, parts of the report must often be manually curated.
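
As a quick illustration of the predictions format, the sketch below loads one prediction `.mmif` file and lists the app views it contains. It assumes the `mmif-python` package is installed; the batch directory and file name are hypothetical examples of the naming convention above.

```python
from pathlib import Path

from mmif import Mmif

# Hypothetical batch directory following preds@<APP_NAME><APP_VER>@<BATCH_NAME>,
# with a hypothetical GUID-named MMIF file inside it.
pred_path = Path("preds@app-slatedetection1.0@example-batch") / "cpb-aacip-000000.mmif"

mmif_obj = Mmif(pred_path.read_text())

# Each view records which app produced it, so a batch can be sanity-checked
# against the app name and version in the directory name.
for view in mmif_obj.views:
    print(view.id, view.metadata.app)
```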
See Remaining Work for continued filename convention issues.
Important
In the future, all evaluations should be invoked in the same manner, likely through a Docker module or a uniform CLI command.
- cd into the appropriate task_eval directory
- example command: `python3 -m evaluate -g url/to/gold/web -p path/to/local/mmifpreds -r results_printout_filename.txt`
Many of the evaluations should also retrieve the golds automatically, using `from clams_utils.aapb import goldretriever` and `goldretriever.download_golds(<params>)`. Thus, it is usually not required to provide `-g`.
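
For example, here is a minimal sketch of retrieving golds in code; the gold URL below is hypothetical, and the exact parameters of `download_golds` should be checked against `clams_utils`.

```python
from clams_utils.aapb import goldretriever

# Hypothetical URL pointing at a golds directory in the aapb-annotations repo.
gold_url = "https://github.com/clamsproject/aapb-annotations/tree/main/example-task/golds"

# Assumed to download the gold files to a local directory and return its path,
# so -g does not need to be passed on the command line.
gold_dir = goldretriever.download_golds(gold_url)
print("golds downloaded to:", gold_dir)
```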
- Choose evaluation task, create batch with GUIDs.
- In AAPB Annotations, create raw annotations, then run `process.py` to turn them into golds. Upload those golds via a github commit. (Requires preprocessing and access to videos)
- Run the app/pipeline-of-apps to create output pred `.mmif` files locally on your machine. (Also requires access to videos)
- Run the evaluation code, inputting the url-to-golds-commit and the path-to-local-mmifs, to obtain result files (see the sketch after this list).
- Use python code, plus some hand-generation, to produce a `nameconvention-report.txt` summary of the results.
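
The sketch below illustrates the evaluation step of this workflow, invoking one task's evaluation module from Python rather than the shell. The task directory, gold commit URL, and batch name are placeholders, not real values from this repository.

```python
import subprocess

task_dir = "slate-detection_eval"  # hypothetical task directory
golds = "https://github.com/clamsproject/aapb-annotations/tree/<commit>/<task>/golds"
preds = "preds@<APP_NAME><APP_VER>@<BATCH_NAME>"  # local batch of pred .mmif files
results = "results_printout_filename.txt"

# Equivalent to cd-ing into the task directory and running the example command above.
subprocess.run(
    ["python3", "-m", "evaluate", "-g", golds, "-p", preds, "-r", results],
    cwd=task_dir,
    check=True,
)
```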
CLAMS Apps Manual.
TestDrive Instructions (Alternate).
The users and use cases of this evaluation workflow remain under discussion. For the moment, the expected work has been converted into issues.