Building an Ingestion Program for a Competition
- What is an ingestion program?
- Result submission challenges
- Code submission challenges
- Execution priority
- Arguments
The organizer supplies an ingestion program and a scoring program; the participant supplies results or code (or even data). For a simple use case, see the Iris competition.
An ingestion program is a piece of code that is executed when a challenge participant makes a submission. It "ingests" the submission and executes "something" to help process it. There are several possible use cases:
- Parsing the submission and deciding how to process it, e.g. the organizers may allow the participants to submit either results, code or data.
- Allowing submission of source code or libraries. The ingestion program can then call functions supplied by the participants and execute them on the input data. Advantage: the input data is read by the same reader for everybody, so participants are not penalized if their code fails to read the data and/or if reading the data takes time. Another advantage: the organizers can run cross-validation experiments without any possibility of cheating by the participants.
- Allowing time series predictions, active learning or query learning. The ingestion program can "serve" data on demand to the code supplied by the participants. In fact, the ingestion program can even generate artificial data, if needed!
However, you may not need one; read on:
If you are organizing a challenge with RESULT submission (no participant-supplied code is executed on the challenge platform), you should not supply an ingestion program. The participants should submit a zip file with prediction results and NO metadata file.
If your participants must supply executables, you do not necessarily need to supply an ingestion program. The challenge platform will execute any submission that comes with a metadata file. This file needs to include the command to be executed, e.g.:
command: python $program/run.py $input $output
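For illustration, a minimal run.py matching this command might look like the sketch below; the data file names and format are assumptions, not platform requirements:

```python
# run.py -- minimal participant executable (illustrative sketch; the data file
# names and format are assumptions, not CodaLab requirements)
import os
import sys

input_dir = sys.argv[1]   # $input: directory containing the challenge data
output_dir = sys.argv[2]  # $output: directory where predictions must be written

os.makedirs(output_dir, exist_ok=True)

# Assumed data layout: one test file with one sample per line.
with open(os.path.join(input_dir, "test_data.csv")) as f:
    samples = f.readlines()

# Trivial "model": predict a constant label for every sample.
with open(os.path.join(output_dir, "predictions.txt"), "w") as f:
    for _ in samples:
        f.write("0\n")
```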
If instead you ask the participants to supply, for instance, a Python class in model.py that is NOT executable, you must supply a so-called "Ingestion Program". Your ingestion program then reads the data and calls that class to train and test the predictive model. We provide an example of an ingestion program for the Iris challenge.
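As an illustration, a minimal sketch of such an ingestion program is given below. It assumes the participant's model.py exposes a Model class with fit and predict methods; the file names and data layout are also assumptions, and the actual Iris example differs in its details:

```python
# Ingestion sketch: train and test a participant-supplied, non-executable model.py
# (the Model interface, file names and data layout are illustrative assumptions).
import os
import sys

import numpy as np

# Argument order as in the ingestion command shown in the Arguments section below.
_, input_dir, output_dir, hidden_dir, shared_dir, submission_dir = sys.argv[1:7]

# Make the participant's code importable and load their class.
sys.path.insert(0, submission_dir)
from model import Model  # assumed to expose fit(X, y) and predict(X)

# The organizer reads the data, so every participant uses the same reader.
X_train = np.loadtxt(os.path.join(input_dir, "train_data.csv"), delimiter=",")
y_train = np.loadtxt(os.path.join(input_dir, "train_labels.csv"))
X_test = np.loadtxt(os.path.join(input_dir, "test_data.csv"), delimiter=",")

model = Model()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

os.makedirs(output_dir, exist_ok=True)
np.savetxt(os.path.join(output_dir, "predictions.txt"), predictions)
```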
The following logic is implemented:
- If the participant's submission has no metadata file:
    - treat the submission as a result submission and forward it to the scoring program
- else: # treat the submission as a code submission
    - If the organizers did NOT provide an ingestion program:
        - execute the participant's code submission (according to the command in its metadata file)
    - else: # organizer-supplied ingestion program
        - execute the ingestion program (via its metadata command)
        - simultaneously execute the participant's code submission (if there is a command in its metadata file)
If an ingestion program is supplied by the organizers and the participant's code is executable, both programs run simultaneously and can exchange data (input data flows from the ingestion program to the participant's program, and results flow the other way). This exchange happens via the $shared directory.
This feature helps implement competitions in which data is not provided all at once to the participants' code. This includes:
- cross-validation
- time series prediction
- on-line learning
- active or query learning
- iterative experimental design
- reinforcement learning
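A minimal sketch of such a shared-directory exchange, seen from the ingestion program's side, is given below. The file names, polling scheme, and number of rounds are assumptions; any convention the two programs agree on works equally well:

```python
# Ingestion-program side of a $shared exchange (illustrative protocol only;
# file names, polling scheme and round count are assumptions).
import os
import sys
import time

shared_dir = sys.argv[5]  # $shared: 5th argument of the ingestion command below
n_rounds = 5

for round_id in range(n_rounds):
    # Serve one batch of data to the participant's code.
    with open(os.path.join(shared_dir, f"batch_{round_id}.txt"), "w") as f:
        f.write("data for this round\n")

    # Wait until the participant's program writes its answer for this round.
    answer_file = os.path.join(shared_dir, f"answer_{round_id}.txt")
    while not os.path.exists(answer_file):
        time.sleep(1)

    with open(answer_file) as f:
        answers = f.read().strip()  # use the answers, e.g. to pick the next batch
```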
The following arguments are available to the various programs. All arguments are DIRECTORIES. The ingestion program is called as follows:
command: python $ingestion_program/test.py $ingestion_program $input $output $hidden $shared $submission_program
- $ingestion_program : directory where the ingestion program is located.
- $input : input data directory.
- $output : output directory (where predictions are written).
- $hidden : reference data directory.
- $shared : directory shared with the participant's code (which is executed simultaneously).
- $submission_program : directory of the code being run; during the scoring phase, this is the scoring program.
The participant's code is called as follows:
command: python $program/code.py $program $input $output $shared $submission_program
- $program : directory of the submitted code.
- $input : input data directory.
- $output : output directory (where predictions are written).
- $shared : directory shared with the ingestion program (which is executed simultaneously).
- $submission_program : directory of the code submitted by the participants.
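As a counterpart to the ingestion-side sketch above, a participant's code.py could consume the shared directory as sketched below; the file names and protocol are assumptions and must match whatever the organizer's ingestion program uses:

```python
# code.py -- participant side of the $shared exchange (illustrative sketch;
# the protocol must match the organizer's ingestion program).
import os
import sys
import time

# Argument order as in the command above.
program_dir, input_dir, output_dir, shared_dir, submission_dir = sys.argv[1:6]

n_rounds = 5
for round_id in range(n_rounds):
    batch_file = os.path.join(shared_dir, f"batch_{round_id}.txt")
    # Wait until the ingestion program serves the data for this round.
    while not os.path.exists(batch_file):
        time.sleep(1)

    with open(batch_file) as f:
        batch = f.read()

    # Trivial "prediction"; a real submission would run its model on the batch.
    with open(os.path.join(shared_dir, f"answer_{round_id}.txt"), "w") as f:
        f.write("0\n")
```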
The scoring program is called as follows:
command: python $program/score.py $input $output
- $program : directory of the scoring program.
- $input : input data directory. It contains two subdirectories, ref/ and res/, containing the solutions and the predictions respectively.
- $output : output directory (where scores are written).
- $hidden : hidden reference data directory, only available if the ingestion program is run during the scoring phase.
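For reference, a minimal scoring program following this argument convention might look like the sketch below; the solution and prediction file names and the accuracy metric are assumptions, while writing key: value pairs to scores.txt in $output follows the usual CodaLab convention:

```python
# score.py -- minimal scoring program sketch (file names and metric are
# assumptions; scores are written as key: value lines to scores.txt).
import os
import sys

input_dir, output_dir = sys.argv[1], sys.argv[2]

# $input contains ref/ (solutions) and res/ (participant predictions).
with open(os.path.join(input_dir, "ref", "solution.txt")) as f:
    solution = [line.strip() for line in f]
with open(os.path.join(input_dir, "res", "predictions.txt")) as f:
    predictions = [line.strip() for line in f]

# Simple accuracy: fraction of matching lines.
accuracy = sum(s == p for s, p in zip(solution, predictions)) / len(solution)

os.makedirs(output_dir, exist_ok=True)
with open(os.path.join(output_dir, "scores.txt"), "w") as f:
    f.write(f"accuracy: {accuracy}\n")
```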
A simple test example is provided in the Yellow World competition.