This is a guide on what to do after you have created a new task repository from the template. More in-depth information about how to create a new task can be found in the OpenProblems Documentation.
A list of required software to start developing a new task can be found in the OpenProblems Requirements.
- Update the `name` field to the name of the task in snake_case. The name should start with `task_`.
- Update the `description` field to a short description of the task.
- Add a keyword to the `keywords` field that describes the task.
- Update the `task_template` in the `links` fields to the name of the task from step 1.
- Update the `info` fields to the text from the task issue.
- Replace `task_template` with the name of the task in `info.name`.
- Update the authors of the task.
After performing the steps above, you can remove the comments in the `_viash.yaml` file. Check the instructions here for more information.
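As an illustration, the updated fields in `_viash.yaml` might end up looking like the sketch below, using a hypothetical task called `task_denoising` (all values are examples, not prescribed by the template):

```yaml
# A minimal sketch of the updated _viash.yaml fields.
# "task_denoising" and all values below are illustrative.
name: task_denoising
description: Remove noise from single-cell count data.
keywords: [single-cell, denoising]
links:
  issue_tracker: https://github.com/openproblems-bio/task_denoising/issues
  repository: https://github.com/openproblems-bio/task_denoising
info:
  name: task_denoising
  label: Denoising
  summary: A one-sentence summary taken from the task issue.
authors:
  - name: Jane Doe
    roles: [author, maintainer]
```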
The OpenProblems team has provided some test resources that can be used to test the task. These resources are stored in the `resources_test` folder, and the `scripts/download_resources.sh` script can be used to download them.

If these resources are not sufficient, you can add more resources to the `resources_test` folder and update the `scripts/download_resources.sh` script to download them. When using new test resources, let the OpenProblems team know so they can be added to the S3 bucket.
scripts/download_resources.sh
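If you add your own resources, the download script can be extended with an extra sync call. A minimal sketch, assuming the resources are mirrored under the `openproblems-data` bucket (the exact prefix is an assumption; confirm it with the OpenProblems team):

```sh
# Hypothetical addition to scripts/download_resources.sh:
# pull the task-specific test resources from the public S3 bucket
# (the bucket prefix below is an assumption, not a guaranteed path)
aws s3 sync \
  --no-sign-request \
  "s3://openproblems-data/resources_test/task_template" \
  "resources_test/task_template" \
  --delete
```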
API files (docs)
Update the API files in the `src/api` folder. These files define the input and output of the methods and metrics.
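As a rough illustration, a file specification in `src/api` could look like the sketch below (the file name, slots and exact schema are assumptions; check the documentation for the authoritative format):

```yaml
# Hypothetical src/api/file_train.yaml -- a sketch, not the exact schema.
type: file
example: "resources_test/task_template/pancreas/train.h5ad"
info:
  label: Training data
  summary: The training data given to a method.
  slots:
    layers:
      - type: integer
        name: counts
        description: Raw counts.
        required: true
    uns:
      - type: string
        name: dataset_id
        description: A unique identifier for the dataset.
        required: true
```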
Components (docs)
To create a component, you can run the respective script in the `scripts` directory. Before running the script, make sure to update the `task_name`, `component_name` and `component_lang` variables and save the file. For additional components you will only need to update the `component_name` and `component_lang` variables.
scripts/add_a_control_method.sh
scripts/add_a_method.sh
scripts/add_a_metric.sh
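For instance, the variables at the top of such a script might be set as follows (the values are illustrative):

```sh
# Illustrative values -- replace with your own task and component.
task_name="task_denoising"
component_name="my_method"
component_lang="python" # use "r" for an R component
```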
For each type of component, a first example component has already been created that you can modify.
- Update the `.info` fields in the `config.vsh.yaml`.
- Add any component-specific arguments to the `config.vsh.yaml` file.
- Add any additional resources that are required for the component.
- Update the docker engine image setup if additional packages are required.
- If you know the required memory and/or CPU, you can adjust the nextflow `.directives.label` field. In addition, if your component requires a GPU, you can add the `gpu` label to the field.
- Update the `script.py` or `script.R` file with the code for the component.
Note
You can remove the comments in the `config.vsh.yaml` file after you have updated it.
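Putting these steps together, a trimmed `config.vsh.yaml` for a hypothetical Python method could look roughly like the sketch below (the `__merge__` path, Docker image and labels are assumptions based on the template's conventions, not definitive values):

```yaml
# A sketch of a method config; all values are illustrative.
__merge__: ../../api/comp_method.yaml
name: my_method
info:
  label: My Method
  summary: A one-sentence summary of the method.
arguments:
  - name: --n_neighbors
    type: integer
    default: 5
    description: Number of neighbors used by the method.
resources:
  - type: python_script
    path: script.py
engines:
  - type: docker
    image: openproblems/base_python:1.0.0
    setup:
      - type: python
        pypi: [scanpy]
runners:
  - type: executable
  - type: nextflow
    directives:
      label: [midtime, midmem, midcpu]
```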
Testing Components (docs)
You can test the component by running the following command:
viash test /path/to/config.vsh.yaml
You can also test all components by running the following command:
scripts/test_all_components.sh
It is possible to customise the command in the above script by adding a `-q` argument to only run the tests on certain components, e.g. `-q methods` to test only the methods.
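For instance, assuming the script wraps `viash ns test` (an assumption worth verifying against the script itself), the customised command inside it could read:

```sh
# Hypothetical customisation inside scripts/test_all_components.sh:
# restrict testing to components whose path matches "methods"
viash ns test --parallel -q methods
```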
Dataset processor (docs)
The dataset processor is a script that removes all unnecessary info from the dataset for your task. This info is defined in the `api/file_common_dataset.yaml` file. From this filtered dataset, several files are created that are used by the methods and metrics. This safeguards against data leakage and makes sure the structure of the data cannot be altered by a method or a metric.
There is no template available for the dataset processor; you can follow the guidelines in the documentation. Store the processor in the `src/process_dataset` folder.
Be sure to update the `file_common_dataset.yaml` file with the correct information required for the methods/metrics.
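As a starting point, the processor's config could define one input and one output per API file, along the lines of the sketch below (the component name, argument names and example path are all illustrative, since there is no template to copy from):

```yaml
# A sketch of a dataset processor config; all names are illustrative.
name: process_dataset
namespace: data_processors
arguments:
  - name: --input
    type: file
    description: The common dataset, as defined in api/file_common_dataset.yaml.
    example: resources_test/common/pancreas/dataset.h5ad
  - name: --output_train
    type: file
    direction: output
    description: The filtered file handed to the methods.
  - name: --output_solution
    type: file
    direction: output
    description: The ground truth handed to the metrics.
resources:
  - type: python_script
    path: script.py
```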
Important
When using your own datasets, please contact the OpenProblems team about adding them to the S3 bucket, as the dataset processor should make use of the `common` datasets folder in the `resources` or `resources_test` directory.
To create the resources and test resources for the task, a Nextflow workflow that processes the datasets is needed. This workflow will be created together with the OpenProblems team.
To create the task `README` file, run the following command:
scripts/create_readme.sh
Benchmarking (docs)
When you have finished creating your components and dataset processor, you can create a workflow to benchmark the components. This workflow will be created together with the OpenProblems team.