Commit

Merge pull request #46 from MetaboHUB-MetaToul-FluxoMet/31-development-guide

Add guidelines for further development
elliotfontaine authored Jul 22, 2024
2 parents 8e00c2f + ad8e43e commit e3517e5
Showing 17 changed files with 573 additions and 15 deletions.
3 changes: 2 additions & 1 deletion cylc-src/bioreactor-workflow/bin/get-instrument
@@ -1,6 +1,7 @@
#!/usr/bin/env python

-import os, sys
+import os
+import sys
from pathlib import Path

from pyopenms import MzMLFile, MSExperiment
11 changes: 4 additions & 7 deletions cylc-src/bioreactor-workflow/flow.cylc
@@ -8,7 +8,7 @@
URL = https://github.com/MetaboHUB-MetaToul-FluxoMet/RTMet

# Create task families for conda environments.
-%include 'envs/conda.cylc'
+%include 'envs/conda.cylc'

[scheduling]
cycling mode = integer
@@ -35,10 +35,6 @@
annotate => upload_features
quantify => upload_concentrations
"""
-# +P3/P1 = """
-# compute_fluxes => upload_fluxes
-# compute_fluxes => upload_metadata
-# """
{% endif %}
[[queues]]
[[[default]]]
@@ -151,12 +147,13 @@
[[[meta]]]
title = Trim Spectra
description = """
-Remove the first and last `n_start` and `n_end` spectra from the mzML file.
+Remove the first `n_start` and last `n_end` scans from the mzML file. This is useful
+if the shape of the flowgram is not stable at the beginning or end of the run.
"""
categories = bioinformatics

[[get_timestamp]]
-# Regex should be replaced with XML parsing.
+# Regex could be replaced by pyOpenMS.
script = """
RUN_TIMESTAMP=$(grep '<run' ${mzml} | sed 's/.*startTimeStamp="\([^"]*\)".*/\1/')
4 changes: 2 additions & 2 deletions docs/requirements.txt
@@ -1,5 +1,5 @@
sphinx-rtd-theme==2.0.0
sphinx-design==0.6.0
-metomi-rose
-cylc-flow
+cylc-flow==8.3.0
+metomi-rose==2.3.0
cylc-sphinx-extensions
Binary file added docs/source/_static/graphs/added-task-graph.png
2 changes: 2 additions & 0 deletions docs/source/conf.py
@@ -46,6 +46,8 @@
"exec",
]

+pygments_style = "dracula" # 🧛🏻‍♂️
+
templates_path = ["_templates"]
exclude_patterns = []

60 changes: 60 additions & 0 deletions docs/source/development/add_config_option.rst
@@ -0,0 +1,60 @@
.. _development.add-config_option:

====================================
Adding an item to user configuration
====================================

.. note::
   Prerequisites:

   * :ref:`tutorial.user-config`
   * :ref:`reference.user-config`
   * :ref:`development.add-task`

Write a new item in :rose:file:`rose-suite.conf`
================================================

ThermoRawFileParser can output the metadata in text or JSON format. Currently, the workflow only
outputs metadata as JSON. We can let the user choose between the two formats.

At the end of the :strong:`[template variables]` section, add the following line:

.. code-block:: ini
   :caption: :file:`rose-suite.conf`

   # ...
   cfg__raw_meta_format = 'txt'

Use the template variable in the workflow definition
====================================================

In the :strong:`[convert_raw]` task, change the :strong:`metadata` environment variable to:

.. code-block:: jinja
   :caption: :file:`flow.cylc`

   [runtime]
       [[convert_raw]]
           [[[environment]]]
   -           metadata = json
   +           metadata = {{ cfg__raw_meta_format }}

During run installation, the Jinja2 expression will be replaced by the value set in :rose:file:`rose-suite.conf`.
If you want to change the value at runtime, follow the instructions in :ref:`tutorial.user-config`.

Validate the new configuration item
===================================

Rose (the configuration manager) can validate the user configuration. Validation happens at runtime,
at cyclepoint 0, in the :strong:`[validate_cfg]` task. Let's add a new validation rule for our item.
Locate the :file:`meta/rose-meta.conf` file in the workflow source directory, and add the following:

.. code-block:: ini
   :caption: :file:`meta/rose-meta.conf`

   [template variables=cfg__raw_meta_format]
   compulsory=true
   type=character
   values='json', 'txt'

The :strong:`[validate_cfg]` task will now check that the :strong:`cfg__raw_meta_format` item is
present and that its value is either ``'json'`` or ``'txt'``.
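
To make the effect of this rule concrete, here is a minimal Python sketch that performs the same three
checks outside of Rose (item present, quoted string, allowed value). It is not how Rose implements
validation: it assumes that :rose:file:`rose-suite.conf` can be read with :mod:`configparser` and that
template variable values are close enough to Python literals for :func:`ast.literal_eval`. The
``check_raw_meta_format`` helper is hypothetical.

.. code-block:: python

   import ast
   import configparser

   # Mirrors values='json', 'txt' in meta/rose-meta.conf.
   ALLOWED_FORMATS = ("json", "txt")


   def check_raw_meta_format(conf_text):
       """Return the validated format or raise ValueError (hypothetical helper)."""
       parser = configparser.ConfigParser(interpolation=None)
       parser.optionxform = str  # keep option names exactly as written
       parser.read_string(conf_text)
       section = parser["template variables"]

       # compulsory=true: the item must be present.
       if "cfg__raw_meta_format" not in section:
           raise ValueError("cfg__raw_meta_format is missing")

       # type=character: the value must be a quoted string literal.
       raw = section["cfg__raw_meta_format"]
       try:
           value = ast.literal_eval(raw)
       except (ValueError, SyntaxError):
           raise ValueError(f"not a quoted string: {raw}") from None
       if not isinstance(value, str):
           raise ValueError(f"not a quoted string: {raw}")

       # values='json', 'txt': the value must be one of the allowed ones.
       if value not in ALLOWED_FORMATS:
           raise ValueError(f"{value!r} is not one of {ALLOWED_FORMATS}")
       return value


   example = """
   [template variables]
   cfg__raw_meta_format = 'txt'
   """
   print(check_raw_meta_format(example))  # prints: txt

With this sketch, an unquoted ``txt`` or a value such as ``'csv'`` raises ``ValueError``, which is
roughly the situation in which :strong:`[validate_cfg]` would be expected to fail.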

149 changes: 149 additions & 0 deletions docs/source/development/add_task.rst
@@ -0,0 +1,149 @@
.. _development.add-task:

=============================
Adding a task to the workflow
=============================

In this tutorial, we will see how to add a new task to the workflow, using the example of a
task that extracts the number of scans from an mzML file with the pyOpenMS library.

Adding a Python script to the workflow executables
==================================================

In :file:`cylc-src/bioreactor-workflow/bin/`, create a new file named :file:`get-scans-number` and
paste the following content:

.. code-block:: python
   :caption: :file:`bin/get-scans-number`

   #!/usr/bin/env python
   import os
   import sys
   from pathlib import Path

   from pyopenms import MzMLFile, MSExperiment

   MZML = os.getenv("mzml")


   def main():
       """
       Usage:
           ./get-scans-number

       Get number of scans from mzML file. `$mzml` shell
       environment variable must be set to the path of the file.
       """
       exp = MSExperiment()
       MzMLFile().load(MZML, exp)
       sys.stdout.write(str(exp.getNrSpectra()))


   if __name__ == "__main__":
       # Exit with a non-zero status on misuse so that Cylc marks the task as failed.
       if len(sys.argv) > 1:
           sys.stderr.write(main.__doc__)
           sys.exit(1)
       elif not MZML:
           sys.stderr.write("$mzml environment variable not set.\n")
           sys.exit(1)
       elif not Path(MZML).exists():
           sys.stderr.write(f"mzML file not found: {MZML}\n")
           sys.exit(1)
       main()

Make the script executable:

.. code-block:: console

   $ chmod +x get-scans-number

Creating a new task in the [runtime] section
================================================

Open :file:`cylc-src/bioreactor-workflow/flow.cylc` and add the following task definition at the end:

.. code-block:: cylc
   :caption: :file:`flow.cylc`
   :emphasize-lines: 3-

   [runtime]
       # ...
       [[get_scans_number]]
           # The task will run in the wf-openms conda environment
           # Adding None makes the task appear at the root in the TUI/GUI
           inherit = None, CONDA_OPENMS
           script = """
               echo "The script launched by this task will extract the number of scans from the mzML file."
               get-scans-number > ${output_file}
               echo "The number of scans has been saved to ${output_file}"
               echo "Number of scans: $(cat ${output_file})"
           """
           [[[environment]]]
               # The python script will use the $mzml environment
               # variable to get the path of the file.
               mzml = ${MAIN_RESULTS_DIR}/${RAWFILE_STEM}.mzML
               output_file = ${MAIN_RESULTS_DIR}/scans_number.txt

This task will run the :file:`get-scans-number` script and save the output to a file named
:file:`scans_number.txt` in the main results directory. This directory
(:file:`share/cycle/n/dataflow/`) is specific to each cyclepoint ``n``.
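
If a Python script ever needs to locate this directory by itself, a minimal sketch could look like the
one below. It prefers ``MAIN_RESULTS_DIR`` when the task sets it, as the task above does. Otherwise
it rebuilds :file:`share/cycle/n/dataflow/` from ``CYLC_WORKFLOW_SHARE_DIR`` and
``CYLC_TASK_CYCLE_POINT``, which Cylc 8 exports in job environments; the assumption that this
reconstruction matches ``MAIN_RESULTS_DIR`` is ours. The ``dataflow_dir`` helper is hypothetical.

.. code-block:: python

   import os
   from pathlib import Path


   def dataflow_dir(env=None):
       """Return the per-cyclepoint results directory (hypothetical helper)."""
       env = os.environ if env is None else env
       # Prefer the path the task already provides, when it is set.
       if env.get("MAIN_RESULTS_DIR"):
           return Path(env["MAIN_RESULTS_DIR"])
       # Otherwise rebuild share/cycle/<n>/dataflow/ from variables set by Cylc 8.
       share_dir = Path(env["CYLC_WORKFLOW_SHARE_DIR"])
       cycle_point = env["CYLC_TASK_CYCLE_POINT"]
       return share_dir / "cycle" / cycle_point / "dataflow"

Inside the task, ``dataflow_dir() / "scans_number.txt"`` would then point to the same file as
``${output_file}``.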

Adding the task to the graph
============================

Add a new line to the graph string of the :strong:`+P1/P1` recurrence, in the :strong:`[graph]`
section of the workflow definition:

.. code-block:: cylc
   :caption: :file:`flow.cylc`
   :emphasize-lines: 8

   [[graph]]
       R1/^ = validate_cfg => validate_compounds_db & validate_met_model => is_setup
       R1/+P1 = convert_raw => get_instrument => extract_features
       +P1/P1 = """
           is_setup[^] => _catch_raw
           @catch_raw => _catch_raw => convert_raw => get_timestamp &
           trim_spectra => extract_features => annotate => quantify
           convert_raw => get_scans_number
       """

The task will run at every cyclepoint (``/P1``), starting from the second one (``+P1``). It runs after
the :strong:`convert_raw` task because it needs the mzML file that this task generates. No other task
depends on the one we just added.
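
To reason about what this new line changes, the dependencies of the :strong:`+P1/P1` graph string can
be modelled with :class:`graphlib.TopologicalSorter` (Python 3.9+). This is a simplified sketch, not
how Cylc schedules tasks: it keeps only the edges within one cyclepoint, and the ``run_order`` and
``dependents`` helpers are hypothetical. It shows that :strong:`get_scans_number` always comes after
:strong:`convert_raw` and that nothing depends on it.

.. code-block:: python

   from graphlib import TopologicalSorter

   # Intra-cycle edges of the +P1/P1 graph string, as (upstream, downstream).
   # The is_setup[^] inter-cycle dependency and the @catch_raw xtrigger are left out.
   EDGES = [
       ("_catch_raw", "convert_raw"),
       ("convert_raw", "get_timestamp"),
       ("convert_raw", "trim_spectra"),
       ("get_timestamp", "extract_features"),
       ("trim_spectra", "extract_features"),
       ("extract_features", "annotate"),
       ("annotate", "quantify"),
       ("convert_raw", "get_scans_number"),  # the line added above
   ]


   def run_order(edges):
       """Return one execution order that satisfies every dependency."""
       predecessors = {}
       for upstream, downstream in edges:
           predecessors.setdefault(upstream, set())
           predecessors.setdefault(downstream, set()).add(upstream)
       return list(TopologicalSorter(predecessors).static_order())


   def dependents(task, edges):
       """Return the tasks that directly depend on ``task``."""
       return {downstream for upstream, downstream in edges if upstream == task}


   order = run_order(EDGES)
   print(order.index("convert_raw") < order.index("get_scans_number"))  # prints: True
   print(dependents("get_scans_number", EDGES))  # prints: set()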

You can check that the task has been added correctly by running:

.. code-block:: console

   $ cylc graph bioreactor-workflow 0 1

.. figure:: /_static/graphs/added-task-graph.png
:alt: Graph with the new task added
:scale: 50%
:align: center

Testing the new task
====================

Install and start a new run of the workflow, then add a raw file to the :file:`raws/` directory. The task
should start as soon as the :strong:`convert_raw` task has finished, and write a :file:`scans_number.txt`
file to the :file:`cylc-run/bioreactor-workflow/your_run_name/share/cycle/1/dataflow/` directory.

.. code-block:: output
   :caption: :file:`job.out` in logs

   Workflow : bioreactor-workflow/task-added
   Job : 1/get_scans_number/01 (try 1)
   User@Host: [email protected]
   2024-07-22T14:18:50+02:00 INFO - started
   The script launched by this task will extract the number of scans from the mzML file.
   The number of scans has been saved to /Users/elliotfontaine/cylc-run/bioreactor-workflow/task-added/share/cycle/1/dataflow/scans_number.txt
   Number of scans: 35
   2024-07-22T14:18:52+02:00 INFO - succeeded

77 changes: 77 additions & 0 deletions docs/source/development/coding_style.rst
@@ -0,0 +1,77 @@
.. _development.coding-style:

============
Coding style
============

:file:`bin/` scripts: environment variables or command line arguments?
======================================================================

When writing scripts (Python, R, Bash) for the workflow, you have the choice between loading
environment variables from inside the script, or parsing command line arguments.

As a rule of thumb, use environment variables when you don't expect the script to be reused outside
the workflow, and command line arguments with strong input validation when you want to make the script
more portable.
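
For example, a portable variant of a script such as :file:`get-scans-number` could receive the mzML
path as an argument instead of reading ``$mzml``, and reject bad input before doing any work. The
sketch below uses :mod:`argparse` with a custom ``type`` function; ``mzml_path`` and ``build_parser``
are hypothetical names, and the pyOpenMS part is left out.

.. code-block:: python

   import argparse
   from pathlib import Path


   def mzml_path(value):
       """argparse ``type``: accept only an existing file with a .mzML extension."""
       path = Path(value)
       if path.suffix.lower() != ".mzml":
           raise argparse.ArgumentTypeError(f"not an mzML file: {value}")
       if not path.is_file():
           raise argparse.ArgumentTypeError(f"file not found: {value}")
       return path


   def build_parser():
       parser = argparse.ArgumentParser(
           prog="get-scans-number",
           description="Get the number of scans from an mzML file.",
       )
       parser.add_argument("mzml", type=mzml_path, help="path to the mzML file")
       return parser


   # A script would then call build_parser().parse_args() and pass args.mzml
   # to pyOpenMS, as the environment-variable version does with $mzml.

Invalid input makes :mod:`argparse` print an error and exit with status 2, so a Cylc task calling the
script would fail early with a readable message.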

Cylc
====

In general, follow the Cylc :doc:`cylc:workflow-design-guide/style-guide`. When creating tasks,
set the :strong:`[meta]` title and description fields to describe what the task does. You can also
add custom fields such as :strong:`categories`.

Use uppercase for:

* family tasks (notably the conda ones, e.g. :strong:`CONDA_OPENMS`),
* global environment variables set in :strong:`[runtime][root]` and broadcast ones (e.g.
  :strong:`RAWFILE_STEM`).

Use lowercase for:

* local environment variables set in :strong:`[environment]` blocks inside tasks,
* task names.
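
These case rules are simple enough to check automatically. The Python sketch below is a hypothetical
helper, not part of the repository: it reads the :strong:`[runtime]` section of a :file:`flow.cylc`
text with regular expressions, treats every name listed in an ``inherit`` line as a family, and reports
families that are not uppercase and tasks that are not lowercase. It does not use Cylc's own parser,
so Jinja2 logic, ``%include`` files and environment variable names are not checked.

.. code-block:: python

   import re

   SECTION = re.compile(r"^\s*\[([^\[\]]+)\]\s*$")  # e.g. [runtime]
   NAMESPACE = re.compile(r"^\s*\[\[([^\[\]]+)\]\]\s*$")  # e.g. [[trim_spectra]]
   INHERIT = re.compile(r"^\s*inherit\s*=\s*(.+?)\s*$")  # e.g. inherit = None, CONDA_OPENMS


   def naming_problems(flow_text):
       """List task/family names in [runtime] that break the case conventions."""
       in_runtime = False
       namespaces = []
       families = set()
       for line in flow_text.splitlines():
           if m := SECTION.match(line):
               in_runtime = m.group(1).strip() == "runtime"
           elif not in_runtime:
               continue
           elif m := NAMESPACE.match(line):
               namespaces.extend(name.strip() for name in m.group(1).split(","))
           elif m := INHERIT.match(line):
               parents = (p.strip() for p in m.group(1).split(","))
               families.update(p for p in parents if p and p != "None")

       problems = []
       for name in namespaces:
           if name == "root":
               continue  # built-in namespace, lowercase by definition
           if name in families and name != name.upper():
               problems.append(f"family {name} should be uppercase")
           elif name not in families and name != name.lower():
               problems.append(f"task {name} should be lowercase")
       return problems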

Add :strong:`None` before the name of the inherited family task so that the task in question appears at
the root in the TUI or GUI; otherwise, it is nested under the family task. The exceptions are the
InfluxDB tasks, which are always nested under the :strong:`INFLUXDB` family task.


When using global environment variables or Jinja2 template variables to build CLI arguments,
do it in the :strong:`[environment]` block of the task, not in the script itself:

.. code-block:: cylc
   :caption: :file:`flow.cylc`
   :emphasize-lines: 4, 7-9

   [[trim_spectra]]
       inherit = None, CONDA_OPENMS
       script = """
           trimms ${mzml} ${n_start} ${n_end}
       """
       [[[environment]]]
           mzml = ${MAIN_RESULTS_DIR}/${RAWFILE_STEM}.mzML
           n_start = {{ cfg__trim_values[0] }}
           n_end = {{ cfg__trim_values[1] }}
       [[[meta]]]
           title = Trim Spectra
           description = """
               Remove the first `n_start` and last `n_end` scans from the mzML file. This is useful
               if the shape of the flowgram is not stable at the beginning or end of the run.
           """
           categories = bioinformatics

Python
======

Python code should follow the `PEP 8`_ style guide. The `Black`_ code formatter should be used to
automatically format the code.

You should also use a linter / static code analyser such as `Pylint`_ to catch potential bugs,
commented-out code, code smells, etc.

.. _PEP 8: https://peps.python.org/pep-0008/
.. _Black: https://black.readthedocs.io/
.. _Pylint: https://pylint.readthedocs.io/

Bash
====
[TODO]

R
=
[TODO]
28 changes: 28 additions & 0 deletions docs/source/development/index.rst
@@ -0,0 +1,28 @@
.. _development:

===========
Development
===========

This section discusses some of the choices made during the development of the project (coding styles
for the different languages, patterns used in Cylc, etc.).

You'll also find some guidelines on how to add a new task or configuration option to the workflow.

.. note::
   It is assumed that you have a basic understanding of:

   * Cylc,
   * Python, R and Bash.

For further information on Cylc, please consult their :ref:`cylc:user guide`.

.. toctree::
   :maxdepth: 2

   workflow_design
   coding_style
   add_task
   add_config_option


