Merge branch 'master' into jpfeuffer-patch-6

OpenMS · Feb 25, 2024 · 075dff7
2 parents ad83953 + 6a0aab2
commit 075dff7

Showing 40 changed files with 1,931 additions and 62 deletions.
diff --git a/.github/workflows/build-push-notebooks.yaml b/.github/workflows/build-push-notebooks.yaml
@@ -1,9 +1,9 @@
 name: Building latest jupyter Notebooks and push to master+ipynb
 
 on:
-  schedule:
+  #schedule:
     # Trigger 5:30 UTC
-    - cron:  '30 5 * * *'
+  #  - cron:  '30 5 * * *'
   push:
     branches: [ master, merge-workflows ]
   workflow_dispatch:

diff --git a/.github/workflows/code-blocks-linting.yaml b/.github/workflows/code-blocks-linting.yaml
@@ -16,7 +16,7 @@ jobs:
 
     - name: Get changed files
       id: changed-files
-      uses: tj-actions/changed-files@v35
+      uses: tj-actions/changed-files@v41
       with:
         dir_names_max_depth: 0
         files: |

diff --git a/.github/workflows/test-pr.yaml b/.github/workflows/test-pr.yaml
@@ -19,7 +19,7 @@ jobs:
 
     - name: Get changed files
       id: changed-files
-      uses: tj-actions/changed-files@v35
+      uses: tj-actions/changed-files@v41
       with:
           dir_names_max_depth: 0
           files: |

diff --git a/README.md b/README.md
@@ -34,4 +34,4 @@ Are created by CI and stored in master+ipynb to not clutter the master branch.
 
 Binder integration
 =============
-Binder uses the Jupyter Notebooks in master+ipynb. The conda environment is described in environment.yml, the post-build event installs the nightly pyopenms wheel. Currently, only environment.yml is used by binder. Note: You can test a branch "jpfeuffer-patch-6" using https://mybinder.org/v2/gh/OpenMS/pyopenms-docs/jpfeuffer-patch-6 
+Binder uses the Jupyter Notebooks in master+ipynb. The conda environment is described in environment.yml, the post-build event installs the nightly pyopenms wheel. Currently, only environment.yml is used by binder. Note: You can test a branch "jpfeuffer-patch-6" using https://notebooks.gesis.org/binder/v2/gh/OpenMS/pyopenms-docs/jpfeuffer-patch-6 
diff --git a/docs/requirements.txt b/docs/requirements.txt
@@ -1,7 +1,7 @@
 # Defining the exact version will make sure things don't break
 sphinx==6.1.0
 pydata_sphinx_theme
-readthedocs-sphinx-search==0.3.1
+readthedocs-sphinx-search==0.3.2
 sphinx-copybutton==0.5.1
 sphinx-hoverxref
 sphinx-remove-toctrees

diff --git a/docs/source/_templates/navbar-run-binder.html b/docs/source/_templates/navbar-run-binder.html
@@ -1,6 +1,6 @@
 <ul class="navbar-icon-links navbar-nav" aria-label="Custom Icon Links">
   <li class="nav-item">
-    <a href="https://mybinder.org/v2/gh/{{ github_user }}/{{ github_repo }}/{{ github_version }}+ipynb?urlpath=lab/tree/{{ doc_path }}{{ pagename }}.ipynb" class="nav-link" rel="noopener" target="_blank" data-bs-toggle="tooltip" data-bs-original-title="Launch on Binder" data-bs-placement="bottom"><span><i class="fa fa-rocket fa-beat fa-lg"></i></span>
+    <a href="https://notebooks.gesis.org/binder/v2/gh/{{ github_user }}/{{ github_repo }}/{{ github_version }}+ipynb?urlpath=lab/tree/{{ doc_path }}{{ pagename }}.ipynb" class="nav-link" rel="noopener" target="_blank" data-bs-toggle="tooltip" data-bs-original-title="Launch on Binder" data-bs-placement="bottom"><span><i class="fa fa-rocket fa-beat fa-lg"></i></span>
       <label class="sr-only">Launch on Binder</label></a>
   </li>
 </ul>
diff --git a/docs/source/community/build_from_source.rst b/docs/source/community/build_from_source.rst
@@ -35,22 +35,33 @@ Depending on your systems setup, it may make sense to do this inside a virtual e
     virtualenv pyopenms_venv
     source pyopenms_venv/bin/activate
 
-
-Next, configure OpenMS with pyOpenMS: execute ``cmake`` as usual, but with
-parameters ``DPYOPENMS=ON``. Also, if using virtualenv or using a specific
-Python version, add ``-DPYTHON_EXECUTABLE:FILEPATH=/path/to/python`` to ensure
+Next, we will configure the CMake-based OpenMS build system
+to enable the pyOpenMS target with the configuration option ``-DPYOPENMS=ON``.
+If you are using virtualenv or a specific Python version,
+add ``-DPYTHON_EXECUTABLE:FILEPATH=/path/to/python`` to ensure
 that the correct Python executable is used. Compiling pyOpenMS can use a lot of
 memory and take some time, however you can reduce the memory consumption by
 breaking up the compilation into multiple units and compiling in parallel, for
 example ``-DPY_NUM_THREADS=2 -DPY_NUM_MODULES=4`` will build 4 modules with 2
-threads. You can then configure pyOpenMS:
+threads. You can now configure pyOpenMS (inside your build folder) with:
 
 .. code-block:: bash
 
     cmake -DPYOPENMS=ON
-    make pyopenms
+    
+
+Remember that you can pass the other options described above to the first
+command by adding ``-DOPTION=VALUE`` statements if you need them.
+
+Now build pyOpenMS (there should now be pyOpenMS-specific build targets).
+If you are still inside your build folder, you can use "." as the build
+folder parameter.
+
+.. code-block:: bash
+
+    cmake --build $YOURBUILDFOLDER --target pyopenms --config Release
+    
 
-Build pyOpenMS (now there should be pyOpenMS specific build targets).
 Afterwards, test that all went well by running the tests:
 
 .. code-block:: bash

diff --git a/docs/source/user_guide/adduct_detection.rst b/docs/source/user_guide/adduct_detection.rst
@@ -0,0 +1,86 @@
+Adduct Detection
+================
+
+In mass spectrometry it is crucial to ionize analytes prior to detection, because they are accelerated and manipulated in electric fields, allowing their separation based on mass-to-charge ratio.
+This happens by addition of protons in positive mode or loss of protons in negative mode. Other ions present in the buffer solution can ionize the analyte as well, e.g. sodium, potassium or formic acid.
+Depending on the size and chemical composition, multiple adducts can bind, leading to multiple charges on the analyte. In metabolomics, with smaller analytes, the number of charges is typically low (one or two), whereas in proteomics the number of charges is much higher.
+Furthermore, analytes can lose functional groups during ionization, e.g. a neutral water loss.
+Since the ionization happens after liquid chromatography, different adducts for an analyte have similar retention times.
+
+.. image:: img/adduct_detection.png
+
+In pyOpenMS, :py:class:`~.MetaboliteFeatureDeconvolution` takes a :term:`feature map` as input and adds adduct information as additional meta values. Features belonging to an adduct group will be stored in a :term:`consensus map`. The most important parameters are explained in the comments.
+
+| **Input file generation:**
+| The input :term:`feature map` can be obtained using a `feature finder algorithm <feature_detection.html>`_.
+
+| **Suggested follow-up step:**
+| The resulting feature map can be exported to a pandas DataFrame with adduct information from the *dc_charge_adducts* feature meta values.
+| Multiple feature maps can be `combined using the feature linking algorithms <feature_linking.html>`_. Each consensus feature will get a new meta value *best ion* based on the most common annotated adduct within the consensus feature group.
+
+.. code-block:: python
+
+    from urllib.request import urlretrieve
+    import pyopenms as poms
+
+    # get example data file with metabolomics feature map
+    gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master"
+    urlretrieve(gh + "/src/data/MetaboliteFeatureDeconvolution_input.featureXML", "example.featureXML")
+
+    # open example input feature map
+    feature_map = poms.FeatureMap()
+    poms.FeatureXMLFile().load("example.featureXML", feature_map)
+
+    # initialize MetaboliteFeatureDeconvolution
+    mfd = poms.MetaboliteFeatureDeconvolution()
+
+    # get default parameters
+    params = mfd.getDefaults()
+    # update/explain most important parameters
+
+    # adducts to expect: elements, charge and probability separated by colon
+    # the total probability of all charged adducts needs to be 1
+    # e.g. positive mode: 
+    # proton adduct "H:+:0.6", sodium adduct "Na:+:0.4" and neutral water loss "H-2O-1:0:0.2"
+    # e.g. negative mode:
+    # with neutral formic acid adduct: "H-1:-:1", "CH2O2:0:0.5"
+    # multiples don't need to be specified separately:
+    # e.g. [M+H2]2+ and double water loss will be detected as well!
+    # optionally, retention time shifts caused by adducts can be added
+    # e.g. a formic acid adduct causes 3 seconds earlier elution "CH2O2:0:0.5:-3"
+    params.setValue("potential_adducts", ["H:+:0.6", "Na:+:0.4", "H-2O-1:0:0.2"])
+
+    # expected charge range
+    # e.g. for positive mode metabolomics:
+    # minimum of 1, maximum of 3, maximum charge span for a single feature 3
+    # for negative mode:
+    # charge_min = -3, charge_max = -1
+    params.setValue("charge_min", 1, "Minimal possible charge")
+    params.setValue("charge_max", 3, "Maximal possible charge")
+    params.setValue("charge_span_max", 3)
+
+    # retention_max_diff: maximum RT difference between any two features for grouping
+    # retention_max_diff_local: maximum RT difference between two co-features, after adduct shifts have been accounted for
+    # (if you do not have any adduct shifts, this value should be equal to "retention_max_diff")
+    params.setValue("retention_max_diff", 3.0)
+    params.setValue("retention_max_diff_local", 3.0)
+
+    # set updated parameters object
+    mfd.setParameters(params)
+
+    # result feature map: will store features with adduct information
+    feature_map_MFD = poms.FeatureMap()
+    # result consensus map: will store grouped features belonging to a charge group
+    groups = poms.ConsensusMap()
+    # result consensus map: will store paired features connected by an edge
+    edges = poms.ConsensusMap()
+
+    # compute adducts
+    mfd.compute(feature_map, feature_map_MFD, groups, edges)
+
+    # export feature map as pandas DataFrame and append adduct information
+    df = feature_map_MFD.get_df(export_peptide_identifications=False)
+    df["adduct"] = [f.getMetaValue("dc_charge_adducts") for f in feature_map_MFD]
+
+    # display data
+    print(df.head())
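
Reviewer note on the ``potential_adducts`` format used in the new page: the comments describe each entry as elements, charge and probability separated by colons, with an optional retention time shift, and state that the probabilities of all charged adducts must sum to 1. The pure-Python sketch below parses such strings and checks that rule before they are handed to ``params.setValue``. The names ``Adduct``, ``parse_adduct`` and ``charged_probability_sum`` are hypothetical helpers and not part of pyOpenMS. Treating a repeated sign such as ``++`` as a charge of 2 is an assumption about the format.

```python
from typing import NamedTuple, Optional


class Adduct(NamedTuple):
    formula: str                # e.g. "H", "Na", "H-2O-1"
    charge: int                 # signed charge, 0 for neutral adducts/losses
    probability: float
    rt_shift: Optional[float]   # seconds, None if not given


def parse_adduct(spec: str) -> Adduct:
    """Parse 'Elements:Charge:Probability[:RTShift]' (hypothetical helper)."""
    parts = spec.split(":")
    if len(parts) not in (3, 4):
        raise ValueError(f"expected 3 or 4 colon-separated fields: {spec!r}")
    formula, sign, prob = parts[:3]
    if sign == "0":
        charge = 0
    elif sign and set(sign) == {"+"}:
        charge = len(sign)       # assumption: "++" means charge +2
    elif sign and set(sign) == {"-"}:
        charge = -len(sign)
    else:
        raise ValueError(f"unrecognized charge field in {spec!r}")
    rt_shift = float(parts[3]) if len(parts) == 4 else None
    return Adduct(formula, charge, float(prob), rt_shift)


def charged_probability_sum(specs):
    """Sum the probabilities of all adducts that carry a charge."""
    return sum(a.probability for a in map(parse_adduct, specs) if a.charge != 0)


adducts = ["H:+:0.6", "Na:+:0.4", "H-2O-1:0:0.2"]
total = charged_probability_sum(adducts)
if abs(total - 1.0) > 1e-9:
    raise ValueError(f"charged adduct probabilities sum to {total}, expected 1")
```

The neutral water loss is excluded from the sum, which is why the positive-mode example in the page is valid even though all three probabilities add up to 1.2.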
diff --git a/docs/source/user_guide/charge_isotope_deconvolution.rst b/docs/source/user_guide/charge_isotope_deconvolution.rst
@@ -4,7 +4,7 @@ Charge and Isotope Deconvolution
 A single mass spectrum contains measurements of one or more analytes and the
 m/z values recorded for these analytes. Most analytes produce multiple signals
 in the mass spectrometer, due to the natural abundance of carbon :math:`13` (naturally
-occurring at ca. :math:`1%` frequency) and the large amount of carbon atoms in most
+occurring at ca. :math:`1\%` frequency) and the large amount of carbon atoms in most
 organic molecules, most analytes produce a so-called isotopic pattern with a
 monoisotopic peak (all carbon are :chem:`^{12}C`) and a first isotopic peak (exactly one
 carbon atom is a :chem:`^{13}C`), a second isotopic peak (exactly two atoms are :chem:`^{13}C`) etc.

diff --git a/docs/source/user_guide/chemistry.rst b/docs/source/user_guide/chemistry.rst
@@ -137,7 +137,7 @@ The isotope distribution of oxygen and sulfur can be displayed with the followin
     from matplotlib import pyplot as plt
 
 
-    # very simple overlappping correction of annotations
+    # very simple overlapping correction of annotations
     def adjustText(x1, y1, x2, y2):
         if y1 > y2:
             plt.annotate(

diff --git a/docs/source/user_guide/export_files_GNPS.rst b/docs/source/user_guide/export_files_GNPS.rst
@@ -5,7 +5,7 @@ With pyOpenMS you can automatically generate all files needed for GNPS Feature-B
 Ion Identity Molecular Networking (IIMN).
 
 Pre-requisites are your input :term:`mzML` files and a :py:class:`~.ConsensusMap`, generated by an
-`untargeted metabolomics pre-processing workflow <metabolomics_preprocessing.html>`_. 
+`untargeted metabolomics pre-processing workflow <untargeted_metabolomics_preprocessing.html>`_. 
 Ensure that :term:`MS2` data has been mapped to the :py:class:`~.FeatureMap` objects with :py:class:`~.IDMapper`.
 For IIMN adduct detection must have been performed on the :py:class:`~.FeatureMap` 
 objects during pre-processing with :py:class:`~.MetaboliteFeatureDeconvolution`.

diff --git a/docs/source/user_guide/feature_detection.rst b/docs/source/user_guide/feature_detection.rst
@@ -13,7 +13,6 @@ FeatureFinders are available in pyOpenMS:
   - :py:class:`~.FeatureFinderMultiplexAlgorithm` (e.g., :term:`SILAC`, Dimethyl labeling, (and label-free), identification free feature detection of peptides)
   - :py:class:`~.FeatureFinderAlgorithmPicked` (Label-free, identification free feature detection of peptides)
   - :py:class:`~.FeatureFinderIdentificationAlgorithm` (Label-free identification-guided feature detection of peptides)
-  - :py:class:`~.FeatureFinderAlgorithmIsotopeWavelet` (old instruments)
   - :py:class:`~.FeatureFindingMetabo` (Label-free, identification free feature detection of metabolites)
   - :py:class:`~.FeatureFinderAlgorithmMetaboIdent` (Label-free, identification guided feature detection of metabolites)
 

diff --git a/docs/source/user_guide/fragment_spectrum_generation.rst b/docs/source/user_guide/fragment_spectrum_generation.rst
@@ -52,7 +52,6 @@ which you could plot with :py:meth:`pyopenms.plotting.plot_spectrum`, automatica
 
     import matplotlib.pyplot as plt
     from pyopenms.plotting import plot_spectrum
-    import matplotlib.pyplot as plt
 
     plot_spectrum(spec1)
     plt.show()
@@ -122,10 +121,6 @@ which you can again visualize with:
 .. code-block:: python
     :linenos:
 
-    import matplotlib.pyplot as plt
-    from pyopenms.plotting import plot_spectrum
-    import matplotlib.pyplot as plt
-
     plot_spectrum(spec2, annotate_ions=False)
     plt.show()
 

diff --git a/docs/source/user_guide/img/DFPIANGER_theo.png b/docs/source/user_guide/img/DFPIANGER_theo.png
diff --git a/docs/source/user_guide/img/DFPIANGER_theo_full.png b/docs/source/user_guide/img/DFPIANGER_theo_full.png
diff --git a/docs/source/user_guide/img/DFPIANGER_theo_full_mirror.png b/docs/source/user_guide/img/DFPIANGER_theo_full_mirror.png
diff --git a/docs/source/user_guide/img/adduct_detection.png b/docs/source/user_guide/img/adduct_detection.png
diff --git a/docs/source/user_guide/img/after_normalization.png b/docs/source/user_guide/img/after_normalization.png
diff --git a/docs/source/user_guide/img/after_normalization_TIC.png b/docs/source/user_guide/img/after_normalization_TIC.png
diff --git a/docs/source/user_guide/img/before_normalization.png b/docs/source/user_guide/img/before_normalization.png
diff --git a/docs/source/user_guide/img/nlargest.png b/docs/source/user_guide/img/nlargest.png
diff --git a/docs/source/user_guide/img/spec_alignment_1.png b/docs/source/user_guide/img/spec_alignment_1.png
diff --git a/docs/source/user_guide/img/spec_alignment_2.png b/docs/source/user_guide/img/spec_alignment_2.png
diff --git a/docs/source/user_guide/img/spec_averaging.png b/docs/source/user_guide/img/spec_averaging.png
diff --git a/docs/source/user_guide/img/spec_merging_1.png b/docs/source/user_guide/img/spec_merging_1.png
diff --git a/docs/source/user_guide/img/spec_merging_2.png b/docs/source/user_guide/img/spec_merging_2.png
diff --git a/docs/source/user_guide/img/spec_merging_3.png b/docs/source/user_guide/img/spec_merging_3.png
diff --git a/docs/source/user_guide/img/threshold_mower.png b/docs/source/user_guide/img/threshold_mower.png
diff --git a/docs/source/user_guide/img/window_mower.png b/docs/source/user_guide/img/window_mower.png
diff --git a/docs/source/user_guide/index.rst b/docs/source/user_guide/index.rst
@@ -43,9 +43,11 @@ headings and structure.
    smoothing
    centroiding
    spectrum_normalization
+   spectrum_merging
    charge_isotope_deconvolution
    feature_detection
    map_alignment
+   adduct_detection
    feature_linking
    peptide_search
    chromatographic_analysis

diff --git a/docs/source/user_guide/interactive_plots.rst b/docs/source/user_guide/interactive_plots.rst
@@ -94,7 +94,7 @@ Result:
 
 
 With this you can also easily create whole dashboards like the one
-hosted `here <https://mybinder.org/v2/gh/OpenMS/pyopenms-docs/master+ipynb?urlpath=msbokehapps>`_ on a Binder instance.
+hosted `here <https://notebooks.gesis.org/binder/v2/gh/OpenMS/pyopenms-docs/master+ipynb?urlpath=msbokehapps>`_ on a Binder instance.
 If you are reading/executing this on Binder already, execute the next cell to get a link to your current instance.
 
 .. code-block:: python

diff --git a/docs/source/user_guide/ms_data.rst b/docs/source/user_guide/ms_data.rst
@@ -188,7 +188,6 @@ We can also visualize our mass spectrum from before using the :py:func:`~.plot_s
 
     import matplotlib.pyplot as plt
     from pyopenms.plotting import plot_spectrum
-    import matplotlib.pyplot as plt
 
     plot_spectrum(spectrum)
     plt.show()
@@ -639,8 +638,8 @@ But first, we will load some test data:
     oms.MzMLFile().load("test.mzML", inp)
 
 
-Filtering Mass Spectra by :term`MS` Level
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Filtering Mass Spectra by MS Level
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 We will filter the data from ``test.mzML`` file by only retaining
 mass spectra that are not :term:`MS1` spectra
@@ -707,4 +706,111 @@ Similarly we could only retain peaks above a certain
 intensity or keep only the top N peaks in each mass spectrum.
 
 For more advanced filtering tasks pyOpenMS provides special algorithm classes.
-We will take a closer look at some of them in the algorithm section.
+We will take a closer look at some of them in the next section.
+
+
+Filtering Mass Spectra with TOPP Tools
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We can also use predefined TOPP tools to filter our data. First we need to load in the data:
+
+.. code-block:: python
+    :linenos:
+
+    import matplotlib.pyplot as plt
+    from pyopenms.plotting import plot_spectrum, mirror_plot_spectrum
+
+    gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master"
+    urlretrieve(
+        gh + "/src/data/YIC(Carbamidomethyl)DNQDTISSK.mzML", "observed.mzML"
+    )
+
+    exp = oms.MSExperiment()
+    # Load mzML file and obtain spectrum for peptide YIC(Carbamidomethyl)DNQDTISSK
+    oms.MzMLFile().load("observed.mzML", exp)
+
+    # Get first spectrum
+    spectra = exp.getSpectra()
+    observed_spectrum = spectra[0]
+
+The :py:class:`~.WindowMower` tool can be used to remove peaks in a sliding or jumping window. The window size,
+number of highest peaks to keep and move type can be set with a :py:class:`~.Param` object:
+
+.. code-block:: python
+    :linenos:
+
+    from copy import deepcopy
+
+    window_mower_filter = oms.WindowMower()
+
+    # Copy the original spectrum
+    mowed_spectrum = deepcopy(observed_spectrum)
+
+    # Set parameters
+    params = oms.Param()
+    # Defines the m/z range of the sliding window
+    params.setValue("windowsize", 100.0, "")
+    # Defines the number of highest peaks to keep in the sliding window
+    params.setValue("peakcount", 1, "")
+    # Defines the type of window movement: jump (window size steps) or slide (one peak steps)
+    params.setValue("movetype", "jump", "")
+
+    # Apply window mowing
+    window_mower_filter.setParameters(params)
+    window_mower_filter.filterPeakSpectrum(mowed_spectrum)
+
+    # Visualize the resulting data together with the original spectrum
+    mirror_plot_spectrum(observed_spectrum, mowed_spectrum)
+    plt.show()
+
+.. image:: img/window_mower.png
+
+
+Noise can be easily removed with :py:class:`~.ThresholdMower` by setting a threshold value for the intensity of peaks
+and cutting off everything below.
+
+.. code-block:: python
+    :linenos:
+
+    # Copy spectrum
+    threshold_mower_spectrum = deepcopy(observed_spectrum)
+
+    threshold_mower_filter = oms.ThresholdMower()
+
+    # Set parameters
+    params = oms.Param()
+    params.setValue("threshold", 20.0, "")
+
+    # Apply threshold mowing
+    threshold_mower_filter.setParameters(params)
+    threshold_mower_filter.filterPeakSpectrum(threshold_mower_spectrum)
+
+    mirror_plot_spectrum(observed_spectrum, threshold_mower_spectrum)
+    plt.show()
+
+.. image:: img/threshold_mower.png
+
+
+We can also use e.g. :py:class:`~.NLargest` to keep only the N highest peaks in a spectrum.
+
+.. code-block:: python
+    :linenos:
+
+    # Copy spectrum
+    nlargest_spectrum = deepcopy(observed_spectrum)
+
+    nlargest_filter = oms.NLargest()
+
+    # Set parameters
+    params = oms.Param()
+    params.setValue("n", 4, "")
+
+    # Apply N-Largest filter
+    nlargest_filter.setParameters(params)
+    nlargest_filter.filterPeakSpectrum(nlargest_spectrum)
+
+    mirror_plot_spectrum(observed_spectrum, nlargest_spectrum)
+    plt.show()
+    # Two peaks are overlapping, so only three peaks are really visible in the plot
+
+.. image:: img/nlargest.png
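
Reviewer note on the three filters added above: to make their effect concrete without the example mzML file, the sketch below applies the same ideas to a plain list of ``(m/z, intensity)`` tuples. It is an illustrative approximation only, not the pyOpenMS implementation. The function names are hypothetical, and starting the jump windows at the first peak's m/z is an assumption.

```python
def threshold_mower(peaks, threshold):
    """Keep peaks whose intensity is at least `threshold`."""
    return [(mz, inten) for mz, inten in peaks if inten >= threshold]


def n_largest(peaks, n):
    """Keep the n most intense peaks, returned in m/z order."""
    top = sorted(peaks, key=lambda p: p[1], reverse=True)[:n]
    return sorted(top)


def window_mower_jump(peaks, window_size, peak_count):
    """Keep the `peak_count` most intense peaks per non-overlapping m/z window.

    Assumption: windows start at the m/z of the first peak.
    """
    if not peaks:
        return []
    peaks = sorted(peaks)
    start = peaks[0][0]
    windows = {}
    for mz, inten in peaks:
        idx = int((mz - start) // window_size)
        windows.setdefault(idx, []).append((mz, inten))
    kept = []
    for idx in sorted(windows):
        kept.extend(n_largest(windows[idx], peak_count))
    return kept


# Toy spectrum: (m/z, intensity)
peaks = [(100.0, 5.0), (150.0, 30.0), (210.0, 25.0), (260.0, 10.0), (320.0, 40.0)]

above_threshold = threshold_mower(peaks, 20.0)
top_two = n_largest(peaks, 2)
one_per_window = window_mower_jump(peaks, 100.0, 1)
```

On this toy spectrum the threshold and window filters happen to keep the same three peaks, while ``n_largest`` with ``n=2`` keeps only the peaks at m/z 150 and 320.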