Merge pull request #614 from subinamehta/clinical-mp

Add clinicalmp discovery workflow
galaxyproject · Dec 9, 2024 · 5ed1732
2 parents 490160d + d66c8c4
commit 5ed1732

Showing 6 changed files with 1,411 additions and 0 deletions.
diff --git a/workflows/proteomics/clinicalmp/clinicalmp-discovery/.dockstore.yml b/workflows/proteomics/clinicalmp/clinicalmp-discovery/.dockstore.yml
@@ -0,0 +1,11 @@
+version: 1.2
+workflows:
+- name: main
+  subclass: Galaxy
+  publish: true
+  primaryDescriptorPath: /iwc-clinicalmp-discovery-workflow.ga
+  testParameterFiles:
+  - /iwc-clinicalmp-discovery-workflow-tests.yml
+  authors:
+  - name: Subina Mehta
+    orcid: 0000-0001-9818-0537
diff --git a/workflows/proteomics/clinicalmp/clinicalmp-discovery/CHANGELOG.md b/workflows/proteomics/clinicalmp/clinicalmp-discovery/CHANGELOG.md
@@ -0,0 +1,4 @@
+# Changelog
+
+## [0.1] 2024-11-18
+First release.
diff --git a/workflows/proteomics/clinicalmp/clinicalmp-discovery/README.md b/workflows/proteomics/clinicalmp/clinicalmp-discovery/README.md
@@ -0,0 +1,25 @@
+# Clinical Metaproteomics 2: Discovery
+
+Discovery in clinical metaproteomics is greatly enhanced by using a well-curated database, particularly one generated with the **MetaNovo tool**. This tool creates a manageable and streamlined database by identifying proteins relevant to the dataset, reducing the complexity of downstream analysis. For optimal results, the MetaNovo-generated database can be merged with reviewed proteins from **Human SwissProt** and known contaminants from the **cRAP (common Repository of Adventitious Proteins)** database, resulting in a compact yet comprehensive database of approximately 21,200 protein sequences. This refined database serves as the foundation for peptide identification, where mass spectrometry (MS) data is matched against the database to identify relevant peptides efficiently and accurately. By reducing redundancy and focusing on clinically relevant sequences, this approach improves the discovery of biomarkers and key protein insights, allowing researchers to extract meaningful biological information with reduced noise and false positives. This streamlined process is particularly valuable in clinical studies, where precision and relevance are critical for advancing diagnostics and therapeutic research.
+
+In this workflow, we perform discovery using the SearchGUI/PeptideShaker and MaxQuant tools. A Galaxy Training Network (GTN) tutorial has been developed for this workflow:
+[https://training.galaxyproject.org/training-material/topics/proteomics/tutorials/clinical-mp-2-discovery/tutorial.html](https://training.galaxyproject.org/training-material/topics/proteomics/tutorials/clinical-mp-2-discovery/tutorial.html)
+
+## Input datasets
+
+- `MSMS datasets`: a dataset collection of RAW files
+- `Databases for discovery` in FASTA format (protein sequences for database searching)
+- `Experimental-Design Discovery MaxQuant` in tabular format
+
+## Input values
+
+For MaxQuant and SearchGUI/PeptideShaker:
+- Peptide Length
+- Variable modifications
+- Labeled element
+
+
+## Processing
+
+- Extract microbial proteins and peptides using text formatting tools
+- Group duplicates using the Group tool
diff --git a/...ws/proteomics/clinicalmp/clinicalmp-discovery/iwc-clinicalmp-discovery-workflow-tests.yml b/...ws/proteomics/clinicalmp/clinicalmp-discovery/iwc-clinicalmp-discovery-workflow-tests.yml
@@ -0,0 +1,24 @@
+- doc: Test outline for iwc-clinicalmp-discovery-workflow
+  job:
+    Human UniProt Microbial Proteins from MetaNovo and cRAP:
+      class: File
+      location: https://zenodo.org/records/10720030/files/Human_UniProt_Microbial_Proteins_(from_MetaNovo)_and_cRAP.fasta
+      filetype: fasta
+    Experimental Design Discovery MaxQuant:
+      class: File
+      path: test-data/Experimental Design Discovery MaxQuant.tabular
+      filetype: tabular
+    Tandem Mass Spectrometry MSMS files:
+      class: Collection
+      collection_type: list
+      elements:
+      - class: File
+        identifier: PTRC_Skubitz_Plex2_F10_9Aug19_Rage_Rep-19-06-08.raw
+        location: https://zenodo.org/records/14182981/files/PTRC_Skubitz_Plex2_F10_9Aug19_Rage_Rep-19-06-08.raw
+  outputs:
+    SGPS MQ Peptides:
+      asserts:
+        - has_n_columns:
+            n: 1
+        - has_text: 
+            text: "AAFPNVTAMNITTNNGK"
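
The two output assertions in the test file above (`has_n_columns` with `n: 1` and `has_text`) can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not Galaxy's own implementation: the function names only mirror the assertion names for readability, the column check is assumed to look at the first tab-separated line, and the sample output is hypothetical apart from the peptide string taken from the test.

```python
def has_n_columns(text: str, n: int, sep: str = "\t") -> bool:
    """Return True if the first line of `text` splits into exactly `n` fields.

    Assumption: only the first line is inspected and fields are tab-separated.
    """
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    return len(first_line.split(sep)) == n


def has_text(text: str, needle: str) -> bool:
    """Return True if `needle` occurs anywhere in `text`."""
    return needle in text


# Hypothetical single-column peptide list; only the first peptide comes
# from the workflow test, the second line is a made-up placeholder.
sample_output = "AAFPNVTAMNITTNNGK\nPLACEHOLDERPEPTIDE\n"

checks_pass = (
    has_n_columns(sample_output, 1)
    and has_text(sample_output, "AAFPNVTAMNITTNNGK")
)
print("assertions pass:", checks_pass)
```

A single-column check like this is a cheap way to catch an output that accidentally carries extra annotation columns, while the text check confirms that one known peptide survived the filtering steps.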