MS2 Autoencoder is built on Keras for Python. The purpose of MS2 Autoencoder is to create a generalized model of MS2 spectra so that any low-quality spectrum can be upscaled to a high-quality spectrum (with quality based on precursor intensity). The direct general application of this tool is denoising spectra.
- sklearn
- pyteomics
- h5py
- keras autoencoder tutorial
- tensorflow (tensorflow-gpu or tensorflow*)
- *tensorflow-gpu worked on version 1.14 with CUDA version 10.0
- Extract MS2 data from mzXML/mzML files
- Stitch all extracted data files (.npz) into an HDF5 file (.hdf5)
- Train models: autoencoder, deep autoencoder, convolutional neural network, variational autoencoder, LSTM
- Evaluate and predict test data on the models
- Achieve spectra upscaling/denoising
- In MS2-Autoencoder/bin/main.py, import extract_mzxml as em
- The else statement in main.py contains the entire top-to-bottom flow of mzXML data extraction
- This step should be run on the cluster with nohup and NextFlow to gather all of the data
- The Makefile includes instructions for NextFlow to run main.py on all QExactive data on GNPS (Nov 2019)
- This step outputs several files per input mzXML/mzML, including ready_array.npz, which holds metadata about each spectra pair, and ready_array2.npz, which holds the actual vectorized data
- Use scp to transfer the extracted outdirs from the cluster to local (it is advised to remove the .json files from each outdir first)
- Only ready_array2.npz (or another .npz file) is needed for stitching
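As a sketch of what the transferred outputs look like, the arrays in one outdir can be inspected with NumPy. Only the file name ready_array2.npz is documented above; the internal array names are not, so this helper (its name is ours) returns everything in the archive:

```python
import os
import numpy as np

def load_pair_arrays(npz_path):
    """Load every named array from one extraction output (.npz).

    The archive's internal array names are undocumented here,
    so the contents are returned as a dict for inspection.
    """
    with np.load(npz_path, allow_pickle=True) as archive:
        return {name: archive[name] for name in archive.files}

# Hypothetical usage against one transferred outdir
path = os.path.join("outdir", "ready_array2.npz")
if os.path.exists(path):
    for name, arr in load_pair_arrays(path).items():
        print(name, arr.shape)
```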
- In MS2-Autoencoder/bin/processing.py
- Specify the path to the parent directory of all outdirs and the name of the data file (e.g. 'ready_array2.npz' to merge all of the actual paired spectra vectors)
- processing.py will concatenate all .npz files and output two .hdf5 files:
  - an autoencoder-structured dataset
  - a 1D convolutional neural network-structured dataset
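A minimal sketch of the stitching step with h5py, assuming each outdir holds one copy of the named .npz file; the HDF5 dataset name ("spectra") and the fallback to the archive's first array are our assumptions, not the repo's layout:

```python
import glob
import os

import h5py
import numpy as np

def stitch_npz_to_hdf5(parent_dir, data_filename, out_path, key=None):
    """Concatenate one named array from every outdir into a single HDF5 dataset."""
    paths = sorted(glob.glob(os.path.join(parent_dir, "*", data_filename)))
    with h5py.File(out_path, "w") as h5:
        dset = None
        for p in paths:
            with np.load(p, allow_pickle=True) as archive:
                # if no key is given, take the first array in the archive
                block = archive[key or archive.files[0]]
            if dset is None:
                # resizable along axis 0 so later files can be appended
                dset = h5.create_dataset(
                    "spectra", data=block,
                    maxshape=(None,) + block.shape[1:], chunks=True)
            else:
                dset.resize(dset.shape[0] + block.shape[0], axis=0)
                dset[-block.shape[0]:] = block
    return out_path
```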
- Model architectures are outlined in ms2-autoencoder.py, ms2-conv1d.py, and ms2-deepautoencoder.py
- Generators, training, evaluation, prediction, and all model architectures are in ms2_model.py
- In train_models.py, import ms2_model.py
- Trained models are saved as .h5 files with both architecture and weights
- The model training function is built on tensorflow-gpu, with GPU memory allocation and session declaration
- Model training can be done on a local or cluster machine
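The training step might look like the following minimal sketch using tf.keras; the layer sizes, loss, input dimension, and output file name are illustrative assumptions, not the actual architectures in ms2-autoencoder.py:

```python
import numpy as np
from tensorflow import keras

def build_autoencoder(input_dim=2000, latent_dim=64):
    """Toy dense autoencoder; the real architectures live in ms2-autoencoder.py etc."""
    inputs = keras.Input(shape=(input_dim,))
    x = keras.layers.Dense(256, activation="relu")(inputs)
    latent = keras.layers.Dense(latent_dim, activation="relu")(x)
    x = keras.layers.Dense(256, activation="relu")(latent)
    outputs = keras.layers.Dense(input_dim, activation="sigmoid")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

if __name__ == "__main__":
    model = build_autoencoder(input_dim=100)
    # Illustrative stand-ins for the low/high quality spectra pairs
    low = np.random.rand(64, 100).astype("float32")
    high = np.random.rand(64, 100).astype("float32")
    model.fit(low, high, epochs=1, batch_size=16, verbose=0)
    model.save("autoencoder.h5")  # saves architecture and weights together
```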
- Jupyter/keras load validate.ipynb is the Jupyter notebook for loading models and visualizing predictions
- The model prediction function is built on tensorflow-gpu, with GPU memory allocation and session declaration
- Ideally, cosine proximity is closer to 1.0 than to 0.0
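The cosine proximity mentioned above can be checked directly on a model's predictions; here is a minimal NumPy sketch (the helper name is ours, not the notebook's):

```python
import numpy as np

def cosine_proximity(a, b):
    """Row-wise cosine similarity between predicted and target spectra.

    Returns values in [-1, 1]; 1.0 means a predicted spectrum points in
    exactly the same direction as its high-quality target.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    num = np.sum(a * b, axis=-1)
    denom = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    # guard against all-zero spectra to avoid division by zero
    return num / np.maximum(denom, 1e-12)

# A spectrum compared with itself scores ~1.0
print(cosine_proximity([1.0, 2.0, 0.0], [1.0, 2.0, 0.0]))
```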