Reproduce Results

Ken Nakatsu edited this page Aug 8, 2023 · 8 revisions

Overview

The results presented in the paper can be broken down into the following parts:

  1. miRNA benchmark
  2. Multi-species benchmark
  3. Conserved fragments

This page covers two of these parts: the miRNA benchmark and the conserved fragments. The multi-species timing benchmark can be reproduced using the time class in the basics.py file.
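As a rough illustration of how per-species run times could be collected, here is a minimal sketch using only the Python standard library. It is not the time class from basics.py, whose interface is not described on this page; the `timed` helper, the `results` dictionary, and the placeholder workload are all assumptions made for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent in the enclosed block under `label`.

    Hypothetical helper: a stand-in for whatever basics.py provides.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


results = {}

# Placeholder workload; in practice this would be one pipeline run per species.
with timed("human", results):
    sum(range(100_000))

for label, seconds in results.items():
    print(f"{label}: {seconds:.4f} s")
```

Wrapping each species' run in its own `with` block keeps the timings in a single dictionary that can be written out or plotted afterwards.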

miRNA benchmark

Purpose: The miRNA benchmark shows that peak calling can accurately identify start and end positions and that the pipeline works with reasonable accuracy. As noted in the paper, the peaks that are miscalled are miscalled because there are "false peaks" that do not correspond to the miRBase annotation.

  1. The miRNA annotation has already been prepared and is in the "pub_fig" directory.
  2. Copy and paste the config parameters used for the benchmark. This is needed because some config files were changed after the benchmarking (the algorithm itself was not changed).
  3. Run the pipeline with this config.
  4. Open the R script file, miRNA_bench.R, in RStudio.
  5. Organize a directory so that the script can read the files it specifies (most of which are included in the pub_fig directory).
  6. Some of the metrics were calculated manually. The figures, however, will be produced by the script.
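Because some metrics were calculated by hand, it may help to see one way a start/end accuracy figure could be computed. The sketch below is an assumption, not the calculation used in the paper: the function name `boundary_accuracy`, the `tolerance` parameter, the input layout, and the toy coordinates are all hypothetical.

```python
def boundary_accuracy(called, annotated, tolerance=0):
    """Fraction of annotated intervals matched by at least one called peak.

    A called peak matches when both its start and its end lie within
    `tolerance` nucleotides of the annotated start and end.
    Hypothetical metric for illustration only.
    """
    if not annotated:
        return 0.0
    hits = 0
    for name, (a_start, a_end) in annotated.items():
        peaks = called.get(name, [])
        if any(
            abs(start - a_start) <= tolerance and abs(end - a_end) <= tolerance
            for start, end in peaks
        ):
            hits += 1
    return hits / len(annotated)


# Toy coordinates, made up for this example (not taken from miRBase).
annotated = {"mir-A": (10, 31), "mir-B": (50, 72)}
called = {"mir-A": [(10, 31)], "mir-B": [(48, 72), (90, 110)]}

exact = boundary_accuracy(called, annotated)                   # 0.5
within_2 = boundary_accuracy(called, annotated, tolerance=2)   # 1.0
print(exact, within_2)
```

In this toy input the extra peak at (90, 110) plays the role of a "false peak": it has no annotated counterpart and does not count towards the score.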

Conserved Fragments

Purpose: To show the utility of the license-plating method proposed by the authors of MINTmap.

  1. Download all samples for each species from the links in the paper.
  2. Process all samples with the pipeline, using the default settings.
  3. Run the following command, replacing the keys human1 and human2 with species names and associating each key with the corresponding merged_counts.csv file:
     python analyze_srnafrag.py compare_merged_hamming '{"human1":"/home/usr/data/merged_counts.csv", "human2":"/home/usr/data/run2/merged_counts.csv"}' comp_1.csv 3
  4. This should produce a heatmap and a table of the merged_ids that are conserved.
  5. To obtain MFE secondary structures, run the following command. It automatically highlights the portion of the source transcript that the fragment originated from. Note: ensure that ViennaRNA is installed.
     python analyze_srnafrag.py rna_fold merged_1 gtf_filtered.gtf annotated_ref_table.csv filtered_corrected_counts.csv cluster_peak_relationship.csv out/dir/location
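The JSON argument of the compare_merged_hamming command above is easy to mistype when several species are involved. A minimal sketch of assembling it in Python follows. The helper name `build_compare_command` is hypothetical, the paths are the example paths from the command above, and the final argument (3) is passed through unchanged because its meaning is not documented on this page.

```python
import json
import shlex


def build_compare_command(counts_by_key, out_csv, last_arg):
    """Assemble the compare_merged_hamming call as an argument list.

    Hypothetical helper: it only mirrors the command shown on this page
    and makes no assumption about what the final argument means.
    """
    mapping = json.dumps(counts_by_key)  # serialize the key-to-path mapping
    return [
        "python", "analyze_srnafrag.py", "compare_merged_hamming",
        mapping, out_csv, str(last_arg),
    ]


cmd = build_compare_command(
    {
        "human1": "/home/usr/data/merged_counts.csv",
        "human2": "/home/usr/data/run2/merged_counts.csv",
    },
    "comp_1.csv",
    3,
)

# Print a shell-quoted version that can be pasted into a terminal.
print(shlex.join(cmd))

# To execute directly instead of printing:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with the braces and double quotes in the JSON string.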