
Reproduce Results

Ken Nakatsu edited this page Aug 19, 2023 · 8 revisions

Overview

The results presented in the paper can be broken down into the following parts:

  1. miRNA benchmark
  2. MSA
  3. Conserved fragments
  4. Multi-species benchmark

MSA

Purpose: Show how multi-mapping events and genome duplications can be used to identify potentially important, evolutionarily conserved sequences.

  1. Run the pipeline and generate the counts table.
  2. Search the ref_table file for the license plate ID and its associated merged cluster ID, referred to here as X.
  3. Generate the MSA:
python analyze_srnafrag.py create_msa_plot gtf filtered_counts ref_table merged_id_X outdir
  4. Generate the summary:
python analyze_srnafrag.py create_msa_summary gtf filtered_counts ref_table outdir
  5. Generate the summary with annotated 5' and 3' ends:
python analyze_srnafrag.py create_msa_summary_not_end gtf filtered_counts ref_table outdir
  6. In a graphical editor of your choice, calculate the relative percentage of each category.
  7. Generate pie charts.
  8. Stitch the panels together with graphics software.
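Two steps above are done by hand: looking up the merged cluster ID X in ref_table and computing relative percentages. Both can be scripted. The following is a minimal sketch and is not part of analyze_srnafrag.py. The column names `license_plate` and `merged_id` are assumptions, so check them against the header of your ref_table file.

```python
import csv

def find_merged_id(ref_table_path, license_plate,
                   plate_col="license_plate", merged_col="merged_id"):
    """Return the merged cluster ID (X) for a license plate ID, or None.

    ASSUMPTION: plate_col and merged_col are hypothetical column names;
    replace them with the actual headers of your ref_table file.
    """
    with open(ref_table_path, newline="") as handle:
        for row in csv.DictReader(handle):
            if row[plate_col] == license_plate:
                return row[merged_col]
    return None

def relative_percentages(counts):
    """Convert raw counts per category into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts sum to zero")
    return {category: 100.0 * n / total for category, n in counts.items()}
```

If you prefer Python to a graphical editor, you can pass the resulting values to `matplotlib.pyplot.pie` to draw the pie charts.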

miRNA benchmark

Purpose: Show that peak calling accurately identifies start and end positions and that the pipeline performs with reasonable accuracy. As noted in the paper, miscalled peaks arise from "false peaks" that do not correspond to a miRBase annotation.

  1. The miRNA annotation has already been prepared and is provided in the "pub_fig" directory.
  2. Copy and paste the benchmark config parameters into your config. This is necessary because some config files were changed after benchmarking (the algorithm itself was not changed).
  3. Run the pipeline with this config.
  4. Open the R script miRNA_bench.R in RStudio.
  5. Based on the file paths in the script, organize a directory so that the script can read the specified files (most are included in the "pub_fig" directory).
  6. Some metrics were calculated manually; however, the script reproduces the figures.
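Some metrics were calculated by hand, so a small script can help with cross-checking. Below is a minimal sketch of one possible boundary-accuracy metric. It is hypothetical and not the metric reported in the paper. It assumes the called peaks and the miRBase annotations have already been matched to shared miRNA names and are given as `(start, end)` coordinates.

```python
def boundary_accuracy(called, annotated, tolerance=0):
    """Fraction of annotated miRNAs whose called start AND end both lie
    within `tolerance` nt of the annotation.

    called, annotated: dicts mapping a miRNA name to (start, end).
    ASSUMPTION: name-based matching and this definition of accuracy are
    illustrative only.
    """
    if not annotated:
        return 0.0
    hits = 0
    for name, (a_start, a_end) in annotated.items():
        if name not in called:
            continue  # no peak called for this miRNA counts as a miss
        c_start, c_end = called[name]
        if abs(c_start - a_start) <= tolerance and abs(c_end - a_end) <= tolerance:
            hits += 1
    return hits / len(annotated)
```

Raising `tolerance` shows how many miscalls are off by only a few nucleotides. Peaks that are far off may instead correspond to the "false peaks" described above.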

Conserved Fragments

Purpose: Show the utility of the license plate method proposed by the authors of MINTMap.

  1. Download all samples for each species from the links in the paper.
  2. Process all samples with the pipeline using the default settings.
  3. Run the following command, replacing the keys human1 and human2 with species names and mapping each key to the corresponding merged_counts.csv file:
python analyze_srnafrag.py compare_merged_hamming '{"human1":"/home/usr/data/merged_counts.csv", "human2":"/home/usr/data/run2/merged_counts.csv"}' comp_1.csv 3
  4. This should produce a heatmap and a table of conserved merged_ids.
  5. To obtain minimum free energy (MFE) secondary structures, run the following command. It automatically highlights the portion of the source transcript from which the fragment originated. Ensure that ViennaRNA is installed.
python analyze_srnafrag.py rna_fold merged_1 gtf_filtered.gtf annotated_ref_table.csv filtered_corrected_counts.csv cluster_peak_relationship.csv out/dir/location
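The name compare_merged_hamming and the trailing `3` suggest that fragments are compared across species by Hamming distance, with a maximum of 3 mismatches. That reading is an assumption and is not confirmed on this page. The sketch below illustrates the idea only and is not the pipeline's implementation. The input layout (species → fragment ID → sequence) is hypothetical.

```python
from itertools import product

def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))

def conserved_pairs(species_seqs, max_dist=3):
    """Return cross-species fragment pairs within max_dist mismatches.

    species_seqs: hypothetical dict of {species: {fragment_id: sequence}}.
    Pairs of different lengths are skipped, since Hamming distance is
    only defined for equal-length sequences.
    """
    names = sorted(species_seqs)
    hits = []
    for i, sp1 in enumerate(names):
        for sp2 in names[i + 1:]:
            pairs = product(species_seqs[sp1].items(), species_seqs[sp2].items())
            for (id1, s1), (id2, s2) in pairs:
                if len(s1) != len(s2):
                    continue
                d = hamming(s1, s2)
                if d <= max_dist:
                    hits.append((sp1, id1, sp2, id2, d))
    return hits
```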

Multi-species benchmark

The basics module contains a timing class with three functions. To time a run:

  1. Initialize the object:
timer = timing()
  2. Record a checkpoint with the call below. Think of it as reading a stopwatch that started when the object was created.
timer.take_time("name")
  3. Use any name you like. The output has the form name:time, with each value in its own column.
  4. Place these calls anywhere the basics module is loaded.
  5. Analyze and convert the data with an R package such as lubridate.
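The basics module's own timing class is not reproduced here. To illustrate the stopwatch behavior described above, here is a hypothetical stand-in, `StopwatchSketch`. The class name, the `write_csv` method, and the two-column `name,time` CSV layout are assumptions and not the module's actual API.

```python
import csv
import time

class StopwatchSketch:
    """Hypothetical stand-in for the timing class in the basics module.

    The clock starts when the object is created; take_time(name) records
    the seconds elapsed since then under the given name.
    """

    def __init__(self):
        self._start = time.perf_counter()
        self.records = []  # (name, elapsed_seconds) in call order

    def take_time(self, name):
        elapsed = time.perf_counter() - self._start
        self.records.append((name, elapsed))
        return elapsed

    def write_csv(self, path):
        """Write one row per checkpoint with two columns: name and time."""
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["name", "time"])
            writer.writerows(self.records)
```

Usage: create `timer = StopwatchSketch()` at the start of a run, call `timer.take_time("alignment")` after each stage, then call `timer.write_csv("timings.csv")`. The times are plain elapsed seconds, so they can be read in R without date parsing.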

To generate the figures on out of space maps, use R and the tidyverse:

  1. Use str_detect to map each keyword to an RNA class, e.g., tRNA -> tRNA, miR -> miRNA, snoDB -> snoRNA, etc.
  2. Count the proportion of each class.
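The keyword classification above is done in R with str_detect. The same logic is sketched below in Python only to make it explicit. The rule list mirrors the examples given above and should be extended with your own keywords. Rules are checked in order and the first match wins. Note that Python's `in` performs plain substring matching, whereas str_detect treats the pattern as a regular expression.

```python
from collections import Counter

# Keyword -> RNA class rules taken from the examples above; checked in order.
RULES = [("tRNA", "tRNA"), ("miR", "miRNA"), ("snoDB", "snoRNA")]

def classify(annotation, rules=RULES, default="other"):
    """Return the RNA class of the first keyword found in the annotation."""
    for keyword, rna_class in rules:
        if keyword in annotation:
            return rna_class
    return default

def class_proportions(annotations, rules=RULES):
    """Fraction of annotations assigned to each RNA class."""
    counts = Counter(classify(a, rules) for a in annotations)
    total = sum(counts.values())
    return {rna_class: n / total for rna_class, n in counts.items()}
```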