To simplify the detection of quantitative trait loci (QTL) from whole-genome sequencing (WGS) data with the bulk segregant analysis (BSA) method, we developed a software tool named QTLspyer, with a user-friendly graphical interface built in R Shiny. QTL detection is divided into two steps: variant calling and QTL finding. In the first step, a Python script calls single nucleotide variants (SNVs) with the Genome Analysis Toolkit (GATK). In the second step, the probabilities of potential QTL are estimated with the G' method and the QTL-seq approach, using the R package QTLseqr. Results are presented to the user as data tables with the statistical properties of SNPs and QTL, and graphically as plots of QTL probabilities across all genome positions. The application, together with all required tools, is packaged in a Docker image. We demonstrated the accuracy of this approach by re-analysing datasets from published studies.
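QTLseqr computes these statistics in R. As a rough illustration of what the two approaches start from, the Python sketch below computes the per-SNP quantities from allele depths, assuming REF and ALT read counts are already known for a high and a low bulk. The function names are hypothetical and this is not QTLseqr's code. The SNP-index is the fraction of ALT reads in a bulk, the ΔSNP-index is its difference between the bulks, and G is the G-test statistic for the 2x2 table of allele counts by bulk; the G' method then smooths G along each chromosome.

```python
import math


def snp_index(ref: int, alt: int) -> float:
    """QTL-seq SNP-index: fraction of reads supporting the alternate allele."""
    depth = ref + alt
    if depth == 0:
        raise ValueError("position has no read coverage")
    return alt / depth


def delta_snp_index(ref_high: int, alt_high: int, ref_low: int, alt_low: int) -> float:
    """Delta SNP-index: high-bulk SNP-index minus low-bulk SNP-index."""
    return snp_index(ref_high, alt_high) - snp_index(ref_low, alt_low)


def g_statistic(ref_high: int, alt_high: int, ref_low: int, alt_low: int) -> float:
    """G-test statistic for the 2x2 table of allele counts (rows: bulks, columns: alleles)."""
    observed = [[ref_high, alt_high], [ref_low, alt_low]]
    total = ref_high + alt_high + ref_low + alt_low
    row_sums = [sum(row) for row in observed]
    col_sums = [observed[0][j] + observed[1][j] for j in range(2)]
    g = 0.0
    for i in range(2):
        for j in range(2):
            n = observed[i][j]
            if n > 0:  # empty cells contribute 0 (0 * ln 0 is taken as 0)
                expected = row_sums[i] * col_sums[j] / total
                g += n * math.log(n / expected)
    return 2.0 * g


if __name__ == "__main__":
    # Hypothetical allele depths at one SNP: the high bulk is enriched for ALT.
    print(delta_snp_index(ref_high=5, alt_high=25, ref_low=20, alt_low=10))
    print(g_statistic(ref_high=5, alt_high=25, ref_low=20, alt_low=10))
```

Identical allele ratios in both bulks give ΔSNP-index = 0 and G = 0; the more the bulks differ, the larger both values become.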

How to build

The app can be built in two main ways. The first (recommended) method is to save the appropriate start script in a designated folder and run it there. The second method is to clone this repository to a local computer and build the Docker image with docker build -t hudogriz/qtl_spyer:latest . (the final dot is part of the command), executed in the folder that contains the Dockerfile. Both methods require Docker.

To simplify the process, scripts for starting the app on Windows 10 and Linux have been provided.


Windows 10

  1. Install Docker Desktop for Windows (skip this step if Docker is already installed).
  2. Create a designated folder (recommended: use a lowercase name and create it directly on the C: drive).
  3. Download Run_qtlspyer_on_windows10.bat from here.
  4. Place Run_qtlspyer_on_windows10.bat into the created folder.
  5. Run Run_qtlspyer_on_windows10.bat as an administrator.
  6. (Optional) Open localhost:3838 in your web browser.


Linux

  1. Install Docker for Linux (skip this step if Docker is already installed).
  2. Create a designated folder.
  3. Download the Linux start script from here into the created folder.
  4. Make the script executable with chmod +x.
  5. Run the script.
  6. (Optional) Open localhost:3838 in your web browser.


Manual build

  1. Download the repository as a zip file or clone it with git clone git@github.com:HudoGriz/QTLspyer.git.
  2. In the folder that contains the Dockerfile, build the Docker image with docker build -t hudogriz/qtl_spyer:latest . (including the final dot).
  3. Launch the container with docker run -d --rm --init -p 3838:3838 --name qtl_spy -v $(pwd)/QTLspyer/:/QTLspyer hudogriz/qtl_spyer:latest.
  4. On Windows, use docker run -d --rm --init -p 3838:3838 --name qtl_spy -v %~dp0\QTLspyer\:\QTLspyer hudogriz/qtl_spyer:latest instead. %~dp0 expands to the folder of the running script, so this form only works inside a .bat file.
  5. Open localhost:3838 in your web browser.
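The same docker run call can be assembled from Python, which sidesteps the difference between $(pwd) and %~dp0. This is a minimal sketch, not a script shipped with QTLspyer: it assumes it is run from the folder that contains the QTLspyer project folder (as the -v mounts in steps 3 and 4 imply), and the helper name, the file name launch_qtlspyer.py and the --run switch are hypothetical. By default it only prints the command.

```python
import subprocess
import sys
from pathlib import Path

IMAGE = "hudogriz/qtl_spyer:latest"


def docker_run_command(project_dir: Path, port: int = 3838, name: str = "qtl_spy") -> list[str]:
    """Build the argument list for `docker run`, mirroring step 3 above."""
    host_dir = Path(project_dir).resolve()  # bind mounts need an absolute host path
    return [
        "docker", "run", "-d", "--rm", "--init",
        "-p", f"{port}:3838",            # host port -> app port inside the container
        "--name", name,
        "-v", f"{host_dir}:/QTLspyer",   # project folder mounted at /QTLspyer
        IMAGE,
    ]


if __name__ == "__main__":
    cmd = docker_run_command(Path("QTLspyer"))
    print(" ".join(cmd))  # display only; paths with spaces are not quoted here
    # Start the container only when asked explicitly, e.g. `python launch_qtlspyer.py --run`.
    if "--run" in sys.argv[1:]:
        subprocess.run(cmd, check=True)
        print("Open http://localhost:3838 in your web browser.")
```

Passing an argument list to subprocess.run avoids shell quoting, and pathlib produces a valid absolute host path on both Linux and Windows.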

Side menu

The side menu enables switching between the app tabs.

Variant calling tab

Tab for customization and execution of the variant calling pipelines.

FastQC report tab

Tab for viewing the generated FastQC reports.
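FastQC also writes a *_fastqc.zip archive next to each HTML report (see output/fastqc in the folder structure), and each archive contains a summary.txt with one tab-separated PASS/WARN/FAIL line per module. A minimal sketch for listing failed modules without opening every report, assuming that summary.txt layout; the function names are hypothetical.

```python
import zipfile
from pathlib import Path


def read_fastqc_summary(zip_path: Path) -> dict[str, str]:
    """Return {module name: PASS/WARN/FAIL} from one FastQC *_fastqc.zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        members = [n for n in archive.namelist() if n.endswith("/summary.txt")]
        if not members:
            raise ValueError(f"no summary.txt in {zip_path}")
        text = archive.read(members[0]).decode("utf-8")
    summary = {}
    for line in text.splitlines():
        if line.strip():
            # Each line: STATUS <tab> module name <tab> input file name
            status, module, _ = line.split("\t", 2)
            summary[module] = status
    return summary


def failed_modules(fastqc_dir: Path) -> dict[str, list[str]]:
    """Map each FastQC archive in a folder to the modules it marks as FAIL."""
    return {
        zip_path.name: [m for m, s in read_fastqc_summary(zip_path).items() if s == "FAIL"]
        for zip_path in sorted(Path(fastqc_dir).glob("*_fastqc.zip"))
    }


if __name__ == "__main__":
    fastqc_dir = Path("QTLspyer/output/fastqc")
    if fastqc_dir.is_dir():
        for name, modules in failed_modules(fastqc_dir).items():
            print(name, ", ".join(modules) or "no failed modules")
```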

VCF filtering tab

Tab for filtering SNPs.
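To illustrate what SNP filtering before a BSA analysis typically involves, here is a minimal sketch based on read depth and overall reference-allele frequency. The record type, parameter names and default thresholds are hypothetical and are not the settings of QTLspyer's filtering tab.

```python
from dataclasses import dataclass


@dataclass
class SnpRecord:
    """Allele depths at one SNP in the high and low bulk (hypothetical record type)."""
    chrom: str
    pos: int
    ref_high: int
    alt_high: int
    ref_low: int
    alt_low: int


def passes_filters(
    snp: SnpRecord,
    min_total_depth: int = 40,   # illustrative thresholds, not QTLspyer defaults
    max_total_depth: int = 400,
    min_sample_depth: int = 10,
    ref_allele_freq: float = 0.2,
) -> bool:
    depth_high = snp.ref_high + snp.alt_high
    depth_low = snp.ref_low + snp.alt_low
    total = depth_high + depth_low
    # Drop poorly covered SNPs and suspiciously deep ones (e.g. repeats, duplications).
    if total == 0 or not (min_total_depth <= total <= max_total_depth):
        return False
    # Each bulk needs enough reads on its own for a meaningful allele ratio.
    if min(depth_high, depth_low) < min_sample_depth:
        return False
    # Drop SNPs where one allele dominates both bulks (likely not segregating).
    ref_freq = (snp.ref_high + snp.ref_low) / total
    return ref_allele_freq <= ref_freq <= 1 - ref_allele_freq
```

Usage: [s for s in snps if passes_filters(s)] keeps only the records that pass every threshold.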

Data visualization tab

Tab for running the QTL estimation. The results are presented below as interactive plots.

Results tables

Tables with numerical results.

P-value plot

Plot of p-values along the genome positions of each chromosome. The example data come from a QTL study on yeast by Pačnik et al. (2021).
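In the G' method, the per-SNP G statistic is smoothed along each chromosome with a tricube kernel, and p-values are then estimated from the smoothed G' values. QTLseqr handles this in R; the Python sketch below only illustrates a tricube-weighted local average, assuming positions in bp and a window size chosen by the user (the function name is hypothetical). It is O(n²) and written for clarity, not speed.

```python
def tricube_smooth(positions: list[int], values: list[float], window: float) -> list[float]:
    """Tricube-weighted local average of `values` along one chromosome.

    Points farther than `window` (in bp) from the centre get zero weight;
    the centre itself always has weight 1, so the denominator is never zero.
    """
    if len(positions) != len(values):
        raise ValueError("positions and values must have the same length")
    if window <= 0:
        raise ValueError("window must be positive")
    smoothed = []
    for centre in positions:
        weighted_sum = 0.0
        weight_total = 0.0
        for pos, value in zip(positions, values):
            d = abs(pos - centre) / window
            if d < 1.0:
                w = (1.0 - d ** 3) ** 3  # tricube kernel
                weighted_sum += w * value
                weight_total += w
        smoothed.append(weighted_sum / weight_total)
    return smoothed
```

For example, tricube_smooth([0, 10, 20], [0.0, 3.0, 0.0], 15) lowers the isolated peak at position 10 and spreads part of it to the neighbouring positions, keeping the profile symmetric.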



The app consists of Python and R code. The variant calling pipelines are written in Python, and the command lines for running the bioinformatics tools are defined as methods of a class.
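A minimal sketch of that design, assuming each method builds the argument list for one tool so that the pipeline can pass it to subprocess.run. The class and method names and the selection of tools are hypothetical and do not reproduce QTLspyer's actual code; the FastQC and GATK options used (--threads, --outdir, -R, -I, -V, -O, --select-type-to-include, -F, -GF) are standard options of those tools.

```python
from pathlib import Path


class ToolCommands:
    """Hypothetical example: each method returns the command line for one tool."""

    def __init__(self, reference: Path, threads: int = 4):
        self.reference = Path(reference)
        self.threads = threads

    def fastqc(self, fastq: Path, out_dir: Path) -> list[str]:
        # Quality report (.html & .zip) for one read file.
        return ["fastqc", "--threads", str(self.threads), "--outdir", str(out_dir), str(fastq)]

    def haplotype_caller(self, bam: Path, out_vcf: Path) -> list[str]:
        # Germline variant calling on an aligned sample.
        return ["gatk", "HaplotypeCaller", "-R", str(self.reference),
                "-I", str(bam), "-O", str(out_vcf)]

    def select_snps(self, vcf: Path, out_vcf: Path) -> list[str]:
        # Keep only SNPs from the called variants.
        return ["gatk", "SelectVariants", "-V", str(vcf),
                "--select-type-to-include", "SNP", "-O", str(out_vcf)]

    def variants_to_table(self, vcf: Path, out_table: Path) -> list[str]:
        # Flatten the VCF into a table with per-sample allele depths.
        return ["gatk", "VariantsToTable", "-V", str(vcf),
                "-F", "CHROM", "-F", "POS", "-F", "REF", "-F", "ALT",
                "-GF", "AD", "-GF", "DP", "-GF", "GQ", "-O", str(out_table)]
```

A pipeline step would then call, for example, subprocess.run(tools.fastqc(fastq, out_dir), check=True). Returning argument lists rather than shell strings avoids quoting problems with paths that contain spaces.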

Folder structure

QTLspyer                            # Root project folder                   
├── shiny                           # Code for the R shiny application
├── variant_calling                 # Python scripts for variant calling pipeline
├── log                             # Process status reports and standard outputs from tools
├── input
│   ├── adapters                    # Expects adapters for sequence trimming (.fasta)
│   ├── annotation                  # Expects genome annotation (.gtf)
│   ├── references                  # Expects genome references (.fasta & .vcf)
│   └── sample_data                 # Expects sample read sequences (.fastq)
└── output                          # Output files
    ├── aligned                     # Aligned sequences (.bam)
    ├── fastqc                      # FastQC Quality reports (.html & .zip)
    ├── trimmed                     # Trimmed read sequences (.fastq)
    └── GATK                        # Output of GATK tools
        ├── VCFs                    # Germline Variant files and their indexes (.vcf & .tbi)
        ├── nonfiltered             # Unfiltered SNPs selected out of variants and their indexes (.vcf & .idx)
        └── tables                  # VCF files transformed to tables (.snps.table)
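Before a first run it can help to create the input and output folders from the tree and check that each input folder holds a file of the expected type. A minimal sketch, assuming the layout shown above; the function names are hypothetical, and QTLspyer may create the output folders itself.

```python
from pathlib import Path

# Folders from the tree above, relative to the QTLspyer root.
FOLDERS = [
    "log",
    "input/adapters", "input/annotation", "input/references", "input/sample_data",
    "output/aligned", "output/fastqc", "output/trimmed",
    "output/GATK/VCFs", "output/GATK/nonfiltered", "output/GATK/tables",
]

# (folder, file extension) pairs taken from the comments in the tree.
EXPECTED_INPUTS = [
    ("input/adapters", ".fasta"),
    ("input/annotation", ".gtf"),
    ("input/references", ".fasta"),
    ("input/references", ".vcf"),
    ("input/sample_data", ".fastq"),
]


def create_layout(root) -> None:
    """Create any missing folders; existing folders and files are left untouched."""
    for rel in FOLDERS:
        (Path(root) / rel).mkdir(parents=True, exist_ok=True)


def missing_inputs(root) -> list[str]:
    """Return 'folder/*extension' patterns for which no matching file exists."""
    missing = []
    for rel, suffix in EXPECTED_INPUTS:
        folder = Path(root) / rel
        names = [f.name.lower() for f in folder.iterdir() if f.is_file()] if folder.is_dir() else []
        if not any(name.endswith(suffix) for name in names):
            missing.append(f"{rel}/*{suffix}")
    return missing


if __name__ == "__main__":
    root = Path("QTLspyer")
    if root.is_dir():
        create_layout(root)
        for pattern in missing_inputs(root):
            print("missing:", pattern)
```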

Requirements and limitations

The most resource-intensive step is the variant calling pipeline. We recommend more than 8 GB of RAM; the minimum is 4 GB. A CPU that supports multithreading is also recommended. Benchmarking was performed on a computer with 32 GB of RAM and an Intel i7 CPU (3.60 GHz, 8 threads): processing 1-3 GB .fastq samples took 2-4 hours, while samples of 5-10 GB took up to 15 hours. Make sure there is enough free disk space: on average, every 1 GB of input data produces about 7 GB of output data.
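The 7:1 output-to-input ratio above translates into a quick pre-flight check. A minimal sketch, assuming the samples are plain .fastq files in input/sample_data and treating the ratio as a constant (actual output size will vary between datasets); the function names are hypothetical.

```python
import shutil
from pathlib import Path

OUTPUT_PER_INPUT = 7  # average ratio quoted above: ~7 GB of output per 1 GB of input
GB = 1e9


def input_size_gb(sample_dir) -> float:
    """Total size of the .fastq files in `sample_dir`, in GB."""
    return sum(f.stat().st_size for f in Path(sample_dir).glob("*.fastq") if f.is_file()) / GB


def estimated_output_gb(input_gb: float, ratio: float = OUTPUT_PER_INPUT) -> float:
    """Expected output size for a given input size."""
    return input_gb * ratio


def enough_space(target_dir, input_gb: float) -> bool:
    """True if the disk holding `target_dir` has room for the estimated output."""
    return shutil.disk_usage(target_dir).free / GB >= estimated_output_gb(input_gb)


if __name__ == "__main__":
    samples = Path("QTLspyer/input/sample_data")
    if samples.is_dir():
        size = input_size_gb(samples)
        print(f"input: {size:.1f} GB, expected output: ~{estimated_output_gb(size):.1f} GB")
        print("enough free space" if enough_space(samples, size) else "free up disk space first")
```

For instance, 3 GB of .fastq input leads to an estimate of about 21 GB of output.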


Reference study:

Pačnik K., Ogrizović M., Diepold M., Eisenberg T., Žganjar M., Žun G., Kužnik B., Gostinčar C., Curk T., Petrovič U., Natter K. (2021) Identification of novel genes involved in neutral lipid storage by quantitative trait loci analysis of Saccharomyces cerevisiae. BMC Genomics, 22, 1: 110.


Acknowledgements

To my mentors Assist. Prof. Dr. Cene Gostinčar and Dr. Janez Kokosar.
To Prof. Dr. Uroš Petrovič for guidance.
To Dr. Roman Luštrik for his help with technical matters.
To Ana Markež for being a persistent alpha tester.

Special thanks to Genialis for resolwebio/rnaseq and for their cooperation on this project.


MIT License


