SQL Schema

SQL Schema for Summary Reports

The following image shows the format of the summary report generated for each Serratus run:

[Image: Summary report example]

The SQL schema used to store and query the summary reports consists of four tables: Runs, FamilySections, AccessionSections, and FastaSections.

Runs

'Runs' corresponds to the first line of the summary file, which holds the SRA accession, reference genome, and date. This table has a one-to-many relationship with each of the other three tables, which link back to it through both the SRA and the auto-generated primary key (PK) RunId.
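The Runs-to-child linkage described above can be sketched as follows. This is a minimal, hypothetical sketch assuming SQLite through Python's standard sqlite3 module; the real database engine, the column types, the Runs column names Genome and Date, and the choice to declare (RunId, Sra) as one composite foreign key are all assumptions, not the actual Serratus DDL. Only FamilySections is shown, trimmed to its linking columns.

```python
import sqlite3

# Hypothetical DDL: SQLite, the column types, the Runs columns Genome/Date,
# and the composite (RunId, Sra) foreign key are all assumptions.
SCHEMA = """
CREATE TABLE Runs (
    RunId  INTEGER PRIMARY KEY AUTOINCREMENT,  -- auto-generated PK
    Sra    TEXT NOT NULL,                      -- SRA accession
    Genome TEXT,                               -- reference genome
    Date   TEXT,                               -- run date
    UNIQUE (RunId, Sra)                        -- lets child tables reference the pair
);
-- One of the three child tables, trimmed to its linking columns.
CREATE TABLE FamilySections (
    FamilySectionId INTEGER PRIMARY KEY AUTOINCREMENT,
    RunId           INTEGER NOT NULL,
    Sra             TEXT NOT NULL,
    Family          TEXT,
    FOREIGN KEY (RunId, Sra) REFERENCES Runs (RunId, Sra)
);
"""


def create_db() -> sqlite3.Connection:
    """Open an in-memory database and create the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def insert_run(conn: sqlite3.Connection, sra: str, genome: str, date: str) -> int:
    """Insert one Runs row and return its auto-generated RunId."""
    cur = conn.execute(
        "INSERT INTO Runs (Sra, Genome, Date) VALUES (?, ?, ?)",
        (sra, genome, date),
    )
    return cur.lastrowid
```

Because each child row carries both RunId and Sra, any number of FamilySections rows can point at one run, while a row whose Sra does not match its run is rejected with sqlite3.IntegrityError.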

FamilySections

'FamilySections' corresponds to the next section of the summary report, which holds the pan-genome data. Its columns are as follows:

FamilySectionId: The PK for the table, auto-generated when a row is inserted into the database.
FamilySectionLineId: The position of the family line within the summary file.
RunId: A foreign key (FK) referencing the Runs table.
Sra: Also a FK referencing the Runs table, duplicated here to make queries easier to write.
Family: The name of the family whose pan-genome is being analyzed.
Score: The score assigned to the quality of the alignment.
PctId: The percent identity of the aligned sequences with respect to the reference genome.
Aln: The number of aligned reads.
Glb: The number of globally aligned reads.
PanLen: The pan-genome length.
Cvg: The generated coverage cartoon, a picture of alignment quality along the specific sequence.
Top: The top accession.
TopAln: The number of reads aligned to the top accession.
TopName: The study name linked to the top accession.
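The column list above could translate into DDL along the following lines. This is a hypothetical SQLite sketch: the column types, the Runs columns Genome and Date, and the composite (RunId, Sra) foreign key are assumptions. The two query helpers (hypothetical names families_for_sra and families_for_sra_joined) illustrate why Sra is duplicated: filtering on FamilySections.Sra directly avoids the join through RunId that would otherwise be needed.

```python
import sqlite3

# Hypothetical DDL mirroring the FamilySections columns listed above.
# Column types and the Runs columns Genome/Date are assumptions.
SCHEMA = """
CREATE TABLE Runs (
    RunId  INTEGER PRIMARY KEY AUTOINCREMENT,
    Sra    TEXT NOT NULL,
    Genome TEXT,
    Date   TEXT,
    UNIQUE (RunId, Sra)
);
CREATE TABLE FamilySections (
    FamilySectionId     INTEGER PRIMARY KEY AUTOINCREMENT,
    FamilySectionLineId INTEGER,  -- position of the line in the summary file
    RunId               INTEGER NOT NULL,
    Sra                 TEXT NOT NULL,  -- duplicated from Runs for simpler queries
    Family              TEXT,
    Score               INTEGER,
    PctId               REAL,
    Aln                 INTEGER,
    Glb                 INTEGER,
    PanLen              INTEGER,
    Cvg                 TEXT,     -- coverage cartoon
    Top                 TEXT,
    TopAln              INTEGER,
    TopName             TEXT,
    FOREIGN KEY (RunId, Sra) REFERENCES Runs (RunId, Sra)
);
"""


def create_db() -> sqlite3.Connection:
    """Open an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def families_for_sra(conn: sqlite3.Connection, sra: str) -> list:
    """Family rows for one SRA, filtered on the duplicated Sra column (no join)."""
    return conn.execute(
        "SELECT Family, Score, Aln FROM FamilySections WHERE Sra = ? "
        "ORDER BY FamilySectionLineId",
        (sra,),
    ).fetchall()


def families_for_sra_joined(conn: sqlite3.Connection, sra: str) -> list:
    """Same rows reached through RunId, as required if Sra were not duplicated."""
    return conn.execute(
        "SELECT f.Family, f.Score, f.Aln FROM FamilySections AS f "
        "JOIN Runs AS r ON r.RunId = f.RunId WHERE r.Sra = ? "
        "ORDER BY f.FamilySectionLineId",
        (sra,),
    ).fetchall()
```

Storing Sra in both tables trades a little redundancy for simpler queries; declaring (RunId, Sra) as a single composite foreign key, again an assumption of this sketch, keeps the duplicated value consistent with its parent run.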
