Genomic Features metrics
The Genomic Features metrics facet reports statistics regarding genomic features contained within a GFF file (e.g. GENCODE GFF). Currently, only counting records within the gene regions (intronic, exonic, intergenic) and exonic translation regions (five prime UTR, three prime UTR, coding sequence) is supported. The report is delivered under the `features` key within the `results.json` file. You can easily examine the output of this facet by using `jq`:
cat results.json | jq .features
IMPORTANT: to enable this facet, you must provide the `-f`/`--feature-gff` flag! The facet is automatically enabled when this flag is provided. Otherwise, it is disabled and the value for `features` in `results.json` will be `null`.
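If you post-process the results in a script rather than with `jq`, it helps to handle the disabled case explicitly. The following is a minimal sketch in Python; the helper name `load_features` is hypothetical and is not part of `ngs`. It relies only on the behavior described above: the `features` key holds the report, or `null` when the facet is disabled.

```python
import json


def load_features(path="results.json"):
    """Return the `features` report, or None when the facet was disabled.

    Hypothetical helper for illustration; not part of the ngs tool.
    """
    with open(path) as handle:
        results = json.load(handle)
    # JSON null is decoded to Python None, so a disabled facet yields None.
    return results.get("features")
```

A caller can then test `load_features() is None` to detect that the run was made without `-f`/`--feature-gff`.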
First, an interval lookup data structure is built using `rust_lapper` to store all of the genomic features from the GFF. Next, each record is processed as follows:
- If the record is unmapped, the record is ignored.
- The CIGAR string is used to calculate the length of the record.
- The start and end genomic coordinates of the record are then used to find all features which intersect with the record.
- For records that fall within a 5' UTR, 3' UTR, or CDS, the appropriate counters are incremented. Note that a record may span more than one of these categories, but it will only increment each category by one at most.
- The intronic, exonic, and intergenic counters are incremented appropriately for the record. Here, records are classified as one and only one category:
  - If the record falls within a gene and within an exon, it's classified as exonic.
  - If the record falls within a gene but outside an exon, it's classified as intronic.
  - If the record falls outside a gene, it's classified as intergenic.
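The first steps of this procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's Rust implementation: a linear scan stands in for the `rust_lapper` lookup, features are plain `(start, end, kind)` tuples with half-open coordinates, the feature type names follow GENCODE GFF3 conventions, and the record length is assumed to be the number of reference bases consumed by the CIGAR string. All function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: feature types are named as in GENCODE GFF3 files.
TRANSLATION_KINDS = {"five_prime_UTR", "three_prime_UTR", "CDS"}

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING_OPS = set("MDN=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_length(cigar):
    """Sum the lengths of the CIGAR operations that advance along the reference."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING_OPS)


def overlapping_kinds(features, start, end):
    """Return the set of feature types intersecting the half-open span [start, end).

    A linear scan is used here in place of an interval lookup structure.
    """
    return {kind for f_start, f_end, kind in features if f_start < end and start < f_end}


def count_translation_regions(features, start, cigar, counts, unmapped=False):
    """Increment each overlapped translation-region counter by at most one."""
    if unmapped:
        return  # unmapped records are ignored
    end = start + reference_length(cigar)
    # Working with a set of kinds guarantees one increment per category,
    # even when the record overlaps several features of the same type.
    for kind in overlapping_kinds(features, start, end) & TRANSLATION_KINDS:
        counts[kind] += 1
```

Passing a `collections.Counter` as `counts` accumulates the totals across records. Collapsing the overlapping features into a set of types is what enforces the "at most one per category" rule.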
This facet has the following top-level keys:
Key | Description |
---|---|
exonic_translation_regions | Contains metrics about which exonic translation regions a record overlaps with. |
gene_regions | Contains the counts for exonic, intronic, and intergenic records. |
records | Contains statistics regarding how many records were processed, how many records were ignored, and for what reasons. |
summary | Contains summary statistics regarding this QC facet, most notably percentages regarding how many records were ignored and for what reasons. |
exonic_translation_regions
Contains metrics about which exonic translation regions a record overlaps with. Namely, a record can overlap with an untranslated region (either 5' or 3' end) or a coding sequence.
gene_regions
Contains the counts for exonic, intronic, and intergenic records. Note, as the description above outlines, records are currently classified as one and only one category for this metric.
- If any part of the record falls within a gene and within an exon, it's classified as exonic.
- If any part of the record falls within a gene but outside an exon, it's classified as intronic.
- If the record falls outside a gene, it's classified as intergenic.
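The precedence among these three rules can be made concrete with a short Python sketch. It is illustrative only and is not the tool's implementation: features are assumed to be `(start, end, kind)` tuples with half-open coordinates, the type names `gene` and `exon` follow GFF3 conventions, and the function name is hypothetical.

```python
def classify_gene_region(features, start, end):
    """Classify the span [start, end) as exactly one of three categories."""
    # Collect the types of every feature that any part of the record overlaps.
    kinds = {kind for f_start, f_end, kind in features if f_start < end and start < f_end}
    if "gene" in kinds and "exon" in kinds:
        return "exonic"  # an exon overlap takes precedence
    if "gene" in kinds:
        return "intronic"  # inside a gene, but touching no exon
    return "intergenic"  # no gene overlapped at all
```

Because the checks run in order and return on the first match, a record that partially overlaps an exon is counted as exonic and never also as intronic.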
records
Contains metrics regarding how many records were processed, how many records were ignored, and for what reasons.
summary
Contains summary statistics about which records were ignored and why.