Library Format Specification

JSON Format

A nimble library is a valid JSON file. The top level of the file is an array containing exactly two JSON objects: the aligner configuration, followed by the reference metadata.
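As a minimal sketch of the layout described above, the following Python snippet parses a library and splits it into its two objects. The function name load_library and the tiny example string are illustrative assumptions, not part of nimble itself.

```python
import json

def load_library(text):
    """Parse library JSON text and return (config, reference).

    Hypothetical helper: it only checks the top-level shape described
    in this section (an array holding two JSON objects).
    """
    data = json.loads(text)
    if not isinstance(data, list) or len(data) != 2:
        raise ValueError("expected a top-level array of two objects")
    config, reference = data
    if not isinstance(config, dict) or not isinstance(reference, dict):
        raise ValueError("both top-level elements must be JSON objects")
    return config, reference

# Made-up, deliberately tiny library used only to exercise the helper.
example = '[{"num_mismatches": 0}, {"headers": [], "columns": []}]'
config, reference = load_library(example)
```

The first element is always treated as the aligner configuration and the second as the reference metadata, matching the order given above.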

Aligner Configuration

The first object contains the aligner configuration:

{
  "score_threshold": number,
  "score_filter": number,
  "num_mismatches": number,
  "discard_multiple_matches": boolean,
  "intersect_level": number,
  "group_on": string,
  "discard_multi_hits": number,
  "require_valid_pair": boolean,
  "data_type": string,
  "filters": array<string>
}

score_threshold: controls the score an alignment needs to reach to be considered a match. For perfect matches, set this value equal to the length of the reads being aligned to the reference library.
score_filter: sets a lower bound on the number of matches a reference needs before it is reported. For instance, if you set "score_filter": 25, no reference with fewer than 25 matches will be reported in the output.
num_mismatches: sets the allowable number of mismatches during alignment.
discard_multiple_matches: flag for whether a read that matches multiple references should be counted. If true, a read that matches multiple references will count toward the scores of all of those references. If false, the read's matches are discarded.
intersect_level: controls how matches from a read and its reverse read are combined during alignment. There are three intersect levels. intersect_level: 0 takes the best matches from either the read or the reverse read, as determined by alignment score. intersect_level: 1 takes the intersection of the matches from the read and the reverse read; if there is no intersection, it falls back to the best match. intersect_level: 2 takes the intersection and reports no match if there is no intersection.
group_on: if this is set to the name of a header in the reference metadata file, the output results.tsv will be filtered to that level of specificity. For instance, if you've added a column with lineage information under a header called "lineage", setting "group_on": "lineage" will report lineage-level information, rather than the default case of allele-level information. If a single read matches onto the group_on category more than once during alignment (for instance, if a read matches multiple alleles in the same lineage and you're grouping on lineage), it will only count as one match. If group_on is unset, allele-level information is returned.
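The three intersect_level behaviours described above can be sketched in Python as follows. This is only an illustration of the stated logic, not nimble's implementation: representing each mate as a (set of reference names, alignment score) pair, the function name combine_mates, and the tie-breaking rule are all assumptions.

```python
def combine_mates(forward, reverse, intersect_level):
    """Combine the matches of a read and its reverse read.

    forward / reverse: (set_of_reference_names, alignment_score).
    This data model is an assumption made for the sketch.
    """
    if intersect_level not in (0, 1, 2):
        raise ValueError("intersect_level must be 0, 1, or 2")

    fwd_refs, fwd_score = forward
    rev_refs, rev_score = reverse

    # "Best" = matches of the higher-scoring mate.
    # Ties go to the forward read (an arbitrary choice for this sketch).
    best = fwd_refs if fwd_score >= rev_score else rev_refs

    if intersect_level == 0:
        return set(best)

    common = fwd_refs & rev_refs
    if common:
        return common

    # No intersection: level 1 falls back to the best match,
    # level 2 reports no match at all.
    return set(best) if intersect_level == 1 else set()

# Made-up matches for illustration.
overlapping = (({"A", "B"}, 50), ({"B", "C"}, 60))
disjoint = (({"A"}, 50), ({"C"}, 60))
```

With the overlapping pair, levels 1 and 2 both keep only the shared reference; with the disjoint pair, level 1 keeps the higher-scoring mate's match while level 2 returns nothing.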

Reference Metadata

The second object is the reference metadata:

{
  "headers": ["reference_genome", "sequence_name", "nt_length", "sequence", ...],
  "columns": [[...], [...], [...], [...], ...]
}

This object contains a headers field and a columns field. headers is an array of strings, each of which labels the column at the same index in the columns field. The aligner requires at least the reference_genome, sequence_name, nt_length, and sequence headers, along with their corresponding columns.

reference_genome: string identifying the genome the reference sequence is from
sequence_name: name of the reference sequence
nt_length: length of the sequence data, in nucleotides
sequence: RNA string

The columns field is a multidimensional array of strings. Each sub-array corresponds to a header in the headers field.
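To make the column-oriented layout concrete, here is a small Python sketch that checks the required headers and turns the columns into one record per reference sequence. The helper name to_records and the sample values are invented for illustration; the sketch assumes that columns[i] holds the values for headers[i] and that every column has one entry per sequence.

```python
REQUIRED = ("reference_genome", "sequence_name", "nt_length", "sequence")

def to_records(reference):
    """Turn the headers/columns layout into a list of per-sequence dicts."""
    headers = reference["headers"]
    columns = reference["columns"]

    missing = [h for h in REQUIRED if h not in headers]
    if missing:
        raise ValueError(f"missing required headers: {missing}")
    if len(headers) != len(columns):
        raise ValueError("each header needs exactly one column")
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all columns must have the same length")

    # zip(*columns) transposes column-major data into rows.
    return [dict(zip(headers, row)) for row in zip(*columns)]

# Made-up reference metadata with two sequences.
reference = {
    "headers": ["reference_genome", "sequence_name", "nt_length", "sequence"],
    "columns": [
        ["genomeA", "genomeA"],
        ["seq1", "seq2"],
        ["4", "4"],
        ["ACGA", "GGCA"],
    ],
}
records = to_records(reference)
```

Values stay as strings here because the columns field is described as an array of strings.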

To add another header/column pair (e.g. to add per-allele lineage or locus information), add a string to the headers array and add a column to the corresponding index in the columns field. However, you shouldn't need to directly edit this object -- nimble generate has several convenient options for adding additional metadata to libraries.
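Purely to illustrate the data structure (nimble generate remains the recommended route), appending a header/column pair by hand might look like the Python sketch below. The add_column helper, the "lineage" header, and the sample values are hypothetical.

```python
def add_column(reference, header, values):
    """Append a header and its column at the same index (illustrative only)."""
    if header in reference["headers"]:
        raise ValueError(f"header already present: {header}")
    # The new column needs one value per existing sequence.
    if reference["columns"] and len(values) != len(reference["columns"][0]):
        raise ValueError("new column must have one value per sequence")
    reference["headers"].append(header)
    reference["columns"].append(list(values))

# Made-up reference metadata with two sequences.
reference = {
    "headers": ["reference_genome", "sequence_name", "nt_length", "sequence"],
    "columns": [
        ["genomeA", "genomeA"],
        ["seq1", "seq2"],
        ["4", "4"],
        ["ACGA", "GGCA"],
    ],
}
add_column(reference, "lineage", ["lineageX", "lineageY"])
```

Because the header and the column are appended together, the new header's index in headers always matches the new column's index in columns.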
