Update README.md
rnmitchell authored Apr 2, 2020
1 parent 5b2f707 commit 813dbd7
Showing 1 changed file with 34 additions and 8 deletions.
42 changes: 34 additions & 8 deletions README.md
@@ -1,16 +1,46 @@
lusSTR is a tool written to convert NGS sequence data to different annotation types for forensic STR loci.
# lusSTR

These python scripts have been written for use with the 27 autosomal STR loci from the ForenSeq panel and the sequence range output by the ForenSeq Universal Analysis Software (UAS).
lusSTR is a tool written in Python to convert NGS sequence data of forensic STR loci to different annotation types for ease in downstream analyses.

The format_UAS_output.R script requires the Sample Details Report output directly from the UAS software. The R script removes unnecessary rows/columns and outputs a table in CSV format containing the following columns:
This Python package has been written for use with the 27 autosomal STR loci from the ForenSeq panel and the sequence range output by the ForenSeq Universal Analysis Software (UAS).

## Installation

For best results, install from bioconda.
```
conda install -c bioconda lusstr
```
Contributors/developers who want to create a dedicated environment on their machine can do so:
```
conda create --name lusSTR -y python=3.7 pandas
git clone https://www.github.com/bioforensics/lusSTR.git
cd lusSTR
make devenv
```

## Usage

lusSTR accommodates two different input formats:
(1) UAS Sample Details Report in .xlsx format
(2) Sample sequences in CSV format; the first four columns must be Locus, NumReads, Sequence, and SampleID; the optional last two columns can be the Project and Analysis IDs.
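
As an illustration of the CSV layout described in (2), the following is a minimal sketch of a header check. It is not part of lusSTR; the function name `check_header`, the exact optional header names `Project` and `Analysis`, and the sample row are assumptions made for this example only.

```python
import csv
import io

# Column names taken from the description above; whether lusSTR requires
# these exact header strings is an assumption of this sketch.
REQUIRED = ["Locus", "NumReads", "Sequence", "SampleID"]
OPTIONAL = ["Project", "Analysis"]  # assumed names for the optional ID columns


def check_header(header):
    """Return True if the header has the four required columns first,
    optionally followed by both ID columns."""
    header = [name.strip() for name in header]
    if header[:4] != REQUIRED:
        return False
    extra = header[4:]
    return extra == [] or extra == OPTIONAL


# Hypothetical input; the sequence and read count are made up.
sample = io.StringIO(
    "Locus,NumReads,Sequence,SampleID\n"
    "D3S1358,120,TCTATCTG,sample1\n"
)
header = next(csv.reader(sample))
print(check_header(header))
```

A check like this can be run on a hand-built CSV before passing it to the ```annotate``` command.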

If inputting the UAS Sample Details Report, the user must first invoke the ```format``` command to extract the necessary information from the report and format it for the ```annotate``` command. The user must specify the input file as well as an output file:
```
lusstr format <input> -o <output>
```
The ```format``` command removes unnecessary rows/columns and outputs a table in CSV format containing the following columns:
* Locus
* Number of Reads observed with the specified sequence
* Sequence
* Sample ID
* Project ID (if provided)
* Analysis ID (if provided)

The STR_annotation.py script is run on the output from the above R script (or any provided .csv files in the correct format) and currently outputs a table with the following columns:
This CSV file format is required for the ```annotate``` command:
```
lusstr annotate <input> -o <output>
```
The ```annotate``` command produces a table with the following columns:
* Sample ID
* Project ID (if provided)
* Analysis ID (if provided)
@@ -23,10 +53,6 @@ The STR_annotation.py script is run on the output from the above R script (or an
* LUS: Longest uninterrupted stretch
* LUS+: annotation combining multiple annotations including traditional STR allele designation, LUS, secondary motif (if applicable) and tertiary motif (if applicable)
* Reads: number of reads observed with the specified sequence
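
To make the LUS column concrete, here is a minimal sketch of counting the longest uninterrupted stretch of a repeat motif in a sequence. It is an illustration of the concept only, not lusSTR's implementation; the function name, the choice of motif, and the example sequence are assumptions.

```python
def longest_uninterrupted_stretch(sequence, motif):
    """Return the largest number of back-to-back copies of `motif`
    found anywhere in `sequence`."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    best = 0
    step = len(motif)
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Walk forward one motif at a time while the repeat continues.
        while sequence.startswith(motif, pos):
            count += 1
            pos += step
        best = max(best, count)
    return best


# Hypothetical sequence: TCTA TCTA TCTG TCTA TCTA TCTA
seq = "TCTATCTATCTGTCTATCTATCTA"
print(longest_uninterrupted_stretch(seq, "TCTA"))  # prints 3
```

In this made-up sequence the TCTG interruption splits the repeats into stretches of two and three, so the longest uninterrupted stretch is 3. How the reference motif is chosen for each locus is outside the scope of this sketch.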


The provided shell script (submit_for_annotation.sh) is to be run in the directory containing either .xlsx files (direct from the UAS software) or the provided .csv files.

User should copy the scripts to a directory in their `$PATH`.

lusSTR is still under development and any suggestions/issues found are welcome!