FCS adaptor output
Expected outputs for run_fcsadaptor.sh:
- cleaned_sequences/*.fa.gz: cleaned sequences file.
- combined.calls.jsonl: final FCS-adaptor report (JSON file format).
- fcs.log: auto-generated, empty file.
- fcs_adaptor.log: log file for the FCS-adaptor run.
- fcs_adaptor_report.txt: final FCS-adaptor report (TSV file format).
- logs.jsonl: auto-generated, empty file.
- pipeline_args.yaml: parameters specified for the FCS-adaptor run (BLAST db, input FASTA), in YAML file format.
- skipped_trims.jsonl: internal adaptor hits skipped by cleanup (JSON file format).
- validate_fasta.txt: report of any formatting issues with the input FASTA; empty if the input FASTA is valid.
Expected outputs for fcs.py clean genome:
- FCS-adaptor cleaning report (printed to console): Information about the contamination cleaning actions taken.
- Separated cleaned and contaminated sequences: Two FASTA files corresponding to the cleaned and contaminated sequence set.
A final report of recommended actions from FCS-adaptor is provided in the file fcs_adaptor_report.txt.
The following table lists each column number (first column), its column header (second column), and an example value (third column):
1: accession seq_00001
2: length 230276
3: action ACTION_TRIM
4: range 1..58
5: name CONTAMINATION_SOURCE_TYPE_ADAPTOR:NGB00360.1:Illumina PCR Primer
- Column 1: A seq-id (sequence ID) for a whole sequence, as found in the input FASTA.
- Column 2: Length of the entire sequence in Column 1. Only a portion may be identified as contaminant, according to the range column.
- Column 3: The recommended action. Action values are as follows:
  - ACTION_EXCLUDE: Remove the entire sequence.
  - ACTION_TRIM: Remove the contaminated span at the beginning or end of the sequence.
- Column 4: Start and end coordinates for the identified contamination. If only a portion of the sequence is identified as contaminant, these values indicate the range that should be removed.
- Column 5: The matched synthetic sequence identified by FCS-adaptor. See here for the sequences contained in the FCS-adaptor database.
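The five columns described above can be read with a few lines of Python. This is a minimal sketch, not part of the FCS tooling: the names `AdaptorCall` and `parse_report` are hypothetical, and it assumes that the report is tab-separated, that any header line starts with `#`, that ranges are 1-based and inclusive, and that several ranges on one row (if they occur) are comma-separated.

```python
import io
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class AdaptorCall(NamedTuple):
    """One row of the report (hypothetical container for this sketch)."""
    accession: str
    length: int
    action: str
    spans: List[Tuple[int, int]]  # assumed 1-based, inclusive (start, end)
    name: str


def parse_report(lines: Iterable[str]) -> Iterator[AdaptorCall]:
    """Yield one AdaptorCall per data row of a tab-separated report."""
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: blank lines and '#'-prefixed header lines carry no data.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Pad so that rows with a missing trailing column still unpack.
        accession, length, action, rng, name = (fields + [""] * 5)[:5]
        spans = []
        # Assumption: multiple ranges, if present, are comma-separated.
        for part in filter(None, rng.split(",")):
            start, end = part.split("..")
            spans.append((int(start), int(end)))
        yield AdaptorCall(accession, int(length), action, spans, name)


# The example row from the table above, written as a tab-separated line.
example = (
    "#accession\tlength\taction\trange\tname\n"
    "seq_00001\t230276\tACTION_TRIM\t1..58\t"
    "CONTAMINATION_SOURCE_TYPE_ADAPTOR:NGB00360.1:Illumina PCR Primer\n"
)
calls = list(parse_report(io.StringIO(example)))
print(calls[0].accession, calls[0].action, calls[0].spans)
# prints: seq_00001 ACTION_TRIM [(1, 58)]
```

To read a real report, pass an open file handle for fcs_adaptor_report.txt to `parse_report` in place of the `io.StringIO` example.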
FCS-adaptor uses the following rules to determine calls in fcs_adaptor_report.txt:
- If adaptors are found at the beginning or end of the sequence, the matching span is reported as "ACTION_TRIM" and is removed in the cleaned_sequences/*.fa.gz output.
- If adaptors are found within 100 bp of either end of the sequence, the span to trim is extended to the end of the contig. If additional adaptors are found within 100 bp of the proposed trim range, then the trim span is transitively extended to cover the additional hits. These spans are reported as "ACTION_TRIM" and are removed in the cleaned_sequences/*.fa.gz output.
- If adaptors are found more than 100 bp from either end of the sequence, the matching span is reported as "ACTION_TRIM", but the internal span is not removed in the cleaned_sequences/*.fa.gz output.
- If adaptors are found more than 100 bp from either end of the sequence but 50 bp or less from each other, the spans are joined and reported as "ACTION_TRIM", but the internal span is not removed in the cleaned_sequences/*.fa.gz output.
- If more than 75% of the sequence matches the adaptors, the whole sequence is reported as "ACTION_EXCLUDE" and is removed in the cleaned_sequences/*.fa.gz output.
- If less than 200 bp of the sequence remains unmatched to the adaptors, the whole sequence is reported as "ACTION_EXCLUDE" and is removed in the cleaned_sequences/*.fa.gz output.
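The rules above can be illustrated with a simplified Python sketch. It is not the FCS-adaptor implementation: the function names, the 1-based inclusive coordinates, the order in which the rules are checked, and the reading of "within 100 bp" as a gap of at most 100 bases are all assumptions made for illustration. Each returned tuple is (action, start, end, removed_in_cleaned_output).

```python
from typing import List, Tuple

Span = Tuple[int, int]  # assumed 1-based, inclusive (start, end)

END_WINDOW = 100     # bp: hits this close to an end (or to a trim) extend the trim
JOIN_GAP = 50        # bp: internal hits this close to each other are joined
MAX_FRACTION = 0.75  # more than this fraction matched -> exclude
MIN_UNMATCHED = 200  # fewer unmatched bp than this -> exclude


def merge(spans: List[Span], max_gap: int = 0) -> List[Span]:
    """Merge spans separated by at most max_gap bases; return them sorted."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def call_actions(seq_len: int, hits: List[Span]):
    """Sketch of the calling rules for one sequence and its adaptor hits."""
    hits = merge(hits)
    if not hits:
        return []
    matched = sum(end - start + 1 for start, end in hits)
    # Exclusion rules: mostly adaptor, or too little sequence left over.
    if matched > MAX_FRACTION * seq_len or seq_len - matched < MIN_UNMATCHED:
        return [("ACTION_EXCLUDE", 1, seq_len, True)]

    # Grow a trim from the left end, absorbing nearby hits transitively.
    left_end = 0
    while hits and hits[0][0] - left_end - 1 <= END_WINDOW:
        left_end = hits.pop(0)[1]
    # Grow a trim from the right end in the same way.
    right_start = seq_len + 1
    while hits and right_start - hits[-1][1] - 1 <= END_WINDOW:
        right_start = hits.pop()[0]

    calls = []
    if left_end > 0:
        calls.append(("ACTION_TRIM", 1, left_end, True))
    # Remaining hits are internal: joined if close, reported but not removed.
    calls += [("ACTION_TRIM", s, e, False) for s, e in merge(hits, JOIN_GAP)]
    if right_start <= seq_len:
        calls.append(("ACTION_TRIM", right_start, seq_len, True))
    return calls


print(call_actions(10000, [(40, 70), (150, 180), (5000, 5030), (5060, 5090)]))
# prints: [('ACTION_TRIM', 1, 180, True), ('ACTION_TRIM', 5000, 5090, False)]
```

In this example the two hits near the start are chained into a single terminal trim that extends to position 1, while the two internal hits, 29 bp apart, are joined into one reported span that stays in the cleaned output.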
Use fcs.py clean genome as described in the FCS-adaptor Quickstart to automatically clean all adaptor contaminant spans, and see Separated cleaned and contaminated sequences for information on how fcs.py clean genome handles adaptor report calls.
A successful fcs.py clean genome run will print the summary of cleaning actions:
Applied 11 actions; 522 bps dropped; 0 bps hardmasked.
fcs.py clean genome performs the following actions on FCS-adaptor reports to separate "clean" from "contaminated" sequences:
- ACTION_EXCLUDE: whole sequences are removed from clean.fasta and sent to contam.fasta.
- ACTION_TRIM: the beginning or end of the sequence is removed from clean.fasta. Not sent to contam.fasta.
- FIX: the internal contamination range is masked in clean.fasta at the range defined by start-pos>end-pos. This action is not defined automatically by run_fcsadaptor.sh and must be substituted by the user for internal ACTION_TRIM ranges where appropriate. Not sent to contam.fasta.
- SPLIT: clean.fasta is split at the internal contamination range defined by start-pos>end-pos. This action is synonymous with internal ACTION_TRIM ranges.
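As a rough illustration of the four actions above, the following Python sketch applies them to a single sequence held as a string. It is not how fcs.py is implemented: `apply_actions` is a hypothetical name, coordinates are assumed to be 1-based and inclusive, masking is assumed to mean replacing bases with `N`, and a split is assumed to drop the contaminated range itself.

```python
from typing import List, Tuple

Action = Tuple[str, int, int]  # (action, start, end), assumed 1-based inclusive


def apply_actions(seq: str, actions: List[Action]) -> Tuple[List[str], List[str]]:
    """Return (clean_pieces, contam_pieces) for one sequence."""
    # ACTION_EXCLUDE: the whole sequence goes to the contaminated set.
    if any(action == "ACTION_EXCLUDE" for action, _, _ in actions):
        return [], [seq]

    bases: List = list(seq)
    for action, start, end in actions:
        if action == "FIX":
            fill = "N"   # assumption: masking means hardmasking with N
        elif action in ("ACTION_TRIM", "SPLIT"):
            fill = None  # marks bases to drop; internal drops split the sequence
        else:
            raise ValueError(f"unknown action: {action}")
        for i in range(start - 1, end):
            bases[i] = fill

    # Collect the runs of kept bases between dropped ranges.
    pieces: List[str] = []
    current: List[str] = []
    for base in bases:
        if base is None:
            if current:
                pieces.append("".join(current))
                current = []
        else:
            current.append(base)
    if current:
        pieces.append("".join(current))
    return pieces, []


seq = "AAAACCCCGGGGTTTT"
print(apply_actions(seq, [("ACTION_TRIM", 1, 4)]))   # (['CCCCGGGGTTTT'], [])
print(apply_actions(seq, [("FIX", 5, 8)]))           # (['AAAANNNNGGGGTTTT'], [])
print(apply_actions(seq, [("SPLIT", 5, 8)]))         # (['AAAA', 'GGGGTTTT'], [])
```

Treating ACTION_TRIM and SPLIT the same way mirrors the statement above that SPLIT is synonymous with internal ACTION_TRIM ranges: a dropped range at either end shortens the sequence, and a dropped range in the middle yields two pieces.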
Please create an Issue if you encounter any problems.
For all other questions or comments, please contact us at [email protected]