FCS adaptor quickstart
FCS-adaptor detects adaptor and vector contamination in genome sequences. This tool is one module within the NCBI Foreign Contamination Screening (FCS) program suite.
We recommend running FCS-adaptor after the initial contig assembly and on the final assembly prior to GenBank submission. If additional valid contaminants are identified in the final assembly, we recommend re-screening after contaminant removal.
FCS-adaptor operates in three main steps:
- BLAST alignment to reference database
- Generate contaminant cleaning actions
- Clean the genome
- Prerequisites
- Downloading FCS-adaptor
- Screen the genome
- Clean the genome
- Usage examples
- Input
- Output
- Troubleshooting
Prerequisites
- Docker or Singularity. The current Singularity image was built with Singularity version 3.4.0.
- Any general-purpose host should be sufficient for execution.
- A genome assembly in FASTA format.
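Before downloading anything, a quick preflight check can confirm that a container engine is on the PATH and that the assembly looks like FASTA. The shell sketch below is not part of FCS. The function names fcs_pick_engine and fcs_check_fasta are hypothetical. Finding a docker binary on the PATH does not guarantee that the Docker daemon is running.

```shell
# Hypothetical preflight helpers (not shipped with FCS).

# Print the first container engine found on PATH: docker first, then singularity.
fcs_pick_engine() {
    if command -v docker >/dev/null 2>&1; then
        echo docker
    elif command -v singularity >/dev/null 2>&1; then
        echo singularity
    else
        echo "neither docker nor singularity found on PATH" >&2
        return 1
    fi
}

# Check that a (possibly gzipped) file exists, is non-empty, and starts with '>'.
fcs_check_fasta() {
    fasta=$1
    [ -s "$fasta" ] || { echo "missing or empty: $fasta" >&2; return 1; }
    case $fasta in
        *.gz) first=$(gzip -cd "$fasta" | head -c 1) ;;
        *)    first=$(head -c 1 "$fasta") ;;
    esac
    [ "$first" = ">" ] || { echo "does not start with '>': $fasta" >&2; return 1; }
}
```

For example: fcs_check_fasta h_sapiens.fa.gz && engine=$(fcs_pick_engine) && echo "using $engine"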
Downloading FCS-adaptor
- Retrieve the run_fcsadaptor.sh runner script:
  curl -LO https://github.com/ncbi/fcs/raw/main/dist/run_fcsadaptor.sh
- Change the permissions of run_fcsadaptor.sh so it is executable:
  chmod 755 run_fcsadaptor.sh
- If using Singularity, retrieve the FCS-adaptor .sif image:
  curl https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/releases/latest/fcs-adaptor.sif -Lo fcs-adaptor.sif
Screen the genome
- Set --prok (prokaryotes) or --euk (eukaryotes) depending on the source organism.
- Run FCS-adaptor:
  - Using Docker:
    mkdir outputdir
    ./run_fcsadaptor.sh --fasta-input h_sapiens.fa.gz --output-dir ./outputdir --euk
  - Using Singularity:
    mkdir outputdir
    ./run_fcsadaptor.sh --fasta-input h_sapiens.fa.gz --output-dir ./outputdir --euk --container-engine singularity --image fcs-adaptor.sif
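The Docker and Singularity invocations differ only in their container-related flags, so they can be folded into one small wrapper. This is a hypothetical sketch: fcs_screen and the FCS_ADAPTOR_RUNNER variable are illustrative names, and the wrapper passes only the run_fcsadaptor.sh flags shown above.

```shell
# Hypothetical wrapper (not part of FCS) around the two commands above.
# Usage: fcs_screen FASTA OUTDIR --euk|--prok docker|singularity [SIF]
fcs_screen() {
    fasta=$1 outdir=$2 taxon=$3 engine=$4
    sif=${5:-fcs-adaptor.sif}
    # Runner location (hypothetical override variable); defaults to ./run_fcsadaptor.sh.
    runner=${FCS_ADAPTOR_RUNNER:-./run_fcsadaptor.sh}

    case $taxon in
        --euk|--prok) ;;
        *) echo "taxon flag must be --euk or --prok, got: $taxon" >&2; return 2 ;;
    esac
    case $engine in
        docker|singularity) ;;
        *) echo "engine must be docker or singularity, got: $engine" >&2; return 2 ;;
    esac

    # Create the output directory before calling the runner.
    mkdir -p "$outdir" || return 1

    if [ "$engine" = singularity ]; then
        "$runner" --fasta-input "$fasta" --output-dir "$outdir" "$taxon" \
            --container-engine singularity --image "$sif"
    else
        # Docker is used when no --container-engine flag is given.
        "$runner" --fasta-input "$fasta" --output-dir "$outdir" "$taxon"
    fi
}
```

For example, fcs_screen h_sapiens.fa.gz ./outputdir --euk singularity issues the same Singularity command as above. Validating the arguments before mkdir means a typo such as --eukaryote fails fast without leaving an empty output directory behind.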
Clean the genome
- Retrieve the fcs.py runner script:
  curl -LO https://github.com/ncbi/fcs/raw/main/dist/fcs.py
- If using Singularity, also download the FCS-GX .sif file and set FCS_DEFAULT_IMAGE to it:
  curl https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/releases/latest/fcs-gx.sif -Lo fcs-gx.sif
  export FCS_DEFAULT_IMAGE=fcs-gx.sif
- Perform cleaning actions on the input genome. By default, this splits contigs at internal ACTION_TRIM locations. To mask a location instead, change its action column in fcs_adaptor_report.txt to FIX:
  zcat h_sapiens.fa.gz | python3 ./fcs.py clean genome --action-report ./outputdir/fcs_adaptor_report.txt --output clean.fasta --contam-fasta-out contam.fasta
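Editing the action column by hand is error-prone on a large report. The awk sketch below changes an ACTION_TRIM row to FIX when none of its ranges touch either end of the sequence. It assumes the report is tab-separated with the columns accession, length, action, range, and name; that any header line starts with #; and that several ranges in one row are separated by commas. mask_internal_trims is a hypothetical name. Rows that mix terminal and internal ranges are left unchanged, because changing the whole row would also mask the terminal adaptor instead of trimming it.

```shell
# Hypothetical helper (not part of FCS). ASSUMED report layout, tab-separated:
#   accession  length  action  range  name
# where range is one or more comma-separated 1-based start..end segments.
mask_internal_trims() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
        /^#/ { print; next }            # pass header lines through unchanged
        $3 == "ACTION_TRIM" {
            internal = 1
            n = split($4, segs, ",")
            for (i = 1; i <= n; i++) {
                split(segs[i], se, "[.][.]")
                # A segment touching position 1 or the sequence end is terminal.
                if (se[1] + 0 == 1 || se[2] + 0 == $2 + 0) internal = 0
            }
            if (internal) $3 = "FIX"    # only rows whose ranges are all internal
        }
        { print }' "$1"
}
```

Usage, reusing the cleaning command above with the edited report:

mask_internal_trims ./outputdir/fcs_adaptor_report.txt > masked_report.txt
zcat h_sapiens.fa.gz | python3 ./fcs.py clean genome --action-report masked_report.txt --output clean.fasta --contam-fasta-out contam.fasta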
⚠️ FCS-adaptor currently produces a cleaned_sequences/*.fa.gz file containing a cleaned FASTA in which whole contaminant sequences assigned ACTION_EXCLUDE, and adaptors at the ends of sequences assigned ACTION_TRIM, are removed. Internal adaptor sequences are not automatically cleaned by run_fcsadaptor.sh. When internal adaptor hits are present, users are responsible for deciding whether splitting or masking at internal adaptors with fcs.py clean genome is more appropriate. See Interpreting Outputs for additional information.
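To decide between splitting and masking, it helps to first see which adaptor hits are internal. The hypothetical list_trim_segments helper below prints one line per ACTION_TRIM range and labels it terminal if it starts at position 1 or ends at the sequence length, and internal otherwise. It assumes a tab-separated report with the columns accession, length, action, range, and name, any header line starting with #, and multiple ranges in one row separated by commas.

```shell
# Hypothetical helper (not part of FCS). ASSUMED report layout, tab-separated:
#   accession  length  action  range  name
# Prints: accession <TAB> start..end <TAB> terminal|internal
list_trim_segments() {
    awk -F '\t' '
        /^#/ { next }                   # skip header lines
        $3 == "ACTION_TRIM" {
            n = split($4, segs, ",")
            for (i = 1; i <= n; i++) {
                split(segs[i], se, "[.][.]")
                kind = (se[1] + 0 == 1 || se[2] + 0 == $2 + 0) ? "terminal" : "internal"
                print $1 "\t" segs[i] "\t" kind
            }
        }' "$1"
}
```

For example, list_trim_segments ./outputdir/fcs_adaptor_report.txt | grep 'internal$' lists only the hits that run_fcsadaptor.sh leaves in place.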
Usage examples
Test that FCS-adaptor is operating normally on a small FASTA file:
- Download the test FASTA:
  curl -LO https://zenodo.org/records/10932013/files/FCS_combo_test.fa
- Screen the genome:
  mkdir outputdir
  ./run_fcsadaptor.sh --fasta-input FCS_combo_test.fa --output-dir ./outputdir --euk
  A successful FCS-adaptor run will print the log to the console, ending with:
  [workflow ] completed success
  Output will be placed in: /output-volume
  Executing the workflow run_av_screen_x
  run_av_screen_x
  The output directory will contain the following files:
  cleaned_sequences/FCS_combo_test.fa
  combined.calls.jsonl
  fcs.log
  fcs_adaptor.log
  fcs_adaptor_report.txt
  logs.jsonl
  pipeline_args.yaml
  skipped_trims.jsonl
  validate_fasta.txt
  The fcs_adaptor_report.txt produced by this example should match this file.
- Clean the genome:
  cat FCS_combo_test.fa | python3 ./fcs.py clean genome --action-report ./outputdir/fcs_adaptor_report.txt --output clean.fasta --contam-fasta-out contam.fasta
  By default this will trim 5 sequences (terminal ACTION_TRIM), split 5 sequences (internal ACTION_TRIM), and exclude 1 sequence (ACTION_EXCLUDE):
  Applied 11 actions; 522 bps dropped; 0 bps hardmasked.
  Confirm the cleaning actions:
  grep seq_00001 ./outputdir/fcs_adaptor_report.txt
  seq_00001 230276 ACTION_TRIM 1..58 CONTAMINATION_SOURCE_TYPE_ADAPTOR:NGB00360.1:Illumina PCR Primer
  grep seq_00001 clean.fasta
  >seq_00001~59..230276
  grep seq_00006 ./outputdir/fcs_adaptor_report.txt
  seq_00006 270219 ACTION_TRIM 100001..100058 CONTAMINATION_SOURCE_TYPE_ADAPTOR:NGB00360.1:Illumina PCR Primer
  grep seq_00006 clean.fasta
  >seq_00006~100059..270219
  >seq_00006~1..100000
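The clean.fasta headers in the example follow from the report: a trimmed or split sequence keeps everything outside its ACTION_TRIM ranges. The hypothetical kept_ranges helper below reproduces that arithmetic, assuming ranges are 1-based, inclusive, sorted, and non-overlapping. The hypothetical summarize_report helper tallies rows per action and an expected number of dropped bases, under the assumption that "bps dropped" counts the bases inside ACTION_TRIM ranges plus the full length of ACTION_EXCLUDE sequences. Both helpers assume a tab-separated report with the columns accession, length, action, range, and name.

```shell
# Hypothetical verification helpers (not part of FCS).

# kept_ranges LENGTH RANGES: print the start..end pieces left after removing
# RANGES (comma-separated, sorted, non-overlapping, 1-based inclusive).
kept_ranges() {
    awk -v len="$1" -v ranges="$2" 'BEGIN {
        pos = 1                                   # first base not yet emitted
        n = split(ranges, segs, ",")
        for (i = 1; i <= n; i++) {
            split(segs[i], se, "[.][.]")
            if (se[1] + 0 > pos) printf "%.0f..%.0f\n", pos, se[1] - 1
            pos = se[2] + 1
        }
        if (pos <= len + 0) printf "%.0f..%.0f\n", pos, len
    }'
}

# summarize_report REPORT: count rows per action and estimate bases dropped.
# ASSUMPTION: dropped = bases in ACTION_TRIM ranges + lengths of excluded sequences.
summarize_report() {
    awk -F '\t' '
        /^#/ { next }
        NF >= 3 { count[$3]++ }
        $3 == "ACTION_TRIM" {
            n = split($4, segs, ",")
            for (i = 1; i <= n; i++) {
                split(segs[i], se, "[.][.]")
                dropped += se[2] - se[1] + 1
            }
        }
        $3 == "ACTION_EXCLUDE" { dropped += $2 }
        END {
            for (a in count) print a, count[a]
            printf "expected_bps_dropped %.0f\n", dropped
        }' "$1"
}
```

For example, kept_ranges 270219 100001..100058 prints 1..100000 and 100059..270219, matching the two seq_00006 headers in clean.fasta. If the report holds one row per affected sequence, summarize_report ./outputdir/fcs_adaptor_report.txt should correspond to the 5 trims, 5 splits, and 1 exclusion described above, and its expected_bps_dropped value can be compared with the "bps dropped" figure printed by fcs.py clean genome.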
Please create an Issue if you encounter any problems.
For all other questions or comments, please contact us at [email protected]