
FCS adaptor troubleshooting

Eric Tvedte edited this page Apr 17, 2024 · 2 revisions

Please check the GitHub Issues page to see whether similar issues have been reported.

Please check genome sequence formatting requirements to see whether an invalid FASTA could be the source of the error.

At what stage in the genome assembly process should I run FCS-adaptor?

We recommend running FCS-adaptor after the initial contig assembly stage. If you are planning to submit the genome to NCBI or another public archive, we also recommend running FCS-adaptor on the final assembly prior to submission. In rare cases, tandem adaptors are not all discovered in a single FCS-adaptor run, so we recommend re-screening the cleaned assembly whenever contamination is identified.

Can FCS-adaptor run on sequencing reads?

FCS-adaptor is designed to operate on assembled genomes and is not intended to replace adaptor trimming on reads. Various tools exist for read trimming; tools specialized for a particular sequencing technology may have a more extensive adaptor catalogue than FCS-adaptor.

What adaptors are included in the FCS-adaptor database?

Sequences with the pattern >gnl|uv|NGB* from the UniVec representation list correspond to adaptor/adapter/primer sequences screened by FCS-adaptor. These synthetic sequences can be retrieved from NCBI FTP.
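To list the adaptor entries in a local copy of the UniVec FASTA, the header pattern above can be applied with a small filter. The following is a minimal sketch, not part of FCS-adaptor: the function names and the demo records are hypothetical, and it assumes the file is plain FASTA with headers that begin with gnl|uv|NGB.

```python
from typing import Iterable, Iterator, List, Tuple

# Header prefix taken from the pattern quoted above (>gnl|uv|NGB*).
ADAPTOR_PREFIX = "gnl|uv|NGB"


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_adaptors(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Keep only records whose header starts with the adaptor prefix."""
    return [(h, s) for h, s in read_fasta(lines) if h.startswith(ADAPTOR_PREFIX)]


if __name__ == "__main__":
    # Made-up records for illustration only; real data would come from
    # a downloaded UniVec file opened with open(path).
    demo = [
        ">gnl|uv|NGB00001.1 hypothetical adaptor",
        "ACGT",
        "TTGA",
        ">gnl|uv|J01636.1 hypothetical vector",
        "GGGG",
    ]
    for header, seq in select_adaptors(demo):
        print(header, len(seq))
```

For a real file, pass an open file handle to select_adaptors in place of the demo list.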

I ran FCS-adaptor on my genome, why is NCBI still reporting adaptor contamination?

This can occur for several reasons. If you ran FCS-adaptor for a single iteration, it is possible that tandem adaptors were not detected. Another common reason is the assumption that the FCS-adaptor output file cleaned_sequences/*.fa.gz has all adaptor contamination removed. It does not: internal adaptors require the user to decide between masking and splitting, and then to clean the genome with fcs.py clean genome. See Interpreting Outputs for details.
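The difference between masking and splitting an internal adaptor can be illustrated on a toy sequence. This sketch is conceptual only and is not how FCS-adaptor performs cleaning: the helper names are hypothetical, and the coordinates are assumed to be 0-based and half-open, which may differ from the convention used in the FCS-adaptor report.

```python
from typing import List


def mask_span(seq: str, start: int, end: int, mask_char: str = "N") -> str:
    """Replace seq[start:end] with mask characters, preserving sequence length."""
    return seq[:start] + mask_char * (end - start) + seq[end:]


def split_span(seq: str, start: int, end: int) -> List[str]:
    """Remove seq[start:end] and return the non-empty flanking pieces."""
    return [piece for piece in (seq[:start], seq[end:]) if piece]


if __name__ == "__main__":
    contig = "AAAACCCCGGGG"  # toy sequence; positions 4..8 stand in for an adaptor
    print(mask_span(contig, 4, 8))
    print(split_span(contig, 4, 8))
```

Masking keeps the contig in one piece with its coordinates intact, while splitting yields two shorter sequences. Which is appropriate depends on whether the flanking sequence is believed to be truly contiguous.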

The FCS-adaptor hits don't match my sequencing library...

Sequencing primers/adaptors from multiple vendors can contain shared subsequences. It is possible that the top reported hit originates from a similar library prep kit from a different vendor. The adaptor/adapter/primer sequences starting with >gnl|uv|NGB* on NCBI FTP may reveal shared sequence content.

I believe FCS-adaptor is reporting false positive contamination in my genome...

Please report any concerns with false positive adaptor results on the GitHub Issues page.

What files are important for debugging purposes?

The validate_fasta.txt file can reveal invalid FASTA issues in the input genome. The fcs_adaptor.log file can reveal other issues with the FCS-adaptor run. When submitting a GitHub Issue, rerun FCS-adaptor in --debug mode and include the resulting fcs_adaptor.log.

Technical Information

FCS-adaptor is a reimplementation of NCBI VecScreen for general public use. See About VecScreen for information regarding BLAST search parameters and score filtering used in the adaptor report generation.
