fastp not removing all Illumina universal adapter sequences as indicated by FastQC #558

Open
luckyvivi opened this issue Apr 15, 2024 · 5 comments

Comments

@luckyvivi

Hi, I recently ran fastp on an Illumina dataset with the following command:
fastp -i SRR18278237.fastq.gz -o SRR18278237.fastp.gz -z 9 -l 15 -w 16 --dedup --dup_calc_accuracy 6 -x -3 --cut_mean_quality 20 -j SRR18278237.fastp.json -h SRR18278237.fastp.html

I expected this command to remove the Illumina universal adapter sequences from the reads. However, after running FastQC on the output file, I still see significant adapter content in the FastQC report, specifically towards the end of the reads (please see the attached screenshot).
[Screenshot: FastQC adapter content plot]

Could you please help me understand the following:

  1. Is there a possibility that fastp might not remove some of the adapter sequences under certain conditions?
  2. Do I need to specify the adapter sequences explicitly using the -a option, even though these are standard Illumina universal adapters?
  3. Is there anything in my fastp command that might have prevented the adapter sequences from being adequately detected and trimmed?

I have attached the JSON and HTML reports from fastp for your reference. I would greatly appreciate any insights or suggestions you might have to resolve this issue.

Thank you for your assistance and for developing such a useful tool.

Best regards,
Xiaowen
Uploading SRR18278237 (1).fastp.zip…

@nreid

nreid commented May 14, 2024

I have a similar issue, but with Nextera adapters. fastp reports no adapter contamination, while FastQC reports Nextera adapters, rising to 10% by the read end. Even when I supply the Nextera FASTA file (the one provided by Trimmomatic), virtually no trimming happens.

Trimmomatic with ILLUMINACLIP:"${ADAPTERS}":2:30:10 SLIDINGWINDOW:4:25 MINLEN:45 drops 7.25% of all reads.

This isn't a perfect comparison: I think fastp's default minimum window quality is 20, not 25. Still, something seems off here. I'm using v0.23.2.

@realzhang

Same problem. Any suggestion is welcome. Thanks!

@nreid

nreid commented May 30, 2024

I switched back to fastqc/trimmomatic/fastqc. I'm removing fastp from my workflows.

There are also a couple of concerning GitHub issues about reproducibility. I like the tool, but I can't use it if these things aren't resolved.

@hp399

hp399 commented Sep 18, 2024

Hi there, I ran into a similar problem and worked out an explanation that at least holds for my data.

A possible reason fastp does not recognize and remove the adapter while FastQC detects it is that the R1 inserts are shorter than 150 bp, so the adapter FastQC detects in R1.fastq.gz is actually the reverse complement of the R2 adapter. In that situation, to remove the adapter from R1 with fastp, specify it explicitly with "-a reversed_and_complementary_adapter_sequence_of_Read2". To remove the adapter from R2, use the reverse complement of the R1 adapter.
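To make the reverse-complement step concrete, here is a minimal Python sketch. The function name and the example sequence are hypothetical placeholders (the sequence is not a real adapter); substitute the adapter from your own library prep kit before passing the result to -a.

```python
# Lookup table for complementing DNA bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

# Placeholder sequence for illustration only, not a real adapter.
example = "GATTACA"
print(reverse_complement(example))  # prints TGTAATC
```

The printed string is what would be supplied after -a in place of reversed_and_complementary_adapter_sequence_of_Read2.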

When the library inserts are shorter than 150 bp, the sequencer keeps reading past the end of the insert and into the adapter of the opposite strand. My guess is that FastQC can detect the widely used adapters in either orientation while fastp can't, meaning fastp only auto-detects them when they literally match the given sequences.
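The read-through described here can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions: the template is modeled as the insert followed directly by one adapter, sequencing errors are ignored, and all names and sequences are hypothetical.

```python
def simulate_read(insert, adapter, read_length):
    """Return the bases reported for one read of a fixed length.

    The template is modeled as the insert followed by the adapter, so a
    read longer than the insert runs into adapter bases.
    """
    template = insert + adapter
    return template[:read_length]

def adapter_bases(insert, adapter, read_length):
    """Count how many adapter bases end up at the 3' end of the read."""
    overhang = read_length - len(insert)
    return max(0, min(overhang, len(adapter)))

# Toy example: a 10 bp insert sequenced with a 15-cycle read.
short_insert = "ACGTACGTAC"
toy_adapter = "AGATCGGAAGAGC"
print(simulate_read(short_insert, toy_adapter, 15))  # prints ACGTACGTACAGATC
print(adapter_bases(short_insert, toy_adapter, 15))  # prints 5
```

In this model, adapter bases only appear once the read length exceeds the insert length, which matches the pattern of adapter content rising towards the read end.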

I would suggest trying fastp with the adapter sequence of the other strand. Or you can simply extract some reads and inspect them manually to find where the adapter is and what it actually is.
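For the manual inspection, a rough Python sketch along these lines may help. It counts how many reads contain a short adapter seed and where the first hit starts. The two seeds are the short sequences I believe FastQC uses in its default adapter list; treat them as an assumption and check them against your own FastQC installation. The function names are hypothetical, and only exact matches are counted.

```python
import gzip
from collections import Counter

# Assumed to match FastQC's default adapter list; verify locally.
SEEDS = {
    "Illumina Universal Adapter": "AGATCGGAAGAGC",
    "Nextera Transposase Sequence": "CTGTCTCTTATA",
}

def fastq_sequences(path):
    """Yield the sequence line of each record in a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of every 4-line record
                yield line.strip()

def adapter_positions(sequences, seed):
    """Return (reads scanned, Counter of first-hit start positions) for one seed."""
    total = 0
    positions = Counter()
    for seq in sequences:
        total += 1
        hit = seq.find(seed)  # exact match only, no mismatches allowed
        if hit != -1:
            positions[hit] += 1
    return total, positions

# Example use on the trimmed output from the original command:
#   total, pos = adapter_positions(
#       fastq_sequences("SRR18278237.fastp.gz"),
#       SEEDS["Illumina Universal Adapter"],
#   )
#   print(sum(pos.values()) / total, pos.most_common(5))
```

Running the same scan with the reverse complement of each seed would show whether the leftover adapter is in the opposite orientation.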

Please feel free to let me know if I didn't make it clear or if it works for you. Thanks!
