Fehler in .io_bam(.count_bamfile, file, param = param) : seqlevels(param) not in BAM header #1

hoelzer · 2024-11-04T14:50:13Z

Hey,

I tried the script on my data set (which is a fragmented bacterial genome extracted from a metagenome).

Rscript --vanilla plot_BAM_density.R -f ../GCA_012799365.1_ASM1279936v1_genomic.fna -b ../GCA_012799365.1_ASM1279936v1_genomic.illumina.sorted.bam -o ../GCA_012799365-baja-coverage

but I get:

Fehler in .io_bam(.count_bamfile, file, param = param) :
  seqlevels(param) not in BAM header:
    seqlevels: 'JAAZLJ010000085.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 100867_AS23, whole genome shotgun sequence', 'JAAZLJ010000016.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 10111_AS23, whole genome shotgun sequence', 'JAAZLJ010000086.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 103774_AS23, whole genome shotgun sequence', 'JAAZLJ010000087.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 104241_AS23, whole genome shotgun sequence', 'JAAZLJ010000089.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 105954_AS23, whole genome shotgun sequence', 'JAAZLJ010000090.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 108066_AS23, whole genome shotgun sequence', 'JAAZLJ010000017.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 11411_AS23, whole genome shotgun sequence', 'JAAZLJ010000099.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 123082_AS23, whole
Ruft auf: plot_BAM_density ... kpPlotBAMDensity -> <Anonymous> -> <Anonymous> -> .io_bam
Ausführung angehalten

I should probably mention that I am trying on a MacBook...

I am unsure, but maybe the R script does not extract the FASTA headers correctly and then they don't match between the FASTA and BAM.

❯ samtools view -H GCA_012799365.1_ASM1279936v1_genomic.nanopore.sorted.bam
@HD	VN:1.6	SO:coordinate
@SQ	SN:JAAZLJ010000085.1	LN:27319
@SQ	SN:JAAZLJ010000016.1	LN:14213
@SQ	SN:JAAZLJ010000086.1	LN:12008
@SQ	SN:JAAZLJ010000087.1	LN:64549
@SQ	SN:JAAZLJ010000088.1	LN:9320
@SQ	SN:JAAZLJ010000089.1	LN:51768
...

❯ grep ">" GCA_012799365.1_ASM1279936v1_genomic.fna
>JAAZLJ010000085.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 100867_AS23, whole genome shotgun sequence
>JAAZLJ010000016.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 10111_AS23, whole genome shotgun sequence
>JAAZLJ010000086.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 103774_AS23, whole genome shotgun sequence
>JAAZLJ010000087.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 104241_AS23, whole genome shotgun sequence
>JAAZLJ010000088.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 105800_AS23, whole genome shotgun sequence
>JAAZLJ010000089.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 105954_AS23, whole genome shotgun sequence
>JAAZLJ010000234.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 107784_AS23, whole genome shotgun sequence

hoelzer · 2024-11-07T13:19:53Z

Works when I do

awk '{print $1}' GCA_012799365.1_ASM1279936v1_genomic.fna > GCA_012799365.1_ASM1279936v1_genomic.fna.renamed

and use this as reference.

However, I think a user might often end up in a difference between the IDs of the ref and the BAM depending on the used mapper. It might be good to implement ID mapping always only until the first white space in the FASTA ID...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fehler in .io_bam(.count_bamfile, file, param = param) : seqlevels(param) not in BAM header #1

Fehler in .io_bam(.count_bamfile, file, param = param) : seqlevels(param) not in BAM header #1

hoelzer commented Nov 4, 2024

hoelzer commented Nov 7, 2024

Fehler in .io_bam(.count_bamfile, file, param = param) : seqlevels(param) not in BAM header #1

Fehler in .io_bam(.count_bamfile, file, param = param) : seqlevels(param) not in BAM header #1

Comments

hoelzer commented Nov 4, 2024

hoelzer commented Nov 7, 2024