Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fehler in .io_bam(.count_bamfile, file, param = param) : seqlevels(param) not in BAM header #1

Open
hoelzer opened this issue Nov 4, 2024 · 1 comment

Comments

@hoelzer
Copy link

hoelzer commented Nov 4, 2024

Hey,

I tried the script on my data set (which is a fragmented bacterial genome extracted from a metagenome).

Rscript --vanilla plot_BAM_density.R -f ../GCA_012799365.1_ASM1279936v1_genomic.fna -b ../GCA_012799365.1_ASM1279936v1_genomic.illumina.sorted.bam -o ../GCA_012799365-baja-coverage

but I get:

Fehler in .io_bam(.count_bamfile, file, param = param) :
  seqlevels(param) not in BAM header:
    seqlevels: 'JAAZLJ010000085.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 100867_AS23, whole genome shotgun sequence', 'JAAZLJ010000016.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 10111_AS23, whole genome shotgun sequence', 'JAAZLJ010000086.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 103774_AS23, whole genome shotgun sequence', 'JAAZLJ010000087.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 104241_AS23, whole genome shotgun sequence', 'JAAZLJ010000089.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 105954_AS23, whole genome shotgun sequence', 'JAAZLJ010000090.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 108066_AS23, whole genome shotgun sequence', 'JAAZLJ010000017.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 11411_AS23, whole genome shotgun sequence', 'JAAZLJ010000099.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 123082_AS23, whole
Ruft auf: plot_BAM_density ... kpPlotBAMDensity -> <Anonymous> -> <Anonymous> -> .io_bam
Ausführung angehalten

I should probably mention that I am trying on a MacBook...

I am unsure, but maybe the R script does not extract the FASTA headers correctly and then they don't match between the FASTA and BAM.

❯ samtools view -H GCA_012799365.1_ASM1279936v1_genomic.nanopore.sorted.bam
@HD	VN:1.6	SO:coordinate
@SQ	SN:JAAZLJ010000085.1	LN:27319
@SQ	SN:JAAZLJ010000016.1	LN:14213
@SQ	SN:JAAZLJ010000086.1	LN:12008
@SQ	SN:JAAZLJ010000087.1	LN:64549
@SQ	SN:JAAZLJ010000088.1	LN:9320
@SQ	SN:JAAZLJ010000089.1	LN:51768
...
❯ grep ">" GCA_012799365.1_ASM1279936v1_genomic.fna
>JAAZLJ010000085.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 100867_AS23, whole genome shotgun sequence
>JAAZLJ010000016.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 10111_AS23, whole genome shotgun sequence
>JAAZLJ010000086.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 103774_AS23, whole genome shotgun sequence
>JAAZLJ010000087.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 104241_AS23, whole genome shotgun sequence
>JAAZLJ010000088.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 105800_AS23, whole genome shotgun sequence
>JAAZLJ010000089.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 105954_AS23, whole genome shotgun sequence
>JAAZLJ010000234.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 107784_AS23, whole genome shotgun sequence
@hoelzer
Copy link
Author

hoelzer commented Nov 7, 2024

Works when I do

awk '{print $1}' GCA_012799365.1_ASM1279936v1_genomic.fna > GCA_012799365.1_ASM1279936v1_genomic.fna.renamed

and use this as reference.

However, I think a user might often end up in a difference between the IDs of the ref and the BAM depending on the used mapper. It might be good to implement ID mapping always only until the first white space in the FASTA ID...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant