Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Entire chromosome outputs as high signal #32

Open
Pooran-Dewari opened this issue Nov 19, 2022 · 7 comments
Open

Entire chromosome outputs as high signal #32

Pooran-Dewari opened this issue Nov 19, 2022 · 7 comments

Comments

@Pooran-Dewari
Copy link

Pooran-Dewari commented Nov 19, 2022

Entire chr29 as High Signal Region

I have created mappability file for chr29 (smallest chromosome for quicker analysis) with kmer 100 and 150, and used final uint output file of combine_umaps
However, it spits out entire chromosome as high signal region, not sure if this is because of low number of inputs I have provided (5 inputs).

Blacklist was installed using https://anaconda.org/bioconda/encode-blacklist

$ Blacklist chr29
chr29   0       43051100        High Signal Region

Directory structure below

$ tree
.
├── input
│   ├── Ss1_final_chr29.bam
│   ├── Ss1_final_chr29.bam.bai
│   ├── Ss2_final_chr29.bam
│   ├── Ss2_final_chr29.bam.bai
│   ├── Ss3_final_chr29.bam
│   ├── Ss3_final_chr29.bam.bai
│   ├── Ss4_final_chr29.bam
│   ├── Ss4_final_chr29.bam.bai
│   ├── Ss5_final_chr29.bam
│   └── Ss5_final_chr29.bam.bai
└── mappability
    └── chr29.uint8.unique

Contents of uint file # combined kmer 100 and 150

$ od -t x1 mappability/chr29.uint8.unique | head
0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0000140 00 00 00 00 00 00 00 96 96 96 96 96 96 96 96 96
0000160 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96
*
0000220 96 96 96 96 96 96 96 96 96 64 64 64 64 64 64 64
0000240 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64
*
0000740 64 64 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000760 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Please could you suggest why Blacklist output is the entire chromosome.
Thanks.

@Pooran-Dewari
Copy link
Author

Pooran-Dewari commented Nov 21, 2022

Update: Blacklist with 23 inputs, doesn't help.

Tried with 23 inputs, and still gives me the entire chromosome as high signal region.
Please could you suggest why Blacklist output is the entire chromosome.
Thanks.

@Pooran-Dewari Pooran-Dewari changed the title Unable to read mappability files Entire chromosome outputs as high signal Nov 28, 2022
@aboyle
Copy link
Contributor

aboyle commented Nov 29, 2022

I think you probably need more than 100 input tracks for a quality blacklist. We used 1256, 287, 443, and 490 for human, mouse, fly, and worm.

@Pooran-Dewari
Copy link
Author

Thanks. I might be able to get another 8 inputs from our collaborators, so total 30+ inputs, hopefully should help. Is there a way we could restrict the size of bins to overcome this reporting of entire chr as high signal? Or do you think trying with another chromosome(s) would help?
thanks

@aboyle
Copy link
Contributor

aboyle commented Nov 30, 2022

It may be that there is very low signal in this chromosome? If you uncomment like 442 you can see what the thresholds are defined as and that might give some insight into why you are getting a whole chromosome like this.

@Pooran-Dewari
Copy link
Author

Pooran-Dewari commented Nov 30, 2022

Thanks. I can see on IGV tracks that many regions have high signal in input files, so assuming not low signal issue.
Do you think changing int uniqueLength from default 36 to read length (i.e. 150) and recompile the source will do the trick? I had tested with 100 150 kmers and didn't go with lower numbers for mappability, just wondering if that is causing the issue.

int uniqueLength = 36; //This is arbitraty and defines how long a read needs
				// to be to be considered unique
				// Should be set to something actually calculated in the uint8 files

	for(int i = 0; i <= mappability.size() - binSize; i+=binOverlap) {
		uniqueCntr = 0;
		for(int j = 0; j < binSize; j++) {
			if(mappability[i+j] > 0 && mappability[i+j] <= uniqueLength) {
				uniqueCntr++;
			}

thanks for your help.

@aboyle
Copy link
Contributor

aboyle commented Nov 30, 2022

This might be the issue - based on my comment it looks like the length needs to have been included in the unit8 files. Change that to 100 and give it a shot.

@Pooran-Dewari
Copy link
Author

Pooran-Dewari commented Nov 30, 2022

Thanks. Yessss, changing int uniqueLength to 100 worked!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants