Possible issue with uint8 in bmat #19

Open
dawe opened this issue Sep 26, 2019 · 3 comments

dawe commented Sep 26, 2019

I'm trying to parse snap files into Python sparse matrices. This is what I'm doing:

import numpy as np
import h5py
import snaptools.utilities
import scipy.sparse as sp

# myfile, genome_dict and bin_size are defined elsewhere
f = h5py.File(myfile, 'r')
n_cells = len(f['BD/name'])
bin_dict = snaptools.utilities.getBinsFromGenomeSize(genome_dict, bin_size)  # from snaptools code
n_bins = len(bin_dict)
idy = f['AM/5000/idy'][:]
idx = np.arange(n_cells + 1)
data = f['AM/5000/count'][:]

X = sp.csc_matrix((data, idy, idx), shape=(n_bins, n_cells))

Everything seems to work but I've noticed two things:

  • my data are capped at 255
  • there are many more zeros than I previously found with another method (outside snaptools)

As for the second, I thought I might be counting wrong, but after reading the snap file internals I realized that counts are saved as uint8, which explains why no value exceeds 255. The problem is that at line 55 of add_bmat.py the counter is a generic Python integer

            bins = collections.defaultdict(lambda : 0);

which is then cast to uint8 at write time (line 79).

        f.create_dataset("AM/"+str(bin_size)+"/count", data=countList[bin_size], dtype="uint8", compression="gzip", compression_opts=9);    

This causes any count X above 255 to wrap around and be stored as X % 256. I don't know whether standard scATAC experiments are expected to keep read counts per bin at or below 255, but that is not the case for my data.
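
To illustrate the wraparound, here is a minimal sketch using NumPy only. The counts are made-up values, not taken from a real snap file, and the explicit `astype` cast is an assumption standing in for whatever conversion happens when the dataset is written with dtype="uint8".

```python
import numpy as np

# Hypothetical per-bin counts, accumulated as plain Python integers
counts = [3, 255, 256, 300, 511]

# Casting a wider integer array to uint8 keeps only the low 8 bits,
# so every value is reduced modulo 256
wrapped = np.array(counts, dtype=np.int64).astype(np.uint8)

print(wrapped.tolist())  # [3, 255, 0, 44, 255]
```

A bin with 256 reads ends up as 0, and a bin with 300 reads ends up as 44, so the stored values are not even a lower bound on the true counts.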

r3fang (Owner) commented Sep 30, 2019 via email

dawe (Author) commented Oct 1, 2019

I see. Still, a bin with a count of 256 would be stored as 0, and would therefore remain 0 after binarization as well. If 255 is meant to be the maximum value, every bin count should be capped at 255 before the snap object is written.
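
A minimal sketch of the capping I have in mind, assuming the counts are available as a list or array of integers before writing. `cap_counts` is a hypothetical helper, not part of snaptools.

```python
import numpy as np

def cap_counts(counts, dtype=np.uint8):
    """Saturate counts at the largest value the target dtype can hold."""
    max_value = np.iinfo(dtype).max  # 255 for uint8
    wide = np.asarray(counts, dtype=np.int64)
    # Clamp first, then cast, so the cast can no longer wrap around
    return np.minimum(wide, max_value).astype(dtype)

counts = [3, 255, 256, 300, 511]  # made-up example values
capped = cap_counts(counts)
print(capped.tolist())  # [3, 255, 255, 255, 255]

# Binarization now keeps every non-empty bin
binarized = (capped > 0).astype(np.uint8)
print(binarized.tolist())  # [1, 1, 1, 1, 1]
```

With capping, a bin holding 256 reads is stored as 255 and still counts as occupied after binarization, instead of silently becoming 0.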

r3fang (Owner) commented Oct 2, 2019 via email
