Provide access to index stats #294

Closed
jeromekelleher opened this issue Jan 18, 2024 · 6 comments

Comments

@jeromekelleher
Contributor

It would be super useful to be able to find out the number of variants in a file, like bcftools index -n provides. The code used in bcftools index is here and looks relatively straightforward. I think there's access to the tbx and/or hts_idx already in the VCF class, so it should be possible to port this code into a new method with no other changes?

I guess ideally this would return a dictionary mapping contig names to the number of records (which is the raw information from the index, if I'm reading the bcftools code correctly)?

I guess a simple num_records property would also be a useful thing to have, with the proviso that it raises an error or returns None if there is no index, or the required information isn't in the index.
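To make the proposed shape concrete, here is a minimal sketch in plain Python. It is not cyvcf2 code: `parse_index_stats` and `num_records` are hypothetical helper names, and the sample text is made up. It assumes input in the three-column, tab-delimited layout that `bcftools index --stats` prints (contig name, contig length or `.` if unknown, number of records).

```python
from typing import Dict, Optional


def parse_index_stats(text: str) -> Dict[str, int]:
    """Parse `bcftools index --stats` style text into {contig: n_records}.

    Hypothetical helper for illustration only; not part of cyvcf2.
    """
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Columns: contig name, contig length ('.' if unknown), record count
        contig, _length, n_records = line.split("\t")
        counts[contig] = int(n_records)
    return counts


def num_records(counts: Optional[Dict[str, int]]) -> Optional[int]:
    """Total number of records, or None when no index stats are available."""
    if counts is None:
        return None
    return sum(counts.values())


# Made-up sample data, not taken from a real file.
sample = "chr1\t248956422\t1200\nchr2\t.\t34\n"
counts = parse_index_stats(sample)
print(counts)               # {'chr1': 1200, 'chr2': 34}
print(num_records(counts))  # 1234
print(num_records(None))    # None
```

Returning `None` rather than raising is only one of the two options mentioned above; raising an error when the index is missing would be equally reasonable.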

Does this sound doable?

@brentp
Owner

brentp commented Jan 18, 2024

Yes, this sounds completely doable and in scope.
I'm open to a PR that adds it.

@jeromekelleher
Contributor Author

Great! I'll have a crack at it. The code will definitely need some close review as I'm not that familiar with htslib internals - I hope that's OK.

@brentp
Owner

brentp commented Jan 18, 2024

Certainly!
I think it should be straightforward.

@jeromekelleher
Contributor Author

I'm having a go, but have hit a wall with accessing additional symbols from htslib. I want to use, e.g., hts_idx_nseq but I always seem to get:

CFLAGS="-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION" CYVCF2_HTSLIB_MODE=BUILTIN CYTHONIZE=1 python3 setup.py build_ext --inplace
Compiling cyvcf2/cyvcf2.pyx because it changed.
[1/1] Cythonizing cyvcf2/cyvcf2.pyx

Error compiling Cython file:
------------------------------------------------------------
...

        if idx == NULL:
            return -1

        #nseq= 0
        nseq = hts_idx_nseq(self.hidx)
               ^
------------------------------------------------------------

cyvcf2/cyvcf2.pyx:700:15: undeclared name not builtin: hts_idx_nseq

Error compiling Cython file:
------------------------------------------------------------
...

        if idx == NULL:
            return -1

        #nseq= 0
        nseq = hts_idx_nseq(self.hidx)
                                ^
------------------------------------------------------------

cyvcf2/cyvcf2.pyx:700:32: Cannot convert 'hts_idx_t *' to Python object

I can't see any difference between the way I'm using this and existing calls such as cnames = tbx_seqnames(self.idx, &n). Is there some trick for getting C functions recognised by Cython? Sorry, I'm not very good with Cython.

@brentp
Owner

brentp commented Jan 18, 2024

You can add the definition of hts_idx_nseq to the pxd around here
then it should work as you expect.

@jeromekelleher
Contributor Author

🤦‍♂️

Thanks @brentp!

jeromekelleher added a commit to jeromekelleher/cyvcf2 that referenced this issue Jan 18, 2024
@brentp brentp closed this as completed in 4659f50 Jan 19, 2024