Exploring histogram binning for gene ontology data #35

Open
noamteyssier opened this issue Jun 28, 2022 · 1 comment
Labels
question Further information is requested

Comments

@noamteyssier
Collaborator

Opening an issue to check whether you have already tested this, @artemy-bakulin. It could be interesting to see what effect different binning strategies have when binning by gene membership.

Your current method, GeneOntology._build_bin_split, performs an even split over gene membership. Since membership sizes are roughly exponentially distributed, it could be interesting to perform the split on the log2 histogram instead.

Here's an example of the split from each of the two methods, with a bin size of 5, on GO_BP_2021:

[image: linear scale]

and in log2:

[image: log2 scale]
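A minimal sketch of the two strategies being compared, assuming the "even split" means equal-count bins over terms sorted by membership size, and the log2 approach means equal-width bins on log2(size). The function names and the simulated sizes are hypothetical; this is not the project's GeneOntology._build_bin_split:

```python
import numpy as np


def even_split_bins(sizes, n_bins):
    """Hypothetical equal-count split: sort terms by membership size and
    cut the sorted order into n_bins groups of (nearly) equal length.
    Ties in size may land in neighbouring bins."""
    sizes = np.asarray(sizes)
    order = np.argsort(sizes, kind="stable")
    labels = np.empty(len(sizes), dtype=int)
    for b, idx in enumerate(np.array_split(order, n_bins)):
        labels[idx] = b
    return labels


def log2_hist_bins(sizes, n_bins):
    """Hypothetical log2-histogram split: n_bins equal-width bins on the
    log2 scale, so each bin spans the same fold change in membership size."""
    log_sizes = np.log2(np.asarray(sizes, dtype=float))
    edges = np.histogram_bin_edges(log_sizes, bins=n_bins)
    # Digitize against the interior edges only, so labels run 0..n_bins-1
    # and the largest term falls in the last bin instead of past it.
    return np.digitize(log_sizes, edges[1:-1])


# Simulated, roughly exponential membership sizes (minimum size 5).
rng = np.random.default_rng(0)
sizes = np.round(rng.exponential(scale=50, size=1000)).astype(int) + 5

print(np.bincount(even_split_bins(sizes, 5)))  # → [200 200 200 200 200]
print(np.bincount(log2_hist_bins(sizes, 5), minlength=5))
```

With equal-count bins every bin holds the same number of terms, but the size range covered by each bin widens sharply toward the tail; the log2 bins each span the same fold change in size, so the number of terms per bin varies instead.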

@noamteyssier added the question (Further information is requested) label on Jun 28, 2022
@artemy-bakulin
Collaborator

Yeah, I have tested that with simulated data, and the split method has a higher TNR (true negative rate) than the hist method. The difference is small, though.
I also added a third variant from my initial code, and it performs even better; it is also much simpler and faster. I will open a pull request for it soon. In that PR I also removed many of your expression preprocessing functions, because I could not understand how they work, even though the task is actually pretty straightforward.
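For reference, TNR (specificity) is TN / (TN + FP). A minimal sketch of how it could be computed on simulated data where the truly affected terms are known; the function name and the boolean inputs are hypothetical and not taken from the project:

```python
import numpy as np


def true_negative_rate(is_true_hit, is_called):
    """TNR = TN / (TN + FP): of the terms that are truly unaffected in the
    simulation, the fraction the method correctly did not call."""
    truth = np.asarray(is_true_hit, dtype=bool)
    called = np.asarray(is_called, dtype=bool)
    tn = np.sum(~truth & ~called)  # null terms left uncalled
    fp = np.sum(~truth & called)   # null terms wrongly called
    if tn + fp == 0:
        raise ValueError("TNR is undefined when there are no null terms")
    return float(tn / (tn + fp))


# Toy example: 6 null terms, 1 of which is falsely called.
truth = [True, True, False, False, False, False, False, False]
called = [True, False, True, False, False, False, False, False]
print(true_negative_rate(truth, called))  # → 0.8333333333333334
```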
