Issues with regression failing in Bagel 2. Version 0.91 still supported? #8

Open
mjafin opened this issue Sep 23, 2019 · 2 comments

mjafin commented Sep 23, 2019

Hi there,
Thanks for providing the code for Bagel, great program.

Long story short, we'd like to fix all random elements in our pipeline for integration tests, and the seed setting in later versions of Bagel helps us immensely. However, I've noticed that with our test data the linear regression in 2.0 fails, so the cross-validation or bootstrapping almost never finishes (https://github.com/hart-lab/bagel/blob/master/BAGEL.py#L700).

I've therefore reverted to 0.91, which doesn't support setting the seed. I've modified it locally to set the random seed and was wondering whether you still support the code at https://sourceforge.net/p/bagel-for-knockout-screens/code/ci/master/tree/ and accept patches. Are there any plans to host the old code as a branch on GitHub?

Best wishes and thanks again,
Miika
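
For readers who want the same reproducibility, here is a minimal sketch of seeded bootstrap resampling. It assumes NumPy and is not taken from either BAGEL version; the function name `bootstrap_indices` and its parameters are hypothetical.

```python
import numpy as np

def bootstrap_indices(n_genes, n_iter, seed=None):
    """Yield (train, test) index arrays for each bootstrap iteration.

    Hypothetical helper, not BAGEL's code: with a fixed seed, every run
    draws the same resamples, which makes integration tests deterministic.
    """
    rng = np.random.default_rng(seed)  # local generator, no global state
    all_idx = np.arange(n_genes)
    for _ in range(n_iter):
        # Sample with replacement for training; the genes never drawn
        # (out-of-bag) form the test set.
        train = rng.choice(n_genes, size=n_genes, replace=True)
        test = np.setdiff1d(all_idx, train)
        yield train, test
```

Passing the same `seed` twice yields identical train/test splits, while `seed=None` keeps the usual non-deterministic behaviour.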

Contributor

rooeikim commented Sep 25, 2019

Hi Miika,
Thank you for using BAGEL.
The linear regression in v2 can fail when the sample doesn't meet our thresholds for defining a fold-change (fc) window. This usually happens when the core-essential and non-essential fc distributions of the sample are indistinguishable or inverted. I recommend checking the fc distributions. I made a simple ipynb notebook for you: https://github.com/rooeikim/codes
I'll also port v0.91 to GitHub soon to keep the legacy code, and add an option for setting a seed.

Best regards,
Eiru Kim.
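
The check recommended above could be sketched roughly as follows. This is not the linked notebook and not BAGEL's own threshold logic; the function name `fc_separation`, the effect-size measure, and the cutoff `min_d=1.0` are all assumptions made for illustration. It relies only on the expectation that, in a knockout screen, core-essential genes show more negative fold changes than non-essential genes.

```python
import numpy as np

def fc_separation(ess_fc, non_fc, min_d=1.0):
    """Classify how well two fold-change distributions separate.

    Hypothetical diagnostic: d is the difference of medians
    (non-essential minus essential) in units of the pooled standard
    deviation. min_d is an arbitrary illustrative cutoff.
    """
    ess_fc = np.asarray(ess_fc, dtype=float)
    non_fc = np.asarray(non_fc, dtype=float)
    pooled_sd = np.sqrt((ess_fc.var(ddof=1) + non_fc.var(ddof=1)) / 2.0)
    diff = np.median(non_fc) - np.median(ess_fc)
    d = diff / pooled_sd if pooled_sd > 0 else 0.0
    if d >= min_d:
        status = "separated"          # essentials clearly more depleted
    elif d <= -min_d:
        status = "inverted"           # essentials less depleted than non-essentials
    else:
        status = "indistinguishable"  # distributions overlap heavily
    return status, d
```

A result of `"inverted"` or `"indistinguishable"` for a sample would suggest looking at that sample's histograms before running the regression.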

Author

mjafin commented Sep 27, 2019

Thanks Eiru. I modified 0.91 on my side, so I'm happy to open a pull request if you want the seed setting incorporated that way.

Regarding v2, rather than exiting when the regression doesn't work, how about falling back to some kind of placeholder values or skipping to the next CV/bootstrap iteration? At the moment my analysis randomly never finishes. Alternatively, what about using some sort of regularised regression method? I'm assuming this is some kind of multicollinearity problem.
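
The two ideas above (skipping an iteration that cannot be fitted, and a regularised fit) might look something like the sketch below. It is plain NumPy and does not reflect BAGEL's internals; `ridge_fit_1d`, `run_iterations`, `min_points` and `alpha` are hypothetical names and values. Note that, going by the maintainer's reply, the failure comes from the fc window thresholds rather than from a numerical problem, so regularisation alone may not resolve it.

```python
import numpy as np

def ridge_fit_1d(x, y, alpha=1.0):
    """Closed-form ridge fit of y = slope * x + intercept.

    The intercept is not penalised. With alpha > 0 the denominator is
    always positive, so the fit is defined even when x is constant.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    slope = (xc @ yc) / (xc @ xc + alpha)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

def run_iterations(samples, min_points=10, alpha=1.0):
    """Fit each (x, y) sample; record None instead of exiting on a bad one."""
    results = []
    for x, y in samples:
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        ok = np.isfinite(x) & np.isfinite(y)
        if ok.sum() < min_points:
            results.append(None)  # skip this CV/bootstrap iteration
            continue
        results.append(ridge_fit_1d(x[ok], y[ok], alpha))
    return results
```

Recording `None` keeps the iteration count aligned with the resampling scheme, so downstream code can decide whether too many iterations were skipped for the result to be trusted.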
