-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extensive traversal of cross-validation folds #95
Conversation
Thanks for this excellent PR! I really appreciate the amount of depth that you went into in your exploration, your thoughtful experimentation plan, and the detailed documentation of the narrative and your interpretations in the notebook comments. This notebook really makes for compelling reading, which is the goal with this type of work. If you would like to merge this as-is so as to record this contribution, that is fine. Otherwise, you're welcome to continue working on it. @SaumyaSinghal Please let me know either way in the comments. Also, please resolve the merge conflicts that are currently showing. I would start by removing the startup task notebook and rebasing onto the current main repo (upstream) master. I think the interesting point of the comparison between ShuffleSplit and KFold CV is the difference between resampled (overlapping) splits and deterministic disjoint ones. However, varying the What would make for a more insightful comparison I think is to reparametrize your Personally I would use line plots rather than bars for the graphs, as I think that would make comparisons easier. It would also be interesting to include a couple of really large values for Don't worry about holding out a separate test set here. That is definitely best practice in training a model with CV, but this analysis is more about exploring how CV works, and it's not necessary here. |
Thank you so much for your feedback and suggestions! These are some very interesting points that you have mentioned and I definitely look forward to exploring them.
I forgot to change this. I was doing this at the start of this month thinking that I can somehow get the 10% returned and use it as a separate test set (I wondered why else would they even have this option) but I later realized that that is absolutely not how cross-validation works. I even changed ShuffleSplit to K-fold for the Startup task but I overlooked this one and left out 10% for no reason. I shall fix this; and comparing with similar train-test splits sounds like a very interesting idea too.
@dzeber I'd like to work on the aspects that you've mentioned before merging this in. Regarding conflicts, there is an updated version of these files lying in master but as the master has also not been merged (I don't have a folder with my name as of yet in /dev) there is a conflict. I suppose I can just pull these two files into this branch as you have mentioned. The master is actually associated with PR #36 (I should have created a separate branch for this though) and I think that should be merged before merging any other branch so would you suggest this branch to wait for it? |
Oh my bad, one of my PRs was merged some hours ago, so I think a folder under my name exists now :) |
Sounds good! I've added the WIP tag to this effect - feel free to remove it when you're ready to merge. |
Thank you. I am working on all the unfinished issues parallelly so none of them are complete now with the degree of exploration I would like but I hope to complete everything by tomorrow midnight and then get them merged :D |
@dzeber this can be merged now, I have kept in mind your suggestions (which has made the exploration much more interesting) and have updated it accordingly :) |
In the notebook I have tried to go deep into the two main cross-validation techniques (with reference to issue #4), shuffle split and k-fold split, along with their stratified variations. It was an interesting exploration and I learnt a great deal about each. Some things, however, that I would later incorporate into this are:
I am eager for any suggestion for improvement or interesting direction for exploration :)