Random seed and model stability #114
Great suggestion!
I think it's a great suggestion. I've been seeing that as well with our attempts.
I agree, a good suggestion. For most models it wouldn't take much for me to implement this, and usually the model instantiation is separate enough from the CV workflow that I can make it fast enough to run many realizations. But some models have been tricky to reproduce, either because of the workflow or because of how the seeds are set (e.g. we had trouble getting seeds to work properly with Keras/TensorFlow). I'll take a look at the top 3 now, since I know I can work with those, and report back.
OK, here's a description of 100 realizations for the current HouMath model:

Median: 0.619

So it looks like, indeed, my lazy 'method' was rather favourable. In my defence, I used the same approach for everyone, so I hope there's been no unfairness. Either way, I'll take a look at some more models now. To make sure they see this conversation, I'll cc @dalide @ar4 @bestagini @gccrowther @lperozzi @thanish @mycarta @alexcombessie
I guess a standard deviation of 0.007 would not be a big deal in an industrial application (the number of misclassifications would not change dramatically). But looking at how close to each other we are in the contest, it is significant.
Result from @ar4's submission:
Hi everybody! I also agree that considering the average, median, or some other value obtained after testing multiple random seeds could be a good option. This could also solve another problem: working with Keras (TensorFlow or Theano backend), I am having issues fixing a given seed for reproducibility. Hopefully, averaged results would be more representative of the proposed method.
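For reference, a common seed-pinning recipe for a Keras/TensorFlow stack looks roughly like the sketch below. This is only a sketch: the exact TensorFlow call depends on the version, and GPU kernels can still be non-deterministic even with all seeds fixed.

```python
import os
import random
import numpy as np
import tensorflow as tf

SEED = 42
os.environ['PYTHONHASHSEED'] = str(SEED)  # only effective if set before the interpreter starts
random.seed(SEED)                          # Python's built-in RNG
np.random.seed(SEED)                       # NumPy RNG (used by Keras weight initializers)
tf.random.set_seed(SEED)                   # TF 2.x; TF 1.x used tf.set_random_seed(SEED)
```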
@bestagini Ah yes, of course that property of the 'unreproducible' results is dealt with here... I was thinking they'd be a problem, but obviously that's the whole point: we're fixing that problem :) Here's the same treatment of your own entry: So implementing this will indeed change the order of the 2nd and 3rd entries, as things stand. Side note to @alexcombessie: I can't reproduce your workflow, so I only have your submission to go on. I will have another crack at it. @Antoine-Cate I am working on yours now.
Here's geoLEARN's result: cc @Antoine-Cate @lperozzi @mablou

Rather than soaking up this thread, maybe I'll just start putting the validation scores (all realizations) into another folder, so everyone can see the data etc. Stay tuned.

Fearing for the rest of my day, I might adopt the following strategy:
For the record, here's how I'm getting the realizations (generic example below).
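A minimal sketch of that kind of loop, not the exact script from this repo: the random forest classifier, the micro-averaged F1 scoring, and the train/blind variable names are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

def stochastic_scores(X_train, y_train, X_blind, y_blind, n_realizations=100):
    """Refit the model once per seed and score each realization on the blind wells."""
    scores = []
    for seed in range(n_realizations):
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        clf.fit(X_train, y_train)
        scores.append(f1_score(y_blind, clf.predict(X_blind), average='micro'))
    return np.array(scores)

# scores = stochastic_scores(X_train, y_train, X_blind, y_blind)
# print('Median: %.3f  Std: %.3f' % (np.median(scores), np.std(scores)))
```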
Excellent idea. One thing to consider is how to handle cases (such as my own) where there are two fits (PE, and then Facies). Would two loops be a good approach: an outer loop of 10 iterations that picks the seed for the PE fit, and an inner loop around the Facies fit for another 10 iterations? Edit: Or one outer loop of 100 iterations that picks two random seeds each time.
@ar4 Just to be clear: I'm just getting the results from 100 seeds and averaging the scores those results achieve, so there's no optimization going on. You probably got this, just checking :) I made a new workflow, bringing the PE part into the seed-setting loop in a super hacky way. I checked this in so you can see it HERE... Please check it! The score is now like this: Did I understand what you were asking?
By the way everyone, the results from the realizations are now in the Stochastic_validations directory.
Ah, I wondered for a moment why you wanted to clarify that there was no optimization going on, and finally realised that my choice of the phrase "picks the seed" was problematic. Now that would be an example of overfitting! ;-) (I just meant that a new seed is picked/set for each loop iteration.) Your modification is approximately my second proposal (one outer loop), but I see you use the same seed for both the PE and Facies steps. It's probably not a problem, but using two random seeds, one for each, seems like it might be a bit safer.
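A rough sketch of that "one outer loop, two independent seeds" variant; run_realization here is a hypothetical stand-in for the PE-then-Facies fitting and scoring, not anyone's actual notebook.

```python
import numpy as np

def two_seed_scores(run_realization, n_realizations=100, master_seed=0):
    """run_realization(pe_seed, facies_seed) -> one validation score.

    One outer loop; each iteration draws two independent seeds, one for the
    PE fit and one for the Facies fit."""
    rng = np.random.RandomState(master_seed)
    seeds = rng.randint(0, 2**31 - 1, size=(n_realizations, 2))
    return np.array([run_realization(pe_seed, facies_seed)
                     for pe_seed, facies_seed in seeds])
```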
Since this was referenced here: what our team was trying to prove is that using an "ensemble" of results, one would get an improved result. Full credit to the top four; our score will stay at 0.568 (until we submit our own work). We did expect this "meta-submission" to perform better than it did, though.
Hi everyone,
As everyone has seen, the random seed can have a significant effect on the prediction scores. This is due to the fact that most of us are using algorithms with a random component (e.g., random forest, extra trees...).
The effect is probably enhanced by the fact that the dataset we are working on is small and non-stationary.
Matt has been solving the problem by testing a series of random seeds and taking the best. This avoids discarding a model just because of a "bad" random seed. However, this might favor the most unstable models. A very stable model will yield scores in a small range across several random seeds, while an unstable model will yield a wide range of scores. Thus, an unstable model is likely to get a very high score if enough random seeds are tested, but that does not mean the model will be good at predicting new test data.
A possible solution would be to test 10 (or another number of) random seeds and to take the median score as the prediction score. It would require us to include that directly in our scripts to avoid further work for Matt. We could just make 10 predictions, using 10 random seeds, and export them in a single CSV file.
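As a sketch of what that export could look like (the column names, the predict_with_seed helper, and the file name are just placeholders, not an agreed format):

```python
import pandas as pd

def export_realizations(predict_with_seed, index, seeds=range(10),
                        path='predictions_10_seeds.csv'):
    """predict_with_seed(seed) -> array of predicted facies for the test wells.

    Writes one prediction column per random seed into a single CSV."""
    preds = {'facies_seed_%d' % s: predict_with_seed(s) for s in seeds}
    pd.DataFrame(preds, index=index).to_csv(path)
```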
What do you guys (and especially Matt) think about that?