MLlib - Logistic Regression with Dropout

This is an extension of Spark MLlib, implementing logistic regression with dropout regularization.
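As a rough illustration of the technique, the following shows one stochastic-gradient step of binary logistic regression with dropout on the input features. It is not this repository's code, whose API is not shown here. The object and method names, the `dropRate` and `stepSize` parameters, and the choice to rescale surviving features by 1 / (1 - dropRate) at training time ("inverted" dropout) are all assumptions of this sketch.

```scala
import scala.util.Random

// Hypothetical sketch, NOT this repo's implementation: binary logistic
// regression trained by SGD, where each input feature is dropped with
// probability `dropRate` and survivors are rescaled by 1 / (1 - dropRate),
// so the corrupted input equals the original one in expectation.
object DropoutLogisticSketch {

  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  /** Applies a fresh random dropout mask to the feature vector `x`. */
  def dropout(x: Array[Double], dropRate: Double, rng: Random): Array[Double] = {
    require(dropRate >= 0.0 && dropRate < 1.0, "dropRate must be in [0, 1)")
    val scale = 1.0 / (1.0 - dropRate)
    x.map(v => if (rng.nextDouble() < dropRate) 0.0 else v * scale)
  }

  /** One SGD step on the log loss for a single example (x, y), y in {0, 1}. */
  def sgdStep(w: Array[Double], x: Array[Double], y: Double,
              dropRate: Double, stepSize: Double, rng: Random): Array[Double] = {
    val xt = dropout(x, dropRate, rng)          // gradient uses the dropped-out input
    val p = sigmoid(w.indices.map(j => w(j) * xt(j)).sum)
    // d/dw_j of -[y log p + (1 - y) log(1 - p)] = (p - y) * xt_j
    w.indices.map(j => w(j) - stepSize * (p - y) * xt(j)).toArray
  }

  /** Prediction uses the full input; no rescaling is needed at test time. */
  def predict(w: Array[Double], x: Array[Double]): Double =
    sigmoid(w.indices.map(j => w(j) * x(j)).sum)

  def main(args: Array[String]): Unit = {
    val rng = new Random(42)
    // Made-up toy data: the label is 1 exactly when feature 0 is active.
    val data = Seq(
      (Array(1.0, 0.0, 1.0), 1.0),
      (Array(0.0, 1.0, 1.0), 0.0),
      (Array(1.0, 1.0, 0.0), 1.0),
      (Array(0.0, 0.0, 1.0), 0.0)
    )
    var w = Array.fill(3)(0.0)
    for (_ <- 1 to 2000; (x, y) <- data)
      w = sgdStep(w, x, y, dropRate = 0.5, stepSize = 0.1, rng = rng)
    data.foreach { case (x, y) => println(f"label=$y%.0f  p=${predict(w, x)}%.3f") }
  }
}
```

Training on randomly corrupted copies of each example while predicting on the clean input is what makes dropout act as a regularizer rather than a change to the model itself.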

Dropout regularization often works better than L2 regularization, because it emphasizes the contribution of rarely occurring but discriminative features during classification [2]. This makes it well suited to applications such as NLP, where the data is sparse.
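Wager et al. [2] make the "rare but discriminative" argument precise with a second-order approximation of the dropout penalty. The notation here is this sketch's own. Suppose each feature is dropped with probability δ and survivors are rescaled by 1/(1 − δ). Then for logistic regression with weights β and examples x_i, the approximate penalty is:

```latex
R^{q}(\beta) \;=\; \frac{\delta}{2(1-\delta)} \sum_{j} \beta_j^{2} \sum_{i} p_i (1 - p_i)\, x_{ij}^{2},
\qquad p_i = \sigma(x_i^{\top}\beta) = \frac{1}{1 + e^{-x_i^{\top}\beta}}
```

Compare plain L2 regularization, λ Σ_j β_j², which penalizes every weight equally. Here, the penalty on β_j is scaled by Σ_i p_i(1 − p_i) x_ij². That factor is small when feature j is rarely non-zero, or when it appears mostly in confidently predicted examples (p_i close to 0 or 1). Such features are therefore shrunk less.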

That said, dropout can actually hurt when the data is extremely sparse: dropping some of the features from an already sparse example may not leave enough information for the model to learn from [4].
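A quick way to see the problem: with independent dropout at rate δ, an example with only nnz non-zero features loses all of them with probability δ^nnz. On average it keeps nnz · (1 − δ) of them. For instance, at δ = 0.5 a document with 3 active features is left completely empty 12.5% of the time. The helper below is hypothetical and not part of this repo. It computes these quantities and checks the first by simulation.

```scala
import scala.util.Random

// Hypothetical helpers (not part of this repo) quantifying how much of a
// sparse example survives independent feature dropout at rate `dropRate`.
object SparseDropoutSketch {
  /** Probability that every one of the `nnz` active features is dropped. */
  def probAllDropped(nnz: Int, dropRate: Double): Double = math.pow(dropRate, nnz)

  /** Expected number of active features that survive dropout. */
  def expectedKept(nnz: Int, dropRate: Double): Double = nnz * (1.0 - dropRate)

  /** Monte Carlo estimate of probAllDropped, for a sanity check. */
  def simulateAllDropped(nnz: Int, dropRate: Double, trials: Int, rng: Random): Double = {
    val hits = (1 to trials).count(_ => (1 to nnz).forall(_ => rng.nextDouble() < dropRate))
    hits.toDouble / trials
  }

  def main(args: Array[String]): Unit =
    for (nnz <- Seq(1, 3, 10, 100))
      println(f"nnz=$nnz%3d  P(all dropped)=${probAllDropped(nnz, 0.5)}%.3g  E[kept]=${expectedKept(nnz, 0.5)}%.1f")
}
```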

Building

This repo is written in Scala, built with sbt, and uses Spark 1.3.0.

Use the following to run a simple example.

sbt
run-main dropout.example

To check performance on the NewsGroup-20 dataset (http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html), run the following:

sbt 
run-main dropout.news20
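The linked datasets are distributed in the LIBSVM text format. Each line has the form "label index:value index:value ...", with 1-based feature indices. In Spark itself, `MLUtils.loadLibSVMFile` reads this format into an `RDD[LabeledPoint]`. The standalone sketch below only shows what one line looks like and how the indices shift to 0-based. The object name and the sample line are made up for illustration.

```scala
// Hypothetical sketch, not this repo's loader: parses one LIBSVM-format line
// into a label and a sparse feature map keyed by 0-based feature index.
object LibSvmLineSketch {
  final case class SparseExample(label: Double, features: Map[Int, Double])

  def parse(line: String): SparseExample = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val features = tokens.tail.map { tok =>
      val parts = tok.split(":")
      require(parts.length == 2, s"malformed feature token: $tok")
      (parts(0).toInt - 1) -> parts(1).toDouble // LIBSVM indices start at 1
    }.toMap
    SparseExample(label, features)
  }

  def main(args: Array[String]): Unit = {
    val ex = parse("3 12:0.5 407:1 9011:0.25") // made-up sample line
    println(ex.label)
    println(ex.features.toSeq.sortBy(_._1))
  }
}
```

Keeping only the non-zero entries mirrors how sparse text data such as NewsGroup-20 is usually stored, and it is the representation in which the sparsity concern above applies.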

References

[1] Srivastava, Nitish, et al. "Dropout: A simple way to prevent neural networks from overfitting." The Journal of Machine Learning Research 15.1 (2014): 1929-1958.
[2] Wager, Stefan, Sida Wang, and Percy S. Liang. "Dropout training as adaptive regularization." Advances in Neural Information Processing Systems. 2013.
[3] http://www-nlp.stanford.edu/~sidaw/home/_media/papers:fastdropout.pdf
[4] McMahan, H. Brendan, et al. "Ad click prediction: a view from the trenches." Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2013.

Disclaimer

The repo is not thoroughly tested, so performance might not be as expected. I will add more tests and examples along the way. Comments and contributions are welcome.
