MLlib - Logistic Regression with Dropout

This is an extension of Spark MLlib, implementing logistic regression with dropout regularization.

Dropout regularization often works better than L2 regularization because it emphasizes the contribution of rarely occurring but discriminative features during classification [2]. This makes it well suited to applications such as NLP, where the data is sparse.

That said, dropout can be a detriment when the data is extremely sparse: dropping features from an already sparse space might not leave enough information for the model to learn at all [4].
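To make the idea concrete, here is a minimal sketch of logistic regression trained with explicit (Monte Carlo) dropout on the input features, in plain Scala without Spark. It is an illustration only and not necessarily how this repository implements dropout; the object name `DropoutLogRegSketch`, the `Example` type, and all parameters (`dropRate`, `stepSize`, `epochs`, `seed`) are hypothetical names chosen for this sketch.

```scala
import scala.util.Random

// Illustrative sketch only: names and parameters are assumptions,
// not this repository's API.
object DropoutLogRegSketch {
  // Sparse example: feature index -> value; label is 0.0 or 1.0.
  final case class Example(features: Map[Int, Double], label: Double)

  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Drop each feature independently with probability dropRate and rescale
  // the survivors by 1 / (1 - dropRate), so the expected input is unchanged.
  def dropout(features: Map[Int, Double], dropRate: Double, rng: Random): Map[Int, Double] = {
    require(dropRate >= 0.0 && dropRate < 1.0, "dropRate must be in [0, 1)")
    val keep = 1.0 - dropRate
    features.collect { case (i, v) if rng.nextDouble() < keep => i -> v / keep }
  }

  def dot(w: Array[Double], x: Map[Int, Double]): Double =
    x.foldLeft(0.0) { case (acc, (i, v)) => acc + w(i) * v }

  // Plain SGD on the logistic loss; a fresh dropout mask is sampled
  // every time an example is visited.
  def train(data: Seq[Example], numFeatures: Int, dropRate: Double,
            stepSize: Double, epochs: Int, seed: Long): Array[Double] = {
    val rng = new Random(seed)
    val w = Array.fill(numFeatures)(0.0)
    for (_ <- 0 until epochs; ex <- rng.shuffle(data)) {
      val x = dropout(ex.features, dropRate, rng)
      val err = sigmoid(dot(w, x)) - ex.label
      // Only features that survived the mask receive a gradient update.
      x.foreach { case (i, v) => w(i) -= stepSize * err * v }
    }
    w
  }

  // At prediction time all features are used, with no dropout.
  def predict(w: Array[Double], x: Map[Int, Double]): Double = sigmoid(dot(w, x))
}
```

The sketch also shows the caveat about extremely sparse data: if an example has only one or two non-zero features, a high `dropRate` will frequently produce an empty map, and that visit contributes no update at all.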

Building

This repo is written in Scala with sbt, using Spark 1.3.0.

Use the following to run a simple example.

sbt
run-main dropout.example

To check performance on the 20 Newsgroups dataset (http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html), run the following:

sbt 
run-main dropout.news20
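The datasets on the page linked above are distributed in LIBSVM text format, where each line is a label followed by `index:value` pairs with 1-based indices. As a rough illustration of how sparse such data is, the sketch below parses one line in plain Scala. Spark MLlib provides `MLUtils.loadLibSVMFile` for this format; whether this repository uses it is an assumption, and `LibSvmLineSketch` and `parseLine` are hypothetical names for this example.

```scala
// Illustrative sketch only: not this repository's loader.
object LibSvmLineSketch {
  // Parse one non-empty LIBSVM line into (label, sparse features).
  // Indices are shifted from 1-based (file) to 0-based (in memory).
  def parseLine(line: String): (Double, Map[Int, Double]) = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val features = tokens.tail.map { token =>
      val Array(index, value) = token.split(":")
      (index.toInt - 1) -> value.toDouble
    }.toMap
    (label, features)
  }
}
```

For example, `LibSvmLineSketch.parseLine("3 1:0.5 7:2.0")` yields the label `3.0` and the feature map `Map(0 -> 0.5, 6 -> 2.0)`; every index not listed on the line is implicitly zero.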

References

  1. Srivastava, Nitish, et al. "Dropout: A simple way to prevent neural networks from overfitting." The Journal of Machine Learning Research 15.1 (2014): 1929-1958.

  2. Wager, Stefan, Sida Wang, and Percy S. Liang. "Dropout training as adaptive regularization." Advances in Neural Information Processing Systems. 2013.

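  3. Wang, Sida, and Christopher Manning. "Fast dropout training." Proceedings of the 30th International Conference on Machine Learning. 2013. http://www-nlp.stanford.edu/~sidaw/home/_media/papers:fastdropout.pdf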

  4. McMahan, H. Brendan, et al. "Ad click prediction: a view from the trenches." Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013.

Disclaimer

This repo is not thoroughly tested, and performance might not be as expected. I will add more tests and examples along the way. Any comments or contributions are welcome.