rakeshchalasani/MLlib-dropout

MLlib - Logistic Regression with Dropout

This is an extension of Spark MLlib, implementing logistic regression with dropout regularization.

Dropout regularization often works better than L2 regularization because it emphasizes the contribution of rarely occurring but discriminative features during classification [2]. This makes it well suited to applications such as NLP, where the data is sparse.

That said, dropout can be detrimental when the data is extremely sparse: dropping features from an already sparse input might not leave enough information for the model to learn at all [4].
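The idea of dropout training for logistic regression can be sketched in plain Scala, without Spark. This is a minimal illustration under stated assumptions, not this repository's implementation: the object name `DropoutLogisticSketch`, its function names, and the parameters `dropRate`, `stepSize`, `epochs`, and `seed` are all hypothetical. Each SGD step zeroes every feature independently with probability `dropRate`, rescales the surviving features so the corrupted vector matches the input in expectation, and takes the usual logistic-loss gradient step on the corrupted vector.

```scala
import scala.util.Random

// Hypothetical sketch; names and parameters are illustrative assumptions,
// not the API of this repository or of Spark MLlib.
object DropoutLogisticSketch {
  // Logistic function.
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Dot product of two equal-length dense vectors.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  // Zero each feature with probability dropRate and rescale survivors by
  // 1 / (1 - dropRate), so the corrupted vector equals x in expectation.
  def dropout(x: Array[Double], dropRate: Double, rng: Random): Array[Double] = {
    val keep = 1.0 - dropRate
    x.map(v => if (rng.nextDouble() < keep) v / keep else 0.0)
  }

  // Plain SGD on the logistic loss, drawing fresh dropout noise for every
  // example at every step. Labels are expected to be 0.0 or 1.0.
  def train(data: Seq[(Double, Array[Double])], dropRate: Double,
            stepSize: Double, epochs: Int, seed: Long): Array[Double] = {
    require(dropRate >= 0.0 && dropRate < 1.0, "dropRate must be in [0, 1)")
    val rng = new Random(seed)
    val w = new Array[Double](data.head._2.length)
    for (_ <- 0 until epochs) {
      for ((label, x) <- rng.shuffle(data)) {
        val xt = dropout(x, dropRate, rng)
        // Derivative of the log loss with respect to the margin w . xt.
        val err = sigmoid(dot(w, xt)) - label
        var j = 0
        while (j < w.length) { w(j) -= stepSize * err * xt(j); j += 1 }
      }
    }
    w
  }

  // Prediction uses the uncorrupted features.
  def predict(w: Array[Double], x: Array[Double]): Double = sigmoid(dot(w, x))
}
```

With `dropRate = 0.0` the sketch reduces to ordinary unregularized logistic regression, which makes it easy to compare the two settings on the same data.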

Building

This repo is written in Scala, built with sbt, and uses Spark 1.3.0.

Use the following to run a simple example.

sbt
run-main dropout.example

To check performance on the NewsGroup-20 dataset (http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html), run the following:

sbt 
run-main dropout.news20

References

  1. Srivastava, Nitish, et al. "Dropout: A simple way to prevent neural networks from overfitting." The Journal of Machine Learning Research 15.1 (2014): 1929-1958.

  2. Wager, Stefan, Sida Wang, and Percy S. Liang. "Dropout training as adaptive regularization." Advances in Neural Information Processing Systems. 2013.

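  3. Wang, Sida, and Christopher D. Manning. "Fast dropout training." Proceedings of the 30th International Conference on Machine Learning. 2013. http://www-nlp.stanford.edu/~sidaw/home/_media/papers:fastdropout.pdf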

  4. McMahan, H. Brendan, et al. "Ad click prediction: a view from the trenches." Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013.

Disclaimer

This repo is not thoroughly tested, and performance might not be as expected. I will add more tests and examples along the way. Any comments or contributions are welcome.
