Implementations of various Naive Bayes classifiers. #39
Multinomial version with
- smoothing
- Bernoulli evaluation

Gaussian version for dealing with continuous data.
let aa = feature_size + 1 in
let update arr idx =
  Array.iter (fun i -> arr.(i) <- arr.(i) + 1) idx;
  (* keep track of the class count at the end of array. *)
Putting the count at the end is a bit sneaky. Does it have a significant performance impact or keep the code significantly simpler?
It probably does not have a significant performance impact. The type signature `'a * float * float array` looked awkward to me. Plus un/re-boxing the tuple in the association list below seemed like a waste. It did make things easy to multiply since you could just fold over the entire array, until I started dealing with the smoothing.
Let's see if I like the way the code looks in 3-6 months. If it seems like a bad choice then, I'll factor out the prior into a separate float.
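For readers following the thread, here is a minimal sketch of the two layouts being compared: the class count packed into the last slot of the count array, versus a record that keeps it separate. The names `make_counts`, `class_counts`, `of_packed` and `smoothed_likelihood` are hypothetical and not from the PR, and the additive smoothing formula is an assumed Bernoulli-style estimate, not necessarily the one the PR uses.

```ocaml
(* Hypothetical sketch; none of these names are taken from the PR. *)

(* Packed layout: slots 0 .. feature_size - 1 hold per-feature counts,
   the extra final slot holds the number of samples seen for the class. *)
let make_counts feature_size = Array.make (feature_size + 1) 0

let update feature_size arr idx =
  Array.iter (fun i -> arr.(i) <- arr.(i) + 1) idx;
  (* keep track of the class count at the end of the array *)
  arr.(feature_size) <- arr.(feature_size) + 1

(* Alternative layout: the class count factored out into its own field. *)
type class_counts = { seen : int; features : int array }

let of_packed feature_size arr =
  { seen = arr.(feature_size); features = Array.sub arr 0 feature_size }

(* Assumed additive smoothing of a per-feature presence estimate:
   (count + alpha) / (seen + 2 * alpha). With the packed layout, the same
   computation has to skip the final slot rather than fold over everything. *)
let smoothed_likelihood ~alpha c i =
  (float_of_int c.features.(i) +. alpha)
  /. (float_of_int c.seen +. 2.0 *. alpha)
```

The separate record costs one extra allocation per class but makes it harder to fold the class count into a feature product by accident.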
Another nice-to-have feature for this package would be an incremental version, perhaps in the spirit of scikit's implementation (although it still leaves you with some tough choices to make). I was trying to dig up a nice paper on online multinomial NB but haven't found one yet (here's a good paper on Online LDA though, which also has a Python implementation).
An online version wouldn't be that difficult to implement. Move the code after https://github.com/rleonid/oml/blob/naive_bayes/src/lib/classify.ml#L104, into a closure for the type to be evaluated at the I'll also add the LDA paper link to another issue.
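One way the closure idea could look, as a rough sketch only: the counts live in mutable state captured by two closures, so new samples can be folded in between evaluations. The `online` record, `create`, and the smoothing formula are all assumptions made for illustration. This is a simplified presence-count estimate, not the code in `classify.ml`.

```ocaml
(* Hypothetical sketch of an incremental learner; not oml's API. *)

type 'cls online = {
  update : 'cls -> int array -> unit;       (* fold in one labelled sample *)
  eval : int array -> ('cls * float) list;  (* normalised class probabilities *)
}

let create ?(smoothing = 1.0) feature_size : 'cls online =
  (* Per-class counts; the last slot of each array is the class count. *)
  let table : ('cls, int array) Hashtbl.t = Hashtbl.create 16 in
  let total = ref 0 in
  let update cls idx =
    let arr =
      match Hashtbl.find_opt table cls with
      | Some a -> a
      | None ->
        let a = Array.make (feature_size + 1) 0 in
        Hashtbl.add table cls a;
        a
    in
    Array.iter (fun i -> arr.(i) <- arr.(i) + 1) idx;
    arr.(feature_size) <- arr.(feature_size) + 1;
    incr total
  in
  let eval idx =
    (* Everything derived from the counts is recomputed here, at
       evaluation time, so it always reflects the latest updates. *)
    let score arr =
      let n = float_of_int arr.(feature_size) in
      let prior = n /. float_of_int !total in
      Array.fold_left
        (fun p i ->
          (* assumed additive smoothing, presence features only *)
          p *. (float_of_int arr.(i) +. smoothing) /. (n +. 2.0 *. smoothing))
        prior idx
    in
    let scored =
      Hashtbl.fold (fun cls arr acc -> (cls, score arr) :: acc) table []
    in
    let z = List.fold_left (fun s (_, p) -> s +. p) 0.0 scored in
    List.map (fun (cls, p) -> (cls, p /. z)) scored
  in
  { update; eval }
```

Usage would be along the lines of `let nb = create 3 in nb.update "a" [|0|]; nb.eval [|0|]`, with `update` callable again at any later point.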