organize or extract some useful utils #46

Open
lazywei opened this issue Dec 27, 2014 · 2 comments
Comments

lazywei commented Dec 27, 2014

Hi,

I noticed there are many useful tools in this package, for example the code for reading and writing libsvm format files, the data matrix types, etc. Do you have any plans to organize them or extract them into separate packages? I think that would be good, since other ML-related packages could then be built on these utils.
I'd like to help with this if you have any plans for it! Please let me know your thoughts!
Thanks.

@ryanbressler
Owner

The code has definitely outgrown all being in one package but I haven't had time to reorganize it. I'm happy to provide feedback on proposals and accept pull requests though.

The data file formats may not be the best starting point, as they produce FeatureMatrixes filled with DenseNumFeatures and DenseCatFeatures. These types/interfaces also implement logic specific to the split searching and splitting criteria used in decision trees, and to handling missing values in the way I do, and thus aren't great for general use.

A general purpose parser should probably parse to either simple slices of data or something like the matrix types defined in gonum, and I'm not sure those handle missing values or categorical data. Something like a pandas or R dataframe for Go would be a good target, but I'm not aware of one.
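As a rough illustration of "parse to simple slices of data", here is a minimal sketch of a libsvm line parser that returns plain slices rather than tree-specific feature types. The names `SparseRow` and `ParseLibSVMLine` are hypothetical and do not exist in this package; the sketch assumes the usual `label index:value ...` line layout and does not handle comments or missing values.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// SparseRow is a hypothetical plain-data row: a label plus parallel
// slices of feature indices and values. It carries no tree-specific logic.
type SparseRow struct {
	Label   float64
	Indices []int
	Values  []float64
}

// ParseLibSVMLine parses one line of the form "label idx:val idx:val ...".
// Assumption: whitespace-separated fields, no trailing comments.
func ParseLibSVMLine(line string) (SparseRow, error) {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return SparseRow{}, fmt.Errorf("empty line")
	}
	label, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return SparseRow{}, fmt.Errorf("bad label %q: %v", fields[0], err)
	}
	row := SparseRow{Label: label}
	for _, f := range fields[1:] {
		parts := strings.SplitN(f, ":", 2)
		if len(parts) != 2 {
			return SparseRow{}, fmt.Errorf("bad feature %q: want idx:val", f)
		}
		idx, err := strconv.Atoi(parts[0])
		if err != nil {
			return SparseRow{}, fmt.Errorf("bad index in %q: %v", f, err)
		}
		val, err := strconv.ParseFloat(parts[1], 64)
		if err != nil {
			return SparseRow{}, fmt.Errorf("bad value in %q: %v", f, err)
		}
		row.Indices = append(row.Indices, idx)
		row.Values = append(row.Values, val)
	}
	return row, nil
}

func main() {
	row, err := ParseLibSVMLine("1 1:0.5 3:2")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", row)
}
```

A caller could then copy these slices into whatever matrix or feature type it needs, which keeps the parser independent of any one learner.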

The code is BSD licensed so parts could also be spun off into independent projects if there is something you have a pressing need for.

@lazywei
Author

lazywei commented Dec 28, 2014

> The code has definitely outgrown all being in one package but I haven't had time to reorganize it. I'm happy to provide feedback on proposals and accept pull requests though.

Do you have any ideas or rough thoughts on how we could organize it?

> A general purpose parser should probably parse to either simple slices of data or something like the matrix types defined in gonum, and I'm not sure those handle missing values or categorical data. Something like a pandas or R dataframe for Go would be a good target, but I'm not aware of one.

I agree. I've always felt we should have a dataframe or a pandas equivalent in Go. I've used gonum in another machine learning package, golearn, and it's really good. However, I think it would be better to have a higher-level wrapper built on gonum (something like a data frame). That kind of fundamental infrastructure is necessary for building great ML packages in Go. Do you have any thoughts on such a dataframe? I'd definitely like to find someone to discuss and build such a tool with, though it's a little off topic for this issue LOL...
