plugins for structured data and relative models #566
Looks great, just went through it. Danfo.js pretty much has the capability to handle 1 and 2, as we can easily port it. But for the modeling part, we'd have to design that from scratch. Overall it's doable.
@risenW Thank you for this review. Do you mind working on the modeling part together with us?
Yes, sure. We can work on it together. Should be fun implementing it in JS 😊
That's great @risenW! And @rickycao-qy, could we have more implementation details for this issue?
@risenW Recommendation systems are heavily used by companies in the eCommerce industry. One of the traditional tasks is CTR (click-through rate) prediction. We can start with an open-source dataset, like this one. For the model, we can start with a GBDT+LR model for this task. I remember @WenheLI has already developed a JS version of GBDT. Maybe we can easily port it into our plugin to see if it meets our requirements. @WenheLI, could you provide more details?
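To make the GBDT+LR idea concrete: a trained GBDT routes each sample to exactly one leaf per tree, the leaf indices are one-hot encoded and concatenated into a sparse binary vector, and a logistic regression trained on that vector outputs the click probability. The TypeScript below is a minimal sketch under stated assumptions: the `Tree`/`TreeNode` shape, the two hand-written stumps (standing in for trees a real GBDT would learn), and the toy click rows are all hypothetical. It is not @WenheLI's JS GBDT or any library's API.

```typescript
// Hypothetical tree shape; not the format of @WenheLI's JS GBDT or any library.
type TreeNode =
  | { kind: "split"; feature: number; threshold: number; left: TreeNode; right: TreeNode }
  | { kind: "leaf"; leafIndex: number };

interface Tree {
  root: TreeNode;
  numLeaves: number;
}

// Walk one tree from the root and return the index of the leaf the sample lands in.
function leafOf(tree: Tree, x: number[]): number {
  let node: TreeNode = tree.root;
  while (node.kind === "split") {
    node = x[node.feature] <= node.threshold ? node.left : node.right;
  }
  return node.leafIndex;
}

// GBDT feature transform: one-hot encode the reached leaf of every tree and concatenate.
function gbdtTransform(trees: Tree[], x: number[]): number[] {
  const out: number[] = [];
  for (const tree of trees) {
    const oneHot = new Array<number>(tree.numLeaves).fill(0);
    oneHot[leafOf(tree, x)] = 1;
    out.push(...oneHot);
  }
  return out;
}

const sigmoid = (z: number): number => 1 / (1 + Math.exp(-z));

// Logistic regression on the transformed features, trained with plain SGD on log loss.
class LogisticRegression {
  weights: number[];
  bias = 0;
  lr: number;

  constructor(dim: number, lr = 0.1) {
    this.weights = new Array<number>(dim).fill(0);
    this.lr = lr;
  }

  predict(f: number[]): number {
    let z = this.bias;
    for (let i = 0; i < f.length; i++) z += this.weights[i] * f[i];
    return sigmoid(z);
  }

  // One SGD step; the gradient of log loss with respect to z is (p - y).
  step(f: number[], y: number): void {
    const g = this.predict(f) - y;
    for (let i = 0; i < f.length; i++) this.weights[i] -= this.lr * g * f[i];
    this.bias -= this.lr * g;
  }
}

// Two hand-written stumps stand in for trees a trained GBDT would produce.
const trees: Tree[] = [
  { numLeaves: 2, root: { kind: "split", feature: 0, threshold: 0.5,
      left: { kind: "leaf", leafIndex: 0 }, right: { kind: "leaf", leafIndex: 1 } } },
  { numLeaves: 2, root: { kind: "split", feature: 1, threshold: 10,
      left: { kind: "leaf", leafIndex: 0 }, right: { kind: "leaf", leafIndex: 1 } } },
];

// Toy rows: [isReturningUser, secondsOnPage] paired with a clicked (1) / not clicked (0) label.
const data: Array<[number[], number]> = [
  [[1, 30], 1], [[1, 5], 1], [[0, 20], 0], [[0, 3], 0],
];

const model = new LogisticRegression(4);
for (let epoch = 0; epoch < 200; epoch++) {
  for (const [x, y] of data) model.step(gbdtTransform(trees, x), y);
}
const pClick = model.predict(gbdtTransform(trees, [1, 25]));
const pNoClick = model.predict(gbdtTransform(trees, [0, 2]));
```

The split of responsibilities is the appeal here: the tree part could come from a ported GBDT implementation, while the logistic regression on top is small enough to write directly in JS.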
@risenW Any ideas on implementing the models in JS?
I think that's a great idea as well. We would still write JS wrappers to call the methods, right?
Yep, we still need to write the wrapper in JS, but that is trivial compared with writing the whole logic in JS. Do you happen to know any library or implementation of SVM or GBDT written in C/Rust/Go?
Just made an MVP SVM-wasm export based on
Cool, I'll check it out later today. Also, regarding GBDT, I think the popular XGBoost and LightGBM both have C++ cores exposed through C APIs, with wrappers written on top of them. We could try exporting the core module.
So I'm trying to test out this package, and I'm getting an error when compiling to JS. I'm also confused about this line: because I can't find the module you're importing.
You need to build it first. make .
@risenW I will update the documentation with detailed build instructions later.
Now I'm getting the output
@risenW This is the full command:
git submodule update
make
Be sure to install
Hi @WenheLI, have you seen this repo? It looks useful for what we intend to do: https://github.com/nok/sklearn-porter
@risenW It seems to translate trained models into executables for C/Java/JavaScript, but developers still have to write the training scripts in Python. @WenheLI's libsvm port, by contrast, exposes both training and inference to Web developers. By the way, we could also use boa to create JavaScript APIs for the scikit-learn package, and use sklearn-porter to convert trained models into executables for JavaScript runtimes, just like what @WenheLI has done in #582, which uses boa to call TensorFlow/PyTorch to train models and generates wasm executables via TVM.
See https://github.com/nok/sklearn-porter/blob/stable/examples/estimator/classifier/SVC/js/basics.pct.ipynb: it seems to generate pure JavaScript, which should be compatible with #582. That sounds really good, but I'm also concerned about performance :)
Oh, I get it now. Thanks for the clarification. Will definitely take a look at it.
Currently, the pipcook official-plugins support various types of tasks in the CV and NLP areas. The data in CV tasks are normally images (png, jpg, ...), while in NLP tasks they are text.
However, there are still a lot of tasks closely related to the front end that use structured data. For example, the front end can collect a lot of information about users' behaviours and fetch information about users' attributes. This structured data can be used in recommendation and regression tasks.
For structured-data processing, we have found a good library written in JavaScript: Danfo.js. We could seek the chance to work closely with the Danfo.js team to handle relational or labeled data.
Accordingly, it would be good to have some machine learning models that are fast and accurate, such as GBDT and SVM.
This issue suggests adding the following plugins:
1. data-collect / data-access
2. data-process
3. model
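As an illustration of how the three proposed plugin types could hand data to each other on a structured-data task, here is a minimal TypeScript sketch. The `DataCollect`/`DataProcess`/`Model` contracts, the `UserRecord` fields, the inline records and the nearest-centroid model are all hypothetical choices for the example; they are not Pipcook's actual plugin interfaces, and a real data-process plugin would more likely delegate to Danfo.js.

```typescript
// Hypothetical contracts for the three proposed plugin types (not Pipcook's real API).
interface UserRecord {
  device: string;
  visits: number;
  purchased: boolean;
}
interface Sample {
  features: number[];
  label: number;
}
type DataCollect = () => UserRecord[];
type DataProcess = (records: UserRecord[]) => Sample[];
interface Model {
  train(samples: Sample[]): void;
  predict(features: number[]): number;
}

// data-collect: an in-memory array stands in for behaviour logs gathered on the front end.
const collect: DataCollect = () => [
  { device: "mobile", visits: 12, purchased: true },
  { device: "desktop", visits: 1, purchased: false },
  { device: "mobile", visits: 9, purchased: true },
  { device: "desktop", visits: 2, purchased: false },
];

// data-process: one-hot encode the categorical column and keep the numeric column as is.
// A fixed vocabulary keeps training-time and prediction-time encodings identical.
const DEVICES = ["desktop", "mobile"];
function encodeFeatures(r: Pick<UserRecord, "device" | "visits">): number[] {
  return [...DEVICES.map((d) => (d === r.device ? 1 : 0)), r.visits];
}
const processRecords: DataProcess = (records) =>
  records.map((r) => ({ features: encodeFeatures(r), label: r.purchased ? 1 : 0 }));

// model: a nearest-centroid classifier, used here only because it is short.
class NearestCentroid implements Model {
  private centroids = new Map<number, number[]>();

  train(samples: Sample[]): void {
    const acc = new Map<number, { sum: number[]; n: number }>();
    for (const s of samples) {
      const entry = acc.get(s.label) ?? { sum: new Array<number>(s.features.length).fill(0), n: 0 };
      s.features.forEach((v, i) => (entry.sum[i] += v));
      entry.n += 1;
      acc.set(s.label, entry);
    }
    // Each class centroid is the mean feature vector of that class's samples.
    for (const [label, { sum, n }] of acc) {
      this.centroids.set(label, sum.map((v) => v / n));
    }
  }

  // Return the label whose centroid is closest in squared Euclidean distance.
  predict(features: number[]): number {
    let best = -1;
    let bestDist = Infinity;
    for (const [label, c] of this.centroids) {
      const d = c.reduce((total, ci, i) => total + (ci - features[i]) ** 2, 0);
      if (d < bestDist) {
        bestDist = d;
        best = label;
      }
    }
    return best;
  }
}

// Wire the stages together: collect -> process -> train -> predict.
const model = new NearestCentroid();
model.train(processRecords(collect()));
const prediction = model.predict(encodeFeatures({ device: "mobile", visits: 10 }));
```

Keeping each stage behind its own small contract means the model stage could later be swapped for a GBDT or SVM plugin without touching collection or preprocessing.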
More opinions are welcome.