plugins for structured data and relative models #566

Open
rickycao-qy opened this issue Sep 8, 2020 · 19 comments

@rickycao-qy
Collaborator

Currently, pipcook's official plugins support various types of tasks in the CV and NLP areas. CV data is normally images (PNG, JPG, ...), while NLP data is text.

However, many tasks that are closely related to the front end use structured data. For example, the front end can collect a lot of information about users' behaviors and fetch information about users' attributes. This structured data can be used in recommendation and regression tasks.

For structured-data processing, we have found a good library built on top of JS: Danfo.js. We could look for a chance to work closely with the Danfo.js team to handle relational or labeled data.

Accordingly, it would be good to have some machine learning models that are fast and accurate, such as GBDT and SVM.

This issue proposes adding the following plugins:

  • data-collect / data-access

    • csv data collect
    • text data collect
  • data-process

    • missing-value processing
    • normalization
    • outlier processing
  • model

    • GBDT
    • SVM

More opinions are welcome.
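To make the proposed data-collect and data-process plugins more concrete, here is a minimal sketch of CSV parsing, mean imputation, outlier clipping, and min-max normalization in plain TypeScript. Every name in it (`parseCsv`, `fillMissingWithMean`, `clipOutliers`, `minMaxNormalize`) is hypothetical and is not an existing pipcook or Danfo.js API; a real plugin would more likely delegate these steps to Danfo.js.

```typescript
// All names below are hypothetical; this is not a pipcook or Danfo.js API.
type Cell = number | null;

// data-collect: parse a simple numeric CSV (no quoted fields).
// Empty or non-numeric cells become null so they can be imputed later.
function parseCsv(text: string): { header: string[]; rows: Cell[][] } {
  const lines = text.trim().split(/\r?\n/);
  const header = lines[0].split(',').map((h) => h.trim());
  const rows = lines.slice(1).map((line) =>
    line.split(',').map((cell): Cell => {
      const trimmed = cell.trim();
      const value = Number(trimmed);
      return trimmed === '' || isNaN(value) ? null : value;
    })
  );
  return { header, rows };
}

// Arithmetic mean; returns 0 for an empty list.
function mean(values: number[]): number {
  return values.length > 0 ? values.reduce((a, b) => a + b, 0) / values.length : 0;
}

// Build one transform per column from that column's values, then apply it.
function mapColumns(
  rows: number[][],
  build: (values: number[]) => (v: number) => number
): number[][] {
  const width = rows.length > 0 ? rows[0].length : 0;
  const transforms: Array<(v: number) => number> = [];
  for (let c = 0; c < width; c++) {
    transforms.push(build(rows.map((r) => r[c])));
  }
  return rows.map((r) => r.map((v, c) => transforms[c](v)));
}

// data-process: replace each missing cell with the mean of its column.
function fillMissingWithMean(rows: Cell[][]): number[][] {
  const width = rows.length > 0 ? rows[0].length : 0;
  const means: number[] = [];
  for (let c = 0; c < width; c++) {
    const present = rows.map((r) => r[c]).filter((v): v is number => v !== null);
    means.push(mean(present));
  }
  return rows.map((r) => r.map((v, c) => (v === null ? means[c] : v)));
}

// data-process: clip each column to mean +/- k population standard deviations.
function clipOutliers(rows: number[][], k: number = 3): number[][] {
  return mapColumns(rows, (values) => {
    const m = mean(values);
    const std = Math.sqrt(mean(values.map((v) => (v - m) * (v - m))));
    return (v) => Math.min(Math.max(v, m - k * std), m + k * std);
  });
}

// data-process: scale each column to [0, 1]; constant columns map to 0.
function minMaxNormalize(rows: number[][]): number[][] {
  return mapColumns(rows, (values) => {
    const min = values.reduce((a, b) => Math.min(a, b), Infinity);
    const max = values.reduce((a, b) => Math.max(a, b), -Infinity);
    return (v) => (max === min ? 0 : (v - min) / (max - min));
  });
}

// Usage with a made-up three-row table; the second row has a missing "clicks" value.
const sample = parseCsv('age,clicks,price\n20,1,10\n30,,20\n40,3,30');
const prepared = minMaxNormalize(clipOutliers(fillMissingWithMean(sample.rows)));
console.log(prepared); // [[0, 0, 0], [0.5, 0.5, 0.5], [1, 1, 1]]
```

The order (impute, then clip, then normalize) is only one reasonable choice; clipping before normalization keeps a single extreme value from compressing the rest of the column.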

@rickycao-qy rickycao-qy self-assigned this Sep 8, 2020
@rickycao-qy rickycao-qy added the model Machine learning model related issues and discussions label Sep 8, 2020
@FeelyChau FeelyChau mentioned this issue Sep 8, 2020
@risenW

risenW commented Sep 8, 2020

Looks great, I just went through it. Danfo.js pretty much has the capability to solve 1 and 2 (data collection and data processing), and we can easily port it. But for the modeling part, we'd have to design that from scratch.

Overall it's doable.

@yorkie
Member

yorkie commented Sep 8, 2020

@risenW Thank you for this review. Would you mind working on the modeling part together with us?

@risenW

risenW commented Sep 8, 2020

Yes, sure. We can work on it together. Should be fun implementing it in JS 😊

@yorkie
Member

yorkie commented Sep 8, 2020

That's great @risenW! And @rickycao-qy, could we have more implementation details for this issue?

@rickycao-qy
Collaborator Author

@risenW Recommendation systems are heavily used in the e-commerce industry. One of the traditional tasks is CTR (click-through rate) prediction. We can start with an open-source dataset, like this one. For the model, we can start with a GBDT+LR model for this task. I remember @WenheLI has already developed a JS version of GBDT. Maybe we can port it to our plugin to see if it meets our requirements. @WenheLI, could you provide more details?
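For readers unfamiliar with GBDT+LR, the idea is that each tree in a trained GBDT sends a sample to exactly one leaf, the leaf positions are one-hot encoded and concatenated, and a logistic regression is then applied to that sparse vector. The sketch below only illustrates the inference path. The two trees, their thresholds, and the LR weights are made up for illustration; in practice they would come from training, and none of the names refer to an existing pipcook plugin or to @WenheLI's GBDT code.

```typescript
// A binary decision tree node: either a split on one feature or a leaf with an id.
type TreeNode =
  | { kind: 'split'; feature: number; threshold: number; left: TreeNode; right: TreeNode }
  | { kind: 'leaf'; leafId: number };

interface Tree {
  root: TreeNode;
  leafCount: number; // number of leaves, i.e. the width of this tree's one-hot block
}

// Walk the tree and return the id of the leaf that sample x falls into.
function leafIndex(node: TreeNode, x: number[]): number {
  if (node.kind === 'leaf') {
    return node.leafId;
  }
  return leafIndex(x[node.feature] <= node.threshold ? node.left : node.right, x);
}

// GBDT stage: concatenate one one-hot block per tree.
function encodeLeaves(trees: Tree[], x: number[]): number[] {
  const encoded: number[] = [];
  for (const tree of trees) {
    const hot = leafIndex(tree.root, x);
    for (let i = 0; i < tree.leafCount; i++) {
      encoded.push(i === hot ? 1 : 0);
    }
  }
  return encoded;
}

// LR stage: sigmoid of a weighted sum over the encoded leaf features.
function predictCtr(weights: number[], bias: number, features: number[]): number {
  const z = features.reduce((sum, f, i) => sum + f * weights[i], bias);
  return 1 / (1 + Math.exp(-z));
}

// Hand-written stand-ins for trained trees (hypothetical features: [age, pastClicks]).
const ageStump: Tree = {
  root: {
    kind: 'split', feature: 0, threshold: 30,
    left: { kind: 'leaf', leafId: 0 },
    right: { kind: 'leaf', leafId: 1 },
  },
  leafCount: 2,
};
const clickStump: Tree = {
  root: {
    kind: 'split', feature: 1, threshold: 2,
    left: { kind: 'leaf', leafId: 0 },
    right: { kind: 'leaf', leafId: 1 },
  },
  leafCount: 2,
};

// A 25-year-old user with 5 past clicks lands in leaf 0 of the first tree
// and leaf 1 of the second tree.
const leafFeatures = encodeLeaves([ageStump, clickStump], [25, 5]);
console.log(leafFeatures); // [1, 0, 0, 1]

// Made-up LR weights, one per leaf feature.
console.log(predictCtr([0.2, -0.2, -0.5, 0.9], -1.0, leafFeatures));
```

Because the LR input is just a list of 0/1 leaf indicators, the LR stage stays cheap at inference time even when the GBDT has many trees.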

@WenheLI
Collaborator

WenheLI commented Sep 11, 2020

@risenW Any ideas on implementing the models in JS?
I was thinking of exporting wasm code directly from a C++/Rust implementation. This way, we get good performance with less work.

@risenW

risenW commented Sep 11, 2020

I think that's a great idea as well. We would still write JS wrappers to call the methods, right?

@WenheLI
Collaborator

WenheLI commented Sep 11, 2020

> I think that's a great idea as well. We would still write JS wrappers to call methods right?

Yep, we still need to write the wrapper in JS, but that is trivial compared with writing the whole logic in JS. Do you happen to know any library or implementation of SVM or GBDT written in C/Rust/Go?
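As a rough illustration of how thin such a wrapper can be, the sketch below wraps a tiny hand-assembled WebAssembly module that exports a single `add(i32, i32)` function. The module bytes stand in for a real compiled C/C++/Rust library, and the `Adder` class is a hypothetical name; a real SVM or GBDT wrapper would also need to copy arrays into wasm linear memory, which is not shown here.

```typescript
// Bytes of a minimal WebAssembly module that exports add(i32, i32) -> i32.
// It is a stand-in for a real library compiled from C/C++ or Rust.
const ADD_MODULE_BYTES = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,                   // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,             // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                                           // one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,             // export "add" as function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body: local0 + local1
]);

// WebAssembly is read through globalThis so the sketch compiles without DOM typings.
const wasmApi = (globalThis as any).WebAssembly;

// Thin JS/TS wrapper: hides instantiation and exposes a typed, validated method.
class Adder {
  private readonly rawAdd: (a: number, b: number) => number;

  constructor(bytes: Uint8Array) {
    // Synchronous compilation is fine for a module this small.
    const wasmModule = new wasmApi.Module(bytes);
    const instance = new wasmApi.Instance(wasmModule, {});
    this.rawAdd = instance.exports.add as (a: number, b: number) => number;
  }

  add(a: number, b: number): number {
    // The wasm function works on 32-bit integers, so reject fractional input early.
    if (Math.floor(a) !== a || Math.floor(b) !== b) {
      throw new TypeError('add expects integer arguments');
    }
    return this.rawAdd(a, b);
  }
}

const adder = new Adder(ADD_MODULE_BYTES);
console.log(adder.add(2, 3)); // 5
```

All numeric logic lives in the compiled module, and the wrapper only validates arguments and forwards the call, which is why writing it is much less work than reimplementing the algorithm in JS.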

@WenheLI
Collaborator

WenheLI commented Sep 13, 2020

Just made an MVP SVM-wasm export based on libsvm.
https://github.com/WenheLI/libsvm-wasm
@risenW Any suggestions?

@risenW

risenW commented Sep 13, 2020

> Just made an MVP SVM-wasm export based on libsvm.
> https://github.com/WenheLI/libsvm-wasm
> @risenW Any suggestions?

Cool, I'll check it out later today. Also, regarding GBDT, I think the popular XGBoost and LightGBM are both written in C++, with wrappers written on top of that. We could try exporting the core module.

@risenW

risenW commented Sep 14, 2020

> Just made an MVP SVM-wasm export based on libsvm.
> https://github.com/WenheLI/libsvm-wasm
> @risenW Any suggestions?

So I'm trying to test out this package, but I'm getting an error when compiling to JS. I'm also confused about this line:
import * as module from '../dist/libsvm'

because I can't find the module you're importing.

@WenheLI
Collaborator

WenheLI commented Sep 14, 2020

https://github.com/WenheLI/libsvm-wasm

You need to build it first.
Run the following command under the root folder.

make .

@WenheLI
Collaborator

WenheLI commented Sep 14, 2020

> > Just made an MVP SVM-wasm export based on libsvm.
> > https://github.com/WenheLI/libsvm-wasm
> > @risenW Any suggestions?
>
> So I'm trying to test out this package, I'm getting an error when compiling to Js. I'm also confused about this line:
> import * as module from '../dist/libsvm'
>
> because I can't find the module you're importing.

@risenW I will add detailed build documentation later.

@risenW

risenW commented Sep 14, 2020

> https://github.com/WenheLI/libsvm-wasm
>
> You need to build it first.
> Run the following command under the root folder.
>
> make .

Now I'm getting the output nothing to be done for

@WenheLI
Collaborator

WenheLI commented Sep 14, 2020

@risenW These are the full commands:

git submodule update
make

Be sure to install Emscripten before running make.

@risenW

risenW commented Sep 26, 2020

Hi @WenheLI,

Have you seen this repo? It looks useful for what we intend to do: https://github.com/nok/sklearn-porter

@yorkie
Member

yorkie commented Sep 26, 2020

> Transpile trained scikit-learn estimators to C, Java, JavaScript and others.

@risenW It seems to translate trained models into code that runs in C/Java/JavaScript, so developers still have to write the training scripts in Python. In contrast, @WenheLI's libsvm port exposes both training and inference to Web developers.

By the way, we could also use boa to create JavaScript APIs for the scikit-learn package, and use sklearn-porter to convert the trained model into an executable for JavaScript runtimes, just like what @WenheLI has done in #582, which uses boa to call TensorFlow/PyTorch to train models and generates wasm-format executables via TVM.

@yorkie
Member

yorkie commented Sep 26, 2020

See https://github.com/nok/sklearn-porter/blob/stable/examples/estimator/classifier/SVC/js/basics.pct.ipynb, it seems to generate pure JavaScript, which should be compatible with #582. That sounds really good, but I'm also thinking about the performance :)
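To picture what "a trained model transpiled to pure JavaScript" amounts to, here is a minimal sketch of a binary linear SVM whose learned parameters are embedded as constants. This is not the actual output of sklearn-porter, and the coefficients and intercept are made-up values; it only shows that inference needs no training code and no Python at runtime.

```typescript
// Hypothetical parameters of a trained binary linear SVM.
// In a transpiled model these constants would be copied from the fitted estimator.
const SVM_COEFFICIENTS = [0.8, -0.5];
const SVM_INTERCEPT = -0.1;

// Signed score relative to the separating hyperplane: w . x + b.
function decisionFunction(x: number[]): number {
  return x.reduce((sum, v, i) => sum + v * SVM_COEFFICIENTS[i], SVM_INTERCEPT);
}

// Class 1 if the sample lies on the positive side of the hyperplane, else class 0.
function predictClass(x: number[]): 0 | 1 {
  return decisionFunction(x) > 0 ? 1 : 0;
}

console.log(predictClass([1, 0])); // 1
console.log(predictClass([0, 1])); // 0
```

On the performance question, a linear model like this costs one multiply-add per feature, while a kernel SVM has to loop over every support vector for each prediction, so that is the case where pure JS versus wasm would be worth measuring.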

@risenW

risenW commented Sep 26, 2020

> > Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
>
> @risenW It might be translating the trained models to executable by C/Java/JavaScript, developers have to use Python to write training scripts. However @WenheLI's libsvm exports train/inference abilities to Web developers.
>
> By the way, we could also use boa to create JavaScript APIs by scikit-learn package, and use sklearn-porter to convert trained model to an executable for JavaScript runtimes, just like what @WenheLI have done at #582, which uses boa to call tensorflow/pytorch to train models out, and generating wasm format executables via TVM.

Oh I get it now. Thanks for the clarification. Will definitely take a look at it.
