
Wasm-based UDF feature request and discussion #4060

Open
arcosx opened this issue Mar 21, 2022 · 5 comments
Labels: community (Source: who proposed the issue), type/feature req (Type: feature request)


@arcosx

arcosx commented Mar 21, 2022

Is your feature request related to a problem? Please describe.

After seeing community issues about UDFs (e.g. #3946), I tried to build a Wasm-based UDF system. During the Nebula Hackathon 2021 I submitted an RFC with some initial thoughts, and #3566 contains the initial code.
I think a Wasm-based UDF system is a fairly radical route, so it needs more guidance and advice.

Describe the solution you'd like

The overall module design is as follows.

Compilation Toolchain: A compiler toolchain is provided to users, together with a set of starter templates. Users embed their own code into these templates.
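The following is a minimal C++ sketch of what one starter template might look like. It is not taken from the RFC or #3566. The names `udf_impl` and `udf_call`, the user-code markers, and the integer-only signature are all hypothetical. The template avoids standard-library headers, so it could be built for a bare wasm32 target (for example with clang's `--target=wasm32`, plus linker options that export `udf_call`). It can also be compiled natively for local testing.

```cpp
// Hypothetical UDF starter template: the toolchain owns everything outside
// the USER CODE markers, and the user only fills in udf_impl.
// No standard-library headers are used, so the file can target bare wasm32.

namespace {
// ---- USER CODE BEGIN ----
// Example user logic: combine two integer property values.
int udf_impl(int a, int b) {
    return a * 2 + b;
}
// ---- USER CODE END ----
}  // namespace

// Fixed entry point provided by the template. The host would look up this
// export by name after instantiating the module; on wasm32, `int` maps to
// the Wasm `i32` type.
extern "C" int udf_call(int a, int b) {
    return udf_impl(a, b);
}
```

Because the template keeps one fixed exported entry point, the host needs to know only a single symbol name and signature. Users change nothing but the marked region.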

Parsing: The statements for creating and deleting functions are implemented with Flex and Bison. The syntax is shown in the figure below. Two mainstream formats are supported: the WebAssembly text format (wat) and Wasm binary files. There are two ways to load them. The first is to pass a base64-encoded wat module or Wasm binary directly, which is convenient for typing in the terminal; programs whose size is at the KB level can be imported this way. The second is for MB-level Wasm binaries, which can be imported from an HTTP address or a local file path.
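The inline loading path can be sketched in two steps: decode the base64 payload, then decide whether it holds a Wasm binary or wat text. This minimal sketch makes the following assumptions:

- The payload is standard RFC 4648 base64.
- `decodeBase64`, `detectFormat`, and `WasmFormat` are hypothetical names.
- The wat check is simplified. It only looks for a leading `(module`, so a module that starts with a comment is reported as unknown.

The binary check relies on the fact that core Wasm binary modules begin with the magic bytes `\0asm` followed by version 1. The HTTP and local-file loading paths are not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

enum class WasmFormat { Binary, Text, Unknown };

// Decodes standard base64 (RFC 4648 alphabet with '=' padding).
// Whitespace such as line breaks pasted from a terminal is skipped;
// any other character outside the alphabet makes decoding fail.
std::optional<std::string> decodeBase64(const std::string& in) {
    auto value = [](char c) -> int {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 52;
        if (c == '+') return 62;
        if (c == '/') return 63;
        return -1;
    };
    std::string out;
    std::uint32_t buffer = 0;  // pending bits; only the low `bits` bits matter
    int bits = 0;
    for (char c : in) {
        if (c == '=') break;  // padding marks the end of the data
        if (c == ' ' || c == '\n' || c == '\r' || c == '\t') continue;
        int v = value(c);
        if (v < 0) return std::nullopt;
        buffer = (buffer << 6) | static_cast<std::uint32_t>(v);
        bits += 6;
        if (bits >= 8) {  // a full byte is available
            bits -= 8;
            out.push_back(static_cast<char>((buffer >> bits) & 0xFFu));
        }
    }
    return out;
}

// Classifies decoded bytes: Wasm binary modules start with the magic
// "\0asm" plus version 1 (01 00 00 00); wat text usually starts with "(module".
WasmFormat detectFormat(const std::string& bytes) {
    static const std::string kMagic("\0asm\x01\0\0\0", 8);
    if (bytes.compare(0, kMagic.size(), kMagic) == 0) return WasmFormat::Binary;
    const auto start = bytes.find_first_not_of(" \t\r\n");
    if (start != std::string::npos && bytes.compare(start, 7, "(module") == 0) {
        return WasmFormat::Text;
    }
    return WasmFormat::Unknown;
}
```

For example, `AGFzbQEAAAA=` is the base64 encoding of the 8-byte header of an empty Wasm module, and `KG1vZHVsZSk=` encodes the wat text `(module)`.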

Function Management: Manages all Wasm virtual machines in one place and supports dynamically loading, updating, and unloading functions. It acts as the glue between Nebula's other subsystems and the Wasm extension components.
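One way to organize such a function manager is a name-to-function map guarded by a reader/writer lock. In the sketch below, the following are assumptions rather than the design in #3566:

- `FunctionRegistry`, `LoadedUdf`, and `UdfFn` are hypothetical names.
- A plain `std::function` stands in for an instantiated Wasm function from a runtime such as Wasmtime or WasmEdge.
- The registry hands out `std::shared_ptr` handles, so queries that are already running keep the old version while a function is updated or unloaded.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for an instantiated Wasm function; a real implementation would
// wrap a runtime handle instead.
using UdfFn = std::function<std::int64_t(const std::vector<std::int64_t>&)>;

struct LoadedUdf {
    UdfFn fn;
    int version;
};

class FunctionRegistry {
public:
    // Loads a new function or replaces an existing one (dynamic update).
    // Returns the version number now associated with `name`.
    int loadOrUpdate(const std::string& name, UdfFn fn) {
        std::unique_lock lock(mutex_);
        auto it = udfs_.find(name);
        int version = (it == udfs_.end()) ? 1 : it->second->version + 1;
        udfs_[name] = std::make_shared<LoadedUdf>(LoadedUdf{std::move(fn), version});
        return version;
    }

    // Unloads a function; returns false if it was not registered.
    bool unload(const std::string& name) {
        std::unique_lock lock(mutex_);
        return udfs_.erase(name) > 0;
    }

    // Looks up a function for execution. The returned handle keeps that
    // version alive even if it is updated or unloaded concurrently.
    std::shared_ptr<const LoadedUdf> get(const std::string& name) const {
        std::shared_lock lock(mutex_);
        auto it = udfs_.find(name);
        return it == udfs_.end() ? nullptr : it->second;
    }

private:
    mutable std::shared_mutex mutex_;
    std::unordered_map<std::string, std::shared_ptr<const LoadedUdf>> udfs_;
};
```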

Runtime: This layer embeds the C++ SDK of a Wasm virtual machine such as Wasmtime or WasmEdge. The SDK is called to compile wat code, compile and execute Wasm binaries, and manage sandbox instances, WASI features, etc.

[Figure: "Untitled-2022-03-21-1508" (image not preserved)]

The project faces some known difficulties:

  • Embedding a complete Wasm runtime in a C++ codebase raises many compatibility issues.
  • The UDF design differs significantly from that of other database systems, so users face some learning cost.
  • Wasm support in most other language communities is still immature. Although we aim to support all languages, currently only UDFs written in Rust can be compiled to Wasm easily.

Describe alternatives you've considered

Additional context

Does the core team have any initial thoughts on the UDF system design? If the team thinks a Wasm-based UDF system fits the project's future plans, I can work on this PR.

arcosx added the type/feature req label on Mar 21, 2022
@Shylock-Hg
Contributor

What do you think about an LLVM backend instead of Wasm?

@arcosx
Author

arcosx commented Mar 22, 2022


Nice idea! I'm not familiar with LLVM, but I just went through some documentation and lectures to understand it 😂:

  • Processing Java UDFs in a C++ environment
  • Wanderman-Milne-Cloudera.pdf
  • 24 - Server-Side Logic Execution (CMU Databases / Spring 2020) - YouTube

I think LLVM is more mature and stable than Wasm at the compilation stage, and it also seems to offer better performance.
Wasm's only real advantage seems to be sandbox security, and that could also be achieved with static checks.
So does Nebula plan to use LLVM for UDFs? I think it's a great idea.

@Shylock-Hg
Contributor


No, it's just my own idea. Our team's plan is still in progress.

@arcosx
Author

arcosx commented Mar 22, 2022

I think there are many ways to extend UDFs here, but they boil down to either linking the user's code directly, e.g. as a DLL, or embedding a new VM such as Wasm or V8. Either way, much of the work probably lies in the glue layer in between. The development cost for users also needs to be considered.
If your team is considering Wasm-based UDFs, you can refer to my implementation in PR #3566. The code is relatively minimal and follows the same logic as the RFC I submitted.
I am also considering applying a Wasm-based extension mechanism to other areas. More work is needed to refine it.

@Shylock-Hg
Contributor


Yes. On the other hand, supporting multiple configurable UDF backends could also be a good choice if it proves valuable. In that case, choosing between LLVM and Wasm would not be a problem.
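Supporting several configurable backends mostly comes down to two pieces: one shared interface, and a factory that picks an implementation from configuration. The following C++ sketch makes these assumptions:

- `UdfBackend`, `BackendFactory`, and `StubBackend` are hypothetical names.
- Keys such as `"wasm"` or `"llvm"` would come from a configuration value, for example a hypothetical `udf_backend` setting.
- `StubBackend` is a placeholder. A real backend would compile a Wasm module or LLVM IR, or load a shared library.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical interface that every UDF backend would implement.
class UdfBackend {
public:
    virtual ~UdfBackend() = default;
    virtual std::string name() const = 0;
    // Loads the user's code; the payload format is backend-specific
    // (Wasm binary, LLVM IR, path to a shared library, ...).
    virtual bool load(const std::string& udfName, const std::string& payload) = 0;
};

// Maps a configuration key (e.g. "wasm", "llvm") to a backend constructor.
class BackendFactory {
public:
    using Creator = std::function<std::unique_ptr<UdfBackend>()>;

    void registerBackend(const std::string& key, Creator creator) {
        creators_[key] = std::move(creator);
    }

    std::unique_ptr<UdfBackend> create(const std::string& key) const {
        auto it = creators_.find(key);
        if (it == creators_.end()) {
            throw std::invalid_argument("unknown UDF backend: " + key);
        }
        return it->second();
    }

private:
    std::map<std::string, Creator> creators_;
};

// Placeholder backend: it only checks that a payload was supplied.
class StubBackend : public UdfBackend {
public:
    explicit StubBackend(std::string name) : name_(std::move(name)) {}
    std::string name() const override { return name_; }
    bool load(const std::string& /*udfName*/, const std::string& payload) override {
        // A real backend would compile or link `payload` here.
        return !payload.empty();
    }

private:
    std::string name_;
};
```

With this split, the shared glue layer depends only on `UdfBackend`. Adding another backend, such as a DLL loader or V8, then means registering one more creator.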
