-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[Idea] Rewriting in LLVM DSL #411
Comments
Here is an example of what it might look like: https://github.com/indutny/llparse/blob/master/test/fixtures/example.js |
If you're thinking of writing some lexer generator, I could definitely get behind that - I've had the same thought more than once - but why LLVM IR specifically and wouldn't we be reinventing ragel? |
@bnoordhuis there're probably better solutions out there, but I always wanted to write something that compiles to LLVM IR. Is it a valid excuse? On a more serious note, I'd like every state implementation to live in a separate procedure that tail-calls other procedures in the most of the cases. This might turn out to be faster than having a single grand dispatch. |
Also on the feature list is having a low-level trie implementation that works seamlessly. |
It could probably work in a way similar to Ragel, however it'd have to rely on LLVM to inline the C code into compiled nodes of the state machine graph. |
You can't replace http-parser with something that spits out LLVM IR, that leaves too many users out in the cold. Something that compiles to C or has different back-ends could work though (but then you're half-way on the road to reimplementing ragel or re2c.)
What for? |
We obviously need a trie-like state machine to replace hand-written state code for the methods. |
Not sure I follow. For lexers, you compute the DFA or NFA and then emit state tables or goto-based code. I guess you could implement it as a trie but I don't know why you would. |
I didn't mean in-memory trie, as a data-structure. Rather a compiled code consisting of branches and states linked in a trie-like structure. |
Ah, okay. You get that for free with a DFA but I guess a trie is pretty much a DFA encoded as a tree. |
I think I'm on the edge of giving up. Just figured out that |
Nvm, it works! 👍 |
Took few iterations, but I think I've reached some intermediate milestone here: https://github.com/indutny/llparse/blob/master/test/api-test.js What do you think, @bnoordhuis ? |
Here is what it produces: https://gist.github.com/indutny/ac9eedc036a43098b2ad0bf7b63e9f65 |
@bnoordhuis better example here: https://github.com/indutny/llparse/tree/master/examples/http |
API-wise it looks real nice and the generated code looks tight apart from a hiccup:
Do you have plans for a C back-end? |
The hiccup is due to a bug in Here is some real work on porting http parser: https://github.com/indutny/llhttp. I don't really have plans for a C back-end yet, but will likely have to explore this eventually. |
Note that despite not being present in the example, I've just introduced an API for creating callbacks using LLVM IR (instead of a reference to C function): https://github.com/indutny/llparse/blob/81578c8f41a926514a6625b2a9c9941218728408/test/fixtures/index.js#L85-L121 After I'll add an API to extend the state structure from the user code, these compiled code chunks will be able to update the state without ever calling the C-land. C-land calls are sort of expensive right now, as they can't be inlined (due to various machine-specific flags that has to be matched). Additionally, I plan to introduce "mark"s that would work in the same way as they do in http_parser.c |
@bnoordhuis and so we got spans (instead of "mark"s): https://github.com/indutny/llhttp/blob/38fb3aed8c7290aec389a109e2e8cd1c004872c4/lib/llhttp/url.js#L12 . They can be interleaved if we'd ever want to, and they should be efficient. |
"Getting rid of huge switch statement that isn't optimized well" Can the switches be permuted to get speedup on average inputs? Perhaps a utility that takes as input a URI corpus file and outputs optimal permutations for the switches? |
@chadbrewbaker there's little point in this, switches like the one in http_parser are optimized into jump table. |
I know it sounds crazy, but bear with me 😉
What would you think about rewriting whole project using some JavaScript tool to compile a DSL to a LLVM IR?
I've several reasons for this:
It doesn't sound to complicated, and may finally make our codebase shine.
cc @bnoordhuis @mscdex
The text was updated successfully, but these errors were encountered: