[Idea] Rewriting in LLVM DSL #411

indutny · 2018-02-11T03:34:58Z

I know it sounds crazy, but bear with me 😉

What would you think about rewriting the whole project using a JavaScript tool that compiles a DSL to LLVM IR?

I've several reasons for this:

Getting rid of the huge switch statement that isn't optimized well, and putting separate states into separate procedures with optimized jumps between them
Having architecture-specific vector comparisons that are easy to use

It doesn't sound too complicated, and may finally make our codebase shine.

indutny · 2018-02-11T20:39:12Z

Here is an example of what it might look like: https://github.com/indutny/llparse/blob/master/test/fixtures/example.js

bnoordhuis · 2018-02-12T13:41:28Z

If you're thinking of writing some lexer generator, I could definitely get behind that - I've had the same thought more than once - but why LLVM IR specifically and wouldn't we be reinventing ragel?

indutny · 2018-02-12T13:47:46Z

@bnoordhuis there're probably better solutions out there, but I always wanted to write something that compiles to LLVM IR. Is it a valid excuse?

On a more serious note, I'd like every state implementation to live in a separate procedure that tail-calls other procedures in most cases. This might turn out to be faster than having a single grand dispatch.
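A minimal sketch of the one-procedure-per-state idea, written in C rather than LLVM IR. The grammar ("<UPPERCASE method> <target>") and all function names are hypothetical and chosen only for illustration. Plain C does not guarantee that these calls are compiled as jumps, so whether the recursion is eliminated depends on the compiler and optimization level.

```c
#include <stddef.h>

/* Hypothetical recognizer for "<UPPERCASE method> <target>".
 * Every state is its own function; each returns a pointer to where
 * parsing stopped, or NULL on error. */
static const char *state_target(const char *p, const char *end) {
  if (p == end || *p == ' ')
    return p;                          /* target finished */
  return state_target(p + 1, end);     /* tail call: stay in this state */
}

static const char *state_method(const char *p, const char *end) {
  if (p == end)
    return NULL;                       /* input ended inside the method */
  if (*p == ' ')
    return state_target(p + 1, end);   /* tail call: move to the next state */
  if (*p < 'A' || *p > 'Z')
    return NULL;                       /* not an uppercase letter */
  return state_method(p + 1, end);     /* tail call: stay in this state */
}

static const char *state_start(const char *p, const char *end) {
  if (p == end || *p < 'A' || *p > 'Z')
    return NULL;                       /* need at least one method letter */
  return state_method(p + 1, end);
}

/* Returns the number of bytes consumed, or -1 on a parse error. */
long parse_prefix(const char *buf, size_t len) {
  const char *stop = state_start(buf, buf + len);
  return stop == NULL ? -1 : (long)(stop - buf);
}
```

All state functions in the sketch share one signature, which keeps every transition a plain call in tail position and so leaves the optimizer free to turn it into a jump.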

indutny · 2018-02-12T13:49:01Z

Also on the feature list is having a low-level trie implementation that works seamlessly.

indutny · 2018-02-12T13:56:21Z

It could probably work in a way similar to Ragel, however it'd have to rely on LLVM to inline the C code into compiled nodes of the state machine graph.

bnoordhuis · 2018-02-12T16:11:28Z

You can't replace http-parser with something that spits out LLVM IR, that leaves too many users out in the cold. Something that compiles to C or has different back-ends could work though (but then you're half-way on the road to reimplementing ragel or re2c.)

"a low-level trie implementation"

What for?

indutny · 2018-02-12T21:28:48Z

We obviously need a trie-like state machine to replace hand-written state code for the methods.

bnoordhuis · 2018-02-12T21:47:07Z

Not sure I follow. For lexers, you compute the DFA or NFA and then emit state tables or goto-based code. I guess you could implement it as a trie but I don't know why you would.
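For comparison, a minimal sketch of the state-table approach mentioned above, in C. The language it accepts (digits, optionally followed by a dot and more digits) and every identifier are assumptions made for the example; a real lexer generator would compute the table from a grammar rather than have it written by hand.

```c
#include <stddef.h>

enum state { S_START, S_INT, S_DOT, S_FRAC, S_ERR, S_COUNT };
enum cls { C_DIGIT, C_DOT, C_OTHER, C_COUNT };

/* transitions[state][character class] gives the next state. */
static const unsigned char transitions[S_COUNT][C_COUNT] = {
  [S_START] = { S_INT,  S_ERR, S_ERR },
  [S_INT]   = { S_INT,  S_DOT, S_ERR },
  [S_DOT]   = { S_FRAC, S_ERR, S_ERR },
  [S_FRAC]  = { S_FRAC, S_ERR, S_ERR },
  [S_ERR]   = { S_ERR,  S_ERR, S_ERR },
};

/* Maps an input byte to its character class. */
static enum cls classify(char c) {
  if (c >= '0' && c <= '9')
    return C_DIGIT;
  if (c == '.')
    return C_DOT;
  return C_OTHER;
}

/* Returns 1 if the whole buffer matches digits[.digits], else 0. */
int dfa_matches(const char *buf, size_t len) {
  enum state s = S_START;
  for (size_t i = 0; i < len; i++)
    s = (enum state)transitions[s][classify(buf[i])];
  return s == S_INT || s == S_FRAC;    /* the two accepting states */
}
```

The dispatch here is a single table lookup per byte; the goto-based alternative would instead encode each row of the table as a labeled block of branches.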

indutny · 2018-02-12T21:55:50Z

I didn't mean an in-memory trie, as a data structure. Rather, compiled code consisting of branches and states linked in a trie-like structure.

bnoordhuis · 2018-02-12T22:03:09Z

Ah, okay. You get that for free with a DFA but I guess a trie is pretty much a DFA encoded as a tree.
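A minimal sketch of what "a trie compiled into branches" could look like, in C. The three methods and the names `match_method` and `METHOD_*` are assumptions picked for the example, not the code that llparse generates. The shared prefix "P" of POST and PUT becomes one branch that splits on the second byte.

```c
#include <stddef.h>

enum method { METHOD_UNKNOWN, METHOD_GET, METHOD_POST, METHOD_PUT };

/* Matches a complete method name by walking nested branches,
 * one trie level per input byte. */
enum method match_method(const char *s, size_t len) {
  if (len < 3)
    return METHOD_UNKNOWN;             /* shortest known method is 3 bytes */
  switch (s[0]) {
  case 'G':
    if (len == 3 && s[1] == 'E' && s[2] == 'T')
      return METHOD_GET;
    break;
  case 'P':                            /* shared prefix of POST and PUT */
    switch (s[1]) {
    case 'O':
      if (len == 4 && s[2] == 'S' && s[3] == 'T')
        return METHOD_POST;
      break;
    case 'U':
      if (len == 3 && s[2] == 'T')
        return METHOD_PUT;
      break;
    }
    break;
  }
  return METHOD_UNKNOWN;
}
```

No trie exists in memory at run time: the structure lives entirely in the control flow, which is the distinction drawn in the comments above.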

indutny · 2018-02-13T06:07:37Z

I think I'm on the edge of giving up. Just figured out that musttail may not work on ARM.

indutny · 2018-02-13T06:12:16Z

Nvm, it works! 👍

indutny · 2018-02-17T08:40:10Z

Took a few iterations, but I think I've reached some intermediate milestone here: https://github.com/indutny/llparse/blob/master/test/api-test.js

What do you think, @bnoordhuis ?

indutny · 2018-02-17T09:34:55Z

Here is what it produces: https://gist.github.com/indutny/ac9eedc036a43098b2ad0bf7b63e9f65

indutny · 2018-02-17T23:30:57Z

@bnoordhuis better example here: https://github.com/indutny/llparse/tree/master/examples/http

bnoordhuis · 2018-02-19T22:32:03Z

API-wise it looks real nice and the generated code looks tight apart from a hiccup:

$ make -C examples/http
node index.js > http.ll
cc -g3 -Os -flto -fvisibility=hidden -Wall http.ll main.c -o http
warning: overriding the module target triple with x86_64-apple-macosx10.13.0 [-Woverride-module]
1 warning generated.
cannot guarantee tail call due to mismatched parameter counts
  %15 = musttail call fastcc i8* @http_parser__invoke_on_complete(%http_parser_state* %0, i8* %14, i8* %2)
cannot guarantee tail call due to mismatched parameter counts
  %40 = musttail call fastcc i8* @http_parser__invoke_on_complete(%http_parser_state* %0, i8* %39, i8* %2)
cannot guarantee tail call due to mismatched parameter counts
  %10 = musttail call fastcc i8* @http_parser__method(%http_parser_state* %0, i8* %1, i8* %2, i32 0)
LLVM ERROR: Broken module found, compilation aborted!
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [http] Error 1

$ cc -v
Apple LLVM version 9.0.0 (clang-900.0.39.2)
Target: x86_64-apple-darwin17.2.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Do you have plans for a C back-end?

indutny · 2018-02-19T22:37:32Z

The hiccup is due to a bug in -flto, see: https://bugs.llvm.org/show_bug.cgi?id=36441 . I've just pushed a fix to the Makefile in that example.

Here is some real work on porting http parser: https://github.com/indutny/llhttp.

I don't really have plans for a C back-end yet, but will likely have to explore this eventually.

indutny · 2018-02-19T22:40:44Z

Note that despite not being present in the example, I've just introduced an API for creating callbacks using LLVM IR (instead of a reference to C function): https://github.com/indutny/llparse/blob/81578c8f41a926514a6625b2a9c9941218728408/test/fixtures/index.js#L85-L121

After I add an API to extend the state structure from the user code, these compiled code chunks will be able to update the state without ever calling into C-land. C-land calls are sort of expensive right now, as they can't be inlined (due to various machine-specific flags that have to match).

Additionally, I plan to introduce "mark"s that would work in the same way as they do in http_parser.c

indutny · 2018-02-22T06:29:42Z

@bnoordhuis and so we got spans (instead of "mark"s): https://github.com/indutny/llhttp/blob/38fb3aed8c7290aec389a109e2e8cd1c004872c4/lib/llhttp/url.js#L12 . They can be interleaved if we'd ever want to, and they should be efficient.
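A rough sketch of the span idea in C, under the assumption that a span records where a field starts and reports the range through a callback when it ends. The types and functions below (`struct span`, `span_begin`, `span_end`, `find_url`) are hypothetical and are not the llparse or llhttp API.

```c
#include <stddef.h>

typedef void (*span_cb)(void *ctx, const char *at, size_t len);

/* A span remembers its start and where to report the finished range. */
struct span {
  const char *start;
  span_cb cb;
  void *ctx;
};

static void span_begin(struct span *s, const char *p) {
  s->start = p;
}

static void span_end(struct span *s, const char *p) {
  if (s->start == NULL)
    return;                            /* span was never started */
  s->cb(s->ctx, s->start, (size_t)(p - s->start));
  s->start = NULL;
}

struct capture {
  const char *at;
  size_t len;
};

/* Callback that stores the reported range in a struct capture. */
static void on_url(void *ctx, const char *at, size_t len) {
  struct capture *c = ctx;
  c->at = at;
  c->len = len;
}

/* Scans "<method> <url> ..." and reports the url through a span. */
struct capture find_url(const char *buf, size_t len) {
  struct capture out = { NULL, 0 };
  struct span url = { NULL, on_url, &out };
  const char *end = buf + len;
  const char *p = buf;

  while (p != end && *p != ' ')
    p++;                               /* skip the method */
  if (p == end)
    return out;                        /* no url present */
  p++;                                 /* skip the separating space */
  span_begin(&url, p);
  while (p != end && *p != ' ')
    p++;                               /* consume the url */
  span_end(&url, p);
  return out;
}
```

Because each span keeps its own start pointer, two of them can be open at the same time, which is one way to read the remark that spans can be interleaved.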

chadbrewbaker · 2018-03-27T19:52:24Z

"Getting rid of huge switch statement that isn't optimized well"

Can the switches be permuted to get speedup on average inputs? Perhaps a utility that takes as input a URI corpus file and outputs optimal permutations for the switches?

indutny · 2018-03-27T20:18:49Z

@chadbrewbaker there's little point in this, switches like the one in http_parser are optimized into a jump table.

indutny changed the title ~~[Idea] Re-writing LLVM DSL~~ [Idea] Re-writing in LLVM DSL Feb 12, 2018

indutny changed the title ~~[Idea] Re-writing in LLVM DSL~~ [Idea] Rewriting in LLVM DSL Feb 12, 2018
