Bytecode to bir #1472

Bike · 2023-07-21T23:02:07Z

Defines a system for compiling bytecode directly into BIR (and from there into native code). Hooks it up to cl:compile, so that calling cl:compile on a bytecoded function compiles it to native code. Note that bytecoded functions are still considered compiled (i.e. they are compiled-function-p). I think this behavior is kosher. There is no automatic compilation yet.
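A toy model of the dispatch described above may help. It is only a sketch: `BytecodeFunction`, `NativeFunction`, `native_compile_bytecode` and `compile_function` are hypothetical stand-ins, not Clasp's API. It shows two properties: bytecoded functions already satisfy compiled-function-p, and compiling one explicitly yields a native function, since nothing is compiled automatically.

```python
from dataclasses import dataclass


@dataclass
class BytecodeFunction:
    """Hypothetical stand-in for a function whose code runs in the bytecode VM."""
    name: str
    bytecode: list


@dataclass
class NativeFunction:
    """Hypothetical stand-in for a natively compiled function."""
    name: str


def compiled_function_p(obj):
    # Bytecoded functions stay compiled-function-p, just like native ones.
    return isinstance(obj, (BytecodeFunction, NativeFunction))


def native_compile_bytecode(fn):
    # Placeholder for the real pipeline: bytecode -> BIR -> native code.
    return NativeFunction(fn.name)


def compile_function(fn):
    """Rough analogue of (compile nil fn) once the hook is installed."""
    if isinstance(fn, BytecodeFunction):
        return native_compile_bytecode(fn)
    return fn  # already native: nothing to do
```

Nothing here runs on its own. Native compilation only happens when `compile_function` is called explicitly.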

This is experimental. It is not ready for prime time and some of my design is pretty sloppy. But it's stable enough that I want to try using it as part of normal workflows so we can see what breaks.

Bike · 2023-07-28T17:41:45Z

@drmeister has asked me not to merge this until we sort out what's going wrong with snapshots. They are not currently working on this branch, but they are not working on main either.

This uniformly puts arguments on the VM stack before a call, whether or not the callee is a VM function, and saves a copy for mv calls with multiple argument forms (the copy into the mv vector is done by pop-values). But my actual impetus is that this structure makes compilation to BIR easier, as all mv calls are preceded by push-values/append-values, much as BIR requires a value-collect for any mv call. I had to increase the VM stack size to keep the build working, though. Bit worrying. The failure was when compiling the package definition for closer-mop, of all things.
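The sketch below is a toy stack machine showing how push-values, append-values and pop-values can collect the argument values of an mv call on the stack before the call, in the same shape as a BIR value-collect. The instruction names come from the paragraph above. The class, the `max_stack` limit, and the exact stack layout (values followed by a running count) are assumptions, not Clasp's VM.

```python
class ToyVM:
    """Hypothetical stack VM: a value stack plus a multiple-values register."""

    def __init__(self, max_stack=64):
        self.stack = []
        self.max_stack = max_stack
        self.mv = []  # current multiple values

    def push(self, x):
        if len(self.stack) >= self.max_stack:
            raise OverflowError("VM stack exhausted")
        self.stack.append(x)

    def push_values(self):
        # Push the current values, then their count.
        for v in self.mv:
            self.push(v)
        self.push(len(self.mv))

    def append_values(self):
        # Pop the running count, push the current values, push the new total.
        count = self.stack.pop()
        for v in self.mv:
            self.push(v)
        self.push(count + len(self.mv))

    def pop_values(self):
        # Pop the count and copy that many values into the MV register.
        count = self.stack.pop()
        base = len(self.stack) - count
        self.mv = self.stack[base:]
        del self.stack[base:]

    def mv_call(self, fn):
        # Every argument value was already collected on the stack.
        self.pop_values()
        self.mv = [fn(*self.mv)]
        return self.mv[0]
```

For `(multiple-value-call #'list (values 1 2) (values 3))`, the first argument form ends with push-values and the second with append-values. The call then finds all three values on the stack.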

This makes setq a little less ridiculous to generate: we just use a stack spot instead of a local variable. This will reduce the number of local variables needed by some functions. It also makes things a little easier on bytecode-to-bir. The temporary variable was screwing things up there, because it was conflated with earlier bound variables, which caused problems when the variable was closed over.
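Here is a minimal sketch of the "stack spot instead of a local variable" idea for `(setq x <form>)`. When the setq's own value is needed, a DUP keeps a copy on the stack, so no temporary local is allocated. The instruction tuples and the tiny interpreter are illustrative assumptions, not Clasp's actual encoding.

```python
def compile_setq(var, value_code, value_needed):
    """Emit toy bytecode for (setq VAR <form>).

    value_code must leave the new value on top of the stack. If the setq's
    value is needed, DUP keeps a copy there instead of stashing it in a
    temporary local variable.
    """
    code = list(value_code)
    if value_needed:
        code.append(("dup",))
    code.append(("set", var))  # pops the top of the stack into VAR
    return code


def run(code, env):
    """Tiny interpreter for the toy instructions; returns the final stack."""
    stack = []
    for op, *args in code:
        if op == "const":
            stack.append(args[0])
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "set":
            env[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown op {op}")
    return stack
```

No fresh variable is introduced, so nothing can be confused with an earlier binding of the same variable when it is closed over.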

I will be adding more for the sake of further compilation, and this will make that easier. It also makes it possible to assert that the debug infos are in order (i.e. their START indices are nondecreasing) which is very convenient for the compiler and debugger.
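As a sketch of why nondecreasing START indices are convenient, the block below checks that ordering. It then uses it to stop early when looking up which debug infos cover a given bytecode position. The `DebugInfo` class and the half-open `[start, end)` convention are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class DebugInfo:
    """Hypothetical debug info covering bytecode positions [start, end)."""
    start: int
    end: int
    what: str


def debug_infos_ordered(infos):
    # The invariant: START indices never decrease along the vector.
    return all(a.start <= b.start for a, b in zip(infos, infos[1:]))


def infos_at(infos, pc):
    """Return the debug infos covering PC, relying on sorted starts."""
    hits = []
    for info in infos:
        if info.start > pc:
            break  # sorted by start, so no later info can cover pc
        if pc < info.end:
            hits.append(info)
    return hits
```

A debugger asking "what is live at this pc?" can stop scanning at the first info that starts past it. That only works if the ordering invariant actually holds.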

Having a large array inside the VirtualMachine (and thus the ThreadLocalState) seems to crash the linker on Mac. It's also just sort of wonky. Ideally we'd probably have a more sophisticated setup here, with mmap and guard pages and growth and yada yada, but this works OK. There is a definite suboptimality in allocating the entire maximum stack size all at once, but it's not _too_ big, so maybe it's OK. It also means that the GC will walk the entire stack for pointers, including the inactive part, but Boehm is probably not capable of doing anything more complex anyway. Or if it is, it's wizardry.

Bike force-pushed the bytecode-to-bir branch 2 times, most recently from 4b4b90b to a02ece5, July 27, 2023 13:27

Bike added 27 commits July 31, 2023 07:52

bytecode compiler: compile 0-receiving calls as such

4d097a5

rather than as calls that futz with MV. This can be used as a correctness condition, and it makes compiling to IR easier.
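Here is a hedged sketch of the idea. The compiler chooses the call instruction from how many values the context receives, so a 0-receiving call never touches the MV register. One possible reading of "correctness condition" is then checkable: every MV-producing `call` must be followed by an MV consumer. `push-values` and `append-values` appear elsewhere in this PR. The `call-receive-*` names and the consumer set are assumptions.

```python
def emit_call(nargs, receiving):
    """Choose a toy call instruction based on how many values the caller wants.

    receiving is a non-negative int, or "all" when every value is needed
    (e.g. a function's tail form or an mv-call argument).
    """
    if receiving == "all":
        return ("call", nargs)  # values land in the MV register
    if receiving == 1:
        return ("call-receive-one", nargs)  # one value pushed on the stack
    return ("call-receive-fixed", nargs, receiving)  # fixed count, including 0


# Hypothetical set of instructions that consume the MV register.
MV_CONSUMERS = {"push-values", "append-values", "return"}


def mv_calls_are_consumed(code):
    """Toy check: each MV-producing call is immediately consumed."""
    for insn, nxt in zip(code, code[1:] + [None]):
        if insn[0] == "call" and (nxt is None or nxt[0] not in MV_CONSUMERS):
            return False
    return True
```

A 0-receiving call compiled as `call-receive-fixed` passes the check trivially. A bare `call` whose values are simply dropped is flagged.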

Don't try to print unbound structure slots

e0b162d

The unbound objects still cause all kinds of bad problems like segfaults that we absolutely do not want to be user-exposed.
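A minimal sketch of the printer-side guard is shown below. The printer checks for the unbound marker and skips that slot rather than handing the raw object to the rest of the printer. `UNBOUND` and `print_struct` are hypothetical, and skipping the slot is just one possible policy.

```python
UNBOUND = object()  # hypothetical stand-in for the runtime's unbound marker


def print_struct(name, slots):
    """Render a toy #S(...) form, never passing the unbound marker along."""
    parts = []
    for slot, value in slots.items():
        if value is UNBOUND:
            continue  # don't try to print an unbound slot at all
        parts.append(f":{slot.upper()} {value!r}")
    return "#S(" + " ".join([name.upper(), *parts]) + ")"
```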

bytecode FASLs: delete obsolete *compiler* variable

621fa41

Delete unused internal parameters

5137396

Refactor clasp-cleavir translate

11b2ee1

This localizes the LLVM stuff in an entry point that accepts BIR, so if we produce BIR by some other means (e.g. from bytecode) we can reuse said stuff.
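The shape of this refactor can be sketched as two front ends producing BIR and sharing one back-end entry point. All names here are hypothetical, and the "BIR" and "native code" are plain strings. The point is only that everything LLVM-specific sits behind `translate_bir`.

```python
def translate_bir(bir):
    """Hypothetical shared back end: the only place that knows about 'LLVM'."""
    return ["native:" + insn for insn in bir]


def compile_from_source(forms):
    # Existing path (sketch): source -> AST -> BIR -> shared back end.
    bir = ["ast/" + form for form in forms]
    return translate_bir(bir)


def compile_from_bytecode(ops):
    # New path (sketch): bytecode -> BIR -> the same shared back end.
    bir = ["bc/" + op for op in ops]
    return translate_bir(bir)
```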

Reorganize bytecode debug info class hierarchy

e7a791d

bytecode compiler: generate debug infos sorted

f5eeb50

Combined interface for all bytecode debug infos

1b88df4

Integrate bytecode-to-bir into clasp

9f66ac3

Still needs work, but it's functional.

Remove unused Closure_O field

c77c0cb

If we need to test bytecodeness, we can just look at the type of the simple fun.
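Here is a minimal sketch of testing bytecodeness from the simple fun's type instead of from a dedicated closure field. The class names are hypothetical stand-ins for the runtime's types.

```python
class SimpleFun:
    """Hypothetical stand-in for the code object a closure wraps."""


class BytecodeSimpleFun(SimpleFun):
    pass


class NativeSimpleFun(SimpleFun):
    pass


class Closure:
    # Only the simple fun and the closed-over cells: no "is bytecode" flag.
    __slots__ = ("simple_fun", "cells")

    def __init__(self, simple_fun, cells=()):
        self.simple_fun = simple_fun
        self.cells = tuple(cells)


def bytecode_closure_p(closure):
    # The type of the simple fun already answers the question.
    return isinstance(closure.simple_fun, BytecodeSimpleFun)
```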

bytecode to IR: simpler interface

0ee844d

Add missing IGNORE declarations

b97ef86

bytecode: declaration annotations

b1cf063

bytecode compiler: preserve some variable declarations

e883bb1

Doing optional variables etc. is going to be annoying

Delete obsolete function

807f579

bytecode compiler: more precise source locations for functions

5029037

FLET might need a little more work

Oops, left in debugging code

96f6646

b-t-b compiler: more annotations

a4815b0

bytecode compiler: retain info about THE in annotations

6e8b10f

btb compiler: get function attributes

1a291ef

Make cmp:*optimize* primitive

6a76f41

The bytecode compiler will have to keep track of optimize declarations for the btb compiler.

bytecode compiler: store declarations lexically

7728800

bytecode compiler: actually use passed source info

6fe4be8

Christ, I am stupid sometimes.

bytecode compiler: don't record declarations redundantly

5f4783e

This cuts down on pointless annotations.
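Here is a hedged sketch of the idea. An optimize annotation is recorded only when the effective policy actually changes, so repeated identical declarations produce nothing new. The `(pc, policy)` event format is an assumption, and this toy ignores lexical scoping, i.e. a policy reverting at the end of a scope.

```python
def policy_annotations(policy_events):
    """policy_events: (pc, policy) pairs in code order.

    Return only the events that change the effective policy.
    """
    annotations = []
    current = None
    for pc, policy in policy_events:
        if policy != current:
            annotations.append((pc, policy))
            current = policy
    return annotations
```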

Bike added 20 commits July 31, 2023 07:52

bytecode compiler: source locations for entire function bodies

bcdb93e

btb compiler: annotations are sorted already

67cf907

By generation, and also in LTV FASLs

btb compiler: first stab at THE

2192509

btb compiler: track optimization policy

caea7c6

not gonna lie, this code is kind of ugly.

btb compiler: type checks

09d306a

bytecode compiler: Actually store function-level declarations

dd84ea3

btb compiler: handle IGNORE

aef6a8f

btb compiler: remove old kludge

413b6d7

No longer necessary given the variable annotations and DUP.

btb compiler: type declarations

e9d8573

btb compiler: hook into cl:compile

e90c789

bytecode compiler: ensure debug infos are generated ordered

fda6203

Functions being relocated means that just generating the debug infos into the same module vector as you go can result in non-monotonic positions.
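A small sketch of the failure mode follows. Each function's debug infos are relative to that function, and functions get their final offsets only at layout time. Emitting infos in compile order can then produce decreasing absolute positions, while emitting them in layout order keeps them nondecreasing. The data layout and function names are assumptions for illustration.

```python
def lay_out(functions, layout_order):
    """Assign each function its final offset in the module's code vector."""
    offsets, pos = {}, 0
    for name in layout_order:
        offsets[name] = pos
        pos += functions[name]["length"]
    return offsets


def collect_infos(functions, offsets, order):
    """Absolute (start, end, what) debug infos, emitted in the given order."""
    out = []
    for name in order:
        base = offsets[name]
        for start, end, what in functions[name]["infos"]:
            out.append((start + base, end + base, what))
    return out
```

In the example below, an inner function is compiled before its enclosing function but placed after it. Collecting in compile order then yields positions 10, 0.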

bytecode compiler: maintain mv/stack state in unreachable code

542a438

This is a pretty messy solution. drop-mv never actually needs to be executed, but it lets the compiler treat unconditional jumps as not happening for the purpose of compiling unreachable code correctly. FIX ME

bytecode compiler: fix memberEq usage

89a4720

Without the notnilp, it always returns true, which screws a few things up. Durr.
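The same failure mode shows up in any runtime where NIL is an ordinary object: a "not found" result is still a real object, so a plain truthiness test succeeds. The Python analogy below is hypothetical (`NIL`, `Cons`, `member_eq`, `notnilp` are stand-ins, not Clasp's C++). It only mirrors the bug described above and its fix.

```python
class Nil:
    """Stand-in for Lisp NIL: a real object, so Python treats it as truthy."""

    def __repr__(self):
        return "NIL"


NIL = Nil()


class Cons:
    def __init__(self, car, cdr):
        self.car, self.cdr = car, cdr


def lisp_list(*items):
    result = NIL
    for item in reversed(items):
        result = Cons(item, result)
    return result


def member_eq(item, lst):
    """Like (member item list :test #'eq): the tail starting at ITEM, or NIL."""
    while lst is not NIL:
        if lst.car is item:
            return lst
        lst = lst.cdr
    return NIL


def notnilp(obj):
    # The explicit test: is this anything other than NIL?
    return obj is not NIL
```

Writing `if member_eq(x, lst):` is therefore always true. `if notnilp(member_eq(x, lst)):` gives the intended answer.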

bytecode compiler: record decls for rest variables

b73ed52

btb compiler: skip redundant variable

74492f8

Function names for early defun

feb8589

record and use bytecode block names from tagbody

218e305

The more important intent of this is to store information about PHIs to simplify the btb compiler.

Remove ThreadLocalState from snapshot workers

65723f8

It does not seem to be used anywhere. It adds extra memory to the threads which is then never used, since those threads never even run Lisp code. Running the constructors can also be complicated and is again unnecessary.

Delete global_thread_pool

5c14773

This was apparently used for parallel linking at some point but was dummied out for mysterious reasons. Even without that, I suspect that the thread pool wouldn't need to be global.

Bike force-pushed the bytecode-to-bir branch from a02ece5 to 5c14773, July 31, 2023 11:52

Bike merged commit 6a66ec9 into main Jul 31, 2023
8 checks passed

Bike deleted the bytecode-to-bir branch August 1, 2023 11:08
