edit exception handling
iritkatriel committed May 30, 2024
1 parent cd7725f commit a879805
Showing 1 changed file with 101 additions and 103 deletions.
204 changes: 101 additions & 103 deletions InternalDocs/exception_handling.md
@@ -1,94 +1,88 @@
Description of exception handling in Python 3.11
------------------------------------------------
Description of exception handling
---------------------------------

Python 3.11 uses what is known as "zero-cost" exception handling.
Prior to 3.11, exceptions were handled by a runtime stack of "blocks".

In zero-cost exception handling, the cost of supporting exceptions is minimized.
In the common case (where no exception is raised) the cost is reduced
to zero (or close to zero).
Python uses a technique known as "zero-cost" exception handling, which
minimizes the cost of supporting exceptions. In the common case (where
no exception is raised) the cost is reduced to zero (or close to zero).
The cost of raising an exception is increased, but not by much.

The following code:

def f():
    try:
        g(0)
    except:
        return "fail"

compiles as follows in 3.10:

2 0 SETUP_FINALLY 7 (to 16)

3 2 LOAD_GLOBAL 0 (g)
4 LOAD_CONST 1 (0)
6 CALL_NO_KW 1
8 POP_TOP
10 POP_BLOCK
12 LOAD_CONST 0 (None)
14 RETURN_VALUE

4 >> 16 POP_TOP
18 POP_TOP
20 POP_TOP

5 22 POP_EXCEPT
24 LOAD_CONST 3 ('fail')
26 RETURN_VALUE

Note the explicit instructions to push and pop from the "block" stack:
SETUP_FINALLY and POP_BLOCK.

In 3.11, the SETUP_FINALLY and POP_BLOCK are eliminated, replaced with
a table to determine where to jump to when an exception is raised.

1 0 RESUME 0

2 2 NOP

3 4 LOAD_GLOBAL 1 (g + NULL)
16 LOAD_CONST 1 (0)
18 PRECALL 1
22 CALL 1
32 POP_TOP
34 LOAD_CONST 0 (None)
36 RETURN_VALUE
>> 38 PUSH_EXC_INFO

4 40 POP_TOP

5 42 POP_EXCEPT
44 LOAD_CONST 2 ('fail')
46 RETURN_VALUE
>> 48 COPY 3
50 POP_EXCEPT
52 RERAISE 1
ExceptionTable:
4 to 32 -> 38 [0]
38 to 40 -> 48 [1] lasti

(Note this code is from 3.11, later versions may have slightly different bytecode.)

If an instruction raises an exception then its offset is used to find the target to jump to.
For example, the CALL at offset 22, falls into the range 4 to 32.
So, if g() raises an exception, then control jumps to offset 38.

<code>
try:
    g(0)
except:
    res = "fail"
</code>

compiles into pseudo-code like the following:

<code>
`RESUME` 0

1 `SETUP_FINALLY` 8 (to L1)

2 `LOAD_NAME` 0 (g)
`PUSH_NULL`
`LOAD_CONST` 0 (0)
`CALL` 1
`POP_TOP`
`POP_BLOCK`

-- L1: `PUSH_EXC_INFO`

3 `POP_TOP`

4 `LOAD_CONST` 1 ('fail')
`STORE_NAME` 1 (res)
</code>

The `SETUP_FINALLY` instruction specifies that henceforth, exceptions
are handled by the code at label L1. The `POP_BLOCK` instruction
reverses the effect of the last `SETUP_FINALLY`, so the exception
handler reverts to what it was before.

Note that the `SETUP_FINALLY` and `POP_BLOCK` instructions have no effect
when no exceptions are raised. The idea of zero-cost exception handling
is to replace these instructions by metadata which is stored alongside
the code, and which is inspected only when an exception occurs.
This metadata is the exception table, which is stored in the code
object's `co_exceptiontable` field.

When the pseudo-instructions are translated into bytecode, the
`SETUP_FINALLY` and `POP_BLOCK` instructions are removed, and the
exception table is constructed, mapping each instruction to the
exception handler that covers it, if any. Instructions which
are not covered by any exception handler within the same code
object's bytecode do not appear in the exception table at all.

For the code object in our example above, the table has a single
entry specifying that all instructions between the `SETUP_FINALLY`
and the `POP_BLOCK` are covered by the exception handler located
at label `L1`.
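To look at the table for this example, one can compile the snippet and read the field directly. This is a minimal sketch that assumes CPython 3.11 or later, where `co_exceptiontable` exists. The `getattr` fallback and the file name `<example>` are illustration only, and the exact bytes vary between versions.

```python
import sys

source = '''
try:
    g(0)
except:
    res = "fail"
'''

# Compile without running, so g does not need to exist.
code = compile(source, "<example>", "exec")

# co_exceptiontable is assumed to exist only on CPython 3.11+;
# fall back to empty bytes elsewhere.
table = getattr(code, "co_exceptiontable", b"")

if sys.version_info >= (3, 11):
    print(f"exception table: {len(table)} bytes")
else:
    print("this interpreter predates co_exceptiontable")
```

On 3.11 and later, `dis.dis(code)` also prints a decoded `ExceptionTable:` section after the bytecode.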

At runtime, when an exception occurs, the interpreter looks up
the offset of the current instruction in the exception table. If
it finds a handler, control flow transfers to it. Otherwise, the
exception bubbles up to the caller, and the caller's frame is
checked for a handler covering the `CALL` instruction. This
repeats until a handler is found or the topmost frame is reached
without finding one, in which case the program terminates. During
unwinding, the traceback is constructed.
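The lookup and the bubbling up through callers can be modelled in a few lines. This is a toy sketch, not the interpreter's code: `Entry`, `find_handler`, `unwind` and the offsets are hypothetical, and each frame is reduced to a table plus a current offset.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    start: int   # first covered offset (inclusive)
    end: int     # end of covered range (exclusive)
    target: int  # offset of the handler

def find_handler(table, offset):
    """Return the handler target covering `offset`, or None if uncovered."""
    for entry in table:
        if entry.start <= offset < entry.end:
            return entry.target
    return None

def unwind(frames):
    """frames: list of (table, current_offset), innermost frame first.

    Returns (frame_index, target) for the first frame that has a handler,
    or None if the exception escapes the outermost frame.
    """
    for index, (table, offset) in enumerate(frames):
        target = find_handler(table, offset)
        if target is not None:
            return index, target
    return None

# Hypothetical frames: the callee has no handler; the caller's table
# covers its CALL instruction at offset 11.
callee = ([], 5)
caller = ([Entry(2, 16, 19)], 11)
result = unwind([callee, caller])
print(result)  # (1, 19)
```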

Unwinding
---------

When an exception is raised, the current instruction offset is used to find following:
target to jump to, stack depth, and 'lasti', which determines whether the instruction
offset of the raising instruction should be pushed.

This information is stored in the exception table, described below.
Along with the location of an exception handler, each entry of the
exception table also contains the stack depth of the `try` instruction
and a boolean `lasti` value, which indicates whether the instruction
offset of the raising instruction should be pushed to the stack.

If there is no relevant entry, the exception bubbles up to the caller.
Handling an exception, once an exception table entry is found, consists
of the following steps:

If there is an entry, then:
1. pop values from the stack until it matches the stack depth for the handler.
2. if 'lasti' is true, then push the offset that the exception was raised at.
2. if `lasti` is true, then push the offset that the exception was raised at.
3. push the exception to the stack.
4. jump to the target offset and resume execution.
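The four steps can be sketched against a list-based value stack. The function name `enter_handler` and the sample values are assumptions made for illustration; the real interpreter works on its own frame stack in C.

```python
def enter_handler(stack, exc, raise_offset, depth, lasti, target):
    """Apply the four steps to `stack` in place and return the new offset."""
    # 1. Pop values until the stack matches the handler's depth.
    del stack[depth:]
    # 2. If lasti is true, push the offset of the raising instruction.
    if lasti:
        stack.append(raise_offset)
    # 3. Push the exception itself.
    stack.append(exc)
    # 4. Execution resumes at the handler's offset.
    return target

stack = ["a", "b", "c"]
exc = ValueError("boom")
next_offset = enter_handler(
    stack, exc, raise_offset=22, depth=1, lasti=True, target=48
)
print(stack, next_offset)
```

After the call, the stack holds the one surviving value, the raising offset, and the exception, in that order.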

@@ -97,51 +91,51 @@ Format of the exception table
-----------------------------

Conceptually, the exception table consists of a sequence of 5-tuples:
1. start-offset (inclusive)
2. end-offset (exclusive)
3. target
4. stack-depth
5. push-lasti (boolean)
1. `start-offset` (inclusive)
2. `end-offset` (exclusive)
3. `target`
4. `stack-depth`
5. `push-lasti` (boolean)

All offsets and lengths are in instructions, not bytes.
All offsets and lengths are in code units, not bytes.

We want the format to be compact, but quickly searchable.
For it to be compact, it needs to have variable sized entries so that we can store common (small) offsets compactly, but handle large offsets if needed.
For it to be searchable quickly, we need to support binary search giving us log(n) performance in all cases.
Binary search typically assumes fixed size entries, but that is not necessary, as long as we can identify the start of an entry.

It is worth noting that the size (end-start) is always smaller than the end, so we encode the entries as:
start, size, target, depth, push-lasti
`start, size, target, depth, push-lasti`

Also, sizes are limited to 2**30 as the code length cannot exceed 2**31 and each instruction takes 2 bytes.
Also, sizes are limited to 2**30 as the code length cannot exceed 2**31 and each code unit takes 2 bytes.
It also happens that depth is generally quite small.

So, we need to encode:
start (up to 30 bits)
size (up to 30 bits)
target (up to 30 bits)
depth (up to ~8 bits)
lasti (1 bit)
`start` (up to 30 bits)
`size` (up to 30 bits)
`target` (up to 30 bits)
`depth` (up to ~8 bits)
`lasti` (1 bit)

We need a marker for the start of the entry, so the first byte of entry will have the most significant bit set.
Since the most significant bit is reserved for marking the start of an entry, we have 7 bits per byte to encode offsets.
Encoding uses a standard varint encoding, but with only 7 bits instead of the usual 8.
The 8 bits of a bit are (msb left) SXdddddd where S is the start bit. X is the extend bit meaning that the next byte is required to extend the offset.
The 8 bits of a byte are (msb left) SXdddddd where S is the start bit. X is the extend bit meaning that the next byte is required to extend the offset.
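Because only the first byte of an entry has the S bit set, a search can probe any position in the table and walk back to the start of the enclosing entry. The helper below is a minimal sketch of that idea; its name and the hand-made table bytes are assumptions.

```python
def scan_back_to_entry_start(table, index):
    """Walk backwards from `index` to the nearest byte with the start bit set."""
    while not (table[index] & 0x80):
        index -= 1
    return index

# Two hand-made entries; each begins with a byte whose
# most significant bit (128) is set.
table = bytes([148, 8, 65, 36, 6, 158, 4, 65, 36, 6])

middle = len(table) // 2  # arbitrary probe point, as a binary search would pick
print(scan_back_to_entry_start(table, 3))
print(scan_back_to_entry_start(table, middle))
```

This is what lets binary search work without fixed-size entries: each probe is snapped to an entry boundary before its start offset is compared.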

In addition, we will combine depth and lasti into a single value, ((depth<<1)+lasti), before encoding.
In addition, we combine `depth` and `lasti` into a single value, `((depth<<1)+lasti)`, before encoding.

For example, the exception entry:
start: 20
end: 28
target: 100
depth: 3
lasti: False
`start`: 20
`end`: 28
`target`: 100
`depth`: 3
`lasti`: False

is encoded first by converting to the more compact four value form:
start: 20
size: 8
target: 100
depth<<1+lasti: 6
`start`: 20
`size`: 8
`target`: 100
`depth<<1+lasti`: 6

which is then encoded as:
148 (MSB + 20 for start)
@@ -157,6 +151,7 @@ for a total of five bytes.
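The worked example can be reproduced with a small encoder. This is a sketch under the layout described above (6 data bits per byte, extend bit 64, start bit 128); emitting the most significant group first is an assumption, and `encode_varint` and `encode_entry` are hypothetical names, not CPython functions.

```python
def encode_varint(value, start_bit=False):
    """Encode `value` in 6-bit groups, most significant group first."""
    groups = [value & 63]
    value >>= 6
    while value:
        groups.append(value & 63)
        value >>= 6
    groups.reverse()
    # Every byte except the last carries the extend bit (64).
    out = [g | 64 for g in groups[:-1]] + [groups[-1]]
    if start_bit:
        out[0] |= 128  # mark the first byte of the entry
    return out

def encode_entry(start, end, target, depth, lasti):
    """Encode one entry as start, size, target, (depth << 1) + lasti."""
    size = end - start
    depth_lasti = (depth << 1) + int(lasti)
    return bytes(
        encode_varint(start, start_bit=True)
        + encode_varint(size)
        + encode_varint(target)
        + encode_varint(depth_lasti)
    )

encoded = encode_entry(start=20, end=28, target=100, depth=3, lasti=False)
print(list(encoded))  # [148, 8, 65, 36, 6]
```

Only `target` needs two bytes here, since 100 does not fit in 6 bits: 100 == (1 << 6) + 36.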
Script to parse the exception table
-----------------------------------

<code>
def parse_varint(iterator):
    b = next(iterator)
    val = b & 63
@@ -165,7 +160,9 @@ def parse_varint(iterator):
        b = next(iterator)
        val |= b&63
    return val
</code>

<code>
def parse_exception_table(code):
    iterator = iter(code.co_exceptiontable)
    try:
@@ -180,3 +177,4 @@ def parse_exception_table(code):
            yield start, end, target, depth, lasti
    except StopIteration:
        return
</code>
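Since the diff elides parts of both functions, here is a self-contained decoder sketch that works on raw table bytes. The loop condition in `parse_varint` and the body of `parse_entries` are assumptions inferred from the byte layout, not the elided lines themselves. Values are reported in code units, with `end` exclusive.

```python
def parse_varint(iterator):
    b = next(iterator)
    val = b & 63
    while b & 64:  # extend bit set: another 6-bit group follows
        val <<= 6
        b = next(iterator)
        val |= b & 63
    return val

def parse_entries(table):
    """Yield (start, end, target, depth, lasti) tuples from raw table bytes."""
    iterator = iter(table)
    try:
        while True:
            start = parse_varint(iterator)
            size = parse_varint(iterator)
            target = parse_varint(iterator)
            depth_lasti = parse_varint(iterator)
            depth = depth_lasti >> 1
            lasti = bool(depth_lasti & 1)
            yield start, start + size, target, depth, lasti
    except StopIteration:
        # Running out of bytes ends the table.
        return

entries = list(parse_entries(bytes([148, 8, 65, 36, 6])))
print(entries)  # [(20, 28, 100, 3, False)]
```

Masking with 63 discards both the start bit and the extend bit, so the same `parse_varint` serves for the first field of an entry and for all later ones.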
