From 81a0e13f362d1cc0ac441399c2a5befb5f039e5e Mon Sep 17 00:00:00 2001 From: Irit Katriel Date: Thu, 30 May 2024 19:23:24 +0100 Subject: [PATCH 01/10] rename index->README and move exception handling doc to new folder --- InternalDocs/{index.md => README.md} | 0 .../exception_handling.md | 0 2 files changed, 0 insertions(+), 0 deletions(-) rename InternalDocs/{index.md => README.md} (100%) rename Objects/exception_handling_notes.txt => InternalDocs/exception_handling.md (100%) diff --git a/InternalDocs/index.md b/InternalDocs/README.md similarity index 100% rename from InternalDocs/index.md rename to InternalDocs/README.md diff --git a/Objects/exception_handling_notes.txt b/InternalDocs/exception_handling.md similarity index 100% rename from Objects/exception_handling_notes.txt rename to InternalDocs/exception_handling.md From cd7725fe91b307f5403db9b843df81e2643b1ee3 Mon Sep 17 00:00:00 2001 From: Irit Katriel Date: Thu, 30 May 2024 19:23:54 +0100 Subject: [PATCH 02/10] add link to exception handling file from index --- InternalDocs/README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/InternalDocs/README.md b/InternalDocs/README.md index 32b66a254bcf2c..e69e27d1542990 100644 --- a/InternalDocs/README.md +++ b/InternalDocs/README.md @@ -10,3 +10,7 @@ to hold for other implementations of the Python language. The core dev team attempts to keep this documentation up to date. If it is not, please report that through the [issue tracker](https://github.com/python/cpython/issues). 
+ + +[Exception Handling](exception_handling.md) + From 7719234e2607bf0621f4b898aa7caeab42b2ddf5 Mon Sep 17 00:00:00 2001 From: Irit Katriel Date: Fri, 31 May 2024 00:29:51 +0100 Subject: [PATCH 03/10] edit exception handling --- InternalDocs/exception_handling.md | 204 ++++++++++++++--------------- 1 file changed, 101 insertions(+), 103 deletions(-) diff --git a/InternalDocs/exception_handling.md b/InternalDocs/exception_handling.md index 387ef935ce739e..a6deefa0e567c2 100644 --- a/InternalDocs/exception_handling.md +++ b/InternalDocs/exception_handling.md @@ -1,94 +1,88 @@ -Description of exception handling in Python 3.11 ------------------------------------------------- +Description of exception handling +--------------------------------- -Python 3.11 uses what is known as "zero-cost" exception handling. -Prior to 3.11, exceptions were handled by a runtime stack of "blocks". - -In zero-cost exception handling, the cost of supporting exceptions is minimized. -In the common case (where no exception is raised) the cost is reduced -to zero (or close to zero). +Python uses a technique known as "zero-cost" exception handling, which +minimizes the cost of supporting exceptions. In the common case (where +no exception is raised) the cost is reduced to zero (or close to zero). The cost of raising an exception is increased, but not by much. The following code: -def f(): - try: - g(0) - except: - return "fail" - -compiles as follows in 3.10: - - 2 0 SETUP_FINALLY 7 (to 16) - - 3 2 LOAD_GLOBAL 0 (g) - 4 LOAD_CONST 1 (0) - 6 CALL_NO_KW 1 - 8 POP_TOP - 10 POP_BLOCK - 12 LOAD_CONST 0 (None) - 14 RETURN_VALUE - - 4 >> 16 POP_TOP - 18 POP_TOP - 20 POP_TOP - - 5 22 POP_EXCEPT - 24 LOAD_CONST 3 ('fail') - 26 RETURN_VALUE - -Note the explicit instructions to push and pop from the "block" stack: -SETUP_FINALLY and POP_BLOCK. - -In 3.11, the SETUP_FINALLY and POP_BLOCK are eliminated, replaced with -a table to determine where to jump to when an exception is raised. 
- - 1 0 RESUME 0 - - 2 2 NOP - - 3 4 LOAD_GLOBAL 1 (g + NULL) - 16 LOAD_CONST 1 (0) - 18 PRECALL 1 - 22 CALL 1 - 32 POP_TOP - 34 LOAD_CONST 0 (None) - 36 RETURN_VALUE - >> 38 PUSH_EXC_INFO - - 4 40 POP_TOP - - 5 42 POP_EXCEPT - 44 LOAD_CONST 2 ('fail') - 46 RETURN_VALUE - >> 48 COPY 3 - 50 POP_EXCEPT - 52 RERAISE 1 -ExceptionTable: - 4 to 32 -> 38 [0] - 38 to 40 -> 48 [1] lasti - -(Note this code is from 3.11, later versions may have slightly different bytecode.) - -If an instruction raises an exception then its offset is used to find the target to jump to. -For example, the CALL at offset 22, falls into the range 4 to 32. -So, if g() raises an exception, then control jumps to offset 38. - + +try: + g(0) +except: + res = "fail" + + +compiles into pseudo-code like the following: + +``` + `RESUME` 0 + + 1 `SETUP_FINALLY` 8 (to L1) + + 2 `LOAD_NAME` 0 (g) + `PUSH_NULL` + `LOAD_CONST` 0 (0) + `CALL` 1 + `POP_TOP` + `POP_BLOCK` + + -- L1: `PUSH_EXC_INFO` + + 3 `POP_TOP` + + 4 `LOAD_CONST` 1 ('fail') + `STORE_NAME` 1 (res) +``` + +The `SETUP_FINALLY` instruction specifies that henceforth, exceptions +are handled by the code at label L1. The `POP_BLOCK` instruction +reverses the effect of the last `SETUP_FINALLY`, so the exception +handler reverts to what it was before. + +Note that the `SETUP_FINALLY` and `POP_BLOCK` instructions have no effect +when no exceptions are raised. The idea of zero-cost exception handling +is to replace these instructions by metadata which is stored alongside +the code, and which is inspected only when an exception occurs. +This metadata is the exception table, which is stored in the code +object's `co_exceptiontable` field. + +When the pseudo-instructions are translated into bytecode, the +`SETUP_FINALLY` and `POP_BLOCK` instructions are removed, and the +exception table is constructed, mapping each instruction to the +the exception handler that covers it, if any. 
Instructions which +are not covered by any exception handler within the same code +object's bytecode, do not appear in the exception table at all. + +For the code object in our example above, the table has a single +entry specifying that all instructions between the `SETUP_FINALLY` +and the `POP_BLOCK` are covered by the exception handler located +at label `L1`. + +At runtime, when an exception occurs, the interpreted looks up +the offset of the current instruction in the exception table. If +it finds a handler, control flow transfers to it. Otherwise, the +exception bubbles up to the caller, and the caller's frame is +checked for a handler covering the `CALL` instruction. This +repeats until a handler is found or the topmost frame is reached, +and the program terminates. During unwinding, the traceback +is constructed. Unwinding --------- -When an exception is raised, the current instruction offset is used to find following: -target to jump to, stack depth, and 'lasti', which determines whether the instruction -offset of the raising instruction should be pushed. - -This information is stored in the exception table, described below. +Along with the location of an exception handler, each entry of the +exception table also contains the stack depth of the `try` instruction +and a boolean `lasti` value, which indicates whether the instruction +offset of the raising instruction should be pushed to the stack. -If there is no relevant entry, the exception bubbles up to the caller. +Handling an exception, once an exception table entry is found, consists +of the following steps: -If there is an entry, then: 1. pop values from the stack until it matches the stack depth for the handler. - 2. if 'lasti' is true, then push the offset that the exception was raised at. + 2. if `lasti` is true, then push the offset that the exception was raised at. 3. push the exception to the stack. 4. jump to the target offset and resume execution. 
@@ -97,13 +91,13 @@ Format of the exception table ----------------------------- Conceptually, the exception table consists of a sequence of 5-tuples: - 1. start-offset (inclusive) - 2. end-offset (exclusive) - 3. target - 4. stack-depth - 5. push-lasti (boolean) + 1. `start-offset` (inclusive) + 2. `end-offset` (exclusive) + 3. `target` + 4. `stack-depth` + 5. `push-lasti` (boolean) -All offsets and lengths are in instructions, not bytes. +All offsets and lengths are in code units, not bytes. We want the format to be compact, but quickly searchable. For it to be compact, it needs to have variable sized entries so that we can store common (small) offsets compactly, but handle large offsets if needed. @@ -111,37 +105,37 @@ For it to be searchable quickly, we need to support binary search giving us log( Binary search typically assumes fixed size entries, but that is not necessary, as long as we can identify the start of an entry. It is worth noting that the size (end-start) is always smaller than the end, so we encode the entries as: - start, size, target, depth, push-lasti + `start, size, target, depth, push-lasti` -Also, sizes are limited to 2**30 as the code length cannot exceed 2**31 and each instruction takes 2 bytes. +Also, sizes are limited to 2**30 as the code length cannot exceed 2**31 and each code unit takes 2 bytes. It also happens that depth is generally quite small. So, we need to encode: - start (up to 30 bits) - size (up to 30 bits) - target (up to 30 bits) - depth (up to ~8 bits) - lasti (1 bit) + `start` (up to 30 bits) + `size` (up to 30 bits) + `target` (up to 30 bits) + `depth` (up to ~8 bits) + `lasti` (1 bit) We need a marker for the start of the entry, so the first byte of entry will have the most significant bit set. Since the most significant bit is reserved for marking the start of an entry, we have 7 bits per byte to encode offsets. Encoding uses a standard varint encoding, but with only 7 bits instead of the usual 8. 
-The 8 bits of a bit are (msb left) SXdddddd where S is the start bit. X is the extend bit meaning that the next byte is required to extend the offset. +The 8 bits of a byte are (msb left) SXdddddd where S is the start bit. X is the extend bit meaning that the next byte is required to extend the offset. -In addition, we will combine depth and lasti into a single value, ((depth<<1)+lasti), before encoding. +In addition, we combine `depth` and `lasti` into a single value, `((depth<<1)+lasti)`, before encoding. For example, the exception entry: - start: 20 - end: 28 - target: 100 - depth: 3 - lasti: False + `start`: 20 + `end`: 28 + `target`: 100 + `depth`: 3 + `lasti`: False is encoded first by converting to the more compact four value form: - start: 20 - size: 8 - target: 100 - depth<<1+lasti: 6 + `start`: 20 + `size`: 8 + `target`: 100 + `depth<<1+lasti`: 6 which is then encoded as: 148 (MSB + 20 for start) @@ -157,6 +151,7 @@ for a total of five bytes. Script to parse the exception table ----------------------------------- +``` def parse_varint(iterator): b = next(iterator) val = b & 63 @@ -165,7 +160,9 @@ def parse_varint(iterator): b = next(iterator) val |= b&63 return val +``` +``` def parse_exception_table(code): iterator = iter(code.co_exceptiontable) try: @@ -180,3 +177,4 @@ def parse_exception_table(code): yield start, end, target, depth, lasti except StopIteration: return +``` From 023ad9d90949f6bc18ca85ee1886450c2af0c796 Mon Sep 17 00:00:00 2001 From: Irit Katriel Date: Fri, 31 May 2024 00:49:17 +0100 Subject: [PATCH 04/10] whitespace --- InternalDocs/exception_handling.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/InternalDocs/exception_handling.md b/InternalDocs/exception_handling.md index a6deefa0e567c2..dd7f8a4ca51f70 100644 --- a/InternalDocs/exception_handling.md +++ b/InternalDocs/exception_handling.md @@ -13,6 +13,7 @@ try: g(0) except: res = "fail" + compiles into pseudo-code like the following: @@ -161,7 +162,6 @@ def 
parse_varint(iterator): val |= b&63 return val ``` - ``` def parse_exception_table(code): iterator = iter(code.co_exceptiontable) From fc5d21701ee54571433a73e6ec80295b20bfb8b5 Mon Sep 17 00:00:00 2001 From: Irit Katriel Date: Fri, 31 May 2024 00:51:43 +0100 Subject: [PATCH 05/10] formatting --- InternalDocs/exception_handling.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/InternalDocs/exception_handling.md b/InternalDocs/exception_handling.md index dd7f8a4ca51f70..f47e0d3bfddf3d 100644 --- a/InternalDocs/exception_handling.md +++ b/InternalDocs/exception_handling.md @@ -91,12 +91,14 @@ of the following steps: Format of the exception table ----------------------------- +``` Conceptually, the exception table consists of a sequence of 5-tuples: 1. `start-offset` (inclusive) 2. `end-offset` (exclusive) 3. `target` 4. `stack-depth` 5. `push-lasti` (boolean) +``` All offsets and lengths are in code units, not bytes. From 3436cdf7d7ee0f003fa22cde38620ecfc6e0390b Mon Sep 17 00:00:00 2001 From: Irit Katriel Date: Fri, 31 May 2024 10:36:31 +0100 Subject: [PATCH 06/10] formatting --- InternalDocs/exception_handling.md | 36 +++++++++++++++++------------- 1 file changed, 20 insertions(+), 16 deletions(-) diff --git a/InternalDocs/exception_handling.md b/InternalDocs/exception_handling.md index f47e0d3bfddf3d..67be3f0c7c31d3 100644 --- a/InternalDocs/exception_handling.md +++ b/InternalDocs/exception_handling.md @@ -8,34 +8,34 @@ The cost of raising an exception is increased, but not by much. 
The following code: - +``` try: g(0) except: res = "fail" - +``` compiles into pseudo-code like the following: ``` - `RESUME` 0 + RESUME 0 - 1 `SETUP_FINALLY` 8 (to L1) + 1 SETUP_FINALLY 8 (to L1) - 2 `LOAD_NAME` 0 (g) - `PUSH_NULL` - `LOAD_CONST` 0 (0) - `CALL` 1 - `POP_TOP` - `POP_BLOCK` + 2 LOAD_NAME 0 (g) + PUSH_NULL + LOAD_CONST 0 (0) + CALL 1 + POP_TOP + POP_BLOCK - -- L1: `PUSH_EXC_INFO` + -- L1: PUSH_EXC_INFO - 3 `POP_TOP` + 3 POP_TOP - 4 `LOAD_CONST` 1 ('fail') - `STORE_NAME` 1 (res) + 4 LOAD_CONST 1 ('fail') + STORE_NAME 1 (res) ``` The `SETUP_FINALLY` instruction specifies that henceforth, exceptions @@ -91,8 +91,8 @@ of the following steps: Format of the exception table ----------------------------- -``` Conceptually, the exception table consists of a sequence of 5-tuples: +``` 1. `start-offset` (inclusive) 2. `end-offset` (exclusive) 3. `target` @@ -108,7 +108,7 @@ For it to be searchable quickly, we need to support binary search giving us log( Binary search typically assumes fixed size entries, but that is not necessary, as long as we can identify the start of an entry. It is worth noting that the size (end-start) is always smaller than the end, so we encode the entries as: - `start, size, target, depth, push-lasti` + `start, size, target, depth, push-lasti`. Also, sizes are limited to 2**30 as the code length cannot exceed 2**31 and each code unit takes 2 bytes. It also happens that depth is generally quite small. @@ -128,17 +128,21 @@ The 8 bits of a byte are (msb left) SXdddddd where S is the start bit. X is the In addition, we combine `depth` and `lasti` into a single value, `((depth<<1)+lasti)`, before encoding. 
For example, the exception entry: +``` `start`: 20 `end`: 28 `target`: 100 `depth`: 3 `lasti`: False +``` is encoded first by converting to the more compact four value form: +``` `start`: 20 `size`: 8 `target`: 100 `depth<<1+lasti`: 6 +``` which is then encoded as: 148 (MSB + 20 for start) From 49af1931edee11a78015bc88c3777c522abc9cc1 Mon Sep 17 00:00:00 2001 From: Irit Katriel Date: Fri, 31 May 2024 11:57:28 +0100 Subject: [PATCH 07/10] add explanation of reraise and lasti. fix typos --- InternalDocs/exception_handling.md | 36 +++++++++++++++++++----------- 1 file changed, 23 insertions(+), 13 deletions(-) diff --git a/InternalDocs/exception_handling.md b/InternalDocs/exception_handling.md index 67be3f0c7c31d3..31cea50912756f 100644 --- a/InternalDocs/exception_handling.md +++ b/InternalDocs/exception_handling.md @@ -40,20 +40,20 @@ compiles into pseudo-code like the following: The `SETUP_FINALLY` instruction specifies that henceforth, exceptions are handled by the code at label L1. The `POP_BLOCK` instruction -reverses the effect of the last `SETUP_FINALLY`, so the exception -handler reverts to what it was before. +reverses the effect of the last `SETUP` instruction, so that the +active exception handler reverts to what it was before. Note that the `SETUP_FINALLY` and `POP_BLOCK` instructions have no effect when no exceptions are raised. The idea of zero-cost exception handling is to replace these instructions by metadata which is stored alongside -the code, and which is inspected only when an exception occurs. -This metadata is the exception table, which is stored in the code +the bytecode, and which is inspected only when an exception occurs. +This metadata is the exception table, and it is stored in the code object's `co_exceptiontable` field. 
When the pseudo-instructions are translated into bytecode, the `SETUP_FINALLY` and `POP_BLOCK` instructions are removed, and the exception table is constructed, mapping each instruction to the -the exception handler that covers it, if any. Instructions which +exception handler that covers it, if any. Instructions which are not covered by any exception handler within the same code object's bytecode, do not appear in the exception table at all. @@ -62,17 +62,17 @@ entry specifying that all instructions between the `SETUP_FINALLY` and the `POP_BLOCK` are covered by the exception handler located at label `L1`. -At runtime, when an exception occurs, the interpreted looks up +Handling Exceptions +------------------- + +At runtime, when an exception occurs, the interpreter looks up the offset of the current instruction in the exception table. If it finds a handler, control flow transfers to it. Otherwise, the exception bubbles up to the caller, and the caller's frame is checked for a handler covering the `CALL` instruction. This -repeats until a handler is found or the topmost frame is reached, -and the program terminates. During unwinding, the traceback -is constructed. - -Unwinding ---------- +repeats until a handler is found or the topmost frame is reached. +If no handler is found, the program terminates. During unwinding, +the traceback is constructed as each frame is added to it. Along with the location of an exception handler, each entry of the exception table also contains the stack depth of the `try` instruction @@ -88,6 +88,16 @@ of the following steps: 4. jump to the target offset and resume execution. +Reraising Exceptions and `lasti` +-------------------------------- + +The purpose of pushing `lasti` to the stack is for cases where an exception +needs to be re-raised, and be associated with the original instruction that +raised it. This happens, for example, at the end of a `finally` block, when +any in-flight exception needs to be propagated on. 
As the frame's instruction +pointer now points into the finally block, a `RERAISE` instruction +(with `oparg > 0`) sets it to the `lasti` value from the stack. + Format of the exception table ----------------------------- @@ -136,7 +146,7 @@ For example, the exception entry: `lasti`: False ``` -is encoded first by converting to the more compact four value form: +is encoded by first converting to the more compact four value form: ``` `start`: 20 `size`: 8 From d299a3ef2a251af90bb5615fc79d5f2005689856 Mon Sep 17 00:00:00 2001 From: Irit Katriel Date: Fri, 31 May 2024 17:41:59 +0100 Subject: [PATCH 08/10] address review - clarify pseudo-instructions/intermediate code --- InternalDocs/exception_handling.md | 32 ++++++++++++++++-------------- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/InternalDocs/exception_handling.md b/InternalDocs/exception_handling.md index 31cea50912756f..60ee5e4ef5a7d0 100644 --- a/InternalDocs/exception_handling.md +++ b/InternalDocs/exception_handling.md @@ -16,7 +16,7 @@ except: ``` -compiles into pseudo-code like the following: +compiles into intermediate code like the following: ``` RESUME 0 @@ -38,29 +38,31 @@ compiles into pseudo-code like the following: STORE_NAME 1 (res) ``` -The `SETUP_FINALLY` instruction specifies that henceforth, exceptions +`SETUP_FINALLY` and `POP_BLOCK` are pseudo-instruction. This means +that they can appear in intermediate code but they are not bytecode +instructions. `SETUP_FINALLY` specifies that henceforth, exceptions are handled by the code at label L1. The `POP_BLOCK` instruction reverses the effect of the last `SETUP` instruction, so that the active exception handler reverts to what it was before. -Note that the `SETUP_FINALLY` and `POP_BLOCK` instructions have no effect -when no exceptions are raised. The idea of zero-cost exception handling -is to replace these instructions by metadata which is stored alongside -the bytecode, and which is inspected only when an exception occurs. 
+`SETUP_FINALLY` and `POP_BLOCK` have no effect when no exceptions +are raised. The idea of zero-cost exception handling is to replace +these pseudo-instructions by metadata which is stored alongside the +bytecode, and which is inspected only when an exception occurs. This metadata is the exception table, and it is stored in the code object's `co_exceptiontable` field. -When the pseudo-instructions are translated into bytecode, the -`SETUP_FINALLY` and `POP_BLOCK` instructions are removed, and the -exception table is constructed, mapping each instruction to the -exception handler that covers it, if any. Instructions which -are not covered by any exception handler within the same code -object's bytecode, do not appear in the exception table at all. +When the pseudo-instructions are translated into bytecode, +`SETUP_FINALLY` and `POP_BLOCK` are removed, and the exception +table is constructed, mapping each instruction to the exception +handler that covers it, if any. Instructions which are not +covered by any exception handler within the same code object's +bytecode, do not appear in the exception table at all. For the code object in our example above, the table has a single -entry specifying that all instructions between the `SETUP_FINALLY` -and the `POP_BLOCK` are covered by the exception handler located -at label `L1`. +entry specifying that all instructions that were between the +`SETUP_FINALLY` and the `POP_BLOCK` are covered by the exception +handler located at label `L1`. 
Handling Exceptions ------------------- From e93a253d93e99c50e9e3e14cd64b8206b1b1679f Mon Sep 17 00:00:00 2001 From: Irit Katriel Date: Fri, 31 May 2024 17:44:57 +0100 Subject: [PATCH 09/10] fix markup --- InternalDocs/exception_handling.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/InternalDocs/exception_handling.md b/InternalDocs/exception_handling.md index 60ee5e4ef5a7d0..fdfc1cf7d53155 100644 --- a/InternalDocs/exception_handling.md +++ b/InternalDocs/exception_handling.md @@ -126,11 +126,13 @@ Also, sizes are limited to 2**30 as the code length cannot exceed 2**31 and each It also happens that depth is generally quite small. So, we need to encode: +``` `start` (up to 30 bits) `size` (up to 30 bits) `target` (up to 30 bits) `depth` (up to ~8 bits) `lasti` (1 bit) +``` We need a marker for the start of the entry, so the first byte of entry will have the most significant bit set. Since the most significant bit is reserved for marking the start of an entry, we have 7 bits per byte to encode offsets. @@ -157,16 +159,17 @@ is encoded by first converting to the more compact four value form: ``` which is then encoded as: +``` 148 (MSB + 20 for start) 8 (size) 65 (Extend bit + 1) 36 (Remainder of target, 100 == (1<<6)+36) 6 +``` for a total of five bytes. 
- Script to parse the exception table ----------------------------------- From a3da5172d39ff0d25ed2e840cbdd5b0c1284f119 Mon Sep 17 00:00:00 2001 From: Irit Katriel <1055913+iritkatriel@users.noreply.github.com> Date: Mon, 3 Jun 2024 10:11:06 +0100 Subject: [PATCH 10/10] typo Co-authored-by: Mark Shannon --- InternalDocs/exception_handling.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/InternalDocs/exception_handling.md b/InternalDocs/exception_handling.md index fdfc1cf7d53155..22d9c3bf7933f1 100644 --- a/InternalDocs/exception_handling.md +++ b/InternalDocs/exception_handling.md @@ -38,7 +38,7 @@ compiles into intermediate code like the following: STORE_NAME 1 (res) ``` -`SETUP_FINALLY` and `POP_BLOCK` are pseudo-instruction. This means +`SETUP_FINALLY` and `POP_BLOCK` are pseudo-instructions. This means that they can appear in intermediate code but they are not bytecode instructions. `SETUP_FINALLY` specifies that henceforth, exceptions are handled by the code at label L1. The `POP_BLOCK` instruction
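
Note on the encoding example touched by this series: the worked example (start 20, end 28, target 100, depth 3, lasti False, encoded as the five bytes 148, 8, 65, 36, 6) can be checked with a short round-trip sketch. This is only an illustration under stated assumptions. `encode_varint`, `encode_entry` and `decode_entry` are hypothetical helper names introduced here and are not CPython functions; `parse_varint` follows the shape of the script in the patched document. The sketch assumes the `SXdddddd` byte layout described in the document, with six data bits per byte and the most significant group first, and it leaves offsets in whatever units the table stores rather than converting them.

```python
# Illustrative sketch only. encode_varint, encode_entry and decode_entry are
# hypothetical helpers for this note; they are not part of CPython.

def encode_varint(value, start_of_entry=False):
    """Encode a non-negative int as 6-bit groups, most significant first."""
    groups = [value & 63]
    value >>= 6
    while value:
        groups.append(value & 63)
        value >>= 6
    groups.reverse()
    out = []
    for i, group in enumerate(groups):
        byte = group
        if i < len(groups) - 1:
            byte |= 64  # X bit: another byte follows for this value
        out.append(byte)
    if start_of_entry:
        out[0] |= 128  # S bit: marks the first byte of a table entry
    return out


def encode_entry(start, end, target, depth, lasti):
    """Encode one entry in the compact (start, size, target, depth/lasti) form."""
    size = end - start
    return bytes(
        encode_varint(start, start_of_entry=True)
        + encode_varint(size)
        + encode_varint(target)
        + encode_varint((depth << 1) + int(lasti))
    )


def parse_varint(iterator):
    """Decode one value; same shape as parse_varint in the patched document."""
    b = next(iterator)
    val = b & 63
    while b & 64:
        val <<= 6
        b = next(iterator)
        val |= b & 63
    return val


def decode_entry(data):
    """Decode a single entry back to (start, end, target, depth, lasti)."""
    it = iter(data)
    start = parse_varint(it)
    size = parse_varint(it)
    target = parse_varint(it)
    dl = parse_varint(it)
    return start, start + size, target, dl >> 1, bool(dl & 1)


encoded = encode_entry(start=20, end=28, target=100, depth=3, lasti=False)
print(list(encoded))          # [148, 8, 65, 36, 6]
print(decode_entry(encoded))  # (20, 28, 100, 3, False)
```

The decoder masks with 63, so the start bit is ignored when reading a value. The start bit matters only when searching the table for the beginning of an entry, which this sketch does not attempt.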