diff --git a/InternalDocs/README.md b/InternalDocs/README.md index 48c893bde2a6310..2ef6e653ac19d47 100644 --- a/InternalDocs/README.md +++ b/InternalDocs/README.md @@ -19,6 +19,8 @@ Compiling Python Source Code - [Compiler Design](compiler.md) +- [Changing Python's Grammar](changing_grammar.md) + Runtime Objects --- diff --git a/InternalDocs/changing_grammar.md b/InternalDocs/changing_grammar.md new file mode 100644 index 000000000000000..1a5eebdc1418dca --- /dev/null +++ b/InternalDocs/changing_grammar.md @@ -0,0 +1,63 @@ +# Changing CPython's grammar + +There's more to changing Python's grammar than editing +[`Grammar/python.gram`](../Grammar/python.gram). +Below is a checklist of things that may need to change. + +> [!NOTE] +> +> Many of these changes require re-generating some of the derived +> files. If things mysteriously don't work, it may help to run +> ``make clean``. + +## Checklist + +* [`Grammar/python.gram`](../Grammar/python.gram): The grammar definition, + with actions that build AST nodes. + After changing it, run ``make regen-pegen`` (or ``build.bat --regen`` on Windows), + to regenerate [`Parser/parser.c`](../Parser/parser.c). + (This runs Python's parser generator, [`Tools/peg_generator`](../Tools/peg_generator)). + +* [`Grammar/Tokens`](../Grammar/Tokens) is a place for adding new token types. After + changing it, run ``make regen-token`` to regenerate + [`Include/internal/pycore_token.h`](../Include/internal/pycore_token.h), + [`Parser/token.c`](../Parser/token.c), [`Lib/token.py`](../Lib/token.py) + and [`Doc/library/token-list.inc`](../Doc/library/token-list.inc). + If you change both ``python.gram`` and ``Tokens``, run ``make regen-token`` + before ``make regen-pegen``. + On Windows, ``build.bat --regen`` will regenerate both at the same time. + +* [`Parser/Python.asdl`](../Parser/Python.asdl) may need changes to match the grammar. + Then run ``make regen-ast`` to regenerate + [`Include/internal/pycore_ast.h`](../Include/internal/pycore_ast.h) and + [`Python/Python-ast.c`](../Python/Python-ast.c). + +* [`Parser/lexer/`](../Parser/lexer/) contains the tokenization code. + This is where you would add a new type of comment or string literal, for example. + +* [`Python/ast.c`](../Python/ast.c) will need changes to validate AST objects + involved with the grammar change. + +* [`Python/ast_unparse.c`](../Python/ast_unparse.c) will need changes to unparse + AST involved with the grammar change ("unparsing" is used to turn annotations + into strings per [PEP 563](https://peps.python.org/pep-0563/). + +* The [`compiler`](compiler.md) may need to change when there are changes + to the `AST`. + +* ``_Unparser`` in the [`Lib/ast.py`](../Lib/ast.py) file may need changes + to accommodate any modifications in the AST nodes. + +* [`Doc/library/ast.rst`](../Doc/library/ast.rst) may need to be updated + to reflect changes to AST nodes. + +* Add some usage of your new syntax to ``test_grammar.py``. + +* Certain changes may require tweaks to the library module + [`pyclbr`](https://docs.python.org/3/library/pyclbr.html#module-pyclbr). + +* [`Lib/tokenize.py`](../Lib/tokenize.py) needs changes to match changes + to the tokenizer. + +* Documentation must be written! Specifically, one or more of the pages in + [`Doc/reference/`](../Doc/reference/) will need to be updated.