Update GC docs for incremental collection. #1379
Thanks for taking the time to do this. A couple of notes:
(Also, @savannahostrowski: this PR diff is a good summary of the differences between the GC implementation you've been working with and the current devguide article, which is outdated.)
internals/garbage-collector.rst
Outdated
To collect all unreachable cycles in the heap, the garbage collector must scan the
whole heap. This whole heap scan is called a cycle.
Maybe consider a different term here (and throughout below) since "cycle" already has an important meaning here and it can be confusing if we overload it:
Suggested change:
- whole heap. This whole heap scan is called a cycle.
+ whole heap. This whole heap scan is called a full collection.
Maybe instead of cycle use "full scavenge".
A collection is an increment, so that's not a good term.
Cycle is overloaded, so that's not great either.
"Scavenge" is the least ambiguous, but obscure.
Anyone have any other suggestions? I'll go with "scavenge" if not.
internals/garbage-collector.rst
Outdated
* All any objects reachable from those objects that have not yet been scanned this cycle.
Any objects surviving this collection are moved to the old generation.
ollection from cycles.
ollection from cycles. |
internals/garbage-collector.rst
Outdated
* The oldest fraction of the old generation
* All any objects reachable from those objects that have not yet been scanned this cycle.
Any objects surviving this collection are moved to the old generation.
Maybe clarify that the objects that started in the old generation are considered "youngest of the old" instead of "oldest of the old" now (there's probably a better way of phrasing it).
It isn't really the oldest, it is the "least recently scanned". I'll rework this section.
internals/garbage-collector.rst
Outdated
implementation for the default build uses incremental collection with two
generations.
The purpose of generations is to take advantage of what is known as the weak
Suggested change:
- The purpose of generations is to take advantage of what is known as the weak
+ Generational garbage collection takes advantage of what is known as the weak
internals/garbage-collector.rst
Outdated
two generations: young and old. Every new object starts in the young generation.
To collect all unreachable cycles in the heap, the garbage collector must scan the
Perhaps add detection as distinct from collection.
Suggested change:
- To collect all unreachable cycles in the heap, the garbage collector must scan the
+ To detect and collect all unreachable cycles in the heap, the garbage collector must scan the
internals/garbage-collector.rst
Outdated
To collect all unreachable cycles in the heap, the garbage collector must scan the
whole heap. This whole heap scan is called a cycle.
In order to limit the time each garbage collection takes, the previous algorithm
Suggested change:
- In order to limit the time each garbage collection takes, the previous algorithm
+ To limit the time each garbage collection takes, the detection and collection algorithm
internals/garbage-collector.rst
Outdated
* The young generation
* The oldest fraction of the old generation
* All any objects reachable from those objects that have not yet been scanned this cycle.
Suggested change:
- * All any objects reachable from those objects that have not yet been scanned this cycle.
+ * All objects reachable from those objects that have not yet been scanned this cycle.
internals/garbage-collector.rst
Outdated
When all objects in the heap have been scanned a cycle ends, and all objects are
considered unscanned again.
In order to collect all unreachable cycles, each increment must contain all of
To check my understanding of this section on unreachable cycles: each increment has to capture an unreachable cycle in full so that the cycle is either collected entirely or not at all, because partial processing could be problematic later on?
If so, maybe it's worth mentioning explicitly.
If we don't scan the full cycle at once, we cannot collect it. It is otherwise safe.
The old generational collector would scan partial cycles all the time; that just delayed the collection of the cycle, at worst until a full collection.
With the incremental collector, if we only scan part of a cycle, it may never be collected, which would be a problem.
Thanks for the comments and suggestions.
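For readers less familiar with why cycles need the collector at all, here is a minimal sketch using only the standard `gc` and `weakref` modules. The `Node` class and `make_unreachable_cycle` helper are made up for this illustration, and the example triggers a full `gc.collect()`; it does not exercise the incremental collector's increments.

```python
import gc
import weakref


class Node:
    """Plain container object; instances are tracked by the cyclic GC."""

    def __init__(self, name):
        self.name = name
        self.partner = None


def make_unreachable_cycle():
    """Build a two-object reference cycle and drop every external reference.

    Returns a weak reference so the caller can observe when the cycle is freed.
    """
    a = Node("a")
    b = Node("b")
    a.partner = b
    b.partner = a
    return weakref.ref(a)


gc.disable()   # keep automatic collections from interfering with the demo
gc.collect()   # start from a clean state
probe = make_unreachable_cycle()

# Reference counting alone cannot free the pair: each keeps the other alive.
alive_before = probe() is not None

# A full collection scans both members of the cycle together and frees them.
gc.collect()
alive_after = probe() is not None

gc.enable()
print(alive_before, alive_after)
```

The pair is only reclaimed once both objects are examined in the same collection, which is the property the increments have to preserve.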
internals/garbage-collector.rst
Outdated
Survivors are moved to the back of the scanned list. The old part of increment is taken
from the front of the unscanned list.
When a cycle starts, no objects in the heap are considered to have been scanned.
Using "cycle" for both a "reference cycle" and a "gc execution" is a bit confusing; can we use other terminology?
I've used "full scavenge"
internals/garbage-collector.rst
Outdated
In order to make sure that the whole of any unreachable cycle is contained in an
increment, all unscanned objects reachable from any object in the increment must
be included in the increment.
Thus, to form a complete increment we perform a transitive closure over reachable,
As much as I enjoy proper terminology myself, many people reading this document may not immediately know what a transitive closure is. Could you add a brief explanation in parentheses or just explain the process?
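One way to explain the process is with a toy model. The sketch below is an assumption-laden illustration, not CPython's implementation: the heap is a plain dict of string ids, and `complete_increment`, `references`, and `scanned` are hypothetical names. It repeatedly follows references from the initial increment and adds every unscanned object it finds until nothing new turns up, which is what "transitive closure" means here.

```python
from collections import deque


def complete_increment(initial, references, scanned):
    """Return `initial` plus every unscanned object reachable from it.

    initial    -- iterable of object ids forming the starting increment
    references -- dict mapping an object id to the ids it refers to
    scanned    -- set of ids already scanned during this full scavenge
    """
    increment = set(initial)
    worklist = deque(increment)
    while worklist:
        obj = worklist.popleft()
        for target in references.get(obj, ()):
            # Scanned objects were reachable when scanned, so they are
            # excluded and not traversed further.
            if target not in scanned and target not in increment:
                increment.add(target)
                worklist.append(target)
    return increment


# Toy heap: "y" is young, "o1".."o3" are old and unscanned, "s" is scanned.
heap = {"y": ["o1"], "o1": ["o2", "s"], "o2": ["o1"], "o3": [], "s": ["o3"]}
result = complete_increment(["y"], heap, scanned={"s"})
print(sorted(result))  # ['o1', 'o2', 'y']
```

Note that `o3` is left out because it is only reachable through the already scanned object `s`.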
internals/garbage-collector.rst
Outdated
* The young generation
* The least recently scanned fraction of the old generation.
* All objects reachable from those objects that have not yet been scanned this cycle.
Same as my other comments about "cycle"
internals/garbage-collector.rst
Outdated
unscanned objects from the initial increment.
We can exclude scanned objects, as they must have been reachable when scanned.
If a scanned object becomes part of an unreachable cycle after being scanned, it
will not be collected this cycle, but it will be collected next cycle.
I think this would benefit from a simple, concrete example showing how this can happen and how the object will eventually be cleaned up, either in English or with a diagram or pseudocode.
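A possible shape for such an example, written as a toy simulation rather than real collector code. Everything here is assumed for illustration: `unreachable_in`, the `roots` set standing in for references from outside the heap, and the two-object heap. It walks through the case where objects `a` and `b` are scanned while still reachable, become garbage afterwards, are skipped for the rest of that full scavenge, and are collected in the next one.

```python
def unreachable_in(increment, references, roots):
    """Return the objects in `increment` that nothing outside it keeps alive."""
    increment = set(increment)
    # Start with objects held by a root or by an object outside the increment.
    live = {obj for obj in increment if obj in roots}
    for source, targets in references.items():
        if source not in increment:
            live.update(t for t in targets if t in increment)
    # Propagate liveness along references inside the increment.
    stack = list(live)
    while stack:
        for target in references.get(stack.pop(), ()):
            if target in increment and target not in live:
                live.add(target)
                stack.append(target)
    return increment - live


references = {"a": ["b"], "b": ["a"]}   # a and b refer to each other
roots = {"a"}                            # the program still holds "a"
scanned = set()

# First scavenge, first increment: both objects are reachable and survive.
increment = {"a", "b"} - scanned
garbage1 = unreachable_in(increment, references, roots)
scanned |= increment - garbage1

# The program drops its reference: a <-> b is now an unreachable cycle.
roots.discard("a")

# Same scavenge, later increment: scanned objects are excluded, so nothing
# is found even though the cycle is garbage.
increment = {"a", "b"} - scanned
garbage2 = unreachable_in(increment, references, roots)

# The scavenge ends and every object is considered unscanned again.
scanned.clear()
increment = {"a", "b"} - scanned
garbage3 = unreachable_in(increment, references, roots)

print(garbage1, garbage2, sorted(garbage3))
```

In this model the cycle is missed once and then found, matching the "not collected this cycle, but collected next cycle" wording in the diff.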
internals/garbage-collector.rst
Outdated
generations. Every collection operates on the entire heap.
two generations: young and old. Every new object starts in the young generation.
To detect and collect all unreachable cycles in the heap, the garbage collector
Perhaps use objects instead of cycles here.
Suggested change:
- To detect and collect all unreachable cycles in the heap, the garbage collector
+ To detect and collect all unreachable objects in the heap, the garbage collector
internals/garbage-collector.rst
Outdated
of three parts:
* The young generation
* The least recently scanned fraction of the old generation.
Suggested change:
- * The least recently scanned fraction of the old generation.
+ * The old generation's least recently scanned heap found in the scanned objects list
internals/garbage-collector.rst
Outdated
* The young generation
* The least recently scanned fraction of the old generation.
* All objects reachable from those objects that have not yet been scanned this cycle.
Suggested change:
- * All objects reachable from those objects that have not yet been scanned this cycle.
+ * All objects reachable from those objects that have not yet been scanned.
internals/garbage-collector.rst
Outdated
* The least recently scanned fraction of the old generation.
* All objects reachable from those objects that have not yet been scanned this cycle.
Any objects surviving this collection are moved to the old generation.
Suggested change:
- Any objects surviving this collection are moved to the old generation.
+ Any young generation objects surviving this collection are moved to the old generation, and reachable objects in the old generation remain in the old generation.
internals/garbage-collector.rst
Outdated
* All objects reachable from those objects that have not yet been scanned this cycle.
Any objects surviving this collection are moved to the old generation.
The old generation is composed of two lists, scanned and unscanned.
Suggested change:
- The old generation is composed of two lists, scanned and unscanned.
+ The old generation is composed of two lists: scanned increments and unscanned.
Please note the inconsistency in argument or variable names: This devguide speaks of
internals/garbage-collector.rst
Outdated
If a scanned object becomes part of an unreachable cycle after being scanned, it
will not be collected this cycle, but it will be collected next cycle.
The GC implementation for the free-threaded build does not use incremental collection.
This probably should be called out in an Important block.
internals/garbage-collector.rst
Outdated
will not be collected this cycle, but it will be collected next cycle.
The GC implementation for the free-threaded build does not use incremental collection.
Every collection operates on the entire heap.
In order to decide when to run, the collector keeps track of the number of object
allocations and deallocations since the last collection. When the number of
allocations minus the number of deallocations exceeds ``threshold_0``,
Suggested change:
- allocations minus the number of deallocations exceeds ``threshold_0``,
+ allocations minus the number of deallocations exceeds ``threshold0``,
internals/garbage-collector.rst
Outdated
collection starts. ``threshold_1`` determines the fraction of the old
collection that is included in the increment.
The fraction is inversely proportional to ``threshold_1``,
as historically a larger ``threshold_1`` meant that old generation
collections were performed less frequency.
``threshold2`` is ignored.
Correct as mentioned in @mr-bronson's message. Good catch @mr-bronson. 😄
Suggested change:
- collection starts. ``threshold_1`` determines the fraction of the old
- collection that is included in the increment.
- The fraction is inversely proportional to ``threshold_1``,
- as historically a larger ``threshold_1`` meant that old generation
- collections were performed less frequency.
- ``threshold2`` is ignored.
+ collection starts. ``threshold1`` determines the fraction of the old
+ collection that is included in the increment.
+ The fraction is inversely proportional to ``threshold1``,
+ as historically a larger ``threshold1`` meant that old generation
+ collections were performed less frequently.
+ ``threshold2`` is ignored.
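For context on the names in this suggestion, a small sketch of how the three thresholds can be inspected and adjusted from Python with `gc.get_threshold()` and `gc.set_threshold()`. Doubling `threshold0` is an arbitrary choice for illustration; the code only shows the API shape and does not demonstrate how the collector interprets `threshold1` or `threshold2`.

```python
import gc

# get_threshold() returns a 3-tuple: (threshold0, threshold1, threshold2).
original = gc.get_threshold()
threshold0, threshold1, threshold2 = original

# Raise threshold0 so that a collection starts after more net allocations.
gc.set_threshold(threshold0 * 2, threshold1, threshold2)
updated = gc.get_threshold()

# Restore the previous settings so the rest of the program is unaffected.
gc.set_threshold(*original)
restored = gc.get_threshold()

print(original, updated, restored)
```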
Changes look good to me! Thanks @markshannon.
Let's hold until we figure out https://discuss.python.org/t/incremental-gc-and-pushing-back-the-3-13-0-release/65285
Garbage collection doc has been moved to the CPython repo. Unless there is an objection, I recommend closing this devguide issue.
@@ -399,6 +439,10 @@ more information. These thresholds can be examined using the
The content of these generations can be examined using the
``gc.get_objects(generation=NUM)`` function and collections can be triggered
specifically in a generation by calling ``gc.collect(generation=NUM)``.
Prior to 3.13, there we three generations. For that reason the
Suggested change:
- Prior to 3.13, there we three generations. For that reason the
+ Prior to 3.14, there were three generations. For that reason the
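As a quick illustration of the two calls mentioned in this hunk, the sketch below uses `gc.get_objects(generation=...)` and `gc.collect(generation=...)` from the standard `gc` module. How generation numbers map onto the young and old generations depends on the Python version and build, so the code makes no claim about which objects land where.

```python
import gc

# Run a full collection first so the example starts from a settled state.
gc.collect()

# Objects currently tracked in generation 0 (the youngest generation).
young = gc.get_objects(generation=0)

# Without the argument, objects from all generations are returned.
everything = gc.get_objects()

# Collect only generation 0; the return value is the number of
# unreachable objects found.
found = gc.collect(generation=0)

print(len(young), len(everything), found)
```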
📚 Documentation preview 📚: https://cpython-devguide--1379.org.readthedocs.build/