Notes_on_state_new_and_slots
This is motivated by the discussion here:
https://github.com/zopefoundation/persistent/pull/44
A persistent object goes through a life-cycle that typically looks like:
- Initial creation. This involves 2 steps:
  - Calling __new__
  - Calling __init__
- An object is saved in the database.
- An object is converted to a ghost and its state is released.
- An object is removed from memory.
- An object is created in memory, by calling __new__ on its class. Data may be passed to __new__ and stored on the object, but the object is a ghost because its state hasn't been loaded yet.
  The data stored by __new__ isn't state. For lack of a better word, we'll call it intrinsic data, because it's even present in ghosts.
- The object is fully loaded by calling its __setstate__ method with its state.
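The life-cycle above can be walked through with a toy class. This is a minimal sketch in plain Python, not the persistent library: the Point class, its ghostify helper, and the record tuple standing in for a database record are all assumptions made for illustration.

```python
class Point:
    """Hypothetical persistent-like class with no intrinsic data."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getstate__(self):
        # Return a copy of the state so it can outlive the object.
        return dict(self.__dict__)

    def __setstate__(self, state):
        self.__dict__.update(state)

    def ghostify(self):
        # Hypothetical helper: release the state, leaving a ghost.
        self.__dict__.clear()


# 1. Initial creation: __new__ followed by __init__.
p = Point(1, 2)

# 2. "Save": keep the class and the state, as a database record might.
record = (Point, p.__getstate__())

# 3. Convert to a ghost; the state is released.
p.ghostify()
ghost_dict = dict(p.__dict__)

# 4. Remove from memory.
del p

# 5. Re-create in memory with __new__ only; __init__ is not called.
cls, state = record
q = cls.__new__(cls)
is_ghost = not q.__dict__

# 6. Load the object fully by passing the state to __setstate__.
q.__setstate__(state)
```

Note that step 5 never runs __init__, which is why anything that must be present on a ghost has to be set up in __new__.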
Object data is of 2 forms:
- Intrinsic data, passed to __new__.
- State, passed to __setstate__ (and returned from __getstate__).
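To make the distinction concrete, here is a small sketch, again in plain Python rather than the persistent library. The Tagged class, its tag and value attributes, and the release_state helper are hypothetical names; tag plays the role of intrinsic data and value the role of state.

```python
class Tagged:
    """Hypothetical class: 'tag' is intrinsic data, everything else is state."""

    # Keep intrinsic data in a slot and ordinary state in the instance dict.
    __slots__ = ("tag", "__dict__")

    def __new__(cls, tag):
        obj = super().__new__(cls)
        obj.tag = tag  # intrinsic data: set in __new__, present even in ghosts
        return obj

    def __getstate__(self):
        # Only the instance dictionary counts as state here.
        return dict(self.__dict__)

    def __setstate__(self, state):
        self.__dict__.update(state)

    def release_state(self):
        # Hypothetical helper: drop the state but keep intrinsic data.
        self.__dict__.clear()


t = Tagged("invoice-7")
t.value = 42
state = t.__getstate__()

t.release_state()
ghost_has_value = hasattr(t, "value")  # state is gone
ghost_tag = t.tag                      # intrinsic data is still held

t.__setstate__(state)
```

The ghost still holds a reference to the tag, which is exactly the memory cost that makes intrinsic data undesirable.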
The vast majority of objects have no intrinsic data. Intrinsic data is undesirable because it's held by ghosts and takes up memory even when it's not needed.
Object data may be stored in one or more of:
- the instance dictionary,
- slots, or
- for classes with C implementations, C structures.
Sometimes, especially for small objects that have many instances, we try to avoid using instance dictionaries, because dictionaries are expensive. In these cases, we might store all of our data in slots. However, this makes object data structures less flexible and should be avoided in most cases.
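The trade-off can be seen with two ordinary Python classes. This is only an illustration of slots versus instance dictionaries in general; DictPoint and SlotPoint are made-up names and nothing here is specific to persistent objects.

```python
class DictPoint:
    """Stores its data in a per-instance dictionary."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlotPoint:
    """Stores its data in slots; instances have no __dict__."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


d = DictPoint(1, 2)
s = SlotPoint(1, 2)

# Only the dictionary-based instance carries a __dict__.
has_dict = (hasattr(d, "__dict__"), hasattr(s, "__dict__"))

# The flexibility cost: new attributes can't be added to a slots-only object.
d.label = "origin"
try:
    s.label = "origin"
except AttributeError:
    slot_rejects_new_attr = True
else:
    slot_rejects_new_attr = False
```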
There are APIs for managing persistent state with default implementations provided by the persistent base class.
- _p_deactivate / _p_invalidate
  Release an object's state, converting it to a ghost.
  The details of how these 2 methods differ, or exactly what they do, aren't important. The main idea is that they release references to the object's state.
  The default implementation simply clears the instance dictionary. It also clears slots unless __new__ has been overridden. See below.
  BTW, it would be really nice to have an API that does nothing but release state and to define _p_deactivate and _p_invalidate to use that. :)
- __new__
  Create an uninitialized (ghost) object.
  This isn't a Persistent-specific API, but it plays an important part in data management.
  Persistent supplies a default implementation that is similar to the one provided by object and that doesn't set any intrinsic data.
- __getstate__
  Get an object's state.
  The default version returns the contents of an object's instance dictionary and slots.
  IOW, the default implementation assumes that the data in slots and the instance dictionary are all state.
- __setstate__
  Set an object's state.
  The default version expects slot and/or instance dictionary data and sets them on the instance.
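The assumption that slots and the instance dictionary are all state can be sketched as a pair of helper functions. This is a simplified illustration, not the library's code: toy_getstate, toy_setstate, and all_slot_names are hypothetical, and merging everything into one dictionary is a simplification that may not match the state format the real defaults use.

```python
def all_slot_names(cls):
    """Collect slot names declared anywhere in the class hierarchy."""
    names = []
    for klass in cls.__mro__:
        slots = klass.__dict__.get("__slots__", ())
        if isinstance(slots, str):
            slots = (slots,)
        # Skip the special slots that only enable __dict__ and weak references.
        names.extend(n for n in slots if n not in ("__dict__", "__weakref__"))
    return names


def toy_getstate(obj):
    """Treat both the instance dictionary and the slots as state."""
    state = dict(getattr(obj, "__dict__", {}))
    for name in all_slot_names(type(obj)):
        if hasattr(obj, name):  # unset slots are simply omitted
            state[name] = getattr(obj, name)
    return state


def toy_setstate(obj, state):
    """Set every saved value back on the instance, slot or not."""
    for name, value in state.items():
        setattr(obj, name, value)


class Mixed:
    # One real slot, plus an instance dictionary for everything else.
    __slots__ = ("a", "__dict__")


m = Mixed()
m.a = 1  # stored in a slot
m.b = 2  # stored in the instance dictionary
state = toy_getstate(m)

clone = Mixed.__new__(Mixed)
toy_setstate(clone, state)
```

Nothing in these helpers can tell a slot holding state from a slot holding intrinsic data, so all of it ends up in the saved state.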
This is partly because we haven't explicitly acknowledged the existence of intrinsic data up to now.
It is a historical accident that objects with intrinsic data have chosen to store this data in slots [1]. In any case, these objects relied on slots not being cleared [2]. It's possible that this behavior motivated the decision to use slots in some cases.
There's no easy way to fix this for existing objects. We may not know
where all of these objects are. One thing we do know, though, is that
all objects with intrinsic data have custom implementations of
__new__. If an object uses the Persistent implementation, we know
that it's using slots solely as a memory optimization and
that we can clear the slots when we ghostify.
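The check described here can be sketched as follows. Base is a toy stand-in for the persistent base class, and overrides_new, slot_names, and release_state are hypothetical helpers; the library's actual implementation may detect the override differently.

```python
class Base:
    """Toy stand-in for the persistent base class (not the real one)."""

    __slots__ = ()

    def __new__(cls, *args, **kwargs):
        return super().__new__(cls)


def overrides_new(cls):
    # A class that inherits __new__ unchanged resolves to Base.__new__ itself.
    return cls.__new__ is not Base.__new__


def slot_names(cls):
    names = []
    for klass in cls.__mro__:
        slots = klass.__dict__.get("__slots__", ())
        if isinstance(slots, str):
            slots = (slots,)
        names.extend(n for n in slots if n not in ("__dict__", "__weakref__"))
    return names


def release_state(obj):
    """Clear the instance dict; clear slots only if __new__ is inherited."""
    instance_dict = getattr(obj, "__dict__", None)
    if instance_dict is not None:
        instance_dict.clear()
    if not overrides_new(type(obj)):
        for name in slot_names(type(obj)):
            try:
                delattr(obj, name)
            except AttributeError:
                pass  # slot was never set


class Compact(Base):
    """Uses slots purely as a memory optimization."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


class Keyed(Base):
    """Custom __new__: the slot may hold intrinsic data, so it is kept."""

    __slots__ = ("key",)

    def __new__(cls, key):
        obj = super().__new__(cls)
        obj.key = key
        return obj


c = Compact(1, 2)
k = Keyed("k1")
release_state(c)
release_state(k)
```

After the two calls, c has lost x and y while k still has its key.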
If an object has a custom __new__ and has state in __slots__,
it can override _p_invalidate and _p_deactivate to release it. (This
is harder than it should be. :( )
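A rough sketch of such overrides is shown below. ToyBase is a stand-in whose _p_deactivate and _p_invalidate do nothing, and Entry, key, payload, and _release_slot_state are hypothetical names. The real methods have further conditions and bookkeeping that this sketch ignores.

```python
class ToyBase:
    """Toy stand-in: its defaults leave slots alone."""

    __slots__ = ()

    def _p_deactivate(self):
        pass

    def _p_invalidate(self):
        pass


class Entry(ToyBase):
    """'key' is intrinsic data; 'payload' is state kept in a slot."""

    __slots__ = ("key", "payload")

    def __new__(cls, key):
        obj = super().__new__(cls)
        obj.key = key
        return obj

    def __setstate__(self, state):
        self.payload = state

    def _release_slot_state(self):
        # Release only the slots that hold state, never the intrinsic ones.
        try:
            del self.payload
        except AttributeError:
            pass  # already a ghost

    def _p_deactivate(self):
        self._release_slot_state()
        super()._p_deactivate()

    def _p_invalidate(self):
        self._release_slot_state()
        super()._p_invalidate()


e = Entry("k1")
e.__setstate__([1, 2, 3])
e._p_deactivate()
payload_released = not hasattr(e, "payload")
e._p_invalidate()  # safe to call on an object that is already a ghost
```

Both methods have to be overridden because either one can be the path by which the object becomes a ghost.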
We might choose in the future to make the use of intrinsic data more explicit. Doing this with deprecations and such would be a lot of work. It's unclear if it would be worth the effort.
For now, we've decided to clear slots when an object is deactivated only
if it doesn't override __new__.
[1] This was generally because these were ported from C and slots behaved similarly to C struct members in many ways.
[2] Some other object implementations initialized data in __new__ and relied on the data being initialized later. This isn't intrinsic data. No data was passed to __new__.