Notes_on_state_new_and_slots

Jim Fulton edited this page Nov 29, 2016 · 2 revisions

Notes on persistent-object state, __new__ and slots (and C :))

This is motivated by the discussion here:

https://github.com/zopefoundation/persistent/pull/44

A persistent object goes through a life-cycle that typically looks like:

  • Initial creation. This involves 2 steps:

    • Calling __new__
    • Calling __init__
  • An object is saved in the database.

  • An object is converted to a ghost and its state is released.

  • An object is removed from memory.

  • An object is created in memory by calling __new__ on its class. Data may be passed to __new__ and stored on the object, but the object is a ghost because its state hasn't been loaded yet.

    The data stored by __new__ isn't state. For lack of a better word, we'll call it intrinsic data, because it's even present in ghosts.

  • The object is fully loaded by calling its __setstate__ method with its state.
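The life cycle above can be sketched in plain Python. This is a toy model, not the persistent package: the class ToyRecord, its ghostify method and the attribute names are hypothetical, and the "database" is just a pair of local variables. It is only meant to show which data travels through __new__ (intrinsic data) and which through __getstate__/__setstate__ (state).

```python
class ToyRecord:
    """Toy stand-in for a persistent class (hypothetical, not the persistent API)."""

    def __new__(cls, key=None, value=None):
        # Intrinsic data: stored by __new__, so it is present even in a ghost.
        # `value` is accepted only because __new__ sees the constructor arguments.
        self = super().__new__(cls)
        self.key = key
        return self

    def __init__(self, key=None, value=None):
        # State: set on initial creation only; a reloaded object skips __init__.
        self.value = value

    def __getstate__(self):
        # In this toy, everything except the intrinsic data counts as state.
        return {k: v for k, v in self.__dict__.items() if k != "key"}

    def __setstate__(self, state):
        self.__dict__.update(state)

    def ghostify(self):
        # Release the state but keep the intrinsic data.
        key = self.key
        self.__dict__.clear()
        self.key = key


# 1. Initial creation: __new__ followed by __init__.
obj = ToyRecord("a", value=42)

# 2. "Save": a database would record the state and the arguments for __new__.
saved_key, saved_state = obj.key, obj.__getstate__()

# 3. Convert to a ghost: state is released, intrinsic data survives.
obj.ghostify()
ghost_has_value = hasattr(obj, "value")

# 4. Remove from memory.
del obj

# 5. Re-create as a ghost by calling __new__ only (no __init__).
ghost = ToyRecord.__new__(ToyRecord, saved_key)

# 6. Fully load by passing the saved state to __setstate__.
ghost.__setstate__(saved_state)
```

Note that the reloaded object never runs __init__; anything it needs beyond what __new__ stores has to come back through __setstate__.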

Object data is of 2 forms:

  • Intrinsic data, passed to __new__.
  • State, passed to __setstate__ (and returned from __getstate__).

The vast majority of objects have no intrinsic data. Intrinsic data is undesirable because it's held by ghosts and takes up memory even when it's not needed.

Object data may be stored in one or more of:

  • the instance dictionary,
  • slots, or
  • for classes with C implementations, C structures.

Sometimes, especially for small objects that have many instances, we try to avoid using instance dictionaries, because dictionaries are expensive. In these cases, we might store all of our data in slots. However, this makes object data structures less flexible and should be avoided in most cases.
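A minimal illustration of that trade-off in plain Python (the class names are made up for the example): a slotted class has no per-instance dictionary, but it also cannot take on attributes that were not declared up front.

```python
class WithDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class WithSlots:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


d = WithDict(1, 2)
s = WithSlots(1, 2)

# A slotted instance has no per-instance dictionary to pay for ...
has_dict = (hasattr(d, "__dict__"), hasattr(s, "__dict__"))

# ... but it cannot grow new attributes, which is the lost flexibility.
d.z = 3
try:
    s.z = 3
    slots_accept_new_attribute = True
except AttributeError:
    slots_accept_new_attribute = False
```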

There are APIs for managing persistent state with default implementations provided by the persistent base class.

_p_deactivate/_p_invalidate

Release an object's state converting it to a ghost.

The details of how these 2 methods differ, or exactly what they do, aren't important here. The main idea is that they release references to the object's state.

The default implementation simply clears the instance dictionary. It also clears slots unless __new__ has been overridden. See below.
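The rule in the previous paragraph can be modelled roughly as follows. This is a sketch under stated assumptions, not the code in the persistent package: ToyPersistent and _release_state are hypothetical names, __slots__ is assumed to be a tuple of names, and "overrides __new__" is approximated by comparing against the toy base class's __new__.

```python
class ToyPersistent:
    """Toy base class (hypothetical); it only models the slot-clearing rule."""

    __slots__ = ()

    def _release_state(self):
        # Always drop the instance dictionary, if there is one.
        idict = getattr(self, "__dict__", None)
        if idict is not None:
            idict.clear()
        # A custom __new__ suggests slots may hold intrinsic data: keep them.
        if type(self).__new__ is not ToyPersistent.__new__:
            return
        # Otherwise slots are only a memory optimization and can be cleared.
        for klass in type(self).__mro__:
            for name in klass.__dict__.get("__slots__", ()):
                try:
                    delattr(self, name)
                except AttributeError:
                    pass  # slot was never set


class Point(ToyPersistent):
    # Slots used purely as a memory optimization: default __new__.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x, self.y = x, y


class Keyed(ToyPersistent):
    # Custom __new__ that stores intrinsic data in a slot.
    __slots__ = ("key",)

    def __new__(cls, key):
        self = super().__new__(cls)
        self.key = key
        return self


p = Point(1, 2)
p._release_state()   # slots are cleared

k = Keyed("a")
k._release_state()   # slots are left alone
```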

BTW, it would be really nice to have an API that does nothing but release state and to define _p_deactivate and _p_invalidate to use that. :)

__new__

Create an uninitialized (ghost) object.

This isn't a Persistent-specific API, but it plays an important part in data management. Persistent supplies a default implementation that is similar to the one provided by object and that doesn't set any intrinsic data.

__getstate__

Get an object's state.

The default version returns the contents of an object's instance dictionary and slots.

IOW, the default implementation assumes that the data in slots and the instance dictionary are all state.
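As a rough sketch of that assumption, the helper below gathers everything found in the instance dictionary and in slots and calls it state. The function toy_getstate and the (dict_data, slot_data) return shape are choices made for this example; the exact format returned by the real default __getstate__ may differ.

```python
def toy_getstate(obj):
    """Collect instance-dictionary and slot data, treating all of it as state."""
    dict_data = dict(getattr(obj, "__dict__", {}))
    slot_data = {}
    for klass in type(obj).__mro__:
        # Assumes __slots__ is a tuple or list of names.
        for name in klass.__dict__.get("__slots__", ()):
            if hasattr(obj, name):  # unset slots are skipped
                slot_data[name] = getattr(obj, name)
    return dict_data, slot_data


class Base:
    __slots__ = ("a",)


class Mixed(Base):
    # No __slots__ here, so instances also get an instance dictionary.
    pass


m = Mixed()
m.a = 1  # stored in the slot declared by Base
m.b = 2  # stored in the instance dictionary
state = toy_getstate(m)
```

Nothing in this helper can tell intrinsic data from state: a value that __new__ put in a slot would be returned alongside everything else.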

__setstate__

Set an object's state.

The default version expects slot and/or instance dictionary data and sets them on the instance.
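A matching sketch for loading: a ghost is created with __new__ alone and then filled in from saved data. The helper toy_setstate and the (dict_data, slot_data) pair it accepts are assumptions made for this example, not the format used by the persistent package.

```python
def toy_setstate(obj, state):
    """Restore a (dict_data, slot_data) pair onto a freshly created ghost."""
    dict_data, slot_data = state
    if dict_data:
        obj.__dict__.update(dict_data)
    for name, value in slot_data.items():
        setattr(obj, name, value)


class Slotted:
    __slots__ = ("owner",)


class Account(Slotted):
    # No __slots__ here, so instances also have an instance dictionary.
    pass


ghost = Account.__new__(Account)  # no __init__: nothing is loaded yet
loaded_before = hasattr(ghost, "owner")
toy_setstate(ghost, ({"balance": 10}, {"owner": "jim"}))
```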

The default implementations treat all slot data as state partly because we haven't explicitly acknowledged the existence of intrinsic data up to now.

It is a historical accident that objects with intrinsic data have chosen to store this data in slots [1]. In any case, these objects relied on slots not being cleared [2]. It's possible that this behavior motivated the decision to use slots in some cases.

There's no easy way to fix this for existing objects. We may not know where all of these objects are. One thing we do know, though, is that all objects with intrinsic data have custom implementations of __new__. If an object uses the Persistent implementation, we can know that it's using slots solely as a memory optimization and that we can clear the slots when we ghostify.

If an object has a custom __new__ and has state in __slots__, it can override _p_invalidate and _p_deactivate to release it. (This is harder than it should be :(. )
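A sketch of what such an override might look like, using a toy base class rather than the real one. ToyPersistent, Node, payload and _release_slot_state are hypothetical names; the toy _p_deactivate and _p_invalidate only clear the instance dictionary and never touch slots, which models the behaviour described for classes with a custom __new__. The real methods do more (for example, tracking the object's ghost status), so treat this as an outline only.

```python
class ToyPersistent:
    """Toy base (hypothetical); the real Persistent does much more."""

    __slots__ = ()

    def _toy_clear_dict(self):
        idict = getattr(self, "__dict__", None)
        if idict is not None:
            idict.clear()

    def _p_deactivate(self):
        self._toy_clear_dict()

    def _p_invalidate(self):
        self._toy_clear_dict()


class Node(ToyPersistent):
    # "key" is intrinsic (set by __new__); "payload" is state kept in a slot.
    __slots__ = ("key", "payload")

    def __new__(cls, key):
        self = super().__new__(cls)
        self.key = key
        return self

    def _release_slot_state(self):
        try:
            del self.payload
        except AttributeError:
            pass  # already a ghost

    def _p_deactivate(self):
        super()._p_deactivate()
        self._release_slot_state()

    def _p_invalidate(self):
        super()._p_invalidate()
        self._release_slot_state()


n = Node("a")
n.payload = [1, 2, 3]
n._p_deactivate()  # releases payload, keeps key
n._p_invalidate()  # safe to call again on a ghost
```

Both methods have to be overridden because either one can turn the object into a ghost, which is part of why a single state-releasing hook would be convenient.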

We might choose in the future to make the use of intrinsic data more explicit. Doing this with deprecations and such would be a lot of work. It's unclear if it would be worth the effort.

For now, we've decided to clear slots when an object is deactivated only if its class doesn't override __new__.

[1] This was generally because these were ported from C and slots behaved similarly to C struct members in many ways.
[2] Some other object implementations initialized data in __new__ and relied on the data being initialized later. This isn't intrinsic data. No data was passed to __new__.