config_state


Python is a flexible language often used as an interface to high-performance libraries written in less flexible native languages such as C/C++. ConfigState applies the same idea one level higher in the hierarchy: it provides a framework that bridges human-readable configuration languages (e.g. JSON or YAML) with Python.

With ConfigState, one can configure a complex hierarchy of Python classes and instantiate them from a single configuration file. To avoid pitfalls and improve the developer's experience, ConfigState guards against inconsistencies and raises explicit error messages when something fails. Runtime overhead is kept low because most of the logic runs at class definition time.

The ConfigState class

The core component is the ConfigState class, which defines a pattern for representing Python classes with two distinct sets of attributes: a set of immutable configuration values and a set of mutable state values.

The configuration is set upon initialization and is passed through the constructor. Once initialized, the configuration is frozen and cannot change. The state variables constitute the mutable state of the instance and can be updated throughout its lifetime. The configuration and state variables are meant to represent the necessary and sufficient information required to clone the object's instance. They can be used to save and restore the object from disk.

  • The configuration fields are defined using ConfigField class attributes. They can have typing constraints and be provided with a factory method for building complex types out of simpler/built-in ones.
  • State variables are defined using StateVar attributes within the constructor. Alternatively, they can be defined as class properties using @stateproperty when arbitrary logic must run on access or modification.
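
The split between frozen configuration and mutable state can be illustrated in plain Python. The sketch below is not how ConfigState is implemented; the class name `FrozenConfigSketch` and its constructor signature are hypothetical and only mimic the behaviour described above using the standard library.

```python
from types import MappingProxyType


class FrozenConfigSketch:
    """Illustration only: read-only config values plus mutable state values."""

    def __init__(self, config, **state):
        # The config is copied and exposed through a read-only mapping view.
        object.__setattr__(self, 'config', MappingProxyType(dict(config)))
        # The state lives in a regular dict and may change freely.
        object.__setattr__(self, 'state', dict(state))

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name in ('config', 'state'):
            raise AttributeError(name)
        if name in self.config:
            return self.config[name]
        if name in self.state:
            return self.state[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Config fields are frozen; every other name is treated as state.
        if name in self.config:
            raise AttributeError(f"config field '{name}' is frozen")
        self.state[name] = value


model = FrozenConfigSketch({'learning_rate': 0.1}, iteration=0)
model.iteration += 1  # allowed: state value
```

Assigning to `model.learning_rate` in this sketch raises an `AttributeError`, while `model.iteration` can be updated at will.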

Inheriting from ConfigState offers the following benefits:

  • Provides clear semantic separation between the static configuration values and the mutable state variables.
  • Configuration values and state variables are accessible through pythonic syntax and benefit from the IDE's type hinting feature.
  • Using a configuration file, one can instantiate a complex hierarchy of Python classes. A config field may itself be another ConfigState object, which makes it possible to define tree-structured ConfigState hierarchies.
  • A config field can be a reference to a nested ConfigState object's config field. This allows coupling between config fields. For example, configuration of a log folder path can be injected into the nested ConfigState objects through the configuration of the topmost ConfigState object.
  • ConfigState objects can be serialized to and deserialized from a stream. They are picklable and, in some cases, JSON-serializable.

Basic usage

from pathlib import Path

from config_state import ConfigField
from config_state import ConfigState
from config_state import StateVar
import numpy as np

class Foo(ConfigState):
    learning_rate: float = ConfigField(0.1, 'The learning rate', force_type=True)
    license_key: str = ConfigField(None, 'License key', required=True)
    log_dir: Path = ConfigField('./', 'Path to a folder', type=Path)

    def __init__(self, config=None):
        super().__init__(config=config)
        self.weights: np.ndarray = StateVar(np.random.random((10, 10)),
                                            'The weights of the model')
        self.iteration: int = StateVar(0, 'Training iterations')

We can instantiate a ConfigState with a dictionary (that may have been obtained from loading a json or yaml file):

conf = {
        'learning_rate': 0.1,
        'license_key': 'ID123',
        'log_dir': 'logs/'
    }
foo = Foo(conf)
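
As a minimal sketch of where such a dictionary may come from, the snippet below parses a JSON document with the standard `json` module. The JSON text is inlined here as a stand-in for a file on disk; the variable name `raw` is only illustrative.

```python
import json

# Stand-in for the content of a configuration file read from disk.
raw = '''
{
  "learning_rate": 0.1,
  "license_key": "ID123",
  "log_dir": "logs/"
}
'''

# json.loads returns a plain dict with the same keys as the example above.
conf = json.loads(raw)
print(conf)
```

The resulting `conf` has the same content as the dictionary literal above and could be passed to `Foo(conf)` in the same way.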

The configuration of foo can be summarized:

print(foo.config_summary())

Output:

learning_rate: 0.1
license_key: ID123
log_dir: logs

Values are accessible with pythonic syntax (the IDE should be able to perform type hinting and code completion):

assert isinstance(foo.learning_rate, float)
assert foo.learning_rate == 0.1

Config values are immutable:

foo.learning_rate = 0.2 # Not OK, raises 'AttributeError: Updating a conf field is forbidden'

But changing a state variable is ok:

foo.iteration += 1 # Ok, state variable

Missing required fields raise an exception:

conf = {
        'learning_rate': 0.1,
        'log_dir': 'logs/'
    }
foo = Foo(conf) # ConfigError: Configuring 'Foo': Those required fields have not been specified {'license_key'}

Configuring a field that has not been defined raises an exception:

conf = {
        'color': 'red',
        'license_key': 'ID123'
    }
foo = Foo(conf) # ConfigError: Configuring 'Foo': Trying to update the conf field 'color' which has not been defined

Configuring a field with a value of an invalid type raises an exception:

conf = {
        'learning_rate': '0.1',
        'license_key': 'ID123'
    }
foo = Foo(conf) # ConfigError: Configuring 'Foo': Value `0.1` of type `<class 'str'>` is not compatible with specified type `float`

State property

A state variable can also be defined as a property with the @stateproperty decorator. This is convenient when some logic needs to run while reading or setting the variable.

from config_state import ConfigState
from config_state import stateproperty

import numpy as np

class Model(ConfigState):
    def __init__(self, config):
        super().__init__(config)
        self._weights: np.ndarray = np.random.random((10, 10))

    @stateproperty
    def weights(self) -> np.ndarray:
        '''Weights of the model'''
        return self._weights

    @weights.setter
    def weights(self, val):
        self._weights = val

Serialization

ConfigState objects are serializable if their config values and state variables are serializable. The state of an object is considered to be entirely encapsulated within the config values and the state variables. It can be obtained with foo.get_state(), which returns an ObjectState instance holding the serialized information of the ConfigState object.

import pickle

# Use context managers so the files are closed after each operation
with open('foo.pkl', 'wb') as f:
    pickle.dump(foo, f)
with open('foo.pkl', 'rb') as f:
    foo2 = pickle.load(f)

In some cases, ConfigState objects are JSON-serializable:

from config_state.serializers import Json

class JsonableFoo(ConfigState):
    log_dir: str = ConfigField('log_dir/', 'Path to output folder')
    learning_rate: float = ConfigField(0.1, 'The learning rate')

    def __init__(self, config=None):
        super().__init__(config=config)
        self.iteration = StateVar(0, 'Training iterations')

foo = JsonableFoo()
# saving
Json().save(foo, 'foo.json')

# loading
foo = Json().load('foo.json')

Content of foo.json:

{
  "type": "__main__.JsonableFoo",
  "config": {
    "__VERSION__": {
      "value": 1.0,
      "doc": "ConfigState protocol's version",
      "type": "builtins.float"
    },
    "log_dir": {
      "value": "log_dir/",
      "doc": "Path to output folder.",
      "type": "builtins.str"
    },
    "learning_rate": {
      "value": 0.1,
      "doc": "The learning rate",
      "type": "builtins.float"
    }
  },
  "internal_state": {
    "iteration": {
      "value": 0,
      "doc": "Training iterations",
      "type": "builtins.int"
    }
  }
}
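
Because the file is plain JSON, it can also be inspected without the library. The sketch below uses only the standard `json` module on an abridged copy of the content shown above; `plain_values` is a hypothetical helper, not part of config_state.

```python
import json

# Abridged copy of the foo.json content shown above.
serialized = '''
{
  "type": "__main__.JsonableFoo",
  "config": {
    "learning_rate": {"value": 0.1, "doc": "The learning rate", "type": "builtins.float"}
  },
  "internal_state": {
    "iteration": {"value": 0, "doc": "Training iterations", "type": "builtins.int"}
  }
}
'''


def plain_values(section):
    """Map each entry name to its 'value', dropping the 'doc' and 'type' metadata."""
    return {name: entry['value'] for name, entry in section.items()}


data = json.loads(serialized)
print(data['type'])
print(plain_values(data['config']))
print(plain_values(data['internal_state']))
```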

The Pickle and Json serializers are also available as plugins:

serializer = Serializer({'class': 'Pickle'})
serializer.save(foo, 'foo.pkl')

Config field factory

Implicit factory

If a ConfigField has a specified type but the type of the provided value is different, type is used as an implicit factory by calling type(value). This is useful for nested ConfigState objects:

class NestedFoo(ConfigState):
    license_key: str = ConfigField(type=str, required=True)
    foo: Foo = ConfigField(type=Foo,
                           doc='A ConfigState as config field',
                           required=True)
conf = {
    'license_key': '4321',
    'foo': {
        'learning_rate': 0.1,
        'license_key': 'ID123',
        'log_dir': 'logs/'
    }
}
nested_foo = NestedFoo(conf) # Ok, nested_foo.foo is instantiated using conf['foo']
isinstance(nested_foo.foo, Foo) # True
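
The rule "call type(value) when the types differ" can be sketched in a few lines of plain Python. The function `coerce` below is a hypothetical stand-in written for illustration; it is not the library's code and ignores options such as force_type.

```python
from pathlib import Path


def coerce(value, expected_type):
    """Return value unchanged if it already has the expected type,
    otherwise build it by calling expected_type(value)."""
    if isinstance(value, expected_type):
        return value
    return expected_type(value)


log_dir = coerce('logs/', Path)    # a str is turned into a Path
same = coerce(Path('logs'), Path)  # an existing Path is passed through
print(repr(log_dir))
```

For a nested ConfigState field, the same idea applies with the nested class as `expected_type` and the nested configuration dictionary as `value`.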

Explicit factory

A factory can be explicitly provided through a callable:

from datetime import datetime

def date_factory(str_date):
    return datetime.strptime(str_date, '%Y-%m-%d %H:%M:%S')

class DateFoo(ConfigState):
    date: datetime = ConfigField(value='2019-01-01 00:00:00', type=datetime,
                                 doc='some date',
                                 factory=date_factory)

date_foo = DateFoo({'date': '2021-04-28 00:00:00'})
print(type(date_foo.date)) # <class 'datetime.datetime'>

Deferred config fields

The full configuration of an object may not be known when it is instantiated. In that case, the specification of a field can be deferred by setting it to Ellipsis:

foo = Foo({'license_key': ...})

foo.license_key is Ellipsis # True

foo.license_key = 1337 # ok, we can update an Ellipsis

foo.license_key = 42 # Not OK, raises 'AttributeError: Updating a conf field is forbidden'

# Note: For convenience with configs defined within json or yaml files, strings '...' are interpreted as Ellipsis:
foo = Foo({'license_key': str('...')})

foo.license_key is Ellipsis # True
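
When a configuration is loaded from a file, it can be handy to list which entries are still deferred before filling them in. The sketch below does this on a plain dictionary; `deferred_fields` is a hypothetical helper that is not provided by config_state.

```python
import json


def deferred_fields(conf, prefix=''):
    """List dotted paths of entries still set to Ellipsis or the string '...'."""
    found = []
    for key, value in conf.items():
        path = f'{prefix}{key}'
        if isinstance(value, dict):
            # Recurse into nested configuration dictionaries.
            found.extend(deferred_fields(value, prefix=f'{path}.'))
        elif value is Ellipsis or value == '...':
            found.append(path)
    return found


conf = json.loads(
    '{"license_key": "...", "foo": {"license_key": "...", "learning_rate": 0.1}}'
)
print(deferred_fields(conf))
```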

Reference fields

A ConfigField can be a reference to a field of a nested ConfigState field, which simplifies the configuration of complex hierarchies:

class FooWithRef(ConfigState):
    foo: Foo = ConfigField(type=Foo) # a nested ConfigState
    license_key = ConfigField(foo.license_key) # Reference to a nested field

# FooWithRef.license_key is a reference to FooWithRef.foo.license_key.
# This simplifies the configuration. Instead of:
FooWithRef({'foo':  {'license_key':  'ABC123'}})

# we can do:
foo_with_ref = FooWithRef({'license_key':  'ABC123'})

foo_with_ref.license_key ==  'ABC123'  # True
foo_with_ref.foo.license_key ==  'ABC123'  # True
foo_with_ref.foo.license_key is foo_with_ref.license_key # True

A reference can point to another reference:

class FooWithRef2(ConfigState):
    foo_with_ref: FooWithRef = ConfigField(type=FooWithRef)
    license_key = ConfigField(foo_with_ref.license_key) # It is a reference to another reference

foo = FooWithRef2({'license_key': 'ABC123'})

foo.foo_with_ref.license_key is foo.license_key # True
foo.foo_with_ref.foo.license_key is foo.license_key # True
foo.license_key == 'ABC123' # True

A reference can point to multiple fields using a list or a tuple:

class SubFooWithMultiRef(ConfigState):
    foo1: Foo = ConfigField(type=Foo)
    foo2: Foo = ConfigField(type=Foo)
    license_key = ConfigField([foo1.license_key, foo2.license_key]) # Reference to foo1 and foo2's license_key field

# Now instead of:
conf = {'foo1': {'license_key': 'ABC123'}, 'foo2': {'license_key': 'ABC123'}}
SubFooWithMultiRef(conf)

# One can simply do:
foo = SubFooWithMultiRef({'license_key': 'ABC123'})

foo.license_key == 'ABC123' # True
foo.foo1.license_key is foo.license_key # True
foo.foo2.license_key is foo.license_key # True
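
Conceptually, a reference propagates the value given at the top level into the nested configurations. The sketch below mimics that on plain dictionaries; `expand_reference` is a hypothetical helper for illustration and makes no claim about how the library resolves references internally.

```python
import copy


def expand_reference(conf, key, targets):
    """Copy conf[key] into each nested dict named in targets."""
    expanded = copy.deepcopy(conf)
    if key in expanded:
        for target in targets:
            # Create the nested dict if it is missing, then inject the value.
            expanded.setdefault(target, {})[key] = expanded[key]
    return expanded


short = {'license_key': 'ABC123'}
full = expand_reference(short, 'license_key', ['foo1', 'foo2'])
print(full)
```

Here `full['foo1']` and `full['foo2']` both carry the license key given once at the top level, and `short` is left untouched.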

Plugins management

A ConfigState class can be decorated with @builder, which registers the class as a builder. Its subclasses can then be decorated with @register, which registers them as plugins and enables their instantiation through the parent builder.

from config_state import builder
from config_state import register

@builder
class ColoredFoo(ConfigState):
    color: str = ConfigField(None, "Color", static=True)
    value: int = ConfigField(type=int, doc="Value")

@register
class RedFoo(ColoredFoo):
    color: str = ConfigField("Red", "Color", static=True)

@register
class BlueFoo(ColoredFoo):
    color: str = ConfigField("Blue", "Color", static=True)

colored_foo = ColoredFoo({'class': 'BlueFoo', 'value': 1})

print(type(colored_foo)) # <class '__main__.BlueFoo'>
print(colored_foo.color) # Blue
print(colored_foo.value) # 1
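
The builder/plugin idea is a form of the registry pattern. The sketch below shows that pattern in plain Python under simplified assumptions: the names `Plugin`, `RedPlugin` and `BluePlugin` are hypothetical, and the code does not reproduce the actual @builder and @register decorators.

```python
class Plugin:
    """Illustration only: a base class that builds registered subclasses by name."""

    _registry = {}

    def __init__(self, value):
        self.value = value

    @classmethod
    def register(cls, plugin_cls):
        # Store the subclass under its class name and return it unchanged,
        # so this method can be used as a decorator.
        cls._registry[plugin_cls.__name__] = plugin_cls
        return plugin_cls

    @classmethod
    def build(cls, conf):
        # The 'class' entry selects the subclass; the rest are constructor arguments.
        kwargs = dict(conf)
        plugin_cls = cls._registry[kwargs.pop('class')]
        return plugin_cls(**kwargs)


@Plugin.register
class RedPlugin(Plugin):
    color = 'Red'


@Plugin.register
class BluePlugin(Plugin):
    color = 'Blue'


obj = Plugin.build({'class': 'BluePlugin', 'value': 1})
print(type(obj).__name__, obj.color, obj.value)
```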

Builders can be organized in a hierarchy. For instance, we can define a master builder from which every other builder inherits. An object is then built by specifying its path in the hierarchy:

@builder
class MasterBuilder(ConfigState):
    pass

@builder
@register
class ColoredFoo(MasterBuilder):
    pass

colored_foo = MasterBuilder({'class': 'ColoredFoo.BlueFoo', 'value': 1})
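
A dotted path such as 'ColoredFoo.BlueFoo' can be read as a walk through nested registries, one name per level. The sketch below shows that lookup on nested dictionaries; `resolve` and the string placeholders stored in `registry` are hypothetical and only illustrate the path convention.

```python
def resolve(registry, path):
    """Follow a dotted path through nested dictionaries and return the leaf."""
    node = registry
    for part in path.split('.'):
        node = node[part]  # raises KeyError if a level is unknown
    return node


# Placeholder strings stand in for the registered plugin classes.
registry = {
    'ColoredFoo': {
        'RedFoo': 'red plugin',
        'BlueFoo': 'blue plugin',
    }
}
print(resolve(registry, 'ColoredFoo.BlueFoo'))
```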