Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Graduate to v1.0 API #442

Open
george-zubrienko opened this issue Jul 20, 2023 · 37 comments
Open

[FEATURE] Graduate to v1.0 API #442

george-zubrienko opened this issue Jul 20, 2023 · 37 comments
Assignees
Labels
api/v1 enhancement New feature or request

Comments

@george-zubrienko
Copy link
Collaborator

george-zubrienko commented Jul 20, 2023

Description

Based on some discussions we see in issues around like #433 , #243, #139, #106 and probably more, I can pinpoint a few common pains:

  1. Users are generally confused about whether they should use annotation @dataclass_json or the subclass approach with DataClassJsonMixin, the latter being similar to what mashumaro does. There are also a few things that can only be done from the annotation or global/field level config, most common example being letter_case=....
  2. Annotation usage leads to problems with static code analysis and thus IDE autocompletion tools - from personal experience, pylint hates it so in our projects we always use both, which adds boilerplate and confusion, especially for new developers. Now we can also see from both recent and past issues that people in general have to be creative to make their static code analysis not hate the lib. This is a usability problem and I think it must be addressed one way or another.
  3. Mutating user classes via annotation or forcing people to subclass may lead to name conflicts or incorrect field resolution for cases like Serializing Polymophic Hierarchy #106 (note this is usually not a problem in any non-python json ser-deser lib for python reasons)

I'd like to propose a breaking API change in 1.0 release to get rid of those issues, and potentially greatly improve user experience.

Possible solution

Consider current quickstart:

from dataclasses import dataclass
from dataclasses_json import dataclass_json


@dataclass_json
@dataclass
class Person:
    name: str


person = Person(name='lidatong')
person.to_json()  # '{"name": "lidatong"}' <- this is a string
person.to_dict()  # {'name': 'lidatong'} <- this is a dict
Person.from_json('{"name": "lidatong"}')  # Person(1)
Person.from_dict({'name': 'lidatong'})  # Person(1)

# You can also apply _schema validation_ using an alternative API
# This can be useful for "typed" Python code

Person.from_json('{"name": 42}')  # This is ok. 42 is not a `str`, but
                                  # dataclass creation does not validate types

Change it to this:

from dataclasses import dataclass
from dataclasses_json.functions import from_json, to_json, to_dict, from_dict
# OR
# from dataclasses_json import JsonSerializer

@dataclass
class DataPerson:
    name: str

class RegularPerson:
    name: str

d_person = DataPerson(name='lidatong')
r_person = RegularPerson(name='foobar')

# '{"name": "lidatong"}' <- this is a string
# internally calls json.dumps(to_dict(d_person))
to_json(d_person)  
# OR
JsonSerializer[DataPerson].to_json(d_person)

# {'name': 'lidatong'} <- this is a dict
to_dict(d_person)
# OR
JsonSerializer[DataPerson].to_dict(d_person)

# TypeError: r_person is not a dataclass
to_json(r_person)
to_dict(r_person)


from_json(DataPerson, '{"name": "lidatong"}')  # DataPerson(name="lidatong")
# OR
JsonSerializer[DataPerson].from_json('{"name": "lidatong"}')  # DataPerson(name="lidatong")

from_dict(DataPerson, {'name': 'lidatong'})  # DataPerson(name="lidatong")
# OR
JsonSerializer[DataPerson].from_dict(DataPerson, {'name': 'lidatong'})  # DataPerson(name="lidatong")

Now, this removes the need to have either annotation OR the subclass and simply relies on the class def supplied by generic/type arg (subject for discussion) and generates the ser-deser boilerplate under the hood. This approach formalizes the return type of all ser-deser operations, thus making static code analysis happy, fixing autocompletion AND removes the mutation of usercode.

Note this also allows to supply encoder-decoder configuration on either global or method level via dataclasses_json.cfg.global_config.encoders, dataclasses_json.cfg.global_config.decoders. For method level we can allow to override decoders/encoders via encoders=... and decoders=... argument.

For class level, we can do the same via dataclasses_json.cfg.global_config.encoders[DataPerson]. For class field level this will be a problem. Alternative to all this can be ditching dataclasses_json.cfg.global_config.encoders/decoders and instead doing this:

from dataclasses_json.config import JsonSerializerConfig

d_person_serializer = JsonSerializerConfig[DataPerson](...)

dataclasses_json.cfg.global_config.serializers.append(d_person_serializer)

Then users can create a simple initialization file for their code where this is all set up once and for the lifetime of the app. Other big plus of this approach is that we essentially lock ser-deser pair for the user and will be able to provide clear and understandable error message if things go sideways.

Alternatives

As an alternative, we can only ditch annotation and instead convert mixin to a metaclass, which will allow us to mutate user code as we see fit to achieve proper ser-deser behaviours. This would allow us to achieve similar results by rewriting less code, but will require people to have a bit higher level knowledge of python in order to understand how the lib works and how to use it for their (potentially) complicated use case. Also metaclasses is one of the areas that has seen a breaking change not so long ago.

Context

For context, some thoughts from community members on this:

@george-zubrienko george-zubrienko added the enhancement New feature or request label Jul 20, 2023
@george-zubrienko george-zubrienko self-assigned this Jul 20, 2023
@george-zubrienko
Copy link
Collaborator Author

@lidatong @s-vitaliy @matt035343 @healthmatrice @artificial-aidan and many others please let us know your thoughts on this one :)

@george-zubrienko
Copy link
Collaborator Author

george-zubrienko commented Jul 20, 2023

Updated to make code look more like its actual python and not Java lol. Re how we will, if agreed, do this - by creating a new "release" v1 branch and working there so people can test the lib from git commit before we even consider pushing to PyPI.

@healthmatrice
Copy link
Contributor

healthmatrice commented Jul 20, 2023

I probably will hate the metaclass implementation. I used to use lots of metaclasses in one of my project. But in the end we almost ditched all the metaclasses and I never used them ever in other projects. Because it is still a class factory which can create lots of headache for type checkers.

I think mypy is still struggling python/mypy#2653 https://github.com/python/mypy/labels/topic-metaclasses

@george-zubrienko
Copy link
Collaborator Author

I probably will hate the metaclass implementation. I used to use lots of metaclasses in one of my project. But in the end we almost ditched all the metaclasses and I never used them ever in other projects. Because it is still a class factory which can create lots of headache for type checkers.

I think mypy is still struggling python/mypy#2653 https://github.com/python/mypy/labels/topic-metaclasses

This is good to know! If mypy has issues with those, then the metaclass approach doesn't seem like a viable alternative.

@USSX-Hares
Copy link
Collaborator

I probably will hate the metaclass implementation. I used to use lots of metaclasses in one of my project. But in the end we almost ditched all the metaclasses and I never used them ever in other projects. Because it is still a class factory which can create lots of headache for type checkers.

Mostly agreed. Furthermore, if anyone does already use metaclasses, their code would become even more broken because all metaclasses should derive from the same parent which is not always the case.

@george-zubrienko personally I like the idea of going forward, but I think this changeset is too breaking to be a root cause of dependency management mess. The reason being is the fact DCJ is often used in the dependency libraries or frameworks, and since there's no way for pip to provide different versions for each of them, if any of them wants to update, every other one should update too.

IMAO, we should keep the backwards compatibility while changing the core logic behind AND throwing a ton of DepreciationWarnings in the process.

@USSX-Hares
Copy link
Collaborator

USSX-Hares commented Jul 21, 2023

The real improvement I can ask for is to allow user to use precompilation step for (de)serializer (similarly to how DCs constructor work) and use those instead of relying on the meta fields. This change could be breaking and/or not working in existing user cases, thus I suggest hiding it behind either enable optimization Python flag or some DCJ setting (either per-class or global).

@george-zubrienko
Copy link
Collaborator Author

george-zubrienko commented Jul 21, 2023

frameworks, and since there's no way for pip to provide different versions for each of them, if any of them wants to update, every other one should update

Poetry solves this issue, but if we bump to 1.0 and cut out current functionality, users wont be able to mix those anyway. Worst case people will (they really should though) pin major to 0.x until their lib is ready to upgrade.

IMAO, we should keep the backwards compatibility while changing the core logic behind AND throwing a ton of DepreciationWarnings in the process.

One way could be by making v1 API opt-in instead of forced-use and as you said, get some deprecation warnings in. But then how long v0 API should stay in code, and why can't we rely on people pinning major?

@george-zubrienko
Copy link
Collaborator Author

de)serializer (similarly to how DCs constructor work) and use those instead of relying on the meta fields.

I tend to get rid of annotation usage all together to be fair, so in your case this would go to a separate configuration/serializer class like MyTypeSerializer that will be picked from global scope.
The reason I think hiding constructor logic in the annotation is not a great choice (also applies to how dataclasses are implemented in python) is its poor compatibility with code analysis tools and the fact their order matters when multiple are used, and the fact it mutates whatever goes in instead of adding metadata to the type that can be used at runtime.

@USSX-Hares
Copy link
Collaborator

I tend to get rid of annotation usage all together to be fair, so in your case this would go to a separate configuration/serializer class like MyTypeSerializer that will be picked from global scope.

The second part of my message has nothing to do with annotations.

Imagine this code:

from dataclasses_json import settings
from dataclasses_json import JsonSerializer

settings.enable_optimizations()
JsonSerializer[DataPerson].to_json(d_person)

Which internally checks if there is a compiled serializer and, if not, compiles one so the futher runs would be faster.

@USSX-Hares
Copy link
Collaborator

JsonSerializer[DataPerson].from_json('{"name": "lidatong"}') # DataPerson(name="lidatong")

How would you achieve such behavior? Usually generics won't provide their own type variables (unless typing has changed how it works)

@george-zubrienko
Copy link
Collaborator Author

Imagine this code:

from dataclasses_json import settings
from dataclasses_json import JsonSerializer

settings.enable_optimizations()
JsonSerializer[DataPerson].to_json(d_person)

Which internally checks if there is a compiled serializer and, if not, compiles one so the futher runs would be faster.

Makes sense, I was thinking of having something like dataclasses_json.initialize().with_serializer(<custom serializer here)

So initialize does internally checks if there is a compiled serializer and, if not, compiles one so the futher runs would be faster. and with_serializer adds user-defined serializers

@george-zubrienko
Copy link
Collaborator Author

How would you achieve such behavior? Usually generics won't provide their own type variables (unless typing has changed how it works)

You can check for the actual type provided by class-scoped type variable and then try find a configured (compiled) serializer for it. Small correction, code will look like

JsonSerializer[DataPerson]().from_json('{"name": "lidatong"}')  # DataPerson(name="lidatong")

@matt035343
Copy link
Collaborator

matt035343 commented Jul 21, 2023

Probably also related to #84 and #264

@healthmatrice
Copy link
Contributor

is there any plan to support discriminator? https://github.com/Fatal1ty/mashumaro#discriminator-config-option

@george-zubrienko
Copy link
Collaborator Author

is there any plan to support discriminator? https://github.com/Fatal1ty/mashumaro#discriminator-config-option

I don't think we will need that at all with the new API. As I mentioned, class hierarchies should be ser-desered naturally in v1, at least that would be the goal. Re other libraries, I don't think taking functionality from there is a healthy thing. DCJ has a great core and what we seek is to improve that, not port features from another libs, especially if they won't be needed (hopefully).

@lidatong
Copy link
Owner

lidatong commented Jul 22, 2023

hey thanks so much @george-zubrienko for writing this up. i see that there is a lot of good thinking here and triggered a nice discussion :) thanks especially for synthesizing the user issues and pain points

in general i would strongly prefer not to introduce such a major breaking change. technically, keeping the library pre-1.0 "reserves that right" but there are enough users of the library and downstream libraries that a breaking change to the core API would be suboptimal (https://github.com/lidatong/dataclasses-json/network/dependents) (reading above basically echoing @USSX-Hares)

if we do decide to do breaking changes, i would really like them to be contained (e.g. refactoring the way unrecognized fields are handled is one thing on my mind) or perhaps gating them behind "optional" features like enabling a more optimized build of the library (i've been floating the idea of re-writing the parser core in C)

however, i do see a lot of value in what you wrote about introducing the top-level from_json, etc. in addition to the reasons you stated, one existing issue is ser/de types that you don't define (e.g. A depends on B and wants to ser/de classes in B). you can workaround with DataClassJsonMixin.to_json but it's not obvious and not documented

(side note: can it just be from_json[T](...)? i haven't played around too much with python typing)

so basically i think we can have our cake and eat it too by introducing additive API changes. so users that want the quality of life improvements that come with top level from_json etc. can upgrade if they choose. but users that don't want to refactor their codebase can also continue to get improvements and it avoids a Python 2 / 3 maintain both situation

let me know if that makes sense to you? and thanks again this writeup is much appreciated

@lidatong
Copy link
Owner

lidatong commented Jul 22, 2023

quick sketch:

to_json(person)  # more typing-friendly, does not modify your user type, can be done on third-party types
person.to_json()  # existing API, schema pre-compiled at module load time so maybe faster (doesn't do this currently)

@george-zubrienko
Copy link
Collaborator Author

Hey thanks for kind words, I do believe something good will come out of this :)

Re from_json[T](...) would be nice but python doesn't allow generics on methods this way, if I read the typevar docs correctly. You can only do something like this

from collections.abc import Sequence
from typing import TypeVar

T = TypeVar('T')                  # Declare type variable "T"

def first(l: Sequence[T]) -> T:   # Function is generic over the TypeVar "T"
    return l[0]

a = first([1,2,3,4]) # 1: int
b = first(["a", "b", "c"]) # "a": str

so basically i think we can have our cake and eat it too by introducing additive API changes. so users that want the quality of life improvements that come with top level from_json etc. can upgrade if they choose. but users that don't want to refactor their codebase can also continue to get improvements and it avoids a Python 2 / 3 maintain both situation

I think it makes sense not only to me, but also to other contributors, as well as users. I tried to lay out different options so people can iterate a bit in their heads and comment on the solution they would like/support most. I think this is exactly what is happening right now, and it is good to see people being more or less aligned on how we should implement changes. I expect a bit more reactions here next week so we have a full picture.
TLDR, the very top-level plan would be:

  • Keep existing API as-is
  • Start introducing new API right away in new library releases OR we can also do this by adding a v1 branch and back-porting features from v1 to v0 until eventually v1 becomes master. v0 will go to PyPI as usual and will contain both v0 (current) and v1 (new) APIs, while pure v1 could only be installed from git commit. Let me know if you find this too complicated - it very well may be, only really good thing I see with this approach is easier transition when everybody is ready for v1-only.
  • Choose the first API to introduce - can be what @USSX-Hares proposed, for example, to kick off the coding spree :)

@artificial-aidan
Copy link

Sort of related, but coming up with a way to do CI testing for typing on the new APIs. Typing has sort of become a requirement in python recently (from what I'm seeing) and making sure that all of the type checkers handle the new APIs would be great.

My brief glance over the current CI is that it just runs mypy on the codebase (which isn't a bad thing), but a set of test files exercising the different corner cases for typing would be ideal.

And maybe this just looks like running the type checkers on the test files 🤷‍♀️

@USSX-Hares
Copy link
Collaborator

Re from_json[T](...) would be nice but python doesn't allow generics on methods this way, if I read the typevar docs correctly. You can only do something like this

Actually, we could do something like that by manually defining __getitem__ operation for the method.

@USSX-Hares
Copy link
Collaborator

My brief glance over the current CI is that it just runs mypy on the codebase (which isn't a bad thing), but a set of test files exercising the different corner cases for typing would be ideal.

@artificial-aidan if you came up with these new test cases it would be great.

@USSX-Hares
Copy link
Collaborator

USSX-Hares commented Jul 23, 2023

@george-zubrienko, something like that:

A lot of code
from dataclasses import dataclass
from typing import *

T = TypeVar('T')
JsonPlain = str | bool | int | float | None
Json = JsonPlain | List['Json'] | Dict[str, 'Json']

class JsonSerializerImpl(Generic[T]):
    generic_over: Type[T]
    def __init__(self, generic_over: Type[T]):
        self.generic_over = generic_over
    
    def __repr__(self):
        return f'{JsonSerializerImpl.__qualname__}[{self.generic_over.__qualname__}]'
    
    def to_json(self, data: T) -> Json:
        print(f"> {self!r}.{self.to_json.__qualname__}({data=!r})")
        
        # Actual implementation here...
        pass
        
        return None
    
    def __call__(self, data: T) -> Json:
        return self.to_json(data)

class JsonSerializerGeneric(Generic[T]):
    def __getitem__(self, item: Type[T]) -> JsonSerializerImpl[T]:
        return JsonSerializerImpl(item)
    
    def to_json(self, data: T) -> T:
        warnings.warn("Please, use JsonSerializer[ClassRef].to_json(data); otherwise the method can go wrong.", DeprecationWarning, 2)
        return self[type(data)].to_json(data)
    
    def __call__(self, data: T) -> T:
        warnings.warn("Please, use to_json[ClassRef](data); otherwise the method can go wrong.", DeprecationWarning, 2)
        return self[type(data)](data)

to_json = JsonSerializer = JsonSerializerGeneric()

__all__ = \
[
    'JsonSerializer',
    'Json',
    'to_json',
]

And test it with the following:

from dataclasses import dataclass
from typing import *

from method_getitem.definitions import JsonSerializer, to_json

@dataclass
class MyClass:
    a: str
    b: int

c = MyClass('field', 8)

print("JsonSerializer:")
JsonSerializer[MyClass].to_json(c)
JsonSerializer[MyClass].to_json('15')
JsonSerializer[str].to_json(c)
JsonSerializer[str].to_json('15')
JsonSerializer.to_json(c)
print()

print("to_json:")
to_json[MyClass](c)
to_json[MyClass]('15')
to_json[str](c)
to_json[str]('15')
to_json(c)
print()

Execution result:

D:\Code\test\.venv-311\Scripts\python.exe D:\Code\test\method_getitem\definitions.py 
D:\Code\test\method_getitem\definitions.py:54: DeprecationWarning: Please, use JsonSerializer[ClassRef].to_json(data); otherwise the method can go wrong.
  JsonSerializer.to_json(c)
D:\Code\test\method_getitem\definitions.py:62: DeprecationWarning: Please, use to_json[ClassRef](data); otherwise the method can go wrong.
  to_json(c)
JsonSerializer:
> JsonSerializerImpl[MyClass].JsonSerializerImpl.to_json(data=MyClass(a='field', b=8))
> JsonSerializerImpl[MyClass].JsonSerializerImpl.to_json(data='15')
> JsonSerializerImpl[str].JsonSerializerImpl.to_json(data=MyClass(a='field', b=8))
> JsonSerializerImpl[str].JsonSerializerImpl.to_json(data='15')
> JsonSerializerImpl[MyClass].JsonSerializerImpl.to_json(data=MyClass(a='field', b=8))

to_json:
> JsonSerializerImpl[MyClass].JsonSerializerImpl.to_json(data=MyClass(a='field', b=8))
> JsonSerializerImpl[MyClass].JsonSerializerImpl.to_json(data='15')
> JsonSerializerImpl[str].JsonSerializerImpl.to_json(data=MyClass(a='field', b=8))
> JsonSerializerImpl[str].JsonSerializerImpl.to_json(data='15')
> JsonSerializerImpl[MyClass].JsonSerializerImpl.to_json(data=MyClass(a='field', b=8))


Process finished with exit code 0

... I was about to write here, but then noticed a really strange behaviour of PyCharm -- when I call these test code from the file I defined the JsonSerializer and to_json, it correctly shows all type errors and all but one deprecation mark, but when from the other file, it goes rogue.

Some PyCharm Screenshots

In the source file:

In the source file

In the external file:

In the external file

This may be related to this PyCharm issue I've reported a while ago.
Can be fixed by explicitly defining the type:

JsonSerializer: JsonSerializerGeneric = JsonSerializerGeneric()
to_json = JsonSerializer

Update

I've found the more proper solution that works with type checkers but requires metaclasses.

An updated example

Replace the JsonSerializerGeneric from the code snippet above with the following:

class JsonSerializerMeta(type):
    def __getitem__(self, item: Type[T]) -> JsonSerializerImpl[T]:
        return JsonSerializerImpl(item)
    
    def to_json(self, data: T) -> T:
        warnings.warn("Please, use JsonSerializer[ClassRef].to_json(data); otherwise the method can go wrong.", DeprecationWarning, 2)
        return self[type(data)].to_json(data)
    
    def __call__(cls, data: T) -> T:
        warnings.warn("Please, use to_json[ClassRef](data); otherwise the method can go wrong.", DeprecationWarning, 2)
        return cls[type(data)](data)


class JsonSerializer(metaclass=JsonSerializerMeta):
    pass

to_json = JsonSerializer

@george-zubrienko
Copy link
Collaborator Author

Looks great, I'll have a more thorough look later this week. Re metaclasses, I think given the concerns expressed about them, maybe we should consider that as a last resort option, but good that it works :)

Re the to_json[MyClass] notation, it is a bit more java-ish, though looks good and clean. However, the question is, what does it bring to the table compared to more verbose-ish Python syntax JsonSerializer[MyClass]().to_json or JsonSerializer[MyClass].to_json?

@USSX-Hares
Copy link
Collaborator

Unfortunately, GH doesn't support custom emoji reactions :c.

I prefer option 2 because it's more clean. Alternatively, we can use the following construction (or any similar):

JsonSerializer.to_json(data, model=MyClass, **options)

@george-zubrienko
Copy link
Collaborator Author

JsonSerializer.to_json(data, model=MyClass, **options)

I like this one, but I'd also like to hear what other think, so let's see what pops up during the week!

@USSX-Hares
Copy link
Collaborator

USSX-Hares commented Jul 24, 2023

Voting section

On this message, and messages directly below, vote with 👍 or 👎 . Any variations are allowed. Suggestions, as well as new options, are allowed.

Voting Option 1

JsonSerializer[MyClass]().to_json(data)

(with parenthesis)

@USSX-Hares
Copy link
Collaborator

Voting Option 2

JsonSerializer[MyClass].to_json()

(No parenthesis)

@USSX-Hares
Copy link
Collaborator

Voting Option 3

JsonSerializer.to_json(data, MyClass)

(Class passed as argument)

@s-vitaliy
Copy link
Collaborator

@USSX-Hares, I would prefer Option 2, but why do you need the generic type for converting to JSON at all? I suppose, we should have a generic from_json method that returns T and untyped to_json that accepts any dataclass.

@george-zubrienko
Copy link
Collaborator Author

george-zubrienko commented Jul 25, 2023

Also #31 should be solved in v1 API more or less - I'm switching pins so this issue gets a bit more attention when people come in :)

@george-zubrienko george-zubrienko pinned this issue Jul 25, 2023
@matt035343
Copy link
Collaborator

matt035343 commented Jul 25, 2023

@USSX-Hares, To make two first options work, there is needed some magic code that (imo) reduces the transparency of this library.

What about a fourth option:

Voting Option 4

JsonSerializer(MyClass).to_json(data)

Basically using the constructer to initialize instead of the custom getter?

@george-zubrienko
Copy link
Collaborator Author

george-zubrienko commented Jul 29, 2023

I'm actually in favor of option 4, as it is more pythonic and involves less magic method juggling.
JsonSerializer.to_json(data)
and JsonSerializer(MyClass).from_json(data)
We'll also need serializer registration as a bonus:
dataclasses_json.serializers.register(...)
In order to make the lib thread-safe, register should be scoped to current thread (no globals), but available for forking in other threads.
I'll start creating issues for implementing v1 API features from Monday :)

@USSX-Hares
Copy link
Collaborator

USSX-Hares commented Jul 29, 2023

A side note: we should allow users to force data type for to_json method for situation when they need to send data without any extra fields added by children classes.

Also, I still recommend the option 3 since it is still Pythonic and avoids confusion like "should I instantiate JsonSerializer for this particular case?" unless you can both always do that and pass additional arguments to the constructor which were per-call parameters earlier.

@george-zubrienko
Copy link
Collaborator Author

george-zubrienko commented Jul 31, 2023

Will open a PR for this one this/next week to get some code in #453

More issues to be created soon after :)

All issues labelled with api/v1 and not-currently-assigned are up for grabs once we complete the #453. I'll also put help-wanted on them after that.

@USSX-Hares
Copy link
Collaborator

@george-zubrienko and the others, I am continuing my little interrogation.

  1. How the annotation-related features should be used in the new, non-decorated approach?
    I refer to the things like dataclasses_json.CatchAll.
  2. How the settings merging should be done? Let's assume we have class CamelCaseClass, whose fields should be serialized as camel case, and a class PleasNoMore which actively declines any extra parameters; one is a field on another.
    • How they should be registered?
    • How the (de)serializer should be called?
    • When and where these parameters are defined, and what to do when a conflict occurs?
    • Can we override them per-stack-only?
  3. Should we allow complex user (de)serializer registration? Like allowing registering generic types (which should know over which they are generic), or specific rules or conditions when/how the (de)serializer should run?

@george-zubrienko
Copy link
Collaborator Author

Great questions :) below are my thoughts

  1. How the annotation-related features should be used in the new, non-decorated approach?
    I refer to the things like dataclasses_json.CatchAll.

I think most of those features should be part of "base serializer case", so we just build them into the JsonSerializer class base functionality. For example, the CatchAll behaviour can be kept as-is, if enabled with a flag on JsonSerializer constructor, or disabled/reconfigured.

Maybe we can also consider exposing those via toml file config section? Like this:

[tool.dataclasses_json.config]
undefined_fields = INCLUDE
letter_case = Camel
default_serializer = dataclasses_json.serializers.JsonSerializer
...

IMO that would be cool and clear and no need for runtime code config.

  1. How the settings merging should be done? Let's assume we have class CamelCaseClass, whose fields should be serialized as camel case, and a class PleasNoMore which actively declines any extra parameters; one is a field on another.

Excellent question, so I'd vote for this:

  • Start from lower level and work your way up: if a property type defines a custom ser behaviour (or declines one by requesting a default serializer), honor that. Meaning:
class CamelCaseClass:
   no_more: PleasNoMore

class PleaseNoMore:
   me_no_camel: int

# assume PleaseNoMore forces default ser, while CameCase wants Camel letter case

a = CamelCaseClass(no_more=PleaseNoMore(me_no_camel=1))
to_json(a) # '{"noMore": { "me_no_camel": 1 }}' 

If it does not, we inherit settings from the parent, if not from parent, then from global

  • How they should be registered?

I'm actually so in favor of the toml file, I'd suggest instead of python code registration, we move this to toml file like i did in the example. We can have smth like

[tools.dataclasses_json.config]
serializers = ["dataclasses_json.serializers.JsonSerializer", "my_package.dcj_serializers.*:, ...]

Registration reads this and adds them all to "current thread" cache. Note that "current thread" may be a bit of wishful thinking on my end re thread safety, so take that with a bit of salt :)

  • How the (de)serializer should be called?

When to_json/from_json is invoked by user code, we first take the target type they specify or, if not provided, the passed object type for to_json case. We go through fields(my_dataclass) and for each field type we do smth like

serde = dataclasses_json.config.registered_serializers.get(field.type, getattr(my_dataclass, field.name))

If we have such serializer, we apply the respecitve to_json/from_json from it, which will again, scan fields, for each field type invoke the registered serializer. Now, if we do not have the registered serializer for the type, we throw a SerializerNotDefinedError, where we specify the type we cannot find the serDe class for.

Now for the settings like letter etc, maybe this? Then each serializer instance can read them from conf easy?

[tool.dataclasses_json.config] # global section
undefined_fields = INCLUDE
letter_case = Camel
default_serializer = dataclasses_json.serializers.JsonSerializer
serializers = ["dataclasses_json.serializers.JsonSerializer", "my_package.dcj_serializers.*:, ...]

[tool.dataclasses_json.config.my_package.dcj_serializers.MyNewSerializer]
undefined_fields = IGNORE
letter_case = Kebab
...
  • When and where these parameters are defined, and what to do when a conflict occurs?

Looking at toml, just follow the levels, if not defined for this serializer, take from global.

  • Can we override them per-stack-only?

Yes, if we allow to_json/from_json to have an additional argument overriding serializer settings

  1. Should we allow complex user (de)serializer registration? Like allowing registering generic types (which should know over which they are generic), or specific rules or conditions when/how the (de)serializer should run?

That would be hard to implement, but even it was easy, I think stuff like this can lead to user code bases losing in maintainability and readability. Depends on the feature, however. Like, conditions to run or not run the specific ser behaviour can be controlled by class structure/class serializer settings. Generics, on the other hand, idk :)

@george-zubrienko
Copy link
Collaborator Author

Alright starting the work on this one finally

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api/v1 enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

7 participants