Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Field validators broken for nested models since v2.3.2 #331

Closed
LachlanMarnham opened this issue Jul 2, 2024 · 14 comments
Closed

Field validators broken for nested models since v2.3.2 #331

LachlanMarnham opened this issue Jul 2, 2024 · 14 comments
Assignees
Labels
enhancement New feature or request feature request

Comments

@LachlanMarnham
Copy link

LachlanMarnham commented Jul 2, 2024

pydantic version: 2.8.0
OS: confirmed broken on Windows and Linux, haven't tested others
pydantic-settings version: 2.3.2

Hello, I noticed that there is a breaking change in pydantic-settings==2.3.2, which I think was introduced by PR: #309. Here is a minimal working example:

from pydantic import BaseModel, field_validator
import os
from pydantic_settings import BaseSettings, SettingsConfigDict


class Child(BaseModel):
    ATTRIBUTE: list[str]

    @field_validator("ATTRIBUTE", mode="before")
    def convert_fields(cls, input: str) -> list[str]:
        return input.split("|")


class Parent(BaseSettings):
    model_config = SettingsConfigDict(
        env_prefix="PREFIX_",
        env_nested_delimiter="__",
        case_sensitive=False,
        env_file_encoding="utf-8",
    )

    child: Child


os.environ["PREFIX_CHILD__ATTRIBUTE"] = "a|b|c"
settings = Parent()
assert settings.child.ATTRIBUTE == ["a", "b", "c"]

This test passes for pydantic-settings up to and including 2.3.1, but has been broken since 2.3.2. The test doesn't simply fail: the instantiation of Parent raises with the following traceback:

Traceback (most recent call last):
  File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\sources.py", line 375, in __call__  
    field_value = self.prepare_field_value(field_name, field, field_value, value_is_complex)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\sources.py", line 568, in prepare_field_value
    env_val_built = self.explode_env_vars(field_name, field, self.env_vars)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\sources.py", line 703, in explode_env_vars
    raise e
  File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\sources.py", line 700, in explode_env_vars
    env_val = self.decode_complex_value(last_key, target_field, env_val)  # type: ignore
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\sources.py", line 187, in decode_complex_value
    return json.loads(value)
           ^^^^^^^^^^^^^^^^^
  File "C:\Python\Python311\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\Python311\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\Python311\Lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\software\pydantic-test\test.py", line 26, in <module>
    settings = Parent()
               ^^^^^^^^
  File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\main.py", line 141, in __init__     
    **__pydantic_self__._settings_build_values(
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\main.py", line 311, in _settings_build_values
    return deep_update(*reversed([source() for source in sources]))
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\main.py", line 311, in <listcomp>   
    return deep_update(*reversed([source() for source in sources]))
                                  ^^^^^^^^
  File "C:\proj\pypoetry\virtualenvs\pydantic-test-fL1284vF-py3.11\Lib\site-packages\pydantic_settings\sources.py", line 377, in __call__  
    raise SettingsError(
pydantic_settings.sources.SettingsError: error parsing value for field "child" from source "EnvSettingsSource"

So it's trying to JSON-decode the string "a|b|c" and failing. But my field_validator is running in "before" mode
so this shouldn't happen. Indeed, if I simply comment-out the field_validator I see exactly the same error. So I believe that decorator is either being ignored altogether, or "before" mode has been broken.

@hramezani
Copy link
Member

Thanks @LachlanMarnham for reporting this.

Actually, the example code shouldn't have worked before because ATTRIBUTE field's type is list[str], and pydantic-settings considers it as a complex field. pydantic-settings parses the values of the complex field and this is the reason that you get the error.

By fixing the bug that we had before, you don't need a field_validator anymore. you can pass a list as value and pydantic-settings parses it.

from pydantic import BaseModel
import os
from pydantic_settings import BaseSettings, SettingsConfigDict


class Child(BaseModel):
    ATTRIBUTE: list[str]


class Parent(BaseSettings):
    model_config = SettingsConfigDict(
        env_prefix="PREFIX_",
        env_nested_delimiter="__",
        case_sensitive=False,
        env_file_encoding="utf-8",
    )

    child: Child


os.environ["PREFIX_CHILD__ATTRIBUTE"] = '["a", "b", "c"]'
settings = Parent()
assert settings.child.ATTRIBUTE == ["a", "b", "c"]

@LachlanMarnham
Copy link
Author

LachlanMarnham commented Jul 2, 2024

Ok thanks @hramezani. I think this is maybe a problem of documentation, with a conflict between pydantic and pydantic-settings. To your point, the pydantic-settings documentation does say:

Complex types like list, set, dict, and sub-models are populated from the environment by treating the environment variable's value as a JSON-encoded string.

Which is fair enough. However, pydantic guarantees that I can avoid this behaviour if I want to:

Before validators run before Pydantic's internal parsing and validation (e.g. coercion of a str to an int). These are more flexible than After validators since they can modify the raw input, but they also have to deal with the raw input, which in theory could be any arbitrary object.

It does feel a bit as if pydantic-settings is breaking a contract with pydantic here, given that internal parsing and validation is in fact taking place before the before validator.

@hramezani
Copy link
Member

Good point @LachlanMarnham.

The json parsing behavior was in pydanatic-settings v1(actually pydantic. because pydantic-settings was in pydantic in V1). That's why we keep the behavior in V2. I think we can have a config flag to disable parsing and let the people parse the value by before field_validator

@LachlanMarnham
Copy link
Author

I think we can have a config flag...

Do you mean a flag in the SettingsConfigDict? I guess such a flag can't really be added to the field_validator itself since that's owned by pydantic. Personally I don't rely on any settings being JSON, but I can imagine that some people might think it is a bit too much for the two options to be "all fields must be JSON" and "JSON de-serlialisation will be switched off anywhere" (unless I am mis-understanding you).

Maybe the way I'm using pydantic-settings is a bit of an edge case, seeing as 2.3.2 was released a few weeks ago and nobody else has raised a ticket yet. If you think the current behaviour is correct, just a bit out of sync with the documentation, I can open a PR to update the docs.

Thanks

@hramezani
Copy link
Member

hramezani commented Jul 3, 2024

Yes, I meant a flag in pydantic-settings SettingsConfigDict like enable_json_parsing which has two values:

  • True - keeps the current behavior of pydantic-setting and is the default to prevent breaking change
  • False - disables the json parsing of values and pass the value to pydantic. we can have field_validator in this case to parse the values if we want.

Maybe the way I'm using pydantic-settings is a bit of an edge case, seeing as 2.3.2 was released a few weeks ago and nobody else has raised a ticket yet. If you think the current behaviour is correct, just a bit out of sync with the documentation, I can open a PR to update the docs.

Yes, I think the current behaviour is correct. what do you think? which part is wrong?

@fajpunk
Copy link

fajpunk commented Jul 30, 2024

I can imagine that some people might think it is a bit too much for the two options to be "all fields must be JSON" and "JSON de-serlialisation will be switched off anywhere"

I am "some people" here 😄 I would want to do custom parsing for only certain fields. How about some way to specify a list of fields for which to disable JSON parsing? Maybe a disable_json_parsing_for entry in SettingsConfigDict that takes a list of field names?

@hramezani
Copy link
Member

hramezani commented Jul 31, 2024

@fajpunk I updated my previous comment. the idea is to implement something like Annotated[int, Nodecode] to disable JSON parsing on the field level.

@hramezani hramezani added enhancement New feature or request feature request labels Aug 5, 2024
@Numerlor
Copy link

Numerlor commented Sep 6, 2024

The json parsing behavior was in pydanatic-settings v1(actually pydantic. because pydantic-settings was in pydantic in V1). That's why we keep the behavior in V2.

Was the behavior present in V1? I just updated a pre validator from v1 that always worked fine to a before validator and ran into this issue

@hramezani
Copy link
Member

@Numerlor I am not sure. Could you give me the validator that works in V1 and doesn't work in V2. I can probably help you if there is a way

@Numerlor
Copy link

Numerlor commented Sep 7, 2024

@Numerlor I am not sure. Could you give me the validator that works in V1 and doesn't work in V2. I can probably help you if there is a way

The behaviors seem to differ between nested models and attributes directly on BaseSettings in v1, directly on BaseSettings produces the same error, nested models are parsed without issue

repro code v2

Same with BeforeValidator in Annotated, but this is the more direct translation of the v1 code

import json
import os
import typing

import pydantic
import pydantic_settings
from pydantic_settings import SettingsConfigDict


def parse_json_or_space_delimited_list(string: str) -> list[str]:
    try:
        return typing.cast(list[str], json.loads(string))
    except json.JSONDecodeError:
        return string.split(" ")


class Model(pydantic.BaseModel):
    test: list[str] = []
    test_validator = pydantic.field_validator("test", mode="before")(parse_json_or_space_delimited_list)


class Settings(pydantic_settings.BaseSettings):
    test: Model
    model_config = SettingsConfigDict(env_nested_delimiter="__")


os.environ["test__test"] = "a b c"

Settings()
Traceback (most recent call last):
  File "/home/numerlor/.../.../dist/export/python/virtualenvs/python-default/3.12.5/lib/python3.12/site-packages/pydantic_settings/sources.py", line 375, in __call__
    field_value = self.prepare_field_value(field_name, field, field_value, value_is_complex)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/numerlor/.../.../dist/export/python/virtualenvs/python-default/3.12.5/lib/python3.12/site-packages/pydantic_settings/sources.py", line 577, in prepare_field_value
    raise e
  File "/home/numerlor/.../.../dist/export/python/virtualenvs/python-default/3.12.5/lib/python3.12/site-packages/pydantic_settings/sources.py", line 574, in prepare_field_value
    value = self.decode_complex_value(field_name, field, value)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/numerlor/.../.../dist/export/python/virtualenvs/python-default/3.12.5/lib/python3.12/site-packages/pydantic_settings/sources.py", line 187, in decode_complex_value
    return json.loads(value)
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/c/Users/numerlor/AppData/Roaming/JetBrains/PyCharm2024.2/scratches/scratch.py", line 21, in <module>
    Settings()
  File "/home/numerlor/.../.../dist/export/python/virtualenvs/python-default/3.12.5/lib/python3.12/site-packages/pydantic_settings/main.py", line 141, in __init__
    **__pydantic_self__._settings_build_values(
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/numerlor/.../.../dist/export/python/virtualenvs/python-default/3.12.5/lib/python3.12/site-packages/pydantic_settings/main.py", line 311, in _settings_build_values
    return deep_update(*reversed([source() for source in sources]))
                                  ^^^^^^^^
  File "/home/numerlor/.../.../dist/export/python/virtualenvs/python-default/3.12.5/lib/python3.12/site-packages/pydantic_settings/sources.py", line 377, in __call__
    raise SettingsError(
pydantic_settings.sources.SettingsError: error parsing value for field "test" from source "EnvSettingsSource"
working v1
import json
import os
import typing

import pydantic.v1


def parse_json_or_space_delimited_list(string: str) -> list[str]:
    try:
        return typing.cast(list[str], json.loads(string))
    except json.JSONDecodeError:
        return string.split(" ")


class Model(pydantic.v1.BaseModel):
    test: list[str] = []
    test_validator = pydantic.v1.validator("test", pre=True)(parse_json_or_space_delimited_list)
    

class Settings(pydantic.v1.BaseSettings):
    test: Model
    class Config:
        env_nested_delimiter="__"

os.environ["test__test"] = "a b c"

Settings()

@hramezani
Copy link
Member

@Numerlor, in V2, json.loads is used for decoding complex values. So, when pydantic-settings collects values and tries to decode complex values by json.loads. Right now, it is not possible to bypass the default behavior. but you can fix your problem by a custom env source settings class like:

import os
from typing import Any, Tuple, Type

from pydantic import BaseModel
from pydantic.fields import FieldInfo

from pydantic_settings import BaseSettings, SettingsConfigDict, EnvSettingsSource, PydanticBaseSettingsSource


class MyEnvSettingsSource(EnvSettingsSource):
    def decode_complex_value(self, field_name: str, field: FieldInfo, value: Any) -> Any:
        return value.split(" ")

class NestedSettings(BaseModel):
    test: list[str]

class Settings(BaseSettings):
    test: NestedSettings

    model_config = SettingsConfigDict(env_nested_delimiter="__")

    @classmethod
    def settings_customise_sources(
        cls,
        settings_cls: Type[BaseSettings],
        init_settings: PydanticBaseSettingsSource,
        env_settings: PydanticBaseSettingsSource,
        dotenv_settings: PydanticBaseSettingsSource,
        file_secret_settings: PydanticBaseSettingsSource,
    ) -> Tuple[PydanticBaseSettingsSource, ...]:

        return init_settings, MyEnvSettingsSource(settings_cls), dotenv_settings, file_secret_settings

os.environ["test__test"] = "a b c"

print(Settings())

@mckirk
Copy link

mckirk commented Dec 2, 2024

For what it's worth, I've also just run into this issue, trying to do something like the following:

import os
from typing import Annotated

from pydantic import BeforeValidator
from pydantic_settings import BaseSettings

CommaSet = Annotated[set[str], BeforeValidator(lambda s: set(x for x in s.split(",") if x))]


class Example(BaseSettings):
    EXAMPLE: CommaSet


os.environ["EXAMPLE"] = "asdf,fdsa"

print(Example())

It really would be nice if there was a way to tell pydantic-settings what the expected 'input' type for the type is and base the 'is complex value' logic on that. There is the 'WithJsonSchema' annotator already in Pydantic for example, maybe somethinge like that. Or just something like a 'DontDeJSON' annotator that pydantic-settings respects.

@hramezani
Copy link
Member

I created #492 to add NoDecode and ForceDecode annotations. Also the enable_decoding config can control the decoding behavior for the whole model fields

@hramezani
Copy link
Member

Addressed in 3667aed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request feature request
Projects
None yet
Development

No branches or pull requests

5 participants