Poc to demonstrate using pydantic for requests/responses #149

nabla-c0d3 · 2020-11-14T19:08:43Z

Related to freedomofpress/securedrop-client#1766 (and freedomofpress/securedrop-client#1765 and freedomofpress/securedrop-client#1764), this is to demonstrate the usage of pydantic for defining requests and responses.
Why pydantic:

The declarations are very read-able.
It works with type annotations.
It throws an exception if the parsing of the message fail (ie. if the field's type in the JSON message does not match the class declaration, or a field is missing).
I've used it extensively in other projects and it's a great library. Note that Python 3.6 is required for pydantic.

I made the change to the User object only, and if this approach is approved I could do the rest of the objects.

Other objects are going to be tricky tho due to freedomofpress/securedrop-client#1764 . For example right now the Source and Reply objects are used both to parse a response returned by the server, and to pass around a UUID when trying to fetch an object from the server (where the rest of the fields are set to a dummy value). This approach is visible here for example: #32.
So I might have to do API-breaking changes to properly fix this, but I don't know if that's an option or not.

kushaldas · 2020-12-08T16:18:22Z

After digging more into the usage of pydantic, my personal preference would be not to use it. Major reason is in two parts:

We are receiving data from our own Servers instances (no third party data source), means we do have much high confidence into the data we are receiving (note: I am not talking about safe/unsafe data, but rather the actual data types).
Right now we do have mypy checking data types in to the codebase, and that is not effecting the runtime. Means if even we do a mistake in writing wrong data type, the runtime is not getting effected and user experience is not hampered. Any runtime check in the other hand has to be 100% same and accurate for this, or else it will raise exceptions.

rmol · 2020-12-08T23:13:25Z

I'm coming to the same conclusion as Kushal, I'm afraid.

It looks like it might be difficult to integrate pydantic with other class-based validation like SQLAlchemy models or Flask forms. If the SDK is the only place we can apply it in the SecureDrop ecosystem, I'm not sure the benefit would outweigh the cognitive cost.

A bigger problem for me is that incoming data can be altered to fit the model specification, particularly considering the purpose of this project. I think it would be OK for the SDK to double-check the data it's moving, as a backup for the validation happening in the server or workstation, but it should never be silently truncating numeric values or converting between types. It's a conduit.

We have seen problems where a value of the wrong type makes its way into a column, thanks to quirks of the combination of SQLAlchemy and SQLite. The last thing we need is a new validation library being itself a source of those bugs. 🙁

If you have integrated pydantic with other validation systems, or if I've missed a way to prevent inadvertently transforming data as it's passing through the SDK, let me know. I'd be happy to take another look.

nabla-c0d3 · 2020-12-09T06:12:13Z

This will be my last post on this PR/issue as this was just a proposal, but here is another example from the SDK code.

Current version:

class API:

    def get_sources(self) -> List[Source]:
        path_query = "api/v1/sources"
        method = "GET"

        data, status_code, headers = self._send_json_request(
            method, path_query, headers=self.req_headers, timeout=self.default_request_timeout,
        )

        sources = data["sources"]
        result = []  # type: List[Source]

        for source in sources:
            s = Source(**source)
            result.append(s)

        return result

With pydantic:

class SourceField(pydantic.BaseModel):
    uuid: UUID
    add_star_url: str
    is_flagged: bool
    # etc...


class SourcesResponse(pydantic.BaseModel):
    sources: List[SourceField]


class API:

    def get_sources(self) -> List[Source]:
        path_query = "api/v1/sources"
        method = "GET"

        data, status_code, headers = self._send_json_request(
            method, path_query, headers=self.req_headers, timeout=self.default_request_timeout,
        )
        return SourcesResponse(**data)

I think that the code to parse the response is a lot easier to read in 2) and less prone to coding mistakes. Also adding a new field to the response will be a lot easier.
The same logic can be applied to any code/project that needs to send requests, or parse responses, and the pydantic-style declaration tells the reader that the types and the messages formats are 100% correct as they are enforced at runtime. The code becomes self-documenting.

Lastly, to reply to some of your comments and describe my thoughts a bit more:

Means if even we do a mistake in writing wrong data type, the runtime is not getting effected and user experience is not hampered. Any runtime check in the other hand has to be 100% same and accurate for this, or else it will raise exceptions.

I would argue that data types/formats being "100% same and accurate for this, or else it will raise exceptions" is exactly the behavior that you want. With a wrong data type, the runtime will then be operating on wrong assumptions; it might work for some time or in some cases, but there will definitely be bugs at some point. These bugs will probably be tricky to troubleshoot because the bugs will be side-effects of the initially-wrong assumptions.
Instead you want the code to crash as soon as assumptions are no longer correct, for example as soon as the developer changes a type and makes a mistake doing so. This allows the developer to catch problems for example when they run the tests, instead of letting the problem reach production.

It looks like it might be difficult to integrate pydantic with other class-based validation like SQLAlchemy models or Flask forms

I've used pydantic extensively with SQLAlchemy and didn't run into problems. I have not used with Flask tho.

That's all for pydantic... Ultimately it is of course not up to me to make a decision on this. My personal experience of using pydantic has been great, but perhaps there are better approaches for the securedrop projects.

rmol · 2020-12-09T16:12:47Z

This will be my last post on this PR/issue as this was just a proposal, but here is another example from the SDK code.

Thanks for the thought you've put into this. I understand not wanting to sink more time into something that we don't seem likely to adopt, so don't feel obligated to respond.

I've used pydantic extensively with SQLAlchemy and didn't run into problems.

Is any of that work public? I'd like to see how you integrated them.

I would argue that data types/formats being "100% same and accurate for this, or else it will raise exceptions" is exactly the behavior that you want.
I think that the code to parse the response is a lot easier to read in 2) and less prone to coding mistakes. Also adding a new field to the response will be a lot easier.

I agree with all of that. But ideally, this SDK should either convey values verbatim, or raise an exception. It's not obvious to me how to avoid pydantic's value coercion, at least right now. I did find pydantic/pydantic#1098 which suggests that better/easier strict parsing might be coming, so maybe we can take another look then, and if we've figured out how to integrate it with the other tools I mentioned, we could bring it in.

nabla-c0d3 · 2020-12-12T03:16:10Z

Our go-to web stack at work has evolved throughout the years and is now falcon + pydantic + sqlalchemy. Unfortunately, none of this code is public. The code base we have with this stack is pretty big tho.

As for the type coercion done by pydantic, I guess I don't see it as a big problem? The main thing when using pydantic is that the types in the message format/class should not be considered as "type hints", but as the source of truth, and therefore must be correct.
If a field is defined as an int but in reality it should be a float, that's a mistake in the field's declaration.

legoktm · 2023-12-20T17:42:42Z

Since this was just a POC, I've converted this into an issue at freedomofpress/securedrop-client#1750.

Switch User message to pydantic

43da074

nabla-c0d3 mentioned this pull request Nov 14, 2020

Update __init__ args to show actual type signature freedomofpress/securedrop-client#1766

Open

nabla-c0d3 added 5 commits November 14, 2020 11:13

Swich import order

dfe76ad

Run isort

6901c4d

Fix dev requirements

0ed6dc2

Fix tests on real server

67bf557

Use UUID for more accurate typing

3128ab6

legoktm mentioned this pull request Dec 20, 2023

[sdk] Use pydantic for requests/responses freedomofpress/securedrop-client#1750

Open

legoktm closed this Dec 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Poc to demonstrate using pydantic for requests/responses #149

Poc to demonstrate using pydantic for requests/responses #149

nabla-c0d3 commented Nov 14, 2020 •

edited

Loading

kushaldas commented Dec 8, 2020

rmol commented Dec 8, 2020

nabla-c0d3 commented Dec 9, 2020

rmol commented Dec 9, 2020

nabla-c0d3 commented Dec 12, 2020

legoktm commented Dec 20, 2023

Poc to demonstrate using pydantic for requests/responses #149

Poc to demonstrate using pydantic for requests/responses #149

Conversation

nabla-c0d3 commented Nov 14, 2020 • edited Loading

kushaldas commented Dec 8, 2020

rmol commented Dec 8, 2020

nabla-c0d3 commented Dec 9, 2020

rmol commented Dec 9, 2020

nabla-c0d3 commented Dec 12, 2020

legoktm commented Dec 20, 2023

nabla-c0d3 commented Nov 14, 2020 •

edited

Loading