Skip to content
This repository has been archived by the owner on Jan 7, 2024. It is now read-only.

Poc to demonstrate using pydantic for requests/responses #149

Closed

Conversation

nabla-c0d3
Copy link

@nabla-c0d3 nabla-c0d3 commented Nov 14, 2020

Related to freedomofpress/securedrop-client#1766 (and freedomofpress/securedrop-client#1765 and freedomofpress/securedrop-client#1764), this is to demonstrate the usage of pydantic for defining requests and responses.
Why pydantic:

  • The declarations are very read-able.
  • It works with type annotations.
  • It throws an exception if the parsing of the message fail (ie. if the field's type in the JSON message does not match the class declaration, or a field is missing).
  • I've used it extensively in other projects and it's a great library. Note that Python 3.6 is required for pydantic.

I made the change to the User object only, and if this approach is approved I could do the rest of the objects.

Other objects are going to be tricky tho due to freedomofpress/securedrop-client#1764 . For example right now the Source and Reply objects are used both to parse a response returned by the server, and to pass around a UUID when trying to fetch an object from the server (where the rest of the fields are set to a dummy value). This approach is visible here for example: #32.
So I might have to do API-breaking changes to properly fix this, but I don't know if that's an option or not.

@kushaldas
Copy link
Contributor

After digging more into the usage of pydantic, my personal preference would be not to use it. Major reason is in two parts:

  • We are receiving data from our own Servers instances (no third party data source), means we do have much high confidence into the data we are receiving (note: I am not talking about safe/unsafe data, but rather the actual data types).
  • Right now we do have mypy checking data types in to the codebase, and that is not effecting the runtime. Means if even we do a mistake in writing wrong data type, the runtime is not getting effected and user experience is not hampered. Any runtime check in the other hand has to be 100% same and accurate for this, or else it will raise exceptions.

@rmol
Copy link
Contributor

rmol commented Dec 8, 2020

I'm coming to the same conclusion as Kushal, I'm afraid.

It looks like it might be difficult to integrate pydantic with other class-based validation like SQLAlchemy models or Flask forms. If the SDK is the only place we can apply it in the SecureDrop ecosystem, I'm not sure the benefit would outweigh the cognitive cost.

A bigger problem for me is that incoming data can be altered to fit the model specification, particularly considering the purpose of this project. I think it would be OK for the SDK to double-check the data it's moving, as a backup for the validation happening in the server or workstation, but it should never be silently truncating numeric values or converting between types. It's a conduit.

We have seen problems where a value of the wrong type makes its way into a column, thanks to quirks of the combination of SQLAlchemy and SQLite. The last thing we need is a new validation library being itself a source of those bugs. 🙁

If you have integrated pydantic with other validation systems, or if I've missed a way to prevent inadvertently transforming data as it's passing through the SDK, let me know. I'd be happy to take another look.

@nabla-c0d3
Copy link
Author

This will be my last post on this PR/issue as this was just a proposal, but here is another example from the SDK code.

  1. Current version:
class API:

    def get_sources(self) -> List[Source]:
        path_query = "api/v1/sources"
        method = "GET"

        data, status_code, headers = self._send_json_request(
            method, path_query, headers=self.req_headers, timeout=self.default_request_timeout,
        )

        sources = data["sources"]
        result = []  # type: List[Source]

        for source in sources:
            s = Source(**source)
            result.append(s)

        return result
  1. With pydantic:
class SourceField(pydantic.BaseModel):
    uuid: UUID
    add_star_url: str
    is_flagged: bool
    # etc...


class SourcesResponse(pydantic.BaseModel):
    sources: List[SourceField]


class API:

    def get_sources(self) -> List[Source]:
        path_query = "api/v1/sources"
        method = "GET"

        data, status_code, headers = self._send_json_request(
            method, path_query, headers=self.req_headers, timeout=self.default_request_timeout,
        )
        return SourcesResponse(**data)

I think that the code to parse the response is a lot easier to read in 2) and less prone to coding mistakes. Also adding a new field to the response will be a lot easier.
The same logic can be applied to any code/project that needs to send requests, or parse responses, and the pydantic-style declaration tells the reader that the types and the messages formats are 100% correct as they are enforced at runtime. The code becomes self-documenting.

Lastly, to reply to some of your comments and describe my thoughts a bit more:

Means if even we do a mistake in writing wrong data type, the runtime is not getting effected and user experience is not hampered. Any runtime check in the other hand has to be 100% same and accurate for this, or else it will raise exceptions.

I would argue that data types/formats being "100% same and accurate for this, or else it will raise exceptions" is exactly the behavior that you want. With a wrong data type, the runtime will then be operating on wrong assumptions; it might work for some time or in some cases, but there will definitely be bugs at some point. These bugs will probably be tricky to troubleshoot because the bugs will be side-effects of the initially-wrong assumptions.
Instead you want the code to crash as soon as assumptions are no longer correct, for example as soon as the developer changes a type and makes a mistake doing so. This allows the developer to catch problems for example when they run the tests, instead of letting the problem reach production.

It looks like it might be difficult to integrate pydantic with other class-based validation like SQLAlchemy models or Flask forms

I've used pydantic extensively with SQLAlchemy and didn't run into problems. I have not used with Flask tho.

That's all for pydantic... Ultimately it is of course not up to me to make a decision on this. My personal experience of using pydantic has been great, but perhaps there are better approaches for the securedrop projects.

@rmol
Copy link
Contributor

rmol commented Dec 9, 2020

This will be my last post on this PR/issue as this was just a proposal, but here is another example from the SDK code.

Thanks for the thought you've put into this. I understand not wanting to sink more time into something that we don't seem likely to adopt, so don't feel obligated to respond.

I've used pydantic extensively with SQLAlchemy and didn't run into problems.

Is any of that work public? I'd like to see how you integrated them.

I would argue that data types/formats being "100% same and accurate for this, or else it will raise exceptions" is exactly the behavior that you want.
I think that the code to parse the response is a lot easier to read in 2) and less prone to coding mistakes. Also adding a new field to the response will be a lot easier.

I agree with all of that. But ideally, this SDK should either convey values verbatim, or raise an exception. It's not obvious to me how to avoid pydantic's value coercion, at least right now. I did find pydantic/pydantic#1098 which suggests that better/easier strict parsing might be coming, so maybe we can take another look then, and if we've figured out how to integrate it with the other tools I mentioned, we could bring it in.

@nabla-c0d3
Copy link
Author

Our go-to web stack at work has evolved throughout the years and is now falcon + pydantic + sqlalchemy. Unfortunately, none of this code is public. The code base we have with this stack is pretty big tho.

As for the type coercion done by pydantic, I guess I don't see it as a big problem? The main thing when using pydantic is that the types in the message format/class should not be considered as "type hints", but as the source of truth, and therefore must be correct.
If a field is defined as an int but in reality it should be a float, that's a mistake in the field's declaration.

@legoktm
Copy link
Member

legoktm commented Dec 20, 2023

Since this was just a POC, I've converted this into an issue at freedomofpress/securedrop-client#1750.

@legoktm legoktm closed this Dec 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants