Define a JSON type #182

Closed
brettcannon opened this issue Feb 19, 2016 · 81 comments
Labels
topic: feature Discussions about new features for Python's type annotations

Comments

@brettcannon
Member

JSON is such a common interchange format it might make sense to define it as a specific type.

JSON = t.Union[str, int, float, bool, None, t.Mapping[str, 'JSON'], t.List['JSON']]

Not sure if this should go into typing or be introduced as json.JSONType instead (or if it's even worth it considering the variability of the type).

@gvanrossum
Member

I tried to do that but a recursive type alias doesn't work in mypy right now, and I'm not sure how to make it work. In the meantime I use JsonDict = Dict[str, Any] (which is not very useful but at least clarifies that the keys are strings), and Any for places where a more general JSON type is expected.

(I'm sure you meant t.Mapping[str, 'JSON'].)

@brettcannon
Member Author

You are right about what I meant and I fixed my comment to not confuse anyone in the future.

And I understand about the lack of recursive object support.

Would this be a better definition?

JSONValue = t.Union[str, int, float, bool, None, t.Dict[str, t.Any], t.List[t.Any]]
JSONType = t.Union[t.Dict[str, JSONValue], t.List[JSONValue]]

If you read RFC 4627, it says a JSON text must be an object or array at the top level (RFC 7159 loosens that to any JSON value, but it's not an accepted standard). If you want to play it safe with what json.loads() takes, then you can just flatten it to:

JSONType = t.Union[str, int, float, bool, None, t.Dict[str, t.Any], t.List[t.Any]]

I guess the real question is how far you want to take this, because if you assume that most JSON objects only go, e.g., 4 levels deep, you could handcraft accuracy to that level:

_JSONType_0 = t.Union[str, int, float, bool, None, t.Dict[str, t.Any], t.List[t.Any]]
_JSONType_1 = t.Union[str, int, float, bool, None, t.Dict[str, _JSONType_0], t.List[_JSONType_0]]
_JSONType_2 = t.Union[str, int, float, bool, None, t.Dict[str, _JSONType_1], t.List[_JSONType_1]]
_JSONType_3 = t.Union[str, int, float, bool, None, t.Dict[str, _JSONType_2], t.List[_JSONType_2]]
JSONType = t.Union[str, int, float, bool, None, t.Dict[str, _JSONType_3], t.List[_JSONType_3]]

But then again the union of those objects is pretty broad so this might not be the most useful type hint. :)

@gvanrossum
Member

I guess that'll work, but I'm not convinced that it's very useful to
do the multiple levels.

The next question is where this would live? Would you add it to
typing.py? Or to the json module? It would have to be added both to
json.pyi and to the actual implementation module. :-(

@brettcannon
Member Author

OK, so JSONType = t.Union[str, int, float, bool, None, t.Dict[str, t.Any], t.List[t.Any]] seems to be the best solution.

As for where it should go, I don't have a good answer unfortunately. It's like the collections.abc issue; do we have a single place to keep all types -- i.e., typing -- or do we keep types in the specific modules that they relate to (i.e., json in this case)? I guess this would be the first type that wasn't a generic container if we add it to the stdlib somewhere, so there is no precedent to go by.

If we put it in typing then at least all types related to the stdlib are in a single location, which is handy for only having to do import typing as t to get at all types. Unfortunately that doesn't work for third-party libraries, so it doesn't seem like the best way to go. So I guess my suggestion is the json module should house it and keep the name JSONType for the module attribute. If you agree I will open an issue on bugs.python.org to add the type to Python 3.6 and then also an accompanying issue for https://github.com/python/typeshed to add a json.pyi and you can close this issue. Otherwise I'll submit a PR to add the type to typing.

@gvanrossum
Member

I think it's best to add it to the json module; we can't keep adding everything to typing.py (even the io and re types there are already questionable). Code that wants to use these in Python 3.5 or earlier can write

if False:
    from json import JSONType

(Or they can copy the definition into their own code.)

Question: should we name it JsonType or JSONType? There doesn't seem to be a strong convention here in the Python stdlib -- we have XmlListener and HTMLParser... But somehow I am beginning to prefer AcronymsAreWords.

@gvanrossum
Member

Actually, since the json module consistently uses JSONWhatevs, it should be JSONType.

@brettcannon
Member Author

PEP 8 says to capitalize abbreviations.

Opened an issue for the stdlib at http://bugs.python.org/issue26396 and one for typeshed at python/typeshed#84.

@gvanrossum gvanrossum added this to the 3.5.2 milestone Mar 18, 2016
@gvanrossum
Member

I'm marking this for 3.5.2 so we at least have the discussion. I'm still not sure of the solution -- add it to typeshed/.../json.pyi or to typing.py? It can't appear in the stdlib json module until 3.6 but it could appear in typing.py in 3.5.2 (since typing.py is provisional), but I'm not excited about pushing everything to typing.py. So maybe adding it to typeshed/**/json.pyi now and the stdlib json module in 3.6 would be best? If you want to use it then you'd have to write if False: from json import JsonObject.

(I've got a feeling I'm just summarizing where we ended up before but I'm currently in the mood to make the definitive list of things to discuss and/or implement before 3.5.2.)

@JukkaL
Contributor

JukkaL commented Mar 18, 2016

I'm not a fan of having a "partial" JSON type that soon degenerates into Any. Type checkers would enforce things inconsistently. As soon as you descend into a JSON object you'd have to manually annotate the result to get type checking for the component object, and this would be hard to do consistently.

Having a recursive JSON type seems like a better idea to me, but even then I'd like to see how the type works for real-world code before including it in the PEP. I suspect that the majority of code doing JSON parsing actually doesn't perform enough isinstance checks when manipulating JSON objects for the code to type check cleanly. I wouldn't like PEP 484 to require programmers to jump through hoops to get their code to type check. For example, just today I reviewed some JSON parsing code that does not perform enough checks to pass type checking if it had used a strict type for JSON, but I think that the code was fine (@gvanrossum do you recognize what code I'm talking about?) :-)

Anyway, if programmers want to use such a partial type, they can define the alias and use it in their code even without making it official, though they may have to introduce some explicit type annotations when interacting with library code that doesn't use the type.

@gvanrossum
Member

Two problems with adding a recursive JSON type to the PEP:

  • IIRC Brett and I tried and failed to come up with a recursive definition that worked in mypy
  • I don't believe JSON is special enough to deserve a place in the PEP or typing (re and io are borderline but they are way more fundamental than JSON)

@brettcannon
Member Author

The summary @gvanrossum gave of where things left off was accurate. Didn't come up with a recursive type that could work.

In response to @JukkaL about usefulness, I view it as useful for specifying what json.load() returns, not what json.dump() accepts. This is what I came across in my own code when I was trying to do proper type hinting but didn't have a way better than Any to express a method parameter that was accepting a JSON object that was received from GitHub's API.

@gvanrossum
Member

Somehow this was closed but we don't even have consensus!

@gvanrossum gvanrossum reopened this Mar 21, 2016
@JukkaL
Contributor

JukkaL commented Mar 21, 2016

I forgot to confirm that mypy doesn't support the kinds of recursive types discussed above, and there are no concrete plans to implement them.

@brettcannon I agree that JSON values are common in programs, but I'm not convinced that having a precise type would make it easy to type check common code that processes JSON data, because before accessing any value read from a JSON object, the code needs to use isinstance to narrow down the type from the union (assuming precise type checking of union types, similar to mypy). Most code I've seen is sloppy about this. Some code could be argued to be broken, but it's also possible that there is a top-level try/except statement that handles all errors, so the code might actually mostly do the right thing. (I can find an example if you are unsure about what I mean.) Also, it's possible that the code first verifies the entire JSON data structure and then accesses it, and the latter assumes that it has the correct structure. In the latter case a structural "dictionary-as-struct" type and an explicit cast might be best.

As there are many valid ways of processing JSON data, I think that Any is a reasonable default for the library stubs. User code could then use whatever static type for JSON data they want by adding an explicit type annotation. Thus I argue that it's not a good idea to make json.load() return a statically typed value. json.dump() is a little different and there a static argument type might make sense, but we don't have the means to describe the type of the argument in a useful enough way right now.

In order to describe types of JSON values precisely, these features would be useful:

  1. General recursive types -- for arbitrary JSON values
  2. "Dictionary-as-struct" types (Type for heterogeneous dictionaries with string keys #28) -- for JSON values conforming to a particular schema

(I started writing a proposal for (2) a while ago but got distracted.)

Neither is currently defined in PEP 484. The first one would be easy to specify but potentially tricky to implement. The latter would be tricky to specify and implement, and not all the use cases are clear to me yet. I suggest that we wait until a tool implements one or the other and then we can continue this discussion at a more concrete level.
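
For readers coming to this later, the two processing styles described above can be sketched roughly as follows. The "dictionary-as-struct" idea from #28 was eventually standardized as typing.TypedDict (PEP 589, Python 3.8+), which is what the second function assumes. The User schema, the payload, and both function names are hypothetical and exist only for illustration.

```python
import json
from typing import Any, Dict, List, TypedDict, Union, cast

# Non-recursive alias from earlier in the thread; nested values degrade to Any.
JSONType = Union[str, int, float, bool, None, Dict[str, Any], List[Any]]


class User(TypedDict):
    # Hypothetical schema, invented for this example.
    login: str
    id: int


def login_checked(value: JSONType) -> str:
    # Style 1: narrow the union with isinstance before touching the value.
    if isinstance(value, dict):
        login = value.get("login")
        if isinstance(login, str):
            return login
    raise ValueError("expected an object with a string 'login' field")


def login_trusted(value: Any) -> str:
    # Style 2: validate elsewhere (or trust the source), then cast once.
    user = cast(User, value)
    return user["login"]


payload = json.loads('{"login": "octocat", "id": 1}')
```

Note that cast() performs no runtime check, so the second style is only as safe as whatever validation happened before it.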

@gvanrossum
Member

I am very tempted to drop this idea. In my own code I use this:

JsonDict = Dict[str, Any]

which happens to cover perfectly what I'm doing (even though it sounds like Jukka has found some holes :-).

That really doesn't reach the threshold for adding it to a stub file to me (the definition exists in exactly two files).

@brettcannon
Member Author

I'm fine with closing this if recursive types aren't in the pipeline. Obviously the generic JSON type Guido and I came up with is not a very tight definition and so is of limited value. If the preference is only to worry about tight-fitting type boundaries then this doesn't really make sense.

@gvanrossum
Member

OK, I'm closing this, because of a combination of things:

  • We can't define JsonObject in a very tight way
  • It's a simple one-liner to define a suitable JsonObject in your own code
  • It's difficult to roll out the change in a useful way because we can't add it to the stdlib json module until 3.6 and it really doesn't belong in typing.py

Maybe we can just add the non-tight version to 3.6 and worry about tightening it up if/when we ever implement recursive types.

@JukkaL
Contributor

JukkaL commented Mar 22, 2016

I suggest that even if we add the type to the module, we wouldn't use it as the return type of the load functions, at least for now (I discussed this above).

@gvanrossum
Member

gvanrossum commented Mar 22, 2016 via email

@JukkaL
Contributor

JukkaL commented Mar 22, 2016

If it returns JSONType then the first thing any code needs to do is to run an isinstance check for the returned value, as due to being a union, most operations won't be valid on it. However, in some cases it's arguably okay to assume that the returned value is a dict, for example. If this is an internal data file, we can be reasonably sure that the format is what we expect. I didn't think about object_hook but that might be another thing to consider.

If a user wants to do the type check they can add an annotation if the return type is Any:

data = json.load(...)  # type: JSONType
if isinstance(data, dict): 
    ...

This, for example, would be rejected if the return type is a union, but would be fine if the return type is Any:

data = json.load(...)  # type: Dict[str, Any]   # error if load() return type is union
...
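
On Python 3.6+ the same idea can be written with variable annotations (PEP 526) instead of type comments. A minimal sketch, assuming the stubs keep declaring json.loads() as returning Any; the raw document below is made up:

```python
import json
from typing import Any, Dict

raw = '{"name": "example", "tags": ["a", "b"]}'  # made-up document

# json.loads() is declared to return Any, so the caller picks the static type.
data: Dict[str, Any] = json.loads(raw)

# No isinstance check is needed before indexing, because the declared type is a dict.
name = data["name"]
```

If load() returned a union instead, the annotated assignment above is the line a checker would reject.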

@gvanrossum
Member

OK, you've convinced me that def load(fp) -> JSONType is a bad idea. I guess def dump(obj: JSONType, fp) is still acceptable except for the hook -- but because of the hook we can't use it there either. Maybe we should just leave well enough alone. @brettcannon?

@brettcannon
Member Author

I'm fine with tossing this whole idea out. I wasn't sure whether typing was trying to offer something whenever someone was willing to specify the type, or only when the type matching was tight. It seems like the latter is how you want to treat types, which is fine and makes something as variable and loose as JSON not worth worrying about.

@gvanrossum
Member

I think a key requirement is that stubs should not reject code that is in fact correct. Better to accept code that's wrong. Unless of course the correct code is very convoluted, but I don't think that using the hook qualifies as convoluted, and there's just too much code around that reads JSON code and dives in as if it knows what's there, accepting random TypeErrors if the JSON data is wrong.

@lk-geimfari

lk-geimfari commented Oct 22, 2017

@gvanrossum You said that you use JsonDict = Dict[str, Any], but what should be used if the JSON has a structure like this:

[
    {...},
    {...},
]

Is it correct?

from typing import Dict, List, Union, Any

JSONType = Union[
    Dict[str, Any],
    List[dict, Any],
]
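
For what it's worth, List accepts exactly one type parameter, so List[dict, Any] is not a valid annotation. A sketch of an alias that covers both a top-level object and a top-level list of objects, while keeping Any for nested values, might look like this (count_records is a hypothetical helper added only to show the narrowing):

```python
import json
from typing import Any, Dict, List, Union

JsonDict = Dict[str, Any]
# Either a single object or a list of objects at the top level.
JSONType = Union[JsonDict, List[JsonDict]]


def count_records(doc: JSONType) -> int:
    # Narrow the union before using list-only operations.
    if isinstance(doc, list):
        return len(doc)
    return 1


records = json.loads('[{"a": 1}, {"b": 2}]')
single = json.loads('{"a": 1}')
```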

@pbryan

pbryan commented Oct 29, 2022 via email

@antonagestam

antonagestam commented Oct 29, 2022

@pbryan mypy will enable recursive aliases by default in the next release: python/mypy#13516

@joooeey

joooeey commented Nov 19, 2022

With the newest versions of Python (3.11.0) and mypy (0.991), it looks like the one-liner proposed by @wbolster works without any additional flags, even for complex JSON documents.

Json = dict[str, 'Json'] | list['Json'] | str | int | float | bool | None

Examples:

example_1: Json = None

mixed-type object:

null = None
false = False
true = True
# https://www.appsloveworld.com/download-sample-json-file-with-multiple-records
example_2: Json = {
        "id": 4051,
        "name": "manoj",
        "email": "[email protected]",
        "password": "Test@123",
        "about": null,
        "token": "7f471974-ae46-4ac0-a882-1980c300c4d6",
        "country": null,
        "location": null,
        "lng": 0,
        "lat": 0,
        "dob": null,
        "gender": 0,
        "userType": 1,
        "userStatus": 1,
        "profilePicture": "Images/9b291404-bc2e-4806-88c5-08d29e65a5ad.png",
        "coverPicture": "Images/44af97d9-b8c9-4ec1-a099-010671db25b7.png",
        "enablefollowme": false,
        "sendmenotifications": false,
        "sendTextmessages": false,
        "enabletagging": false,
        "createdAt": "2020-01-01T11:13:27.1107739",
        "updatedAt": "2020-01-02T09:16:49.284864",
        "livelng": 77.389849,
        "livelat": 28.6282231,
        "liveLocation": "Unnamed Road, Chhijarsi, Sector 63, Noida, Uttar Pradesh 201307, India",
        "creditBalance": 127,
        "myCash": 0
    }

nested lists and dictionaries of strings with a slice hidden to throw off the type checker:

# https://grabthiscode.com/javascript/complex-json-example
example_3: Json = {
"problems": [{
    "Diabetes":[{
        "medications":[{
            "medicationsClasses":[{
                "className":[{
                    "associatedDrug":[{
                        "name":"asprin",
                        "dose":"",
                        "strength":"500 mg"
                    }],
                    "associatedDrug#2":[{
                        "name":"somethingElse",
                        "dose":"",
                        "strength":"500 mg"
                    }]
                }],
                "className2":[{
                    "associatedDrug":[{
                        "name":"asprin",
                        "dose":"",
                        "strength":"500 mg"
                    }],
                    "associatedDrug#2":[{
                        "name":"somethingElse",
                        "dose":"",
                        "strength":"500 mg",
                        "invalid": slice(1, 45)
                    }]
                }]
            }]
        }],
        "labs":[{
            "missing_field": "missing_value"
        }]
    }],
    "Asthma":[{}]
}]}

The type check of all examples together discovers the slice I hid in the last example:

mypy jsonjson.py

jsonjson.py:83: error: Dict entry 3 has incompatible type "str": "slice"; expected "str": "Union[Dict[str, Json], List[Json], str, int, float, None]"  [dict-item]
Found 1 error in 1 file (checked 1 source file)

I'm not sure what mypy does when data passes through several functions.
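
As far as I understand, once the alias appears in a function signature it behaves like any other union, so each function that digs into the value has to narrow it again. A small sketch under that assumption, with the alias spelled using typing.Union so it also runs on older interpreters; parse, first_problem, and keys_of are hypothetical helpers:

```python
import json
from typing import Dict, List, Union

Json = Union[Dict[str, "Json"], List["Json"], str, int, float, bool, None]


def parse(text: str) -> Json:
    # json.loads() returns Any, so this annotation is where the alias gets attached.
    result: Json = json.loads(text)
    return result


def first_problem(doc: Json) -> Json:
    # Each step into the union needs narrowing before a checker accepts it.
    if isinstance(doc, dict):
        problems = doc.get("problems")
        if isinstance(problems, list) and problems:
            return problems[0]
    return None


def keys_of(value: Json) -> List[str]:
    # Only objects have keys; everything else yields an empty list.
    return sorted(value) if isinstance(value, dict) else []


doc = parse('{"problems": [{"Diabetes": [], "Asthma": []}]}')
```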

@hauntsaninja
Collaborator

hauntsaninja commented Nov 19, 2022

All major type checkers now support recursive type aliases by default, so this should largely work:

JSON: TypeAlias = dict[str, "JSON"] | list["JSON"] | str | int | float | bool | None

Note that because dict is invariant, you might run into some issues e.g. with dict[str, str]. For such use cases you can use cast, and if you don't need mutability, something like the following might work:

JSON_ro: TypeAlias = Mapping[str, "JSON_ro"] | Sequence["JSON_ro"] | str | int | float | bool | None

Although this can sometimes result in false negatives, for instance, see python/mypy#13786 (comment)
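
To make the invariance point concrete, here is a rough sketch. Both aliases are spelled with typing.Union so the snippet also runs on interpreters older than 3.10; total_length and headers are made up for the example.

```python
import collections.abc
from typing import Dict, List, Mapping, Sequence, Union

JSON = Union[Dict[str, "JSON"], List["JSON"], str, int, float, bool, None]
JSON_ro = Union[Mapping[str, "JSON_ro"], Sequence["JSON_ro"], str, int, float, bool, None]


def total_length(value: JSON_ro) -> int:
    """Sum the lengths of all strings found in a read-only JSON value."""
    # str must be checked first, because a str is itself a Sequence.
    if isinstance(value, str):
        return len(value)
    if isinstance(value, collections.abc.Mapping):
        return sum(total_length(v) for v in value.values())
    if isinstance(value, collections.abc.Sequence):
        return sum(total_length(v) for v in value)
    return 0


headers: Dict[str, str] = {"accept": "json", "agent": "demo"}

# Accepted: Mapping is covariant in its value type, so Dict[str, str] fits JSON_ro.
n = total_length(headers)

# A checker is expected to reject the commented-out line below, because
# Dict[str, str] is not a Dict[str, JSON] (dict is invariant in its value type):
# bad: JSON = headers
```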

As always, if you return a union type, your callers will have to assert the structure of their JSON in order to type check. If this is an issue, return e.g. dict[str, Any] or Any, or cast, or type-ignore, or just bite the bullet and assert.

Given that it is trivial to define these aliases, that as you can see they're not necessarily one-size-fits-all, and that explicit is better than implicit, and that they don't need any type checker special casing, I am not in favour of adding these to CPython's typing.py. However if you like importing stuff, we may ship this and some other useful type definitions in a PyPI package at some point; there also appears to be https://github.com/kevinheavey/jsonalias. If you have opinions about this, please see python/typing_extensions#6.

In order to keep this message visible, I may lock this thread. Please strongly consider posting follow-ups at one of the above locations.

Thank you to everyone who contributed support for recursive type aliases to type checkers — cheers and happy typing!

iamleot added a commit to iamleot/transferwee that referenced this issue May 20, 2023
For each dict that we control properly annotate the type of key and values.

For JSON controlled by others (i.e. returned by API)... it's pretty
hard! For the moment just annotate all of them via a generic `dict[Any,
Any]'.

(We can define a custom JSON type annotation as documented by
<python/typing#182> but we will still have
several problems and at the end we will always return Any.)

All of this was pointed out via `mypy --strict .'.