Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad JSON subclass can produce invalid JSON #123183

Closed
nineteendo opened this issue Aug 20, 2024 · 7 comments
Closed

Bad JSON subclass can produce invalid JSON #123183

nineteendo opened this issue Aug 20, 2024 · 7 comments
Labels
pending The issue will be closed if no feedback is provided type-bug An unexpected behavior, bug, or error

Comments

@nineteendo
Copy link
Contributor

nineteendo commented Aug 20, 2024

Bug report

Bug description:

import io
import json

class BadDict(dict):
    def __len__(self) -> int:
        return 1

class BadList(list):
    def __len__(self) -> int:
        return 1

fp = io.StringIO()
json.dump([BadDict(), BadList()], fp, indent=4)
print(fp.getvalue())
[
    {
        
    },
    
    ]
]

CPython versions tested on:

3.12

Operating systems tested on:

Windows

@nineteendo nineteendo added the type-bug An unexpected behavior, bug, or error label Aug 20, 2024
@nineteendo
Copy link
Contributor Author

Note that we have done similar fixes for int.__repr__() and float.__repr__().

@zware
Copy link
Member

zware commented Aug 20, 2024

I don't think this is worth "fixing"; if your code is lying to you, you're going to get bad results.

@zware zware added the pending The issue will be closed if no feedback is provided label Aug 20, 2024
@ericvsmith
Copy link
Member

I agree with @zware: this isn't worth "fixing". Is this causing a problem in practice?

@serhiy-storchaka
Copy link
Member

for int.__repr__() and float.__repr__().

There is a reason for this -- enums. I do not know any dict or list subclass that lies about its length.

AFAIK, this issue was already reported in the past and closed as "won't fix".

@sobolevn
Copy link
Member

I agree, broken data in - broken data out.

@sobolevn sobolevn closed this as not planned Won't fix, can't repro, duplicate, stale Aug 21, 2024
@ronaldoussoren
Copy link
Contributor

Silent errors are not ideal.

Something like this (completely untested) patch would at least raise an exception on this kind of broken data instead of silently creating a broken JSON document:

diff --git a/Lib/json/encoder.py b/Lib/json/encoder.py
index 45f5477418..b6c8adbd06 100644
--- a/Lib/json/encoder.py
+++ b/Lib/json/encoder.py
@@ -324,6 +324,10 @@ def _iterencode_list(lst, _current_indent_level):
                 else:
                     chunks = _iterencode(value, _current_indent_level)
                 yield from chunks
+
+        if first:
+            raise ValueError("non-empty list without elements")
+
         if newline_indent is not None:
             _current_indent_level -= 1
             yield '\n' + _indent * _current_indent_level

@nineteendo
Copy link
Contributor Author

This can happen with a normal list if modified by another thread and it's not that difficult to fix:

diff --git a/Lib/json/encoder.py b/Lib/json/encoder.py
index 323332f064..b621083d33 100644
--- a/Lib/json/encoder.py
+++ b/Lib/json/encoder.py
@@ -284,12 +284,12 @@ def _iterencode_list(lst, _current_indent_level):
             if markerid in markers:
                 raise ValueError("Circular reference detected")
             markers[markerid] = lst
-        buf = '['
+        yield '['
         if _indent is not None:
             _current_indent_level += 1
             newline_indent = '\n' + _indent * _current_indent_level
             separator = _item_separator + newline_indent
-            buf += newline_indent
+            yield newline_indent
         else:
             newline_indent = None
             separator = _item_separator
@@ -298,25 +298,24 @@ def _iterencode_list(lst, _current_indent_level):
             if first:
                 first = False
             else:
-                buf = separator
+                yield separator
             if isinstance(value, str):
-                yield buf + _encoder(value)
+                yield _encoder(value)
             elif value is None:
-                yield buf + 'null'
+                yield 'null'
             elif value is True:
-                yield buf + 'true'
+                yield 'true'
             elif value is False:
-                yield buf + 'false'
+                yield 'false'
             elif isinstance(value, int):
                 # Subclasses of int/float may override __repr__, but we still
                 # want to encode them as integers/floats in JSON. One example
                 # within the standard library is IntEnum.
-                yield buf + _intstr(value)
+                yield _intstr(value)
             elif isinstance(value, float):
                 # see comment above for int
-                yield buf + _floatstr(value)
+                yield _floatstr(value)
             else:
-                yield buf
                 if isinstance(value, (list, tuple)):
                     chunks = _iterencode_list(value, _current_indent_level)
                 elif isinstance(value, dict):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pending The issue will be closed if no feedback is provided type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

6 participants