Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unexpected exceptions raised while parsing untrusted inputs using cbor2.loads #198

Closed
2 tasks done
robertstypa opened this issue Dec 20, 2023 · 27 comments
Closed
2 tasks done
Labels

Comments

@robertstypa
Copy link

Things to check first

  • I have searched the existing issues and didn't find my bug already reported there

  • I have checked that my bug is still present in the latest release

cbor2 version

5.5.1

Python version

3.10.12

What happened?

I have a script which is parsing untrusted data using the cbor2.loads method. This script is trying to verify if the provided data is cbor encoded.

The implementation was as follows:
try: cbor2.loads(b'\x959;{{{{{{{{{{{{{') except CBORDecodeError: print('no cbor encoded')

For some inputs, I've noticed that MemoryError is raised instead of CBORDecodeError.

To better understand the problem and ensure that this is only one strange case while parsing untrusted data I've run fuzzer against cbor2.loads method.

It seems that the cbor2.loads method is not able to parse untrusted data properly - in the worst case cbor2 is trying to allocate the whole memory - ref to the `MemoryError' case presented in the code above.

I was able to find following exceptions raised by cbor2.loads (all reproduced using cbor2 5.5.1/python 3.10.12/Ubuntu 20.4):

# OverflowError: timestamp out of range for platform time_t
cbor2.loads(b'\xc1\x1b\x9b\x9b\x9b\x00\x00\x00\x00\x00')
# OSError: OSError: [Errno 75] Value too large for defined data type
cbor2.loads(b'\xc1\x1b\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16')
# MemoryError:
cbor2.loads(b'\x959;{{{{{{{{{{{{{')
# TypeError: object of type 'int' has no len()
cbor2.loads(b'\xd8%\x00\x10`\x00\x00\x00`\x10\x00\x00\x00\x00\x00\x00')
# SystemError: <built-in function loads> returned NULL without setting an error
cbor2.loads(b'\xd8\x1e\x84\xff\xff\xff\xff')
# re.error: unbalanced parenthesis at position 0
cbor2.loads(b'\xd8#A)')
# UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 1: invalid start byte
cbor2.loads(b'b\n\xff')

I was trying to analyze how it could be improved but it is not an easy task for somebody who does not maintain this code. Is it possible to improve it somehow?

The expected and ideal solution would be to have CBORDecodeError raised in case of not valid input cbor data.

How can we reproduce the bug?

Code to reproduce mentioned exceptions:

import cbor2
cbor2.loads(b'\xc1\x1b\x9b\x9b\x9b\x00\x00\x00\x00\x00')
cbor2.loads(b'\xc1\x1b\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16')
cbor2.loads(b'\x959;{{{{{{{{{{{{{')
cbor2.loads(b'\xd8%\x00\x10`\x00\x00\x00`\x10\x00\x00\x00\x00\x00\x00')
cbor2.loads(b'\xd8\x1e\x84\xff\xff\xff\xff')
cbor2.loads(b'\xd8#A)')
cbor2.loads(b'b\n\xff')

cbor2 has been testes using atheris fuzzer and the following code:

import sys
import atheris
import pprint

with atheris.instrument_imports():
    import cbor2


EXCEPTIONS = {}
pp = pprint.PrettyPrinter(indent=4)

def fuzz_cbor2(data):
    try:
        cbor2.loads(data)
    except cbor2.CBORError:
        # CBORError is expected for some data
        pass
    except Exception as e:
        if type(e) not in EXCEPTIONS:
            EXCEPTIONS[type(e)] = data.hex()
            print(f"Found new exception {e}")
            print("************** status *************")
            pp.pprint(EXCEPTIONS)


if __name__ == "__main__":
    atheris.Setup(sys.argv, fuzz_cbor2)
    atheris.Fuzz()
@agronholm
Copy link
Owner

The most concerning error is that SystemError that says it returned NULL without setting an error. This looks like a bug in the C decoder implementation.

@agronholm
Copy link
Owner

The second largest concern is that MemoryError, as it has the potential for a DoS attack.

@mschwager
Copy link
Contributor

Great minds think alike! I was actually fuzzing cbor2 with Atheris very recently too. However, I focused on the C implementation and looked for memory corruption bugs. I did manage to find at least one, and there may be more. I would recommend setting up regular fuzz testing for this project. Here's what I came up with.

My Dockerfile for reproduction (note some paths may have to change, like aarch64):

FROM debian:12-slim

RUN apt update && apt install -y \
    clang \
    git \
    python3-full \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

RUN python3 --version

RUN mkdir /app
WORKDIR /app

# Subject to change by upstream
# https://github.com/google/atheris/issues/36
ENV LIBFUZZER_LIB "/usr/lib/llvm-14/lib/clang/14.0.6/lib/linux/libclang_rt.fuzzer_no_main-aarch64.a"

ENV VIRTUAL_ENV "/opt/venv"
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH "$VIRTUAL_ENV/bin:$PATH"

# https://github.com/google/atheris#building-from-source
RUN python3 -m pip install --no-binary atheris atheris
RUN git clone https://github.com/google/atheris.git
RUN python3 -m pip install atheris/

# https://github.com/google/atheris/blob/master/native_extension_fuzzing.md#step-1-compiling-your-extension
ENV CC "/usr/bin/clang"
ENV CFLAGS "-fsanitize=address,undefined,fuzzer-no-link"
ENV CXX "/usr/bin/clang++"
ENV CXXFLAGS "-fsanitize=address,undefined,fuzzer-no-link"
ENV LDSHARED "/usr/bin/clang -shared"

# https://github.com/agronholm/cbor2
ENV CBOR2_BUILD_C_EXTENSION "1"
RUN git clone https://github.com/agronholm/cbor2.git
RUN python3 -m pip install cbor2/

# Subject to change by upstream, but it's just a sanity check
RUN nm "$VIRTUAL_ENV/lib/python3.11/site-packages/_cbor2.cpython-311-aarch64-linux-gnu.so" \
    | grep asan \
    && echo "Found ASAN" \
    || echo "Missing ASAN"

# Allow Atheris to find fuzzer sanitizer shared libs
# https://github.com/google/atheris/blob/master/native_extension_fuzzing.md#option-a-sanitizerlibfuzzer-preloads
ENV LD_PRELOAD "$VIRTUAL_ENV/lib/python3.11/site-packages/asan_with_fuzzer.so"

# Skip memory allocation failures for now
ENV ASAN_OPTIONS "allocator_may_return_null=1"

COPY fuzz.py fuzz.py
ENTRYPOINT ["python", "fuzz.py"]

And my fuzz harness:

#!/usr/bin/python3

import sys
import atheris

with atheris.instrument_imports():
    # _cbor2 ensures the C library is imported
    from _cbor2 import loads


# Inspired by: https://github.com/google/oss-fuzz/blob/master/projects/ujson/ujson_fuzzer.py
def TestOneInput(data):
    try:
        loads(data)
    except Exception:
        # We're searching for memory corruption, not Python exceptions
        pass


def main():
    # Since everything interesting in this fuzzer is in native code, we can
    # disable Python coverage to improve performance and reduce coverage noise.
    atheris.Setup(sys.argv, TestOneInput, enable_python_coverage=False)
    atheris.Fuzz()


if __name__ == "__main__":
    main()

Build, then run the Docker image:

$ docker build -t cbor2-fuzz -f Dockerfile
$ docker run -v $(pwd):/tmp/output/ cbor2-fuzz -artifact_prefix=/tmp/output/

This then produces a crash like:

...
SUMMARY: AddressSanitizer: SEGV /usr/include/python3.11/object.h:537:9 in Py_DECREF
==1==ABORTING
MS: 2 CMP-CrossOver- DE: "\001\010\000\000"-; base unit: 1b850b729fc7234e8dcac9406224b06d235affb7
0xae,0xae,0xae,0xae,0xae,0xae,0xae,0xae,0xae,0x1,0x8,0xc2,0x98,0x43,0xd9,0x1,0x0,0xd8,0x24,0x9f,0x0,0x0,0xae,0xae,0xff,0xc2,0x6c,0xa7,0x99,
\256\256\256\256\256\256\256\256\256\001\010\302\230C\331\001\000\330$\237\000\000\256\256\377\302l\247\231
artifact_prefix='/tmp/output/'; Test unit written to /tmp/output/crash-95e879135b949a863283a11eaf98bb7b3b109783
Base64: rq6urq6urq6uAQjCmEPZAQDYJJ8AAK6u/8Jsp5k=

Which we can confirm like so:

$ echo -n "rq6urq6urq6uAQjCmEPZAQDYJJ8AAK6u/8Jsp5k=" | python -m cbor2.tool -d
Segmentation fault: 11

This appears at the following location:

==1==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0xffff96e7cc38 bp 0xfffffec31660 sp 0xfffffec31580 T0)
==1==The signal is caused by a READ memory access.
==1==Hint: address points to the zero page.
    #0 0xffff96e7cc38 in Py_DECREF /usr/include/python3.11/object.h:537:9
    #1 0xffff96e7cc38 in decode_definite_string /app/cbor2/source/decoder.c:653:9
    #2 0xffff96e7cc38 in decode_string /app/cbor2/source/decoder.c:718:15
    #3 0xffff96e79778 in decode /app/cbor2/source/decoder.c:1735:27
...

Which seems to be this code:

Py_DECREF(ret);

I'm not sure about exploitability here. Memory corruption in C code has more potential for exploitation than Python exceptions. I also did notice this big warning in the Py_DECREF docs. I'm not sure if that's applicable in this situation, but again, it's cause for concern.

@agronholm
Copy link
Owner

I just pushed a fix for the SystemError, but I bet there are plenty of others. Also those errors need to be turned into decoding errors later.

@agronholm
Copy link
Owner

I also fixed that segfault now.

@mschwager
Copy link
Contributor

Found another one, this one appears to be a stack overflow:

$ echo -n "wcq/v9gcytgdABYAZR0AFgBlJQ==" | base64 --decode | python3 -c 'import cbor2, sys; cbor2.loads(sys.stdin.buffer.read())'
AddressSanitizer:DEADLYSIGNAL
=================================================================
==72==ERROR: AddressSanitizer: stack-overflow on address 0xffffeaec5fe0 (pc 0x000000481d80 bp 0xffffeaec6030 sp 0xffffeaec6030 T0)
    #0 0x481d80 in _PyObject_GC_NewVar (/usr/bin/python3.11+0x481d80) (BuildId: 15a1b7b17a3e246ca60bac3646ced99af27ca711)
    #1 0x48ba14  (/usr/bin/python3.11+0x48ba14) (BuildId: 15a1b7b17a3e246ca60bac3646ced99af27ca711)
    #2 0x48a80c  (/usr/bin/python3.11+0x48a80c) (BuildId: 15a1b7b17a3e246ca60bac3646ced99af27ca711)
    #3 0x563724 in _Py_BuildValue_SizeT (/usr/bin/python3.11+0x563724) (BuildId: 15a1b7b17a3e246ca60bac3646ced99af27ca711)
    #4 0xffffb96a8440 in CBORTag_hash /app/cbor2/source/tags.c:145:21
    #5 0x493e34  (/usr/bin/python3.11+0x493e34) (BuildId: 15a1b7b17a3e246ca60bac3646ced99af27ca711)
    #6 0xffffb96a844c in CBORTag_hash /app/cbor2/source/tags.c:149:21
    #7 0x493e34  (/usr/bin/python3.11+0x493e34) (BuildId: 15a1b7b17a3e246ca60bac3646ced99af27ca711)
    #8 0xffffb96a844c in CBORTag_hash /app/cbor2/source/tags.c:149:21
    #9 0x493e34  (/usr/bin/python3.11+0x493e34) (BuildId: 15a1b7b17a3e246ca60bac3646ced99af27ca711)
    #10 0xffffb96a844c in CBORTag_hash /app/cbor2/source/tags.c:149:21
    #11 0x493e34  (/usr/bin/python3.11+0x493e34) (BuildId: 15a1b7b17a3e246ca60bac3646ced99af27ca711)
    #12 0xffffb96a844c in CBORTag_hash /app/cbor2/source/tags.c:149:21
    #13 0x493e34  (/usr/bin/python3.11+0x493e34) (BuildId: 15a1b7b17a3e246ca60bac3646ced99af27ca711)
...

This appears to be coming from:

cbor2/source/tags.c

Lines 142 to 152 in af84761

static Py_hash_t
CBORTag_hash(CBORTagObject *self)
{
PyObject *tmp = Py_BuildValue("(KO)", self->tag, self->value);
if(!tmp){ // if tmp is NULL, Py_BuildValue has already set the error
return -1;
}
Py_hash_t ret = PyObject_Hash(tmp);
Py_DECREF(tmp);
return ret;
}

@agronholm
Copy link
Owner

Found another one, this one appears to be a stack overflow:

$ echo -n "wcq/v9gcytgdABYAZR0AFgBlJQ==" | base64 --decode | python3 -c 'import cbor2, sys; cbor2.loads(sys.stdin.buffer.read())'
AddressSanitizer:DEADLYSIGNAL
=================================================================
==72==ERROR: AddressSanitizer: stack-overflow on address 0xffffeaec5fe0 (pc 0x000000481d80 bp 0xffffeaec6030 sp 0xffffeaec6030 T0)
    #0 0x481d80 in _PyObject_GC_NewVar (/usr/bin/python3.11+0x481d80) (BuildId: 15a1b7b17a3e246ca60bac3646ced99af27ca711)
    #1 0x48ba14  (/usr/bin/python3.11+0x48ba14) (BuildId: 15a1b7b17a3e246ca60bac3646ced99af27ca711)
    #2 0x48a80c  (/usr/bin/python3.11+0x48a80c) (BuildId: 15a1b7b17a3e246ca60bac3646ced99af27ca711)
    #3 0x563724 in _Py_BuildValue_SizeT (/usr/bin/python3.11+0x563724) (BuildId: 15a1b7b17a3e246ca60bac3646ced99af27ca711)
    #4 0xffffb96a8440 in CBORTag_hash /app/cbor2/source/tags.c:145:21
    #5 0x493e34  (/usr/bin/python3.11+0x493e34) (BuildId: 15a1b7b17a3e246ca60bac3646ced99af27ca711)
    #6 0xffffb96a844c in CBORTag_hash /app/cbor2/source/tags.c:149:21
    #7 0x493e34  (/usr/bin/python3.11+0x493e34) (BuildId: 15a1b7b17a3e246ca60bac3646ced99af27ca711)
    #8 0xffffb96a844c in CBORTag_hash /app/cbor2/source/tags.c:149:21
    #9 0x493e34  (/usr/bin/python3.11+0x493e34) (BuildId: 15a1b7b17a3e246ca60bac3646ced99af27ca711)
    #10 0xffffb96a844c in CBORTag_hash /app/cbor2/source/tags.c:149:21
    #11 0x493e34  (/usr/bin/python3.11+0x493e34) (BuildId: 15a1b7b17a3e246ca60bac3646ced99af27ca711)
    #12 0xffffb96a844c in CBORTag_hash /app/cbor2/source/tags.c:149:21
    #13 0x493e34  (/usr/bin/python3.11+0x493e34) (BuildId: 15a1b7b17a3e246ca60bac3646ced99af27ca711)
...

This appears to be coming from:

cbor2/source/tags.c

Lines 142 to 152 in af84761

static Py_hash_t
CBORTag_hash(CBORTagObject *self)
{
PyObject *tmp = Py_BuildValue("(KO)", self->tag, self->value);
if(!tmp){ // if tmp is NULL, Py_BuildValue has already set the error
return -1;
}
Py_hash_t ret = PyObject_Hash(tmp);
Py_DECREF(tmp);
return ret;
}

This seems to happen when there's a CBOR tag that points to itself (self is self.value), so when Python tries to compute the hash, it causes infinite recursion. This happens with the Python implementation as well.

@agronholm
Copy link
Owner

The problem is now that we either shouldn't accept hashable values in CBORTag, or we should remove the __hash__() method, making CBORTags ineligible for use as set items or dict keys.

@agronholm
Copy link
Owner

I genuinely don't know how to solve this conflict. The test suite blows up either way: if I remove the hash method, test_nested_dict() fails, and if I disallow non-hashable values, several more tests fail. I can't even look to other Python CBOR implementations for clues, as this is the only one supporting CBOR tags.

@agronholm
Copy link
Owner

agronholm commented Dec 27, 2023

Requiring hashable values for CBORTag seems to be a non-starter, as some semantic tags like 261 require maps (dict) as values.

@Changaco
Copy link
Contributor

Why not just modify the C code to detect the infinite recursion and raise an exception?

@agronholm
Copy link
Owner

This is precisely what I'm doing now. That said, this isn't easy for me as I didn't write the C extension and I have precious little experience working with the Python C API. As doing this safely requires the use of thread locals, I don't even know where to put the threadlocals object so that it can be garbage collected when the module is unloaded.

@agronholm
Copy link
Owner

For reference, this is the Python implementation I came up with:

thread_locals = threading.local()

@total_ordering
class CBORTag:
	...

    def __hash__(self) -> int:
        self_id = id(self)
        try:
            running_hashes = thread_locals.running_hashes
        except AttributeError:
            running_hashes = thread_locals.running_hashes = set()

        if self_id in running_hashes:
            raise RuntimeError(
                "This CBORTag is not hashable because it contains a reference to itself"
            )

        running_hashes.add(self_id)
        try:
            return hash((self.tag, self.value))
        finally:
            running_hashes.remove(self_id)
            if not running_hashes:
                del thread_locals.running_hashes

@Changaco
Copy link
Contributor

Have you considered setting an attribute on the specific CBORTag object being hashed instead of using a global running_hashes set? It should be easier and more memory-efficient.

To fully support concurrency and parallelism, a ContextVar should probably be used instead of a thread local.

@agronholm
Copy link
Owner

Have you considered setting an attribute on the specific CBORTag object being hashed instead of using a global running_hashes set? It should be easier and more memory-efficient.

What if multiple threads are trying to compute the hash on the same CBORTag object?

To fully support concurrency and parallelism, a ContextVar should probably be used instead of a thread local.

Why? What can a ContextVar do in this case that threadlocals can't?

@Changaco
Copy link
Contributor

Context variables are a higher-level interface built on top of the lower-level thread locals. Using a thread local means that it's only safe to run tasks one after the other in a thread, whereas using a context variable means that it's safe to run multiple tasks concurrently in the same thread.

@agronholm
Copy link
Owner

I have a candidate fix for that stack overflow issue in #202.

@agronholm
Copy link
Owner

Context variables are a higher-level interface built on top of the lower-level thread locals. Using a thread local means that it's only safe to run tasks one after the other in a thread, whereas using a context variable means that it's safe to run multiple tasks concurrently in the same thread.

Except that in this case there won't be any task switching since __hash__() is synchronous, so context variables do nothing that thread locals wouldn't.

@Changaco
Copy link
Contributor

Changaco commented Dec 28, 2023

Task switching can happen anywhere with an extension like greenlet. Your __hash__ method is calling the self.value.__hash__ method, which is unknown code that can do anything, including directly or indirectly call greenlet.

@agronholm
Copy link
Owner

The possibility of someone switching tasks in __hash__() seems so remote to me that we can probably ignore it altogether. The Python @reprlib.recursive_repr decorator has the same issue and it has no measures whatsoever for thread safety.

@agronholm
Copy link
Owner

I'm already uncomfortable with the complexity that had to be added to CBORTag.__hash__(). Maybe in a future major release we could make it unhashable, or prevent the mutation of the fields so that hashability could be computed in the initializer.

@agronholm
Copy link
Owner

The MemoryError comes from the C implementation trying to allocate 8286760429 gigabytes of memory for a string.

@agronholm
Copy link
Owner

That should hopefully be the last PR to fix the big issues.

@agronholm
Copy link
Owner

I've fixed the problems originally reported here. I believe that, to fix all the problems thoroughly, a rewrite would be needed, but I don't have the bandwidth for that, and I have to draw the line somewhere in order to move on to other projects. I've released v5.6.0 which contains these fixes.

@mschwager
Copy link
Contributor

Should v5.6.0 be considered a security release? Considering that there appear to be potential memory corruption bugs.

@agronholm
Copy link
Owner

What difference would that make, and to whom?

@mschwager
Copy link
Contributor

What difference would that make, and to whom?

Good question, I would suspect that downstream consumers of this library would want to know if they're running a potentially insecure version (i.e. < v5.6.0). The easiest way to accomplish that would probably be issuing a security advisory, which would then automatically be picked up by tools like Dependabot and Snyk 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants