unexpected exceptions raised while parsing untrusted inputs using cbor2.loads #198
Comments
The most concerning error is that |
The second largest concern is that |
Great minds think alike! I was actually fuzzing My
FROM debian:12-slim
RUN apt update && apt install -y \
clang \
git \
python3-full \
python3-pip \
&& rm -rf /var/lib/apt/lists/*
RUN python3 --version
RUN mkdir /app
WORKDIR /app
# Subject to change by upstream
# https://github.com/google/atheris/issues/36
ENV LIBFUZZER_LIB "/usr/lib/llvm-14/lib/clang/14.0.6/lib/linux/libclang_rt.fuzzer_no_main-aarch64.a"
ENV VIRTUAL_ENV "/opt/venv"
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH "$VIRTUAL_ENV/bin:$PATH"
# https://github.com/google/atheris#building-from-source
RUN python3 -m pip install --no-binary atheris atheris
RUN git clone https://github.com/google/atheris.git
RUN python3 -m pip install atheris/
# https://github.com/google/atheris/blob/master/native_extension_fuzzing.md#step-1-compiling-your-extension
ENV CC "/usr/bin/clang"
ENV CFLAGS "-fsanitize=address,undefined,fuzzer-no-link"
ENV CXX "/usr/bin/clang++"
ENV CXXFLAGS "-fsanitize=address,undefined,fuzzer-no-link"
ENV LDSHARED "/usr/bin/clang -shared"
# https://github.com/agronholm/cbor2
ENV CBOR2_BUILD_C_EXTENSION "1"
RUN git clone https://github.com/agronholm/cbor2.git
RUN python3 -m pip install cbor2/
# Subject to change by upstream, but it's just a sanity check
RUN nm "$VIRTUAL_ENV/lib/python3.11/site-packages/_cbor2.cpython-311-aarch64-linux-gnu.so" \
| grep asan \
&& echo "Found ASAN" \
|| echo "Missing ASAN"
# Allow Atheris to find fuzzer sanitizer shared libs
# https://github.com/google/atheris/blob/master/native_extension_fuzzing.md#option-a-sanitizerlibfuzzer-preloads
ENV LD_PRELOAD "$VIRTUAL_ENV/lib/python3.11/site-packages/asan_with_fuzzer.so"
# Skip memory allocation failures for now
ENV ASAN_OPTIONS "allocator_may_return_null=1"
COPY fuzz.py fuzz.py
ENTRYPOINT ["python", "fuzz.py"]

And my fuzz harness:

#!/usr/bin/python3
import sys
import atheris

with atheris.instrument_imports():
    # _cbor2 ensures the C library is imported
    from _cbor2 import loads


# Inspired by: https://github.com/google/oss-fuzz/blob/master/projects/ujson/ujson_fuzzer.py
def TestOneInput(data):
    try:
        loads(data)
    except Exception:
        # We're searching for memory corruption, not Python exceptions
        pass


def main():
    # Since everything interesting in this fuzzer is in native code, we can
    # disable Python coverage to improve performance and reduce coverage noise.
    atheris.Setup(sys.argv, TestOneInput, enable_python_coverage=False)
    atheris.Fuzz()


if __name__ == "__main__":
    main()

Build, then run the Docker image:
This then produces a crash like:
Which we can confirm like so:
This appears at the following location:
Which seems to be this code: Line 653 in 850545c
I'm not sure about exploitability here. Memory corruption in C code has more potential for exploitation than Python exceptions. I also did notice this big warning in the |
I just pushed a fix for the |
I also fixed that segfault now. |
Found another one, this one appears to be a stack overflow:
This appears to be coming from: Lines 142 to 152 in af84761
|
This seems to happen when there's a CBOR tag that points to itself ( |
The problem is now that we either shouldn't accept hashable values in |
I genuinely don't know how to solve this conflict. The test suite blows up either way: if I remove the hash method, |
Requiring hashable values for CBORTag seems to be a non-starter, as some semantic tags like 261 require maps (dict) as values. |
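To make the conflict concrete, here is a minimal sketch using a hypothetical stand-in class, `NaiveTag` (not cbor2's actual `CBORTag`), whose hash is derived from its tag and value. It assumes the pure-Python case, where a self-referencing tag ends in a `RecursionError` rather than the stack overflow seen in the C extension:

```python
class NaiveTag:
    """Hypothetical stand-in for a tag type; not cbor2's CBORTag."""

    def __init__(self, tag, value):
        self.tag = tag
        self.value = value

    def __hash__(self):
        # Hashing delegates to the value, so the value must itself be hashable.
        return hash((self.tag, self.value))


# Problem 1: a map (dict) value, as tag 261 requires, is not hashable.
dict_error = None
try:
    hash(NaiveTag(261, {"key": "value"}))
except TypeError as exc:
    dict_error = exc

# Problem 2: a tag whose value refers back to the tag recurses without end.
cyclic = NaiveTag(1, None)
cyclic.value = (cyclic,)
cycle_error = None
try:
    hash(cyclic)
except RecursionError as exc:
    cycle_error = exc

print(type(dict_error).__name__, type(cycle_error).__name__)
```

Neither failure depends on the input being hostile in any other way, which is why simply keeping or dropping the hash method does not resolve the issue.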
Why not just modify the C code to detect the infinite recursion and raise an exception? |
This is precisely what I'm doing now. That said, this isn't easy for me as I didn't write the C extension and I have precious little experience working with the Python C API. As doing this safely requires the use of thread locals, I don't even know where to put the threadlocals object so that it can be garbage collected when the module is unloaded. |
For reference, this is the Python implementation I came up with:

import threading
from functools import total_ordering

thread_locals = threading.local()


@total_ordering
class CBORTag:
    ...

    def __hash__(self) -> int:
        self_id = id(self)
        # Track the ids of the tags currently being hashed in this thread
        try:
            running_hashes = thread_locals.running_hashes
        except AttributeError:
            running_hashes = thread_locals.running_hashes = set()

        # Seeing our own id again means the value refers back to this tag
        if self_id in running_hashes:
            raise RuntimeError(
                "This CBORTag is not hashable because it contains a reference to itself"
            )

        running_hashes.add(self_id)
        try:
            return hash((self.tag, self.value))
        finally:
            running_hashes.remove(self_id)
            if not running_hashes:
                del thread_locals.running_hashes
|
Have you considered setting an attribute on the specific To fully support concurrency and parallelism, a |
What if multiple threads are trying to compute the hash on the same CBORTag object?
Why? What can a ContextVar do in this case that threadlocals can't? |
Context variables are a higher-level interface built on top of the lower-level thread locals. Using a thread local means that it's only safe to run tasks one after the other in a thread, whereas using a context variable means that it's safe to run multiple tasks concurrently in the same thread. |
I have a candidate fix for that stack overflow issue in #202. |
Except that in this case there won't be any task switching since |
Task switching can happen anywhere with an extension like greenlet. Your |
The possibility of someone switching tasks in |
I'm already uncomfortable with the complexity that had to be added to |
The |
That should hopefully be the last PR to fix the big issues. |
I've fixed the problems originally reported here. I believe that, to fix all the problems thoroughly, a rewrite would be needed, but I don't have the bandwidth for that, and I have to draw the line somewhere in order to move on to other projects. I've released v5.6.0 which contains these fixes. |
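For downstream code that wants to confirm it is running a release with these fixes, a version check along the following lines is one option. This is a rough sketch: the helper names are hypothetical, the only fact taken from this thread is that v5.6.0 contains the fixes, and pre-release suffixes are deliberately ignored for simplicity:

```python
import re
from importlib import metadata

# From the maintainer's comment above: v5.6.0 contains the fixes.
FIXED_IN = (5, 6, 0)


def parse_release(version):
    """Return the leading numeric part of a version string as a 3-tuple."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(part) for part in match.group().split("."))[:3]
    # Pad short versions such as "5.6" so they compare as (5, 6, 0).
    return parts + (0,) * (3 - len(parts))


def has_decoder_fixes(version):
    """Hypothetical helper: True if the version is at least FIXED_IN."""
    return parse_release(version) >= FIXED_IN


def installed_cbor2_has_fixes():
    """Check the installed cbor2, or return None if it is not installed."""
    try:
        return has_decoder_fixes(metadata.version("cbor2"))
    except metadata.PackageNotFoundError:
        return None


print(installed_cbor2_has_fixes())
```

A project that already depends on a full version parser could use that instead of the simplified `parse_release` shown here.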
Should |
What difference would that make, and to whom? |
Good question, I would suspect that downstream consumers of this library would want to know if they're running a potentially insecure version (i.e. < |
Things to check first
I have searched the existing issues and didn't find my bug already reported there
I have checked that my bug is still present in the latest release
cbor2 version
5.5.1
Python version
3.10.12
What happened?
I have a script which parses untrusted data using the cbor2.loads method. The script tries to verify whether the provided data is CBOR encoded. The implementation was as follows:
import cbor2
from cbor2 import CBORDecodeError

try:
    cbor2.loads(b'\x959;{{{{{{{{{{{{{')
except CBORDecodeError:
    print('no cbor encoded')
For some inputs, I've noticed that MemoryError is raised instead of CBORDecodeError.
To better understand the problem and to check whether this is just one odd case when parsing untrusted data, I ran a fuzzer against the cbor2.loads method.
It seems that the cbor2.loads method is not able to parse untrusted data properly: in the worst case cbor2 tries to allocate all available memory (see the MemoryError case in the code above).
I was able to find the following exceptions raised by cbor2.loads (all reproduced using cbor2 5.5.1/Python 3.10.12/Ubuntu 20.4):
I was trying to analyze how it could be improved but it is not an easy task for somebody who does not maintain this code. Is it possible to improve it somehow?
The expected and ideal solution would be to have CBORDecodeError raised for any invalid CBOR input data.
How can we reproduce the bug?
Code to reproduce the mentioned exceptions:
cbor2 has been tested using the atheris fuzzer and the following code: