bpo-43086: Added handling for excess data in binascii.a2b_base64 #24402

idan22moral · 2021-01-31T18:18:47Z

Currently, when providing binascii.a2b_base64() base-64 input with excess data after the padding (=/==), the excess data is ignored.

Example:

import binascii
binascii.a2b_base64(b'aGVsbG8=')       # b'hello' (valid)
binascii.a2b_base64(b'aGVsbG8==')      # b'hello' (ignoring data)
binascii.a2b_base64(b'aGVsbG8=python') # b'hello' (ignoring data)

Note: MANY libraries (such as the all-time favorite base64) use this function as their decoder.

Why is it problematic:

User input can contain additional data after base64 data, which can lead to unintended behavior in products.
Well-crafted user input can be used to bypass conditions in code (example in the referenced tweet).
Can be used to target vulnerable libraries and bypass authentication mechanism such as JWT (potentially).

The logic behind my fix PR on GitHub:

Before deciding to finish the function (after knowing the fact that we passed the data padding),
we should check if there's no more data after the padding.
If excess data exists, we should raise an error, free the allocated writer, and return null.
Else, everything's fine, and we can proceed to the function's end as previously.

Though not publicly disclosed, this behavior can lead to security issues in heavily-used projects.
Preventing this behavior sounds more beneficial than harmful, since there's no known good usage for this behavior.

From what I read, the python implementation in not so close (when speaking about this case of course) to the base64 RFC.
(link: https://tools.ietf.org/html/rfc4648#section-3.3)

Thanks to Ori Damari for bringing this behavior up,
and thanks to Ryan Mast, and many of the other great guys for discussing the problem in the comments.

Link to the tweet

Idan Moral
Twitter: https://twitter.com/idan_moral
GitHub: https://github.com/idan22moral

https://bugs.python.org/issue43086

the-knights-who-say-ni · 2021-01-31T18:18:50Z

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames:

@idan22moral

This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

github-actions · 2021-03-05T00:30:30Z

This PR is stale because it has been open for 30 days with no activity.

idan22moral · 2021-03-06T01:59:15Z

Any feedback? 😅

gpshead

Can you add a regression test to Lib/test/test_binascii.py?

Modules/binascii.c

bedevere-bot · 2021-03-13T09:43:47Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

gpshead · 2021-03-13T09:44:59Z

In addition to the above, this PR as is causes a bunch of test suite failures that need to be addressed.

idan22moral · 2021-03-13T14:38:43Z

@gpshead While writing the tests for my PR I noticed something:
binascii.b2a_base64 encodes a given payload and adds a newline at the end (no idea what's the reason behind it).
When trying to decode ab==\n, the Excess data after padding ... error raises, which makes the relation between b2a and a2b broken.

I thought about ignoring newlines (or any other character in table_a2b_base64 that's mapped to -1) after the padding,
but the same security problem raises, since it's still excess data.

So, I'm opening this to discussion - should the b2a method be re-implemented too?

gpshead · 2021-03-13T20:23:27Z

b2a_base64 has always added a newline by default. (digging into why would require code history spelunking probably back to the 1990s; I could guess but it doesn't really matter). The ability to disable adding the newline was added in 3.6 via https://bugs.python.org/issue25357 as the base64 module does not want the newline.

Decoder wise it is a good idea for a2b_base64 to accept what b2a_base64 produces by default so I recommend we continue to accept a single trailing \n if present. We're still much better off from a security perspective being strict and only having a mere two possible variants of input ascii data that lead to the same bytes output instead of today's infinite possibilities by ignoring all trailing data after padding = signs.

I suggest adding an additional keyword only parameter as a flag when implementing this:

# Ignores a single newline at the end, regardless of `=`.
a2b_base64(ascii, ignore_trailing_newline=True)  # Make this the default.
# and
a2b_base64(ascii, ignore_trailing_newline=False)   # No extra data is allowed ever (raises an Error).

There will be a minority of applications out there that want the extra data ignoring behavior. This allows them to retain that without implementing their function wrapper to discard it first. If I'm understanding the current behavior correctly, people who want their code to have today's existing behavior of ignoring all trailing data can use and be compatible with the library before and after this fix:

binary = a2b_base64(ascii.split('=', 1)[0])

gpshead · 2021-03-13T20:47:22Z

Adding such a ignore_trailing_newline=True parameter to allow disabling the newline behavior is probably best done in a followup PR for this issue as the new parameter can't be backported to stable releases.

The reason I recommend that default behavior at all is to maintain compatibility with the expectations of existing code including the round trip from b2a -> a2b working.

If we ever want to change the default newline behavior in the future, it should be done on both b2a and a2b with our usual "couple of release cycles" deprecation period. I'm not convinced the mere newline behavior default warrants changing.

gpshead · 2021-03-13T21:10:15Z

It is also plausible that other users of a2b_base64 in the stdlib may want to ignore more padding and should be updated to use the split trick or similar. For example https://tools.ietf.org/html/rfc4648#section-3.3 suggests MIME and thus the email package is likely one that prefers leniency. Code inspection to find out would be required if no tests explicitly cover these situations.

idan22moral · 2021-03-29T14:25:19Z

@gpshead Finally had the time to come back to this subject.
I thought about it for a few more days and here are some more insights:

In the current implementation, every character that isn't one of the following is ignored:

''.join([chr(_) if (table_a2b_base64[_] == -1) else '' for _ in range(128)])
# '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*,-.:;<>?@[\\]^_`{|}~\x7f

Which can lead to the same problem that we talked about earlier.
For example:

a2b_base64(b'ab==') # b'i'
a2b_base64(b'a:(){:|:&};:b==') # b'i'

So, if we combine your solution with this "bigger" problem, the solution is much simpler,
and it is to raise an exception when:

if (strict_mode && table_a2b_base64[ascii_data[i]] == -1) {
    binascii_state *state = PyModule_GetState(module);
    if (state) {
        PyErr_SetString(state->Error, "Only base64 data is allowed when using strict mode");
    }
    ...
}

So the parameter would be strict_mode, which by default will be False.
(Strict Mode - non-base64 data in not allowed)
What's your opinion on this matter?

idan22moral · 2021-03-29T17:19:05Z

Forgot to mention that the strict_mode flag controls the excess data behavior too,
so when it's set to True and there's excess data (even newlines) - an exception will raise.

The counterpart function (b2a) will have the same flag to respond to the new update, ofc.
Edit: b2a already has the newline flag, so strict_mode is not necessary there.

Also documented the new argument.

idan22moral · 2021-03-30T22:19:53Z

Figured that implementing it will save some time later if the idea is approved, so I did it.

Some examples:

>>> from binascii import a2b_base64
>>> a2b_base64("Yb==\n")
b'a'
>>> a2b_base64("Yb==")
b'a'
>>> a2b_base64("Yb :(){:|:&};: ==")
b'a'
>>> a2b_base64("Yb :(){:|:&};: ==", strict_mode=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
binascii.Error: Only base64 data is allowed when using strict mode
>>> a2b_base64("Y b==", strict_mode=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
binascii.Error: Only base64 data is allowed when using strict mode
>>> a2b_base64("Yb==\n", strict_mode=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
binascii.Error: Excess data after padding is not allowed when using strict mode
>>> a2b_base64("Y\x00b==", strict_mode=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
binascii.Error: Only base64 data is allowed when using strict mode
>>> a2b_base64("Y\nb==", strict_mode=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
binascii.Error: Only base64 data is allowed when using strict mode

Documentation:

a2b_base64(data, /, *, strict_mode=False)
    Decode a line of base64 data.
    
    strict_mode
      When set to True, bytes that are not part of the base64 standard are not allowed.
      The same applies to excess data after padding (= / ==).

gpshead · 2021-03-31T18:50:02Z

I haven't had a chance to look this over yet, but I like where you've headed with this and the addition of strict_mode.

idan22moral · 2021-04-01T22:06:07Z

Awesome!
Let me know how we proceed with this issue when you have the time 🙂

Also, made the tests more generic.

idan22moral · 2021-07-14T16:23:31Z

@gpshead can you re-review my changes?

…raions under the same scope. This should fix the build errors.

No need to mention "strict mode" within the error message.

gpshead · 2021-07-19T16:47:09Z

FWIW I believe my change of the error message from 'Malformed padding' to 'Leading padding' in this PR was half incorrect. I overlooked the label and goto that caused that message to be used in multiple circumstances. The later goto malformed_padding should probably just be turned into one raising its own unique error message.

not a huge deal, but a potentially misleading error message in the = in the middle of data circumstance.

* origin/main: (1146 commits) bpo-42064: Finalise establishing sqlite3 global state (pythonGH-27155) bpo-44678: Separate error message for discontinuous padding in binascii.a2b_base64 strict mode (pythonGH-27249) correct spelling (pythonGH-27076) bpo-44524: Add missed __name__ and __qualname__ to typing module objects (python#27237) bpo-27513: email.utils.getaddresses() now handles Header objects (python#13797) Clean up comma usage in Doc/library/functions.rst (python#27083) bpo-42238: Fix small rst issue in NEWS.d/. (python#27238) bpo-41972: Tweak fastsearch.h string search algorithms (pythonGH-27091) bpo-44340: Add support for building with clang full/thin lto (pythonGH-27231) bpo-44661: Update property_descr_set to use vectorcall if possible. (pythonGH-27206) bpo-44645: Check for interrupts on any potentially backwards edge (pythonGH-27216) bpo-41546: make pprint (like print) not write to stdout when it is None (pythonGH-26810) bpo-44554: refactor pdb targets (and internal tweaks) (pythonGH-26992) bpo-43086: Add handling for out-of-spec data in a2b_base64 (pythonGH-24402) bpo-44561: Update hyperlinks in Doc/distributing/index.rst (python#27032) bpo-42355: symtable.get_namespace() now checks whether there are multiple or any namespaces found (pythonGH-23278) bpo-44654: Do not export the union type related symbols (pythonGH-27223) bpo-44633: Fix parameter substitution of the union type with wrong types. (pythonGH-27218) bpo-44654: Refactor and clean up the union type implementation (pythonGH-27196) bpo-20291: Fix MSVC warnings in getargs.c (pythonGH-27211) ...

Added handling for excess data in binascii.a2b_base64

52d83e9

the-knights-who-say-ni added the CLA not signed label Jan 31, 2021

bedevere-bot added the awaiting review label Jan 31, 2021

📜🤖 Added by blurb_it.

557bce8

the-knights-who-say-ni added CLA signed and removed CLA not signed labels Feb 1, 2021

github-actions bot added the stale Stale PR or inactive for long period of time. label Mar 5, 2021

github-actions bot removed the stale Stale PR or inactive for long period of time. label Mar 7, 2021

gpshead self-assigned this Mar 13, 2021

gpshead requested changes Mar 13, 2021

View reviewed changes

Modules/binascii.c Outdated Show resolved Hide resolved

bedevere-bot added awaiting changes and removed awaiting review labels Mar 13, 2021

Added if-state guard

6ed5193

idan22moral added 3 commits March 30, 2021 21:57

Merge branch 'master' of https://github.com/idan22moral/cpython

4ee90f5

Implemented the strict mode logic

c7b723f

Also documented the new argument.

Merge branch 'master' of https://github.com/python/cpython

93f497a

idan22moral added 5 commits July 10, 2021 13:18

Added tests for padding-only input

3c5758b

Added tests to validate the default behavior.

5f8df5b

Also, made the tests more generic.

Described the changed of this pull request

644dbaf

Modified syntax of RST

464484c

Updated the docs to explain the strict_mode parameter

e1ccf8a

idan22moral requested a review from gpshead July 10, 2021 19:09

idan22moral and others added 9 commits July 16, 2021 19:10

Moved declaration of state to the beginning to prevent multiple decla…

d6a5cbf

…raions under the same scope. This should fix the build errors.

Merge branch 'main' into master

5abc68f

Corrected the RST syntax for argument in docs (italic)

08aa26c

Corrected the RST syntax for argument in news (italic)

2f1990e

Removed whitespace that lead to build failure

fa959bd

use self.assertEqual instead of assert ==

a60a8c6

remove leadg | characters in NEWS

0307272

Simplify the error messages.

d26e1eb

No need to mention "strict mode" within the error message.

update test for error message (Leading vs Malformed)

652e7f4

gpshead approved these changes Jul 18, 2021

View reviewed changes

bedevere-bot added awaiting merge and removed awaiting changes labels Jul 18, 2021

gpshead merged commit 35b98e3 into python:main Jul 19, 2021

bedevere-bot removed the awaiting merge label Jul 19, 2021

This was referenced Jul 19, 2021

Master #27248

Closed

bpo-44678: Separate error message for discontinuous padding in binascii.a2b_base64 strict mode #27249

Merged

idan22moral mentioned this pull request Aug 21, 2021

bpo-44690: Adopt binacii.a2b_base64's strict mode in base64.b64decode #27272

Merged

idan22moral mentioned this pull request Nov 4, 2022

binascii.a2b_base64 ignores padding inside a string #94703

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bpo-43086: Added handling for excess data in binascii.a2b_base64 #24402

bpo-43086: Added handling for excess data in binascii.a2b_base64 #24402

idan22moral commented Jan 31, 2021 •

edited by bedevere-bot

Loading

the-knights-who-say-ni commented Jan 31, 2021

github-actions bot commented Mar 5, 2021

idan22moral commented Mar 6, 2021

gpshead left a comment

bedevere-bot commented Mar 13, 2021

gpshead commented Mar 13, 2021

idan22moral commented Mar 13, 2021

gpshead commented Mar 13, 2021

gpshead commented Mar 13, 2021

gpshead commented Mar 13, 2021

idan22moral commented Mar 29, 2021

idan22moral commented Mar 29, 2021 •

edited

Loading

idan22moral commented Mar 30, 2021

gpshead commented Mar 31, 2021

idan22moral commented Apr 1, 2021

idan22moral commented Jul 14, 2021 •

edited

Loading

gpshead commented Jul 19, 2021

bpo-43086: Added handling for excess data in binascii.a2b_base64 #24402

bpo-43086: Added handling for excess data in binascii.a2b_base64 #24402

Conversation

idan22moral commented Jan 31, 2021 • edited by bedevere-bot Loading

the-knights-who-say-ni commented Jan 31, 2021

Recognized GitHub username

github-actions bot commented Mar 5, 2021

idan22moral commented Mar 6, 2021

gpshead left a comment

Choose a reason for hiding this comment

bedevere-bot commented Mar 13, 2021

gpshead commented Mar 13, 2021

idan22moral commented Mar 13, 2021

gpshead commented Mar 13, 2021

gpshead commented Mar 13, 2021

gpshead commented Mar 13, 2021

idan22moral commented Mar 29, 2021

idan22moral commented Mar 29, 2021 • edited Loading

idan22moral commented Mar 30, 2021

gpshead commented Mar 31, 2021

idan22moral commented Apr 1, 2021

idan22moral commented Jul 14, 2021 • edited Loading

gpshead commented Jul 19, 2021

idan22moral commented Jan 31, 2021 •

edited by bedevere-bot

Loading

idan22moral commented Mar 29, 2021 •

edited

Loading

idan22moral commented Jul 14, 2021 •

edited

Loading