bpo-43086: Added handling for excess data in binascii.a2b_base64 #24402
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA). We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames: This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. You can check yourself to see if the CLA has been received. Thanks again for the contribution, we look forward to reviewing it!
This PR is stale because it has been open for 30 days with no activity.
Any feedback? 😅
Can you add a regression test to Lib/test/test_binascii.py?
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase
In addition to the above, this PR as is causes a bunch of test suite failures that need to be addressed.
@gpshead While writing the tests for my PR I noticed something: I thought about ignoring newlines (or any other character in So, I'm opening this for discussion: should the b2a method be re-implemented too?
Decoder-wise it is a good idea for I suggest adding an additional keyword-only parameter as a flag when implementing this:
There will be a minority of applications out there that want the extra-data-ignoring behavior. This allows them to retain it without writing their own wrapper function to discard the data first. If I'm understanding the current behavior correctly, people who want their code to have today's existing behavior of ignoring all trailing data can use and be compatible with the library before and after this fix:
Adding such a The reason I recommend that default behavior at all is to maintain compatibility with the expectations of existing code, including the round trip from b2a -> a2b working. If we ever want to change the default newline behavior in the future, it should be done on both b2a and a2b with our usual "couple of release cycles" deprecation period. I'm not convinced the mere newline behavior default warrants changing.
It is also plausible that other users of a2b_base64 in the stdlib may want to ignore more padding and should be updated to use the
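A minimal sketch of the keyword-only flag idea, using only the existing binascii and base64 APIs. The wrapper name decode() is hypothetical, and its strict check (re-encode and compare) is a swapped-in technique that is stricter than the flag proposed in this thread. It also shows why the lenient default matters for the b2a -> a2b round trip: b2a_base64 appends a newline unless newline=False is passed.

```python
import base64
import binascii


def decode(data: bytes, *, strict_mode: bool = False) -> bytes:
    """Hypothetical wrapper around binascii.a2b_base64.

    The bare `*` makes strict_mode keyword-only, so existing callers that
    pass only the data keep today's lenient behaviour.
    """
    decoded = binascii.a2b_base64(data)
    if strict_mode and base64.b64encode(decoded) != data:
        # Assumed canonical-form check: stricter than the flag discussed in
        # this thread, it accepts only exactly what b64encode would produce.
        raise binascii.Error("not canonical base64")
    return decoded


# b2a_base64 appends b"\n" unless newline=False is passed.
encoded = binascii.b2a_base64(b"hello")                  # b'aGVsbG8=\n'
print(decode(encoded))                                    # b'hello' (lenient default)
print(decode(b"aGVsbG8=python"))                          # b'hello' (trailing data ignored)
compact = binascii.b2a_base64(b"hello", newline=False)   # b'aGVsbG8='
print(decode(compact, strict_mode=True))                  # b'hello'
```

Calling decode(encoded, strict_mode=True) raises binascii.Error because of the trailing newline, which is exactly the round-trip breakage a stricter default would cause; decode(compact, True) raises TypeError because the flag is keyword-only.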
@gpshead Finally had the time to come back to this subject. In the current implementation, every one of the following characters is ignored:
# table_a2b_base64 is the C lookup table in Modules/binascii.c (-1 marks a non-base64 byte)
''.join([chr(_) if (table_a2b_base64[_] == -1) else '' for _ in range(128)])
# '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*,-.:;<>?@[\\]^_`{|}~\x7f'
This can lead to the same problem that we talked about earlier:
a2b_base64(b'ab==') # b'i'
a2b_base64(b'a:(){:|:&};:b==') # b'i'
So, if we combine your solution with this "bigger" problem, the solution is much simpler:
if (strict_mode && table_a2b_base64[ascii_data[i]] == -1) {
    binascii_state *state = PyModule_GetState(module);
    if (state) {
        PyErr_SetString(state->Error, "Only base64 data is allowed when using strict mode");
    }
    ...
}
So the parameter would be
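The character set listed above can be reproduced from Python without reaching into the C table. This sketch assumes the decoder uses the standard RFC 4648 alphabet; check_alphabet() is a hypothetical pure-Python counterpart of the strict-mode check in the C snippet, not the code that was merged.

```python
import binascii
import string

# Standard base64 alphabet (RFC 4648, section 4) plus the pad character.
B64_ALPHABET = (string.ascii_uppercase + string.ascii_lowercase
                + string.digits + "+/").encode("ascii")
ALLOWED = set(B64_ALPHABET) | {ord("=")}


def ignored_chars() -> str:
    """ASCII characters a lenient decoder skips (outside the alphabet and '=').

    Python cannot see the C table_a2b_base64, so this rebuilds the same set
    from the alphabet instead.
    """
    return "".join(chr(c) for c in range(128) if c not in ALLOWED)


def check_alphabet(data: bytes) -> None:
    """Hypothetical strict check: reject any byte outside the alphabet and '='."""
    for byte in data:
        if byte not in ALLOWED:
            raise binascii.Error("Only base64 data is allowed")


print(binascii.a2b_base64(b"ab=="))             # b'i'
print(binascii.a2b_base64(b"a:(){:|:&};:b=="))  # b'i': the junk is silently skipped
```

With check_alphabet() run first, the second input would be rejected instead of decoding to the same byte as the first.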
Forgot to mention that the The counterpart function (b2a) will have the same flag to match the new behavior, of course.
Figured that implementing it will save some time later if the idea is approved, so I did it. Some examples:
>>> from binascii import a2b_base64
>>> a2b_base64("Yb==\n")
b'a'
>>> a2b_base64("Yb==")
b'a'
>>> a2b_base64("Yb :(){:|:&};: ==")
b'a'
>>> a2b_base64("Yb :(){:|:&};: ==", strict_mode=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
binascii.Error: Only base64 data is allowed when using strict mode
>>> a2b_base64("Y b==", strict_mode=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
binascii.Error: Only base64 data is allowed when using strict mode
>>> a2b_base64("Yb==\n", strict_mode=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
binascii.Error: Excess data after padding is not allowed when using strict mode
>>> a2b_base64("Y\x00b==", strict_mode=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
binascii.Error: Only base64 data is allowed when using strict mode
>>> a2b_base64("Y\nb==", strict_mode=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
binascii.Error: Only base64 data is allowed when using strict mode
Documentation:
I haven't had a chance to look this over yet, but I like where you've headed with this and the addition of strict_mode.
Awesome!
@gpshead can you re-review my changes?
…raions under the same scope. This should fix the build errors.
No need to mention "strict mode" within the error message.
FWIW I believe my change of the error message from 'Malformed padding' to 'Leading padding' in this PR was half incorrect. I overlooked the label and goto that caused that message to be used in multiple circumstances. The later goto malformed_padding should probably just be turned into one raising its own unique error message. Not a huge deal, but a potentially misleading error message in the
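To illustrate giving each padding problem its own error instead of routing them through one shared label, here is a hypothetical pure-Python validator. The function name and its simplified quad tracking are assumptions; only the 'Leading padding' and 'Discontinuous padding' wording comes from this thread, and 'Incorrect padding' is the message binascii already raises. Excess data after a complete padding run is deliberately not checked here.

```python
import binascii
import string

# Standard base64 alphabet (RFC 4648, section 4); '=' is handled separately.
ALPHABET = set((string.ascii_letters + string.digits + "+/").encode("ascii"))
PAD = ord("=")


def check_padding(data: bytes) -> None:
    """Hypothetical validator with one distinct message per padding problem.

    Scanning stops once padding completes a quad, so trailing bytes after
    that point are not examined here.
    """
    quad_pos = 0            # data characters seen in the current 4-char quad
    pads = 0                # '=' characters counted toward completing it
    padding_started = False
    for byte in data:
        if byte == PAD:
            padding_started = True
            if quad_pos == 0:
                raise binascii.Error("Leading padding not allowed")
            if quad_pos >= 2:
                pads += 1
                if quad_pos + pads >= 4:
                    return  # quad completed by padding
            continue
        if byte not in ALPHABET:
            raise binascii.Error("Only base64 data is allowed")
        if padding_started:
            raise binascii.Error("Discontinuous padding not allowed")
        quad_pos = (quad_pos + 1) % 4
    if quad_pos != 0:
        raise binascii.Error("Incorrect padding")
```

For example, b"=Ybc" fails with the leading-padding message and b"Yb=c=" with the discontinuous-padding one, while b"Yb==" passes.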
* origin/main: (1146 commits)
bpo-42064: Finalise establishing sqlite3 global state (pythonGH-27155)
bpo-44678: Separate error message for discontinuous padding in binascii.a2b_base64 strict mode (pythonGH-27249)
correct spelling (pythonGH-27076)
bpo-44524: Add missed __name__ and __qualname__ to typing module objects (python#27237)
bpo-27513: email.utils.getaddresses() now handles Header objects (python#13797)
Clean up comma usage in Doc/library/functions.rst (python#27083)
bpo-42238: Fix small rst issue in NEWS.d/. (python#27238)
bpo-41972: Tweak fastsearch.h string search algorithms (pythonGH-27091)
bpo-44340: Add support for building with clang full/thin lto (pythonGH-27231)
bpo-44661: Update property_descr_set to use vectorcall if possible. (pythonGH-27206)
bpo-44645: Check for interrupts on any potentially backwards edge (pythonGH-27216)
bpo-41546: make pprint (like print) not write to stdout when it is None (pythonGH-26810)
bpo-44554: refactor pdb targets (and internal tweaks) (pythonGH-26992)
bpo-43086: Add handling for out-of-spec data in a2b_base64 (pythonGH-24402)
bpo-44561: Update hyperlinks in Doc/distributing/index.rst (python#27032)
bpo-42355: symtable.get_namespace() now checks whether there are multiple or any namespaces found (pythonGH-23278)
bpo-44654: Do not export the union type related symbols (pythonGH-27223)
bpo-44633: Fix parameter substitution of the union type with wrong types. (pythonGH-27218)
bpo-44654: Refactor and clean up the union type implementation (pythonGH-27196)
bpo-20291: Fix MSVC warnings in getargs.c (pythonGH-27211)
...
Currently, when binascii.a2b_base64() is given base-64 input with excess data after the padding (= / ==), the excess data is ignored.
Example:
Note: MANY libraries (such as the all-time favorite base64) use this function as their decoder.
Why is it problematic:
The logic behind my fix (PR on GitHub): we should check that there's no more data after the padding.
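A minimal sketch of that check, assuming input made only of alphabet characters and '=' (no embedded newlines or junk). end_of_padding() and decode_no_excess() are hypothetical helpers; the actual fix lives in the C loop of a2b_base64, and in the PR discussion it only triggers with strict_mode=True.

```python
import binascii


def end_of_padding(data: bytes) -> int:
    """Return the index just past the first complete '=' run, or -1 if none.

    Hypothetical helper: assumes data holds only alphabet characters and '='.
    """
    quad_pos = 0  # data characters seen in the current 4-character quad
    pads = 0      # '=' characters counted toward completing that quad
    for i, byte in enumerate(data):
        if byte == ord("="):
            if quad_pos >= 2:
                pads += 1
                if quad_pos + pads >= 4:
                    return i + 1
            continue
        pads = 0
        quad_pos = (quad_pos + 1) % 4
    return -1


def decode_no_excess(data: bytes) -> bytes:
    """Reject anything after the padding, then decode as usual."""
    end = end_of_padding(data)
    if end != -1 and end < len(data):
        raise binascii.Error("Excess data after padding")
    return binascii.a2b_base64(data)


print(binascii.a2b_base64(b"aGVsbG8=python"))  # b'hello': the lenient decoder drops "python"
print(decode_no_excess(b"aGVsbG8="))            # b'hello'
```

decode_no_excess(b"aGVsbG8=python") raises binascii.Error instead of silently returning b'hello'.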
Though not publicly disclosed, this behavior can lead to security issues in heavily used projects.
Preventing this behavior sounds more beneficial than harmful, since there's no known good usage for this behavior.
From what I read, the Python implementation is not very close (in this particular case, of course) to the base64 RFC.
(link: https://tools.ietf.org/html/rfc4648#section-3.3)
Thanks to Ori Damari for bringing this behavior up,
and thanks to Ryan Mast and many of the other great guys for discussing the problem in the comments.
Link to the tweet
Idan Moral
Twitter: https://twitter.com/idan_moral
GitHub: https://github.com/idan22moral
https://bugs.python.org/issue43086