-
Notifications
You must be signed in to change notification settings - Fork 5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
getPastEvents throws uncaught exception when processing invalid UTF-8 characters [1.0.0-beta.34] #1610
Comments
utf8 throws an uncatchable exception when it cannot decode hex. From UTF8:
For my implementation purposes I have patched line 218 of utils: With
commit (don't mind the meta) I'm not going to be submitting a PR because this is not a universally safe patch, but works for my implementation. Leaving open. here's the hex that broke web3 for me: Cheers. |
Maybe https://github.com/mathiasbynens/wtf-8 might be useful for that case? |
@hexagon6 Agreed, but this would be a bigger change than the above hack. |
This would appear to be a serious bug that really needs to be fixed, as any web3 dependent service using this library, which takes any form of user input, could be taken down from this bug. |
I am considering adding an “unsafeString” type or something to ethers.js that would help web3 deal with this issue, which would allow for invalid surrogate pairs, overlong sequences and invalid ordinal bits set. It is a hard problem to solve, because not sanitizing the output can lead to serious security implications for dapps; if you were to hash the string, it would not match, for example, which may be used in a commit-reveal. If I can trick you into committing to a wrong hash, you cannot reveal, which may mean you cannot properly challenge, for example. I would also recommend that Web3.js also add UTF8 sanitization to all contract calls, so it fails before it hits the blockchain. That said, there are still multiple ways to try interpreting invalid UTF8 sequences. The current UTF8 library allows a “allowInvalid” flag (which is never set to true, but it was easy to add and potentially useful) which uses the “ignore” strategy (which simply drops the naughty bits), but for this I think the replace strategy makes more sense, replacing the invalid sequence with the placeholder, And continues to the next octet with the ordinal but cleared... I believe (please correct me if I’m wrong), that’s what the native JS string does. I think for now, you could also use the “bytes” type for decoding, which doesn’t care about data representation. But that needs to be further in, since the contract sighash needs the string “string” to match. Still thinking about this though... |
Agreed 👌
I totally agree with you on this!
Yep, this should work. I was checking if there are other libs who do have the replacement logic as for example python does have (@ricmoo showed me the functionalities python has). But sadly there is currently no lib which does handle this as I would expect. Luckily is @ricmoo already creating such an improved UTF-8 lib based on already existing code 💪 I will wait until Richard has finished his lib and will use it for the |
Copying over my comments on how an API design change might look to work with this. I assume the use of the TextEncoder standard in the comments, but probably the Referring to Add an option, acknowledging that parsing event data might fail due to string processing, maybe called In particular, |
Just a note: we tried to reproduce this bug in #3497 without success. If anyone has a reproduction case, please advise. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions |
@cgewecke At one point I had the exact block number and event that caused |
@cgewecke In the original thread I mentioned a hex that brought up the issue for me. You could include that hex string in a unit test against whatever module is presently implemented for
|
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions. If you believe this was a mistake, please comment. |
Problem
utf8.decode()
utilized by Web3 throws an uncatchable error if there is an invalid utf-8 character.Impact
Contracts with lack of or improper user input validation on public methods.
What I expect to happen
Web3js catches the error and skips the record
What actually happens
Irrecoverable error
Possible Solution
Catch the error and skip the record
The text was updated successfully, but these errors were encountered: