-
Notifications
You must be signed in to change notification settings - Fork 319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible DOS due to hash collision #864
Comments
Author here: Happy to help with this. |
Indeed, it's old and known problem. See e.g. https://oleg.fi/gists/posts/2021-05-19-dont-default-to-hashmap.html Yet, I'm not keen at changing |
With all due respect, that makes this worse, not better.
That's a much bigger breaking change and would make it entirely new library. |
@NorfairKing In the post you indicated a rejected PR, could you link that here now that the issue is publicly discussed? @phadej Among the proposed paths forward, is there a short-term solution you'd be in favor of? This issue deserves a short term fix, even if it isn't the desired end state. |
It's linked in the post but here's the link again: haskell-unordered-containers/unordered-containers#217 |
EDIT: bonus points for providing |
Just a suggestion, but maybe I have suspicion that cooking up collisions for Blake2 (even with a small digest size) would be prohibitively expensive for the purpose of DoS attack |
Another solution could be using some map data structure specialized to This would have the advantage (apart from not being susceptible to hash DoS attacks obviously) that it doesn't (potentially) cause lots of string comparisons (this could be a problem with |
@konsumlamm my solution is to make |
More concretely, something like: https://github.com/haskell/aeson/compare/textmap The rest is just I hope someone will complete that and make a PR. |
I'm going to try and push that through @phadej. |
@phadej What did you have in mind with the |
I often work with YAML files where the order of object properties is important. Humans edit YAML files, and a logical order of object properties maximizes the usability. I do not know of a way to do this in Haskell using existing libraries, and this issue unfortunately sometimes results in having to use a different language. This comment is in agreement with the current direction, but I am adding it as a concrete example of a real-world issue that Oleg's suggestion would resolve, in addition to the hash collision issue. |
@Boarders, thanks for working on this!
look at the diff, probably something like lift (TextMap m) = [| TextMap (H.fromList . map (first pack) $ m') |]
where m' = map (first unpack) . H.toList $ m It would be good to first introduce |
Probably this is relevant |
I realise the following probably isn't advisable because of That said, |
I completely agree with this, as the order issue has been a significant paint point for me with the current implementation. I brainstormed about Aeson Object Design in my blog. Here are my ideas in brief, which address the The type is named type ObjectEntries = Vector (Text, Value) For working with other types, an class Object o where
lookup :: Text -> o -> Maybe Value
(.:) :: (FromJSON a, Object o) => o -> Text -> Parser a
... Instances can be provided for built-in types, and users are free to define instances for other types: instance Object ObjectEntries where
lookup k = fmap snd . Vector.find ((== k) . fst)
instance Object (HashMap Text Value) where
lookup = HashMap.lookup
... The "with" function is named withObjectEntries
:: String
-> (ObjectEntries -> Parser a)
-> Value
-> Parser a Helper functions can be provided to perform conversions to other types: asHashMap
:: (HashMap Text Value -> Parser a)
-> ObjectEntries
-> Parser a
asHashMap f = f . HashMap.fromList . Vector.toList The example from the documentation would be written as follows: instance FromJSON Coord where
parseJSON = withObjectEntries "Coord" . asHashMap $ \v -> Coord
<$> v .: "x"
<*> v .: "y" This allows the user to decide how to implement each instance. For example, a public API may prioritize security while an internal or administrative API may prioritize performance.
|
For my own codebases I'm not convinced that |
Isn't this issue about supposedly O(1) operation degenerating to O(n) (i.e. linear)... |
Indeed. (The DoS vulnerability is in the construction of the The current fix allows users to globally choose between In some cases, it would even be possible to use the list (better than Would delaying the construction of the ( Thank you very much for your time and patience. |
Just wanted to chime in that I agree it'll be great for the Haskell ecosystem if whatever solution comes out of this discussion lets us keep the order of JSON object properties. Regardless of how wrong it may be to rely on the order of the properties, many tools and standards still do it, and it's a nightmare to work with them using Haskell. Feels suddenly like the entire ecosystem is taken away from you (i.e. not only |
I ended up running some benchmarks (blog entry, code). On my laptop (UPDATED after feedback from Talyor):
In the context of the current fix, using |
@TravisCardwell: Did you run those benchmarks with I wouldn't expect Once you get to n=32, My takeaway is that you should basically never use |
Indeed I did! Thank you very much for catching that! I have updated my code and blog entry to fix the mistake. I also updated the stats in my comment above to make sure people are not misled by the problematic stats. Thanks for sharing your benchmark code. I look forward to using gauge in the future!
That sounds good. On my (old and slow) laptop, |
@phadej, if it's an old and known problem, why hasn't it been resolved? Is it too difficult? Too time-consuming? Not enough interest? Not in agreement there is a problem? |
If anyone has an API-compatible fork that fixes this issue, please let me know. I'll replace all my uses of aeson with it, even if performance is 10 times worse. |
|
I changed the default of Though there is no guarantee it won't be changed again: that's the point of it being internal. If it is important, please pin the flag explicitly in your |
@phadej, for us laymen, can you explain how this is resolved and what level of certainty we can place in the solution? |
@ketzacoatl upgrade to EDIT i.e. the plan as in #864 (comment) was implemented, but with different module/type names. |
@phadej, thank you for clarifying! Are there tests that exist or could be added to this package as a regression test or otherwise help us confirm this is resolved? |
@ketzacoatl What's a good way to test this? We could generate colliding inputs and make sure that the resulting map gets constructed in reasonable time, but that only prevents reverting back to unordered-containers specifically, doesn't that seem overly narrow? If the lesson learned is "don't use vulnerable hashmaps" I think enough noise was made about this issue to make maintainers vigilant for years to come at least as far as aeson is concerned. I'm unsure about this. I could imagine abusing this line of thinking to justify not testing for any bugs, but at the same time I'm not convinced this issue in particular is one that can easily be tested against. |
Testing performance issues is tricky. I'll be happy to learn how to make non-flaky performance tests. |
When it's about the difference between O(n) and O(n^2), you can set n to a big number like n=10k and it's about detecting a 10k order of magnitude difference. That doesn't require much sophistication, does it? Make a test run in 0.1s without collisions, but 100s with collisions, set the cut-off at 1s. |
for Doable, but I don't have time to spend on making that work reliably. Such complexity testing tool would be nice though. |
@phadej; this might fit your use-case: https://github.com/nh2/haskell-cpu-instruction-counter |
Sadly, it's not on Hackage, nh2/haskell-cpu-instruction-counter#8 |
I just explained why that's not the case. You do not need to actually measure the curve, just find one point where the discrepancy is large enough for all reasonable machines where one might run the test:
How do you expect 0.1s on one machine ever take more than 1s on another machine? (and these numbers are just for the sake of example, if a 10x discrepancy is actually not enough slack you can easily turn it up to 100x or 1000x) You might get a false positive on a very busy machine but then you're likely to be in much more trouble already anyway. |
Why no curve? To test pass the curve should be e.g. n log n with some confidence. It is easy to make a test which will report true negatives, but we want a test which doesn't report false positives, i.e. doesn't tell that everything is ok when it isn't. |
You can test two points: if processing a tenfold data increases time 50x, then it's highly unlikely to be |
I think @phadej wants to test a stronger hypothesis to find out which asymptotics we are in: linearifmic or quadratic. But @Lysxia makes a (perfectly valid imo) point that a much easier test -- to catch a difference between the two asymptotics (e.g. with explicit |
@ulysses4ever but we have only one implementation at the time to test, there is nothing to compare to. Maybe I'm failing to explain, so I'd welcome the patch of your suggestions, so we can discuss something concrete. |
I could be misunderstanding you, but like all regression tests, I thought the point is to distinguish the implementation before and the implementation after the bug was fixed. @NorfairKing Your opinion would be valuable on this. Is a regression test worth adding to this library, considering that it could realistically only catch a regression to a specific hashtable implementation with a specific hashing function, which is likely to end up being obsolete if people start to act on the issue in the unordered-containers library (haskell-unordered-containers/unordered-containers#319)? |
I don't think so. |
(I think it would be interesting to see any benchmark comparison(s) of 1.5 and 2.0 anyway) |
As detailed in a blog, Aeson is susceptible to DOS if given maliciously-crafted JSON. This is likely a tired conversation for many developers, but with the public posting today it seems good to have a public issue tracking the discussion and resolution.
The text was updated successfully, but these errors were encountered: