DoS vulnerability in Scala 2.12 HashMap #11203
Comments
How do you suggest we test whether something is comparable? Even if it is an instance of […] Also, that would fix it for Strings, but not for any custom case class that contains a String, because those would suffer from the same poor hash, without being Comparable anyway.
Another option would be to use a random seed: https://doc.rust-lang.org/std/collections/struct.HashMap.html
(some background on SipHash: http://131002.net/siphash/siphash.pdf, see in particular section 7)
... but I guess any kind of randomness wouldn't play well with sending things across the wire through serialization
Here's an overview of the current situation (as of 2.13.0-M5): All our hash maps and hash sets are affected (and they all need to be fixed individually)
BTW, we use different collision resolution schemes in our implementation. I haven't looked at all the immutable maps and sets yet but I just started working on mutable HashMaps and HashSets (independently of this ticket).
I think the Java solution was probably expedient for Java, but it isn't inherently the right way to fix the issue. If you want a […] Instead, we need a new type of map where the keys are orderable but are not necessarily ordered. (Likewise with sets.) Then we have the compile-time guarantee (so long as the ordering is sensible) that DoS cannot occur. This is far better for people wishing to produce robust systems, I think. Alternatively, if we want a quick hacky fix, I'd just intercept the hashing of […]
discussion at https://gitter.im/scala/contributors?at=5bc1dbe41e23486b93b70784 ("Do people expect hash maps to be secure against collisions of untrusted data?")
Customizable hashing is another option. If hash collections allowed user-defined hashing methods, you could use a more secure seeded hash for security-critical applications.
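As a rough illustration of customizable, seeded hashing, the sketch below wraps String keys so that `hashCode` goes through a caller-supplied `scala.util.hashing.Hashing[String]`. `HashedKey` and `seeded` are hypothetical names and not part of the Scala library. MurmurHash3 is used only to show the seeding mechanism: it is not a keyed cryptographic hash like SipHash, so this should not be read as a proven defense.

```scala
import scala.util.hashing.{Hashing, MurmurHash3}

// Hypothetical wrapper: routes hashCode through a caller-supplied Hashing[String].
// All keys stored in one map are assumed to share the same Hashing instance,
// otherwise equal keys could report different hash codes.
final class HashedKey(val value: String, hashing: Hashing[String]) {
  override def hashCode: Int = hashing.hash(value)
  override def equals(other: Any): Boolean = other match {
    case that: HashedKey => value == that.value
    case _               => false
  }
  override def toString: String = value
}

object HashedKey {
  // Seeded hashing. In practice the seed would be drawn randomly once per process.
  def seeded(seed: Int): Hashing[String] =
    Hashing.fromFunction[String](s => MurmurHash3.stringHash(s, seed))
}
```

A map keyed by `HashedKey` never sees `String.hashCode`, so precomputed String collisions such as "AaAa" and "BBAa" no longer land in the same bucket by construction. Because the hash is recomputed from `value`, nothing hash-related has to survive serialization as long as the receiving side wraps keys with its own seed.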
And no matter which way we go, 2.12.8 as a target is probably not possible (at least not for all affected collection types). Even with the magical behind-the-scenes hack that Java uses you need to change the internal data structures, which are not actually internal. They are exposed with package-private visibility and actively used by libraries like https://github.com/scala/scala-java8-compat/ |
If this is only (mostly) about the DoS attack vectors, could we just limit the number of allowed collisions and fail if there's a (configurable) suspicious ratio of collisions?
You mean, under the expectation that you can compare hashCodes across JVM instances? Otherwise, you don't really have to transfer hash codes with serialization and can just recalculate them when deserializing.
I really don't want my hashmaps to have arbitrary implementation-defined failures. If inserting certain elements starts failing in some way, then my application is crippled, and this can still cause denial of service.
Yes, no clue what would break if this assumption is violated.
It's not the same failure mode, though. It's like preventively rejecting, with an exception, inputs that would lead to an OOM. In this case, it would throw an exception if input was detected which has characteristics that don't fit the assumptions about your data when you chose a HashMap in the first place (i.e. that hashCodes are uniformly distributed). In what way would that cripple the application on innocent inputs?
Imagine a persistent hash map shared between multiple users: if I fill it with enough malicious data, then any attempt by other users to add something to it would throw an exception.
Probably not, because while adding malicious data would push the ratio (of collisions per size) towards the limit, adding any other data would on average move the ratio away from the limit.
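To make the "suspicious ratio" idea concrete, here is a minimal sketch of such a check, computed outside the map for simplicity. `CollisionGuard`, `collisionRatio` and `requireBelow` are hypothetical names, and the definition of the ratio (fraction of keys sharing a hash code with another key) is an assumption, since the thread does not pin one down.

```scala
object CollisionGuard {
  // Fraction of keys that share their hashCode with at least one other key.
  // Assumes the keys are distinct; duplicates would be counted as collisions.
  def collisionRatio[K](keys: Iterable[K]): Double = {
    val total = keys.size
    if (total == 0) 0.0
    else {
      val colliding =
        keys.groupBy(_.hashCode).values.map(_.size).filter(_ > 1).sum
      colliding.toDouble / total
    }
  }

  // Rejects the input when the ratio exceeds a configurable limit.
  def requireBelow[K](keys: Iterable[K], maxRatio: Double): Unit = {
    val ratio = collisionRatio(keys)
    if (ratio > maxRatio)
      throw new IllegalArgumentException(
        s"suspicious hash collision ratio: $ratio > $maxRatio")
  }
}
```

With this definition, innocent keys pull the ratio down and crafted keys push it up, which is the dynamic argued about above. An in-map version would have to maintain the counts incrementally rather than regrouping all keys.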
…ray#277

The problem is that with String's hashCode implementation it is too simple to create synthetic collisions. This allows an attacker to create an object with keys that all collide, which leads to a performance drop for the HashMap just for creating the map in the first place.

See scala/bug#11203 for more information about the underlying HashMap issue.

For the time being, it seems safer to use a TreeMap, which uses String ordering. Benchmarks suggest that using a TreeMap is only ~6% slower for reasonably sized JSON objects up to 100 keys.

Benchmark for non-colliding keys:

| Benchmark | (_size) | (parser) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|---|
| ExtractFieldsBenchmark.readSpray | 1 | HashMap | thrpt | 5 | 1195832.262 | ± 64366.605 | ops/s |
| ExtractFieldsBenchmark.readSpray | 1 | TreeMap | thrpt | 5 | 1342009.641 | ± 17307.555 | ops/s |
| ExtractFieldsBenchmark.readSpray | 10 | HashMap | thrpt | 5 | 237173.327 | ± 70341.742 | ops/s |
| ExtractFieldsBenchmark.readSpray | 10 | TreeMap | thrpt | 5 | 233510.618 | ± 69638.750 | ops/s |
| ExtractFieldsBenchmark.readSpray | 100 | HashMap | thrpt | 5 | 23202.016 | ± 1514.763 | ops/s |
| ExtractFieldsBenchmark.readSpray | 100 | TreeMap | thrpt | 5 | 21899.072 | ± 823.225 | ops/s |
| ExtractFieldsBenchmark.readSpray | 1000 | HashMap | thrpt | 5 | 2073.754 | ± 66.093 | ops/s |
| ExtractFieldsBenchmark.readSpray | 1000 | TreeMap | thrpt | 5 | 1793.329 | ± 43.603 | ops/s |
| ExtractFieldsBenchmark.readSpray | 10000 | HashMap | thrpt | 5 | 208.160 | ± 7.466 | ops/s |
| ExtractFieldsBenchmark.readSpray | 10000 | TreeMap | thrpt | 5 | 160.349 | ± 5.809 | ops/s |
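The workaround described in that commit message can be sketched as follows. This is not the spray-json code itself; `JsonFields` and `fromPairs` are hypothetical names used only to show a String-keyed `TreeMap` standing in for the default hash-based `Map`.

```scala
import scala.collection.immutable.TreeMap

object JsonFields {
  // Builds the field map as a TreeMap. Lookups compare keys using String
  // ordering, so String.hashCode is never consulted and colliding hash codes
  // cannot degrade performance.
  def fromPairs[V](pairs: Seq[(String, V)]): Map[String, V] =
    TreeMap(pairs: _*)
}
```

The result is still typed as an immutable `Map[String, V]`, so callers are unaffected apart from iteration order, which becomes the sorted key order.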
@SethTisue Shouldn't this be kept open until we at least have proper user documentation on which maps are "DoS-safe" and which aren't?
I wouldn't be opposed to re-opening it, but I don't think it's a blocker for 2.12.8. @lrytz?
Could reschedule to 2.12.9.
It's not blocking 2.12.8. I think @szeiger is working on an ordering-based implementation for 2.13?
scala/scala#7633 was merged and included in 2.13.0; perhaps a pull request backporting […]
@SethTisue I have created a ticket on your behalf: scala/scala-collection-compat#234
…et` call that isn't safe: scala/bug#11203
…t an internal `toSet` call that isn't safe: scala/bug#11203)
Currently any Scala collection is vulnerable when affected maps or sets are used internally in methods like:
* Yet more safe and efficient removing of keys without an internal `toSet` call that isn't safe: scala/bug#11203
* Avoid usage of vulnerable methods even in tests and docs
In 2011, a vulnerability was raised against Java application servers about a DoS possibility that exploited Java String's vulnerability to collisions. This vulnerability was so widespread and fundamental that it was fixed not in the application servers, but in the JDK's `HashMap` implementation.

Scala's `HashMap` has the same vulnerability. This has been brought to our attention through this issue raised against play-json: playframework/play-json#186

This vulnerability doesn't just affect play-json. It affects anything that uses Scala's default map implementation to store String-keyed data where the keys are controlled remotely: HTTP headers, HTTP forms, JSON. Any library that uses Scala's Map for any of these is vulnerable, which includes Play, Akka HTTP, and many, many other Scala libraries.
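To show how cheap such collisions are to produce, here is a minimal sketch. It relies on the well-known fact that "Aa" and "BB" have the same `String.hashCode` (2112), so any two equal-length concatenations of those blocks also collide. `StringCollisions` and `collidingKeys` are hypothetical names introduced for illustration.

```scala
object StringCollisions {
  // Returns 2^blocks distinct strings, each made of `blocks` two-character
  // pieces chosen from "Aa" and "BB". Since String.hashCode of a concatenation
  // is hash(prefix) * 31^2 + hash(block) for a two-character block, and both
  // blocks hash to 2112, every generated string has the same hashCode.
  def collidingKeys(blocks: Int): Seq[String] =
    (1 to blocks).foldLeft(Seq("")) { (acc, _) =>
      for (prefix <- acc; block <- Seq("Aa", "BB")) yield prefix + block
    }
}
```

For example, `StringCollisions.collidingKeys(10)` yields 1024 distinct 20-character keys that all share one hash code, which is the kind of input an attacker could send as JSON field names or HTTP header names.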
The fix that the JDK made is quite simple: when buckets in the hash table get too big due to poor hashing (or malicious collisions), it falls back to essentially using a `TreeMap` in the bucket, and if, as determined by reflection, the keys are Comparable, it uses the compareTo method to compare them.

Currently, we use `ListMap` in the case of collisions:

Obviously that comment is wrong: if you're under attack, they won't be rare at all. I think a simple solution here would be to modify `HashMapCollision1` such that it has both a `ListMap` and a `TreeMap`; anything that implements `Comparable` can be put in and queried from the `TreeMap`, and everything else goes in the `ListMap`.

I don't think there's much consequence to doing this. The biggest impact will be a potential change in iteration order of colliding elements when you merge two maps: if the keys implement `Comparable`, the ordering will change from keys from the first map followed by keys in the second map to lexical ordering, and if you're mixing `Comparable` and non-`Comparable` keys, the ordering gets a bit weirder. But it's still stable. And who really depends on ordering in hash maps? And it only affects ordering when there are collisions. There's also a slight increase in space used, but again, it's only for collisions; for 99.9999% of the world, it will have no impact.
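The proposed split between a `TreeMap` and a `ListMap` could look roughly like the sketch below. `HybridBucket` is a hypothetical stand-in, not the real `HashMapCollision1`, and ordering `Comparable` keys by class name before calling `compareTo` is an assumption made here so that keys of different classes are never compared with each other.

```scala
import scala.collection.immutable.{ListMap, TreeMap}

object HybridBucket {
  // Orders Comparable keys by class name first, then by compareTo, so that
  // compareTo is only ever called on keys of the same class.
  private val comparableOrdering: Ordering[Comparable[Any]] =
    new Ordering[Comparable[Any]] {
      def compare(a: Comparable[Any], b: Comparable[Any]): Int = {
        val byClass = a.getClass.getName.compareTo(b.getClass.getName)
        if (byClass != 0) byClass else a.compareTo(b)
      }
    }

  def empty[K, V]: HybridBucket[K, V] =
    new HybridBucket[K, V](
      TreeMap.empty[Comparable[Any], V](comparableOrdering),
      ListMap.empty[K, V])
}

// Hypothetical collision bucket: Comparable keys live in a TreeMap
// (logarithmic lookup), everything else in a ListMap (linear lookup).
final class HybridBucket[K, V] private (
    sorted: TreeMap[Comparable[Any], V],
    unsorted: ListMap[K, V]) {

  def updated(key: K, value: V): HybridBucket[K, V] = key match {
    case c: Comparable[_] =>
      new HybridBucket[K, V](
        sorted.updated(c.asInstanceOf[Comparable[Any]], value), unsorted)
    case _ =>
      new HybridBucket[K, V](sorted, unsorted.updated(key, value))
  }

  def get(key: K): Option[V] = key match {
    case c: Comparable[_] => sorted.get(c.asInstanceOf[Comparable[Any]])
    case _                => unsorted.get(key)
  }

  def size: Int = sorted.size + unsorted.size
}
```

One caveat of this design is that the `TreeMap` half identifies keys by `compareTo` rather than `equals`, so a key type whose `compareTo` is inconsistent with `equals` would behave differently than in a plain hash map. Non-`Comparable` keys, such as case classes wrapping a String, still fall into the linear `ListMap` path.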