Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DoS vulnerability in Scala 2.12 HashMap #11203

Closed
jroper opened this issue Oct 12, 2018 · 28 comments
Closed

DoS vulnerability in Scala 2.12 HashMap #11203

jroper opened this issue Oct 12, 2018 · 28 comments

Comments

@jroper
Copy link

jroper commented Oct 12, 2018

In 2011, a vulnerability was raised against Java application servers about a DoS possibility that exploited Java String's vulnerability to collisions. This vulnerability was so widespread and fundamental that it was fixed not in the application servers, but in the JDK's HashMap implementation.

Scala's HashMap has the same vulnerability. This has been brought to our attention through this issue raised against play-json:

playframework/play-json#186

This vulnerability doesn't just affect play-json. It affects anything that uses Scala's default map implementation to store String keyed data where the keys are controlled remotely. So, HTTP headers, HTTP forms, JSON, any library that uses Scala's Map for any of these is vulnerable, so that includes Play, Akka HTTP, and many, many other Scala libraries.

The fix that the JDK did is quite simple, when buckets in the hash table got too big due to poor hashing (or malicious collisions), it reverted to essentially using a TreeMap in the bucket, and if, determined by reflection, the keys are Comparable, it uses the compareTo method to compare them.

Currently, we use ListMap in the case of collisions:

  // 32-bit hash collision (rare, but not impossible)
  new HashMapCollision1(hash, ListMap.empty.updated(this.key,this.value).updated(key,value))

Obviously that comment is wrong, if you're under attack, they won't be rare at all. I think a simple solution here would be to modify HashMapCollision1 such that it has both a ListMap and a TreeMap, anything that implements Comparable can be put in and queried from the TreeMap, and everything else in the ListMap.

I don't think there's much consequence to doing this, the biggest impact will be a potential change in iteration order of colliding elements when you merge two maps - if the keys implement Comparable, the ordering will change from keys from the first map followed by keys in the second map to lexical ordering, and if you're mixing Comparable and non Comparable keys, the ordering gets a bit weirder. But it's still stable. And, who really depends on ordering in hash maps? And it only affects ordering when there are collisions. There's also a slight increase in space used, but again, it's only for collisions, for 99.9999% of the world, it will have no impact.

@sjrd
Copy link
Member

sjrd commented Oct 12, 2018

How do you suggest we test whether something is comparable. Even if it is an instance of Comparable, one cannot know what are valid arguments to the compareTo method. Invalid arguments will throw ClassCastExceptions, so you can't simply create a TreeMap.

Also, that would fix it for Strings, but not for any custom case class that contains a String, because those would suffer from the same poor hash, without being Comparable anyway.

@SethTisue SethTisue added this to the 2.12.8 milestone Oct 12, 2018
@smarter
Copy link
Member

smarter commented Oct 12, 2018

Another option would be to use a random seed: https://doc.rust-lang.org/std/collections/struct.HashMap.html

@smarter
Copy link
Member

smarter commented Oct 12, 2018

(some background on SipHash: http://131002.net/siphash/siphash.pdf, see in particular section 7)

@smarter
Copy link
Member

smarter commented Oct 12, 2018

... but I guess any kind of randomness wouldn't play well with sending things across the wire through serialization though

@szeiger
Copy link

szeiger commented Oct 12, 2018

Here's on overview of the current situation (as of 2.13.0-M5): All our hash maps and hash sets are affected (and they all need to be fixed individually)

[info] Benchmark                              (comparable)  (size)  Mode  Cnt           Score           Error  Units
[info] HashCollisionBenchmark.champHashMap           false      10  avgt   10        3087.112 ±       196.344  ns/op
[info] HashCollisionBenchmark.champHashMap           false     100  avgt   10       82826.466 ±      1242.476  ns/op
[info] HashCollisionBenchmark.champHashMap           false    1000  avgt   10     6008195.291 ±     83021.908  ns/op
[info] HashCollisionBenchmark.champHashMap           false   10000  avgt   10   799354669.450 ±  42798033.928  ns/op
[info] HashCollisionBenchmark.champHashMap            true      10  avgt   10        3206.333 ±        66.366  ns/op
[info] HashCollisionBenchmark.champHashMap            true     100  avgt   10       83229.784 ±      1723.577  ns/op
[info] HashCollisionBenchmark.champHashMap            true    1000  avgt   10     6023586.625 ±     99315.284  ns/op
[info] HashCollisionBenchmark.champHashMap            true   10000  avgt   10   801143110.200 ±  37457371.571  ns/op
[info] HashCollisionBenchmark.champHashSet           false      10  avgt   10        2661.455 ±        34.165  ns/op
[info] HashCollisionBenchmark.champHashSet           false     100  avgt   10       47023.016 ±       502.364  ns/op
[info] HashCollisionBenchmark.champHashSet           false    1000  avgt   10     4734575.751 ±     49540.034  ns/op
[info] HashCollisionBenchmark.champHashSet           false   10000  avgt   10   451249571.900 ±   4053862.848  ns/op
[info] HashCollisionBenchmark.champHashSet            true      10  avgt   10        2682.528 ±        59.833  ns/op
[info] HashCollisionBenchmark.champHashSet            true     100  avgt   10       47751.985 ±       751.482  ns/op
[info] HashCollisionBenchmark.champHashSet            true    1000  avgt   10     4830274.370 ±     87407.950  ns/op
[info] HashCollisionBenchmark.champHashSet            true   10000  avgt   10   460022326.500 ±   5370622.228  ns/op
[info] HashCollisionBenchmark.javaHashMap            false      10  avgt   10         435.207 ±         4.483  ns/op
[info] HashCollisionBenchmark.javaHashMap            false     100  avgt   10       78052.174 ±      1002.036  ns/op
[info] HashCollisionBenchmark.javaHashMap            false    1000  avgt   10     6690602.312 ±    132158.499  ns/op
[info] HashCollisionBenchmark.javaHashMap            false   10000  avgt   10  1064348588.700 ±  21725697.542  ns/op
[info] HashCollisionBenchmark.javaHashMap             true      10  avgt   10         435.106 ±         5.557  ns/op
[info] HashCollisionBenchmark.javaHashMap             true     100  avgt   10       10339.026 ±       199.182  ns/op
[info] HashCollisionBenchmark.javaHashMap             true    1000  avgt   10      138461.502 ±      5488.686  ns/op
[info] HashCollisionBenchmark.javaHashMap             true   10000  avgt   10     1383198.954 ±     19537.305  ns/op
[info] HashCollisionBenchmark.javaHashSet            false      10  avgt   10         431.236 ±         5.965  ns/op
[info] HashCollisionBenchmark.javaHashSet            false     100  avgt   10       78141.561 ±      1103.317  ns/op
[info] HashCollisionBenchmark.javaHashSet            false    1000  avgt   10     6740530.754 ±     90071.057  ns/op
[info] HashCollisionBenchmark.javaHashSet            false   10000  avgt   10  1065812482.700 ±  21146822.799  ns/op
[info] HashCollisionBenchmark.javaHashSet             true      10  avgt   10         429.406 ±         8.252  ns/op
[info] HashCollisionBenchmark.javaHashSet             true     100  avgt   10       10467.322 ±       252.287  ns/op
[info] HashCollisionBenchmark.javaHashSet             true    1000  avgt   10      142079.811 ±      3196.623  ns/op
[info] HashCollisionBenchmark.javaHashSet             true   10000  avgt   10     1667129.061 ±    413304.829  ns/op
[info] HashCollisionBenchmark.mutableHashMap         false      10  avgt   10         322.683 ±         5.309  ns/op
[info] HashCollisionBenchmark.mutableHashMap         false     100  avgt   10       19328.574 ±       604.139  ns/op
[info] HashCollisionBenchmark.mutableHashMap         false    1000  avgt   10     1335647.124 ±     34504.419  ns/op
[info] HashCollisionBenchmark.mutableHashMap         false   10000  avgt   10   130574061.538 ±   2898111.233  ns/op
[info] HashCollisionBenchmark.mutableHashMap          true      10  avgt   10         334.757 ±        30.975  ns/op
[info] HashCollisionBenchmark.mutableHashMap          true     100  avgt   10       19717.810 ±       880.020  ns/op
[info] HashCollisionBenchmark.mutableHashMap          true    1000  avgt   10     1349523.554 ±     61111.205  ns/op
[info] HashCollisionBenchmark.mutableHashMap          true   10000  avgt   10   136440902.025 ±   8937760.447  ns/op
[info] HashCollisionBenchmark.mutableHashSet         false      10  avgt   10         210.637 ±        10.214  ns/op
[info] HashCollisionBenchmark.mutableHashSet         false     100  avgt   10       51278.552 ±       862.355  ns/op
[info] HashCollisionBenchmark.mutableHashSet         false    1000  avgt   10     7254599.344 ±     77418.016  ns/op
[info] HashCollisionBenchmark.mutableHashSet         false   10000  avgt   10   582626129.000 ±  22068501.128  ns/op
[info] HashCollisionBenchmark.mutableHashSet          true      10  avgt   10         207.018 ±         3.522  ns/op
[info] HashCollisionBenchmark.mutableHashSet          true     100  avgt   10       50332.322 ±       722.652  ns/op
[info] HashCollisionBenchmark.mutableHashSet          true    1000  avgt   10     7361789.516 ±    153332.591  ns/op
[info] HashCollisionBenchmark.mutableHashSet          true   10000  avgt   10   581922084.150 ±  10862902.903  ns/op
[info] HashCollisionBenchmark.oldHashMap             false      10  avgt   10        1509.231 ±        92.854  ns/op
[info] HashCollisionBenchmark.oldHashMap             false     100  avgt   10       59816.860 ±     29110.914  ns/op
[info] HashCollisionBenchmark.oldHashMap             false    1000  avgt   10     8008288.752 ±     82106.263  ns/op
[info] HashCollisionBenchmark.oldHashMap             false   10000  avgt   10  1180388228.050 ± 554803591.812  ns/op
[info] HashCollisionBenchmark.oldHashMap              true      10  avgt   10        1447.969 ±        33.201  ns/op
[info] HashCollisionBenchmark.oldHashMap              true     100  avgt   10       59973.289 ±     28878.087  ns/op
[info] HashCollisionBenchmark.oldHashMap              true    1000  avgt   10     8123624.173 ±    177455.498  ns/op
[info] HashCollisionBenchmark.oldHashMap              true   10000  avgt   10  1117844772.850 ± 476160439.425  ns/op
[info] HashCollisionBenchmark.oldHashSet             false      10  avgt   10         341.151 ±         6.199  ns/op
[info] HashCollisionBenchmark.oldHashSet             false     100  avgt   10       15054.079 ±       659.371  ns/op
[info] HashCollisionBenchmark.oldHashSet             false    1000  avgt   10     4705176.794 ±     92712.245  ns/op
[info] HashCollisionBenchmark.oldHashSet             false   10000  avgt   10   469873977.200 ±   8556272.917  ns/op
[info] HashCollisionBenchmark.oldHashSet              true      10  avgt   10         336.080 ±         5.549  ns/op
[info] HashCollisionBenchmark.oldHashSet              true     100  avgt   10       15147.252 ±      1076.673  ns/op
[info] HashCollisionBenchmark.oldHashSet              true    1000  avgt   10     4761012.752 ±     77525.665  ns/op
[info] HashCollisionBenchmark.oldHashSet              true   10000  avgt   10   463202453.300 ±  37754187.404  ns/op

Code in scala/scala@2.13.x...szeiger:wip/hash-collision

@szeiger
Copy link

szeiger commented Oct 12, 2018

BTW, we use different collision resolution schemes in our implementation. I haven't looked at all the immutable maps and sets yet but I just started working on mutable HashMaps and HashSets (independently of this ticket). mutable.HashMap uses separate chaining with linked lists. This could be changed to tree-based chaining while keeping the general performance/memory characteristics. mutable.HashSet uses open addressing where the proposed fix doesn't apply. We'd have to change it to use separate chaining, too.

@Ichoran
Copy link

Ichoran commented Oct 12, 2018

I think the Java solution was probably expedient for Java, but it isn't inherently the right way to fix the issue. If you want a HashMap which is robust to an attack on the hash values of the keys, you can't guarantee success if you put off until runtime the attempt to use compareTo.

Instead, we need a new type of map where the keys are orderable but are not necessarily ordered. (Likewise with sets.)

Then we have the compile-time guarantee (so long as the ordering is sensible) that DOS cannot occur. This is far better for people wishing to produce robust systems, I think.

Alternatively, if we want a quick hacky fix, I'd just intercept the hashing of String and use a better hash on it to reduce the risk. Changing our map datastructures to include a tree that won't even work, and doesn't play nice with things that can be ordered via a typeclass but don't have the trait baked in, is not a direction that I think we should go.

@SethTisue
Copy link
Member

discussion at https://gitter.im/scala/contributors?at=5bc1dbe41e23486b93b70784 ("Do people expect hash maps to be secure against collisions of untrusted data?")

@szeiger
Copy link

szeiger commented Oct 15, 2018

Customizable hashing is another option. If hash collections alloweduser-defined hashing methods you could use a more secure seeded hash for security-critical applications.

@szeiger
Copy link

szeiger commented Oct 15, 2018

And no matter which way we go, 2.12.8 as a target is probably not possible (at least not for all affected collection types). Even with the magical behind-the-scenes hack that Java uses you need to change the internal data structures, which are not actually internal. They are exposed with package-private visibility and actively used by libraries like https://github.com/scala/scala-java8-compat/

@jrudolph
Copy link

If this is only (mostly) about the DOS attack vectors, could we just limit the number of allowed collisions and fail if there's a (configurable) suspicious ratio of collisions?

@jrudolph
Copy link

... but I guess any kind of randomness wouldn't play well with sending things across the wire through serialization though

You mean, under the expectation that you can compare hashCodes across JVM instances?

Otherwise, you don't really have to transfer hash codes with serialization and just recalculate them when deserializing.

@smarter
Copy link
Member

smarter commented Oct 23, 2018

I really don't want my hashmaps to have arbitrary implementation defined failures. if inserting certain elements starts failing in some way then my application is crippled, and this can still cause denial of services

@smarter
Copy link
Member

smarter commented Oct 23, 2018

You mean, under the expectation that you can compare hashCodes across JVM instances?

Yes, no clue what would break if this assumption is violated

@jrudolph
Copy link

I really don't want my hashmaps to have arbitrary implementation defined failures. if inserting certain elements starts failing in some way then my application is crippled, and this can still cause denial of services

It's not the same failure mode, though. It's like rejecting inputs that would lead to an OOM preventively with an exception. In this case, it would throw an exception if input was detected which has characteristics that don't fit the assumptions about your data when you chose a HashMap in the first place (i.e. that hashCodes are uniformly distributed). In which way would that cripple the application on innocent inputs?

@smarter
Copy link
Member

smarter commented Oct 23, 2018

Imagine a persistent hashmaps shared between multiple users, if I fill it with enough malicious data, then any attempt by other users to add something to it would throw an exception

@jrudolph
Copy link

Imagine a persistent hashmaps shared between multiple users, if I fill it with enough malicious data, then any attempt by other users to add something to it would throw an exception

Probably not, because while adding malicious data would push the ratio (of collisions per size) towards the limit adding any other data would in average move the ratio away from the limit.

jrudolph added a commit to jrudolph/spray-json that referenced this issue Oct 30, 2018
…ray#277

The problem is that with String's hashCode implementation it is too simple to
create synthetic collisions. This allows an attacker to create an object with
keys that all collide which leads to a performance drop for the HashMap
just for creating the map in the first place. See scala/bug#11203
for more information about the underlying HashMap issue.

For the time being, it seems safer to use a TreeMap which uses String ordering.
Benchmarks suggest that using a TreeMap is only ~6% slower for reasonably sized JSON objects
up to 100 keys.

Benchmark                         (_size)  (parser)   Mode  Cnt        Score       Error  Units
ExtractFieldsBenchmark.readSpray        1   HashMap  thrpt    5  1195832.262 ± 64366.605  ops/s
ExtractFieldsBenchmark.readSpray        1   TreeMap  thrpt    5  1342009.641 ± 17307.555  ops/s
ExtractFieldsBenchmark.readSpray       10   HashMap  thrpt    5   237173.327 ± 70341.742  ops/s
ExtractFieldsBenchmark.readSpray       10   TreeMap  thrpt    5   233510.618 ± 69638.750  ops/s
ExtractFieldsBenchmark.readSpray      100   HashMap  thrpt    5    23202.016 ±  1514.763  ops/s
ExtractFieldsBenchmark.readSpray      100   TreeMap  thrpt    5    21899.072 ±   823.225  ops/s
ExtractFieldsBenchmark.readSpray     1000   HashMap  thrpt    5     2073.754 ±    66.093  ops/s
ExtractFieldsBenchmark.readSpray     1000   TreeMap  thrpt    5     1793.329 ±    43.603  ops/s
ExtractFieldsBenchmark.readSpray    10000   HashMap  thrpt    5      208.160 ±     7.466  ops/s
ExtractFieldsBenchmark.readSpray    10000   TreeMap  thrpt    5      160.349 ±     5.809  ops/s
jrudolph added a commit to jrudolph/spray-json that referenced this issue Oct 30, 2018
…ray#277

The problem is that with String's hashCode implementation it is too simple to
create synthetic collisions. This allows an attacker to create an object with
keys that all collide which leads to a performance drop for the HashMap
just for creating the map in the first place. See scala/bug#11203
for more information about the underlying HashMap issue.

For the time being, it seems safer to use a TreeMap which uses String ordering.
Benchmarks suggest that using a TreeMap is only ~6% slower for reasonably sized JSON objects
up to 100 keys.

Benchmark for non-colliding keys:

Benchmark                         (_size)  (parser)   Mode  Cnt        Score       Error  Units
ExtractFieldsBenchmark.readSpray        1   HashMap  thrpt    5  1195832.262 ± 64366.605  ops/s
ExtractFieldsBenchmark.readSpray        1   TreeMap  thrpt    5  1342009.641 ± 17307.555  ops/s
ExtractFieldsBenchmark.readSpray       10   HashMap  thrpt    5   237173.327 ± 70341.742  ops/s
ExtractFieldsBenchmark.readSpray       10   TreeMap  thrpt    5   233510.618 ± 69638.750  ops/s
ExtractFieldsBenchmark.readSpray      100   HashMap  thrpt    5    23202.016 ±  1514.763  ops/s
ExtractFieldsBenchmark.readSpray      100   TreeMap  thrpt    5    21899.072 ±   823.225  ops/s
ExtractFieldsBenchmark.readSpray     1000   HashMap  thrpt    5     2073.754 ±    66.093  ops/s
ExtractFieldsBenchmark.readSpray     1000   TreeMap  thrpt    5     1793.329 ±    43.603  ops/s
ExtractFieldsBenchmark.readSpray    10000   HashMap  thrpt    5      208.160 ±     7.466  ops/s
ExtractFieldsBenchmark.readSpray    10000   TreeMap  thrpt    5      160.349 ±     5.809  ops/s
jrudolph added a commit to jrudolph/spray-json that referenced this issue Oct 30, 2018
…ray#277

The problem is that with String's hashCode implementation it is too simple to
create synthetic collisions. This allows an attacker to create an object with
keys that all collide which leads to a performance drop for the HashMap
just for creating the map in the first place. See scala/bug#11203
for more information about the underlying HashMap issue.

For the time being, it seems safer to use a TreeMap which uses String ordering.
Benchmarks suggest that using a TreeMap is only ~6% slower for reasonably sized JSON objects
up to 100 keys.

Benchmark for non-colliding keys:

Benchmark                         (_size)  (parser)   Mode  Cnt        Score       Error  Units
ExtractFieldsBenchmark.readSpray        1   HashMap  thrpt    5  1195832.262 ± 64366.605  ops/s
ExtractFieldsBenchmark.readSpray        1   TreeMap  thrpt    5  1342009.641 ± 17307.555  ops/s
ExtractFieldsBenchmark.readSpray       10   HashMap  thrpt    5   237173.327 ± 70341.742  ops/s
ExtractFieldsBenchmark.readSpray       10   TreeMap  thrpt    5   233510.618 ± 69638.750  ops/s
ExtractFieldsBenchmark.readSpray      100   HashMap  thrpt    5    23202.016 ±  1514.763  ops/s
ExtractFieldsBenchmark.readSpray      100   TreeMap  thrpt    5    21899.072 ±   823.225  ops/s
ExtractFieldsBenchmark.readSpray     1000   HashMap  thrpt    5     2073.754 ±    66.093  ops/s
ExtractFieldsBenchmark.readSpray     1000   TreeMap  thrpt    5     1793.329 ±    43.603  ops/s
ExtractFieldsBenchmark.readSpray    10000   HashMap  thrpt    5      208.160 ±     7.466  ops/s
ExtractFieldsBenchmark.readSpray    10000   TreeMap  thrpt    5      160.349 ±     5.809  ops/s
@smarter
Copy link
Member

smarter commented Nov 21, 2018

@SethTisue Shouldn't this be kept open until we at least have proper user documentation on which maps are "DoS-safe" and which aren't ?

@SethTisue
Copy link
Member

I wouldn't be opposed to be re-opening it, but I don't think it's a blocker for 2.12.8. @lrytz?

@He-Pin
Copy link
Contributor

He-Pin commented Nov 22, 2018

Could reschedule to 2.12.9.

@lrytz
Copy link
Member

lrytz commented Nov 22, 2018

It's not blocking 2.12.8. I think @szeiger is working on a ordering-based implementation for 2.13?

@SethTisue
Copy link
Member

I think @szeiger is working on a ordering-based implementation for 2.13?

scala/scala#7633 was merged and included in 2.13.0

perhaps a pull request backporting CollisionProofHashMap to 2.12+2.11 would be accepted as an addition to https://github.com/scala/scala-collection-compat

@plokhotnyuk
Copy link

@SethTisue I have created a ticket on behalf of you: scala/scala-collection-compat#234

plokhotnyuk added a commit to plokhotnyuk/dijon that referenced this issue Aug 17, 2019
plokhotnyuk added a commit to plokhotnyuk/dijon that referenced this issue Aug 17, 2019
plokhotnyuk added a commit to plokhotnyuk/dijon that referenced this issue Aug 17, 2019
@plokhotnyuk
Copy link

plokhotnyuk commented Aug 17, 2019

Currently any Scala collection is vulnerable when affected maps or sets are used internally in methods like: toMap, toSet, keys, keySet, distinct, groupBy, --, etc.

plokhotnyuk added a commit to plokhotnyuk/dijon that referenced this issue Aug 17, 2019
plokhotnyuk added a commit to plokhotnyuk/dijon that referenced this issue Aug 17, 2019
plokhotnyuk added a commit to jvican/dijon that referenced this issue Aug 26, 2019
* Yet more safe and efficient removing of keys without an internal `toSet` call that isn't safe: scala/bug#11203

* Avoid usage of vulnerable methods even in tests and docs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests