Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iteration over Tables does not randomize the order #12070

Closed
dumblob opened this issue Aug 27, 2019 · 24 comments
Closed

Iteration over Tables does not randomize the order #12070

dumblob opened this issue Aug 27, 2019 · 24 comments

Comments

@dumblob
Copy link

dumblob commented Aug 27, 2019

Iterating over Tables is insecure (prone to attacks) - see the discussion rust-lang/rust#36481 .

Example

N/A

Current Output

N/A

Expected Output

N/A

Possible Solution

Potentially good solution in the comment rust-lang/rust#36481 (comment) . Some languages (e.g. Dao) additionally provide for performance reasons a choice at instantiation time whether the order shall be undefined or random (or the insertion order, but for that there is a different type in Nim).

Additional Information

Would be wise to solve/define before the final 1.0.

@Araq
Copy link
Member

Araq commented Aug 27, 2019

I've said it before, people are much better off using tree structures for these things instead of playing whack-a-mole with hash algorithms.

@dumblob
Copy link
Author

dumblob commented Aug 28, 2019

In the Rust discussion I've linked there are two topics - first the hash map algorithm itself and second the randomization when iterating. Randomization is the thing I'd definitely like to have by default in Nim. The hash map algorithm itself is a different beast and I don't want to bring it in discussion here.

Would you consider specifying in the Nim 1.0 that the iteration order on Tables must be randomized?

@krux02
Copy link
Contributor

krux02 commented Aug 28, 2019

@Araq

I've said it before, people are much better off using tree structures for these things instead of playing whack-a-mole with hash algorithms.

I think you miss the entire point here. The reason for iteration randomization is to prevent people from playing "whack-a-mole with hash algorithms". I've said it before, Go does the same thing and I think it is a good solution to prevent people from depending on the iteration order of hash tables.
How

@Araq
Copy link
Member

Araq commented Aug 28, 2019

Yeah, we discussed this before, I think Go's solution is terrible. Python's is much better.

@dumblob
Copy link
Author

dumblob commented Aug 28, 2019

Just to clarify - Nim has Table and OrderedTable whereas the iteration order for Table is "undefined" and for OrderedTable defined (it's the insertion order). I'd like to change the "undefined" at Table to defined while maintaining security - thus involving randomization. That's it, nothing more or less.

@Araq
Copy link
Member

Araq commented Aug 28, 2019

And how do you prove its security? That is far from being easy. If security is a concern, use a BTree, maybe an OrderedTable and stay away from "O(1) fast hashing". The O(1) on average is what creates all the trouble.

@dumblob
Copy link
Author

dumblob commented Aug 28, 2019

The O(1) on average is what creates all the trouble.

I'd say the reasons aren't that technical, but I'm rather concerned with all the issues which e.g. the switch from python2 to python3 (and analogically from golang 0.x to 1.x) caused - just because people relied upon "iterating the map twice in a row yields the same order" and often even "iterating a map once is actually like iterating an ordered map with preserved insertion order". I don't know about any compile-time way how to catch such wrong assumptions (which are way more common than one would guess) - therefore this runtime solution.

@dawkot
Copy link

dawkot commented Aug 28, 2019

If it really is common, the module's documentation yelling at users not to do so in the first paragraph is the easiest solution.

@dumblob
Copy link
Author

dumblob commented Aug 28, 2019

If it really is common, the module's documentation yelling at users not to do so in the first paragraph is the easiest solution.

This never worked - it has to yell at "edition time" or compile time or at the latest in runtime. Yelling in "edition time" could be achieved by making the default Table be actually an alias to OrderedTable and to rename the current Table to UnorderedTable. This would be a different story in practice...

@Araq
Copy link
Member

Araq commented Aug 29, 2019

We could also deprecate the table's iterators. :-)

@dumblob
Copy link
Author

dumblob commented Aug 29, 2019

I'm confused - I meant my words seriously, not as a joke. Feel free to close this issue if the topic is not a priority for Nim.

@cooldome
Copy link
Member

I think we just need a separate SecureTable that users can use. Let's keep Table they way it is, just document the behavior properly.

@Araq
Copy link
Member

Araq commented Aug 29, 2019

I'm confused - I meant my words seriously, not as a joke. Feel free to close this issue if the topic is not a priority for Nim.

I was serious too. items and other iterators for Table could produce a warning that instead an OrderedTable should be used.

@dumblob
Copy link
Author

dumblob commented Aug 29, 2019

items and other iterators for Table could produce a warning that instead an OrderedTable should be used.

Sounds good to me (assuming a compile-time warning).

@alehander92
Copy link
Contributor

@dumblob but that's not quite good: iterating is a completely valid usage of even unordered tables, iterating and using the order is not really: the second one is basically impossible to catch tho in an automatic way as it's related to the intentions/assumptions of the user .

I guess warning might be ok, but it seems wrong to me

@dumblob
Copy link
Author

dumblob commented Aug 29, 2019

@alehander42 sure, iteration is valid even on unordered maps - but if we don't want to change Table to OrderedTable while renaming the current Table to UnorderedTable (as suggested in #12070 (comment) ), then the iteration warning is IMHO the second best option (the warning can be easily disabled by a pragma).

@hashbackup
Copy link

It seems there are 2 issues raised here:

Iterating over Tables is insecure (prone to attacks)

and

issues which e.g. the switch from python2 to python3 (and analogically from golang 0.x to 1.x) caused - just because people relied upon "iterating the map twice in a row yields the same order"

An easy way to solve the 2nd issue is to start iterating at a random slot instead of always starting at slot 0. Something like:

  let offset = rand(high(t.data))
  for h in 0 .. high(t.data):
    let slot = if h + offset > high(t.data): h + offset - len(t.data) else: h + offset
    ... use t.data[slot] instead of t.data[h] ...

I tried this in the tables.nim code - works fine. It doesn't address the attack scenario, but does prevent users from relying on a specific iteration order, which could be a good thing to avoid perceived compatibility problems if a table implementation or hash function changes in a future release of Nim, as you mentioned.

@ghost ghost added the Standard Library label Oct 22, 2020
@timotheecour
Copy link
Member

IMO we should close this issue as a wont-fix, unless someone can show a benchmark illustrating the problem with nim's current implementation (or even #13440) which differ from rust's implementation IIUC and shouldn't be affected by the "Without an attacker" case at least

  • the proposal doesn't specify how the iteration should be randomized
  • the randomization would incur a performance penalty
  • even if randomized (but deterministic), it wouldn't prevent adversarial attacks of the kind mentioned in Exposure of HashMap iteration order allows for O(n²) blowup. rust-lang/rust#36481
  • randomizing iteration order (in particular non-deterministic) would impact other things, eg json serialization (which is currently deterministic) or compiler code that relies on tables

Uses cases where you care about robustness against adversarial inputs are not the common case, and shouldn't impact the common case.

@dumblob
Copy link
Author

dumblob commented Jun 30, 2021

Thanks @timotheecour for the heads up. I see you did a lot of work on Table since this issue was opened. Thank you for the work!

I still feel there is too much low hanging fruit here compared to radically saying "won't fix".

  1. Make OrderedTable a default (I'm deliberately not specifying what everything a "default" means - I can imagine that even just making a shorter more readable name would encourage users to prefer it over the unordered table as mentioned below).
  2. Rename the current unordered Table so that it make it explicit it's unordered and visually less appealing than ordered table (e.g. to UnorderedTable).
  3. In the documentation of the unordered table mention the risk of attacks.
  4. Or maybe do what Araq proposed - rethink iterators and possibly deprecate them in favor of an alternative (I'm deliberately not specifying how the alternative could look like - it might be anything from just a slight shift in iterators' semantics or it might require a new iterators architecture which distinguishes between ordered and unordered iteration).

Btw. it seems to me some of the changes you made to Table actually seem to kind of prevent the attacks in rust-lang/rust#36481 (judging based on your measurements of maximal slowdown). Am I right or do you think there is still potential for more than 10x amplification (this is how Java does the pseudo-prevention - they use a tree if the buckets or whatever should get too long)?

@timotheecour
Copy link
Member

I'm closing this because there is no reasonable actionable item, feel free to re-open this issue if you have an update involving one or a reproducible bug involving afore mentioned attack vector but involving nim code instead of rust code; the algorithms used differ.

the changes recommended here involve breaking changes (eg renaming tables) that don't provide value. The attack mentioned revolved around a library in rust, not in nim.

See #13440 for a better alternative, with benchmarks to support it.

@dumblob
Copy link
Author

dumblob commented Jul 7, 2021

See #13440 for a better alternative, with benchmarks to support it.

Yep, this is what I meant with saying you did a lot of very valuable work on Table.

So again - it seems to me some of the changes you made to Table actually seem to kind of prevent the attacks in rust-lang/rust#36481 (judging based on your measurements of maximal slowdown). Am I right or do you think there is still potential for more than 10x amplification (this is how Java does the pseudo-prevention - they use a tree if the buckets or whatever should get too long)?

@timotheecour
Copy link
Member

Am I right or do you think there is still potential for more than 10x amplification

indeed, both #13440 and the (more expensive) hashWangYi1 should prevent against the sort of attacks mentioned in rust-lang/rust#36481 ("Without an attacker') as well as more generally than presented in that issue against slowdowns due to collision during insert/lookup.

Dealing against adversarial attacks is much harder and needs a threat model in order to do something about it; I don't think it can be done without sacrificing something else (eg performance for the common case not involving adversarial attack). The "With an attacker" scenario described in rust-lang/rust#36481 doesn't seem realistic though; if a production grade server returns all the data to a client for some requests, it should have safeguards in place against leaking internal logic. Lots of mitigation strategies can be used, eg hashing with a salt, or adding a randomized order iteration in tables.nim (which shouldn't be the default for obvious reasons).

Note that with privateAccess(Table) and import tables {.all.}, client code that desperately needs it can even do it without waiting for stdlib to provide it, since it'd have access to the underlying container so can iterate in randomized order.

@dumblob
Copy link
Author

dumblob commented Jul 7, 2021

Note that with privateAccess(Table) and import tables {.all.}, client code that desperately needs it can even do it without waiting for stdlib to provide it, since it'd have access to the underlying container so can iterate in randomized order.

Didn't think of this - could this also be mentioned in the doc of Table?

@timotheecour
Copy link
Member

could this also be mentioned in the doc of Table?

no reason it should be mentioned in Table, because of separation of concerns. The feature is general purpose and applicable in all cases where you need private access and don't care about having your code broken by upstream changes (in particular is applicable for private code, or for visibility violations within the same project for which you control the code, or debugging, as example use cases)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants