Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OrderedTable crashes on del when there are repeated keys #13500

Open
timotheecour opened this issue Feb 25, 2020 · 6 comments
Open

OrderedTable crashes on del when there are repeated keys #13500

timotheecour opened this issue Feb 25, 2020 · 6 comments

Comments

@timotheecour
Copy link
Member

timotheecour commented Feb 25, 2020

OrderedTable crashes on del when there are repeated keys

Example

when true:
  import tables
  var t: OrderedTable[int, int]
  # var t = newOrderedTable[int, int]() # ditto
  # var t: Table[int, int] # this works
  t.add 1, 10
  t.add 1, 11
  t[2] = 12
  echo t # ok
  t.del(2) # BUG

Current Output

{1: 10, 1: 11, 2: 12}
/Users/timothee/git_clone/nim/timn/tests/nim/all/t10256.nim(102) t10256
/Users/timothee/.choosenim/toolchains/nim-1.0.6/lib/pure/collections/tables.nim(1514) del
/Users/timothee/.choosenim/toolchains/nim-1.0.6/lib/pure/collections/tables.nim rawInsert
/Users/timothee/.choosenim/toolchains/nim-1.0.6/lib/system/fatal.nim(39) sysFatal
Error: unhandled exception: index -2 not in 0 .. 63 [IndexError]

Expected Output

{1: 10, 1: 11, 2: 12}

Additional Information

  • broken in 1.06 and devel 3dad130 1.1.1

note

see note I just added in #13418

OrderedTable.del is O(n) (including before PR), because every del rebuilds whole table; instead, using tombstones as introduced in #13418 would make it O(1)

@narimiran
Copy link
Member

I'll just repeat my comment from #12934, regarding repeated keys:

I'm still of opinion that tables.add is bad, shouldn't have made it to v1.0, and 98% of users don't need it (and if they use it, they probably don't know what it really does - it might cause unexpected behaviour).

@hashbackup
Copy link

I am trying to get my bearings on tables before contributing to them instead of just complaining. :-)

I recently posted a slight improvement to OrderedTables.del: #14971 (comment) It still is O(n) and that needs to be fixed, but it is 3x faster and uses half as much memory. And with this change, del doesn't crash on duplicate keys.

I will work on a pull request for OT, my first ever on GH, so not great with it and may need some help. But it is bothering me that when OT.del() was first implemented, it did not copy the table. It did have a bug, because t.last was not updated, but it didn't make a new table. I couldn't find the commit where OT.del was changed to create a new table. Can anyone fill me in why this was done? Is it something related to immutability, del breaking iterators, or something else? I'm just wondering if copying the table is required for some reason.

@Clyybber
Copy link
Contributor

Clyybber commented Jul 14, 2020

OT.del was changed to create a new table

Does it? I think it just takes a var Table and its implementation doesn't replace that table by a new one either AFAICT

@hashbackup
Copy link

I'm new to Nim so might not understand what I'm looking at, but it:

  • creates a new sequence the size of the existing table
  • swaps that with t.data (I sort of get what that means)
  • traverses the old table by the linked list
  • ignores keys that match (the ones being deleted)
  • call rawInsert on all other keys to move items to the new table
  • I'm guessing the GC gets rid of the old sequence
proc del*[A, B](t: var OrderedTable[A, B], key: A) =
  ## Deletes ``key`` from hash table ``t``. Does nothing if the key does not exist.
  ##
  ## O(n) complexity.
  ##
  ## See also:
  ## * `pop proc<#pop,OrderedTable[A,B],A,B>`_
  ## * `clear proc<#clear,OrderedTable[A,B]>`_ to empty the whole table
  runnableExamples:
    var a = {'a': 5, 'b': 9, 'c': 13}.toOrderedTable
    a.del('a')
    doAssert a == {'b': 9, 'c': 13}.toOrderedTable
    a.del('z')
    doAssert a == {'b': 9, 'c': 13}.toOrderedTable

  if t.counter == 0: return
  var n: OrderedKeyValuePairSeq[A, B]
  newSeq(n, len(t.data))
  var h = t.first
  t.first = -1
  t.last = -1
  swap(t.data, n)
  let hc = genHash(key)
  while h >= 0:
    var nxt = n[h].next
    if isFilled(n[h].hcode):
      if n[h].hcode == hc and n[h].key == key:
        dec t.counter
      else:
        var j = -1 - rawGetKnownHC(t, n[h].key, n[h].hcode)
        rawInsert(t, t.data, move n[h].key, move n[h].val, n[h].hcode, j)
    h = nxt

Also, while the test program in this PR now works, I found another one that adds 1000 duplicate keys. I modified it to delete all the keys and with the old code it segv's, but with my changes it hangs after a few are deleted. So there's still a bug in my changes I think.

@Clyybber
Copy link
Contributor

Oh yeah, sorry you are right. I was looking at the wrong proc :D

@hashbackup
Copy link

I think the reason the table was copied is because there is potentially a backshift in Table.del(). If that occurs, the entry moves and the linked list of OT is no longer valid. That's why the test program I have is hanging. The test program adds 2 duplicate keys then a non-duplicate. The 2 dups hash to the same slot. When the dup key is deleted, the first one is deleted from slot 61 and that moves what was in slot 62 to 61. Then OT.del accesses the next key at 62, but it really is at 61. The next field is trash (0) and the program loops from 0 to 0 forever.

I was planning on adding a prev field to every OT item to make a doubly-linked list so deletes would be O(1). But that won't work if nodes are moving.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants