Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

collections: BinaryHeap has different result when implement top-k-frequent-elements #2522

Closed
char8x opened this issue Aug 13, 2022 · 7 comments
Labels
bug Something isn't working needs triage

Comments

@char8x
Copy link

char8x commented Aug 13, 2022

Describe the bug
BinaryHeap has different result when implement top-k-frequent-elements

Steps to Reproduce
https://github.com/char8x/binary-heap-different-result

Expected behavior

the result should be close to using array or using Heap

Environment

  • OS: macOS 11.6.8
  • deno version: v1.24.3
  • std version: 0.152.0
@char8x char8x added bug Something isn't working needs triage labels Aug 13, 2022
@char8x
Copy link
Author

char8x commented Aug 13, 2022

@KyleJune Could you help look at this problem ?

@KyleJune
Copy link
Contributor

I'll take a look at it in the next few hours.

@KyleJune
Copy link
Contributor

I think the sorted array is expected to be different than both binary heap implementations since there are so many values in the array that have the same count.

I'm working on identifying exactly what is causing a difference in behavior. There does appear to be a bug in the Deno implementation I made for std. I think I should be able to fix it this weekend. Once I identify the cause I'll make another comment describing it.

@KyleJune
Copy link
Contributor

I'm still investigating this, but I thought I'd report back that I found a bug in the heap node module's clone function. When you clone the heap it will return a new one with the default compare function instead of the same one as the original heap object being cloned. This will result in the returned values being different when popping them off a cloned heap. My deno binary heap implementation doesn't have that issue. In the output for the below test file, you can see it returns { key: 2, count: 29 } second instead of last in the cloned node heap.

import { BinaryHeap } from "./binary_heap.ts";
import Heap from "https://esm.sh/[email protected]";
import { assertEquals } from "../testing/asserts.ts";

const compareObjects = (a, b) => a.count - b.count;

Deno.test("both heaps return values in the correct order", () => {
  const denoMinHeap = new BinaryHeap(compareObjects);
  const nodeMinHeap = new Heap(compareObjects);

  const values = [
    { key: 0, count: 23 },
    { key: 1, count: 19 },
    { key: 2, count: 29 },
    { key: 3, count: 16 },
  ];
  for (const value of values) {
    denoMinHeap.push(value);
    nodeMinHeap.push(value);
  }

  const denoValues = [];
  const nodeValues = [];
  for (let i = 0; i < values.length; i++) {
    denoValues.push(denoMinHeap.pop());
    nodeValues.push(nodeMinHeap.pop());
  }

  const expected = [
    { key: 3, count: 16 },
    { key: 1, count: 19 },
    { key: 0, count: 23 },
    { key: 2, count: 29 },
  ];
  console.log("Deno:", denoValues);
  console.log("Node:", nodeValues);
  assertEquals(denoValues, expected);
  assertEquals(nodeValues, expected);
});

Deno.test("cloned node heap returns values in the wrong order", () => {
  let denoMinHeap = new BinaryHeap(compareObjects);
  let nodeMinHeap = new Heap(compareObjects);

  const values = [
    { key: 0, count: 23 },
    { key: 1, count: 19 },
    { key: 2, count: 29 },
    { key: 3, count: 16 },
  ];
  for (const value of values) {
    denoMinHeap.push(value);
    nodeMinHeap.push(value);
  }

  denoMinHeap = BinaryHeap.from(denoMinHeap);
  nodeMinHeap = nodeMinHeap.clone();

  const denoValues = [];
  const nodeValues = [];
  for (let i = 0; i < values.length; i++) {
    denoValues.push(denoMinHeap.pop());
    nodeValues.push(nodeMinHeap.pop());
  }

  const expected = [
    { key: 3, count: 16 },
    { key: 1, count: 19 },
    { key: 0, count: 23 },
    { key: 2, count: 29 },
  ];
  console.log("Deno:", denoValues);
  console.log("Node:", nodeValues);
  assertEquals(denoValues, expected);
  assertEquals(nodeValues, expected);
});

Output:

running 2 tests from ./example_test.js
both heaps return values in the correct order ...
------- output -------
Deno: [
  { key: 3, count: 16 },
  { key: 1, count: 19 },
  { key: 0, count: 23 },
  { key: 2, count: 29 }
]
Node: [
  { key: 3, count: 16 },
  { key: 1, count: 19 },
  { key: 0, count: 23 },
  { key: 2, count: 29 }
]
----- output end -----
both heaps return values in the correct order ... ok (9ms)
cloned node heap returns values in the wrong order ...
------- output -------
Deno: [
  { key: 3, count: 16 },
  { key: 1, count: 19 },
  { key: 0, count: 23 },
  { key: 2, count: 29 }
]
Node: [
  { key: 3, count: 16 },
  { key: 2, count: 29 },
  { key: 1, count: 19 },
  { key: 0, count: 23 }
]
----- output end -----
cloned node heap returns values in the wrong order ... FAILED (7ms)

 ERRORS 

cloned node heap returns values in the wrong order => ./example_test.js:41:6
error: AssertionError: Values are not equal:


    [Diff] Actual / Expected


    [
      {
        count: 16,
        key: 3,
      },
      {
-       count: 29,
-       key: 2,
-     },
-     {
        count: 19,
        key: 1,
      },
      {
        count: 23,
        key: 0,
+     },
+     {
+       count: 29,
+       key: 2,
      },
    ]

  throw new AssertionError(message);
        ^
    at assertEquals (file:///home/kyle/Projects/deno/deno_std/testing/asserts.ts:269:9)
    at file:///home/kyle/Projects/deno/deno_std/collections/example_test.js:75:3

 FAILURES 

cloned node heap returns values in the wrong order => ./example_test.js:41:6

FAILED | 1 passed | 1 failed (46ms)

error: Test failed

@KyleJune
Copy link
Contributor

KyleJune commented Aug 14, 2022

I believe the bug is with the deno's binary heap push functions implementation. If I look at the internal structure and compare it against that of the heap npm package, it looks like deno's binary heap push implementation is doing an additional swap it shouldn't be doing.

Push: 29, 33

Both Heaps:
      29
   33

Push: 28

Node Heap:
      28
   33    29

Deno Heap:
      28
   29    33

This issue can end up affecting extraction, causing deno to return some values out of order. Below is a test case I wrote for demonstrating it. In it, it ends up incorrectly returning 5 before 4 when popping values off the heap.

Deno.test("[collections/BinaryHeap] edge case 1", () => {
  const minHeap = new BinaryHeap<number>(ascend);
  minHeap.push(4, 2, 8, 1, 10, 7, 3, 6, 5);
  assertEquals(minHeap.pop(), 1);
  minHeap.push(9);

  const expected = [2, 3, 4, 5, 6, 7, 8, 9, 10];
  assertEquals([...minHeap], expected);
});
 ERRORS 

[collections/BinaryHeap] edge case 1 => ./binary_heap_test.ts:260:6
error: AssertionError: Values are not equal:


    [Diff] Actual / Expected


    [
      2,
      3,
-     5,
      4,
+     5,
      6,
      7,
      8,
      9,
      10,
    ]

  throw new AssertionError(message);
        ^
    at assertEquals (file:///home/kyle/Projects/deno/deno_std/testing/asserts.ts:269:9)
    at file:///home/kyle/Projects/deno/deno_std/collections/binary_heap_test.ts:267:3

I'm working on fixing this issue.

@KyleJune
Copy link
Contributor

There were 2 issues and I fixed both in PR #2525.

  1. push was using the wrong algorithm for finding the parent index. It would end up comparing and swapping with the wrong index in some cases. The first test case below pins the correct behavior. The test case passes for the heap npm module too.
  2. pop would swap with the wrong child if both were less than the parent value. It would always swap with the left instead of the one that is lowest.

Below are 2 quotes from the Binary Heap Wikipedia page.

Steps for insertion:

  1. Add the element to the bottom level of the heap at the leftmost open space.
  2. Compare the added element with its parent; if they are in the correct order, stop.
  3. If not, swap the element with its parent and return to the previous step.

Steps for extraction:

  1. Replace the root of the heap with the last element on the last level.
  2. Compare the new root with its children; if they are in the correct order, stop.
  3. If not, swap the element with one of its children and return to the previous step. (Swap with its smaller child in a min-heap and its larger child in a max-heap.)

Here are all 3 edge case tests I wrote for collections/BinaryHeap. They all pass now with the fixes I've made to push and pop.

Deno.test("[collections/BinaryHeap] edge case 1", () => {
  const minHeap = new BinaryHeap<number>(ascend);
  minHeap.push(4, 2, 8, 1, 10, 7, 3, 6, 5);
  assertEquals(minHeap.pop(), 1);
  minHeap.push(9);

  const expected = [2, 3, 4, 5, 6, 7, 8, 9, 10];
  assertEquals([...minHeap], expected);
});

Deno.test("[collections/BinaryHeap] edge case 2", () => {
  interface Point {
    x: number;
    y: number;
  }
  const minHeap = new BinaryHeap<Point>((a, b) => ascend(a.x, b.x));
  minHeap.push({ x: 0, y: 1 }, { x: 0, y: 2 }, { x: 0, y: 3 });

  const expected = [{ x: 0, y: 1 }, { x: 0, y: 3 }, { x: 0, y: 2 }];
  assertEquals([...minHeap], expected);
});

Deno.test("[collections/BinaryHeap] edge case 3", () => {
  interface Point {
    x: number;
    y: number;
  }
  const minHeap = new BinaryHeap<Point>((a, b) => ascend(a.x, b.x));
  minHeap.push(
    { x: 0, y: 1 },
    { x: 1, y: 2 },
    { x: 1, y: 3 },
    { x: 2, y: 4 },
    { x: 2, y: 5 },
    { x: 2, y: 6 },
    { x: 2, y: 7 },
  );

  const expected = [
    { x: 0, y: 1 },
    { x: 1, y: 2 },
    { x: 1, y: 3 },
    { x: 2, y: 5 },
    { x: 2, y: 4 },
    { x: 2, y: 6 },
    { x: 2, y: 7 },
  ];
  assertEquals([...minHeap], expected);
});

Here are what the 3 different heaps from these test cases look like before the values are removed for comparing their pop order. The heaps look the same for both the std/collections BinaryHeap and the heap npm module.

Edge case 1 heap:
  Before pop:
               1
          2        3
      4     10   8   7
    6   5
  After pop:
               2
          4        3
      5     10   8   7
    6
  After push:
               2
          4        3
      5     10   8   7
    6   9


Edge case 2 heap:
      0,1
  0,2     0,3

Edge case 3 heap:
              0,1
      1,2             1,3
  2,4     2,5     2,6     2,7

Below are the equivalent test cases for the heap npm module. The first edge case passes but the second 2 fail because heap is incorrectly doing swaps when the parent and child nodes are equal when compared.

Reason why for edge case 2 failing:

When you pop off the 0,1 point, the 0,3 point gets swapped to the top. Then you are suppose to compare the children. Since the child and parent are the same since we are just comparing x, they shouldn't be swapped. The node module is incorrectly swapping them resulting in 0,2 ending up on top instead of 0,3.

Reason why for edge case 3 failing:

When you pop off the 0,1 point, the 2,7 point gets swapped to the top. Then you are suppose to compare with the children. Since both children are the same and lower than the parent, it should swap the left child with the parent. it ends up swapping it with the right child instead. Next it repeats the same issue as what happens in edge case 2 when the parent and child are the same but it swaps them anyway.

import { assertEquals } from "https://deno.land/[email protected]/testing/asserts.ts";
import { ascend } from "https://deno.land/[email protected]/collections/binary_heap.ts";
import Heap from "https://esm.sh/[email protected]";

Deno.test("[npm/Heap] edge case 1", () => {
  const minHeap = new Heap<number>(ascend);
  const values = [4, 2, 8, 1, 10, 7, 3, 6, 5];
  for (const value of values) {
    minHeap.push(value);
  }
  assertEquals(minHeap.pop(), 1);
  minHeap.push(9);

  const actual = [];
  for (let i = 0; i < values.length; i++) {
    actual.push(minHeap.pop());
  }
  const expected = [2, 3, 4, 5, 6, 7, 8, 9, 10];
  assertEquals(actual, expected);
});

Deno.test("[npm/Heap] edge case 2", () => {
  interface Point {
    x: number;
    y: number;
  }
  const minHeap = new Heap<Point>((a, b) => ascend(a.x, b.x));
  const values = [{ x: 0, y: 1 }, { x: 0, y: 2 }, { x: 0, y: 3 }];
  for (const value of values) {
    minHeap.push(value);
  }

  const actual = [];
  for (let i = 0; i < values.length; i++) {
    actual.push(minHeap.pop());
  }
  const expected = [{ x: 0, y: 1 }, { x: 0, y: 3 }, { x: 0, y: 2 }];
  assertEquals(actual, expected);
});

Deno.test("[npm/Heap] edge case 3", () => {
  interface Point {
    x: number;
    y: number;
  }
  const minHeap = new Heap<Point>((a, b) => ascend(a.x, b.x));
  const values = [
    { x: 0, y: 1 },
    { x: 1, y: 2 },
    { x: 1, y: 3 },
    { x: 2, y: 4 },
    { x: 2, y: 5 },
    { x: 2, y: 6 },
    { x: 2, y: 7 },
  ];
  for (const value of values) {
    minHeap.push(value);
  }

  const actual = [];
  for (let i = 0; i < values.length; i++) {
    actual.push(minHeap.pop());
  }
  const expected = [
    { x: 0, y: 1 },
    { x: 1, y: 2 },
    { x: 1, y: 3 },
    { x: 2, y: 5 },
    { x: 2, y: 4 },
    { x: 2, y: 6 },
    { x: 2, y: 7 },
  ];
  assertEquals(actual, expected);
});
$ deno test example_test.ts 
running 3 tests from ./example_test.ts
[npm/Heap] edge case 1 ... ok (6ms)
[npm/Heap] edge case 2 ... FAILED (7ms)
[npm/Heap] edge case 3 ... FAILED (5ms)

 ERRORS 

[npm/Heap] edge case 2 => ./example_test.ts:22:6
error: AssertionError: Values are not equal:


    [Diff] Actual / Expected


    [
      {
        x: 0,
        y: 1,
      },
      {
        x: 0,
-       y: 2,
+       y: 3,
      },
      {
        x: 0,
-       y: 3,
+       y: 2,
      },
    ]

  throw new AssertionError(message);
        ^
    at assertEquals (https://deno.land/[email protected]/testing/asserts.ts:183:9)
    at file:///home/kyle/Projects/deno/deno_std/collections/example_test.ts:38:3

[npm/Heap] edge case 3 => ./example_test.ts:41:6
error: AssertionError: Values are not equal:


    [Diff] Actual / Expected


    [
      {
        x: 0,
        y: 1,
      },
      {
        x: 1,
-       y: 3,
+       y: 2,
      },
      {
        x: 1,
-       y: 2,
+       y: 3,
      },
      {
        x: 2,
-       y: 6,
+       y: 5,
      },
      {
        x: 2,
-       y: 7,
+       y: 4,
      },
      {
        x: 2,
-       y: 5,
+       y: 6,
      },
      {
        x: 2,
-       y: 4,
+       y: 7,
      },
    ]

  throw new AssertionError(message);
        ^
    at assertEquals (https://deno.land/[email protected]/testing/asserts.ts:183:9)
    at file:///home/kyle/Projects/deno/deno_std/collections/example_test.ts:73:3

 FAILURES 

[npm/Heap] edge case 2 => ./example_test.ts:22:6
[npm/Heap] edge case 3 => ./example_test.ts:41:6

FAILED | 1 passed | 2 failed (51ms)

error: Test failed

@KyleJune
Copy link
Contributor

Describe the bug BinaryHeap has different result when implement top-k-frequent-elements

Steps to Reproduce https://github.com/char8x/binary-heap-different-result

Expected behavior

the result should be close to using array or using Heap

All 3 implementations will have different results after my fixes.

  • The reason BinaryHeap will be different from the heap npm module is that when removing values, the heap npm module is incorrectly swapping values when they are the same. See my last comment for more details on that.
  • There are a lot of entries with the same count. If the heap is full the top value will be popped off and the values in the heap will get shifted around a bit. This results in the values that are the same not getting popped in the same order they were entered when using a heap.
  • The array implementation just sorts the array in place then slices off the last values. Depending on the sort algorithm used, the values with the same count will be in a different order. If there are values with the same count that would end up in the first k values and outside the first k values, those values that make it in the first k will be different. This is why I would expect the array implementation to get you different results than the heap implementation you have in some cases. It would only be consistent when all values with the lowest count that can fit in the first k values, actually all fit in the first k values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage
Projects
None yet
Development

No branches or pull requests

2 participants