collections: BinaryHeap has different result when implement top-k-frequent-elements #2522
Comments
@KyleJune Could you help look at this problem?
I'll take a look at it in the next few hours.
I think the sorted array is expected to differ from both binary heap implementations, since so many values in the array have the same count. I'm working on identifying exactly what causes the difference in behavior. There does appear to be a bug in the Deno implementation I made for std, and I think I should be able to fix it this weekend. Once I identify the cause, I'll post another comment describing it.
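To illustrate why a sorted array and a binary heap can disagree when many values share the same count, here is a minimal sketch. `TinyMinHeap`, `CountEntry`, and `byCount` are hypothetical names written for this example; this is a textbook-style heap, not the std or npm implementation. `Array.prototype.sort` is stable, so tied entries keep their insertion order, while a heap makes no such promise.

```typescript
type CountEntry = { key: string; count: number };
const byCount = (a: CountEntry, b: CountEntry): number => a.count - b.count;

// Hypothetical textbook min-heap, for illustration only.
class TinyMinHeap<T> {
  private data: T[] = [];
  constructor(private compare: (a: T, b: T) => number) {}

  push(value: T): void {
    this.data.push(value);
    let i = this.data.length - 1;
    while (i > 0) {
      const up = (i - 1) >> 1;
      // Stop unless the new value compares strictly before its parent.
      if (this.compare(this.data[i], this.data[up]) >= 0) break;
      [this.data[i], this.data[up]] = [this.data[up], this.data[i]];
      i = up;
    }
  }

  pop(): T | undefined {
    if (this.data.length === 0) return undefined;
    const first = this.data[0];
    const last = this.data.pop() as T;
    if (this.data.length === 0) return first;
    this.data[0] = last; // the last value moves to the root
    let i = 0;
    while (true) {
      const l = 2 * i + 1;
      const r = l + 1;
      let min = i;
      if (l < this.data.length && this.compare(this.data[l], this.data[min]) < 0) min = l;
      if (r < this.data.length && this.compare(this.data[r], this.data[min]) < 0) min = r;
      if (min === i) break;
      [this.data[i], this.data[min]] = [this.data[min], this.data[i]];
      i = min;
    }
    return first;
  }
}

// Three entries with the same count.
const tiedEntries: CountEntry[] = [
  { key: "a", count: 1 },
  { key: "b", count: 1 },
  { key: "c", count: 1 },
];

// Stable sort keeps insertion order for ties.
const sortedKeys = [...tiedEntries].sort(byCount).map((e) => e.key);

// The heap moves its last value to the root on every pop, which reorders ties.
const tinyHeap = new TinyMinHeap<CountEntry>(byCount);
for (const entry of tiedEntries) tinyHeap.push(entry);
const heapKeys: string[] = [];
let nextEntry = tinyHeap.pop();
while (nextEntry !== undefined) {
  heapKeys.push(nextEntry.key);
  nextEntry = tinyHeap.pop();
}
```

In this sketch `sortedKeys` is `["a", "b", "c"]` while `heapKeys` is `["a", "c", "b"]`. Both orders are valid with respect to `count`; they differ only in how ties are broken.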
I'm still investigating this, but I wanted to report that I found a bug in the heap node module's clone function. When you clone a heap, it returns a new one that uses the default compare function instead of the compare function of the heap being cloned. As a result, values can come back in a different order when popped off a cloned heap. My Deno binary heap implementation doesn't have that issue. The test file below demonstrates this.
import { BinaryHeap } from "./binary_heap.ts";
import Heap from "https://esm.sh/[email protected]";
import { assertEquals } from "../testing/asserts.ts";

const compareObjects = (a, b) => a.count - b.count;

Deno.test("both heaps return values in the correct order", () => {
  const denoMinHeap = new BinaryHeap(compareObjects);
  const nodeMinHeap = new Heap(compareObjects);
  const values = [
    { key: 0, count: 23 },
    { key: 1, count: 19 },
    { key: 2, count: 29 },
    { key: 3, count: 16 },
  ];
  for (const value of values) {
    denoMinHeap.push(value);
    nodeMinHeap.push(value);
  }
  const denoValues = [];
  const nodeValues = [];
  for (let i = 0; i < values.length; i++) {
    denoValues.push(denoMinHeap.pop());
    nodeValues.push(nodeMinHeap.pop());
  }
  const expected = [
    { key: 3, count: 16 },
    { key: 1, count: 19 },
    { key: 0, count: 23 },
    { key: 2, count: 29 },
  ];
  console.log("Deno:", denoValues);
  console.log("Node:", nodeValues);
  assertEquals(denoValues, expected);
  assertEquals(nodeValues, expected);
});

Deno.test("cloned node heap returns values in the wrong order", () => {
  let denoMinHeap = new BinaryHeap(compareObjects);
  let nodeMinHeap = new Heap(compareObjects);
  const values = [
    { key: 0, count: 23 },
    { key: 1, count: 19 },
    { key: 2, count: 29 },
    { key: 3, count: 16 },
  ];
  for (const value of values) {
    denoMinHeap.push(value);
    nodeMinHeap.push(value);
  }
  // Copy both heaps; the copies should pop values in the same order as the originals.
  denoMinHeap = BinaryHeap.from(denoMinHeap);
  nodeMinHeap = nodeMinHeap.clone();
  const denoValues = [];
  const nodeValues = [];
  for (let i = 0; i < values.length; i++) {
    denoValues.push(denoMinHeap.pop());
    nodeValues.push(nodeMinHeap.pop());
  }
  const expected = [
    { key: 3, count: 16 },
    { key: 1, count: 19 },
    { key: 0, count: 23 },
    { key: 2, count: 29 },
  ];
  console.log("Deno:", denoValues);
  console.log("Node:", nodeValues);
  assertEquals(denoValues, expected);
  assertEquals(nodeValues, expected);
});
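The clone behavior described above can be sketched in isolation. This is a minimal illustration under assumptions, not the heap npm package's source: `SketchHeap`, `fallbackCompare`, `cloneBuggy`, and `cloneFixed` are hypothetical names, and the fallback ordering is invented for the example. The point is only that a clone which drops the comparator will order the same nodes differently.

```typescript
type Comparator<T> = (a: T, b: T) => number;

// Hypothetical fallback ordering for this sketch: it compares string forms,
// so plain objects (all "[object Object]") tie with each other.
const fallbackCompare = (a: unknown, b: unknown): number => {
  const left = String(a);
  const right = String(b);
  return left < right ? -1 : left > right ? 1 : 0;
};

class SketchHeap<T> {
  private nodes: T[] = [];
  constructor(private compare: Comparator<T> = fallbackCompare) {}

  push(value: T): void {
    this.nodes.push(value);
    let i = this.nodes.length - 1;
    while (i > 0) {
      const up = (i - 1) >> 1;
      if (this.compare(this.nodes[i], this.nodes[up]) >= 0) break;
      [this.nodes[i], this.nodes[up]] = [this.nodes[up], this.nodes[i]];
      i = up;
    }
  }

  pop(): T | undefined {
    if (this.nodes.length === 0) return undefined;
    const first = this.nodes[0];
    const last = this.nodes.pop() as T;
    if (this.nodes.length === 0) return first;
    this.nodes[0] = last;
    let i = 0;
    while (true) {
      const l = 2 * i + 1;
      const r = l + 1;
      let min = i;
      if (l < this.nodes.length && this.compare(this.nodes[l], this.nodes[min]) < 0) min = l;
      if (r < this.nodes.length && this.compare(this.nodes[r], this.nodes[min]) < 0) min = r;
      if (min === i) break;
      [this.nodes[i], this.nodes[min]] = [this.nodes[min], this.nodes[i]];
      i = min;
    }
    return first;
  }

  // Buggy variant: copies the nodes but silently falls back to the default comparator.
  cloneBuggy(): SketchHeap<T> {
    const copy = new SketchHeap<T>();
    copy.nodes = [...this.nodes];
    return copy;
  }

  // Fixed variant: the copy reuses the comparator of the heap being cloned.
  cloneFixed(): SketchHeap<T> {
    const copy = new SketchHeap<T>(this.compare);
    copy.nodes = [...this.nodes];
    return copy;
  }
}

// Pop everything off a heap and collect the values in pop order.
function drainSketch<T>(heap: SketchHeap<T>): T[] {
  const out: T[] = [];
  let value = heap.pop();
  while (value !== undefined) {
    out.push(value);
    value = heap.pop();
  }
  return out;
}

type CountItem = { count: number };
const originalHeap = new SketchHeap<CountItem>((a, b) => a.count - b.count);
for (const count of [23, 19, 29, 16]) originalHeap.push({ count });

const buggyCounts = drainSketch(originalHeap.cloneBuggy()).map((item) => item.count);
const fixedCounts = drainSketch(originalHeap.cloneFixed()).map((item) => item.count);
```

Here `fixedCounts` comes out in ascending order, while `buggyCounts` does not, because every object ties under the fallback comparator and the pop order is decided by the heap's internal layout alone.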
I believe the bug is in the push implementation of Deno's binary heap. When I look at the internal structure and compare it against that of the heap npm package, it looks like Deno's push implementation is doing an additional swap that it shouldn't.
This issue can end up affecting extraction, causing Deno to return some values out of order. Below is a test case I wrote to demonstrate it. It incorrectly returns 5 before 4 when popping values off the heap.
Deno.test("[collections/BinaryHeap] edge case 1", () => {
  const minHeap = new BinaryHeap<number>(ascend);
  minHeap.push(4, 2, 8, 1, 10, 7, 3, 6, 5);
  assertEquals(minHeap.pop(), 1);
  minHeap.push(9);
  const expected = [2, 3, 4, 5, 6, 7, 8, 9, 10];
  assertEquals([...minHeap], expected);
});
I'm working on fixing this issue.
There were 2 issues and I fixed both in PR #2525.
Below are 2 quotes from the Binary Heap Wikipedia page. Steps for insertion:
Steps for extraction:
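As commonly described, insertion appends the new value at the end and swaps it upward only while it is out of order with its parent, and extraction moves the last value to the root and swaps it downward with its smaller child only while the two are out of order. The sketch below follows that description and never swaps values that compare as equal. It is not the code from PR #2525; `heapPush`, `heapPop`, and `ascending` are hypothetical names for this example.

```typescript
type Compare<T> = (a: T, b: T) => number;

// Insertion: append at the end, then swap upward only while the new value
// compares strictly before its parent (equal values are left in place).
function heapPush<T>(data: T[], value: T, compare: Compare<T>): void {
  data.push(value);
  let index = data.length - 1;
  while (index > 0) {
    const up = Math.floor((index - 1) / 2);
    if (compare(data[index], data[up]) >= 0) break;
    [data[index], data[up]] = [data[up], data[index]];
    index = up;
  }
}

// Extraction: move the last value to the root, then swap downward with the
// smaller child only while that child compares strictly before the current value.
function heapPop<T>(data: T[], compare: Compare<T>): T | undefined {
  if (data.length === 0) return undefined;
  const first = data[0];
  const last = data.pop() as T;
  if (data.length === 0) return first;
  data[0] = last;
  let index = 0;
  while (true) {
    const left = 2 * index + 1;
    const right = left + 1;
    let smallest = index;
    if (left < data.length && compare(data[left], data[smallest]) < 0) smallest = left;
    if (right < data.length && compare(data[right], data[smallest]) < 0) smallest = right;
    if (smallest === index) break;
    [data[index], data[smallest]] = [data[smallest], data[index]];
    index = smallest;
  }
  return first;
}

// Same sequence of operations as edge case 1 in this thread.
const ascending = (a: number, b: number): number => a - b;
const numberHeap: number[] = [];
for (const value of [4, 2, 8, 1, 10, 7, 3, 6, 5]) heapPush(numberHeap, value, ascending);
const firstPopped = heapPop(numberHeap, ascending);
heapPush(numberHeap, 9, ascending);
const drained: number[] = [];
while (numberHeap.length > 0) drained.push(heapPop(numberHeap, ascending) as number);
```

With distinct numbers the pop order is fully determined, so `firstPopped` is `1` and `drained` is `[2, 3, 4, 5, 6, 7, 8, 9, 10]`. The strict comparisons only matter when values tie.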
Here are all 3 edge case tests I wrote for collections/BinaryHeap. They all pass now with the fixes I've made to push and pop.
Deno.test("[collections/BinaryHeap] edge case 1", () => {
  const minHeap = new BinaryHeap<number>(ascend);
  minHeap.push(4, 2, 8, 1, 10, 7, 3, 6, 5);
  assertEquals(minHeap.pop(), 1);
  minHeap.push(9);
  const expected = [2, 3, 4, 5, 6, 7, 8, 9, 10];
  assertEquals([...minHeap], expected);
});

Deno.test("[collections/BinaryHeap] edge case 2", () => {
  interface Point {
    x: number;
    y: number;
  }
  // All three points tie on x, so the pop order depends on how ties are handled.
  const minHeap = new BinaryHeap<Point>((a, b) => ascend(a.x, b.x));
  minHeap.push({ x: 0, y: 1 }, { x: 0, y: 2 }, { x: 0, y: 3 });
  const expected = [{ x: 0, y: 1 }, { x: 0, y: 3 }, { x: 0, y: 2 }];
  assertEquals([...minHeap], expected);
});

Deno.test("[collections/BinaryHeap] edge case 3", () => {
  interface Point {
    x: number;
    y: number;
  }
  const minHeap = new BinaryHeap<Point>((a, b) => ascend(a.x, b.x));
  minHeap.push(
    { x: 0, y: 1 },
    { x: 1, y: 2 },
    { x: 1, y: 3 },
    { x: 2, y: 4 },
    { x: 2, y: 5 },
    { x: 2, y: 6 },
    { x: 2, y: 7 },
  );
  const expected = [
    { x: 0, y: 1 },
    { x: 1, y: 2 },
    { x: 1, y: 3 },
    { x: 2, y: 5 },
    { x: 2, y: 4 },
    { x: 2, y: 6 },
    { x: 2, y: 7 },
  ];
  assertEquals([...minHeap], expected);
});
Here is what the 3 different heaps from these test cases look like before the values are removed to compare their pop order. The heaps look the same for both the std/collections BinaryHeap and the heap npm module.
Below are the equivalent test cases for the heap npm module. The first edge case passes, but the other 2 fail because heap incorrectly swaps when the parent and child nodes compare as equal.
Reason why edge case 2 fails: When you pop off the
Reason why edge case 3 fails: When you pop off the
import { assertEquals } from "https://deno.land/[email protected]/testing/asserts.ts";
import { ascend } from "https://deno.land/[email protected]/collections/binary_heap.ts";
import Heap from "https://esm.sh/[email protected]";

Deno.test("[npm/Heap] edge case 1", () => {
  const minHeap = new Heap<number>(ascend);
  const values = [4, 2, 8, 1, 10, 7, 3, 6, 5];
  for (const value of values) {
    minHeap.push(value);
  }
  assertEquals(minHeap.pop(), 1);
  minHeap.push(9);
  const actual = [];
  for (let i = 0; i < values.length; i++) {
    actual.push(minHeap.pop());
  }
  const expected = [2, 3, 4, 5, 6, 7, 8, 9, 10];
  assertEquals(actual, expected);
});

Deno.test("[npm/Heap] edge case 2", () => {
  interface Point {
    x: number;
    y: number;
  }
  const minHeap = new Heap<Point>((a, b) => ascend(a.x, b.x));
  const values = [{ x: 0, y: 1 }, { x: 0, y: 2 }, { x: 0, y: 3 }];
  for (const value of values) {
    minHeap.push(value);
  }
  const actual = [];
  for (let i = 0; i < values.length; i++) {
    actual.push(minHeap.pop());
  }
  const expected = [{ x: 0, y: 1 }, { x: 0, y: 3 }, { x: 0, y: 2 }];
  assertEquals(actual, expected);
});

Deno.test("[npm/Heap] edge case 3", () => {
  interface Point {
    x: number;
    y: number;
  }
  const minHeap = new Heap<Point>((a, b) => ascend(a.x, b.x));
  const values = [
    { x: 0, y: 1 },
    { x: 1, y: 2 },
    { x: 1, y: 3 },
    { x: 2, y: 4 },
    { x: 2, y: 5 },
    { x: 2, y: 6 },
    { x: 2, y: 7 },
  ];
  for (const value of values) {
    minHeap.push(value);
  }
  const actual = [];
  for (let i = 0; i < values.length; i++) {
    actual.push(minHeap.pop());
  }
  const expected = [
    { x: 0, y: 1 },
    { x: 1, y: 2 },
    { x: 1, y: 3 },
    { x: 2, y: 5 },
    { x: 2, y: 4 },
    { x: 2, y: 6 },
    { x: 2, y: 7 },
  ];
  assertEquals(actual, expected);
});
All 3 implementations will have different results after my fixes.
Describe the bug
BinaryHeap produces a different result when used to implement top-k-frequent-elements.
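For context, one common way to solve top-k-frequent-elements with a heap is to count occurrences and keep a min-heap of at most k candidates ordered by count. The sketch below is an assumption about the general technique, not the code from the linked repository; `topKFrequent` and its helpers are hypothetical names. When several values share the same count at the cutoff, which of them survive depends on the heap's tie handling, which is where implementations can legitimately differ.

```typescript
// Hypothetical helper for illustration; not taken from the linked repository.
function topKFrequent(nums: number[], k: number): number[] {
  const counts = new Map<number, number>();
  for (const n of nums) counts.set(n, (counts.get(n) ?? 0) + 1);

  // Min-heap of [value, count] pairs ordered by count, capped at k entries.
  const heap: Array<[number, number]> = [];
  const before = (i: number, j: number): boolean => heap[i][1] < heap[j][1];
  const swap = (i: number, j: number): void => {
    [heap[i], heap[j]] = [heap[j], heap[i]];
  };

  const siftUp = (start: number): void => {
    let i = start;
    while (i > 0) {
      const up = (i - 1) >> 1;
      if (!before(i, up)) break;
      swap(i, up);
      i = up;
    }
  };

  const siftDown = (start: number): void => {
    let i = start;
    while (true) {
      const left = 2 * i + 1;
      const right = left + 1;
      let smallest = i;
      if (left < heap.length && before(left, smallest)) smallest = left;
      if (right < heap.length && before(right, smallest)) smallest = right;
      if (smallest === i) break;
      swap(i, smallest);
      i = smallest;
    }
  };

  counts.forEach((count, value) => {
    if (heap.length < k) {
      heap.push([value, count]);
      siftUp(heap.length - 1);
    } else if (k > 0 && count > heap[0][1]) {
      // Replace the least frequent candidate with a strictly more frequent one.
      heap[0] = [value, count];
      siftDown(0);
    }
  });

  return heap.map(([value]) => value);
}

// Counts are 1 -> 3, 2 -> 2, 3 -> 1, so there is no tie at the cutoff for k = 2.
const frequent = topKFrequent([1, 1, 1, 2, 2, 3], 2).sort((a, b) => a - b);
```

The result is sorted here only so the example is easy to compare; the function itself returns the surviving candidates in heap order.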
Steps to Reproduce
https://github.com/char8x/binary-heap-different-result
Expected behavior
The result should be close to the result of using array or using Heap.
Environment