ARROW-10510: [Rust] [DataFusion] Benchmark COUNT(DISTINCT) queries. #8606

Closed

@drusso drusso commented Nov 6, 2020

ARROW-10510

This change adds benchmarks for COUNT(DISTINCT) queries. This is a small follow-up to ARROW-10043 / #8222. In that PR, a number of implementation ideas were discussed for follow-ups, and having benchmarks will help evaluate them.


Two benchmarks are added:

  • wide: all of the values are distinct; this measures worst-case performance
  • narrow: only a handful of distinct values; this is closer to best-case performance

The wide benchmark runs about 7x slower than the narrow benchmark.
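
To make the wide/narrow distinction concrete, here is a minimal standalone sketch using only the Rust standard library. It is not the benchmark code from this PR and does not use DataFusion: `make_column`, `count_distinct`, and `time_count_distinct` are hypothetical helpers, and the hash-set strategy is just one assumed way to compute a distinct count.

```rust
use std::collections::HashSet;
use std::time::{Duration, Instant};

/// Hypothetical data generator: `rows` values that cycle through
/// `distinct` different values (so `distinct == rows` is the "wide" case).
fn make_column(rows: usize, distinct: usize) -> Vec<u64> {
    assert!(distinct > 0, "need at least one distinct value");
    (0..rows).map(|i| (i % distinct) as u64).collect()
}

/// Assumed COUNT(DISTINCT) strategy for illustration: collect every value
/// into a hash set and report the set's size.
fn count_distinct(values: &[u64]) -> usize {
    let seen: HashSet<u64> = values.iter().copied().collect();
    seen.len()
}

/// Times a single run of `count_distinct`. A real benchmark harness would
/// repeat the measurement many times; this only shows the shape of it.
fn time_count_distinct(values: &[u64]) -> (usize, Duration) {
    let start = Instant::now();
    let count = count_distinct(values);
    (count, start.elapsed())
}

/// Runs both cases over the same number of rows and prints the timings.
/// The "handful" of narrow values is arbitrarily set to 4 here.
fn compare_wide_and_narrow(rows: usize) {
    let wide = make_column(rows, rows); // every value distinct
    let narrow = make_column(rows, 4); // only a few distinct values

    let (wide_count, wide_time) = time_count_distinct(&wide);
    let (narrow_count, narrow_time) = time_count_distinct(&narrow);

    println!("wide:   {} distinct in {:?}", wide_count, wide_time);
    println!("narrow: {} distinct in {:?}", narrow_count, narrow_time);
}
```

Calling `compare_wide_and_narrow(1_000_000)` from a `main` function prints one timing per case. In the wide case the set grows to one entry per row, while in the narrow case it stays tiny, which is the kind of gap the two benchmarks are meant to expose; the exact ratio depends on the implementation and the machine.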

@nevi-me nevi-me left a comment

LGTM

@nevi-me nevi-me closed this in eb42c50 Nov 7, 2020
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
[ARROW-10510](https://issues.apache.org/jira/browse/ARROW-10510)

Closes apache#8606 from drusso/ARROW-10510

Authored-by: Daniel Russo <[email protected]>
Signed-off-by: Neville Dipale <[email protected]>