Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bugfix] Single Druid dimension spec error (part 2) #3920

Merged

Conversation

Mogball
Copy link
Contributor

@Mogball Mogball commented Nov 21, 2017

issue: #3908

As a follow up to #3796, where I forgot to change the condition on the second branch, which is just an else now.

Curious, why is there a branch to specifically handle a single dimension (which can't be a spec), with no having_filters and order_desc = True?

@kkalyan
Copy link
Contributor

kkalyan commented Nov 21, 2017

This fix works for me, thanks, @Mogball.

@mistercrunch
Copy link
Member

@fabianmenges touched this last, any comments? But there's a history of tweaks around edge cases here...

The PR seems right as we should end the if block with a catchall else: that falls back on a groupby query.

@fabianmenges
Copy link
Contributor

fabianmenges commented Nov 22, 2017

Curious, why is there a branch to specifically handle a single dimension (which can't be a spec), with no having_filters and order_desc = True?

I can bring some light into this. TopN queries in Druid are much faster than groupBy queries, which is why if you could use either one, your definitely want to use TopN (it gets mentioned in the groupBy spec).

TopN queries are much more limited than groupBy queries and also don't guarantee correctness. Their main limitations are that they can only handle a single dimension and you cannot specify the sort order.

The much better performance and the broad use case of TopN queries justify their use case. We run a very large Druid cluster (>300 machines) with very large individual datasets (>30TB) and it makes a huge difference.

@Mogball
Copy link
Contributor Author

Mogball commented Nov 22, 2017

Wasn't aware of that. Thanks ms for the info!

@mistercrunch
Copy link
Member

fixes #3943

@mistercrunch mistercrunch merged commit 6cbe0e6 into apache:master Nov 28, 2017
@ghulands
Copy link

I just pulled in this change and found that if you don't have a dimension in the group by, it will throw an exception.

2017-11-28 20:58:09,407:ERROR:root:'dimensions'
Traceback (most recent call last):
  File "/app/.heroku/python/lib/python2.7/site-packages/superset/viz.py", line 275, in get_payload2017-11-28T20:58:09.408059+00:00 orchardapp820973[app_siri-ops_superset-druid-monitoring.web.v14]:   File "/app/.heroku/python/lib/python2.7/site-packages/superset/viz.py", line 97, in get_df
    df = self.get_df()
    self.results = self.datasource.query(query_obj)
  File "/app/.heroku/python/lib/python2.7/site-packages/superset/connectors/druid/models.py", line 1077, in query
    client=client, query_obj=query_obj, phase=2)
  File "/app/.heroku/python/lib/python2.7/site-packages/superset/connectors/druid/models.py", line 865, in get_query_str
    return self.run_query(client=client, phase=phase, **query_obj)
  File "/app/.heroku/python/lib/python2.7/site-packages/superset/connectors/druid/models.py", line 1054, in run_query
    qry['dimensions'],
KeyError: 'dimensions'

michellethomas pushed a commit to michellethomas/panoramix that referenced this pull request May 24, 2018
wenchma pushed a commit to wenchma/incubator-superset that referenced this pull request Nov 16, 2018
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.21.0 labels Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.21.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants