-
-
Notifications
You must be signed in to change notification settings - Fork 695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
_facet_array should work against views #448
Comments
It may be that some facet implementations ( |
I think this SQL recipe may work instead: select
*
from
ads_with_targets
where
'people_who_match:interests:African-American Civil Rights Movement (1954—68)' in (
select
value
from
json_each(target_names)
)
and 'interests:Martin Luther King III' in (
select
value
from
json_each(target_names)
) |
I think this code is wrong in the Lines 357 to 364 in 502c02f
|
select
j.value as value,
count(*) as count
from
(
select
id,
file,
clicks,
impressions,
text,
url,
spend_amount,
spend_currency,
created,
ended,
target_names
from
ads_with_targets
where
:p0 in (
select
value
from
json_each([ads_with_targets].[target_names])
)
)
join json_each(target_names) j
group by
j.value
order by
count desc,
value
limit
31 How can I return a count of the number of documents containing each tag, but not the number of total tags that match including duplicates? |
This looks like it might work: with inner as (
select
*
from
ads_with_targets
where
:p0 in (
select
value
from
json_each([ads_with_targets].[target_names])
)
),
deduped_array_items as (
select
distinct j.value,
inner.*
from
json_each([inner].[target_names]) j
join inner
)
select
value,
count(*)
from
deduped_array_items
group by
value
order by
count(*) desc |
It uses a CTE which were introduced in SQLite 3.8 - and AWS Lambda Python 3.9 still provides 3.7 - but I've checked and I can use |
I tried this and it seems to work correctly: for source_and_config in self.get_configs():
config = source_and_config["config"]
source = source_and_config["source"]
column = config.get("column") or config["simple"]
facet_sql = """
with inner as ({sql}),
deduped_array_items as (
select
distinct j.value,
inner.*
from
json_each([inner].{col}) j
join inner
)
select
value as value,
count(*) as count
from
deduped_array_items
group by
value
order by
count(*) desc limit {limit}
""".format(
col=escape_sqlite(column), sql=self.sql, limit=facet_size + 1
) The queries are very slow though - I had to bump up to 2s time limit even against only a view returning 3,499 rows. |
Actually with the cache warmed up it looks like the facet query is taking 150ms which is good enough. |
Also note that this demo data is using a SQL view to create the JSON arrays - the view is defined as such: CREATE VIEW ads_with_targets as
select
ads.*,
json_group_array(targets.name) as target_names
from
ads
join ad_targets on ad_targets.ad_id = ads.id
join targets on ad_targets.target_id = targets.id
group by
ad_targets.ad_id; So running JSON faceting on top of that view is a pretty big ask! |
Tests are failing and I think it's because the facets come back in different orders, need a tie-breaker. https://github.com/simonw/datasette/runs/4219325197?check_suite_focus=true |
I created this view: https://json-view-facet-bug-demo-j7hipcg4aq-uc.a.run.app/russian-ads-8dbda00/ads_with_targets
When I try to apply faceting by array it appears to work at first: https://json-view-facet-bug-demo-j7hipcg4aq-uc.a.run.app/russian-ads/ads_with_targets?_facet_array=target_names
But actually it's doing the wrong thing - the SQL for the facets uses rowid, but rowid is not present on views at all! These results are incorrect, and clicking to select a facet will fail to produce any rows: https://json-view-facet-bug-demo-j7hipcg4aq-uc.a.run.app/russian-ads/ads_with_targets?_facet_array=target_names&target_names__arraycontains=people_who_match%3Ainterests%3AAfrican-American+Civil+Rights+Movement+%281954%E2%80%9468%29
Here's the SQL it should be using when you select a facet (note that it does not use a rowid):
https://json-view-facet-bug-demo-j7hipcg4aq-uc.a.run.app/russian-ads?sql=select+*+from+ads_with_targets+where+id+in+%28%0D%0A++++++++++++select+ads_with_targets.id+from+ads_with_targets%2C+json_each%28ads_with_targets.target_names%29+j%0D%0A++++++++++++where+j.value+%3D+%3Ap0%0D%0A++++++++%29+limit+101&p0=people_who_match%3Ainterests%3ABlack+%28Color%29
So we need to do something a lot smarter here. I'm not sure what the fix will look like, or even if it's feasible given that views don't have a rowid to hook into so the JSON faceting SQL may have to be completely rewritten.
The text was updated successfully, but these errors were encountered: