Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"View all" option for facets, to provide a (paginated) list of ALL of the facet counts plus a link to view them #1080

Open
simonw opened this issue Nov 2, 2020 · 7 comments
Labels
enhancement faceting feature maybe-not-1.0 Things that might not need to block 1.0 after all
Milestone

Comments

@simonw
Copy link
Owner

simonw commented Nov 2, 2020

Can use /database/-/... namespace from #296

@simonw
Copy link
Owner Author

simonw commented Nov 2, 2020

URL design:

/database/table/-/facet/colname

And for other types of facet (to be supported later):

/database/table/-/facet/colname?_type=m2m

@simonw simonw added the feature label Nov 2, 2020
@simonw
Copy link
Owner Author

simonw commented Nov 2, 2020

Implementing pagination for facets will be interesting. Would be easier if I had a nicer reusable internal pagination mechanism, which is also needed for #856 (pagination of canned queries).

@simonw
Copy link
Owner Author

simonw commented Nov 2, 2020

Maybe this ends up being code that defers to a simulated canned query, rendered using the existing query.html template.

@simonw
Copy link
Owner Author

simonw commented Nov 2, 2020

For regular column faceting, here's the query that is used:

facet_sql = """
select {col} as value, count(*) as count from (
{sql}
)
where {col} is not null
group by {col} order by count desc, value limit {limit}
""".format(
col=escape_sqlite(column), sql=self.sql, limit=facet_size + 1
)

Since it uses order by count desc, value maybe those values could be used to implement cursor-based pagination.

That wouldn't be robust in the face of changing data, but I'm not sure it's possible to implement paginated faceting in a way that survives ongoing changes to the underlying data.

@simonw
Copy link
Owner Author

simonw commented Nov 2, 2020

Also relevant to this issue: #830 - redesigning the facet plugin hook in preparation for Datasette 1.0. And #972 supporting faceting against arbitrary queries.

@simonw
Copy link
Owner Author

simonw commented Feb 4, 2022

Had a new, different idea for how this could work: support a ?_group_count=colname parameter to the table view, which turns the page into a select colname, count(*) ... group by colname query - but keeps things like the filter interface, facet selection, search box and so on.

@simonw
Copy link
Owner Author

simonw commented Feb 4, 2022

An initial prototype of that in my local group-count branch quickly started running into problems:

diff --git a/datasette/views/table.py b/datasette/views/table.py
index be9e9c3..d30efe1 100644
--- a/datasette/views/table.py
+++ b/datasette/views/table.py
@@ -105,8 +105,12 @@ class RowTableShared(DataView):
                 type_ = "integer"
                 notnull = 0
             else:
-                type_ = column_details[r[0]].type
-                notnull = column_details[r[0]].notnull
+                try:
+                    type_ = column_details[r[0]].type
+                    notnull = column_details[r[0]].notnull
+                except KeyError: # Probably count(*)
+                    type_ = "integer"
+                    notnull = False
             columns.append(
                 {
                     "name": r[0],
@@ -613,6 +617,15 @@ class TableView(RowTableShared):
             offset=offset,
         )
 
+        # If ?_group_count we convert the SQL query here
+        group_count = request.args.getlist("_group_count")
+        if group_count:
+            wrapped_sql = "select {cols}, count(*) from ({sql}) group by {cols}".format(
+                cols=", ".join(group_count),
+                sql=sql,
+            )
+            sql = wrapped_sql
+
         if request.args.get("_timelimit"):
             extra_args["custom_time_limit"] = int(request.args.get("_timelimit"))

Resulted in errors like this one:

    pk_path = path_from_row_pks(row, pks, not pks, False)
  File "/Users/simon/Dropbox/Development/datasette/datasette/utils/__init__.py", line 82, in path_from_row_pks
    bits = [
  File "/Users/simon/Dropbox/Development/datasette/datasette/utils/__init__.py", line 83, in <listcomp>
    row[pk]["value"] if isinstance(row[pk], dict) else row[pk] for pk in pks
IndexError: No item with that key

@simonw simonw added the maybe-not-1.0 Things that might not need to block 1.0 after all label Dec 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement faceting feature maybe-not-1.0 Things that might not need to block 1.0 after all
Projects
None yet
Development

No branches or pull requests

1 participant