-
Notifications
You must be signed in to change notification settings - Fork 826
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DataTable sort by function (or other callable) #3090
DataTable sort by function (or other callable) #3090
Conversation
The `DataTable` widget now takes the `by` argument instead of `columns`, allowing the table to also be sorted using a custom function (or other callable). This is a breaking change since it requires all calls to the `sort` method to include an iterable of key(s) (or a singular function/callable). Covers Textualize#2261 using [suggested function signature](Textualize#2512 (comment)) from @darrenburns on PR Textualize#2512.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - looks good to me!
@willmcgugan thoughts on the API?
FWIW, I really like this idea. Right now I just have a 0 width hidden column that I change to adjust sort order, but that's awfully clunky. |
@misterek Very smart workaround, don't think I'd have thought of that 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A suggestion re the API for your consideration.
src/textual/widgets/_data_table.py
Outdated
@@ -2107,13 +2107,15 @@ def _get_fixed_offset(self) -> Spacing: | |||
|
|||
def sort( | |||
self, | |||
*columns: ColumnKey | str, | |||
by: Iterable[ColumnKey | str] | Callable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than a breaking change, could we keep the positional arguments, but add a key
argument. This feels like it would be closer to sorting lists, and would be less astonishing to the dev.
If both columns and a key function are specified, then I would think that would be an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If both columns and a key function are specified, then I would think that would be an error.
This feels confusing to me to be honest - it means we'll have a method with two parameters controlling the same thing (how the data is sorted) that can't be supplied together. If we keep them as a single param then the method signature/typing conveys that they can't both be supplied. By having separate params, we're opening up an avenue for people to use the API incorrectly which didn't exist before. If we really want to avoid a breaking change my vote would be for a new method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me explore how this change may work.
With this change:
table.sort()
Sort by default column order.
table.sort(("foo",))
Sort by single column (feels clumsy, even if it was a list and not a tuple of one.
table.sort(("foo", "bar"))
Still feels clumsy to me. The double nesting of parenthesis would be a double take.
table.sort(my_key)
# Sort by key, clear enough.
With positional args, and key parameter:
table.sort()
table.sort("foo")
I think this is clear that it sorts one column.
table.sort("foo", "bar")
Sorts two columns
table.sort(key=my_key)
Sorts by key.
I hear what you are saying about the ambiguity of specifying both columns and a key.
How about, if you specify columns and a key, then the key gets a tuple of the specified columns, and not the entire row.
I feel this is actually useful, and not a gimmick to get around @darrenburns objection.
It will allow us to have a key function that isn't tied to the number of columns. At the moment, if we add a column or change their order, our key function could become outdated. Consider this:
def sort_foo_bar(row:tuple[str, str]) -> str:
return "".join(row).lower()
table.sort("foo", "bar", key=sort_foo_bar)
Since we have specified the columns for the key function, it won't change the behaviour if we change the order of the columns, or add more columns.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a really good idea, I like it - good point about the key function potentially becoming outdated after adding columns too 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huzzah!
@joshbduncan Are you ok to make that change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@willmcgugan, I like this solution as well and can get it implemented. Thanks!
Changed back to orinal `columns` argument and added a new `key` argument which takes a function (or other callable). This allows the PR to NOT BE a breaking change.
@willmcgugan and @darrenburns, I have made the updates you suggested. The def sort(
self,
*columns: ColumnKey | str,
key: Callable | None = None,
reverse: bool = False,
) -> Self:
... When both _key = key
if key and columns:
def _key(row):
return key(itemgetter(*columns)(row[1]))
ordered_rows = sorted(
self._data.items(),
key=_key if _key else sort_by_column_keys,
reverse=reverse,
) Lastly, did make a change to the following line. Since the method now accepts self._row_locations = TwoWayDict(
{row_key: new_index for new_index, (row_key, _) in enumerate(ordered_rows)}
) |
- Updated the example file for the docs to better show the functionality of the change (especially when using `columns` and `key` together). - Added one new tests to cover a similar situation to the example changes
- the sort by clicked column function was bloat in my opinion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nearly there. In nit-picking territory now.
src/textual/widgets/_data_table.py
Outdated
@@ -2108,12 +2108,17 @@ def _get_fixed_offset(self) -> Spacing: | |||
def sort( | |||
self, | |||
*columns: ColumnKey | str, | |||
key: Callable | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we narrow the typing for this Callable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since key
can accept and return such a wide range of types were you thinking something like key: Callable[..., Any]
or key: Callable[[Any], Any]
? Neither are very "narrowing" but seem to be pretty common practice from what I have seen elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the key function take a row, which is given as a list of values? In that case, wouldn't this be more precise?
Callable[list[Any], Any]
I was hoping there was some kind of typing.SupportsCompare
, but there doesn't seem to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but the args going to key
won't be a list as args come in as a tuple. And no matter the type, it seems the first type in the Callable typing has to be inside of brackets unless it is ...
. So, Callable[[Any], Any]
, or Callable[[list[Any], Any]
, or `Callable[[tuple[Any]], Any], and so on...
And to further complicate things, since itemgetter is used when columns are provided, either a single value CellType
or tuple of multiple values CellType
(when someone passes multiple columns to the sort
function) are getting passed to the function making me lean towards Any
.
Unless I am missing something (which is very possible), I think the options are Callable[[Any], Any]
of Callable[[tuple[Any]], Any]
. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please forgive my ignorance if this is out of line, would an @overload
be appropriate to help narrow the type? Roughly mocked, it could look like the following:
@overload
def sort(
self,
columns: ColumnKey | str,
*,
key: Callable[[CellType], Any] | None = None,
reverse: bool = False,
) -> Self:
...
@overload
def sort(
self,
*columns: ColumnKey | str,
key: Callable[[tuple[CellType, ...]], Any] | None = None,
reverse: bool = False,
) -> Self:
...
def sort(
self,
*columns: ColumnKey | str,
key: Callable[[Any], Any] | None = None,
reverse: bool = False,
) -> Self:
# Implementation
Though the type hints I'm receiving in test_data_table.py
are appropriately enhanced and the new/updated test_sort_...()
tests are still passing on my machine, there are two potential issues with my naive approach that I can see immediately:
- These overloads are just fancy lies and I don't think the relocation of the
*
in the fist/singular@overload
is good behavior. My type checker is yelling at me in the_data_table.py
file on the implementation ofsort()
thatOverloaded implementation is not consistent with signature of overload 1
due to the movement of the*
between the singular overload and the implementation. This motion hides the fact that the singularcolumns: ColumnKey | str
param is actually getting wrapped up into a tuple by the implementation's*columns: ColumnKey | str
. - The first
@overload
, in its singular form, may become the first completion suggestion and how a new user first interacts withsort()
. This may lead them to the incorrect conclusion that, despitecolumns
being plural and clear documentation,sort()
is limited to a single column input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly haven't used the @overload
decorator much and that would probably be a directional decision for the maintainers.
As for the Callable
typing... I have looked into it (too much) and there doesn't seem to be a general consensus. After lots of GitHub sleuthing, I have found most packages just go with Callable[[Any], Any]
or Callable[..., Any]
which is equivalent to just Callable
. And I can't come up with anything better...
On the inputs/parameters side of the callable type, CellType
is loosely defined so no matter if you use that, it just resolves to Any
anyway.
And on the return side of the callable, since the method is called sort
, I think it is implied that the return values should be "comparable", but that obviously leaves many options (which is probably best typed with Any
).
So, all that to say, 🤷♂️...
src/textual/widgets/_data_table.py
Outdated
_key = key | ||
if key and columns: | ||
|
||
def _key(row): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So row
is a tuple of row_key and row data?
Do we need the row key in there? I'm struggling to imagine a scenario where that might be neccesary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow... The original code provided a tuple of tuple[RowKey, dict[ColumnKey | str, CellType]]
to the default sort_by_column_keys()
function, then after sorting, it uses the RowKey
part of the now sorted tuples to update the _row_locations
dictionary. I just followed that same logic when adding the catch for key
and columns
. I could re-write the catch to be more clear and in line with the default sort_by_column_keys()
function like below.
if key and columns:
def _key(row):
_, row_data = row
return key(itemgetter(*columns)(row_data))
Let me know if I am missing what you are point out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see how the row_key
is used now. The key function gets a tuple of row key, and row data, but it discards the row key (which you unpack as _
).
I think it would be surprising to have the row key there (and its not documented in the signature). Could we allow the key function to accept just the row data? That's what I would expect to be the default.
I guess you would need to wrap the key function. Something like the following:
def key_wrapper(row):
row_key, row_data = row
return (row_key, key(row_data))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, can I back up a bit and get your thoughts to make sure we are on the same page... I think we are working toward the same thing since I am already "wrapping" the key function to only send the row_data to the original user-supplied key
function...
First things first. I should have asked this in the beginning, I guess. The _data_table.sort()
API in main
raises an exception if no columns
are passed since the base sort_by_column_keys()
function uses itemgetter
to get each item from columns
. When a key
is provided should the same logic be followed, or should all row data be sent to the provided key
function?
If the same exception should be raised if no columns
are provided, then this "wraps" the user-supplied key
function and only sends the actual row data.
_key = key
if key:
def _key(row: tuple[RowKey, dict[ColumnKey | str, CellType]]) -> Any:
_, row_data = row
return key(itemgetter(*columns)(row_data))
If all data should be sent in the absence of columns
, then a ternary to send the correct data should do the trick.
_key = key
if key:
def _key(row):
_, row_data = row
return key(itemgetter(*columns)(row_data) if columns else tuple(row_data.values()))
In both cases, only the actual row_data is being passed to the user-supplied key
function, the only reason for the row_key
tagging along is to be able to update the TwoWayDict
the API has set up to track all of the data. To my knowledge, if you don't send along with the RowKey
to the built-in sorted
method, you will have to do more work to reconstruct the self._row_locations
dict after sorting.
ordered_rows = sorted(
self._data.items(),
key=_key if key is not None else sort_by_column_keys,
reverse=reverse,
)
self._row_locations = TwoWayDict(
{row_key: new_index for new_index, (row_key, _) in enumerate(ordered_rows)}
)
src/textual/widgets/_data_table.py
Outdated
_key = key | ||
if key and columns: | ||
|
||
def _key(row): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see how the row_key
is used now. The key function gets a tuple of row key, and row data, but it discards the row key (which you unpack as _
).
I think it would be surprising to have the row key there (and its not documented in the signature). Could we allow the key function to accept just the row data? That's what I would expect to be the default.
I guess you would need to wrap the key function. Something like the following:
def key_wrapper(row):
row_key, row_data = row
return (row_key, key(row_data))
src/textual/widgets/_data_table.py
Outdated
ordered_rows = sorted( | ||
self._data.items(), key=sort_by_column_keys, reverse=reverse | ||
self._data.items(), | ||
key=_key if _key else sort_by_column_keys, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have an explicit check for is not None
here. For the (admittedly unlikely) scenario of a callable object with a __bool__
that returns False.
Using this new API, is there a way I could sort by multiple columns that each need a different key? For example, say I have a table of DNS A records: one column of FQDNs and another of IPv4 addresses (both represented as |
@aaronst, here's the same example code edited to fit your description. There are two options "F" sorts by the FQDN first, then the IPv4 address. "I" does the reverse, sorting the rows by IPv4 first then by the FQDN. I put a few of the same FQDN names so you can that it does sort the IPv4 after the name. Let me know if this is what you were looking for? import ipaddress
from random import randint
from rich.text import Text
from textual.app import App, ComposeResult
from textual.widgets import DataTable, Footer
def random_ip() -> str:
return f"{randint(0, 256)}.{randint(0, 256)}.{randint(0, 256)}.{randint(0, 256)}"
ROWS = [
("DNS A", "FQDN", "IPv4", "country"),
(random_ip(), "Joseph Schooling", random_ip(), Text("Singapore", style="italic")),
(random_ip(), "Michael Phelps", random_ip(), Text("United States", style="italic")),
(random_ip(), "Joseph Schooling", random_ip(), Text("South Africa", style="italic")),
(random_ip(), "László Cseh", random_ip(), Text("Hungary", style="italic")),
(random_ip(), "Li Zhuhao", random_ip(), Text("China", style="italic")),
(random_ip(), "László Cseh", random_ip(), Text("France", style="italic")),
(random_ip(), "Tom Shields", random_ip(), Text("United States", style="italic")),
(random_ip(), "Li Zhuhao", random_ip(), Text("Russia", style="italic")),
(random_ip(), "Li Zhuhao", random_ip(), Text("Scotland", style="italic")),
]
class TableApp(App):
BINDINGS = [
("f", "sort_fqdn_first", "Sort by FQDN then IPv4"),
("i", "sort_ipv4_first", "Sort by IPv4 then FQDN"),
]
current_sorts: set = set()
def compose(self) -> ComposeResult:
yield DataTable()
yield Footer()
def on_mount(self) -> None:
table = self.query_one(DataTable)
for col in ROWS[0]:
table.add_column(col, key=col)
table.add_rows(ROWS[1:])
def sort_reverse(self, sort_type: str):
"""Determine if `sort_type` is ascending or descending."""
reverse = sort_type in self.current_sorts
if reverse:
self.current_sorts.remove(sort_type)
else:
self.current_sorts.add(sort_type)
return reverse
def action_sort_fqdn_first(self) -> None:
"""Sort by FQDN (alphabetically) then by IPv4 (numerically)."""
def custom_sort(row_data):
fqdn, ipv4 = row_data
return (fqdn, ipaddress.ip_address(ipv4))
table = self.query_one(DataTable)
table.sort(
"FQDN",
"IPv4",
key=custom_sort,
reverse=self.sort_reverse("fqdn"),
)
def action_sort_ipv4_first(self) -> None:
"""Sort by FQDN (alphabetically) then by IPv4 (numerically)."""
def custom_sort(row_data):
ipv4, fqdn = row_data
return (ipaddress.ip_address(ipv4), fqdn)
table = self.query_one(DataTable)
table.sort(
"IPv4",
"FQDN",
key=custom_sort,
reverse=self.sort_reverse("ipv4"),
)
app = TableApp()
if __name__ == "__main__":
app.run() |
@joshbduncan thanks! I believe this does cover the use case I have. Essentially, |
@aaronst, yep you've got it. Essentially, the |
|
Renamed |
I suspect the behaviour we want for the
As it stands, raising an exception on @willmcgugan Can you confirm this is what you had in mind? |
Yep, getting all row data back would be the behavior I would expect as a consumer. |
I concur. If you don't specify column(s), you get the whole row. |
@darrenburns and @willmcgugan, two questions?
# explicit if/else statement
def key_wrapper(row: tuple[RowKey, dict[ColumnKey | str, CellType]]) -> Any:
_, row_data = row
if columns:
return key(itemgetter(*columns)(row_data))
else:
return key(tuple(row_data.values()))
# or inline if statement
def key_wrapper(row: tuple[RowKey, dict[ColumnKey | str, CellType]]) -> Any:
_, row_data = row
return key(
itemgetter(*columns)(row_data) if columns else tuple(row_data.values())
)
|
So, if everyone concurs that all row data should be returned when no columns are passed, I think you could just get rid of the default def key_wrapper(row: tuple[RowKey, dict[ColumnKey | str, CellType]]) -> Any:
_, row_data = row
if columns:
result = itemgetter(*columns)(row_data)
else:
result = tuple(row_data.values())
if key is not None:
return key(result)
return result
ordered_rows = sorted(
self._data.items(),
key=key_wrapper,
reverse=reverse,
) I also merged in the latest main. I am not sure why the snapshot tests are failing thought as none of them seem to have anything to do with my changes to DataTable. |
…s test coverage of the case where a key, but no columns, is passed to DataTable.sort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I've made some changes to the tests as some things could be considered redundant (e.g. testing using both a lambda and a function as a key), and also fixed a changelog conflict. Playing around with it, it seems to work well too.
Also, to answer your earlier question about inline vs block if/else - we don't really mind as for simple cases it's usually very subjective.
@willmcgugan could you unblock this when you get a few minutes?
Thanks @joshbduncan - this is going out in the next release (today)! |
@darrenburns, thanks again for your help! Happy Halloween 👻 |
This solution does not solve the use case where the sort criteria are not present in the row data. I need some way to link a row to the actual ordering I want. I could encode the sort criteria in the row key string (for instance), if I could access that in my sort function. A better solution (for me at least) would be a hidden column. I could put the sorting value in a column the user does not see. |
Your sort function could reference any external data store/source as long as some portion of your row data could be used as the lookup value. I don't know your exact use case but I could see a separate dictionary that holds the sorting data that you could access from inside of your sort function. |
Thanks, I am considering that as my only option, but it can get complex for a large, dynamically updated table. |
add_column with a width=0 should do it I would think. |
Yes @TomJGooding, previously (before this PR merge), I used this exact approach. It is basically a hidden column with the data you need to sort by and it is a great work-a-round that makes sorting very easy with no need for separate functions. That would be the route I would take. |
Thanks, I would never have thought of that! |
Please review the following checklist.
After running into the issue of sorting a
DataTable
that includesRich.Text
objects a few times, and seeing a few issues and responses on the Textual Discord, I figured I would attempt to implement a "key-like" argument for thesort
method. This has been discussed in #2243, and #2261, and I took the suggestedsort
method argument ofby
from @darrenburns comment (#2512 (comment)).I have an example app below (also included in the docs) that demos using the new functionality a few different ways.
action_sort_by_average_time()
uses a custom sort functionaction_sort_by_last_name()
uses a custom sort lambdaactionsort_by_country()
sorts a column containingRich.Text
objectsAll three examples can be accessed via a binding which you will see in the footer.
This functionality also works great with the
on_data_table_header_selected()
method, allowing you to apply different sorts depending on the clicked column header viaevent.label
.Finally, this is a breaking change since it requires all calls to the
sort
method to include an iterable of key(s) (or a singular function/callable). I have updated all currentDataTable
tests with iterables for the first argument and also added new tests for the expanded functionality.