improve sorting when filtering datasets #2834

Merged
merged 3 commits into from
Jul 2, 2018

Member

@philippotto philippotto commented Jun 29, 2018

I used the Dice coefficient to sort the datasets by best match. This works quite well for the suffix example, which probably covers 95% of the cases where this is needed. I'm a bit concerned about performance, since the sorting caused a noticeable lag with 1000 datasets. However, my datasets all had almost the same name, so if the search query narrows down the number of datasets considerably, the lag should shrink as well. In the end, the lab should just give final feedback (after deployment, since the dev server doesn't have many datasets). For the copy-and-paste use case, it should definitely be enough.
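
For reference, here is a minimal sketch of how a bigram-based Dice coefficient can be computed. The helper names `bigrams` and `diceCoefficient` are made up for illustration; the PR itself calls an existing `dice` function, whose exact behavior (for example, how it normalizes input) may differ from this sketch.

```javascript
// Hypothetical stand-alone helpers, not the `dice` function used in the PR.

// Count the overlapping two-character substrings (bigrams) of a string,
// ignoring case.
function bigrams(text) {
  const normalized = text.toLowerCase();
  const counts = new Map();
  for (let i = 0; i < normalized.length - 1; i++) {
    const gram = normalized.slice(i, i + 2);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Dice coefficient: 2 * |shared bigrams| / (|bigrams of a| + |bigrams of b|).
// Returns a value between 0 (nothing shared) and 1 (identical bigram multisets).
function diceCoefficient(a, b) {
  const gramsA = bigrams(a);
  const gramsB = bigrams(b);
  let sizeA = 0;
  let sizeB = 0;
  let overlap = 0;
  for (const count of gramsA.values()) sizeA += count;
  for (const count of gramsB.values()) sizeB += count;
  // Strings shorter than two characters have no bigrams at all.
  if (sizeA + sizeB === 0) return 0;
  for (const [gram, countA] of gramsA) {
    overlap += Math.min(countA, gramsB.get(gram) || 0);
  }
  return (2 * overlap) / (sizeA + sizeB);
}
```

With this sketch, a suffix query such as `_v2` scores `dataset_v2` at 4/11 and `dataset_v1` at 2/11, so the dataset that actually ends in the query is ranked first.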

Mailable description of changes:

  • When using the search functionality in the datasets view, the datasets will be sorted so that the best match is shown first. If a different sorting is desired, the sorting-arrows in the columns can still be used to change the sorting criteria.

URL of deployed dev instance (used for testing):

Steps to test:

  • open dashboard (default sort order should be "created")
  • sort by a different criterion
  • type something in the search bar (blue sorting-arrow should turn gray since the datasets are sorted by best match now)
  • sort again by another criterion --> should work even after refining the search query

Issues:


  • Ready for review

Member

@daniel-wer daniel-wer left a comment

Nice, works well. I'm alright with deploying it for performance feedback.

? _.chain(filteredDataSource)
    .map(row => ({
      row,
      diceCoefficient: dice(row.name, this.props.searchQuery),
Member

I'm not sure whether this will result in strange results if the user searches for a word in the description. The search results are filtered using the name and description properties, but the diceCoefficient only takes into account the dataset name for sorting.
Including the description in the diceCoefficient will probably slow it down further, right?
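
To make the concern concrete, here is a rough sketch of the mismatch: rows pass the filter if the name or the description matches, but the score that decides the order only looks at the name. The function `filterAndRank`, the stand-in `substringScore`, and the substring-based filter are assumptions for illustration; the actual view uses a lodash chain and `dice`, and its filter logic may differ.

```javascript
// Hypothetical stand-in for a similarity score such as `dice`:
// 1 if the name contains the query, 0 otherwise.
function substringScore(name, query) {
  return name.toLowerCase().includes(query.toLowerCase()) ? 1 : 0;
}

// Filter on name OR description, then rank by a score computed on the name only.
function filterAndRank(datasets, searchQuery, score) {
  const query = searchQuery.toLowerCase();
  return datasets
    .filter(dataset =>
      [dataset.name, dataset.description || ""].some(field =>
        field.toLowerCase().includes(query),
      ),
    )
    .map(row => ({ row, score: score(row.name, searchQuery) }))
    .sort((a, b) => b.score - a.score) // best match first
    .map(entry => entry.row);
}

const exampleDatasets = [
  { name: "alpha", description: "cortex sample" },
  { name: "cortex_v2", description: "" },
  { name: "beta", description: "retina" },
];

// "alpha" passes the filter only because of its description, so its position
// in the result is decided by a name score that ignores why it matched.
const ranked = filterAndRank(exampleDatasets, "cortex", substringScore);
```

Rows that match only through the description still appear in the result, but their relative order depends on a score that has nothing to do with why they matched.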

Member

Actually I noticed the description is not even displayed in the advanced dataset view, so it's probably alright to remove "description" in line 66.

Member

Searching for some word of the description doesn't work for me anyways, neither in the advanced nor on the spotlight dataset view, not sure why.

Member Author

> Searching for some word of the description doesn't work for me anyways, neither in the advanced nor on the spotlight dataset view, not sure why.

Really? It works for me 🤔

> I'm not sure whether this will result in strange results if the user searches for a word in the description. The search results are filtered using the name and description properties, but the diceCoefficient only takes into account the dataset name for sorting.

My assumption is that the search is only used for dataset name matching anyway. In the rare case that a user searches the description, the ordering might be suboptimal (or "random"?), but I don't think that's a showstopper, as users had to scroll through a long list before anyway. We can remove the description from the search parameters, but I'd leave it as is for now until someone complains. That way, we don't remove existing functionality from the view.

Member

I assumed that the string "Original data and segmentation" is the description and that I could search for parts of it, but that string is only a default shown if there is no description. Setting a custom description and searching for it works as intended, all fine :)
Let's leave the description in there, your reasoning makes sense to me!

@philippotto philippotto merged commit eb08d0f into master Jul 2, 2018
@normanrz normanrz deleted the dice-sort-datasets branch July 2, 2018 17:09
Development

Successfully merging this pull request may close these issues.

Improve dataset search/sorting
2 participants