Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sort categorical Cartesian axes by value #3864

Merged
merged 13 commits into from
May 17, 2019
Merged

sort categorical Cartesian axes by value #3864

merged 13 commits into from
May 17, 2019

Conversation

antoinerg
Copy link
Contributor

@antoinerg antoinerg commented May 14, 2019

Closes #3606 by introducing the possibility to sort by values using categoryorder.

Most of the work involved writing tests to make sure everything works properly across all traces. Note that scattergl and splom required special handling since they differ from other traces.

Supported traces (:lock: down with tests of course):

  • scatter
  • bar
  • box
  • heatmap
  • histogram
  • histogram2d
  • contour
  • violin
  • funnel
  • waterfall
  • scattergl
  • splom
  • ohlc
  • candlestick
  • histogram2dcontour

Supported aggregations:

  • min
  • max
  • sum
  • mean (unweighted across all values)
  • median (unweighted across all values)

I chose aggregations (min, max and sum) because they are unambiguously defined regardless of the order of operation (the minimum of the traces' minimum is equal to the minimum of all values across all traces). Other operations such as "take the mean" are ambiguous: is it the mean of all values across all traces or the mean of each traces' means (at the moment we assume the former for mean/median). To be clear, we need the ability to specify how to aggregate at each step:

  1. how to aggregate multiple values belonging to a trace
  2. how to aggregate the aggregated value obtained in step 1

This will probably require making a new attribute. Do you have any suggestions for this? Related #3606 (comment)

Coverage:

  • support sorting matching axes
  • support sorting of both x and y axes

TODO:

  • splom: get rid indexOf calls

FUTURE WORK:

cc @nicolaskruchten @plotly/plotly_js

@antoinerg
Copy link
Contributor Author

antoinerg commented May 14, 2019

For reasons unknown, running the tests on CircleCI today resulted in a ridiculously low success rate 😞

@etpinard
Copy link
Contributor

, running the tests on CircleCI today resulted in a ridiculously low success rate

Have you rebased off master since #3859 got merged?

@antoinerg
Copy link
Contributor Author

, running the tests on CircleCI today resulted in a ridiculously low success rate

Have you rebased off master since #3859 got merged?

Yes I did!

@etpinard
Copy link
Contributor

Yes I did!

Cool. Looks like test-jasmine is responsible for most of the failures today. Maybe we should tag a few more of those tests as @flaky.

@etpinard etpinard added this to the v1.48.0 milestone May 14, 2019
src/plots/plots.js Outdated Show resolved Hide resolved
src/plots/plots.js Outdated Show resolved Hide resolved
src/plots/plots.js Outdated Show resolved Hide resolved
src/plots/plots.js Outdated Show resolved Hide resolved
src/traces/bar/calc.js Outdated Show resolved Hide resolved
}, {
"type": "bar",
"x": ["a", "b", "c"],
"y": [10, 20, 30],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I don't understand why the categories show up as [b, c, a]. Shouldn't the y values for the bar trace by considered also?

Copy link
Contributor Author

@antoinerg antoinerg May 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I don't understand why the categories show up as [b, c, a]. Shouldn't the y values for the bar trace by considered also?

At the moment, not if they are on a different axis: only the traces associated with the axis to be sorted are accounted for. In this mock, the bar trace is on different axes namely (x|y)axis2. It ends up inheriting the order of categories of xaxis because it matches it but that's all. I thought this behavior would be useful.

We could account for the values of all traces across all matching axes IF we also support a way to specify traces to ignore as suggested by @nicolaskruchten (see #3606 (comment)) so that we can reproduce the current baseline.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not if they are on a different axis: only the traces associated with the axis to be sorted are accounted for. In this mock, the bar trace is on different axes namely (x|y)axis2. It ends inheriting the order of categories of xaxis because it matches it but that's all. I thought this behavior would be useful.

Thanks for the explanation. That makes perfect sense 👍

@etpinard
Copy link
Contributor

Nicely done @antoinerg ! Test coverage looks impressive.

I would be ok leaving histogram2dcontour support out of this PR if it proves too tedious to add.

What I would like to see though: the sorting logic in https://github.com/plotly/plotly.js/blob/master/src/plots/cartesian/category_order_defaults.js getting moved to Plots.doCalcdata and possibly merged into your sortAxisCategoriesByValue routine. That way all things categoryorder will be handled in the same place, only during calc edits.

@antoinerg
Copy link
Contributor Author

I would be ok leaving histogram2dcontour support out of this PR if it proves too tedious to add.

I ended up adding support for histogram2dcontour in 76683ba !

src/plots/plots.js Outdated Show resolved Hide resolved
'sum': function(values) {return Lib.aggNums(function(a, b) { return a + b;}, null, values);},
'value': function(values) {return Lib.aggNums(function(a, b) { return a + b;}, null, values);},
'mean': function(values) {return Lib.mean(values);},
'median': function(values) {values.sort(); var mid = Math.round((values.length - 1) / 2); return values[mid];}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised our linter didn't complain here. Would you mind rewriting this as

'median': function(values) {
    values.sort(); 
    var mid = Math.round((values.length - 1) / 2); 
    return values[mid];
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's probably an eslint rule for this 🔬

src/lib/stats.js Outdated Show resolved Hide resolved
@antoinerg
Copy link
Contributor Author

As discussed in the original issue #3606 (comment), in the future, we will need to add new attributes to support different ways of sorting categories. @etpinard suggested that we create a new container to group all those attributes. I think it would be a good idea.

Right now, everything is stuffed into categoryorder such as category ascending, sum descending, etc... Instead of enumerating all the possible combinations, those two parameters (example category and ascending) could/should be specified via their own separate attribute. Whenever we split those two, we should also add additional attributes to refine sorting by value as discussed in #3606 (comment) and following comments. We could still support our current categoryorder by mapping them to an equivalent set of attributes in the new container.

Please let me know if you see any problems with this approach and whether the current state of this PR can be merged as is.

@etpinard
Copy link
Contributor

etpinard commented May 17, 2019

My take on #3864 (comment) :

Our category* attributes are already kind of a mess, moving towards a axis.category.* attribute container sounds inevitable. Per #3606 (comment), it looks like we haven't spec'ed out all the potential category attributes yet, so I think just filling in categoryorder is way to go for now. At some point, we can introduce axis.category.* and convert "old" categoryorder and catergoyarray values during cleanData.


All in all, this PR LGTM. Well-earned 💃

@antoinerg antoinerg merged commit dba8f55 into master May 17, 2019
@antoinerg antoinerg deleted the pr-sort-by-value branch May 17, 2019 21:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature something new
Projects
None yet
Development

Successfully merging this pull request may close these issues.

value ascending / value descending
2 participants