Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't add impute for violin plot #5611

Closed
domoritz opened this issue Dec 2, 2019 · 9 comments · Fixed by #5617
Closed

Don't add impute for violin plot #5611

domoritz opened this issue Dec 2, 2019 · 9 comments · Fixed by #5617
Assignees
Labels
Milestone

Comments

@domoritz
Copy link
Member

domoritz commented Dec 2, 2019

{
  "data": {"url": "data/iris.json"},
  "transform": [
    {
      "fold": ["sepalLength", "sepalWidth",
      "petalLength", "petalWidth"],
      "as": ["feature", "value"]
    },
    {
      "density": "value",
      "extent": [0, 8],
      "groupby": ["feature"]
    }
  ],
  "mark": {"type": "area", "orient": "horizontal"},
  "encoding": {
    "column": {"type": "nominal", "field": "feature"},
    "x": {"type": "quantitative", "field": "density", "stack": "center"},
    "y": {"type": "quantitative", "field": "value"}
  },
  "width": 60
}

@jheer said: I took a closer look and the culprit is not a sorting issue, but rather the auto-magical inclusion of an impute transform that has no business being there. By default, Vega performs adaptive sampling to determine which points along the density curve to include. As this can result in different sample points for the different areas, their domain values should not be used together to perform imputation. @domoritz, @kanitw I think this needs to be fixed prior to a v4 release.

@domoritz domoritz modified the milestones: 4.0? (Maybe in 4.0), 4.x, 4.0 Dec 2, 2019
@domoritz
Copy link
Member Author

domoritz commented Dec 3, 2019

In the future, we will provide alignment to create violin plots. For now, we won't change anything about the spec above (I also don't know how what we would change) but we should allow disabling of imputation like this:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {"url": "data/iris.json"},
  "transform": [
    {
      "fold": ["sepalLength", "sepalWidth", "petalLength", "petalWidth"],
      "as": ["feature", "value"]
    },
    {"density": "value", "extent": [0, 8], "groupby": ["feature"]}
  ],
  "mark": {"type": "area", "orient": "horizontal"},
  "encoding": {
    "column": {"type": "quantitative", "field": "feature"},
    "x": {
      "type": "quantitative",
      "field": "density",
      "stack": "center",
      "impute": null
    },
    "y": {"type": "quantitative", "field": "value"}
  },
  "width": 60
}

@domoritz domoritz self-assigned this Dec 3, 2019
@jheer
Copy link
Member

jheer commented Dec 3, 2019

Not clear to me why imputation would be opt out rather than opt in. Also, if the same logic also applies to line marks we have the same problem for regression lines. In general it is simply not correct to assume that different groups within a groupby should have identical x/y domain values.

@domoritz
Copy link
Member Author

domoritz commented Dec 3, 2019

I'm looking into why we initially decided to add imputation by default. Maybe we can disable it.

@domoritz
Copy link
Member Author

domoritz commented Dec 3, 2019

We are currently only adding imputation when stacking path marks.

@domoritz
Copy link
Member Author

domoritz commented Dec 3, 2019

Here is what happens without imputation by default for stacked path marks:

image

And with imputation

image

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": { "url": "data/population.json"},
  "transform": [
    {"filter": "datum.year == 2000"},
    {"filter": "datum.age>50 || datum.sex == 2"},
    {"calculate": "datum.sex == 2 ? 'Female' : 'Male'", "as": "gender"}
  ],
  "mark": {"type": "area", "line": true},
  "encoding": {
    "y": {
      "aggregate": "sum", "field": "people", "type": "quantitative"
    },
    "x": {"field": "age", "type": "ordinal"},
    "color": {
      "field": "gender", "type": "nominal",
      "scale": {"range": ["#675193", "#ca8861"]}
    },
    "opacity": {"value": 0.7}
  }
}

Can you say more about why imputation should be opt-in for stacked path marks?

@jheer
Copy link
Member

jheer commented Dec 3, 2019

Ah that makes sense. Do we have anyway of knowing at compile time that a stack only has one entry, as in this violin case? Or, can we somehow use Vega’s xc channel instead of a stack transform?

@domoritz
Copy link
Member Author

domoritz commented Dec 3, 2019

Or, can we somehow use Vega’s xc channel instead of a stack transform?

I think that's the right thing to do in general but I'd like to defer this feature to 4.1. For now, we should support disabling imputation.

@domoritz
Copy link
Member Author

domoritz commented Dec 3, 2019

I have a fix in #5617 for now (which I think we should have either way).

We could know that there is only a single mark (if we don't encode color, opacity, detail, etc) but this would require more modifications that I want to do right now and the right solution is to center marks without stacking. We need to think a bit more about a design for that.

@domoritz
Copy link
Member Author

New spec with penguins: Open the Chart in the Vega Editor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants