fixes ggplot histogram to match R's version (#754)
bbeat2782 authored Jul 26, 2023
1 parent 0b7c03a commit 0233eb0
Showing 50 changed files with 27 additions and 10 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -2,7 +2,7 @@

## 0.9.0dev
* [Fix] Fix error that was incorrectly converted into a print message
* [API Change] A shorter name is now assigned as an alias by default
* [Fix] Modified histogram query to ensure histogram binning is done correctly (#751)
* [Fix] Fix bug that caused the `COMMIT` not to work when the SQLAlchemy driver did not support `set_isolation_level`
* [Fix] Fixed vertical color breaks in histograms (#702)

8 changes: 7 additions & 1 deletion doc/api/magic-plot.md
@@ -5,7 +5,7 @@ jupytext:
 extension: .md
 format_name: myst
 format_version: 0.13
-jupytext_version: 1.14.6
+jupytext_version: 1.14.7
 kernelspec:
 display_name: Python 3 (ipykernel)
 language: python
@@ -139,6 +139,12 @@ generate histograms without explicitly removing NULL entries.
 %sqlplot histogram --table penguins.csv --column body_mass_g
 ```

+When plotting a histogram, it divides a range with the number of bins - 1 to calculate a bin size. Then, it applies round half down relative to the bin size and categorizes continuous values into bins to replicate right closed intervals from the ggplot histogram in R.
+
+![body_mass_g](../static/body_mass_g_R.png)
+
++++
+
 ### Number of bins

 ```{code-cell} ipython3
Binary file added doc/static/body_mass_g_R.png
4 changes: 3 additions & 1 deletion src/sql/ggplot/ggplot.py
@@ -70,9 +70,11 @@ def _draw(self, other) -> mpl.figure.Figure:
 ax_ = self.figure.add_subplot(n_rows, n_cols, i + 1)
 facet_key_val = {"key": other.facet, "value": value[0]}
 self.geom.draw(self, ax_, facet_key_val)
+handles, labels = ax_.get_legend_handles_labels()
 ax_.set_title(value[0])
 ax_.tick_params(axis="both", labelsize=7)
-ax_.legend(prop={"size": 10})
+# reverses legend order so alphabetically first goes on top
+ax_.legend(handles[::-1], labels[::-1], prop={"size": 10})
 if other.legend is False:
 plt.legend("", frameon=False)
 self.axs.append(ax_)
23 changes: 16 additions & 7 deletions src/sql/plot.py
@@ -455,7 +455,9 @@ def histogram(
 bottom += values_

 ax.set_title(f"Histogram from {table!r}")
-ax.legend()
+# reverses legend order so alphabetically first goes on top
+handles, labels = ax.get_legend_handles_labels()
+ax.legend(handles[::-1], labels[::-1])
 elif isinstance(column, str):
 bin_, height, bin_size = _histogram(
 table, column, bins, with_=with_, conn=conn, facet=facet
@@ -534,12 +536,13 @@ def _histogram(table, column, bins, with_=None, conn=None, facet=None):
 f"bins are '{bins}'. Please specify a valid number of bins."
 )

+# Use bins - 1 instead of bins and round half down instead of floor
+# to mimic right-closed histogram intervals in R ggplot
 range_ = max_ - min_
-bin_size = range_ / bins
-
+bin_size = range_ / (bins - 1)
 template_ = """
 select
-floor("{{column}}"/{{bin_size}})*{{bin_size}} as bin,
+ceiling("{{column}}"/{{bin_size}} - 0.5)*{{bin_size}} as bin,
 count(*) as count
 from "{{table}}"
 {{filter_query}}
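
To make the arithmetic in the hunk above concrete, here is a minimal sketch in plain Python, assuming the column values are already in memory. The helper names `bin_size_for` and `assign_bin` are hypothetical and do not exist in `src/sql/plot.py`; the commit itself does the binning in SQL through the template.

```python
import math
from collections import Counter


def bin_size_for(min_, max_, bins):
    # Hypothetical helper: divide the range by (bins - 1), as in the diff.
    return (max_ - min_) / (bins - 1)


def assign_bin(value, bin_size):
    # ceil(x - 0.5) rounds x to the nearest integer with ties going down,
    # so each bin is centred on a multiple of bin_size and closed on the right.
    return math.ceil(value / bin_size - 0.5) * bin_size


values = list(range(11))  # 0, 1, ..., 10
size = bin_size_for(min(values), max(values), bins=6)  # 2.0
counts = Counter(assign_bin(v, size) for v in values)
print(sorted(counts.items()))
# [(0.0, 2), (2.0, 2), (4.0, 2), (6.0, 2), (8.0, 2), (10.0, 1)]
```

In this sketch the boundary values show the effect: 1 lands in the bin labelled 0 and 3 lands in the bin labelled 2, so the bin labelled 2 covers (1, 3]. Dividing by `bins - 1` yields six bin labels for `bins=6` because the first and last bins are centred on the minimum and maximum.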
@@ -595,9 +598,14 @@ def _histogram_stacked(
 conn = sql.connection.ConnectionManager.current

 cases = []
+tolerance = bin_size / 1000 # Use to avoid floating point error
 for bin in bins:
-case = f'SUM(CASE WHEN FLOOR({column}/{bin_size})*{bin_size} = {bin} \
-THEN 1 ELSE 0 END) AS "{bin}",'
+# Use round half down instead of floor to mimic
+# right-closed histogram intervals in R ggplot
+case = (
+f"SUM(CASE WHEN ABS(CEILING({column}/{bin_size} - 0.5)*{bin_size} "
+f"- {bin}) <= {tolerance} THEN 1 ELSE 0 END) AS '{bin}',"
+)
 cases.append(case)

 cases = " ".join(cases)
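
The switch from `=` to an `ABS(...) <= tolerance` comparison in the hunk above can be illustrated with a small Python sketch. It assumes the bin label is recomputed as an integer multiple of `bin_size`, which may differ from the stored label by a rounding error. The function name `matches_bin` is hypothetical; only the `bin_size / 1000` tolerance comes from the diff.

```python
def matches_bin(computed, target, bin_size):
    # Same tolerance as the diff: one thousandth of the bin width.
    tolerance = bin_size / 1000
    return abs(computed - target) <= tolerance


bin_size = 0.1
computed = 3 * bin_size  # recomputed bin label, subject to rounding error
target = 0.3             # bin label as originally stored

print(computed == target)                       # False
print(matches_bin(computed, target, bin_size))  # True
```

An exact equality test would drop rows whose recomputed label differs only in the last floating-point digit, whereas the tolerance is far smaller than the spacing between neighbouring bins, so it cannot match the wrong bin.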
@@ -614,7 +622,8 @@ def _histogram_stacked(
 {{cases}}
 FROM "{{table}}"
 {{filter_query}}
-GROUP BY {{category}};
+GROUP BY {{category}}
+ORDER BY {{category}} DESC;
 """
 )
 query = template.render(
Binary file modified src/tests/baseline_images/test_ggplot/facet_wrap_custom_fill.png
Binary file modified src/tests/baseline_images/test_ggplot/facet_wrap_default.png
Binary file modified src/tests/baseline_images/test_ggplot/facet_wrap_nulls_data.png
Binary file modified src/tests/baseline_images/test_ggplot/histogram_custom_color.png
Binary file modified src/tests/baseline_images/test_ggplot/histogram_custom_fill.png
Binary file modified src/tests/baseline_images/test_ggplot/histogram_default.png
Binary file modified src/tests/baseline_images/test_ggplot/histogram_stacked_default.png
Binary file modified src/tests/baseline_images/test_ggplot/histogram_with_default.png
Binary file modified src/tests/baseline_images/test_magic_plot/hist.png
Binary file modified src/tests/baseline_images/test_magic_plot/hist_bin.png
Binary file modified src/tests/baseline_images/test_magic_plot/hist_custom.png
Binary file modified src/tests/baseline_images/test_magic_plot/hist_null.png
Binary file modified src/tests/baseline_images/test_magic_plot/hist_two.png
