Implementing outcome plot for panelview function #581

Merged
merged 52 commits into from
Oct 3, 2024
Changes from 40 commits
be2e782
adding option to input the type of panelview
Aug 14, 2024
0f28434
changing the grid style
Aug 14, 2024
892826f
set development mode for mac
Aug 14, 2024
cb004dd
adding customizable figure size
Aug 14, 2024
2fbf0da
update did notebook with example
s3alfisc Aug 14, 2024
f14264a
adding subsamp feature to outcome plot
Aug 15, 2024
f0335f1
adding docstring
Aug 15, 2024
2cccb70
removing comments
Aug 15, 2024
5d88555
add outcome example for docstring
Aug 22, 2024
e4220bb
adding units to plot features, xlim, and ylim
Aug 24, 2024
d8a7ed3
changing the aggregated outcome value
Aug 24, 2024
d7fe68d
adding changes to the did documentation
Aug 24, 2024
cda98a8
adding docstring for xlim, ylim, and units_to_plot
Aug 24, 2024
2fe3ac6
adding several test
Aug 24, 2024
416d5f5
changing the test file
Aug 24, 2024
3159e5f
removing test ipynb
Aug 24, 2024
8b9b9f6
slight changes on the ipynb
Aug 24, 2024
541654e
changing import
Aug 24, 2024
64c94a5
revert back justfile
Aug 24, 2024
d30b060
Merge branch 'py-econometrics:master' into outcome_plot
rafimikail Aug 24, 2024
78b95b5
changing title for outcome plot
Aug 24, 2024
2fc90a2
Merge branch 'outcome_plot' of github.com:rafimikail/pyfixest into ou…
Aug 24, 2024
7cf44c4
re-run the notebook documentation
Aug 24, 2024
c6d67af
rerun the import
Aug 24, 2024
03f0f2d
Merge branch 'master' of github.com:rafimikail/pyfixest into outcome_…
Aug 25, 2024
2686b9f
rerun the did notebook
Aug 25, 2024
63b5f37
check did ipynb
Aug 25, 2024
4f6b98f
replacing local with the origin current did.ipynb
Aug 25, 2024
3b0f8d4
updating did ipynb
Aug 25, 2024
9bd63f7
Update visualize.py
s3alfisc Aug 25, 2024
db811bc
adding panelview to the docs
Aug 25, 2024
060f3c7
Merge branch 'outcome_plot' of github.com:rafimikail/pyfixest into ou…
Aug 25, 2024
596234f
set development mode
Aug 25, 2024
34b255a
fixing visualize.py
Aug 25, 2024
ccd6d20
adding include_groups=False
Aug 25, 2024
a882acf
removing include_groups
Aug 25, 2024
c217b50
rerun the did ipynb
Aug 25, 2024
c53da2a
revert justfile
Aug 25, 2024
eeb165b
Merge branch 'master' of github.com:rafimikail/pyfixest into outcome_…
Aug 26, 2024
e14a576
updating the difference-in-differences.ipynb
Aug 26, 2024
5b6e4d1
master to local
Sep 3, 2024
53529e5
Merge branch 'master' into outcome_plot
s3alfisc Sep 28, 2024
96fd898
drop type arg; functional refactoring; updated docs
s3alfisc Sep 28, 2024
1e7e770
export panelview, did function so that they can be loaded via pf.pane…
s3alfisc Sep 28, 2024
5d9cda4
Update did docs
s3alfisc Sep 28, 2024
e44e24d
tweaks
s3alfisc Sep 28, 2024
dcbd28b
removing type params from the test_visualize
Sep 29, 2024
6bff74a
removing include groups in _prepare_panelview_df_for_outcome_plot
Sep 30, 2024
3ebe621
removing outcome in test visualize
Sep 30, 2024
5a58514
importing the helper functions to test_visualize
Sep 30, 2024
aab3e9d
adding several tests template
Oct 1, 2024
87d12e1
adding full tests for the helper
Oct 2, 2024
1 change: 1 addition & 0 deletions docs/_quarto.yml
@@ -76,6 +76,7 @@ quartodoc:
- report.etable
- report.coefplot
- report.iplot
- did.visualize.panelview
- title: Misc / Utilities
desc: |
PyFixest internals and utilities
1 change: 1 addition & 0 deletions docs/_sidebar.yml
@@ -21,6 +21,7 @@ website:
- reference/report.etable.qmd
- reference/report.coefplot.qmd
- reference/report.iplot.qmd
- reference/did.visualize.panelview.qmd
section: Summarize and Visualize
- contents:
- reference/estimation.demean.qmd
156 changes: 155 additions & 1 deletion docs/difference-in-differences.ipynb

Large diffs are not rendered by default.

165 changes: 146 additions & 19 deletions pyfixest/did/visualize.py
@@ -9,15 +9,21 @@
unit: str,
time: str,
treat: str,
type: Optional[str] = None,
outcome: Optional[str] = None,
collapse_to_cohort: Optional[bool] = False,
subsamp: Optional[int] = None,
sort_by_timing: Optional[bool] = False,
xlab: Optional[str] = None,
ylab: Optional[str] = None,
figsize: Optional[tuple] = (11, 3), # Default plot size
noticks: Optional[bool] = False,
title: Optional[str] = None,
legend: Optional[bool] = False,
ax: Optional[plt.Axes] = None,
xlim: Optional[tuple] = None,
ylim: Optional[tuple] = None,
units_to_plot: Optional[list] = None,
) -> plt.Axes:
"""
Generate a panel view of the treatment variable over time for each unit.
@@ -32,6 +38,10 @@
The column name representing the time identifier.
treat : str
The column name representing the treatment variable.
type : str, optional
Optional type of plot. Currently supported: 'outcome'.
outcome : str, optional
The column name representing the outcome variable. Used when `type` is 'outcome'.
collapse_to_cohort : bool, optional
Whether to collapse units into treatment cohorts.
subsamp : int, optional
@@ -42,6 +52,8 @@
The label for the x-axis. Default is None, in which case default labels are used.
ylab : str, optional
The label for the y-axis. Default is None, in which case default labels are used.
figsize : tuple, optional
The figure size for the outcome plot. Default is (11, 3).
noticks : bool, optional
Whether to hide the axis ticks on the plot. Default is False.
title : str, optional
@@ -52,6 +64,12 @@
ax : matplotlib.pyplot.Axes, optional
The axes on which to draw the plot. Default is None, in which case a new figure
is created.
xlim : tuple, optional
The limits for the x-axis of the plot. Default is None.
ylim : tuple, optional
The limits for the y-axis of the plot. Default is None.
units_to_plot : list, optional
A list of units to include in the plot. If None, all units in the dataset are plotted.

Returns
-------
@@ -62,8 +80,11 @@
```python
import pandas as pd
import numpy as np
from pyfixest.did.visualize import panelview

df_het = pd.read_csv("pyfixest/did/data/df_het.csv")

# Inspect treatment assignment
panelview(
data = df_het,
unit = "unit",
@@ -72,26 +93,132 @@
subsamp = 50,
title = "Treatment Assignment"
)

# Outcome plot
panelview(
data = df_het,
unit = "unit",
time = "year",
type = "outcome",
outcome = "dep_var",
treat = "treat",
subsamp = 50,
title = "Outcome Plot"
)
```
"""
rafimikail marked this conversation as resolved.
treatment_quilt = data.pivot(index=unit, columns=time, values=treat)
treatment_quilt = treatment_quilt.sample(subsamp) if subsamp else treatment_quilt
if collapse_to_cohort:
treatment_quilt = treatment_quilt.drop_duplicates()
if sort_by_timing:
treatment_quilt = treatment_quilt.loc[
treatment_quilt.sum(axis=1).sort_values().index
]
if not ax:
f, ax = plt.subplots()
cax = ax.matshow(treatment_quilt, cmap="viridis", aspect="auto")
f.colorbar(cax) if legend else None
ax.set_xlabel(xlab) if xlab else None
ax.set_ylabel(ylab) if ylab else None
if type == "outcome" and outcome:
Review comment (Member):

Can we raise an informative error message when a user specifies type = "outcome" but does not provide an outcome variable?

I.e. we could do

if type == "outcome" and not outcome: 
   raise ValueError("You specified ... but ...")

We would then also add a test in test_error_warnings.py

if units_to_plot:
Review comment (Member):

For the units_to_plot argument, I would say that we should not allow subsampling and aggregation, right? Because if a user specifies specific units, does it make sense to still aggregate them / to subsample from them? (I think you could maybe argue for the first?). Still I would rather restrict this behavior.

data = data[data[unit].isin(units_to_plot)]
data_pivot = data.pivot(index=unit, columns=time, values=outcome)
if subsamp:
data_pivot = data_pivot.sample(subsamp)
if collapse_to_cohort:

def get_treatment_start(x: pd.DataFrame) -> pd.Timestamp:
Review comment (Member):

Maybe we can be more explicit here? E.g. change to

treat_bool = x[treat]
return x.loc[treat_bool, time].min()
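
To illustrate the explicit form, here is a small sketch on a made-up panel. The column names `unit`, `year`, `treat` and the data are assumptions for illustration. It uses label-based `.loc` with a boolean mask, since a boolean Series combined with a column label requires `.loc` rather than `.iloc`:

```python
import pandas as pd

# Toy panel: unit 1 is treated from 2002 onwards, unit 2 is never treated.
df = pd.DataFrame(
    {
        "unit": [1, 1, 1, 2, 2, 2],
        "year": [2001, 2002, 2003, 2001, 2002, 2003],
        "treat": [False, True, True, False, False, False],
    }
)


def get_treatment_start(x: pd.DataFrame, treat: str = "treat", time: str = "year"):
    """Return the first treated period of a unit (NaN if the unit is never treated)."""
    treat_bool = x[treat].astype(bool)
    return x.loc[treat_bool, time].min()


# Compute the treatment start separately for every unit.
treatment_starts = {
    unit_id: get_treatment_start(group) for unit_id, group in df.groupby("unit")
}
```

Never-treated units end up with a missing treatment start, which matches the `pd.notna` guard used later in the diff.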

return x[x[treat]][time].min()

treatment_starts = (
data.groupby(unit)
.apply(get_treatment_start, include_groups=False)
.reset_index(name="treatment_start")
)
data = data.merge(treatment_starts, on=unit, how="left")
data_agg = (
data.groupby(["treatment_start", time], dropna=False)[outcome]
.mean()
.reset_index()
)
data_agg[treat] = data_agg.apply(
lambda row: row[time] >= row["treatment_start"]
if pd.notna(row["treatment_start"])
else False,
axis=1,
)
data_agg = data_agg.rename(columns={"treatment_start": unit})
data = data_agg.copy()
data_pivot = data_agg.pivot(index=unit, columns=time, values=outcome)
if not ax:
f, ax = plt.subplots(figsize=figsize, dpi=300)
for unit_id in data_pivot.index:
unit_data = data_pivot.loc[unit_id]
treatment_times = data[(data[unit] == unit_id) & (data[treat])][time]

# If the unit never receives treatment, plot the line in grey
if treatment_times.empty:
ax.plot(
unit_data.index,
unit_data.values,
color="#999999",
linewidth=0.5,
alpha=0.5,
)
else:
treatment_start = treatment_times.min()

# Plot the entire line with the initial color (orange), then change to red after treatment
ax.plot(
unit_data.index,
unit_data.values,
color="#FF8343",
linewidth=0.5,
label=f"Unit {unit_id}" if legend else None,
alpha=0.5,
)
ax.plot(
unit_data.index[unit_data.index >= treatment_start],
unit_data.values[unit_data.index >= treatment_start],
color="#ff0000",
linewidth=0.9,
alpha=0.5,
)

ax.set_xlabel(xlab if xlab else time)
ax.set_ylabel(ylab if ylab else outcome)
ax.set_title(
title if title else "Outcome over Time with Treatment Effect",
fontweight="bold",
)
ax.grid(True, color="#e0e0e0", linewidth=0.3, linestyle="-")
if xlim:
ax.set_xlim(xlim)

Codecov / codecov/patch: check warning on line 184 in pyfixest/did/visualize.py. Added line #L184 was not covered by tests.
if ylim:
ax.set_ylim(ylim)

Codecov / codecov/patch: check warning on line 186 in pyfixest/did/visualize.py. Added line #L186 was not covered by tests.
if legend:
custom_lines = [

Codecov / codecov/patch: check warning on line 188 in pyfixest/did/visualize.py. Added line #L188 was not covered by tests.
plt.Line2D([0], [0], color="#999999", lw=1.5),
plt.Line2D([0], [0], color="#FF8343", lw=1.5),
plt.Line2D([0], [0], color="#ff0000", lw=1.5),
]
ax.legend(

Codecov / codecov/patch: check warning on line 193 in pyfixest/did/visualize.py. Added line #L193 was not covered by tests.
custom_lines,
["Control", "Treatment (Pre)", "Treatment (Post)"],
loc="upper center",
bbox_to_anchor=(0.5, -0.15),
ncol=3,
frameon=False,
)
else:
treatment_quilt = data.pivot(index=unit, columns=time, values=treat)
treatment_quilt = (
treatment_quilt.sample(subsamp) if subsamp else treatment_quilt
)
if collapse_to_cohort:
treatment_quilt = treatment_quilt.drop_duplicates()
if sort_by_timing:
treatment_quilt = treatment_quilt.loc[
treatment_quilt.sum(axis=1).sort_values().index
]
if not ax:
f, ax = plt.subplots()
cax = ax.matshow(treatment_quilt, cmap="viridis", aspect="auto")
f.colorbar(cax) if legend else None
ax.set_xlabel(xlab) if xlab else None
ax.set_ylabel(ylab) if ylab else None

if noticks:
ax.set_xticks([])
ax.set_yticks([])
if title:
ax.set_title(title)
if noticks:
ax.set_xticks([])
ax.set_yticks([])
if title:
ax.set_title(title)
return ax
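
The Codecov annotations above flag the `xlim`, `ylim` and `legend` branches as untested. A test covering them might look roughly like the sketch below. `plot_outcome_stub` is a hypothetical stand-in that mirrors only those branches so the example runs without pyfixest, and the plotted data are made up:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D


def plot_outcome_stub(xlim=None, ylim=None, legend=False):
    """Hypothetical stand-in mirroring the xlim / ylim / legend branches."""
    fig, ax = plt.subplots(figsize=(11, 3))
    ax.plot([2000, 2001, 2002], [1.0, 1.5, 3.0], color="#FF8343", linewidth=0.5)
    if xlim:
        ax.set_xlim(xlim)
    if ylim:
        ax.set_ylim(ylim)
    if legend:
        # One proxy line per category, as in the diff above.
        custom_lines = [
            Line2D([0], [0], color=color, lw=1.5)
            for color in ("#999999", "#FF8343", "#ff0000")
        ]
        ax.legend(
            custom_lines,
            ["Control", "Treatment (Pre)", "Treatment (Post)"],
            loc="upper center",
            bbox_to_anchor=(0.5, -0.15),
            ncol=3,
            frameon=False,
        )
    return ax


ax = plot_outcome_stub(xlim=(2000, 2002), ylim=(0, 4), legend=True)
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
plt.close("all")
```

In the real test suite the same assertions would presumably be made on the axes returned by `panelview` with `xlim`, `ylim` and `legend=True` set.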
39 changes: 39 additions & 0 deletions tests/test_visualize.py
@@ -48,6 +48,19 @@ def test_panelview():
assert isinstance(ax, plt.Axes)
plt.close()

# Test with basic functionality for outcome plot
ax = panelview(
data=df_het,
type="outcome",
outcome="dep_var",
unit="unit",
time="year",
treat="treat",
subsamp=50
)
assert isinstance(ax, plt.Axes)
plt.close()

# Test with collapse_to_cohort
ax = panelview(
data=df_het,
@@ -59,6 +72,32 @@ def test_panelview():
assert isinstance(ax, plt.Axes)
plt.close()

# Test with collapse_to_cohort for outcome plot
ax = panelview(
data=df_het,
type="outcome",
outcome="dep_var",
unit="unit",
time="year",
treat="treat",
collapse_to_cohort=True,
)
assert isinstance(ax, plt.Axes)
plt.close()

# Test with units_to_plot for outcome plot
ax = panelview(
data=df_het,
type="outcome",
outcome="dep_var",
unit="unit",
time="year",
treat="treat",
units_to_plot=[1, 2, 3, 4]
)
assert isinstance(ax, plt.Axes)
plt.close()

# Test with sort_by_timing
ax = panelview(
data=df_het,