Error in mboot when clustering at two variables #175
Comments
Hi Caroline,
Does the error remain when you use just dist_id as the cluster variable?
Thanks
On Thu, May 18, 2023 at 15:49 cmjoyce ***@***.***> wrote:
Hi there,
I'm using the did package and need to account for clustering at the
district level, which is different from my idname (individuals residing in
these clusters). Based on the existing documentation, I've accounted for
individual and district level clustering. The code and error message are as
follows:
att_gt(yname = "outcome",
       tname = "year",
       gname = "g",
       idname = "id",
       xformla = ~ 1,
       data = df,
       panel = FALSE,
       weightsname = "weight_adj",
       clustervars = c("id", "dist_id"),
       control_group = "notyettreated",
       print_details = TRUE,
       bstrap = TRUE,
       cband = FALSE)
Error in mboot(inffunc, DIDparams = dp, pl = pl, cores = cores) :
can't handle that many cluster variables
I've tried making a vector of these variables and using that as my
clustervars, but that just errors out.
Is there a way to get around this error and account for both clustering
variables?
Thanks very much,
Caroline
--
----------------------------------------------------
*Pedro H. C. Sant'Anna*
*https://psantanna.com <https://psantanna.com>*
|
Hi, thanks for the quick response! There's no error when I use just dist_id as my clustering variable, though I end up with some very large (confusingly so) standard errors for some treatment groups, especially when including individual-level covariates. But if clustering only on district gives correctly calculated standard errors, I will assume the issue is on my end. Caroline |
@cmjoyce, sorry for the delayed response. I am surprised that you got an error with the first version that you sent. I am marking that as a bug, as I think it should work. That being said, by default we already cluster at the unit level (in your case "id"), so clustering on both ends up being redundant. Clustering on dist_id alone is not a fix for the large standard errors, but those are the standard errors that I think you were trying to get from the beginning. |
Yes, I think my clustering on two variables was redundant -- I tweaked some things and got it working. I limited my clustering to one variable to avoid the error message. |
Ok, great! Note to self: I am going to leave this open, as I think this could be confusing for users. Need to think about what the behavior should be if the user includes "id" among the clustering variables. |
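A minimal sketch of the single-cluster call discussed above. It does not use the original poster's data: it assumes the mpdta example panel that ships with did, and the state column is a hypothetical higher-level cluster derived from the county FIPS code purely for illustration. The unit id is left out of clustervars because, as noted above, clustering at the unit level already happens by default.

```r
library(did)

data(mpdta)  # example county-level panel shipped with the did package

# Assumption for illustration only: use the leading digits of the county
# FIPS code as a coarser, time-invariant cluster (the state).
mpdta$state <- floor(mpdta$countyreal / 1000)

out <- att_gt(
  yname = "lemp",
  tname = "year",
  idname = "countyreal",
  gname = "first.treat",
  xformla = ~ 1,
  data = mpdta,
  control_group = "notyettreated",
  clustervars = "state",  # only the higher-level cluster; unit id omitted
  bstrap = TRUE,
  cband = FALSE
)

summary(out)
```

With repeated cross sections (panel = FALSE), as in the original post, the same idea would apply with clustervars = "dist_id", though that case is not exercised by this sketch.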
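Hi,
I am having a similar issue related to this post. I have balanced panel data where I want to cluster at group and time level. I am using the individual id variable in clustervars instead of the group variable as per the documentation. I have 3 time periods (years), 3,000 observations per period and 1,000 per group, which amounts to 9,000 observations in total. Below is my code and error:
csdid_out <- att_gt(yname = "Y2it",
                    tname = "year",
                    gname = "first.treat",
                    idname = "id",
                    est_method = "reg",
                    data = data,
                    panel = TRUE,
                    clustervars = c("id", "year"),
                    control_group = "notyettreated",
                    bstrap = TRUE,
                    cband = FALSE)
Error in mboot(inffunc, DIDparams = dp, pl = pl, cores = cores) :
can't handle time-varying cluster variables
I will appreciate any help on this. |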
Time should not be used as a cluster variable in a DiD procedure with fixed T.
You can't make inference with 3 observations…
…----------------------------------------------------
*Pedro H. C. Sant'Anna*
*https://psantanna.com <https://psantanna.com>*
On Wed, Apr 3, 2024 at 16:37 kdjiffa ***@***.***> wrote:
Hi,
I am having a similar issue related to this post. I have balanced panel
data where I want to cluster at group and time level. I am using the
individual id variable in clustervars instead of the group variable as per
the documentation. I have 3 time periods (years), 3,000 observations per
period and 1,000 per group which amounts to 9,000 observations in total.
Below is my code and error
csdid_out <- att_gt(yname = "Y2it",
                    tname = "year",
                    gname = "first.treat",
                    idname = "id",
                    est_method = "reg",
                    data = data,
                    panel = TRUE,
                    clustervars = c("id", "year"),
                    control_group = "notyettreated",
                    bstrap = TRUE,
                    cband = FALSE)
Error in mboot(inffunc, DIDparams = dp, pl = pl, cores = cores) :
can't handle time-varying cluster variables
I will appreciate any help on this.
|
Thanks for your quick feedback. In fact, what I meant is group*period (intersection) level clustering. What is the best way to cluster at such level? |
Just use the id.
Thanks
…----------------------------------------------------
*Pedro H. C. Sant'Anna*
*https://psantanna.com <https://psantanna.com>*
|
Thanks |
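For the "can't handle time-varying cluster variables" error quoted above, it can help to check which candidate cluster variables change within a unit before calling att_gt. The following is a rough sketch in base R; time_varying_clusters and the toy data frame are hypothetical names made up for illustration and are not part of the did package.

```r
# Hypothetical helper: return the names of candidate cluster variables
# that take more than one value within at least one unit.
time_varying_clusters <- function(data, idname, clustervars) {
  varies <- vapply(clustervars, function(v) {
    # number of distinct values of v observed for each unit
    n_values <- tapply(data[[v]], data[[idname]],
                       function(x) length(unique(x)))
    any(n_values > 1)
  }, logical(1))
  names(varies)[varies]
}

# Toy balanced panel: 4 units, 3 years, district fixed within unit.
toy <- data.frame(
  id = rep(1:4, each = 3),
  year = rep(2001:2003, times = 4),
  dist_id = rep(c(10, 10, 20, 20), each = 3)
)

time_varying_clusters(toy, "id", c("dist_id", "year"))
# "year"
```

Here dist_id is constant within each id and passes the check, while year varies within each id, which is consistent with the advice above not to use time as a cluster variable.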