Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More customizable etable() #306

Closed
s3alfisc opened this issue Feb 11, 2024 · 21 comments
Closed

More customizable etable() #306

s3alfisc opened this issue Feb 11, 2024 · 21 comments
Labels
good first issue Good for newcomers

Comments

@s3alfisc
Copy link
Member

It would be great if etable() would be even more customizable.

I.e. it would be nice to allow users to:

One way to allow this would be to add a custom_statistics argument to etable(), e.g. a pd.Series with a name and length equal to the number of models passed to etable().

@Wenzhi-Ding, would you maybe be willing to / interested in picking this up?

@s3alfisc s3alfisc added the good first issue Good for newcomers label Feb 11, 2024
@Wenzhi-Ding
Copy link
Contributor

Sounds great! I can work on this. I can't guarantee when I can finish, but will try as early as possible.

@s3alfisc
Copy link
Member Author

Sounds great! I can work on this. I can't guarantee when I can finish, but will try as early as possible.

Super cool, thank you! Of course there is absolutely no time pressure =)

@Wenzhi-Ding
Copy link
Contributor

Hi @s3alfisc I am working on this issue. This is my current realization:

etable([fit1, fit2], coef_fmt="b [conf_int_lb, conf_int_ub]", custom_statistics={
    "conf_int_lb": [fit1._conf_int[0], fit2._conf_int[0]],
    "conf_int_ub": [fit1._conf_int[1], fit2._conf_int[1]],
    })

image

Is this the way of input and output in your mind? Not sure I understand correctly.

@matthiaskaeding
Copy link

Thanks for the package!

Something else that would be nice would be functionality to select coefficients, maybe by regex or only by full match.

@Wenzhi-Ding
Copy link
Contributor

Thanks for the package!

Something else that would be nice would be functionality to select coefficients, maybe by regex or only by full match.

Hi @matthiaskaeding , could you elaborate more about this function? Maybe with some pseudo code / output so that I can make sure I understand correctly.

@Wenzhi-Ding
Copy link
Contributor

Thanks for the package!

Something else that would be nice would be functionality to select coefficients, maybe by regex or only by full match.

Do you mean use expressions to set the order of coefficients and maybe omitting some coefficients?

@matthiaskaeding
Copy link

Can add pseudo code later, but I meant omitting coefficients.

My use case was regressing y on treatment plus controls, and I wanted to have an table for the 'treatment' coefficient only. That might be pretty common.

@Wenzhi-Ding
Copy link
Contributor

Can add pseudo code later, but I meant omitting coefficients.

My use case was regressing y on treatment plus controls, and I wanted to have an table for the 'treatment' coefficient only. That might be pretty common.

Thanks! This is useful. I will add it in my next PR.

@s3alfisc
Copy link
Member Author

Hi @Wenzhi-Ding - this looks great! It is exactly what I had in mind, thank you =) I'm looking forward to the PR!

@matthiaskaeding, your feature request makes excellent sense, and I think it would be fairly easy to implement.
This might even be implemented in two lines - it might suffice to simply select the relevant coeffcients from the tidy() data frame created here

Thanks! This is useful. I will add it in my next PR.

Thanks for picking this up as well @Wenzhi-Ding =)

@matthiaskaeding
Copy link

Great, thanks!

For my use, something using a regex like its done in str_sub in the R-package str_sub would be most convenient:

etable(model, coeff_pattern="tr") # Show every coefficient matching the regex 'tr'
etable(model, coeff_pattern="tr", negate=True) # Omit every coefficient matching the regex 'tr'

@Wenzhi-Ding
Copy link
Contributor

Great, thanks!

For my use, something using a regex like its done in str_sub in the R-package str_sub would be most convenient:

etable(model, coeff_pattern="tr") # Show every coefficient matching the regex 'tr' etable(model, coeff_pattern="tr", negate=True) # Omit every coefficient matching the regex 'tr'

This grammar looks nice. I will implement in this way as well as an additional way to explicitly set which variables to be displayed and the order of variables.

@Wenzhi-Ding
Copy link
Contributor

Wenzhi-Ding commented Feb 21, 2024

Hi @matthiaskaeding, I draft a version for what you said. Please see whether this makes sense to you.

Drafted documentation:

coef_pat: list, optional
    The pattern for retaining coefficient names. Default is [], keeping all coefficients.
    You should use regular expressions to select coefficients.
    [
        "age",            # would keep all coefficients containing age
        "^tr",            # would keep all coefficients starting with tr
        "\d$",            # would keep all coefficients ending with number
    ]
    Output will be in the order of the patterns.
coef_pat_negate: bool, optional
    Whether to negate the pattern. Default is False.
    If True, the pattern will be negated, i.e., coefficients not matching the pattern will be kept.

Orginal output

etable([fit1, fit2, fit3], coef_fmt="b (se)\nt [p]")
                           est1            est2               est3
------------  -----------------  --------------  -----------------
depvar                        Y               Y                  Y
------------------------------------------------------------------
X1            -0.950*** (0.067)   0.003 (0.033)  -1.000*** (0.060)
                -14.277 [0.000]   0.100 [0.921]    -16.700 [0.000]
X2            -0.174*** (0.018)  -0.014 (0.011)
                 -9.471 [0.000]  -1.332 [0.183]
f2                                0.003 (0.004)
                                  0.781 [0.435]
Intercept                                         0.919*** (0.079)
                                                    11.622 [0.000]
------------------------------------------------------------------
f1                            x               x                  -
------------------------------------------------------------------
R2                        0.489               -              0.123
S.E. type                by: f1       by: f1+f2                iid
Observations               1994             997               1996
------------------------------------------------------------------

Example 1

etable([fit1, fit2, fit3], coef_fmt="b (se) \n t [p]", coef_pat=["X1", "cep"])
                           est1           est2               est3
------------  -----------------  -------------  -----------------
depvar                        Y              Y                  Y
-----------------------------------------------------------------
X1            -0.950*** (0.067)  0.003 (0.033)  -1.000*** (0.060)
                -14.277 [0.000]  0.100 [0.921]    -16.700 [0.000]
Intercept                                        0.919*** (0.079)
                                                   11.622 [0.000]
-----------------------------------------------------------------
f1                            x              x                  -
-----------------------------------------------------------------
R2                        0.489              -              0.123
S.E. type                by: f1      by: f1+f2                iid
Observations               1994            997               1996
-----------------------------------------------------------------

Example 2

etable([fit1, fit2, fit3], coef_fmt="b (se) \n t [p]", coef_pat=["cep"], coef_pat_negate=True)
                           est1            est2               est3
------------  -----------------  --------------  -----------------
depvar                        Y               Y                  Y
------------------------------------------------------------------
X1            -0.950*** (0.067)   0.003 (0.033)  -1.000*** (0.060)
                -14.277 [0.000]   0.100 [0.921]    -16.700 [0.000]
X2            -0.174*** (0.018)  -0.014 (0.011)
                 -9.471 [0.000]  -1.332 [0.183]
f2                                0.003 (0.004)
                                  0.781 [0.435]
------------------------------------------------------------------
f1                            x               x                  -
------------------------------------------------------------------
R2                        0.489               -              0.123
S.E. type                by: f1       by: f1+f2                iid
Observations               1994             997               1996
------------------------------------------------------------------

s3alfisc pushed a commit that referenced this issue Feb 21, 2024
* dev: add support for custom statistics in etable()

* dev: Fix bug

* fix: test coverage for etable customized statistics.

* fix: Fix bug in coverage test.

* fix: fix bug. wrong logic.

* fix: pass local test.
@s3alfisc
Copy link
Member Author

@Wenzhi-Ding - I just merged #b0503e0. Thanks so much for it! 🎉

@s3alfisc
Copy link
Member Author

And btw, this syntax makes excellent sense to me =)

etable([fit1, fit2, fit3], coef_fmt="b (se) \n t [p]", coef_pat=["X1", "cep"])

Once implemented, I will add a similar function argument to both the coefplot() function and method. One thing to potentially consider is the naming of the function argument - fixest::etable() uses keep and drop arguments to select or drop variables based on regex patterns (see here). For compatibility reasons, there's an argument to rename coef_pat to keep, though the counterargument is that coef_pat in fact describes better what the argument actually does. Anyways, I am more than happy to follow your lead on this @Wenzhi-Ding =)

@Wenzhi-Ding
Copy link
Contributor

And btw, this syntax makes excellent sense to me =)

etable([fit1, fit2, fit3], coef_fmt="b (se) \n t [p]", coef_pat=["X1", "cep"])

Once implemented, I will add a similar function argument to both the coefplot() function and method. One thing to potentially consider is the naming of the function argument - fixest::etable() uses keep and drop arguments to select or drop variables based on regex patterns (see here). For compatibility reasons, there's an argument to rename coef_pat to keep, though the counterargument is that coef_pat in fact describes better what the argument actually does. Anyways, I am more than happy to follow your lead on this @Wenzhi-Ding =)

Thanks for merging the pull request!

I think keep and drop is more close to natural language. How about this following approach?

For example, first keep all coefficient names with numbers by keep=["\d"], and then drop those with "f" by drop=["f"]

etable([fit1, fit2, fit3], coef_fmt="b (se)\nt [p]", keep=['\d'], drop=["f"])
                           est1            est2               est3
------------  -----------------  --------------  -----------------
depvar                        Y               Y                  Y
------------------------------------------------------------------
X1            -0.950*** (0.067)   0.003 (0.033)  -1.000*** (0.060)
                -14.277 [0.000]   0.100 [0.921]    -16.700 [0.000]
X2            -0.174*** (0.018)  -0.014 (0.011)
                 -9.471 [0.000]  -1.332 [0.183]
------------------------------------------------------------------
f1                            x               x                  -
------------------------------------------------------------------
R2                        0.489               -              0.123
S.E. type                by: f1       by: f1+f2                iid
Observations               1994             997               1996
------------------------------------------------------------------

@matthiaskaeding
Copy link

Hi @matthiaskaeding, I draft a version for what you said. Please see whether this makes sense to you.

Drafted documentation:

coef_pat: list, optional
    The pattern for retaining coefficient names. Default is [], keeping all coefficients.
    You should use regular expressions to select coefficients.
    [
        "age",            # would keep all coefficients containing age
        "^tr",            # would keep all coefficients starting with tr
        "\d$",            # would keep all coefficients ending with number
    ]
    Output will be in the order of the patterns.
coef_pat_negate: bool, optional
    Whether to negate the pattern. Default is False.
    If True, the pattern will be negated, i.e., coefficients not matching the pattern will be kept.

Hi, nice, yes makes sense to me!
One suggestion for coef_pat: maybe for convenience you could additionaly allow a single string instead of a list. Because I would assume thats the most common case.

@Wenzhi-Ding
Copy link
Contributor

Hi @matthiaskaeding, I draft a version for what you said. Please see whether this makes sense to you.
Drafted documentation:

coef_pat: list, optional
    The pattern for retaining coefficient names. Default is [], keeping all coefficients.
    You should use regular expressions to select coefficients.
    [
        "age",            # would keep all coefficients containing age
        "^tr",            # would keep all coefficients starting with tr
        "\d$",            # would keep all coefficients ending with number
    ]
    Output will be in the order of the patterns.
coef_pat_negate: bool, optional
    Whether to negate the pattern. Default is False.
    If True, the pattern will be negated, i.e., coefficients not matching the pattern will be kept.

Hi, nice, yes makes sense to me! One suggestion for coef_pat: maybe for convenience you could additionaly allow a single string instead of a list. Because I would assume thats the most common case.

Thanks for your comments! I add support for single string input now. Here is the updated documentation. (I change the coef_pat and coef_pat_negate to keep and drop based on the previous discussion)

keep: str or list of str, optional
    The pattern for retaining coefficient names. You can pass a string (one
    pattern) or a list (multiple patterns). Default is keeping all coefficients.
    You should use regular expressions to select coefficients.
        "age",            # would keep all coefficients containing age
        r"^tr",           # would keep all coefficients starting with tr
        r"\\d$",          # would keep all coefficients ending with number
    Output will be in the order of the patterns.
drop: str or list of str, optional
    The pattern for excluding coefficient names. You can pass a string (one
    pattern) or a list (multiple patterns). Syntax is the same as for `keep`.
    Default is keeping all coefficients. Parameter `keep` and `drop` can be
    used simultaneously.

@s3alfisc
Copy link
Member Author

I have just merged @Wenzhi-Ding's second PR on this and will publish a new version to PyPi soon - I think we can close this PR for now? Or do you have any other suggestions? Thanks for all your efforts @Wenzhi-Ding , and thanks for the feedback @matthiaskaeding!

@s3alfisc
Copy link
Member Author

Version 0.16.0 is now on Pypi =)

@Wenzhi-Ding
Copy link
Contributor

Thanks for merging the code and publishing new version @s3alfisc! And thanks @matthiaskaeding for the great comments!

Please let me know if anything I can contribute. I will also keep close look at issues to see whether there is something I can pick up : )

@s3alfisc
Copy link
Member Author

Please let me know if anything I can contribute. I will also keep close look at issues to see whether there is something I can pick up : )

Hi Wenzhi, I updated #218 with many, many points on how to further improve pyfixest (and I've also send you an email!) =)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

3 participants