[BUG] marginal distribution with rake #80
Comments
Hi @EmanueleCeglia, thanks.
Hi @talgalili, I didn't know how to do that. Thanks :)
@talgalili Hi, sorry to bother you.
Hi @EmanueleCeglia, thanks!
Hi @talgalili
Hi @talgalili and @EmanueleCeglia,
Thanks for this!
Could you please propose a PR for me to review?
On Tue, 21 May 2024, 11:41, Roisin wrote:
Hi @talgalili and @EmanueleCeglia,
We ran into the same issue recently and forked the repo with a fix; see here: Dirguis/ipfn@master...nestauk:ipfn:master
It seems that this error occurs when using rake with a pandas DataFrame in which a particular feature category has only one instance in the sample DataFrame.
If a category has only 1 row, it gets converted into a numpy array when you .loc for that category. I think the error has something to do with this .loc step going wrong on the numpy array, because of some kind of recursion (?).
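The pandas behaviour behind this can be reproduced on its own: when an index contains duplicated labels, `.loc[label]` returns a Series if the label matches several rows but a bare scalar if it matches only one, whereas `.loc[[label]]` (a list of labels) always returns a Series. This is a minimal sketch with hypothetical data; that this is exactly the code path failing inside ipfn is an assumption, not something confirmed in this thread.

```python
import pandas as pd


def loc_results():
    """Show how .loc's return type depends on how many rows a label matches."""
    # Hypothetical sample: "DE1" appears twice, "IT1" only once.
    s = pd.Series([1.0, 2.0, 3.0], index=["DE1", "DE1", "IT1"])
    multi = s.loc["DE1"]   # label matches 2 rows -> Series of length 2
    single = s.loc["IT1"]  # label matches 1 row -> bare scalar, not a Series
    safe = s.loc[["IT1"]]  # list of labels -> always a Series (here length 1)
    return multi, single, safe


if __name__ == "__main__":
    multi, single, safe = loc_results()
    print(type(multi).__name__, type(single).__name__, type(safe).__name__)
```

Code that assumes `.loc[label]` always yields a Series (for example, iterating over it or calling Series methods on it) will therefore behave differently for single-observation categories.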
Oh, I now see that this is a bug in ipfn (not in balance). I think it's possible to work around it in balance with a monkey patch (until ipfn fixes the issue), as was done here: balance/balance/weighting_methods/ipw.py Line 32 in cf22b9f
@crispy-wonton, do you want to try a PR that adds this hack to balance? (Or do you think it's easier to redirect the installation to just use your repo? WDYT?)
Hi @talgalili @crispy-wonton, thanks for your feedback. Now the only thing I still have to explore is why some categories are grouped together, so that in the end they are not balanced.
Thank you for the update!
Your checks leave me confused. I don't understand why using both solutions is the only thing that works.
Do you have any guesses?
On Tue, 21 May 2024, 15:41, Emanuele Ceglia wrote:
Hi @talgalili @crispy-wonton, thanks for your feedback.
I tried these combinations:
1: Remove categories that have only one observation (and their margins) -> the usual error.
2: Update the ipfn.py file with the recommended changes (keeping all categories) -> the usual error.
3: Update the ipfn.py file and remove categories that have only one observation (and their margins) -> works.
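The first combination (dropping categories with a single observation together with their margins) could look like the sketch below. It assumes the margins are a pandas Series of target totals indexed by category label; the function name is hypothetical.

```python
import pandas as pd


def drop_singleton_categories(df, column, margins):
    """Drop rows whose `column` value occurs only once, plus the matching margins.

    `margins` is assumed to be a pd.Series of target totals indexed by category.
    """
    counts = df[column].value_counts()
    keep = counts[counts > 1].index  # categories with at least 2 observations
    filtered_df = df[df[column].isin(keep)]
    filtered_margins = margins[margins.index.isin(keep)]
    return filtered_df, filtered_margins


if __name__ == "__main__":
    df = pd.DataFrame({"ctrysize": ["DE1", "DE1", "IT1", "FR4", "FR4"]})
    margins = pd.Series({"DE1": 100.0, "FR4": 50.0, "IT1": 10.0})
    print(drop_singleton_categories(df, "ctrysize", margins))
```

When raking on several variables (here ctrysize and ctrysect), dropping rows for one variable can leave new single-observation categories in another, so the filter may need to be repeated until nothing changes.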
Now the only thing I still have to explore is why some categories are grouped together, so that in the end they are not balanced.
INFO (2024-05-21 16:30:13,119) [rake/rake (line 154)]: Final covariates and levels that will be used in raking: {'ctrysize': ['_lumped_other', 'DE4', 'DE3', 'DE2', 'FR4', 'DE1', 'IT1'], 'ctrysect': ['_lumped_other', 'ESC', 'DEB', 'FRC', 'ITC', 'DEC', 'DED']}.
[Attached image: image.png, https://github.com/facebookresearch/balance/assets/99983605/27680136-5a28-456f-a79b-9912fdecb8f0]
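The `_lumped_other` level in this log suggests that infrequent levels were merged into one catch-all level before raking, so those levels cannot be balanced individually. The sketch below illustrates that general idea with an assumed 5% share threshold; it is a hypothetical re-implementation for illustration only, not balance's code, and balance's actual rule and defaults may differ.

```python
import pandas as pd


def lump_rare_levels(s, prop=0.05, other="_lumped_other"):
    """Replace levels whose share of rows is below `prop` with `other`.

    Illustrative only: the threshold and the rule are assumptions.
    """
    shares = s.value_counts(normalize=True)
    rare = shares[shares < prop].index
    # Keep common levels unchanged; replace rare ones with the catch-all label.
    return s.where(~s.isin(rare), other)


if __name__ == "__main__":
    s = pd.Series(["DE1"] * 15 + ["IT1"] * 14 + ["ESC"])
    print(lump_rare_levels(s).value_counts())
```

Under a rule like this, any category whose sample share falls below the threshold loses its own margin, which would match the symptom of some categories ending up unbalanced.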
Hi @talgalili, here are a few updates: the library no longer gives me any error, even when I keep the categories that have only one observation. To avoid _lumped_other (categories grouped together into a generic one), I also changed other parameters inside the library:
I still have a problem: I need to balance two variables in my dataset, ctrysize and ctrysect, but after calibration only the first one is correctly balanced by the final weights.
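To see which margins the final weights actually reproduce, one can compare the weighted total of each category with its target, variable by variable. A minimal sketch, assuming unit-level weights in a Series aligned with `df` and targets as Series indexed by category (the function name is hypothetical):

```python
import pandas as pd


def margin_gaps(df, weights, column, targets):
    """Return weighted total minus target for each category of `column`."""
    achieved = weights.groupby(df[column]).sum()
    # Categories with a target but no sample rows get an achieved total of 0.
    return achieved.reindex(targets.index, fill_value=0.0) - targets


if __name__ == "__main__":
    df = pd.DataFrame({"ctrysize": ["DE1", "DE1", "IT1", "IT1"],
                       "ctrysect": ["DEB", "DEC", "DEB", "DEC"]})
    weights = pd.Series([30.0, 30.0, 20.0, 20.0])  # hypothetical final weights
    print(margin_gaps(df, weights, "ctrysize", pd.Series({"DE1": 60.0, "IT1": 40.0})))
    print(margin_gaps(df, weights, "ctrysect", pd.Series({"DEB": 70.0, "DEC": 30.0})))
```

Near-zero gaps for ctrysize alongside large gaps for ctrysect would confirm the symptom described above. It is also worth checking that both sets of targets have the same grand total (`targets.sum()`), since weights cannot match two sets of margins whose totals disagree.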
While attempting to calibrate the margins of a sample derived from a survey (the df DataFrame), I encounter the error displayed at the end of the code flow.
The margins used for calibration are real totals of country x size and country x sector, in the same order as obtained through the command sorted(set(df['ctrysize/sect'])).
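For reference, raking (iterative proportional fitting) on unit-level weights cycles through the variables and rescales the weights so that each variable's weighted totals match its targets in turn. The minimal sketch below is not balance's or ipfn's implementation, and all names are hypothetical. It also shows one way to attach positionally ordered totals (as in the `sorted(set(...))` order above) to their category labels, since a list of totals in the wrong order would silently calibrate to the wrong categories.

```python
import pandas as pd


def margins_from_sorted(df, column, totals):
    """Label a list of totals with the categories in sorted(set(...)) order."""
    labels = sorted(set(df[column]))
    if len(labels) != len(totals):
        raise ValueError(f"{column}: {len(labels)} categories but {len(totals)} totals")
    return pd.Series(totals, index=labels, dtype=float)


def rake_weights(df, margins, max_iter=100, tol=1e-8):
    """Minimal raking: cycle through variables, rescaling weights to each margin.

    `margins` maps column name -> pd.Series of target totals indexed by category.
    Assumes every category present in the sample has a target.
    """
    w = pd.Series(1.0, index=df.index)
    for _ in range(max_iter):
        max_change = 0.0
        for column, target in margins.items():
            current = w.groupby(df[column]).sum()
            # One adjustment factor per category, broadcast back to each row.
            factor = (target / current).reindex(df[column]).to_numpy()
            new_w = w * factor
            max_change = max(max_change, float((new_w - w).abs().max()))
            w = new_w
        if max_change < tol:  # stop once a full pass barely changes the weights
            break
    return w


if __name__ == "__main__":
    df = pd.DataFrame({"ctrysize": ["DE1", "DE1", "IT1", "IT1"],
                       "ctrysect": ["DEB", "DEC", "DEB", "DEC"]})
    margins = {"ctrysize": margins_from_sorted(df, "ctrysize", [60.0, 40.0]),
               "ctrysect": margins_from_sorted(df, "ctrysect", [70.0, 30.0])}
    print(rake_weights(df, margins))
```

Because each pass ends by fitting the last variable exactly, a variable that stays off target after convergence usually points to inconsistent targets (e.g. different grand totals) or to categories that were lumped or dropped before raking.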