Support for data weights #81

Open
TonisOrmisson opened this issue Sep 3, 2024 · 5 comments
Labels
enhancement ✨ New feature or request help wanted ❤️ Extra attention is needed
Milestone

Comments

@TonisOrmisson

Describe the new feature
An option to specify a weight column so that all calculations use data weights.

I could not find such an option in the source code, but I found an existing question here:
https://stackoverflow.com/questions/71224873/how-to-use-weight-with-the-package-crsosstable-for-r

@TonisOrmisson TonisOrmisson added the enhancement ✨ New feature or request label Sep 3, 2024
@TonisOrmisson
Author

I am very new to R, so I am not in a position to make a PR for this yet. In essence, using weights would mean that instead of counting rows, the values of the weight variable for the respective rows are summed to get the "weighted" results.
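
A minimal base R sketch of that idea, on made-up data (the data frame `d` and its values are hypothetical, and this is not how crosstable works internally): the unweighted table counts rows per cell, while the weighted one sums `w` per cell.

```r
# Toy data: each row stands in for `w` observations (hypothetical values)
d <- data.frame(
  am = c("auto", "auto", "manual", "manual"),
  vs = c("straight", "vshaped", "straight", "straight"),
  w  = c(2, 1, 1.5, 0.5)
)

# Unweighted: count the rows in each am x vs cell
unweighted <- table(d$am, d$vs)

# Weighted: sum the weights in each cell instead of counting rows
weighted <- xtabs(w ~ am + vs, data = d)

# Weighted column percentages follow from the weighted counts
weighted_col_pct <- prop.table(weighted, margin = 2)
```

Here the two "manual"/"straight" rows count as 2 in both tables, but only because their weights happen to sum to 2; the "auto"/"straight" cell goes from 1 row to a weighted count of 2.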

@DanChaltiel
Owner

DanChaltiel commented Sep 3, 2024

Hi Tonis,

At the moment, I'm not sure how to support weights in crosstable.

All variables are described using the funs argument which should be a list of summary functions (see the doc).
Summary functions as currently implemented take only one argument - the current column being described - and cannot access other columns.
In this context, I don't know how to pass the weight argument to any function, including for instance weighted.mean().

You can work around this by using a fixed reference to the weights, but it won't work with by=, because each group is then shorter than the full weight vector.
Example:

library(tidyverse)
library(crosstable)

a = mtcars2 %>% 
  mutate(w = ifelse(am=="manual", 1, 2))

# works: without by=, a$w has the same length as each column
a %>% 
  crosstable(where(is.numeric), 
             funs=c(mean=~mean(.x), weighted.mean = ~weighted.mean(.x, w=a$w)))

# doesn't work: with by=, each group is shorter than a$w
a %>% 
  crosstable(where(is.numeric), by=vs, 
             funs=c(mean=~mean(.x), weighted.mean = ~weighted.mean(.x, w=a$w)))

If you (or anyone) have an idea of how I could implement this, I'd gladly add the feature.
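
For reference, here is a rough base R sketch of what the grouped case needs, computed outside of crosstable. It assumes the built-in `mtcars` as a stand-in for `mtcars2` and describes only `mpg`; the point is that the weights have to be subset together with the column, which a fixed reference to `a$w` cannot do.

```r
# Built-in mtcars stands in for mtcars2 (assumption); am == 1 means manual
a <- transform(mtcars, w = ifelse(am == 1, 1, 2))

# Split the rows by vs so that each group keeps its own slice of the weights,
# then compute the weighted mean of mpg within each group
groups <- split(a, a$vs)
wmean_by_vs <- sapply(groups, function(g) weighted.mean(g$mpg, w = g$w))

# The fixed reference a$w fails inside a group because the lengths differ
err <- tryCatch(
  weighted.mean(groups[["0"]]$mpg, w = a$w),
  error = function(e) conditionMessage(e)
)
```

`wmean_by_vs` holds one weighted mean per level of `vs`, and `err` holds the message from the length mismatch.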

EDIT

I understood your question as being about numerical variables.

For categorical variables, you could use tidyr::uncount() to duplicate rows depending on a weighting variable:

a %>% 
  crosstable(am)
a %>% 
  uncount(w) %>% 
  crosstable(am)

Obviously, that will only work if your weighting variable contains integers.
To be honest, I have even less of a clue about how to implement that feature directly in crosstable.
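
As a small check of the row-duplication idea outside of crosstable, the base R sketch below (toy data, hypothetical values) duplicates rows with `rep()`, which mimics what `uncount()` does, and compares the result with simply summing the weights. The two agree for integer weights, and only the summing approach extends to fractional weights.

```r
# Toy data with integer weights (hypothetical values)
d <- data.frame(
  am = c("auto", "manual", "manual", "auto"),
  w  = c(2, 1, 3, 1)
)

# Row duplication, written with base R's rep(): repeat row i exactly w[i] times
expanded <- d[rep(seq_len(nrow(d)), times = d$w), ]
counts_expanded <- table(expanded$am)

# Summing the weights gives the same counts without duplicating rows
counts_summed <- xtabs(w ~ am, data = d)

# Summing also accepts fractional weights, where duplication is not defined
d$w_frac <- d$w / 2
counts_frac <- xtabs(w_frac ~ am, data = d)
```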

If you know of a package that does that, please let me know.

@TonisOrmisson
Author

Thanks for looking into it!

I am testing several crosstab packages to see which one is most suitable. Here is one that has weights implemented:

https://github.com/gdemin/expss

@DanChaltiel
Owner

On GitHub, it's always a good idea to accompany a statement with code 😉
Please show code that does what you want using expss. Using the reprex package is also highly recommended.

@TonisOrmisson
Author

I have not tried expss with weights yet; it's just what they state on their package page and what I noticed there. I have no idea yet whether this actually works, sorry :)

@DanChaltiel DanChaltiel added the help wanted ❤️ Extra attention is needed label Sep 29, 2024
@DanChaltiel DanChaltiel added this to the Nice to have milestone Sep 29, 2024