
Convergence issues with larger datasets #2

Open
yjml opened this issue Aug 4, 2023 · 2 comments

yjml commented Aug 4, 2023

I ran into a convergence issue trying stable balancing weights (SBW) on a dataset of mine, and it appears to be size related: once the sample count reaches some threshold N (which seems to depend on the treatment distribution?), optweight fails to converge.

ow_test.zip

library(optweight)
library(WeightIt)

testdf = readRDS("ow_test.rds")

optweight(trt ~ age, data=testdf, estimand = "ATE")           # fails
optweight(trt ~ age, data=testdf[1:16765,], estimand = "ATE") # runs
optweight(trt ~ age, data=testdf[1:16766,], estimand = "ATE") # fails
weightit(trt ~ age, data=testdf, method = "ebal", estimand = "ATE")

data("lalonde", package = "cobalt")
set.seed(123)
lalonde_dup = lalonde[sample(nrow(lalonde), 50000, replace = T),]
optweight(treat ~ age + educ + married + nodegree + re74,             # original  
          data = lalonde, tols = .01, estimand = "ATE")               
optweight(treat ~ age + educ + married + nodegree + re74,             # fails
          data = lalonde_dup, tols = .01, estimand = "ATE")           
optweight(treat ~ age + educ + married + nodegree + re74,             # success
          data = lalonde_dup[1:14339,], tols = .01, estimand = "ATE") 
optweight(treat ~ age + educ + married + nodegree + re74,             # fails
          data = lalonde_dup[1:14340,], tols = .01, estimand = "ATE") 
optweight(treat ~ age,                                                # continues to fail despite simplification
          data = lalonde_dup[1:14340,], tols = .01, estimand = "ATE") 
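
I don't know whether this is simply an iteration or tolerance limit in the solver. Assuming optweight() forwards extra arguments on to osqp::osqpSettings() (I haven't confirmed that it does), something like the following might help probe it; max_iter, eps_abs, and eps_rel are standard osqp settings:

# Sketch only: assumes extra arguments are passed through to osqp::osqpSettings()
optweight(treat ~ age + educ + married + nodegree + re74,
          data = lalonde_dup, tols = .01, estimand = "ATE",
          max_iter = 2e5, eps_abs = 1e-6, eps_rel = 1e-6, verbose = TRUE)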
> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] WeightIt_0.14.2 optweight_0.2.5

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7       rstudioapi_0.14  magrittr_2.0.1   cobalt_4.5.1     tidyselect_1.1.1 munsell_0.5.0    colorspace_2.0-3 lattice_0.20-44 
 [9] R6_2.5.1         rlang_1.1.1      fansi_0.5.0      dplyr_1.0.7      tools_4.0.5      grid_4.0.5       gtable_0.3.1     osqp_0.6.0.8    
[17] utf8_1.2.2       cli_3.4.1        ellipsis_0.3.2   tibble_3.1.4     lifecycle_1.0.3  crayon_1.4.1     Matrix_1.3-4     purrr_0.3.4     
[25] ggplot2_3.4.2    vctrs_0.6.3      glue_1.4.2       compiler_4.0.5   pillar_1.6.2     chk_0.9.0        backports_1.4.1  generics_0.1.0  
[33] scales_1.2.1     pkgconfig_2.0.3 
ngreifer commented Aug 4, 2023

Thank you for this bug report. Unfortunately I am mystified. This is definitely a problem with the OSQP solver itself and not something I can fix in optweight. For now, all I can recommend is using a different weighting method. Sorry about that. Do you get a similar problem when using the sbw package?
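
For example, something along these lines (untested on your data; entropy balancing does not go through OSQP, so it may scale better):

library(WeightIt)
library(cobalt)
# Entropy balancing as an alternative to SBW; check covariate balance afterwards
w_ebal <- weightit(treat ~ age + educ + married + nodegree + re74,
                   data = lalonde_dup, method = "ebal", estimand = "ATE")
bal.tab(w_ebal)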

yjml commented Aug 11, 2023

sbw with its default optimizer does run successfully (if slowly) at higher sample counts, but it is very memory-hungry (~24 GB of RAM at 30K samples) and eventually returns the classic Error: cannot allocate vector of size ... once the count grows further.

library(sbw)
testdf$ind = ifelse(testdf$trt=="B",1,0)
# success
sbw(testdf[1:16766,], ind = "ind", bal = list(bal_cov = c("age"), bal_alg = F, bal_tols = 0.01),
    par = list(par_est = "ate"))
# success
sbw(testdf[1:30000,], ind = "ind", bal = list(bal_cov = c("age"), bal_alg = F, bal_tols = 0.01),
    par = list(par_est = "ate"))
# 'Error: cannot allocate vector of size ...'
sbw(testdf, ind = "ind", bal = list(bal_cov = c("age"), bal_alg = F, bal_tols = 0.01),
    par = list(par_est = "ate"))

# success
sbw(lalonde_dup[1:14340,], ind = "treat", bal = list(bal_cov = c("age"), bal_alg = F, bal_tols = 0.01),
    par = list(par_est = "ate"))
# success
sbw(lalonde_dup[1:30000,], ind = "treat", bal = list(bal_cov = c("age"), bal_alg = F, bal_tols = 0.01),
    par = list(par_est = "ate"))
# 'Error: cannot allocate vector of size ...'
sbw(lalonde_dup, ind = "treat", bal = list(bal_cov = c("age"), bal_alg = F, bal_tols = 0.01),
    par = list(par_est = "ate"))
