Very long running time for survival on a static regime #9

osofr opened this issue May 1, 2014 · 3 comments
osofr commented May 1, 2014

ltmle() takes too much time to run for survival at the end-of-follow-up time point on a static regime (no SuperLearner). A full-data run takes about 80 min and 20 GB of RAM. Most of the time appears to be spent in the ConvertCensoringNodesToBinary(), CleanData() and XMatch() functions, and the running time is approximately linear in N (a 5K subsample takes 8 min). My goal is to eventually run MSMs on survival at each of 17 time points with SuperLearner, which would take too long at current run times. Coding the censoring variables as factors (per the documentation) or as binaries has no effect on performance.

Any advice on what could be causing this slowdown and how to fix it? Unfortunately, I can't share the data, but I would be happy to run any tests or give more details. See below for a detailed description of the problem and some ltmle.R profiling results.

Thanks,
Oleg

Data:

50K observations (N), 17 time points (t), 60 baseline covariates (W), 35 time-dep covariates (L_t), time-dep treatment (A_t), 3 types of censoring (Ct_1,Ct_2,Ct_3), survival outcome (Y_t)

Modeling:

Models Q_t and g_t are specified for each time point (one Q_t model for each LY block); both depend only on baseline and previous-time-point covariates.

Running ltmle:

Running the ltmle() function with a static regime, abar=(1,...,1), and no stratification (setstrat=FALSE).

ltmle_out <- ltmle(data=DataWide_subsamp, Anodes=Anodes, Cnodes=Cnodes, Lnodes=Lnodes_1_tmax, Ynodes=Ynodes, abar=rep(1, 17), estimate.time=FALSE, survivalOutcome=TRUE, stratify=setstrat, iptw.only=FALSE, gform=gform_vec, Qform=Qform_vec)

Profiling ltmle.R by line on a subsample of 5K observations:

(8 min run time)

Rprof("ltmle_run_memuse_line.out", line.profiling=TRUE)
source("./ltmle/R/ltmle.R")
ltmle_out <- ltmle(....)
Rprof(NULL)
summaryRprof("ltmle_run_memuse_line.out", lines = "show")

summaryRprof("ltmle_run_memuse_line.out", lines = "show")
$by.self

line self.time self.pct total.time total.pct
ltmle.R#1525 123.70 26.60 130.76 28.12
ltmle.R#1212 121.32 26.09 121.32 26.09
ltmle.R#1206 83.28 17.91 84.40 18.15
ltmle.R#997 46.66 10.03 46.66 10.03
ltmle.R#893 39.58 8.51 39.58 8.51
ltmle.R#1026 15.74 3.38 15.74 3.38
# Noam Ross proftable function
source("proftable.R")
proftable("ltmle_run_memuse_line.out", lines=40)

ConvertCensoringNodesToBinary > 1#1525 > [<- > [<-.data.frame
CleanData > 1#1212 > [ > [.data.frame
CleanData > 1#1212 > [ > [.data.frame
CleanData > 1#1222 > is.na.strict > 1#1206 > [ > [.data.frame
EstimateG > 1#797 > Estimate > 1#851 > SuppressGivenWarnings > 3#20 > withCallingHandlers > 1#852 > > 1#893 > glm > eval > glm.fit
XMatch > 1#997 > apply
IsDeterministic > 1#925 > XMatch > 1#997 > apply
Estimate > 1#845 > ConvertCensoringNodesToBinary > 1#1525
EstimateG > 1#771 > SetA > 1#1026 > [<- > [<-.data.frame > split > split.default > as.factor > factor > unique > unique.matrix > apply > FUN > paste
CleanData > 1#1212 > [ > [.data.frame > [.factor > NextMethod
CleanData > 1#1212 > is.na > is.na.data.frame > do.call > cbind

@joshuaschwab (Owner) commented:

Hi Oleg,

Sorry it's taking so long to run. I'm not sure how much I can help right now. Even if we speed up ltmle itself (without SuperLearner), once you start calling SuperLearner, I would guess SuperLearner is going to take most of the time. You're going to call SuperLearner at least 5 times per time point (3 C nodes, 1 A node, 1 or more LY nodes), so I would think that 85+ calls to SuperLearner with n=50k and 95 columns will be much slower than all the rest of the ltmle code.

Nonetheless, if you want to try to speed up ltmle, you can just take out CleanData (assuming your data already conforms - it does if you're not seeing the "Note: for internal purposes, all nodes after a censoring event..." message). CleanData takes about 16 mins on my (old) computer. It looks to me like ConvertCensoringNodesToBinary takes less than one second, I don't know why the profiler is saying it's taking a lot of time. XMatch gets called a lot, so I'm not sure there's an easy fix there.
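If it helps, here's a rough way one might verify the censoring part of that condition before dropping CleanData. CheckCensoringConforms is a hypothetical helper, not part of ltmle; it assumes censoring nodes coded 0 = censored (as in the simulation below, whereas ltmle proper uses factor Cnodes with a "censored" level), and it does not check the carrying-forward of Y after death:

```r
# Hypothetical helper: after a subject is censored at some Cnode, every later
# node column should already be NA (the condition CleanData enforces).
# Assumes censoring is coded 0 = censored; ltmle proper uses factor Cnodes
# with a "censored" level.
CheckCensoringConforms <- function(data, Cnodes, all.node.cols) {
  for (col in Cnodes) {
    censored <- !is.na(data[, col]) & data[, col] == 0
    later <- all.node.cols[all.node.cols > col]
    if (length(later) > 0 &&
        any(!is.na(data[censored, later, drop = FALSE]))) {
      return(FALSE)  # some post-censoring value is non-NA
    }
  }
  TRUE
}
```

If this returns TRUE on your data, skipping CleanData should at least be safe with respect to censoring.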

Here's the code I used to try to replicate the speed issues:

set.seed(1)
max.time <- 17
n <- 50000
num.w <- 60
num.l <- 35
W <- matrix(rnorm(n * num.w), nrow=n)
data <- data.frame(W=W)
Y <- rep(0, n)
cens <- rep(F, n)
died <- rep(F, n)
for (t in 1:max.time) {
  L <- matrix(rnorm(n * num.l), nrow=n)
  L[Y==1 | cens, ] <- NA
  C <- matrix(rbinom(n * 3, size=1, prob=0.99), nrow=n)
  C[Y==1 | cens, ] <- NA
  for (i in 1:3) {
    C[cens, i] <- NA
    cens <- cens | (!C[, i] & !is.na(C[, i]))
  }
  A <- rbinom(n, size=1, prob=0.5)
  C[Y==1] <- NA
  A[Y==1 | cens] <- NA
  Y <- as.numeric(rbinom(n, size=1, prob=0.05) | died) #problem when Y is already NA - probably similar problem with C, maybe use ua
  died <- Y==1
  Y[cens] <- NA
  data <- data.frame(data, data.frame(L=L, C=C, A=A, Y=Y))
}

Anodes <- grep("^A", names(data))
Cnodes <- grep("^C", names(data))
Lnodes <- grep("^L", names(data))
Ynodes <- grep("^Y", names(data))
nodes <- ltmle:::CreateNodes(data, Anodes, Cnodes, Lnodes, Ynodes)

data <- ltmle:::ConvertCensoringNodes(data, Cnodes, has.deterministic.functions=F)
print(system.time(temp <- ltmle:::ConvertCensoringNodesToBinary(data, Cnodes)))
   user  system elapsed
  0.696   0.080   0.775
print(system.time(temp <- ltmle:::CleanData(data, nodes, deterministic.Q.function=NULL, survivalOutcome=T, showMessage=T)))
   user  system elapsed
829.209 134.643 963.671

Josh




osofr commented May 2, 2014

Hi Josh,

Thanks for the thoughtful reply. ConvertCensoringNodesToBinary is definitely a big bottleneck on my dataset, so it's clearly something specific to the data I am working with. I will try to simulate this scenario to see if I can replicate the profiler results from the actual data.

A somewhat unrelated note: memoise is applied to a glm object, which stores a lot of unnecessary information (including the entire dataset). The only thing that is needed, if I understand correctly, is the result of predict.glm for a given design matrix, which is just a vector. Wouldn't it be possible to wrap glm and predict into one function that returns the prediction vector and memoise that instead?
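Something like this minimal sketch (hypothetical, not ltmle code; the quasibinomial family here is an assumption about what ltmle fits):

```r
# Fit and predict in a single memoised function, so the cache stores only the
# prediction vector rather than the full glm object (which drags the entire
# dataset along with it).
library(memoise)

FitAndPredict <- memoise(function(formula, data, newdata) {
  fit <- glm(formula, data = data, family = quasibinomial())
  as.vector(predict(fit, newdata = newdata, type = "response"))
})
```

The cache key is the argument values themselves, so repeated calls with an identical (formula, data, newdata) triple return the cached vector without refitting.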

Also, a question: how easy is it to parallelize SuperLearner? Likewise for the ltmle package: in MSM estimation, for example, how easy do you think it would be to parallelize the estimation for each survival time point? I have access to a server with a lot of cores, so parallelizing could give a big performance boost.

Thanks,
Oleg

@joshuaschwab (Owner) commented:

Hi Oleg,

Memoise doesn't apply to ltmle, only to ltmleMSM. But if you're using ltmleMSM, I agree that the memoise section is not well written - it's just a temporary hack. I'm planning on removing memoise entirely in a future release - it shouldn't be needed if I rewrite a few other functions to reuse g.

SuperLearner has some parallelized versions - mcSuperLearner and snowSuperLearner - see ?SuperLearner. I haven't used them, but it looks like you could make a minor change to ltmle:::Estimate to have them called. Parallelizing ltmleMSM would take a little work, but is doable. You'd want to parallelize the final.Ynodes loop in MainCalcs (if using the pooled MSM) or NonpooledMSM (if not). But I would guess that if you get all of the available cores working on SuperLearner, that's going to be 90% of the speed benefit.
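As a stopgap while the package itself isn't parallelized, one could also run a separate ltmle() fit per survival time point with parallel::mclapply. A sketch, under the assumption that a node.list has been built so that node.list[[t]] holds the node indices up to time t (constructing it depends on your column layout):

```r
# Hypothetical sketch, not a patch to ltmle: one independent ltmle() fit per
# survival time point, farmed out across cores.
library(parallel)

RunOneTimePoint <- function(t, node.list, data) {
  nd <- node.list[[t]]  # assumed to hold $Anodes, $Cnodes, $Lnodes, $Ynodes up to time t
  ltmle(data = data, Anodes = nd$Anodes, Cnodes = nd$Cnodes,
        Lnodes = nd$Lnodes, Ynodes = nd$Ynodes,
        abar = rep(1, t), survivalOutcome = TRUE, estimate.time = FALSE)
}

results <- mclapply(1:17, RunOneTimePoint, node.list = node.list,
                    data = DataWide_subsamp, mc.cores = detectCores() - 1)
```

On Linux, mclapply forks, so the large data frame is shared copy-on-write rather than duplicated up front; on Windows a cluster-based approach (parLapply) would be needed instead.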

I haven't used the ltmle package on datasets as large as yours, so I'm glad you're trying it out and identifying things to improve.

thanks,
Josh


