handle more than one factor variables (How to simulate missings) #58

Open
rabea-a opened this issue Jul 8, 2020 · 2 comments

rabea-a commented Jul 8, 2020

Hello @imkemayer,
it's me again, sorry! I think your temporary fix for handling factors fails when the data contain more than one factor variable. Here is a minimal example that causes an error:

set.seed(1)
x1 <- as.factor(sample(c("A","B"), 20, replace = TRUE, prob = c(0.75, 0.25)))
x2 <- as.factor(sample(c("A","B"), 20, replace = TRUE, prob = c(0.75, 0.25)))
x3 <- rnorm(20, mean = 5)
x4 <- rnorm(20, mean = -5)

x <- data.frame(x1,x2,x3,x4)
data_NA <- produce_NA(data = x, mechanism = "MAR", perc.missing = 0.2, 
                      idx.incomplete = c(0, 1, 0, 0), idx.covariates = c(1, 0, 1, 0))

I think the case distinction made by the two if statements is unnecessary and should be removed. Instead, insert

orig.data <- data
vars_factor <- colnames(data)[!sapply(data, is.numeric)]
# if (length(vars_factor)==1){
#   levels_factor <- list(gdata::mapLevels(x=data[,vars_factor]))
# }
# if (length(vars_factor)>1){
#   levels_factor <- sapply(data[,vars_factor], FUN = function(x) gdata::mapLevels(x=x))
# }
levels_factor <- list(gdata::mapLevels(x=data[,vars_factor]))

for any length of vars_factor.
Furthermore, lines 466 and 467 then need the matching indexing (the by.patterns==TRUE branch has to be checked as well):

#gdata::mapLevels(x=tmp$data.init[,vars_factor[[i]]]) <- levels_factor[[i]]
#gdata::mapLevels(x=tmp$data.incomp[,vars_factor[[i]]]) <- levels_factor[[i]]
gdata::mapLevels(x=tmp$data.init[,vars_factor[[i]]]) <- levels_factor[[1]][[i]]
gdata::mapLevels(x=tmp$data.incomp[,vars_factor[[i]]]) <- levels_factor[[1]][[i]]

I look forward to hearing what you think.

Best,
Rabea.

rabea-a commented Jul 8, 2020

I am referring to these lines:

# temporary fix to handle factors
# (transform them to numeric and revert the conversion in the end)
orig.data <- data
vars_factor <- colnames(data)[!sapply(data, is.numeric)]
if (length(vars_factor)==1){
  levels_factor <- list(gdata::mapLevels(x=data[,vars_factor]))
}
if (length(vars_factor)>1){
  levels_factor <- sapply(data[,vars_factor], FUN = function(x) gdata::mapLevels(x=x))
}

Sorry, the line references in my previous comment did not come out as I expected.
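Both failure modes can be reproduced without gdata. The sketch below is a base R analogue in which levels() stands in for gdata::mapLevels(); it assumes mapLevels() likewise returns one equal-length result per factor column. sapply() then simplifies the results into a matrix, so levels_factor[[i]] picks out a single element instead of the levels of column i, while data[, vars_factor] with a single name drops to a bare factor rather than a data frame.

```r
# Base R analogue of the pitfall; levels() replaces gdata::mapLevels()
# purely for illustration (assumption: both return one result per column).
x <- data.frame(x1 = factor(c("A", "B", "A")),
                x2 = factor(c("B", "B", "A")),
                x3 = c(1.5, 2, 3))

# Two factors with the same number of levels: sapply() simplifies the
# two length-2 results into a 2 x 2 matrix instead of keeping a list.
s <- sapply(x[c("x1", "x2")], levels)
is.matrix(s)    # TRUE
s[[1]]          # "A", not c("A", "B")

# lapply() never simplifies, so element i is always the levels of column i.
l <- lapply(x[c("x1", "x2")], levels)
l[[1]]          # c("A", "B")

# With a single column name, x[, v] drops to a bare factor, whereas
# list-style indexing x[v] always returns a data frame.
is.data.frame(x[, "x1"])   # FALSE
is.data.frame(x["x1"])     # TRUE
```

Indexing with data[vars_factor] and collecting with lapply() gives the same shape for one factor or several, which is what removes the need for separate branches.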

rabea-a changed the title from "handle more than one factor variables" to "handle more than one factor variables (How to simulate missings)" on Jul 8, 2020

rabea-a commented Jul 10, 2020

Hello,
I have to correct my suggestion, sorry!
Unfortunately, I hadn't noticed that my solution does not work for data with only one factor variable. I would suggest a case distinction in lines 466 and 467:

if (length(vars_factor) > 0){
  for (i in 1:length(vars_factor)){
    tmp$data.init[,vars_factor[[i]]] <- as.factor(tmp$data.init[,vars_factor[[i]]])
    tmp$data.incomp[,vars_factor[[i]]] <- as.factor(tmp$data.incomp[,vars_factor[[i]]])

    if (length(vars_factor) == 1){
      gdata::mapLevels(x=tmp$data.init[,vars_factor[[i]]]) <- levels_factor[[i]]
      gdata::mapLevels(x=tmp$data.incomp[,vars_factor[[i]]]) <- levels_factor[[i]]
    } else {
      gdata::mapLevels(x=tmp$data.init[,vars_factor[[i]]]) <- levels_factor[[1]][[i]]
      gdata::mapLevels(x=tmp$data.incomp[,vars_factor[[i]]]) <- levels_factor[[1]][[i]]
    }
  }
}
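One way to avoid the case distinction entirely is to store the levels in a named list keyed by column name, so the same indexing works for zero, one or many factors. This is a minimal base R sketch, not the package's code: encode_factors() and decode_factors() are hypothetical helpers, gdata is swapped out for plain levels()/as.integer(), and it assumes that only genuine factor columns (is.factor) need converting and that the amputation step only replaces values with NA, so every remaining value is a valid integer code.

```r
# Hypothetical helpers (not part of the package): convert factor columns
# to integer codes and restore them afterwards, without gdata.
encode_factors <- function(data) {
  # Assumption: only columns that are already factors need converting.
  vars_factor <- names(data)[vapply(data, is.factor, logical(1))]
  # data[vars] always returns a data frame and lapply() never simplifies,
  # so levels_factor is a named list with one entry per factor column.
  levels_factor <- lapply(data[vars_factor], levels)
  if (length(vars_factor) > 0) {
    data[vars_factor] <- lapply(data[vars_factor], as.integer)
  }
  list(data = data, levels_factor = levels_factor)
}

decode_factors <- function(data, levels_factor) {
  for (v in names(levels_factor)) {
    # Integer codes index into the stored levels; NA codes stay NA,
    # so simulated missing values are preserved.
    data[[v]] <- factor(levels_factor[[v]][data[[v]]],
                        levels = levels_factor[[v]])
  }
  data
}

# Usage: two factors here, but the same calls work for one or none.
x <- data.frame(x1 = factor(c("A", "B", "A")),
                x2 = factor(c("B", "B", "A")),
                x3 = c(1.5, 2, 3))
enc <- encode_factors(x)
enc$data$x1[2] <- NA   # stands in for the amputation step
dec <- decode_factors(enc$data, enc$levels_factor)
```

Keying by column name rather than position also means the restore step does not depend on how many factors were found.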

Greetings,
Rabea
