Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added multinomial, ordinal and firth logistic regression #179

Conversation

fqixiang
Copy link
Contributor

@fqixiang fqixiang commented Nov 1, 2022

@fqixiang fqixiang requested a review from Kucharssim November 1, 2022 11:12
@EJWagenmakers
Copy link

Awesome :-)

Copy link
Member

@Kucharssim Kucharssim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine, but has some rough edges. Namely:

  1. AIC/BIC is missing for the new models, is that intentional?
  2. Adding a term to the null model in multinomial model crashes.
  3. Adding a weight to any analysis crashes.
  4. Adding a term to the null model in firth regression crashes (but with a different error).

See attached .jasp file for reproducible examples
glm-bugs.jasp.zip

Otherwise the code looks ok, I am just a little confused about your .getWeightVariable() which does not seem necessary to me (and especially the combination with eval()). What is the reason for doing this, it looks a little over-engineered to me

nullModel <- stats::glm(nf,
family = familyLink,
data = dataset,
weights = get(options$weights))
weights = eval(.getWeightVariable(options$weights)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary? Could you just pass the vector of values from the dataset?

Copy link
Contributor Author

@fqixiang fqixiang Nov 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Simon, thanks for the quick review!

  1. Yes the implementation has some rough edges. I haven't been able to load the module into JASP (see screenshot for error, any idea what happened?), so it wasn't easy for me to test it thoroughly. I'll work on the issues you helped to find out (thx!).

  2. The reason for using .getWeightVariable() is to make the code cleaner. It used to be two if conditions (e.g. evaluating whether options[["weight"]] is equal to ""), so there was quite some repetition of code (especially after adding the three new logistic regression variants).

  3. Passing the vector of values from the dataset would lead to error. The default (no weight is added) is an empty string, which glm does not accept. If a weight variable is used, it has to be added to the glm function as a variable (instead of a string), hence eval(call(get("variable name"))). I tried to simplify the implementations but this seems to be the only one that works.

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for using .getWeightVariable() is to make the code cleaner

Yes that's fine, I was just wondering about the implementation.

Passing the vector of values from the dataset would lead to error. The default (no weight is added) is an empty string, which glm does not accept. If a weight variable is used, it has to be added to the glm function as a variable (instead of a string), hence eval(call(get("variable name"))).

Yes, but how about something like

weights <- if(options[["weights"]] == "") NULL else dataset[[options[["weights"]]]] 
...
nullModel <- stats::glm(..., weights = weights)

That wouldn't work? :)

I haven't been able to load the module into JASP (see screenshot for error, any idea what happened?), so it wasn't easy for me to test it thoroughly.

Hm apart from the usual "check whether you use your personal GitHub PAT" or "click Clear installed modules and packages and try again", or "which JASP version are you using?" not sure, but if nothing helps perhaps it's something related to https://github.com/jasp-stats/INTERNAL-jasp/issues/2151 and needs to be fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weights <- if(options[["weights"]] == "") NULL else dataset[[options[["weights"]]]]
...
nullModel <- stats::glm(..., weights = weights)

Unfortunately, this didn't work. When weights isn't NULL, the glm function would somehow look for a variable called weights and of course, this variable doesn't exist. Tried some variations, same problem.

I think this probably has to do with how R is run behind JASP, because in a normal R session, I can use your suggested approach perfectly fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm okay, i see the issue.

Seems it it due to the following lines in stats::glm

mf <- match.call(expand.dots = FALSE)
m <- match(c("formula", "data", "subset", "weights", "na.action", 
             "etastart", "mustart", "offset"), names(mf), 0L)
mf <- mf[c(1L, m)]
mf$drop.unused.levels <- TRUE
mf[[1L]] <- quote(stats::model.frame)
mf <- eval(mf, parent.frame())

Where the model.frame() fails to find the appropriate object in the parent.frame().
What is kind of weird that stats::lm does the same thing, but supplying weights as a numeric vector works in our linear regression without issues:

fit <- stats::lm(formula, data = dataset, weights = weights, x = TRUE)

@vandenman do you have an idea what could be going wrong here? I guess the workaround by @fqixiang works fine, but it irks me that it does not just work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. These match.call constructions that end with eval(mf, parent.frame()) seem a bit unreliable to me. BAS has a similar problem where an object cannot be found, but only when you pass the formula as an object and not as a literal (see merliseclyde/BAS#56).

Copy link

@merliseclyde merliseclyde Nov 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just fixed the issue in BAS (finally!!!) and the problem/solution is discussed nicely in https://stackoverflow.com/questions/61164404/call-to-weight-in-lm-within-function-doesnt-evaluate-properly/61164660#61164660?newreg=04a2b71c6da04693a5b172a54a4a43b0

The problem is that BAS, lm and glmassume that the weights are in the same environment as the formula

here is a simple fix with lm that is now in BAS on GitHub (not CRAN)

`
data(UScrime, package = "MASS")
UScrime <- UScrime[, 1:5]

mylm = function(object) { 
                  modelform = as.formula(eval(object$call$formula, parent.frame()))
                  environment(modelform) = environment()
                  data = eval(object$call$data)
                  weights = eval(object$call$weights)

  object = lm(formula = modelform,
                      data = data,
                      weights = weights)
 return(object) }

crime.lm1 <- lm(formula = M ~ So + Ed + Po1 + Po2, data = UScrime)  
tmp1 = mylm(crime.lm1)

broken before

form = M ~ So + Ed + Po1 + Po2
crime.lm2 <- lm(formula = form, data = UScrime)  

tmp = mylm(crime.lm2)

test that::expect_equal(coef(tmp), coef(tmp1))

`

The key lines are the
modelform = as.formula(eval(object$call$formula, parent.frame())); environment(modelform) = environment()

in the function to change the environment for the formula, data, and weights to the current environment

@fqixiang fqixiang force-pushed the addMultinomialOrdinalAndFirthLogisticRegression branch from eeb4d56 to a042ab8 Compare November 7, 2022 15:18
@fqixiang
Copy link
Contributor Author

fqixiang commented Nov 7, 2022

The unit tests succeeded for ubuntu but not for windows or macOS. Is this expected?

@fqixiang fqixiang requested a review from Kucharssim November 7, 2022 15:46
@Kucharssim
Copy link
Member

Kucharssim commented Nov 8, 2022

The unit tests succeeded for ubuntu but not for windows or macOS. Is this expected?

Hopefully that will go away once we merge #185

if you rebase, it should work now

@fqixiang fqixiang force-pushed the addMultinomialOrdinalAndFirthLogisticRegression branch from a042ab8 to 5b78934 Compare December 1, 2022 15:59
Copy link
Member

@Kucharssim Kucharssim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! The issues are fixed, just two suggestions and we can merge it

Comment on lines 56 to 57
ff <- .createGLMFormula(options, nullModel = FALSE)
nf <- .createGLMFormula(options, nullModel = TRUE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we do it like this, it should work without the workaround:

Suggested change
ff <- .createGLMFormula(options, nullModel = FALSE)
nf <- .createGLMFormula(options, nullModel = TRUE)
# make sure that the formula get the current environment (https://github.com/jasp-stats/jaspRegression/pull/179#discussion_r1017412764)
ff <- .createGLMFormula(options, nullModel = FALSE)
environment(ff) <- environment()
nf <- .createGLMFormula(options, nullModel = TRUE)
environment(nf) <- environment()
weights <- dataset[[options[["weights"]]]]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. changed it.

weights = eval(.getWeightVariable(options$weights)))
}

if (options$family == "other") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should probably be else if

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if is fine in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a preference for using else if?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for options$family? Could also be an else?
now you're doing

if (options$family != "other") {
  ... # code
} 

if (options$family == "other") {
  ... # code
} 

but if the two code paths are mutually exclusive then this is more clear:

if (options$family != "other") {
  ... # code
} else { # family == "other"
  ... # code
} 

in the original code it's not clear that the two code paths are mutually exclusive (since the first one may change the value of options$family).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, I'll fix that. thx

Copy link
Member

@Kucharssim Kucharssim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now!

The test failures on ubuntu are not related to this PR.

@Kucharssim Kucharssim merged commit 2312c78 into jasp-stats:master Dec 10, 2022
@fqixiang fqixiang deleted the addMultinomialOrdinalAndFirthLogisticRegression branch December 15, 2022 09:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants