added multinomial, ordinal and firth logistic regression #179

fqixiang · 2022-11-01T11:12:26Z

fixed jasp-stats/jasp-issues#202
fixed jasp-stats/jasp-issues#784
fixed jasp-stats/jasp-issues#1345

EJWagenmakers · 2022-11-01T11:25:27Z

Awesome :-)

Kucharssim

Looks fine, but has some rough edges. Namely:

AIC/BIC is missing for the new models, is that intentional?
Adding a term to the null model in multinomial model crashes.
Adding a weight to any analysis crashes.
Adding a term to the null model in firth regression crashes (but with a different error).

See attached .jasp file for reproducible examples
glm-bugs.jasp.zip

Otherwise the code looks ok. I am just a little confused about your .getWeightVariable(), which does not seem necessary to me (especially in combination with eval()). What is the reason for doing this? It looks a little over-engineered to me.

Kucharssim · 2022-11-01T14:46:32Z

R/glmCommonFunctions.R

    nullModel <- stats::glm(nf,
                            family = familyLink,
                            data = dataset,
-                            weights = get(options$weights))
+                            weights = eval(.getWeightVariable(options$weights)))


Why is this necessary? Could you just pass the vector of values from the dataset?

Hi Simon, thanks for the quick review!

Yes the implementation has some rough edges. I haven't been able to load the module into JASP (see screenshot for error, any idea what happened?), so it wasn't easy for me to test it thoroughly. I'll work on the issues you helped to find out (thx!).

The reason for using .getWeightVariable() is to make the code cleaner. It used to be two if conditions (e.g. evaluating whether options[["weight"]] is equal to ""), so there was quite some repetition of code (especially after adding the three new logistic regression variants).

Passing the vector of values from the dataset would lead to error. The default (no weight is added) is an empty string, which glm does not accept. If a weight variable is used, it has to be added to the glm function as a variable (instead of a string), hence eval(call(get("variable name"))). I tried to simplify the implementations but this seems to be the only one that works.

The reason for using .getWeightVariable() is to make the code cleaner

Yes that's fine, I was just wondering about the implementation.

Passing the vector of values from the dataset would lead to error. The default (no weight is added) is an empty string, which glm does not accept. If a weight variable is used, it has to be added to the glm function as a variable (instead of a string), hence eval(call(get("variable name"))).

Yes, but how about something like

    weights <- if(options[["weights"]] == "") NULL else dataset[[options[["weights"]]]]
    ...
    nullModel <- stats::glm(..., weights = weights)

That wouldn't work? :)

I haven't been able to load the module into JASP (see screenshot for error, any idea what happened?), so it wasn't easy for me to test it thoroughly.

Hm, apart from the usual "check whether you use your personal GitHub PAT", "click Clear installed modules and packages and try again", or "which JASP version are you using?", I'm not sure. If nothing helps, perhaps it's something related to https://github.com/jasp-stats/INTERNAL-jasp/issues/2151 and needs to be fixed.

weights <- if(options[["weights"]] == "") NULL else dataset[[options[["weights"]]]]
...
nullModel <- stats::glm(..., weights = weights)

Unfortunately, this didn't work. When weights isn't NULL, the glm function would somehow look for a variable called weights and of course, this variable doesn't exist. Tried some variations, same problem.

I think this probably has to do with how R is run behind JASP, because in a normal R session, I can use your suggested approach perfectly fine.

hmm okay, i see the issue.

Seems it is due to the following lines in stats::glm

    mf <- match.call(expand.dots = FALSE)
    m <- match(c("formula", "data", "subset", "weights", "na.action",
                 "etastart", "mustart", "offset"), names(mf), 0L)
    mf <- mf[c(1L, m)]
    mf$drop.unused.levels <- TRUE
    mf[[1L]] <- quote(stats::model.frame)
    mf <- eval(mf, parent.frame())

where model.frame() fails to find the appropriate object in the parent.frame().
What is kind of weird is that stats::lm does the same thing, but supplying weights as a numeric vector works in our linear regression without issues:

jaspRegression/R/regressionlinear.R

Line 842 in fcb6de3

fit <- stats::lm(formula, data = dataset, weights = weights, x = TRUE)

@vandenman do you have an idea what could be going wrong here? I guess the workaround by @fqixiang works fine, but it irks me that it does not just work.

I'm not sure. These match.call constructions that end with eval(mf, parent.frame()) seem a bit unreliable to me. BAS has a similar problem where an object cannot be found, but only when you pass the formula as an object and not as a literal (see merliseclyde/BAS#56).

Just fixed the issue in BAS (finally!!!) and the problem/solution is discussed nicely in https://stackoverflow.com/questions/61164404/call-to-weight-in-lm-within-function-doesnt-evaluate-properly/61164660#61164660?newreg=04a2b71c6da04693a5b172a54a4a43b0

The problem is that BAS, lm and glm assume that the weights are in the same environment as the formula

here is a simple fix with lm that is now in BAS on GitHub (not CRAN)

    data(UScrime, package = "MASS")
    UScrime <- UScrime[, 1:5]

    mylm = function(object) {
      modelform = as.formula(eval(object$call$formula, parent.frame()))
      environment(modelform) = environment()
      data = eval(object$call$data)
      weights = eval(object$call$weights)
      object = lm(formula = modelform, data = data, weights = weights)
      return(object)
    }

    crime.lm1 <- lm(formula = M ~ So + Ed + Po1 + Po2, data = UScrime)
    tmp1 = mylm(crime.lm1)

    # broken before
    form = M ~ So + Ed + Po1 + Po2
    crime.lm2 <- lm(formula = form, data = UScrime)
    tmp = mylm(crime.lm2)
    testthat::expect_equal(coef(tmp), coef(tmp1))

The key lines are

    modelform = as.formula(eval(object$call$formula, parent.frame()))
    environment(modelform) = environment()

inside the function. They set the formula's environment to the function's current environment, so that the formula, data, and weights are all resolved there.
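As an illustrative sketch only (not code from this PR or from BAS), the same effect can be reproduced with stats::glm. The helper `make_formula()`, the column name `wt`, and the simulated data are hypothetical; `make_formula()` merely stands in for any formula constructor that creates the formula in its own frame. The broken variant is expected to fail because the local weight vector `w` is looked up in the data and then in the formula's environment, not in the calling function.

```r
# Hypothetical helper: the formula it returns carries the helper's own
# environment, not the environment of whoever later calls glm().
make_formula <- function() y ~ x

set.seed(1)
dataset <- data.frame(x = rnorm(50))
dataset$y  <- 1 + 2 * dataset$x + rnorm(50)
dataset$wt <- rep(c(1, 2), 25)   # hypothetical weight column

# Broken: `w` exists only in this function's frame, which model.frame()
# does not search when resolving the `weights` argument.
fit_broken <- function(dataset) {
  f <- make_formula()
  w <- dataset[["wt"]]
  stats::glm(f, family = gaussian(), data = dataset, weights = w)
}

# Fixed: point the formula at the current frame, where `w` lives.
fit_fixed <- function(dataset) {
  f <- make_formula()
  environment(f) <- environment()
  w <- dataset[["wt"]]
  stats::glm(f, family = gaussian(), data = dataset, weights = w)
}

broken    <- tryCatch(fit_broken(dataset), error = function(e) e)
fixed     <- fit_fixed(dataset)
# Reference fit: weights resolved as a column of `data`.
reference <- stats::glm(y ~ x, family = gaussian(), data = dataset, weights = wt)
```

In this sketch `broken` holds an error condition, while `fixed` should give the same coefficients as the reference fit that names the weight column directly.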

fqixiang · 2022-11-07T15:46:02Z

The unit tests succeeded for ubuntu but not for windows or macOS. Is this expected?

Kucharssim · 2022-11-08T13:04:19Z

The unit tests succeeded for ubuntu but not for windows or macOS. Is this expected?

Hopefully that will go away once we merge #185

if you rebase, it should work now

Kucharssim

Looks great! The issues are fixed, just two suggestions and we can merge it

Kucharssim · 2022-12-07T10:32:17Z

R/glmCommonFunctions.R

  ff <- .createGLMFormula(options, nullModel = FALSE)
  nf <- .createGLMFormula(options, nullModel = TRUE)


So if we do it like this, it should work without the workaround:

Suggested change

-  ff <- .createGLMFormula(options, nullModel = FALSE)
-  nf <- .createGLMFormula(options, nullModel = TRUE)
+  # make sure that the formula get the current environment (https://github.com/jasp-stats/jaspRegression/pull/179#discussion_r1017412764)
+  ff <- .createGLMFormula(options, nullModel = FALSE)
+  environment(ff) <- environment()
+  nf <- .createGLMFormula(options, nullModel = TRUE)
+  environment(nf) <- environment()
+  weights <- dataset[[options[["weights"]]]]

thanks. changed it.

Kucharssim · 2022-12-07T10:33:04Z

R/glmCommonFunctions.R

+                            weights = eval(.getWeightVariable(options$weights)))
+  }
+
+  if (options$family == "other") {


this should probably be else if

I think if is fine in this case.

Is there a preference for using else if?

for options$family? Could also be an else?
now you're doing

    if (options$family != "other") {
      ... # code
    }
    if (options$family == "other") {
      ... # code
    }

but if the two code paths are mutually exclusive then this is more clear:

    if (options$family != "other") {
      ... # code
    } else { # family == "other"
      ... # code
    }

in the original code it's not clear that the two code paths are mutually exclusive (since the first one may change the value of options$family).
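A small sketch of the difference, under the assumption (hypothetical, for illustration only) that the first branch reassigns the family. The function names and the `visited` vector are made up and do not appear in the PR.

```r
# Two independent ifs: if the first branch changes `family`,
# the second branch runs as well.
two_ifs <- function(family) {
  visited <- character(0)
  if (family != "other") {
    visited <- c(visited, "standard")
    family <- "other"   # hypothetical reassignment inside the first branch
  }
  if (family == "other") {
    visited <- c(visited, "other")
  }
  visited
}

# if / else: exactly one branch runs, whatever happens inside it.
if_else <- function(family) {
  visited <- character(0)
  if (family != "other") {
    visited <- c(visited, "standard")
    family <- "other"   # same hypothetical reassignment
  } else {
    visited <- c(visited, "other")
  }
  visited
}

two_ifs("binomial")   # visits both branches
if_else("binomial")   # visits only the first branch
```

With if / else the mutual exclusivity is visible in the structure itself, so a reader does not have to check whether the first block touches the condition.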

true, I'll fix that. thx

Kucharssim

Looks good now!

The test failures on ubuntu are not related to this PR.

fqixiang requested a review from Kucharssim November 1, 2022 11:12

Kucharssim requested changes Nov 1, 2022

fqixiang force-pushed the addMultinomialOrdinalAndFirthLogisticRegression branch from eeb4d56 to a042ab8 Compare November 7, 2022 15:18

fqixiang requested a review from Kucharssim November 7, 2022 15:46

fqixiang added 3 commits December 1, 2022 16:58

added multinomial, ordinal and firth logistic regression

406a257

fixed unit test error

e9b4ac2

implemented suggested changes

5b78934

fqixiang force-pushed the addMultinomialOrdinalAndFirthLogisticRegression branch from a042ab8 to 5b78934 Compare December 1, 2022 15:59

Kucharssim approved these changes Dec 7, 2022

fqixiang added 2 commits December 9, 2022 11:48

defined environment for model formulas

3f0d866

fixed an if/else statement

1d6c846

Kucharssim approved these changes Dec 10, 2022

Kucharssim merged commit 2312c78 into jasp-stats:master Dec 10, 2022

fqixiang deleted the addMultinomialOrdinalAndFirthLogisticRegression branch December 15, 2022 09:04

fqixiang commented Nov 1, 2022

EJWagenmakers commented Nov 1, 2022

Kucharssim left a comment

Kucharssim Nov 1, 2022

fqixiang Nov 1, 2022

Kucharssim Nov 2, 2022

fqixiang Nov 7, 2022

Kucharssim Nov 7, 2022

vandenman Nov 8, 2022

merliseclyde Nov 9, 2022

fqixiang commented Nov 7, 2022

Kucharssim commented Nov 8, 2022

Kucharssim left a comment

Kucharssim Dec 7, 2022

fqixiang Dec 9, 2022

Kucharssim Dec 7, 2022

fqixiang Dec 9, 2022

fqixiang Dec 9, 2022

vandenman Dec 9, 2022

fqixiang Dec 9, 2022

Kucharssim left a comment
