Skip to content

Commit

Permalink
use grant's more general way to check for an intercept column
Browse files Browse the repository at this point in the history
  • Loading branch information
svteichman committed Jul 31, 2024
1 parent f948fff commit 67409a9
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions R/multinom_test.R
Original file line number Diff line number Diff line change
Expand Up @@ -30,8 +30,8 @@ covariates in formula must be provided.")
X <- model.matrix(formula, data)
}

# if X has intercept column, remove it
if (sum(X[, 1] == 1) == nrow(X)) {
# if X has intercept column (or a column of repeating values), remove it
if (all(round((X %*% solve(t(X) %*% X) %*% t(X)) %*% rep(1, nrow(X)), 1e-20) == 1)) {
X <- X[, -1, drop = FALSE]
}

Expand Down

2 comments on commit 67409a9

@gthopkins
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@svteichman the if statement identifies if a vector of 1s is in the column space, however we do not know that the vector is in the first column (so X <- X[, -1, drop = FALSE] will not always work). I think we should have an iterated solution: step 1 tries to remove any columns that are an intercept and step 2 stops the code with an error, if that does not resolve the issue:

Also, important note: I mistakenly set digits to 1e-20 in the previous code. It should be a natural number less than 16 (roughly the number of digits stored in double-precision numerical calculations). 12 seems appropriate to me.

# first remove any columns that represent an intercept
if (all(round(X %*% solve(t(X) %*% X) %*% t(X) %*% rep(1, nrow(X)), 12) == 1)) {
  X <- X[, rowMeans(apply(X, 1, function(v)(v == colMeans(X)))) != 1, drop = FALSE]
  # can instead use X <- X[, -1, drop = FALSE] if we only want to try to identify classical intercept (vector of 1s in the first column)
}

# confirm that there is no longer an intercept in the column space
if (all(round(X %*% solve(t(X) %*% X) %*% t(X) %*% rep(1, nrow(X)), 12) == 1)) {
  stop("An intercept is in the column space of X. This may happen if you have a
       column of 1s or a column for every level in a categorical variable, for
       example. Your design matrix cannot contain a vector of 1s in the column
       space when you use raoBust.")
} 

For testing, I constructed the following two examples that elucidate why this two-stage system may be useful:

n <- 20

X1 <- matrix(data = c(rbinom(n = n, size = 12, prob = 0.7),
                      rep(0.3, n),
                      rep(0, n),
                      rnorm(n = n, mean = -5, sd = 2)),
             ncol = 4, byrow = FALSE)

X2 <- matrix(data = c(rep(1, n),
                      rbinom(n = n, size = 12, prob = 0.7),
                      c(rep(1, n / 2), rep(0, n / 2)),
                      c(rep(0, n / 2), rep(1, n / 2)),
                      rnorm(n = n, mean = -5, sd = 2)),
             ncol = 5, byrow = FALSE)

This is all probably excessive, but I figured why not make the code more general!

@svteichman
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, I need to make sure I remove the correct column!

I think that we don't necessarily need the two stage check but I do like the generality to check for different ways that an intercept can be encoded. I'll fix this in another PR.

Please sign in to comment.