You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The reason will be displayed to describe this comment to others. Learn more.
@svteichman the if statement identifies if a vector of 1s is in the column space, however we do not know that the vector is in the first column (so X <- X[, -1, drop = FALSE] will not always work). I think we should have an iterated solution: step 1 tries to remove any columns that are an intercept and step 2 stops the code with an error, if that does not resolve the issue:
Also, important note: I mistakenly set digits to 1e-20 in the previous code. It should be a natural number less than 16 (roughly the number of digits stored in double-precision numerical calculations). 12 seems appropriate to me.
# first remove any columns that represent an intercept
if (all(round(X %*% solve(t(X) %*% X) %*% t(X) %*% rep(1, nrow(X)), 12) == 1)) {
X <- X[, rowMeans(apply(X, 1, function(v)(v == colMeans(X)))) != 1, drop = FALSE]
# can instead use X <- X[, -1, drop = FALSE] if we only want to try to identify classical intercept (vector of 1s in the first column)
}
# confirm that there is no longer an intercept in the column space
if (all(round(X %*% solve(t(X) %*% X) %*% t(X) %*% rep(1, nrow(X)), 12) == 1)) {
stop("An intercept is in the column space of X. This may happen if you have a
column of 1s or a column for every level in a categorical variable, for
example. Your design matrix cannot contain a vector of 1s in the column
space when you use raoBust.")
}
For testing, I constructed the following two examples that elucidate why this two-stage system may be useful:
n <- 20
X1 <- matrix(data = c(rbinom(n = n, size = 12, prob = 0.7),
rep(0.3, n),
rep(0, n),
rnorm(n = n, mean = -5, sd = 2)),
ncol = 4, byrow = FALSE)
X2 <- matrix(data = c(rep(1, n),
rbinom(n = n, size = 12, prob = 0.7),
c(rep(1, n / 2), rep(0, n / 2)),
c(rep(0, n / 2), rep(1, n / 2)),
rnorm(n = n, mean = -5, sd = 2)),
ncol = 5, byrow = FALSE)
This is all probably excessive, but I figured why not make the code more general!
The reason will be displayed to describe this comment to others. Learn more.
good point, I need to make sure I remove the correct column!
I think that we don't necessarily need the two stage check but I do like the generality to check for different ways that an intercept can be encoded. I'll fix this in another PR.
67409a9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@svteichman the
if
statement identifies if a vector of 1s is in the column space, however we do not know that the vector is in the first column (soX <- X[, -1, drop = FALSE]
will not always work). I think we should have an iterated solution: step 1 tries to remove any columns that are an intercept and step 2 stops the code with an error, if that does not resolve the issue:Also, important note: I mistakenly set digits to 1e-20 in the previous code. It should be a natural number less than 16 (roughly the number of digits stored in double-precision numerical calculations). 12 seems appropriate to me.
For testing, I constructed the following two examples that elucidate why this two-stage system may be useful:
This is all probably excessive, but I figured why not make the code more general!
67409a9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, I need to make sure I remove the correct column!
I think that we don't necessarily need the two stage check but I do like the generality to check for different ways that an intercept can be encoded. I'll fix this in another PR.