You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When working with msqrob2, particularly with the msqrobLm function on single-cell data, a significant number of features produce a fitError object. This issue arise from the high presence of missing data. Upon investigation, it became apparent that these non-estimated features were those where one of the reference levels of the factor variable involved in the formula had no observations. This creates a problem due to how msqrob2 handles data subsetting before modeling. Specifically, msqrob2 subsets the design matrix by removing all rows and columns without observations for the current feature. However, it cannot subset out the reference level since it is "buried" within the model formulation. Consequently, the resulting model becomes rank-deficient, rendering it impossible to fit with methods like MASS::rlm.
To understand how other tools handle this, we examined limma and found that it does not manage these special cases. Instead, limma returns all models, even if they are rank-deficient. This means that, for features with no observations in the reference level, the reference level changes during modeling. This shift leads to confusing and potentially misleading interpretations of the associated coefficients.
The Fix
To address the issue of rank deficiency, we adopted a different approach to subset the model matrix. Instead of subsetting columns of the design matrix based on the missingness of observations, we propose subsetting the columns by selecting the first x columns according to the sorting of the columns based on their pivoting rank, where x represents the rank of the model. To determine x and the pivoting rank of the columns, we perform a QR decomposition. This approach allows us to fit features that lack observations in their reference levels.
However, the coefficients of these problematic features may still pose challenges because the interpretation may change for some parameters associated with shifts in their reference levels. To manage this, we introduced a new slot in the params slot of the StatModel object called "referencePresent". This new slot contains information for each parameter, indicating whether its reference level changed during modeling.
This information is then utilized by other functions in the package, such as getContrast and varContrast. These functions now include the option to return only the contrasts involving parameters whose reference levels did not change. This update is also reflected in higher-level functions like hypothesisTest or topFeatures, which can now be used to return only the feature contrasts that do not involve these reference changes. If users still wish to obtain contrasts for these problematic cases, a new column indicating whether the reference has changed will be included in the output table of the results.
The text was updated successfully, but these errors were encountered:
Also pasting a relevant section from Léopold's report:
This [PR] is not perfect as it remains limited; specifically, it does not apply when ridge regression is used. In addition, the solution could be further optimized, as it currently demands significant resources. An initial optimization effort was made, but further refinement is needed to deliver a solution ready for deployment to users.
The Issue
When working with msqrob2, particularly with the msqrobLm function on single-cell data, a significant number of features produce a fitError object. This issue arise from the high presence of missing data. Upon investigation, it became apparent that these non-estimated features were those where one of the reference levels of the factor variable involved in the formula had no observations. This creates a problem due to how msqrob2 handles data subsetting before modeling. Specifically, msqrob2 subsets the design matrix by removing all rows and columns without observations for the current feature. However, it cannot subset out the reference level since it is "buried" within the model formulation. Consequently, the resulting model becomes rank-deficient, rendering it impossible to fit with methods like MASS::rlm.
To understand how other tools handle this, we examined limma and found that it does not manage these special cases. Instead, limma returns all models, even if they are rank-deficient. This means that, for features with no observations in the reference level, the reference level changes during modeling. This shift leads to confusing and potentially misleading interpretations of the associated coefficients.
The Fix
To address the issue of rank deficiency, we adopted a different approach to subset the model matrix. Instead of subsetting columns of the design matrix based on the missingness of observations, we propose subsetting the columns by selecting the first x columns according to the sorting of the columns based on their pivoting rank, where x represents the rank of the model. To determine x and the pivoting rank of the columns, we perform a QR decomposition. This approach allows us to fit features that lack observations in their reference levels.
However, the coefficients of these problematic features may still pose challenges because the interpretation may change for some parameters associated with shifts in their reference levels. To manage this, we introduced a new slot in the params slot of the StatModel object called "referencePresent". This new slot contains information for each parameter, indicating whether its reference level changed during modeling.
This information is then utilized by other functions in the package, such as getContrast and varContrast. These functions now include the option to return only the contrasts involving parameters whose reference levels did not change. This update is also reflected in higher-level functions like hypothesisTest or topFeatures, which can now be used to return only the feature contrasts that do not involve these reference changes. If users still wish to obtain contrasts for these problematic cases, a new column indicating whether the reference has changed will be included in the output table of the results.
The text was updated successfully, but these errors were encountered: