Because omics datasets often differ in scale, a model built on one training dataset tends to predict poorly on a validation dataset. The usual bioinformatics approach is to re-normalise both the training and the validation data, but this step may not be possible because of ethics constraints. CPOP avoids re-normalising additional data by using log-ratio features, which also enables prediction for a single omics sample.
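To illustrate why log-ratio features remove the need for re-normalisation, here is a minimal sketch in base R. It assumes the scale difference between datasets acts as a sample-specific shift on the log scale, which is a simplification. The helper `pairwise_log_ratios()` and the simulated data are hypothetical and written for this example only; the CPOP package's own implementation may differ.

```r
# Hypothetical helper (not the CPOP API): all pairwise differences of
# log-scale features, i.e. log-ratios of the original expression values.
pairwise_log_ratios <- function(log_x) {
  pairs <- utils::combn(ncol(log_x), 2)  # 2 x choose(p, 2) matrix of column indices
  z <- log_x[, pairs[1, ], drop = FALSE] - log_x[, pairs[2, ], drop = FALSE]
  colnames(z) <- paste(colnames(log_x)[pairs[1, ]],
                       colnames(log_x)[pairs[2, ]], sep = "--")
  z
}

# Simulated log-expression matrix: 3 samples (rows) by 4 genes (columns)
set.seed(1)
log_x <- matrix(rnorm(12), nrow = 3,
                dimnames = list(paste0("sample", 1:3), paste0("gene", 1:4)))

# Assumed platform effect: each sample is shifted by its own constant on the
# log scale (the vector is recycled down the rows, so row i gets shift[i])
shift <- c(2, -1, 0.5)
log_x_shifted <- log_x + shift

z_original <- pairwise_log_ratios(log_x)
z_shifted  <- pairwise_log_ratios(log_x_shifted)

# The sample-specific shift cancels in every gene-gene difference
print(all.equal(z_original, z_shifted))
```

Because each feature depends only on the values within one sample, the same features can be computed for a single new sample without reference to any other data.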
The novelty of the CPOP procedure lies in its ability to construct a model that is transferable across gene expression platforms and to prospective experiments. Such a transferable model can make predictions on independent validation data with an accuracy similar to that of a re-substituted model. The CPOP procedure is also flexible enough to accommodate the most common types of clinical response variable, through linear, binomial and Cox proportional hazards (Cox PH) models.
See https://sydneybiox.github.io/CPOP/articles/CPOP.html.
devtools::install_github("sydneybiox/CPOP")
Wang, K.Y.X., Pupo, G.M., Tembe, V. et al. Cross-Platform Omics Prediction procedure: a statistical machine learning framework for wider implementation of precision medicine. npj Digit. Med. 5, 85 (2022). https://doi.org/10.1038/s41746-022-00618-5