Dimensionality_Reduction

9 Dimensionality Reduction

Dimensionality reduction projects data with many features (genes) onto a small number of dimensions. MOG reduces high-dimensional data to two-dimensional (2-D) data that still retains most of the information in the original data.

Two types of dimensionality reduction can be performed using MOG:

Principal component analysis (PCA)
t-distributed stochastic neighbor embedding (t-SNE)

9.1 Principal component analysis (PCA)

PCA is one of the most widely used unsupervised linear dimensionality reduction techniques. It reduces a large number of dimensions to a few principal components without losing much information. MOG displays the result as a scatter plot of the principal components, and all scatter plot operations can be performed on the PCA scatter plot.

PCA can be performed in MOG with the following steps:

In the "Sample Metadata Table pane", select the required rows (each row represents a sample) in a list.
In the "Sample Metadata Table pane" menubar, select the "Analyze" option and choose the "Compute PCA" option to open up the Compute PCA dialog.
The Compute PCA dialog has the following options:
1. Gene List: From the "Gene/Feature list", select the feature list whose dimensions you want to reduce.
2. Normalize data (Mean center): Select the check box to normalize the data, i.e., adjust each feature to have zero mean.
Click OK to run PCA and display the PCA scatter plot.
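
The following is a minimal sketch of what mean centering followed by a 2-D PCA projection amounts to. It is not MOG's implementation: the function names are hypothetical, the input layout (one row per sample, one column per gene) is an assumption, and power iteration with deflation is used here only to keep the example free of external libraries.

```python
import math
import random


def mean_center(rows):
    """Subtract each column (feature) mean so every feature has zero mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix (features x features) of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenpair(matrix, iterations=500, seed=0):
    """Approximate the leading eigenvector and eigenvalue by power iteration."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.random() + 0.1 for _ in range(d)]
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in this matrix
            break
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return v, sum(a * b for a, b in zip(v, mv))


def pca_2d(rows):
    """Project samples (rows) onto their first two principal components."""
    centered = mean_center(rows)
    cov = covariance(centered)
    pc1, ev1 = leading_eigenpair(cov, seed=0)
    # Deflation: remove the variance explained by PC1, then find PC2.
    d = len(cov)
    deflated = [[cov[i][j] - ev1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    pc2, _ = leading_eigenpair(deflated, seed=1)
    return [(sum(a * b for a, b in zip(row, pc1)),
             sum(a * b for a, b in zip(row, pc2))) for row in centered]


# Hypothetical expression values: 4 samples (rows) x 3 genes (columns).
samples = [
    [2.0, 0.0, 1.0],
    [4.0, 1.0, 1.5],
    [6.0, 0.5, 0.5],
    [8.0, 1.5, 1.0],
]
coords = pca_2d(samples)
for x, y in coords:
    print(f"{x:8.3f} {y:8.3f}")
```

Each printed pair is one sample's position on the two axes of a scatter plot like the one MOG generates. The sign of each axis is arbitrary, so a mirrored plot represents the same result.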

Note: If the data has high variance, transform the data before performing PCA to get better results.
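
As an illustration of why a transformation helps, the sketch below applies a log2 transformation with a pseudocount of 1. Both the choice of log2 and the pseudocount are assumptions made for this example, not a statement of which transformation MOG applies, and `log2_transform` is a hypothetical helper name.

```python
import math


def log2_transform(rows, pseudocount=1.0):
    """Return log2(x + pseudocount) for every value; the pseudocount keeps zeros defined."""
    return [[math.log2(x + pseudocount) for x in row] for row in rows]


# Hypothetical expression values spanning three orders of magnitude.
raw = [[0.0, 1.0, 1023.0]]
print(log2_transform(raw))
```

After the transformation, the values span a much narrower range, so a few highly expressed genes are less likely to dominate the principal components.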

Figure 21: An example PCA scatter plot generated using MOG.

9.2 t-distributed stochastic neighbor embedding (t-SNE)

t-SNE, which is based on Stochastic Neighbor Embedding, is an unsupervised non-linear dimensionality reduction and data visualization technique. MOG uses the Barnes-Hut version of t-SNE. As with PCA, the result is displayed as a scatter plot.

t-SNE dimensionality reduction can be performed in MOG with the following steps:

In the "Sample Metadata Table pane", select the required rows (each row represents a sample) in a list.
In the "Sample Metadata Table pane" menubar, select the "Analyze" option and choose the "Compute t-SNE" option to open up the Compute t-SNE dialog.
The Compute t-SNE dialog has the following options:
1. Gene List: From the "Gene/Feature list", select the feature list whose dimensions you want to reduce.
2. Perplexity: A guess at the number of close (nearest) neighbors each sample has. Typical values range between 5 and 50, and different values can produce significantly different results. Note: If the selected gene/feature list contains very few genes, you may have to use a small perplexity value (sometimes as low as 1 for 15-20 genes/features). Otherwise, the default or a large perplexity value causes the error "Perplexity too large".
3. Maximum Iterations: The maximum number of iterations the t-SNE algorithm should run over the data.
4. Theta: Controls when a cluster of points is treated as a single summary node. This parameter is used only in the Barnes-Hut version of t-SNE, which is the version implemented in MOG. A value less than 0.2 increases the computation time, and a value greater than 0.8 typically increases the error.
5. Use PCA: If the number of features is high, it is highly recommended to check this option. It first reduces the initial dimensions using PCA and then runs t-SNE on the reduced dimensions, which improves the computation time.
6. Run in parallel: It is highly recommended to check this option, which runs the algorithm in parallel on multiple processors and thereby improves the computation time.
Click OK to run t-SNE and display the t-SNE scatter plot.
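
The Perplexity option above can be made more concrete with a short sketch. In the t-SNE formulation, perplexity is 2 raised to the Shannon entropy of a point's neighbor distribution, so a point with k equally likely neighbors has perplexity k. The size check shown here, N - 1 >= 3 * perplexity, is the one used by the reference Barnes-Hut implementation. Whether MOG applies exactly this check, and what it counts as the N points, is an assumption, and all function names are hypothetical.

```python
import math


def perplexity(probabilities):
    """Perplexity of a discrete distribution: 2 ** (Shannon entropy in bits)."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return 2 ** entropy


def max_allowed_perplexity(n_points):
    """Largest perplexity passing the assumed check n_points - 1 >= 3 * perplexity."""
    return (n_points - 1) / 3


def perplexity_is_valid(n_points, perplexity_value):
    """True if the assumed Barnes-Hut size check would accept this perplexity."""
    return n_points - 1 >= 3 * perplexity_value


# A point with 5 equally likely neighbors has a perplexity of about 5.
print(perplexity([0.2] * 5))

# Under the assumed check, 16 points allow a perplexity of at most 5.
print(max_allowed_perplexity(16))
print(perplexity_is_valid(16, 30))
```

Under this assumption, a small data set leaves little room for the typical range of 5 to 50, which is consistent with the "Perplexity too large" error described above.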

Note: If the data has high variance, transform the data before performing t-SNE to get better results. Each t-SNE run may give different results, but the structure of the data (clusters) is preserved.
