% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/clustering.R
\name{umap_clustering}
\alias{umap_clustering}
\title{Perform UMAP dimensionality reduction and HDBSCAN clustering on copy number data}
\usage{
umap_clustering(
CNbins,
n_neighbors = 10,
min_dist = 0.1,
minPts = 30,
seed = NULL,
field = "copy",
umapmetric = "correlation",
hscn = FALSE,
pca = NULL
)
}
\arguments{
\item{CNbins}{A data frame containing copy number data. Must include columns
for 'cell_id' and the specified \code{field}.}
\item{n_neighbors}{Integer. The number of neighbors to consider in UMAP. Default is 10.}
\item{min_dist}{Numeric. The minimum distance between points in UMAP. Default is 0.1.}
\item{minPts}{Integer. The minimum number of points to form a cluster in HDBSCAN. Default is 30.}
\item{seed}{Integer or NULL. Random seed for reproducibility. Default is NULL.}
\item{field}{Character. The column name in \code{CNbins} to use for copy number values. Default is "copy".}
\item{umapmetric}{Character. The distance metric to use in UMAP. Default is "correlation".}
\item{hscn}{Logical. Whether to use haplotype-specific copy number data. Default is FALSE.}
\item{pca}{Integer or NULL. Number of principal components to use in UMAP. If NULL (the default), PCA is not used.}
}
\value{
A list containing:
\item{clustering}{A data frame with UMAP coordinates and cluster assignments for each cell.}
\item{hdbscanresults}{The results of the HDBSCAN clustering.}
\item{umapresults}{The results of the UMAP dimensionality reduction.}
\item{tree}{A phylogenetic tree object representing the hierarchical structure of the clusters.}
}
\description{
This function takes copy number data, performs UMAP dimensionality reduction,
and then applies HDBSCAN clustering to identify cell populations. It can handle
both standard copy number data and haplotype-specific copy number (HSCN) data.
}
\details{
The function performs the following steps:
\enumerate{
\item Creates a copy number matrix from the input data.
\item Applies UMAP dimensionality reduction.
\item Performs HDBSCAN clustering on the UMAP results.
\item Generates a phylogenetic tree from the clustering results.
}

If \code{hscn} is TRUE, the function expects columns 'copy' and 'BAF' in \code{CNbins}
and creates separate matrices for the A and B alleles.

The function automatically adjusts \code{n_neighbors} if there are too few cells.
If UMAP fails, the function retries with a small amount of jitter added to the data points.
If only one cluster is found initially, the function reduces \code{minPts}.
}
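\examples{
\dontrun{
# Minimal sketch of a typical call, not taken from the package sources.
# Assumption: `CNbins` is an existing data frame with a 'cell_id' column
# and a 'copy' column, as described under Arguments. The seed value is
# arbitrary and only there to make repeated runs comparable.
res <- umap_clustering(
  CNbins,
  n_neighbors = 10,
  min_dist = 0.1,
  minPts = 30,
  seed = 123,
  field = "copy"
)

# Elements of the returned list, as documented under Value.
head(res$clustering)  # UMAP coordinates and cluster assignment per cell
res$tree              # tree object built from the clustering results
}
}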