
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/clustering.R
\name{umap_clustering}
\alias{umap_clustering}
\title{Perform UMAP dimensionality reduction and HDBSCAN clustering on copy number data}
\usage{
umap_clustering(
  CNbins,
  n_neighbors = 10,
  min_dist = 0.1,
  minPts = 30,
  seed = NULL,
  field = "copy",
  umapmetric = "correlation",
  hscn = FALSE,
  pca = NULL
)
}
\arguments{
\item{CNbins}{A data frame containing copy number data. Must include a
\code{cell_id} column and the column named by \code{field}.}

\item{n_neighbors}{Integer. Number of nearest neighbors UMAP uses to build its neighborhood graph. Default is 10.}

\item{min_dist}{Numeric. Minimum distance between points in the UMAP embedding; smaller values pack points more tightly. Default is 0.1.}

\item{minPts}{Integer. The minimum number of points to form a cluster in HDBSCAN. Default is 30.}

\item{seed}{Integer or NULL. Random seed for reproducibility. Default is NULL.}

\item{field}{Character. Name of the column in \code{CNbins} that holds the copy number values. Default is "copy".}

\item{umapmetric}{Character. The distance metric to use in UMAP. Default is "correlation".}

\item{hscn}{Logical. Whether to use haplotype-specific copy number data. Default is FALSE.}

\item{pca}{Integer or NULL. Number of principal components used to reduce the copy number matrix before UMAP. If NULL (the default), no PCA is performed.}
}
\value{
A list containing:
  \item{clustering}{A data frame with UMAP coordinates and cluster assignments for each cell.}
  \item{hdbscanresults}{The results of the HDBSCAN clustering.}
  \item{umapresults}{The results of the UMAP dimensionality reduction.}
  \item{tree}{A phylogenetic tree object representing the hierarchical structure of the clusters.}
}
\description{
This function takes copy number data, performs UMAP dimensionality reduction,
and then applies HDBSCAN clustering to identify cell populations. It can handle
both standard copy number data and haplotype-specific copy number (HSCN) data.
}
\details{
The function performs the following steps:
\enumerate{
  \item Creates a copy number matrix from the input data.
  \item Applies UMAP dimensionality reduction.
  \item Performs HDBSCAN clustering on the UMAP embedding.
  \item Generates a phylogenetic tree from the clustering results.
}
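
A minimal sketch of these steps on a toy matrix, assuming the \pkg{uwot},
\pkg{dbscan} and \pkg{ape} packages are installed. It is not the package's own
code: the exact calls inside \code{umap_clustering} may differ, and building the
tree with \code{ape::as.phylo()} on the HDBSCAN hierarchy is an assumed,
plausible choice. \code{minPts} is set to 10 here to suit the small toy data.
\preformatted{
# Sketch only; assumes uwot, dbscan and ape are installed
set.seed(1)
# Step 1 stand-in: toy 60 cells x 40 bins matrix, two groups with
# copy number gains in different halves of the genome
cn <- matrix(2, nrow = 60, ncol = 40)
cn[1:30, 1:20] <- 3
cn[31:60, 21:40] <- 3
cn <- cn + matrix(rnorm(60 * 40, sd = 0.1), nrow = 60)
rownames(cn) <- paste0("cell", 1:60)

# Step 2: UMAP embedding using correlation distance between cells
emb <- uwot::umap(cn, n_neighbors = 10, min_dist = 0.1,
                  metric = "correlation")
# Step 3: HDBSCAN on the 2-D embedding (cluster label 0 means noise)
hdb <- dbscan::hdbscan(emb, minPts = 10)
# Step 4 (assumed approach): convert the HDBSCAN hierarchy to a phylo tree
tree <- ape::as.phylo(hdb$hc)

clustering <- data.frame(cell_id = rownames(cn),
                         umap1 = emb[, 1], umap2 = emb[, 2],
                         cluster = hdb$cluster)
}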

If \code{hscn} is TRUE, the function expects columns \code{copy} and \code{BAF}
in \code{CNbins}, and creates separate matrices for the A and B alleles.
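
A minimal sketch of how two allele-specific matrices could be derived, assuming
the B-allele copy number is \code{copy * BAF} and the A-allele copy number is
the remainder. The toy data and the helper \code{to_matrix} are hypothetical,
and binding the two matrices column-wise before UMAP is likewise an assumption
rather than the package's confirmed behaviour.
\preformatted{
# Toy long-format data: 3 cells x 4 bins (hypothetical values)
cnbins <- data.frame(
  cell_id = rep(c("c1", "c2", "c3"), each = 4),
  bin     = rep(paste0("b", 1:4), times = 3),
  copy    = c(2, 2, 3, 4,  2, 2, 2, 2,  1, 2, 3, 3),
  BAF     = c(0.5, 0.5, 1/3, 0.25,  0.5, 0.5, 0, 0.5,  0, 0.5, 1/3, 0)
)
# Assumed split of total copy number into A and B allele copy numbers
cnbins$B <- cnbins$copy * cnbins$BAF
cnbins$A <- cnbins$copy - cnbins$B

# Hypothetical helper: pivot one value column to a cells x bins matrix
to_matrix <- function(df, value) {
  tapply(df[[value]], list(df$cell_id, df$bin), mean)
}
amat <- to_matrix(cnbins, "A")
bmat <- to_matrix(cnbins, "B")
colnames(amat) <- paste0(colnames(amat), "_A")
colnames(bmat) <- paste0(colnames(bmat), "_B")

# One matrix with both alleles side by side (3 cells x 8 columns)
hscn_matrix <- cbind(amat, bmat)
}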

If the requested \code{n_neighbors} is too large for the number of cells, it is
reduced automatically. If UMAP fails, the function reruns it after adding a
small random jitter to the data. If HDBSCAN initially finds only one cluster,
\code{minPts} is reduced and clustering is repeated.
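
These safeguards can be sketched as follows, assuming \pkg{uwot} and
\pkg{dbscan} are installed. The helper \code{safe_umap_hdbscan} is hypothetical,
and its specific rules (capping \code{n_neighbors} at the number of cells minus
one, a jitter standard deviation of 0.001, and halving \code{minPts} down to a
floor of 5) are assumptions, not the package's exact behaviour.
\preformatted{
# Hypothetical helper; thresholds below are assumptions
safe_umap_hdbscan <- function(cn, n_neighbors = 10, min_dist = 0.1,
                              minPts = 30, metric = "correlation") {
  # Assumed rule: UMAP needs fewer neighbors than there are cells
  n_neighbors <- min(n_neighbors, nrow(cn) - 1)
  run_umap <- function(x) {
    uwot::umap(x, n_neighbors = n_neighbors, min_dist = min_dist,
               metric = metric)
  }
  # If UMAP raises an error, retry once with small random jitter
  emb <- tryCatch(run_umap(cn), error = function(e) {
    jitter_noise <- matrix(rnorm(length(cn), sd = 1e-3), nrow = nrow(cn))
    run_umap(cn + jitter_noise)
  })
  # HDBSCAN labels clusters 1..k and noise 0, so max() gives the number
  # of clusters; halve minPts (floor of 5) while at most one is found
  hdb <- dbscan::hdbscan(emb, minPts = minPts)
  while (max(hdb$cluster) <= 1 && minPts > 5) {
    minPts <- max(5, floor(minPts / 2))
    hdb <- dbscan::hdbscan(emb, minPts = minPts)
  }
  list(embedding = emb, hdbscan = hdb,
       n_neighbors = n_neighbors, minPts = minPts)
}

# Usage on a toy 20 cells x 30 bins matrix
set.seed(2)
cn <- matrix(2, nrow = 20, ncol = 30)
cn[1:10, 1:15] <- 3
cn[11:20, 16:30] <- 3
cn <- cn + matrix(rnorm(20 * 30, sd = 0.1), nrow = 20)
res <- safe_umap_hdbscan(cn, n_neighbors = 50, minPts = 10)
res$n_neighbors  # 19: capped at the number of cells minus one
}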
}