This repository contains the code used for analysis by Siletti et al. (2022). You can also find links below to the complete dataset of 3,369,219 cells.
https://www.biorxiv.org/content/10.1101/2022.10.12.511898v1
The dataset can be browsed in our CELLxGENE collection. There is one browser per dissection and one per supercluster. A browser for the combined non-neuronal cells is also available (but note that some immune cells are found in the Miscellaneous supercluster).
Raw data in fastq and BAM format are available at NeMO.
Our gene and transcript annotation is based on the GRCh38.p13 GENCODE V35 primary sequence assembly. We discarded genes and transcripts that overlapped, or mapped to, the 3’ UTRs of other genes or of non-coding RNAs. Here we provide the GTF file used to count reads, and the genes and transcripts that were discarded.
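The GTF file can be inspected with a few lines of standard-library Python. The sketch below is a minimal example that assumes the file follows the usual GTF layout (nine tab-separated columns, with `key "value";` attributes in the last column) and collects the `gene_id` values it contains. The function name `gene_ids_in_gtf` is hypothetical and not part of this repository.

```python
import re
from typing import Iterable, Optional, Set

# Matches one key "value" pair in the attribute column (column 9) of a GTF record.
_ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_ids_in_gtf(lines: Iterable[str], feature: Optional[str] = None) -> Set[str]:
    """Return the gene_id values found in GTF records.

    If `feature` is given (e.g. "gene" or "exon"), only records whose
    third column equals it are considered.
    """
    ids: Set[str] = set()
    for line in lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        if feature is not None and fields[2] != feature:
            continue
        attrs = dict(_ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs:
            ids.add(attrs["gene_id"])
    return ids
```

For example, `with open(path) as fh: ids = gene_ids_in_gtf(fh)` gives the set of genes that reads could be counted against (use `gzip.open(path, "rt")` if the file is compressed). This set can then be compared with the list of discarded genes, assuming that list uses the same gene ID format.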
The final dataset is available for download at https://storage.cloud.google.com/linnarsson-lab-human. Two files are available in loom file format:
- Genes x cells: adult_human_20221007.loom
- Genes x clusters: adult_human_20221007.agg.loom
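loompy is the usual reader for loom files, but because loom is an HDF5 layout (a `matrix` dataset of genes x cells plus `row_attrs` and `col_attrs` groups), the files can also be read with h5py alone. The sketch below pulls one gene's values across all cells without loading the full matrix. It assumes gene names are stored in a row attribute called `Gene`, which is a loompy convention and should be checked against the actual file (for example with `list(f["row_attrs"])`); `gene_expression` is a hypothetical helper.

```python
import h5py
import numpy as np


def _to_str_array(values) -> np.ndarray:
    """Decode HDF5 string values, which may come back as bytes, to str."""
    return np.array(
        [v.decode("utf-8") if isinstance(v, bytes) else str(v) for v in values]
    )


def gene_expression(path: str, gene: str, gene_attr: str = "Gene") -> np.ndarray:
    """Return one gene's values across all cells from a loom file.

    Assumes the loom layout: /matrix is genes x cells and gene names live
    in /row_attrs/<gene_attr> ("Gene" is a loompy convention, not verified
    for this file).
    """
    with h5py.File(path, "r") as f:
        names = _to_str_array(f["row_attrs"][gene_attr][:])
        hits = np.flatnonzero(names == gene)
        if hits.size == 0:
            raise KeyError(f"{gene!r} not found in row_attrs/{gene_attr}")
        # Reading a single row avoids loading the full matrix into memory.
        return f["matrix"][int(hits[0]), :]
```

A call such as `gene_expression("adult_human_20221007.loom", "SNAP25")` would then return one value per cell, assuming the gene is present under that name.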
The genes x cells dataset is alternatively available in two .h5ad files:
- Neurons: Neurons.h5ad
- Non-neuronal cells: Nonneurons.h5ad
💡Tip: Data for superclusters and dissections can also be downloaded from CELLxGENE in .h5ad (AnnData, for Scanpy) and .rds (for Seurat) formats by following the links to the browsers above.
In addition, expression matrices generated with the "standard" cellranger + velocyto pipeline using cellranger GRCh38-3.0.0 annotations are available in loom and anndata formats:
- human_adult-GRCh38-3.0.0.h5ad (annotations largely follow CELLxGENE standards)
These files contain exactly the same cells as adult_human_20221007.loom. Roughly 70,000 cells that this procedure filtered out are therefore included with a total UMI count of zero.
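Because cells removed by the standard pipeline are kept with zero counts, steps that assume every cell has at least one UMI (for example, normalization by total counts, which would divide by zero) may need those cells dropped first. A minimal sketch, assuming the matrix is laid out cells x genes as in AnnData and holds raw UMI counts; `nonzero_cells_mask` is a hypothetical helper:

```python
import numpy as np


def nonzero_cells_mask(X) -> np.ndarray:
    """Boolean mask of cells (rows) whose total UMI count is above zero.

    Works for dense NumPy arrays and SciPy sparse matrices laid out as
    cells x genes, which is the AnnData convention.
    """
    # X.sum(axis=1) is an ndarray for dense input and an np.matrix (or 1-D
    # array) for sparse input; asarray + ravel flattens both to 1-D.
    totals = np.asarray(X.sum(axis=1)).ravel()
    return totals > 0
```

With AnnData, this could be applied as `adata = adata[nonzero_cells_mask(adata.X)].copy()` after loading the file into memory; the result then no longer contains exactly the same cells as adult_human_20221007.loom.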
The files with the molecule coordinates (as .parquet) and gene x cell counts (as .loom) are available in the EEL_adult folder at: https://storage.cloud.google.com/linnarsson-lab-human
Data in the .parquet format can be opened by FISHscale, Python Pandas or any other Parquet reader.
r_px_microscope_stitched and c_px_microscope_stitched contain the RNA molecule coordinates in pixels (pixel size of 0.18 µm).
r_transformed and c_transformed contain the RNA molecule coordinates in pixels (pixel size of 0.27 µm).
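The coordinate columns can be converted to micrometers by multiplying by the pixel sizes given above. The sketch below assumes the column names listed above and uses pandas (`read_parquet` needs pyarrow or fastparquet installed). The helper names `add_micron_columns` and `load_molecules`, and the `_um` suffix on the output columns, are hypothetical.

```python
import pandas as pd

# Pixel sizes (micrometers per pixel) for the two coordinate systems.
PIXEL_SIZE_UM = {
    "px_microscope_stitched": 0.18,
    "transformed": 0.27,
}


def add_micron_columns(df: pd.DataFrame, space: str) -> pd.DataFrame:
    """Return a copy of df with r/c coordinates of `space` converted to micrometers."""
    size = PIXEL_SIZE_UM[space]
    out = df.copy()
    for axis in ("r", "c"):
        out[f"{axis}_{space}_um"] = out[f"{axis}_{space}"] * size
    return out


def load_molecules(path: str, space: str = "transformed") -> pd.DataFrame:
    """Read only the coordinate columns of one .parquet file and add micrometer columns."""
    cols = [f"r_{space}", f"c_{space}"]
    df = pd.read_parquet(path, columns=cols)
    return add_micron_columns(df, space)
```

Reading only the needed columns keeps memory use low, since these files may hold many more columns than the coordinates.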
Clustering was performed using cytograph. Installation and usage are described here. Other materials include:
- scripts: other scripts named in the Methods section
- notebooks: the code used to make figures
- tables: the manuscript's supplementary tables, as well as a subcluster annotation table
Auto-annotations are available in a separate repository.