Data and code related to our manuscript Comprehensive cell atlas of the first-trimester developing human brain (Emelie Braun, Miri Danan-Gotthold et al. 2022, in review).
https://www.biorxiv.org/content/10.1101/2022.10.24.513487v1
We used the Shoji tensor database and the cytograph-shoji pipeline.
Code for making many of the figures is available as Jupyter notebooks.
Metadata per sample: table_S1.xlsx
Metadata per cluster: table_S2.xlsx
Raw data: EGAS00001004107
Complete processed dataset: HumanFetalBrainPool.h5
Also available in h5ad format with CELLxGENE annotations: human_dev.h5ad
See further below for a description of the contents of the .h5 files.
Alternative expression matrices generated with the "standard" cellranger + velocyto pipeline using cellranger GRCh38-3.0.0 annotations are available in loom and anndata formats:
human_dev-GRCh38-3.0.0.h5ad (Annotations basically follow CELLxGENE standards.)
human_dev-GRCh38-3.0.0_all_layers.h5ad (The same but including 'ambiguous', 'spliced', and 'unspliced' layers.)
These files contain exactly the same cells as the HumanFetalBrainPool.h5 file. About 8,000 cells that were filtered out by this procedure therefore have a total UMI count of zero.
(coming soon)
Section 1 Z=970um
Section 2 Z=810um
Section 3 Z=640um
Three spatial EEL FISH datasets of a sagittally cut whole human embryo at 5 weeks post conception. The data are in Parquet format and can be opened with FISHscale, pandas (Python), or any other Parquet reader.
The r_px_microscope_stitched and c_px_microscope_stitched columns contain the RNA molecule coordinates in pixels (pixel size of 0.18 um).
The r_transformed and c_transformed columns contain the RNA molecule coordinates in pixels (pixel size of 0.27 um).
The Tissue and Brain columns indicate whether the detected molecules are in the tissue or in the brain, respectively.
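As a rough illustration of working with these columns in pandas, the sketch below keeps only molecules flagged as brain and converts the transformed coordinates from pixels to micrometers. It runs on a small synthetic table; the function name `brain_molecules_um` and the output columns `r_um` and `c_um` are made up for this example, and it assumes the Brain column is boolean-like, which the description above does not state explicitly.

```python
import pandas as pd

# Pixel size of the *_transformed coordinates (micrometers per pixel), as stated above.
TRANSFORMED_PIXEL_UM = 0.27


def brain_molecules_um(molecules: pd.DataFrame) -> pd.DataFrame:
    """Keep molecules flagged as brain and add coordinates in micrometers."""
    # Assumption: the Brain column is boolean-like (True/1 = inside the brain).
    in_brain = molecules["Brain"].astype(bool)
    brain = molecules.loc[in_brain].copy()
    # r_um / c_um are hypothetical column names introduced for this example.
    brain["r_um"] = brain["r_transformed"] * TRANSFORMED_PIXEL_UM
    brain["c_um"] = brain["c_transformed"] * TRANSFORMED_PIXEL_UM
    return brain


# Tiny synthetic table standing in for one section. With a real file you would
# instead load the table with pd.read_parquet(<path to the section file>).
example = pd.DataFrame(
    {
        "r_transformed": [0.0, 100.0, 200.0],
        "c_transformed": [10.0, 20.0, 30.0],
        "Tissue": [True, True, False],
        "Brain": [False, True, False],
    }
)
brain_example = brain_molecules_um(example)
```

The same conversion would apply to the r_px_microscope_stitched and c_px_microscope_stitched columns with a pixel size of 0.18 um.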
The datasets are provided as HDF5 files containing the tensors listed below. In Python, they can be accessed using h5py (other languages have similar libraries).
The most important tensors are Expression (the expression matrix; sum of spliced and unspliced UMIs), Gene (gene names), Accession (Ensembl accessions), Clusters (cluster labels), Embedding (tSNE), Factors (PCA components), ManifoldIndices (KNN graph edges) and ManifoldWeights (KNN graph edge weights).
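A minimal sketch of reading such tensors with h5py follows. The internal group layout of the .h5 files is not described here, so the hypothetical helper `find_tensors` simply walks the file and indexes every dataset by its name. The demo builds a tiny synthetic file (the group name `toy_group` and the gene names are placeholders) so that it runs without the real data. The full Expression matrix is far too large to load at once, so the example reads row slices only.

```python
import os
import tempfile

import h5py
import numpy as np


def find_tensors(h5_file):
    """Map tensor name -> h5py.Dataset for every dataset in the file."""
    tensors = {}

    def visit(path, obj):
        if isinstance(obj, h5py.Dataset):
            # Key by the last path component, e.g. ".../Clusters" -> "Clusters".
            tensors[path.split("/")[-1]] = obj

    h5_file.visititems(visit)
    return tensors


def as_text(values):
    """Return Python strings whether h5py hands back bytes or str."""
    return [v.decode() if isinstance(v, bytes) else str(v) for v in values]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.h5")
    # Build a toy file; "toy_group" is a hypothetical group name.
    with h5py.File(path, "w") as f:
        g = f.create_group("toy_group")
        g.create_dataset(
            "Expression",
            data=np.array([[0, 2, 1], [3, 0, 0], [0, 0, 5], [1, 1, 0]], dtype="uint16"),
        )
        g.create_dataset("Clusters", data=np.array([1, 0, 1, 0], dtype="uint32"))
        g.create_dataset(
            "Gene",
            data=np.array(["GENE_A", "GENE_B", "GENE_C"], dtype=object),
            dtype=h5py.string_dtype(),
        )

    with h5py.File(path, "r") as f:
        tensors = find_tensors(f)
        tensor_names = sorted(tensors)
        clusters = tensors["Clusters"][:]        # small 1-D tensor: load fully
        genes = as_text(tensors["Gene"][:])
        first_rows = tensors["Expression"][0:2, :]  # slice rows, never the whole matrix
        # h5py index arrays must be increasing; np.flatnonzero returns sorted indices.
        rows_in_cluster = np.flatnonzero(clusters == 1)
        cluster_expr = tensors["Expression"][rows_in_cluster, :]
        gene_totals = dict(zip(genes, cluster_expr.sum(axis=0).tolist()))
```

On the real files, selecting many scattered rows this way can be slow; reading contiguous blocks of rows and filtering in memory is usually the cheaper option.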
name | dtype | rank | dims | shape | (values) |
---|---|---|---|---|---|
Accession | string | 1 | genes | 59,480 | ["pCAG-DsRed2_101-650", "pCS-Cherry-DEST_101-850", "pCAG ··· |
Age | float32 | 1 | cells | 1,665,937 | [8.0, 8.0, 8.0, 8.0, 8.0, ...] |
AnnotationDefinition | string | 1 | annotations | 51 | ["+MPZ", "+EYA1 +ISL1", "+NHLH1", "+MEIS2 +ISL1 +SIX3", ··· |
AnnotationDescription | string | 1 | annotations | 51 | ["Schwann cell-like (E-SCHWL; +MPZ)", "Otic vesicle of t ··· |
AnnotationName | string | 1 | annotations | 51 | ["E-SCHWL", "HB-OTV", "NBL", "TH-RETN", "CB-PURK", ...] |
AnnotationPosterior | float32 | 2 | clusters ✕ annotations | 617 ✕ 51 | [[-1.8189894e-12, 6.617445e-24, 1.0, 3.3087225e-24, 3.30 ··· |
CellClass | string | 1 | cells | 1,665,937 | ["Erythrocyte", "Erythrocyte", "Erythrocyte", "Erythrocy ··· |
CellCycleFraction | float32 | 1 | cells | 1,665,937 | [0.0, 0.0001071352, 0.0, 0.00095663266, 0.0, ...] |
CellID | string | 1 | cells | 1,665,937 | ["10X89_1:AAACGGGAGGCTACGA", "10X89_1:ACGAGGAAGAGCCTAG", ··· |
Chemistry | string | 1 | cells | 1,665,937 | ["v2", "v2", "v2", "v2", "v2", ...] |
Chromosome | string | 1 | genes | 59,480 | ["chrEXTRA", "chrEXTRA", "chrEXTRA", "chrEXTRA", "chrEXT ··· |
Class | string | 1 | clusters | 617 | ["Neuroblast", "Radial glia", "Radial glia", "Glioblast" ··· |
ClusterID | uint32 | 1 | clusters | 617 | [0, 1, 2, 3, 4, ...] |
Clusters | uint32 | 1 | cells | 1,665,937 | [240, 240, 236, 240, 233, ...] |
Donor | string | 1 | cells | 1,665,937 | ["BRC2006", "BRC2006", "BRC2006", "BRC2006", "BRC2006", ...] |
DoubletFlag | bool | 1 | cells | 1,665,937 | [False, False, False, False, False, ...] |
DoubletScore | float32 | 1 | cells | 1,665,937 | [0.02, 0.02, 0.03, 0.01, 0.02, ...] |
DropletClass | uint8 | 1 | cells | 1,665,937 | [0, 0, 0, 0, 0, ...] |
Embedding | float32 | 2 | cells ✕ 2 | 1,665,937 ✕ 2 | [[22.061909, 11.055673], [23.594717, 10.600938], [25.339 ··· |
End | string | 1 | genes | 59,480 | ["550", "1320", "2090", "3610", "4730", ...] |
Enrichment | float32 | 2 | clusters ✕ genes | 617 ✕ 59,480 | [[1.0, 1.0, 1.0, 1.0, 1.0, ...], [1.0, 1.0, 1.0, 1.0, 1. ··· |
Expression | uint16 | 2 | cells ✕ genes | 1,665,937 ✕ 59,480 | [[0, 0, 0, 0, 0, ...], [0, 0, 0, 0, 0, ...], [0, 0, 0, 0 ··· |
Factors | float32 | 2 | cells ✕ __ | 1,665,937 ✕ 50 | [[-1.5914472, 1.524089, 0.21222332, -4.3109193, -5.85292 ··· |
Gene | string | 1 | genes | 59,480 | ["marker-DsRed", "marker-Cherry", "marker-GFP", "marker- ··· |
GeneNonzeros | uint32 | 1 | genes | 59,480 | [0, 0, 0, 0, 0, ...] |
GeneTotalUMIs | uint32 | 1 | genes | 59,480 | [0, 0, 0, 0, 0, ...] |
Linkage | float32 | 2 | __ ✕ 4 | 616 ✕ 4 | [[238.0, 239.0, 0.0016231078, 2.0], [237.0, 617.0, 0.002 ··· |
Loadings | float32 | 2 | genes ✕ __ | 59,480 ✕ 50 | [[0.0, 0.0, 0.0, 0.0, 0.0, ...], [0.0, 0.0, 0.0, 0.0, 0. ··· |
ManifoldIndices | uint32 | 2 | __ ✕ 2 | 40,164,783 ✕ 2 | [[0, 6], [0, 106], [0, 208], [0, 225], [0, 246], ...] |
ManifoldRadius | float32 | 0 | () | () | 1.0 |
ManifoldWeights | float32 | 1 | __ | 40,164,783 | [0.9746674, 0.9753966, 0.97435904, 0.9760038, 0.98073715 ··· |
MeanAge | float64 | 1 | clusters | 617 | [10.651846331718932, 10.967863210449874, 10.768960981864 ··· |
MeanCellCycle | float64 | 1 | clusters | 617 | [0.002357223176804402, 0.003319249633509612, 0.023186484 ··· |
MeanDoubletScore | float64 | 1 | clusters | 617 | [0.09462042097992746, 0.11769588179965942, 0.19775236498 ··· |
MeanExpression | float64 | 2 | clusters ✕ genes | 617 ✕ 59,480 | [[0.0, 0.0, 0.0, 0.0, 0.0, ...], [0.0, 0.0, 0.0, 0.0, 0. ··· |
MeanTotalUMI | float64 | 1 | clusters | 617 | [5449.63220088626, 5258.164957264958, 7567.301298701311, ··· |
MitoFraction | float32 | 1 | cells | 1,665,937 | [0.0, 0.0038568673, 0.008797339, 0.0015943878, 0.0018687 ··· |
NCells | uint64 | 1 | clusters | 617 | [1354, 1170, 770, 1232, 1536, ...] |
NGenes | uint32 | 1 | cells | 1,665,937 | [121, 271, 674, 101, 113, ...] |
Nonzeros | uint64 | 2 | clusters ✕ genes | 617 ✕ 59,480 | [[0, 0, 0, 0, 0, ...], [0, 0, 0, 0, 0, ...], [0, 0, 0, 0 ··· |
OverallTotalUMIs | uint64 | 0 | () | () | 13029800607 |
PrevClusters | uint32 | 1 | cells | 1,665,937 | [658, 658, 662, 658, 669, ...] |
Recipe | string | 1 | __ | 2 | ["{'InitializeWorkspace': {'from_workspace': 'samples202 ··· |
Region | string | 1 | cells | 1,665,937 | ["Telencephalon", "Telencephalon", "Telencephalon", "Tel ··· |
SampleID | string | 1 | cells | 1,665,937 | ["10X89_1", "10X89_1", "10X89_1", "10X89_1", "10X89_1", ...] |
SelectedFeatures | bool | 1 | genes | 59,480 | [False, False, False, False, False, ...] |
Sex | string | 1 | cells | 1,665,937 | ["", "", "", "", "", ...] |
Species | string | 0 | () | () | "Homo sapiens" |
Start | string | 1 | genes | 59,480 | ["1", "571", "1341", "2111", "3631", ...] |
StdevExpression | float32 | 1 | genes | 59,480 | [0.0, 0.0, 0.0, 0.0, 0.0, ...] |
Subdivision | string | 1 | cells | 1,665,937 | ["Cortex", "Cortex", "Cortex", "Cortex", "Cortex", ...] |
Subregion | string | 1 | cells | 1,665,937 | ["Cortex", "Cortex", "Cortex", "Cortex", "Cortex", ...] |
Tissue | string | 1 | cells | 1,665,937 | ["Cortex", "Cortex", "Cortex", "Cortex", "Cortex", ...] |
TopLevelCluster | uint32 | 1 | cells | 1,665,937 | [25, 25, 25, 25, 25, ...] |
TotalUMIs | uint32 | 1 | cells | 1,665,937 | [4630, 9334, 9321, 3136, 4281, ...] |
Trinaries | float32 | 2 | clusters ✕ genes | 617 ✕ 59,480 | [[-1.8189894e-12, -1.8189894e-12, -1.8189894e-12, -1.818 ··· |
UnsplicedFraction | float32 | 1 | cells | 1,665,937 | [0.3514039, 0.33833298, 0.3174552, 0.32589287, 0.3585611 ··· |
ValidCells | bool | 1 | cells | 1,665,937 | [True, True, True, True, True, ...] |
ValidGenes | bool | 1 | genes | 59,480 | [False, False, False, False, False, ...] |
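The dims column tells you which tensors can be combined: tensors along "cells" have one entry per cell, tensors along "clusters" one entry per cluster, and so on. The sketch below uses small NumPy stand-ins for a few of the tensors to show how per-cluster values might be attached to cells and how AnnotationPosterior might be turned into annotation names per cluster. It assumes that the values in Clusters refer to entries of ClusterID, and that the rows and columns of AnnotationPosterior follow the order of the cluster tensors and of AnnotationName. The cutoff of 0.5 is an arbitrary choice for illustration.

```python
import numpy as np

# Toy stand-ins for tensors along the "clusters" dimension.
cluster_id = np.array([0, 1, 2], dtype=np.uint32)                       # ClusterID
cluster_class = np.array(["Neuroblast", "Radial glia", "Glioblast"])   # Class

# Toy stand-in for a tensor along the "cells" dimension.
clusters = np.array([2, 0, 0, 1, 2, 2], dtype=np.uint32)               # Clusters

# Assumption: values in Clusters refer to entries of ClusterID. Going through
# ClusterID avoids relying on the cluster tensors being stored in label order.
order = np.argsort(cluster_id)
row_of_cell = order[np.searchsorted(cluster_id[order], clusters)]

# Per-cell lookup of a per-cluster value, and cluster sizes (compare with NCells).
class_per_cell = cluster_class[row_of_cell]
cells_per_cluster = np.bincount(row_of_cell, minlength=len(cluster_id))

# Toy stand-ins for the "annotations" dimension.
annotation_name = np.array(["NBL", "CB-PURK"])                          # AnnotationName
annotation_posterior = np.array(                                        # AnnotationPosterior
    [[1.0, 0.0], [0.0, 0.0], [0.2, 0.9]], dtype=np.float32
)

THRESHOLD = 0.5  # hypothetical cutoff, not taken from the dataset description
annotations_per_cluster = [
    annotation_name[row > THRESHOLD].tolist() for row in annotation_posterior
]
```

With the real data, the toy arrays would be replaced by the corresponding tensors read from the HDF5 file.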
Our gene and transcript annotation is based on the GRCh38.p13 GENCODE v35 primary sequence assembly.
We discarded genes or transcripts that overlapped or mapped to the 3’ UTRs of other genes or non-coding RNAs.
The GTF file used for read counts: gb_pri_annot_filtered.gtf.gz
The genes and transcripts that were discarded: gb_pri_filtered_transcripts.txt.gz