Resolve conflicts
vianamp committed Mar 19, 2024
2 parents 841a709 + 3e5e3cd commit fbf6e03
Showing 6 changed files with 90 additions and 35 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/build-master.yml
@@ -1,4 +1,4 @@
-name: Build Master
+# name: Build Master

on:
push:
@@ -29,8 +29,8 @@ jobs:
- name: Upload codecov
uses: codecov/codecov-action@v1

-lint:
-runs-on: ubuntu-latest
+# lint:
+# runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v1
29 changes: 29 additions & 0 deletions .github/workflows/publish.yml
@@ -0,0 +1,29 @@
name: Publish

on:
  push:
    branches:
      - master

jobs:
  publish:
    if: "contains(github.event.head_commit.message, 'Bump version')"
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: 3.8
      - name: Install Dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build wheel
      - name: Build Package
        run: |
          python -m build
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@master
        with:
          password: ${{ secrets.PYPI_TOKEN }}
          verbose: true
8 changes: 4 additions & 4 deletions .github/workflows/test-and-lint.yml
@@ -1,6 +1,6 @@
-name: Test and Lint
+# name: Test and Lint

-on: pull_request
+# on: pull_request

jobs:
test:
@@ -26,8 +26,8 @@ jobs:
- name: Upload codecov
uses: codecov/codecov-action@v1

-lint:
-runs-on: ubuntu-latest
+# lint:
+# runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v1
74 changes: 50 additions & 24 deletions README.md
@@ -13,34 +13,43 @@
## Installation

First, create a conda environment for this project:

```
conda create --name cvapipe python=3.8
conda activate cvapipe
```

then clone this repo

```
git clone https://github.com/AllenCell/cvapipe_analysis.git
```

and install it with

```
cd cvapipe_analysis
pip install -e .
```

Alternatively, install the latest stable version from PyPI by running:

```
pip install cvapipe_analysis
```

## Types of usage

This package can be used to reproduce the main results shown in [1] or to generate similar results using your own data. Before applying it to your own dataset, we highly recommend first running it on our test dataset to understand how the package works.

[1] - [Viana, Matheus P., et al. "Robust integrated intracellular organization of the human iPS cell: where, how much, and how variable?." bioRxiv (2020)](https://www.biorxiv.org/content/10.1101/2020.12.08.415562v1).


## The YAML configuration file

This package is fully configured through the file `config.yaml`. This file is divided into sections that map roughly one-to-one to the workflow steps. Here are the main things you need to know about the configuration file:

**Project**

```
appName: cvapipe_analysis
project:
@@ -52,6 +61,7 @@ project:
Set the full path where you want data and results to be stored in `local_staging`.

**Data**

```
data:
nucleus:
@@ -75,6 +85,7 @@ data:
Here we provide a description of the data. Aliases must be unique; they are used in the rest of the configuration file to specify which data we are referring to. If you are using this package on your own data, be aware that the values used in the field `channel` must be found in the column `name_dict` of your input manifest file (see the section "Running the pipeline on your own data").

**Features**

```
features:
aliases: ["NUC", "MEM", "STR"]
@@ -92,9 +103,11 @@ features:
# and nuclear shape
lmax: 16
```

This section specifies which aliases features should be computed on, which aliases the spherical harmonics coefficients should be calculated on, and which type of alignment should be used.

**Pre-processing**

```
preprocessing:
remove_mitotics: on
@@ -104,6 +117,7 @@ preprocessing:
Here we set whether or not to remove mitotic cells or outliers from the dataset. You can turn this off when running `cvapipe_analysis` on your own data.

**Shape Space**

```
shapespace:
# Specify a set of aliases here
@@ -125,6 +139,7 @@ shapespace:
Here we specify which aliases should be used to create a shape space. This must be a subset of the aliases specified above to have their spherical harmonics coefficients computed. In the case of small datasets with only hundreds of cells, you may want to reduce the number of map points of your shape space. The number of map points must be odd.
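
The odd-count requirement can be checked before running the shape mode step. The following is only a minimal sketch: it assumes the map points are given as a plain list of numbers, and the function name `check_map_points` and the example values are hypothetical, not part of `cvapipe_analysis`.

```python
def check_map_points(map_points):
    """Return the index of the central map point, or raise if the count is even."""
    n = len(map_points)
    if n % 2 == 0:
        raise ValueError(f"Number of map points must be odd, got {n}")
    # With an odd count there is exactly one central point.
    return n // 2


# Hypothetical reduced set of map points for a small dataset.
reduced = [-1.0, 0.0, 1.0]
center = check_map_points(reduced)
print(f"{len(reduced)} map points, central value: {reduced[center]}")
```

An odd count guarantees a single middle element, which is why the check returns `n // 2` as the central index.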

**Intensity Parameterization**

```
parameterization:
inner: "NUC"
@@ -136,6 +151,7 @@ parameterization:
Here we specify which aliases should be used as the internal and external references, and the aliases for which a parameterization is computed.

**Structures**

```
structures:
"FBL": ["nucleoli [DFC)", "#A9D1E5", "{'raw': (420, 2610), 'seg': (0,30), 'avgseg': (80,160)}"]
@@ -172,20 +188,23 @@ Here we specify a dictionary with the gene names, description and color for each
This analysis is currently not configured to run as a workflow. Please run steps individually.

### 1. Download the single-cell image dataset manifest including raw GFP and segmented cropped images

```
cvapipe_analysis loaddata run
```

This command downloads the whole dataset of ~7 TB. For each cell in the dataset, we provide a raw 3-channel image containing fiducial markers for the cell membrane and nucleus, together with an FP marker for one intracellular structure. We also provide segmentations for each cell in the format of 5-channel binary images. The two extra channels correspond to roof-augmented versions of the cell and intracellular structure segmentations. For more information about this, please refer to our paper [1]. Metadata about each cell can be found in the file `manifest.csv`. This is a table where each row corresponds to a cell.

**Importantly**, you can download a _small test dataset composed of 300 cells chosen at random_ from the main dataset. To do so, please run

```
cvapipe_analysis loaddata run --test
```

This step saves the single-cell images in the folders `local_staging/loaddata/crop_raw` and `local_staging/loaddata/crop_seg`.
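
To confirm that the download populated both folders, one could count the files in each. This is a rough sketch using only the standard library; the helper name `count_staged_images` is hypothetical, and it assumes `local_staging` is the path set in `config.yaml`.

```python
from pathlib import Path


def count_staged_images(local_staging):
    """Count files in the two output folders of the loaddata step."""
    root = Path(local_staging) / "loaddata"
    counts = {}
    for folder in ("crop_raw", "crop_seg"):
        path = root / folder
        if path.is_dir():
            counts[folder] = sum(1 for f in path.iterdir() if f.is_file())
        else:
            # A missing folder is reported as zero files (step not run yet).
            counts[folder] = 0
    return counts


if __name__ == "__main__":
    print(count_staged_images("local_staging"))
```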

### 2. Compute single-cell features

```
cvapipe_analysis computefeatures run
```
@@ -195,17 +214,19 @@ This step extract single-cell features, including cell, nuclear and intracellula
This step saves the features in the file `local_staging/computefeatures/manifest.csv`.

### 3. Pre-processing dataset

```
cvapipe_analysis preprocessing run
```

This step removes outliers and mitotic cells from the single-cell dataset. This step depends on step 2.

This step saves results in the file `local_staging/preprocessing/manifest.csv` and the **folder: `local_staging/preprocessing/outliers/`**

- `xx.png`: Diagnostic plots for outlier detection.
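
To see how many cells pre-processing removed, the manifests written by steps 2 and 3 can be compared. The sketch below assumes both manifests keep one row per cell under a single header line, which is stated above only for the step 1 manifest; the function names and the `CellId` column in the usage note are hypothetical.

```python
import csv
from pathlib import Path


def count_cells(lines):
    """Count data rows in CSV content, excluding the header row."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header
    return sum(1 for _ in reader)


def removed_by_preprocessing(local_staging="local_staging"):
    """Rows in the step 2 manifest minus rows in the step 3 manifest."""
    root = Path(local_staging)
    counts = []
    for step in ("computefeatures", "preprocessing"):
        with open(root / step / "manifest.csv", newline="") as handle:
            counts.append(count_cells(handle))
    return counts[0] - counts[1]
```

For example, `count_cells(["CellId,volume", "1,10", "2,20"])` counts two cells, since `csv.reader` accepts any iterable of lines.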

### 4. Compute shapemodes

```
cvapipe_analysis shapemode run
```
@@ -216,47 +237,50 @@ Two output folders are produced by this step:

**Folder: `local_staging/shapemode/pca/`**

- `explained_variance.png`: Explained variance by each principal component.
- `feature_importance.txt`: Importance of first few features of each principal component.
- `pairwise_correlations.png`: Pairwise correlations between all principal components.

**Folder: `local_staging/shapemode/avgshape/`**

- `xx.vtk`: vtkPolyData files corresponding to 3D cell and nuclear meshes. We recommend [Paraview](https://www.paraview.org) to open these files.
- `xx.gif`: Animated GIF illustrating cell and nuclear shape modes from 3 different projections.
- `combined.tif`: Multichannel TIF that combines all animated GIFs in the same image.

### 5. Create the parameterized intracellular location representation (PILR)

```
cvapipe_analysis parameterization run
```

Here we use `aics-cytoparam` [(link)](https://github.com/AllenCell/aics-cytoparam) to create parameterizations for all of the single-cell data. This step depends on steps 3 and 4.

One output folder is produced by this step:

**Folder: `local_staging/parameterization/representations/`**

- `xx.tif`: Multichannel TIFF image with the cell PILR.

### 6. Create average PILRs

```
cvapipe_analysis aggregation run
```

This step averages multiple cell PILRs and morphs them into idealized shapes from the shape space. This step depends on step 5.

Two output folders are produced by this step:

**Folder: `local_staging/aggregation/repsagg/`**

- `avg-SEG-TUBA1B-DNA_MEM_PC4-B5-CODE.tif`: Example of file generated. This represents the average PILR from segmented images of all TUBA1B cells that fall into bin number 5 from shape mode 4.

**Folder: `local_staging/aggregation/aggmorph/`**

- `avg-SEG-TUBA1B-DNA_MEM_PC4-B5.tif`: Same as above but the PILR has been morphed into the cell shape corresponding to bin number 5 of shape mode 4.
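
The two example file names above encode the image type, gene, shape mode and bin. The following sketch recovers those fields with a regular expression. The pattern is inferred from these two examples alone, so it is an assumption and may not cover every file the step produces; `parse_aggregation_name` is a hypothetical helper.

```python
import re

# Pattern inferred from the two example names only; not an official naming spec.
PATTERN = re.compile(
    r"^avg-(?P<intensity>[A-Z]+)-(?P<gene>[A-Za-z0-9]+)-"
    r"(?P<aliases>\w+)_PC(?P<pc>\d+)-B(?P<bin>\d+)(?P<code>-CODE)?\.tif$"
)


def parse_aggregation_name(filename):
    """Split an aggregation output file name into its labeled parts."""
    match = PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized file name: {filename}")
    return {
        "intensity": match["intensity"],      # e.g. SEG for segmented images
        "gene": match["gene"],                # e.g. TUBA1B
        "aliases": match["aliases"],          # e.g. DNA_MEM
        "shape_mode": int(match["pc"]),       # shape mode number
        "bin": int(match["bin"]),             # bin number within the shape mode
        "is_code": match["code"] is not None, # True for the non-morphed PILR
    }


info = parse_aggregation_name("avg-SEG-TUBA1B-DNA_MEM_PC4-B5-CODE.tif")
print(info["gene"], info["shape_mode"], info["bin"])
```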

### 7. Correlate single-cell PILRs

```
cvapipe_analysis correlation run
```
@@ -267,26 +291,29 @@ One output folder is produced by this step:

**Folder: `local_staging/correlation/values/`**

- `avg-STR-NUC_MEM_PC8-1.tif`: Example of file generated. Correlation matrix between the PILRs of all cells that fall into bin number 1 of shape mode 8.
- `avg-STR-NUC_MEM_PC8-1.csv`: Example of file generated. Provides the cell indices for the correlation matrix above.

### 8. Stereotypy analysis

```
cvapipe_analysis stereotypy run
```

This step calculates the extent to which a structure’s individual location varies. This step depends on step 5.

Two output folders are produced by this step:

**Folder: `local_staging/stereotypy/values`**

- `*.csv*`: Stereotypy values.

**Folder: `local_staging/stereotypy/plots`**

- Resulting plots.

### 9. Concordance analysis

```
cvapipe_analysis concordance run
```
@@ -297,11 +324,11 @@ Two output folders are produced by this step:

**Folder: `local_staging/concordance/values/`**

- `*.csv*`: Concordance values

**Folder: `local_staging/concordance/plots/`**

- Resulting plots.

## Running the pipeline on your own data

@@ -327,5 +354,4 @@ All the other steps can be ran without modifications.

If you are running `cvapipe_analysis` on a Slurm cluster or any other cluster with `sbatch` capabilities, each step can be called with a flag `--distribute`. This will spawn many jobs to run in parallel in the cluster. Specific parameters can be set in the `resources` section of the YAML config file.
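
Since the steps are run individually, a small driver script can assemble the commands in order and optionally add `--distribute`. This is a sketch under stated assumptions: the helper names are hypothetical, and it is not stated here whether a distributed command blocks until its spawned jobs finish, so check that before chaining dependent steps. By default it only prints the commands.

```python
import subprocess

# Step names as they appear in the commands of this README, in order.
STEPS = [
    "loaddata",
    "computefeatures",
    "preprocessing",
    "shapemode",
    "parameterization",
    "aggregation",
    "correlation",
    "stereotypy",
    "concordance",
]


def build_command(step, distribute=False):
    """Build the command line for one step, optionally distributed."""
    cmd = ["cvapipe_analysis", step, "run"]
    if distribute:
        cmd.append("--distribute")
    return cmd


def run_steps(steps, distribute=False, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for step in steps:
        cmd = build_command(step, distribute)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence if a step fails.
            subprocess.run(cmd, check=True)


# Dry run: show the commands for every step after the download.
run_steps(STEPS[1:], distribute=True)
```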

**_Free software: Allen Institute Software License_**
6 changes: 3 additions & 3 deletions setup.cfg
@@ -1,5 +1,5 @@
[bumpversion]
-current_version = 0.1.0
+current_version = 0.1.4
commit = True
tag = True

@@ -21,9 +21,9 @@ test = pytest
collect_ignore = ['setup.py']

[flake8]
exclude =
docs/
ignore =
E203
E402
W291
2 changes: 1 addition & 1 deletion setup.py
@@ -79,6 +79,6 @@
url="https://github.com/AllenCellModeling/cvapipe_analysis",
# Do not edit this string manually, always use bumpversion
# Details in CONTRIBUTING.rst
-version="0.1.0",
+version="0.1.4",
zip_safe=False,
)
