Electrophysiology analysis pipeline using Kilosort2.5 via SpikeInterface.
The pipeline includes:
- preprocessing: phase_shift, highpass filter, and either (1) common median reference ("cmr") or (2) destriping (bad channel interpolation + highpass spatial filter, "destripe")
- spike sorting: with Kilosort2.5 (KS2.5)
- postprocessing: remove duplicate units, compute amplitudes, spike/unit locations, PCA, correlograms, template similarity, template metrics, and quality metrics
- curation based on ISI violation ratio, presence ratio, and amplitude cutoff
- visualization of timeseries, drift maps, and sorting output in sortingview
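The curation step can be pictured as a boolean query over per-unit quality metrics. The sketch below is a hypothetical illustration, not the pipeline's code: the threshold values are placeholders (the values actually used are set in the processing parameters), and the metric names follow SpikeInterface's quality-metric columns (`isi_violations_ratio`, `presence_ratio`, `amplitude_cutoff`).

```python
from typing import Dict

# Hypothetical placeholder thresholds; the real values come from the pipeline parameters.
DEFAULT_THRESHOLDS = {
    "isi_violations_ratio_max": 0.5,
    "presence_ratio_min": 0.8,
    "amplitude_cutoff_max": 0.1,
}


def default_qc(metrics: Dict[str, Dict[str, float]],
               thresholds: Dict[str, float] = DEFAULT_THRESHOLDS) -> Dict[str, bool]:
    """Return a pass/fail flag per unit, mimicking a `default_qc` unit property."""
    flags = {}
    for unit_id, m in metrics.items():
        # A unit passes only if all three criteria hold; NaN metrics compare as False.
        flags[unit_id] = (
            m["isi_violations_ratio"] < thresholds["isi_violations_ratio_max"]
            and m["presence_ratio"] > thresholds["presence_ratio_min"]
            and m["amplitude_cutoff"] < thresholds["amplitude_cutoff_max"]
        )
    return flags


metrics = {
    "u0": {"isi_violations_ratio": 0.1, "presence_ratio": 0.95, "amplitude_cutoff": 0.02},
    "u1": {"isi_violations_ratio": 1.2, "presence_ratio": 0.99, "amplitude_cutoff": 0.01},
    "u2": {"isi_violations_ratio": 0.0, "presence_ratio": 0.40, "amplitude_cutoff": 0.05},
}
print(default_qc(metrics))  # {'u0': True, 'u1': False, 'u2': False}
```

Units are flagged rather than removed, which matches the idea of a "pre-curated" output where the final decision can still be made downstream.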
The run_capsule_*.py scripts in the code folder accept positional or optional arguments.
When using positional arguments, up to 7 arguments can be passed, in this STRICT order:
- "debug": Whether to run in DEBUG mode (false or true, default false)
- "concatenate": Whether to concatenate recordings/segments (false or true, default false)
- "denoising strategy": Which denoising strategy to use. Can be cmr (default) or destripe
- "remove out channels": Whether to remove out channels (false or true, default true)
- "remove bad channels": Whether to remove bad channels (false or true, default true)
- "max bad channel fraction": Maximum fraction of bad channels to remove. If more than this fraction of channels is bad, processing is skipped (default 0.5)
- "debug duration": Duration of the clipped recording in debug mode (default 30 seconds). Only used if debug is enabled
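To make the strict ordering concrete, here is a hypothetical sketch of how the seven positional values could be mapped to typed settings with the documented defaults. The setting names and the `parse_positional` helper are illustrative and are not taken from the scripts themselves.

```python
from typing import Dict, List, Union

Value = Union[bool, str, float]


def _to_bool(s: str) -> bool:
    # Only the literal strings "true"/"false" are accepted (case-insensitive).
    if s.lower() not in ("true", "false"):
        raise ValueError(f"expected 'true' or 'false', got {s!r}")
    return s.lower() == "true"


def _to_denoising(s: str) -> str:
    if s not in ("cmr", "destripe"):
        raise ValueError(f"denoising must be 'cmr' or 'destripe', got {s!r}")
    return s


# Hypothetical spec: (name, default, converter) in the STRICT positional order.
POSITIONAL_SPEC = [
    ("debug", False, _to_bool),
    ("concatenate", False, _to_bool),
    ("denoising", "cmr", _to_denoising),
    ("remove_out_channels", True, _to_bool),
    ("remove_bad_channels", True, _to_bool),
    ("max_bad_channel_fraction", 0.5, float),
    ("debug_duration", 30.0, float),
]


def parse_positional(argv: List[str]) -> Dict[str, Value]:
    """Map up to 7 positional strings to settings; missing trailing ones keep defaults."""
    if len(argv) > len(POSITIONAL_SPEC):
        raise ValueError(f"at most {len(POSITIONAL_SPEC)} positional arguments are accepted")
    settings = {name: default for name, default, _ in POSITIONAL_SPEC}
    for raw, (name, _, convert) in zip(argv, POSITIONAL_SPEC):
        settings[name] = convert(raw)
    return settings


print(parse_positional(["true", "false", "destripe"]))
```

Because the arguments are matched by position, a later value (e.g. the debug duration) can only be set by also passing every value before it.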
The scripts also support the same options as optional arguments:
- --debug: Whether to run in DEBUG mode. Default: False
- --concatenate: Whether to concatenate recordings (segments) or not. Default: False
- --denoising {cmr,destripe}: Which denoising strategy to use. Can be 'cmr' (default) or 'destripe'
- --no-remove-out-channels: Do not remove out channels (they are removed by default)
- --no-remove-bad-channels: Do not remove bad channels (they are removed by default)
- --max-bad-channel-fraction: Maximum fraction of bad channels to remove. If more than this fraction of channels is bad, processing is skipped
- --debug-duration: Duration of the clipped recording in debug mode. Default is 30 seconds. Only used if debug is enabled
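The optional flags can be mirrored with Python's standard `argparse` module. This is a hypothetical sketch: the flag names come from the list above, the defaults for `--max-bad-channel-fraction` and `--debug-duration` are taken from the positional description, and the `dest` names are assumptions.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the documented flags, not the scripts' own parser.
    parser = argparse.ArgumentParser(description="Sketch of the documented options")
    parser.add_argument("--debug", action="store_true", help="Run in DEBUG mode")
    parser.add_argument("--concatenate", action="store_true", help="Concatenate recordings (segments)")
    parser.add_argument("--denoising", choices=["cmr", "destripe"], default="cmr")
    # store_false: passing the flag turns OFF a step that is enabled by default
    parser.add_argument("--no-remove-out-channels", dest="remove_out_channels", action="store_false")
    parser.add_argument("--no-remove-bad-channels", dest="remove_bad_channels", action="store_false")
    parser.add_argument("--max-bad-channel-fraction", type=float, default=0.5)
    parser.add_argument("--debug-duration", type=float, default=30.0)
    return parser


def parse_flags(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)


args = parse_flags(["--debug", "--denoising", "destripe", "--no-remove-bad-channels",
                    "--max-bad-channel-fraction", "0.8", "--debug-duration", "30"])
print(args.remove_bad_channels, args.remove_out_channels)  # False True
```

The `--no-...` form keeps the common case (removing out and bad channels) as the default, so a plain invocation needs no flags at all.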
In addition, the scripts accept the following configuration parameters:
- --data-folder: option to modify the path of the data (by default ../data)
- --results-folder: option to modify the path of the results (by default ../results)
- --scratch-folder: option to modify the path of the scratch folder (by default ../scratch), used to store temporary files
- --n-jobs: maximum number of jobs used for parallelization
- --params-file: path to a JSON file to specify parameters
- --params-str: JSON-formatted string with custom parameters
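How the parameter sources relate can be sketched with a small helper. This is a hypothetical illustration: it assumes that a `--params-str` value takes priority over a `--params-file` value when both are given (the actual precedence is not documented here), and it falls back to a default file such as code/processing_params.json when neither is provided.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def resolve_params(params_file: Optional[str] = None,
                   params_str: Optional[str] = None,
                   default_file: str = "code/processing_params.json") -> Dict[str, Any]:
    """Hypothetical helper: return custom parameters if provided, otherwise the defaults."""
    # Assumption: an inline JSON string wins over a file when both are given.
    if params_str is not None:
        return json.loads(params_str)
    path = Path(params_file) if params_file is not None else Path(default_file)
    with path.open() as f:
        return json.load(f)
```

Note that this sketch replaces the defaults wholesale rather than merging keys; whether the real scripts merge partial overrides is not stated in this README.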
The NWB script also accepts the following parameter:
- --electrical-series-path: path to the electrical series to process, e.g. acquisition/ElectricalSeriesAP
This parameter is required if multiple electrical series are available in the NWB file (otherwise an error is thrown listing the available options).
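The rule for `--electrical-series-path` can be sketched as a small selection function. This is a hypothetical illustration that works on a plain list of series paths (for example, paths collected from the NWB file's acquisition group); it is not the script's own code, and the error messages are invented.

```python
from typing import List, Optional


def select_electrical_series(available: List[str], requested: Optional[str] = None) -> str:
    """Hypothetical helper: pick one electrical series path following the documented rule."""
    if requested is not None:
        if requested not in available:
            raise ValueError(f"{requested!r} not found. Available: {available}")
        return requested
    if not available:
        raise ValueError("No electrical series found in the NWB file")
    if len(available) == 1:
        # A single series is unambiguous, so the parameter is not needed
        return available[0]
    # Several candidates and no choice given: fail and list the options
    raise ValueError(
        f"Multiple electrical series found; pass --electrical-series-path with one of: {available}"
    )


series = ["acquisition/ElectricalSeriesAP", "acquisition/ElectricalSeriesLF"]
print(select_electrical_series(series, "acquisition/ElectricalSeriesAP"))
```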
NOTES ON PARAMETERS: If --params-file/--params-str are not specified, default parameters are used (see the code/processing_params.json file).
For example, one could run:
python run_capsule_*.py true false destripe true false 0.8 30
Or:
python run_capsule_*.py --debug --denoising destripe --no-remove-bad-channels \
--max-bad-channel-fraction 0.8 --debug-duration 30
The script produces the following output files in the results folder:
- drift_maps: raster maps for each stream
- postprocessed: postprocessing output for each stream with waveforms, correlograms, ISI histograms, principal components, quality metrics, similarity, spike amplitudes, spike and unit locations, and template metrics. Each folder can be loaded with: we = si.load_waveforms("postprocessed/{stream_name}", with_recording=False)
- spikesorted: raw spike sorting output from KS2.5 for each stream. Each sorting output can be loaded with: sorting_raw = si.load_extractor("spikesorted/{stream_name}")
- curated: pre-curated spike sorting output, with an additional default_qc property (True/False) for each unit. Each pre-curated sorting output can be loaded with: sorting_curated = si.load_extractor("curated/{stream_name}")
- processing_params.json: the processing parameters, following the aind-data-schema metadata schema
- visualization_output.json: convenient file with FigURL links for cloud visualization
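Once a run has finished, the curated output can be reduced to the units that passed the automatic QC. The sketch below assumes a SpikeInterface version that still provides `si.load_extractor` (the loader named above) and uses the sorting methods `get_property` and `select_units`; the helper names are hypothetical. The selection logic is kept separate from loading so it can be checked without real data.

```python
from typing import Any, List


def qc_passing_unit_ids(sorting: Any) -> List[Any]:
    """Hypothetical helper: unit ids whose boolean `default_qc` property is True."""
    qc = sorting.get_property("default_qc")
    return [uid for uid, keep in zip(sorting.unit_ids, qc) if keep]


def load_curated_passing(results_folder: str, stream_name: str):
    """Hypothetical helper: load a curated sorting and keep only QC-passing units."""
    import spikeinterface as si  # imported lazily so the helper above works without it

    sorting = si.load_extractor(f"{results_folder}/curated/{stream_name}")
    return sorting.select_units(qc_passing_unit_ids(sorting))
```

For example, `load_curated_passing("../results", stream_name)` would return a sorting restricted to units with `default_qc` set to True, while the unfiltered output remains available in the curated folder.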
The processing pipeline assumes that FigURL is correctly set up. If you are planning to use this pipeline extensively, please consider providing your own cloud resources (see Create Kachery Zone).
This pipeline is currently used at AIND on the Code Ocean platform.
The main branch includes scripts and resources to run the pipeline locally.
In particular, code/run_capsule_spikeglx.py is designed to run on SpikeGLX datasets, and code/run_capsule_nwb.py is designed to run on an NWB file.
First, let's clone the repo:
git clone https://github.com/AllenNeuralDynamics/aind-capsule-ephys-spikesort-kilosort25-full
cd aind-capsule-ephys-spikesort-kilosort25-full
Next, we need to move the dataset to analyze into the data folder.
For example, we can download an NWB file from DANDI (e.g. this dataset) and move it to the data folder:
mkdir data
mv path-to-download-folder/sub-mouse412804_ses-20200803T115732_ecephys.nwb data
Finally, we can start the container (ghcr.io/allenneuraldynamics/aind-ephys-spikesort-kilosort25-full:latest) from the repo base folder (aind-capsule-ephys-spikesort-kilosort25-full):
chmod +x ./code/run_nwb
docker run -it --gpus all -v .:/capsule --shm-size 8G \
--env KACHERY_ZONE --env KACHERY_CLOUD_CLIENT_ID --env KACHERY_CLOUD_PRIVATE_KEY \
ghcr.io/allenneuraldynamics/aind-ephys-spikesort-kilosort25-full:latest
and run the pipeline:
cd /capsule/code
./run_nwb # + optional parameters (e.g., --debug)
NOTES ON DOCKER RUN:
- The --gpus all flag is required to make the GPU available to the container (and Kilosort).
- The --shm-size 8G flag is required to increase the shared memory size (default is 64M), which is used internally for parallel processing.
- The -v .:/capsule option mounts the current folder to the /capsule folder in the container, so that the data and scripts are available. THE FOLDER IS NOT MOUNTED IN READ-ONLY MODE, so be careful when deleting files in the container.
- The --env KACHERY_ZONE --env KACHERY_CLOUD_CLIENT_ID --env KACHERY_CLOUD_PRIVATE_KEY flags are required to set up the cloud visualization with FigURL (see Notes on visualization for more details).
Use the aind branch for a Code Ocean-ready version.
The environment folder contains a Dockerfile to build the container with all required packages.
The code folder contains the scripts to run the analysis (run_capsule_aind.py).
The script assumes that the data in the data folder is organized as follows:
- there is only one "session" folder
- the "session" folder contains either:
  - the ecephys folder, with a valid Open Ephys folder
  - the ecephys_compressed and the ecephys_clipped folders, created by the openephys_job script in the aind-data-transfer repo.
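The expected layout can be checked before launching a run. The function below is a hypothetical sketch using `pathlib`; it only looks for the folder names listed above and does not validate the Open Ephys content itself.

```python
from pathlib import Path


def check_aind_layout(data_folder: str) -> str:
    """Hypothetical check: report which layout the single session folder follows."""
    sessions = [p for p in Path(data_folder).iterdir() if p.is_dir()]
    if len(sessions) != 1:
        raise ValueError(f"Expected exactly one session folder, found {len(sessions)}")
    session = sessions[0]
    # Layout 1: an ecephys folder holding the Open Ephys data
    if (session / "ecephys").is_dir():
        return "ecephys"
    # Layout 2: compressed + clipped folders produced by the data-transfer job
    if (session / "ecephys_compressed").is_dir() and (session / "ecephys_clipped").is_dir():
        return "compressed+clipped"
    raise ValueError(
        f"{session} contains neither 'ecephys' nor 'ecephys_compressed' + 'ecephys_clipped'"
    )
```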
For local deployment instructions, refer to the Local Deployment section at the end of the page.
Here is a list of the key changes needed to run the pipeline locally:
Code Ocean uses an internal registry of base Docker images. To use the same pipeline locally, the base Docker image in the environment/Dockerfile of the aind branch is changed to:
FROM registry.codeocean.allenneuraldynamics.org/codeocean/kilosort2_5-compiled-base:latest
The first part of the code/run_capsule.py script deals with loading the data.
This part is clearly tailored to the way we store data at AIND (see this section).
In the main branch, we included two extra run_capsule_* scripts, one for SpikeGLX (run_capsule_spikeglx) and one for NWB files (run_capsule_nwb).
In both cases, we assume that the data folder includes a single dataset (either a SpikeGLX generated folder or a single NWB file).
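That single-dataset assumption can be sketched as a lookup that expects exactly one `.nwb` file or exactly one SpikeGLX folder in the data folder. The detection rule here is a hypothetical illustration (a SpikeGLX folder is taken to be any subfolder containing a `.meta` file, since SpikeGLX writes a `.meta` text file next to each `.bin` data file); the actual scripts may locate data differently.

```python
from pathlib import Path
from typing import Tuple


def find_single_dataset(data_folder: str) -> Tuple[str, Path]:
    """Hypothetical helper: return ("nwb", file) or ("spikeglx", folder) for the one dataset."""
    root = Path(data_folder)
    nwb_files = sorted(root.glob("*.nwb"))
    # Assumption: any subfolder containing a SpikeGLX .meta file is a SpikeGLX dataset
    glx_folders = sorted(p for p in root.iterdir() if p.is_dir() and any(p.rglob("*.meta")))
    candidates = [("nwb", f) for f in nwb_files] + [("spikeglx", d) for d in glx_folders]
    if len(candidates) != 1:
        raise ValueError(f"Expected a single dataset in {root}, found {len(candidates)}")
    return candidates[0]
```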
At AIND, we use aind-data-schema to deal with metadata.
The scripts in the main branch do not log metadata with aind-data-schema.