snippet file format

bnaecker edited this page Sep 8, 2015 · 6 revisions

Snippet file format

Snippets are small sections of raw data traces that are either candidate spikes or noise segments used to cluster and sort true spikes. The extract command-line tool finds them and writes them to disk. This document describes the format of the HDF files in which snippets are saved.

Thresholds and channels

The root group contains two datasets. "/thresholds" gives the threshold value used when extracting spikes from each channel. "/extracted-channels" lists the actual channel indices (0-based) from which data was extracted. Note that these are indices into the channels of the "/data" dataset from the original raw data file, not the true channel numbers.

Channel groups

For each channel from which snippets are extracted, there exists a corresponding group in the HDF file. Groups are named "channel-<num>", where the number is a 3-digit, 0-padded number indicating the index of the channel in the "/data" dataset from the original raw data file.
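The naming rule above can be sketched in a few lines of Python. The helper names channel_group_name and channel_index are hypothetical and are not part of the extract tool; they only illustrate the 3-digit, 0-padded convention.

```python
import re

# Matches names such as "channel-004"; exactly three digits, per the format above.
_GROUP_RE = re.compile(r"^channel-(\d{3})$")


def channel_group_name(index: int) -> str:
    """Return the group name for a 0-based channel index, e.g. 4 -> 'channel-004'."""
    if not 0 <= index <= 999:
        raise ValueError(f"channel index {index} does not fit in 3 digits")
    return f"channel-{index:03d}"


def channel_index(group_name: str) -> int:
    """Recover the channel index from a group name, with or without a leading '/'."""
    match = _GROUP_RE.match(group_name.lstrip("/"))
    if match is None:
        raise ValueError(f"not a channel group name: {group_name!r}")
    return int(match.group(1))
```

For instance, channel_group_name(4) gives "channel-004", and channel_index("/channel-063") gives 63.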

Channel datasets

In each channel group, there are four datasets. Let m be the number of candidate spike snippets for a channel, let k be the length of each snippet in samples, and let M be the number of random noise snippets extracted (5000 by default).

  • spike-idx -- size: {m}
    • Indices into the original recording file where each snippet may be found
  • spike-snippets -- size: {m, k}
    • Actual candidate spike snippets.
  • noise-idx -- size: {M}
    • Indices into the original recording where each random snippet was extracted
  • noise-snippets -- size: {M, k}
    • Actual noise snippets
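As a rough illustration of the layout above, the following Python sketch checks that the four datasets in one channel group have mutually consistent shapes. It assumes the third-party h5py package for opening the file; the function names check_channel_group and check_file are hypothetical, and the checker itself only relies on each dataset exposing a .shape tuple.

```python
def check_channel_group(group):
    """Validate dataset shapes in one channel group and return (m, M, k).

    `group` is anything indexable by dataset name whose values expose
    `.shape`, for example an h5py.Group.
    """
    spike_idx_shape = tuple(group["spike-idx"].shape)
    spike_snip_shape = tuple(group["spike-snippets"].shape)
    noise_idx_shape = tuple(group["noise-idx"].shape)
    noise_snip_shape = tuple(group["noise-snippets"].shape)

    m, k = spike_snip_shape              # {m, k}
    n_noise, k_noise = noise_snip_shape  # {M, k}

    if spike_idx_shape != (m,):
        raise ValueError("spike-idx and spike-snippets disagree on m")
    if noise_idx_shape != (n_noise,):
        raise ValueError("noise-idx and noise-snippets disagree on M")
    if k != k_noise:
        raise ValueError("spike and noise snippets differ in length k")
    return m, n_noise, k


def check_file(path, group_name):
    """Open a snippet file read-only and check a single channel group."""
    import h5py  # third-party; imported lazily so the checker above has no dependencies

    with h5py.File(path, "r") as f:
        return check_channel_group(f[group_name])
```

A call such as check_file("2015-01-27a.snip", "channel-004") would return the triple (m, M, k) for that channel, or raise ValueError if the shapes are inconsistent.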

Attributes and metadata

In addition to the above datasets, snippet files contain another dataset, called "channels", which lists the actual channel numbers in sorted order. These are the same numbers that appear in the channel group names, but stored as numeric values.

Finally, a few attributes are also attached to the file itself. These are:

  • array: string
    • The array on which data was originally recorded.
  • gain: float32
    • The gain of the analog-digital conversion stage
  • offset: float32
    • The offset of the analog-digital conversion stage
  • date/time: strings
    • The date and time of the original recording.
  • source-file: string
    • The filename of the original data file from which data was extracted.
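The attributes listed above could be collected into a plain dictionary along the following lines. This is only a sketch: it assumes that "date/time" is stored as two separate attributes named "date" and "time", which this page does not state explicitly, and the function name read_file_attributes is hypothetical. Any mapping with a .get method works as input, such as the .attrs of a file opened with the third-party h5py package.

```python
# Attribute names taken from the list above; "date" and "time" are an
# assumed split of the "date/time" entry.
ATTRIBUTE_NAMES = ("array", "gain", "offset", "date", "time", "source-file")


def read_file_attributes(attrs):
    """Copy the documented file-level attributes into a plain dict.

    Missing attributes are reported as None rather than raising, and
    byte strings are decoded to text.
    """
    metadata = {}
    for name in ATTRIBUTE_NAMES:
        value = attrs.get(name)
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        metadata[name] = value
    return metadata
```

With h5py, this might be used as read_file_attributes(f.attrs), where f is the opened snippet file.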

Example

Below is an example listing of the contents of a snippet file, produced with h5ls. You can also run extract on a data file and open the resulting snippet file in a GUI HDF browser.

$ h5ls --recursive 2015-01-27a.snip
/                        Group
/channel-004             Group
/channel-004/noise-idx   Dataset {5000}
/channel-004/noise-snippets Dataset {5000, 35}
/channel-004/spike-idx   Dataset {6481}
/channel-004/spike-snippets Dataset {6481, 35}
...
/channel-063/spike-snippets Dataset {19628, 35}
/channels                Dataset {60}
/thresholds              Dataset {60}