Releases · neurosity/EEG-GPT
v0.1.0: Initial Release
Overview
We are excited to announce the initial release of our EEG foundation model project, based on the NeuroGPT model by Wenhui Cui et al. This release includes several models trained on EEG data, available at different checkpoints throughout the training process.
Features
- Preprocessing Script: Added preprocess.py to convert CSV or EDF files to NumPy .npy files with various preprocessing steps, including notch filtering and bandpass filtering (see the filtering sketch after this list).
- Parallel Processing: Implemented parallel processing for preprocessing via the --parallel flag.
- TUH EEG Support: Added support for TUH EEG files in preprocessing.
- Experiment Tracking: Integrated wandb for experiment tracking.
- Training and Evaluation Logging: Added CSVLogCallback class for logging training and evaluation metrics to CSV files.
- Distributed Training: Provided train_parallel.sh script for distributed training using PyTorch with multiple GPUs.
- Model Checkpoints: Several models are available from different points within the training process, allowing for comparison and selection based on performance metrics.
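For orientation, the notch and bandpass steps in preprocess.py amount to standard digital filtering. The sketch below is a minimal, hypothetical SciPy example; the function names, sampling rate, and filter orders are assumptions, not the actual preprocess.py implementation:

# Minimal sketch of the notch + bandpass preprocessing steps.
# Hypothetical example; preprocess.py's actual filter design may differ.
import numpy as np
from scipy.signal import iirnotch, butter, filtfilt

def notch(data, freq_hz, fs, quality=30.0):
    # Suppress powerline interference at freq_hz (e.g. 50 or 60 Hz).
    b, a = iirnotch(freq_hz, quality, fs)
    return filtfilt(b, a, data, axis=-1)

def bandpass(data, low_hz, high_hz, fs, order=4):
    # Keep only the low_hz..high_hz band (e.g. 1-48 Hz).
    b, a = butter(order, [low_hz, high_hz], btype="band", fs=fs)
    return filtfilt(b, a, data, axis=-1)

# Example: EEG array shaped (channels, samples) at an assumed 250 Hz rate.
fs = 250.0
eeg = np.random.randn(8, 10 * int(fs))
for f in (50, 60):                      # --notch_filter 50 60
    eeg = notch(eeg, f, fs)
eeg = bandpass(eeg, 1, 48, fs)          # --bandpass_filter 1 48
np.save("session_0001.npy", eeg.astype(np.float32))

Note that filtfilt applies each filter forward and backward, so the filtering is zero-phase and does not shift the signal in time.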
Documentation
- Comprehensive README: Detailed setup instructions, preprocessing details, and example usage.
- External Resources: Links to additional help with tools like tmux.
Models
- Multiple Checkpoints: Models are available at various checkpoints, including:
- Checkpoint at 9,900 steps
- Checkpoint at 25,000 steps
- Checkpoint at 41,100 steps
- Final model at 50,000 steps
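Each checkpoint is stored as a model.safetensors file. Loading one into the model skeleton looks roughly like the snippet below; the checkpoint path is illustrative, taken from the test.py example further down:

# Hypothetical loading sketch; the checkpoint path is illustrative.
from safetensors.torch import load_file
from train_gpt import make_model, get_config

config = dict(get_config())
model = make_model(config)
state_dict = load_file("results/models/upstream/32clen2_embed1024/model_final/model.safetensors")
model.load_state_dict(state_dict)
model.eval()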
How to Use
- Preprocessing: Use preprocess.py to convert your EEG data into the required format.
- Training: Use train_gpt.py with the provided scripts for training on your data.
- Evaluation: Evaluate the models using test_gpt.py to determine the best performing checkpoint for your application.
Example Usage
For preprocessing TUH EEG files (the 50 and 60 Hz notch filters remove European and North American mains interference; the bandpass keeps the 1-48 Hz band):
python3 src/eeg/preprocess.py \
--input_directory edf/ \
--output_directory data/npy_tuh_eeg \
--notch_filter 50 60 \
--bandpass_filter 1 48 \
--verbose \
--tuh_eeg \
--cutoff_samples 18
For training the model:
python src/train_gpt.py \
--training-steps=50000 \
--eval_every_n_steps=100 \
--log-every-n-steps=10 \
--per-device-training-batch-size=32 \
--per-device-validation-batch-size=32 \
--num-workers=32 \
--num_chunks=32 \
--chunk_len=500 \
--chunk_ovlp=50 \
--num-hidden-layers=6 \
--num-encoder-layers=6 \
--run-name=32clen2_embed1024 \
--training-style=CSM_causal \
--embedding-dim=1024 \
--train-data-path=data/npy_tuh_eeg \
--verbose=True
For distributed training:
python -m torch.distributed.launch --nproc_per_node=8 \
src/train_gpt.py \
--training-steps=50000 \
--eval_every_n_steps=100000 \
--log-every-n-steps=100 \
--per-device-training-batch-size=32 \
--per-device-validation-batch-size=32 \
--num-workers=32 \
--num_chunks=32 \
--chunk_len=500 \
--chunk_ovlp=50 \
--num-hidden-layers=6 \
--num-encoder-layers=6 \
--run-name=32clen2_embed1024_multi_gpu \
--training-style=CSM_causal \
--embedding-dim=1024 \
--train-data-path=data/npy_tuh_eeg \
--verbose=True \
&> train_parallel.log
References
- NeuroGPT: Based on the NeuroGPT model by Wenhui Cui et al.
- Neurosity Foundational Model: Inspired by the Neurosity Foundational Model by Jeremy Nixon and AJ Keller.
Acknowledgments
We would like to thank the contributors and the community for their support and feedback. This project is under active development, and we welcome contributions and suggestions.
For more details, please refer to the README and the CHANGELOG.
"""
test.py
Testing of models based on given data. See get_args() for
details on command line arguments
Give it n chunks, n<32.. test the n+1 chunk...
"""
import os

import torch
from numpy import random
from safetensors.torch import load_file
from torch.utils.data import DataLoader

from batcher.downstream_dataset import EEGDataset
from train_gpt import make_model, get_config
if __name__ == '__main__':
    config = dict(get_config())
    model = make_model(config)
    root_path = os.getcwd()

    # e.g. results/models/upstream/32clen2_embed1024/model_final/model.safetensors
    model_path = os.path.join(os.getcwd(), config["log_dir"], "model_final")
    state_dict = load_file(os.path.join(model_path, "model.safetensors"))
    model.load_state_dict(state_dict=state_dict)

    # Collect the preprocessed .npy files and drop files smaller than 0.2 MB.
    train_data_path = config["train_data_path"]
    files = [os.path.join(train_data_path, f) for f in os.listdir(train_data_path) if f.endswith('.npy')]
    files = [f for f in files if os.path.getsize(f) >= 0.2 * 1024 * 1024]

    # Reproduce the 90/10 train/validation split and evaluate on the validation files.
    random.shuffle(files)
    split_index = int(len(files) * 0.9)
    train_files = files[:split_index]
    validation_files = files[split_index:]

    test_dataset = EEGDataset(
        validation_files,
        sample_keys=['inputs', 'attention_mask'],
        chunk_len=config["chunk_len"],
        num_chunks=config["num_chunks"],
        ovlp=config["chunk_ovlp"],
        root_path=root_path,
        gpt_only=not config["use_encoder"],
        normalization=config["do_normalization"],
    )

    # Run a single batch through the model in inference mode.
    model.eval()
    test_loader = DataLoader(test_dataset, batch_size=1)
    with torch.no_grad():
        output = model(next(iter(test_loader)), prep_batch=True)
    print("Predictions: ", output['outputs'])
    print("Shape: ", output['outputs'].shape)
{
// Use IntelliSense to learn about possible attributes.
// Hover to view descriptions of existing attributes.
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
"version": "0.2.0",
"configurations": [
{
"name": "Python: EEG Utils",
"type": "python",
"request": "launch",
"program": "${workspaceFolder}/src/eeg/utils.py",
"args": [
"--input_directory",
"data/crown/sessions",
"--find_latest_timestamp",
]
},
{
"name": "Python: Validate Numpy Arrays",
"type": "python",
"request": "launch",
"program": "${workspaceFolder}/src/eeg/validate.py",
"args": [
"--path",
"data/npy_tuh_eeg",
"--parallel"
]
},
{
"name": "Python: TUH EEG DEBUG",
"type": "python",
"request": "launch",
"program": "${workspaceFolder}/src/eeg/preprocess.py",
"args": [
"--input_directory",
"data/tuh_eeg",
"--output_directory",
"data/npy_tuh_eeg_test",
"--notch_filter",
"50",
"60",
"--bandpass_filter",
"1",
"48",