Commit

merge codebase changes on main into default (#92)
* wip: dataloader first draft

* Fixing train, val, and test path

* Added initial project structure

Added a set of directories with (mostly) empty/dummy .py files for now, so that everyone can see how the project will be structured. On top of the present directories, there will also be a datasets and a logs directory, the latter created dynamically at train or validation time.

* rename file, remove one-hot encode

* Revert "wip: dataloader first draft"

* Updating component loading section

* sequence dataloader baseline model

* fixing a couple typos

* Delete src/metrics directory

Deleting metrics directory as it was decided we'll have only one file with all metrics.

* Added refactored DDPM and UNet from notebook V2

Refactored Lucas's DDPM, UNet and units and added them as PL modules.

* Update diffusion.py

Added "instantiate_from_config" import.

* Update ddpm.py

Added nucleotides as a parameter with a default of 4 to the sample method.

* wip: separate train/val/test subclasses

* Delete codebase/src/data directory

* Updated PL dataloader

* placeholder test file

* Update unet_lucas.py

Added default function import.

* Added matching dummy test files

* complete: initial dataloader

* Added config template

Designed config template mainly for PL-related parameters. Keeping multiprocessing arguments for multi-GPU for the first test, which we'll change to multi-node. Diffusion and UNet parameters can easily vary.

* Delete dummy_config.yaml

* delete test_diffusion

* fix: fixed function naming convention

* feat: Add initial CI proposal

* feat: Add a simple pyproject config file

* wip: train.py + configs

* config folder structure update

* fix datapath param of datasets

* add additional sequence encoding schemes + separate transforms

* add tests for sequence dataloader

* add additional asserts for data batches

* check sequence lengths in datasets

* add more tests for invalid data

* style: run black

* feat: Refactor schedules and remove time_difference

* feat: Add type hints to schedule utility functions

* feat: Refactor noise schedule fn

* feat: refactor q_sample fn

* feat: add type hints to q_sample

* feat: drop bit_scale

* feat: run black and switch to torch.log

* feat: drop t_index

* feat: refactor p_sample fn

* feat: refactor p_sample_loop fn

* feat: refactor sample fn

* feat: refactor training_step fn

* feat(ci): Add `codebase` branch to CI

Based on discussion with @mateibejan1, running the tests on the `codebase` branch is also essential. It's the branch which is under heavy development and we should ensure all tests pass before we merge into `codebase` as well.

* reqs: add `pandas` to requirements.txt

* reqs: add `torch` to requirements.txt

* reqs: bump torch to `1.11.0` for compatibility

* fix(ci): run pytest as a module

* reqs: pin torchvision to `0.12.0`

* reqs: add `pytorch-lightning`

* fix: failing CI tests for dataloader across platforms

* fix: failing CI tests for dataloader - wrap transforms

* fix: failing CI tests for dataloader - no multiprocessing for transforms

* Add Lucas' conditioned UNet

* Update EMA with Lucas' version

* Added mean_flat util from P2 paper

* Added P2 weighting skeleton.

Need to figure out how to use P2 weighting on DNA data.

* misc: create a PR template

Fixes #51

* misc: add doc strings and type hints to the PR template

cc: @mateibejan1

* Add files via upload

* Add files via upload

* Add files via upload

Updated DDPM with Noah's refactored notebook version. Preemptively added p2_weighting; need to figure out if/how it works on bit sequences.

* Add files via upload

* Add files via upload

* style: run black

* feat: add type hints to `utils/misc.py`

* feat: add type hints to utils/metrics

* feat: add type hints to utils/schedules

* feat: add type hints to unet_bitdiffusion

* feat: add type hints to unet_lucas

* feat: add type hints to ddim

* feat: add type hints to seq dataloader

* feat: add type hints to unet_lucas_cond

* Delete ddim.py

Deprecated.

* Delete unet_bitdiffusion.py

Deprecated.

* Update unet_conditional.yaml

Changed default number of timesteps from 1000 to 200.

* Update unet_conditional.yaml

Moved unet_config params inside the diffusion model's params, so it mirrors the hierarchical relationship between the diffusion class and the unet class.

* Update misc.py

Minor dict property name changes.

* Update diffusion.py

* Update diffusion.py

* Update default.yaml

* Update unet_lucas.py

* initial test lucas unet

* add test vq

* ddm

* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci

* merge codebase-hydra-restructure into main (#90)

* WIP new folder structure

* ema parameter fix

* Base dataloader instantiation with full hydra config successful, missing full params

* Update sequence_dataloader.py

* Remove outputs folder, update .gitignore

* Update network.py

* Update sequence_datamodule.py

* Update sequence_datamodule.py

---------

Co-authored-by: cmvcordova <[email protected]>
Co-authored-by: cmvcordova <[email protected]>
Co-authored-by: Matei Bejan <[email protected]>

---------

Co-authored-by: ssenan <[email protected]>
Co-authored-by: Matei Bejan <[email protected]>
Co-authored-by: Bendidi Ihab <[email protected]>
Co-authored-by: Saurav Maheshkar <[email protected]>
Co-authored-by: Jan Sobotka <[email protected]>
Co-authored-by: ceziegler <[email protected]>
Co-authored-by: jamesthesnake <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: cmvcordova <[email protected]>
Co-authored-by: cmvcordova <[email protected]>
11 people authored Mar 7, 2023
1 parent 73cf225 commit da6599f
Showing 57 changed files with 3,400 additions and 0 deletions.
22 changes: 22 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,22 @@
## What does this PR do?

<!-- Please include a summary of what the PR aims to do -->

Fixes `#<issue_number>`

### Checklist

- [ ] This change is discussed in a Github **issue/discussion** (`#<link>`).
- [ ] Did you **document** your changes? (if necessary)
- [ ] Have you **documented (doc strings)** your code? (if necessary)
- [ ] Have you added type hints to your code? (if necessary)
- [ ] Did you write any **new necessary high-coverage tests**?
- [ ] Did you verify new and **existing tests pass** locally with your changes?

### PR Review

<!-- Tag Relevant People here -->

Reviewers:

### Any Other Comments
36 changes: 36 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,36 @@
name: "Continuous Integration & Testing"

on:
push:
branches:
- main
- codebase
pull_request:
branches:
- main
- codebase

jobs:
build:
runs-on: ${{ matrix.os }}

strategy:
matrix:
python-version: ["3.8", "3.9", "3.10"]
os: [ubuntu-latest, windows-latest, macos-latest]

steps:
- uses: actions/checkout@v3
- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
cache: "pip"

- name: Install dependencies
run: |
python -m pip install --upgrade pip wheel setuptools
python -m pip install -r requirements.txt
- name: Test with PyTest
run: |
python -m pytest -v .
156 changes: 156 additions & 0 deletions .idea/workspace.xml

Some generated files are not rendered by default.

27 changes: 27 additions & 0 deletions pyproject.toml
@@ -0,0 +1,27 @@
[tool.pylint.messages_control]
disable = [
    "no-member",
    "too-many-arguments",
    "too-few-public-methods",
    "no-else-return",
    "duplicate-code",
    "too-many-branches",
    "redefined-builtin",
    "dangerous-default-value",
]

[tool.pylint.format]
max-line-length = 88

[tool.black]
line-length = 88

[[tool.mypy.overrides]]
module = "*"  # overrides sections require a module key
ignore_missing_imports = true

[tool.pytest.ini_options]
testpaths = ["tests"]
filterwarnings = [
    "ignore::DeprecationWarning",
    "ignore::UserWarning",
]
8 changes: 8 additions & 0 deletions requirements.txt
@@ -0,0 +1,8 @@
black==22.10.0
mypy==0.982
pandas==1.3.5
pylint==2.15.4
pytest==7.1.3
pytorch-lightning==1.7.0
torch==1.11.0
torchvision==0.12.0
96 changes: 96 additions & 0 deletions src/README.md
@@ -0,0 +1,96 @@
## Config Structure

The current (provisional) config folder structure is as follows:

```
├── configs
│   ├── callbacks
│   │   └── default.yaml
│   ├── dataset
│   │   └── sequence.yaml
│   ├── logger
│   │   └── wandb.yaml
│   ├── model
│   │   ├── unet.yaml
│   │   ├── unet_conditional.yaml
│   │   └── unet_bitdiffusion.yaml
│   ├── paths
│   │   └── default.yaml
│   └── train.yaml
```

As new items (models, datasets, etc.) are added, a corresponding config file can be included so that only minimal parameter changes are needed across experiments.

## How to Run

Below is the main training config file, which can be edited to fit whatever training setup is desired. Every entry under `defaults` is defined in one of the config files listed above.

<details>
<summary><b>Training config</b></summary>

```yaml
defaults:
  - model: unet_conditional
  - dataset: sequence
  - logger: wandb
  - callbacks: default

ckpt: null # path to checkpoint
seed: 42
batch_size: 32
devices: gpu
benchmark: True
ckpt_dir: # path still to be defined
accelerator: gpu
strategy: ddp
min_epochs: 5
max_epochs: 100000
gradient_clip_val: 1.0
accumulate_grad_batches: 1
log_every_n_steps: 1
check_val_every_n_epoch: 1 #for debug purposes
save_last: True
precision: 32
```
</details>
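
Most of the flat keys above map directly onto `pytorch_lightning.Trainer` arguments. A minimal sketch of that mapping, assuming the composed config is available as `cfg` (this `Trainer` call is illustrative, not the repo's actual training script):

```python
import pytorch_lightning as pl

# Illustrative mapping of the flat config keys above onto a Trainer;
# argument names follow pytorch-lightning 1.7.0, as pinned in requirements.txt.
trainer = pl.Trainer(
    accelerator=cfg.accelerator,
    strategy=cfg.strategy,
    min_epochs=cfg.min_epochs,
    max_epochs=cfg.max_epochs,
    gradient_clip_val=cfg.gradient_clip_val,
    accumulate_grad_batches=cfg.accumulate_grad_batches,
    log_every_n_steps=cfg.log_every_n_steps,
    check_val_every_n_epoch=cfg.check_val_every_n_epoch,
    precision=cfg.precision,
    benchmark=cfg.benchmark,
)
```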
### Using Hydra config in a Jupyter notebook

Including the following at the beginning of a Jupyter notebook will initialize Hydra, compose the training config defined above, and print it:
```python
from hydra import compose, initialize
from omegaconf import OmegaConf

initialize(version_base=None, config_path="./src/configs")
cfg = compose(config_name="train")
print(OmegaConf.to_yaml(cfg))
```

When composing the config it is possible to override any of the default assignments.
Here is an example of overriding `batch_size` and `seed`:

```python
from hydra import compose, initialize
from omegaconf import OmegaConf

initialize(version_base=None, config_path="./src/configs")
cfg = compose(config_name="train", overrides=["batch_size=64", "seed=1"])
print(OmegaConf.to_yaml(cfg))
```

More information on override syntax is available in the Hydra documentation:
https://hydra.cc/docs/advanced/override_grammar/basic/

For more information on Hydra initialization in Jupyter, see:
https://github.com/facebookresearch/hydra/blob/main/examples/jupyter_notebooks/compose_configs_in_notebook.ipynb
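
One notebook-specific caveat: `initialize()` registers global state and may only be called once per process, so re-running the cell in a long-lived kernel fails unless that state is cleared first. A minimal sketch:

```python
from hydra.core.global_hydra import GlobalHydra

# Clear Hydra's global state before re-initializing in the same kernel;
# otherwise a second initialize() call raises an error.
if GlobalHydra.instance().is_initialized():
    GlobalHydra.instance().clear()
```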

## Still To Do:

- Alter training script to accommodate all logs we wish to track using wandb
- Decide on default hyperparameters in train.yaml
- Further alter config folder structure to best suit our training and testing practices
- Define default paths for dataset within path config file so that directory can be referenced across various other configs
- Hydra config logs are currently written to the src directory, creating the folder structure ./src/outputs/YYYY-MM-DD/HH-MM-SS. If we wish to alter this, it can be done in a Hydra config file.
Binary file added src/__pycache__/config.cpython-39.pyc
Binary file not shown.
22 changes: 22 additions & 0 deletions src/config.py
@@ -0,0 +1,22 @@
### file to include dataclass definition
from dataclasses import dataclass, field
from typing import Any, List

from hydra.core.config_store import ConfigStore

### needs overhaul with new folder structure
### ignore for now
@dataclass
class DNADiffusionConfig:
    defaults: List[Any] = field(
        default_factory=lambda: [
            "_self_",
            {"optimizer": "adam"},
            {"lr_scheduler": "MultiStepLR"},
            {"unet": "unet_conditional"},
        ]
    )
    _target_: str = "__main__.trgt"  # dotpath describing location of callable
    timesteps: int = 200
    use_fp16: bool = True
    criterion: str = "torch.nn.MSELoss"  # dotpath; utils.metrics.MetricName
    use_ema: bool = True
    ema_decay: float = 0.999
    lr_warmup: int = 5000
    image_size: int = 200
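
The `ConfigStore` import above is not yet used. A minimal sketch of how this dataclass would typically be registered with Hydra (the entry name here is hypothetical):

```python
cs = ConfigStore.instance()
# Register the structured config so it can be selected by name,
# e.g. compose(config_name="dnadiffusion").
cs.store(name="dnadiffusion", node=DNADiffusionConfig)
```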
13 changes: 13 additions & 0 deletions src/configs/callbacks/default.yaml
@@ -0,0 +1,13 @@
save_checkpoint:
  target: pytorch_lightning.callbacks.ModelCheckpoint
  params:
    path: # to be entered
    monitor: val_loss
    mode: min
    save_top_k: 10
    save_last: True

learning_rate:
  target: pytorch_lightning.callbacks.LearningRateMonitor
  params:
    logging_interval: epoch
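
The `target`/`params` layout above matches the `instantiate_from_config` helper referenced in the commit messages. A minimal sketch of how such a helper typically resolves these entries (this body is an assumption, not the repo's exact implementation):

```python
import importlib

def instantiate_from_config(config: dict):
    """Resolve the dotted `target` path and call it with `params` as kwargs."""
    module_path, attr_name = config["target"].rsplit(".", 1)
    cls = getattr(importlib.import_module(module_path), attr_name)
    return cls(**(config.get("params") or {}))

# e.g. building every callback defined in the YAML above:
# callbacks = [instantiate_from_config(c) for c in callbacks_cfg.values()]
```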