merge codebase changes on main into default (#92)
* wip: dataloader first draft
* Fixing train, val, and test path
* Added initial project structure
  Added a bunch of directories with (mostly) empty/dummy .py files for now, so that everyone can see how the project will be structured. On top of the present directories, there will also be a datasets and a logs directory, the latter being dynamically created at training or validation time.
* rename file, remove one-hot encode
* Revert "wip: dataloader first draft"
* Updating component loading section
* sequence dataloader baseline model
* fixing a couple typos
* Delete src/metrics directory
  Deleting metrics directory as it was decided we'll have only one file with all metrics.
* Added refactored DDPM and UNet from notebook V2
  Refactored Lucas's DDPM, UNet and units and added them as PL modules.
* Update diffusion.py
  Added "instantiate_from_config" import.
* Update ddpm.py
  Added nucleotides as a parameter with a default of 4 to the sample method.
* wip: separate train/val/test subclasses
* Delete codebase/src/data directory
* Updated PL dataloader
* placeholder test file
* Update unet_lucas.py
  Added default function import.
* Added matching dummy test files
* complete: initial dataloader
* Added config template
  Designed config template mainly for PL-related parameters. Keeping multiprocessing arguments for multi-GPU for the first test, which we'll change to multi-node. Diffusion and UNet parameters can easily vary.
* Delete dummy_config.yaml
* delete test_diffusion
* fix: fixed function naming convention
* feat: Add initial CI proposal
* feat: Add a simple pyproject config file
* wip: train.py + configs
* config folder structure update
* fix datapath param of datasets
* add additional sequence encoding schemes + separate transforms
* add tests for sequence dataloader
* add additional asserts for data batches
* check sequence lengths in datasets
* add more tests for invalid data
* style: run black
* feat: Refactor schedules and remove time_difference
* feat: Add type hints to schedule utility functions
* feat: Refactor noise schedule fn
* feat: refactor q_sample fn
* feat: add type hints to q_sample
* feat: drop bit_scale
* feat: run black and switch to torch.log
* feat: drop t_index
* feat: refactor p_sample fn
* feat: refactor p_sample_loop fn
* feat: refactor sample fn
* feat: refactor training_step fn
* feat(ci): Add `codebase` branch to CI
  Based on discussion with @mateibejan1, running the tests on the `codebase` branch is also essential. It's the branch which is under heavy development and we should ensure all tests pass before we merge into `codebase` as well.
* reqs: add `pandas` to requirements.txt
* reqs: add `torch` to requirements.txt
* reqs: bump torch to `1.11.0` for compatibility
* fix(ci): run pytest as a module
* reqs: add torchvision to `0.12.0`
* reqs: add `pytorch-lightning`
* fix: failing CI tests for dataloader across platforms
* fix: failing CI tests for dataloader - wrap transforms
* fix: failing CI tests for dataloader - no multiprocessing for transforms
* Add Lucas' conditioned UNet
* Update EMA with Lucas' version
* Added mean_flat util from P2 paper
* Added P2 weighting skeleton. Need to figure out how to use P2 weighting on DNA data.
* misc: create a PR template
  Fixes #51
* misc: add doc strings and type hints to the PR template
  cc: @mateibejan1
* Add files via upload
* Add files via upload
* Add files via upload
  Updated DDPM with Noah's refactored notebook version. Preemptively added p2_weighting, need to figure out if/how it works on bit sequences.
* Add files via upload
* Add files via upload
* style: run black
* feat: add type hints to `utils/misc.py`
* feat: add type hints to utils/metrics
* feat: add type hints to utils/schedules
* feat: add type hints to unet_bitdiffusion
* feat: add type hints to unet_lucas
* feat: add type hints to ddim
* feat: add type hints to seq dataloader
* feat: add type hints to unet_lucas_cond
* Delete ddim.py
  Deprecated.
* Delete unet_bitdiffusion.py
  Deprecated.
* Update unet_conditional.yaml
  Changed default number of timesteps from 1000 to 200.
* Update unet_conditional.yaml
  Moved unet_config params inside the diffusion models params, so it mirrors the hierarchical relationship between the diffusion class and the unet class.
* Update misc.py
  Minor dict property name changes.
* Update diffusion.py
* Update diffusion.py
* Update default.yaml
* Update unet_lucas.py
* initial test lucas unet
* add test vq
* ddm
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* merge codebase-hydra-restructure into main (#90)
  * WIP new folder structure
  * ema parameter fix
  * Base dataloader instantiation with full hydraconfig successful, missing full params
  * Update sequence_dataloader.py
  * Remove outputs folder, update .gitignore
  * Update network.py
  * Update sequence_datamodule.py
  * Update sequence_datamodule.py

  ---------

  Co-authored-by: cmvcordova <[email protected]>
  Co-authored-by: cmvcordova <[email protected]>
  Co-authored-by: Matei Bejan <[email protected]>

---------

Co-authored-by: ssenan <[email protected]>
Co-authored-by: Matei Bejan <[email protected]>
Co-authored-by: Bendidi Ihab <[email protected]>
Co-authored-by: Saurav Maheshkar <[email protected]>
Co-authored-by: Jan Sobotka <[email protected]>
Co-authored-by: ceziegler <[email protected]>
Co-authored-by: jamesthesnake <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: cmvcordova <[email protected]>
Co-authored-by: cmvcordova <[email protected]>