
Masked autoencoder pre-training for virtual staining models #67

Merged: 56 commits merged into main from fcmae on Apr 8, 2024

Conversation

@ziw-liu ziw-liu (Collaborator) commented Feb 8, 2024

No description provided.

@edyoshikun edyoshikun self-requested a review February 24, 2024 19:04
@ziw-liu ziw-liu added the enhancement (New feature or request) label Mar 28, 2024
@ziw-liu ziw-liu marked this pull request as ready for review March 28, 2024 23:13
@edyoshikun edyoshikun (Contributor) left a comment


This branch has worked well for both 2.2D and 3D LUNeXT training/testing, so I am happy with it:

  • Combined and concatenated data loaders work (see the sketch below)
  • Augmentations show no issues
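
For reference, here is a minimal sketch of what "concatenated" vs. "combined" loading means, written with plain PyTorch and Lightning primitives (assuming Lightning ≥ 2.0); it is a hypothetical illustration, not the data modules added in this PR:

```python
# Hypothetical sketch of "concatenated" vs. "combined" loading using plain
# PyTorch/Lightning primitives; not the data modules added in this PR.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset
from lightning.pytorch.utilities import CombinedLoader

# Two toy datasets standing in for different source datasets/FOVs.
ds_a = TensorDataset(torch.randn(8, 1, 32, 32))
ds_b = TensorDataset(torch.randn(12, 1, 32, 32))

# "Concatenated": one dataset, one loader, batches mix samples from both sources.
concat_loader = DataLoader(ConcatDataset([ds_a, ds_b]), batch_size=4, shuffle=True)
for (batch,) in concat_loader:
    pass  # batch: (4, 1, 32, 32), drawn from the union of ds_a and ds_b

# "Combined": separate loaders iterated together, e.g. for multi-dataset validation.
combined = CombinedLoader(
    {"a": DataLoader(ds_a, batch_size=4), "b": DataLoader(ds_b, batch_size=4)},
    mode="max_size_cycle",
)
for batch, batch_idx, dataloader_idx in combined:
    pass  # batch: {"a": ..., "b": ...}, one sub-batch per dataset
```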

@mattersoflight (Member) commented Apr 8, 2024

@ziw-liu Looks like this is now the de facto branch for both training recipes: end-to-end training and pre-training + fine-tuning. Can you confirm if that is the case? Please merge in main if the merge won't break the current experiments.

Combined and concatenated data loaders are also valuable for infection phenotyping work. cc: @Soorya19Pradeep

@ziw-liu (Collaborator, Author) commented Apr 8, 2024

> Looks like this is now the de facto branch for both training recipes: end-to-end training and pre-training + fine-tuning. Can you confirm if that is the case? Please merge in main if the merge won't break the current experiments.

This is the case. However, note that I didn't do comprehensive backwards-compatibility testing, so it could break previously trained models.
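
For illustration, the two recipes can be sketched with placeholder modules; `Encoder` and `VirtualStainingNet` below are hypothetical stand-ins, not this repository's models:

```python
# Hypothetical sketch of the two training recipes; the classes below are
# placeholders, not this repository's models.
import torch
from torch import nn


class Encoder(nn.Module):
    """Stand-in for the convolutional encoder shared by both recipes."""

    def __init__(self) -> None:
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.GELU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.features(x)


class VirtualStainingNet(nn.Module):
    """Stand-in for an encoder-decoder virtual staining model."""

    def __init__(self) -> None:
        super().__init__()
        self.encoder = Encoder()
        self.decoder = nn.Conv2d(16, 1, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


# Recipe 1: end-to-end training, everything starts from random initialization.
model = VirtualStainingNet()

# Recipe 2: pre-training + fine-tuning, the encoder is initialized from a
# masked-autoencoder pre-training run before supervised fine-tuning.
pretrained_encoder = Encoder()  # stand-in for an FCMAE-pretrained encoder
model.encoder.load_state_dict(pretrained_encoder.state_dict())
```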

@ziw-liu ziw-liu merged commit 0536d29 into main Apr 8, 2024
3 checks passed
@ziw-liu ziw-liu deleted the fcmae branch April 8, 2024 16:22
@mattersoflight (Member) commented Apr 8, 2024

> it could break previously trained models.

👍🏼 Since we are tracking the key hyper-parameters with configs, we can retrain the models we need.
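
A toy sketch of what config-tracked retraining means in practice (the file name and schema are hypothetical, not this repository's actual config format):

```python
# Hypothetical illustration of retraining from a tracked config; the file name
# and schema are placeholders, not this repository's config format.
import json

config = {"model": {"backbone": "fcmae", "in_channels": 1}, "trainer": {"max_epochs": 50}}
with open("train_config.json", "w") as f:
    json.dump(config, f, indent=2)

# Later, reload the same hyper-parameters to retrain an equivalent model.
with open("train_config.json") as f:
    restored = json.load(f)
assert restored == config
```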

edyoshikun added a commit that referenced this pull request Jun 12, 2024
* refactor data loading into its own module

* update type annotations

* move the logging module out

* move old logging into utils

* rename tests to match module name

* bump torch

* draft fcmae encoder

* add stem to the encoder

* wip: masked stem layernorm

* wip: patchify masked features for linear

* use mlp from timm

* hack: POC training script for FCMAE

* fix mask for fitting

* remove training script

* default architecture

* fine-tuning options

* fix cli for finetuning

* draft combined data module

* fix import

* manual validation loss reduction

* update linting
new black version has different rules

* update development guide

* update type hints

* bump iohub

* draft ctmc v1 dataset

* update tests

* move test_data

* remove path conversion

* configurable normalizations (#68)

* initial commit adding the normalization.

* adding dataset_statistics to each fov to facilitate the configurable augmentations

* fix indentation

* ruff

* test preprocessing

* remove redundant field

* cleanup

---------

Co-authored-by: Ziwen Liu <[email protected]>

* fix ctmc dataloading

* add example ctmc v1 loading script

* changing the normalization and augmentations default from None to empty list.

* invert intensity transform

* concatenated data module

* subsample videos

* livecell dataset

* all sample fields are optional

* fix multi-dataloader validation

* lint

* fixing preprocessing for varying array shapes (i.e. the aics dataset)

* update loading scripts

* fix CombineMode

* compose normalizations for predict and test stages

* black

* fix normalization in example config

* fix collate when multi-sample transform is not used

* ddp caching fixes

* fix caching when using combined loader

* move log values to GPU before syncing
Lightning-AI/pytorch-lightning#18803

* removing normalize_source from configs.

* typing fixes

* fix test data path

* fix test dataset

* add docstring for ConcatDataModule

* format

---------

Co-authored-by: Eduardo Hirata-Miyasaki <[email protected]>
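
Several of the commits above ("draft fcmae encoder", "wip: patchify masked features for linear", "fix mask for fitting") build up the fully convolutional masked autoencoder (FCMAE) pre-training path. Below is a minimal, hypothetical sketch of the random patch-masking idea behind it, not the encoder implemented in this PR:

```python
# Hypothetical sketch of FCMAE-style random patch masking; not the encoder
# implemented in this PR.
import torch


def random_patch_mask(
    batch: int, height: int, width: int, patch: int = 16, mask_ratio: float = 0.5
) -> torch.Tensor:
    """Keep-mask of shape (B, 1, H, W): 1 where patches stay visible, 0 where masked."""
    gh, gw = height // patch, width // patch
    noise = torch.rand(batch, gh * gw)
    n_keep = int(gh * gw * (1 - mask_ratio))
    # Rank the noise per sample and keep the n_keep lowest-ranked patches.
    keep = noise.argsort(dim=1).argsort(dim=1) < n_keep
    mask = keep.reshape(batch, 1, gh, gw).float()
    # Upsample the patch grid back to pixel resolution.
    return mask.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)


x = torch.randn(2, 1, 64, 64)                                  # phase images
mask = random_patch_mask(2, 64, 64, patch=16, mask_ratio=0.5)
x_visible = x * mask                                           # encoder sees only visible patches
# The decoder reconstructs the hidden patches; the loss is computed on the
# masked region, e.g. ((recon - x) ** 2 * (1 - mask)).mean()
```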
Labels
enhancement (New feature or request)

Development
Merging this pull request may close: Variable input size training and data pooling

3 participants