added vq_vae_accelerate notebook and trained vq_vae model model4cells #101

Merged
cameronraysmith merged 1 commit into main from latent-representation
Mar 21, 2023

Conversation

Collaborator

@ttunja ttunja commented Mar 12, 2023

This pull request merges vq_vae into the accelerate notebook so that we can try running diffusion in a latent space on multiple GPUs. Additionally, some functions were updated and the vq_vae model was added to the data section of the dnadiffusion directory.

@cameronraysmith cameronraysmith added the model modifies model code in the main package label Mar 12, 2023
@cameronraysmith cameronraysmith added this to the 0.0.1 milestone Mar 12, 2023
@cameronraysmith cameronraysmith linked an issue Mar 12, 2023 that may be closed by this pull request
Collaborator

@cameronraysmith cameronraysmith left a comment

Thanks @ttunja !

Do we need to keep https://github.com/pinellolab/DNA-Diffusion/blob/a407f04ac8e7fed8efe46ad571f0107e7267b886/dnadiffusion/data/model4cells_train_split_3_50_dims.pkl in the git repo?
Can it be regenerated from code that is currently in the repository?
Does regenerating it take a very long time?

The dnadiffusion folder will be moving to src and contains the code of a Python package. We do not plan to store binary files such as model checkpoints there in the long term. We'll be setting up a system to keep data and model artifacts in S3 soon.

@LucasSilvaFerreira
Collaborator

@ssenan What is the best way to adapt this code to our new codebase (instead of using a notebook)?

@ssenan
Collaborator

ssenan commented Mar 14, 2023

@LucasSilvaFerreira I have a couple of PRs coming this week that update the whole codebase to use PyTorch Lightning / hydra-zen. From there we can create a couple of new scripts/configs that capture VectorQuantizer, VectorQuantizerEMA, and the encoder/decoder model.

I think that this will make it easier to compare how this is performing relative to the current model, and also makes it easier to update the architecture if we need to. This is probably good to merge in after the changes @cameronraysmith suggested are implemented, as it's probably one of the last notebooks floating around.
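
For readers unfamiliar with the terms above, here is a minimal sketch of what a vector quantizer and an EMA-style codebook update do. It is an illustration under stated assumptions, not the code in this repository: the function names `quantize` and `ema_update` are hypothetical, plain Python lists stand in for tensors, and the EMA step is a simplified variant that moves each code toward the mean of the vectors assigned to it in the current batch.

```python
# Illustrative sketch only; NOT the DNA-Diffusion implementation.
# All names here (quantize, ema_update) are hypothetical.
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    vectors: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Map each encoder output vector to its nearest codebook entry."""
    indices = [
        min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        for v in vectors
    ]
    quantized = [list(codebook[i]) for i in indices]
    return indices, quantized


def ema_update(
    codebook: Sequence[Sequence[float]],
    vectors: Sequence[Sequence[float]],
    indices: Sequence[int],
    decay: float = 0.99,
) -> List[List[float]]:
    """Simplified EMA update: move each used code toward the batch mean
    of the vectors assigned to it; unused codes are left unchanged."""
    updated = [list(c) for c in codebook]
    for k in range(len(codebook)):
        assigned = [v for v, i in zip(vectors, indices) if i == k]
        if not assigned:
            continue
        mean = [sum(col) / len(assigned) for col in zip(*assigned)]
        updated[k] = [decay * c + (1 - decay) * m for c, m in zip(codebook[k], mean)]
    return updated


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    vectors = [[0.1, -0.1], [0.9, 1.2], [0.8, 0.8]]
    indices, quantized = quantize(vectors, codebook)
    print(indices, quantized)
    print(ema_update(codebook, vectors, indices, decay=0.5))
```

In a gradient-trained quantizer the codebook would instead be updated through a codebook loss; the EMA variant replaces that loss term with a running-average update like the one sketched here.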

@cameronraysmith
Collaborator

@ttunja we can go ahead and merge this since the model is ultimately a small file.
We should plan to address this in #106. Then we can remove the file from the repository.

@cameronraysmith cameronraysmith self-requested a review March 21, 2023 22:34
Collaborator

@cameronraysmith cameronraysmith left a comment

Will follow up in #106.

Collaborator

see #106

@cameronraysmith cameronraysmith merged commit 2a5e561 into main Mar 21, 2023
@cameronraysmith cameronraysmith deleted the latent-representation branch March 21, 2023 22:35
@ttunja
Collaborator Author

ttunja commented Mar 23, 2023

@cameronraysmith It took around 2-3h to train vq_vae (if I remember correctly, since it was done by @noahweber1). We will follow up in #106.

@ssenan are the upcoming changes (the codebase update) all notebooks or Python scripts? When can we start writing scripts instead of notebooks?
@LucasSilvaFerreira why can't we compare everything in this notebook? That was the reason I implemented it in your accelerate notebook; otherwise I would have used my own code.

@noahweber1
Collaborator

@cameronraysmith I already merged a notebook that trains a VQ_VAE for this, which makes it a full stable diffusion setup:

https://github.com/pinellolab/DNA-Diffusion/blob/main/notebooks/experiments/conditional_diffusion/VQ_VAE_LATENT_SPACE_WITH_METRICS.ipynb

@ttunja

@cameronraysmith
Collaborator

Many thanks @ttunja @noahweber1

Labels
model modifies model code in the main package
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Training a VQ-VAE for DNA-sequences for stable diffusion
5 participants