Training a VQ-VAE for DNA-sequences for stable diffusion #16

lucapinello · 2022-10-15T12:59:44Z

Current notebook here: https://github.com/pinellolab/DNA-Diffusion/blob/latent-space-representation/vq_vae_diffusion.ipynb

lucapinello · 2022-10-15T13:14:08Z

@sg134 can you write here where you are with this and how people can contribute/help you?

mihirneal · 2022-10-15T15:17:04Z

@lucapinello I would love to work on this; however, I don't understand why we need to work with VQ-VAE. Shouldn't we directly prototype with DDPMs?

lucapinello · 2022-10-16T12:35:01Z

The idea is to derive a good embedding for DNA sequences so that we can later explore stable diffusion. Right now we are diffusing directly on the one-hot encoding of the DNA sequences.
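For readers new to the setup, here is a minimal sketch of the one-hot encoding mentioned above. The function name `one_hot_encode`, the A/C/G/T channel order, and the all-zero row for ambiguous bases such as N are assumptions for illustration; they are not taken from the notebook.

```python
# Hypothetical helper; the channel order A, C, G, T is an assumption.
ALPHABET = "ACGT"

def one_hot_encode(seq):
    """Return a len(seq) x 4 matrix (list of lists) with a single 1 per known base."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        if base in index:  # unknown bases (e.g. N) stay all-zero
            row[index[base]] = 1
        encoded.append(row)
    return encoded

print(one_hot_encode("ACGTN"))
```

A learned embedding would replace this sparse four-channel matrix with a more compact representation, which is the space a latent diffusion model would then operate on.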

sg134 · 2022-10-17T01:04:56Z

Hi, just saw this (sorry). As Luca mentioned, we hope to represent the DNA sequences in a smaller latent space and pursue latent diffusion. To that end, if you have any other model suggestions for encoding the sequences into a representation (another VAE variant, for example), feel free to suggest and implement them; we don't yet know whether VQ-VAE is the best model for this dataset. I started with this model because it was used in the DALL-E paper. These are some of the next steps currently planned for the VQ-VAE:

- Modify the architecture to improve the reconstruction accuracy of nucleotides in the dataset.
- This is the big one that I've been stuck on: "interpreting" the codebook embeddings. Does the codebook carry any information about TF binding motifs or key features that differentiate binding patterns across cell types?
- Down the line, we'd also probably need to clean up the code and modify it a bit so that it's easy to combine with the diffusion code into a unified pipeline.
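To make the "codebook embeddings" in the second item concrete, here is a dependency-free sketch of the quantization step at the core of a VQ-VAE: each encoder output vector is replaced by its nearest codebook entry. The names `quantize` and `codebook` and the toy values are hypothetical, and the notebook's actual implementation may differ.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def quantize(latents, codebook):
    """Map each latent vector to the index and value of its nearest codebook entry."""
    indices = []
    for z in latents:
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_distance(z, codebook[k]))
        indices.append(nearest)
    quantized = [codebook[k] for k in indices]
    return indices, quantized

# Toy example: three 2-dimensional codes and three encoder outputs.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
latents = [[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]]
indices, quantized = quantize(latents, codebook)
print(indices)  # [1, 0, 2]
```

One possible way to approach the interpretation question would be to collect the input windows assigned to each code index and compare them with known TF motifs.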

@lucapinello @LucasSilvaFerreira Is there anything else to include or clarify?

mihirneal · 2022-10-18T04:47:01Z

gotcha. I'd like to work on this issue. Can you assign it to me?

LucasSilvaFerreira · 2022-10-18T05:28:50Z

@mihirneal and @sg134 I would recommend that you guys create a subgroup to explore it together. @sg134 already has some code, and it would be nice if he can guide you through it. I think it will be nice to have a (latent) stable diffusion model working on these sequences.

mateibejan1 · 2022-10-18T20:12:06Z

@sg134 let me know if I can help with the VQVAE

sg134 · 2022-10-19T01:20:03Z

Hi @mihirneal & @mateibejan1, is it possible that we can allocate a few minutes during the sprint meeting to discuss the VQ-VAE code and next steps for others who are interested as well?

mihirneal · 2022-10-19T02:08:08Z

Yeah, that’s what I had in mind as well.

mateibejan1 · 2022-10-19T07:34:12Z

Sure, I'll put together a meeting plan. We'll start with a retrospective on what was done in sprint 1, then talk about current tasks, and finally what we'll do next. Does that work for your schedule, @sg134?

noahweber1 · 2022-11-28T13:53:50Z

@sg134 please contact me when you see this.

[email protected]

thanks

sg134 · 2022-11-30T05:14:42Z

@noahweber1 messaged you on Discord.

noahweber1 · 2022-11-30T12:28:35Z

Summary of what we agreed upon and what the next steps are:

- I take over for a couple of weeks until Sameer comes and we close the task off.
- I perform refactoring and cleanup.
- Any improvements in accuracy I can squeeze out.
- Any adjustments to the architecture.
- Explainability of the inference, i.e. checking that the latent representations actually make sense when inspected manually.
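The accuracy item above could be tracked with a simple per-nucleotide reconstruction metric. This is only a sketch; `reconstruction_accuracy` is a hypothetical helper, and the notebook may measure accuracy differently.

```python
def reconstruction_accuracy(original, reconstructed):
    """Fraction of positions where the reconstructed base matches the original."""
    if len(original) != len(reconstructed):
        raise ValueError("sequences must have the same length")
    if not original:
        return 0.0  # convention chosen here for empty input
    matches = sum(a == b for a, b in zip(original, reconstructed))
    return matches / len(original)

# One mismatch in the last of eight positions.
print(reconstruction_accuracy("ACGTACGT", "ACGTACGA"))  # 0.875
```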

github-actions · 2023-03-03T02:20:14Z

This issue is stale because it has been open for 60 days with no activity.

github-actions · 2023-03-11T01:55:53Z

This issue was closed because it has been inactive for 7 days since being marked as stale.

lucapinello assigned sg134 Oct 15, 2022

lucapinello changed the title ~~Training a VQ-VAE for DNA-sequences for stable diffusion~~ NOTEBOOK-PROTOTYPING:Training a VQ-VAE for DNA-sequences for stable diffusion Oct 15, 2022

lucapinello changed the title ~~NOTEBOOK-PROTOTYPING:Training a VQ-VAE for DNA-sequences for stable diffusion~~ Training a VQ-VAE for DNA-sequences for stable diffusion Oct 15, 2022

LucasSilvaFerreira assigned mihirneal Oct 18, 2022

github-actions bot added the stale label Mar 3, 2023

github-actions bot closed this as not planned Mar 11, 2023

github-project-automation bot moved this from Notebook Prototyping to Done in DNA-diffusion Mar 11, 2023

cameronraysmith moved this from Done to Archive in DNA-diffusion Mar 11, 2023

cameronraysmith reopened this Mar 12, 2023

cameronraysmith added this to the 0.0.1 milestone Mar 12, 2023

cameronraysmith linked a pull request Mar 12, 2023 that will close this issue

added vq_vae_accelerate notebook and trained vq_vae model model4cells #101

Merged

github-actions bot removed the stale label Mar 13, 2023

cameronraysmith closed this as completed in #101 Mar 21, 2023

github-project-automation bot moved this from Archive to Done in DNA-diffusion Mar 21, 2023
