)

* add AudioDiffusionPipeline and LatentAudioDiffusionPipeline
* add docs to toc
* fix tests
* fix tests
* fix tests
* fix tests
* fix tests
* Update pr_tests.yml

Fix tests

* parent 499ff34
author teticio <[email protected]> 1668765652 +0000
committer teticio <[email protected]> 1669041721 +0000

parent 499ff34
author teticio <[email protected]> 1668765652 +0000
committer teticio <[email protected]> 1669041704 +0000

add colab notebook

[Flax] Fix loading scheduler from subfolder (#1319)

[FLAX] Fix loading scheduler from subfolder

Fix/Enable all schedulers for in-painting (#1331)

* inpaint fix k lms
* onnox as well
* up

Correct path to schedlure (#1322)

* [Examples] Correct path
* uP

Avoid nested fix-copies (#1332)

* Avoid nested `# Copied from` statements during `make fix-copies`
* style

Fix img2img speed with LMS-Discrete Scheduler (#896)

Casting `self.sigmas` into a different dtype (the one of original_samples) is not advisable. In my img2img pipeline this leads to a long running time in the `integrate.quad` call later on- by long I mean more than 10x slower.

Co-authored-by: Anton Lozhkov <[email protected]>

Fix the order of casts for onnx inpainting (#1338)

Legacy Inpainting Pipeline for Onnx Models (#1237)

* Add legacy inpainting pipeline compatibility for onnx
* remove commented out line
* Add onnx legacy inpainting test
* Fix slow decorators
* pep8 styling
* isort styling
* dummy object
* ordering consistency
* style
* docstring styles
* Refactor common prompt encoding pattern
* Update tests to permanent repository home
* support all available schedulers until ONNX IO binding is available

Co-authored-by: Anton Lozhkov <[email protected]>

* updated styling from PR suggested feedback

Co-authored-by: Anton Lozhkov <[email protected]>

Jax infer support negative prompt (#1337)

* support negative prompts in sd jax pipeline
* pass batched neg_prompt
* only encode when negative prompt is None

Co-authored-by: Juan Acevedo <[email protected]>

Update README.md: Minor change to Imagic code snippet, missing dir error (#1347)

Minor change to Imagic Readme

Missing dir causes an error when running the example code.

make style

change the sample model (#1352)

* Update alt_diffusion.mdx
* Update alt_diffusion.mdx

Add bit diffusion [WIP] (#971)

* Create bit_diffusion.py

Bit diffusion based on the paper, arXiv:2208.04202, Chen2022AnalogBG

* adding bit diffusion to new branch

ran tests

* tests
* tests
* tests
* tests
* removed test folders + added to README
* Update README.md

Co-authored-by: Patrick von Platen <[email protected]>

* move Mel to module in pipeline construction, make librosa optional
* fix imports
* fix copy & paste error in comment
* fix style
* add missing register_to_config
* fix class docstrings
* fix class docstrings
* tweak docstrings
* tweak docstrings
* update slow test
* put trailing commas back
* respect alphabetical order
* remove LatentAudioDiffusion, make vqvae optional
* move Mel from models back to pipelines :-)
* allow loading of pretrained audiodiffusion models
* fix tests
* fix dummies
* remove reference to latent_audio_diffusion in docs
* unused import
* inherit from SchedulerMixin to make loadable
* Apply suggestions from code review
* Apply suggestions from code review

Co-authored-by: Patrick von Platen <[email protected]>

1 parent 459b8ca, commit 48d0123

Showing 25 changed files with 781 additions and 5 deletions.

@@ -165,4 +165,4 @@ tags
# DS_Store (MacOS)
.DS_Store
# RL pipelines may produce mp4 outputs
*.mp4
*.mp4

@@ -0,0 +1,102 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Audio Diffusion

## Overview

[Audio Diffusion](https://github.com/teticio/audio-diffusion) by Robert Dargavel Smith.

Audio Diffusion leverages the recent advances in image generation using diffusion models by converting audio samples to
and from mel spectrogram images.

The original codebase of this implementation can be found [here](https://github.com/teticio/audio-diffusion), including
training scripts and example notebooks.

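As a rough illustration of the "to and from images" idea above, the sketch below shows only the last step of such a conversion: quantizing log-mel values (in decibels) to 8-bit grayscale pixels and inverting that mapping. The helper names and the 80 dB dynamic range are assumptions made for illustration, not taken from this commit; computing the mel spectrogram itself would normally be left to an audio library such as librosa.

```python
# Hypothetical helpers (not part of the pipeline's API): map log-mel values
# in decibels to 8-bit grayscale pixels and back.
TOP_DB = 80.0  # assumed dynamic range in dB


def db_to_pixel(db: float, top_db: float = TOP_DB) -> int:
    """Map a value in [-top_db, 0] dB to a pixel in [0, 255], clipping outliers."""
    scaled = (db + top_db) * 255.0 / top_db
    clipped = min(max(scaled, 0.0), 255.0)
    return int(clipped + 0.5)  # round to the nearest integer


def pixel_to_db(pixel: int, top_db: float = TOP_DB) -> float:
    """Approximate inverse of db_to_pixel (exact up to quantization error)."""
    return pixel * top_db / 255.0 - top_db


row_db = [0.0, -20.0, -40.0, -80.0, -120.0]  # one made-up spectrogram row
pixels = [db_to_pixel(v) for v in row_db]
restored = [pixel_to_db(p) for p in pixels]
```

The round trip is lossy: values below the assumed range are clipped, and everything else is recovered only to within half a quantization step, which is one reason audio decoded from a generated image is an approximation.
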
## Available Pipelines:

| Pipeline | Tasks | Colab |
|---|---|:---:|
| [pipeline_audio_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/audio_diffusion/pipeline_audio_diffusion.py) | *Unconditional Audio Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/audio_diffusion_pipeline.ipynb) |

## Examples:

### Audio Diffusion

```python
import torch
from IPython.display import Audio, display
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-256").to(device)

output = pipe()
display(output.images[0])
# The sample rate comes from the pipeline's Mel helper (`pipe.mel`).
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```

### Latent Audio Diffusion

```python
import torch
from IPython.display import Audio, display
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/latent-audio-diffusion-256").to(device)

output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```

### Audio Diffusion with DDIM (faster)

```python
import torch
from IPython.display import Audio, display
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-ddim-256").to(device)

output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```

### Variations, in-painting, out-painting etc.

```python
# Reuses `pipe`, `output`, `Audio` and `display` from one of the examples above.
output = pipe(
    raw_audio=output.audios[0, 0],
    start_step=int(pipe.get_default_steps() / 2),
    mask_start_secs=1,
    mask_end_secs=1,
)
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```

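To make the `mask_start_secs` and `mask_end_secs` arguments above more concrete, here is a minimal sketch of how a duration in seconds might translate into spectrogram columns. It assumes that "masked" columns are the ones kept from the input audio rather than regenerated, and that the sample rate (22050 Hz), hop length (512 samples) and image width (256) take the values shown; none of these is confirmed by this commit, and the function names are hypothetical.

```python
def secs_to_columns(seconds: float, sample_rate: int = 22050, hop_length: int = 512) -> int:
    """Number of whole spectrogram columns covering `seconds` (assumed parameters)."""
    return int(seconds * sample_rate / hop_length)


def kept_columns(width: int, mask_start_secs: float, mask_end_secs: float) -> list:
    """True for columns assumed to be preserved from the input, False for regenerated ones."""
    start = secs_to_columns(mask_start_secs)
    end = secs_to_columns(mask_end_secs)
    return [col < start or col >= width - end for col in range(width)]


keep = kept_columns(width=256, mask_start_secs=1, mask_end_secs=1)
num_kept = sum(keep)
num_regenerated = len(keep) - num_kept
```

Under these assumptions, one second at each end corresponds to 43 columns, so only the middle of the spectrogram would be resampled while both edges stay anchored to the original audio.
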
## AudioDiffusionPipeline
[[autodoc]] AudioDiffusionPipeline
	- __call__
	- encode
	- slerp

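The `slerp` entry listed above presumably refers to spherical linear interpolation, a common way to blend two noise samples while keeping their magnitude plausible. The following is a generic pure-Python sketch of that technique on flat lists of floats; it is not the pipeline's implementation, and the signature is an assumption.

```python
import math


def slerp(x0: list, x1: list, alpha: float) -> list:
    """Spherical linear interpolation between two vectors, alpha in [0, 1]."""
    dot = sum(a * b for a, b in zip(x0, x1))
    norm0 = math.sqrt(sum(a * a for a in x0))
    norm1 = math.sqrt(sum(b * b for b in x1))
    # Clamp to guard against rounding pushing the cosine outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm0 * norm1)))
    theta = math.acos(cos_theta)
    if theta < 1e-8:
        # Nearly parallel vectors: fall back to ordinary linear interpolation.
        return [(1 - alpha) * a + alpha * b for a, b in zip(x0, x1)]
    sin_theta = math.sin(theta)
    w0 = math.sin((1 - alpha) * theta) / sin_theta
    w1 = math.sin(alpha * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(x0, x1)]


midpoint = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

Unlike a straight linear blend, the midpoint of two orthogonal unit vectors stays on the unit circle. This sketch does not handle vectors that point in exactly opposite directions, where the interpolation path is ambiguous.
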
## Mel
[[autodoc]] Mel
	- audio_slice_to_image
	- image_to_audio

@@ -0,0 +1,3 @@
# flake8: noqa
from .mel import Mel
from .pipeline_audio_diffusion import AudioDiffusionPipeline