Releases: hpcaitech/Open-Sora
Open-Sora v1.2.0 Released Today!
Today, we are happy to announce the v1.2 release of the Open-Sora project. This release builds upon v1.1 and adds the following exciting components:
- 📦 Video compression network
- 🔀 Rectified-flow training
- 🪅 More data and better multi-stage training
- 🚀 Easy and effective model conditioning
- 👍 Better evaluation metrics
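Rectified flow trains the model to predict a constant velocity along the straight path between noise and data. As a rough illustration of the idea (not Open-Sora's actual training code; the function and variable names below are hypothetical), the interpolated sample and the velocity target can be sketched as:

```python
import numpy as np

def rectified_flow_pair(x0, x1, t):
    """Build a rectified-flow training pair at time t.

    The straight path is x_t = (1 - t) * x0 + t * x1, and the model's
    regression target is the constant velocity (x1 - x0) along it.
    """
    x_t = (1.0 - t) * x0 + t * x1
    velocity_target = x1 - x0
    return x_t, velocity_target

rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)  # sampled noise
x1 = rng.standard_normal(4)  # a data sample (e.g. a video latent)
x_t, v = rectified_flow_pair(x0, x1, t=0.5)
```

At t = 0.5 the interpolated point is simply the midpoint of noise and data, and the target velocity is independent of t, which is what makes the objective straightforward to train.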
If you want to know more about this project, you can
- read our technical report
- view our gallery
- try our checkpoints and gradio demo
Open-Sora V1.1.0 Release
📍 Open-Sora 1.1 released
- 🌠 Model weights are available here. The model is trained on videos of 0s~15s duration, 144p to 720p resolution, and various aspect ratios. See our report 1.1 for more discussion.
- 🔧 Data processing pipeline v1.1 is released. It provides an automatic pipeline from raw videos to (text, video clip) pairs, including scene cutting, filtering (aesthetic, optical flow, OCR, etc.), captioning, and management. With this tool, you can easily build your own video dataset.
- ✅ Improved ST-DiT architecture, including RoPE positional encoding, QK normalization, longer text length, etc.
- ✅ Support training with any resolution, aspect ratio, and duration (including images).
- ✅ Support image and video conditioning as well as video editing, which enables animating images, connecting videos, etc.
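Training on arbitrary resolutions, aspect ratios, and durations is commonly implemented by bucketing: grouping samples so that every batch shares one shape. The sketch below illustrates that general idea; the bucket keys and sample fields are hypothetical and are not taken from Open-Sora's actual dataloader.

```python
from collections import defaultdict

def bucket_samples(samples, batch_size):
    """Group samples so each batch shares (frames, height, width).

    Images are treated as single-frame videos, so they share the
    same batching path as clips.
    """
    buckets = defaultdict(list)
    for sample in samples:
        key = (sample["frames"], sample["height"], sample["width"])
        buckets[key].append(sample)

    batches = []
    for group in buckets.values():
        # Split each shape-homogeneous bucket into fixed-size batches.
        for i in range(0, len(group), batch_size):
            batches.append(group[i : i + batch_size])
    return batches

samples = [
    {"frames": 1, "height": 256, "width": 256},   # an image
    {"frames": 16, "height": 240, "width": 426},  # a short clip
    {"frames": 16, "height": 240, "width": 426},
]
batches = bucket_samples(samples, batch_size=2)
```

Because every batch is shape-homogeneous, tensors can be stacked without padding, at the cost of some batches being underfilled when a bucket has few samples.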
Visit the Open-Sora Gallery to view more samples.
Open-Sora V1.0.0 Release
We present Open-Sora, an initiative dedicated to efficiently producing high-quality video and making the models, tools, and content accessible to all. By embracing open-source principles, Open-Sora not only democratizes access to advanced video generation techniques, but also offers a streamlined and user-friendly platform that simplifies the complexities of video production. With Open-Sora, we aim to inspire innovation, creativity, and inclusivity in the realm of content creation.
In this release, we have included:
- source code for data processing
- source code for sora-like video generation training and inference pipeline
- system optimization powered by ColossalAI and flash attention
- model weights trained for 16×256×256 and 16×512×512 resolutions, available on Hugging Face