Commit
Smaller fixes for the announcements page.
Showing 4 changed files with 17 additions and 11 deletions.
@@ -1,18 +1,24 @@
# 25 million Creative Commons image dataset released

[Fondant](https://fondant.ai) is an open-source project that aims to simplify and speed up
large-scale data processing by making containerized components reusable across pipelines &
execution environments, shared within the community.

A current challenge for generative AI is compliance with copyright laws. For this reason,
Fondant has developed a data-processing pipeline to create a 500-million dataset of Creative
Commons images to train a latent diffusion image generation model that respects copyright. Today,
as a first step, we are releasing a 25-million sample dataset and invite the open source
community to collaborate on further refinement steps.

-Fondant offers tools to download, explore and process the data. The current example pipeline
-includes a component for downloading the urls, a simple file type filter, one for downloading
-the images and one for deduplicating the urls. Additional processing components which could be
+Fondant offers tools to download, explore and process the data. The current example pipeline
+includes a component for downloading the urls and one for downloading the images.
+
+Creating custom pipelines for specific purposes requires different building blocks. Fondant
+pipelines can mix reusable components and custom components.
+
+![sample_pipeline](https://github.com/ml6team/fondant/blob/main/docs/art/announcements/sample_pipeline_cc25.png?raw=true)
+
+Additional processing components which could be
contributed include, in order of priority:

* Image-based deduplication

@@ -25,6 +31,6 @@ contributed include, in order of priority:
* AI generated image detection
* Any components that you propose to develop

The Fondant team also invites contributors to the core framework and is looking for feedback on
the framework’s usability and for suggestions for improvement. Contact us at
[[email protected]](mailto:[email protected]) and/or join our [discord](https://discord.gg/HnTdWhydGp).
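
Note on the proposed "Image-based deduplication" component: the announcement does not say how it should work, so the following is only a minimal sketch of the simplest possible variant, exact-duplicate detection by hashing the raw image bytes. It does not use the Fondant API; the function name `deduplicate_images` and the `(url, bytes)` input shape are assumptions made for illustration. A real component would more likely need perceptual or embedding-based matching to catch resized or re-encoded copies.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def deduplicate_images(images: Iterable[Tuple[str, bytes]]) -> List[str]:
    """Return the URLs whose image bytes have not been seen before.

    Hypothetical helper: only byte-identical images are treated as duplicates.
    """
    seen: Dict[str, str] = {}  # SHA-256 of the content -> first URL with that content
    kept: List[str] = []
    for url, content in images:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:
            seen[digest] = url
            kept.append(url)
    return kept


if __name__ == "__main__":
    # Made-up sample data: the third entry has the same bytes as the first.
    sample = [
        ("https://example.com/a.jpg", b"\xff\xd8 cat"),
        ("https://example.com/b.jpg", b"\xff\xd8 dog"),
        ("https://mirror.example.com/a.jpg", b"\xff\xd8 cat"),
    ]
    print(deduplicate_images(sample))
```

Keeping the first URL seen for each hash makes the result deterministic for a given input order, which is convenient when comparing pipeline runs.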