Commit

Update release announcements (#471)
Smaller fixes for the announcements page.
mrchtr authored Sep 28, 2023
1 parent 8f8deb9 commit 56d04f3
Showing 4 changed files with 17 additions and 11 deletions.
docs/announcements/CC_25M_community.md (26 changes: 16 additions, 10 deletions)
@@ -1,18 +1,24 @@
 # 25 million Creative Commons image dataset released
 
-[Fondant](https://fondant.ai) is an open-source project that aims to simplify and speed up
-large-scale data processing by making containerized components reusable across pipelines &
+[Fondant](https://fondant.ai) is an open-source project that aims to simplify and speed up
+large-scale data processing by making containerized components reusable across pipelines &
 execution environments, shared within the community.
 
-A current challenge for generative AI is compliance with copyright laws. For this reason,
-Fondant has developed a data-processing pipeline to create a 500-million dataset of Creative
+A current challenge for generative AI is compliance with copyright laws. For this reason,
+Fondant has developed a data-processing pipeline to create a 500-million dataset of Creative
 Commons images to train a latent diffusion image generation model that respects copyright. Today,
-as a first step, we are releasing a 25-million sample dataset and invite the open source
+as a first step, we are releasing a 25-million sample dataset and invite the open source
 community to collaborate on further refinement steps.
 
-Fondant offers tools to download, explore and process the data. The current example pipeline
-includes a component for downloading the urls, a simple file type filter, one for downloading
-the images and one for deduplicating the urls. Additional processing components which could be
+Fondant offers tools to download, explore and process the data. The current example pipeline
+includes a component for downloading the urls and one for downloading the images.
+
+Creating custom pipelines for specific purposes requires different building blocks. Fondant
+pipelines can mix reusable components and custom components.
+
+![sample_pipeline](https://github.com/ml6team/fondant/blob/main/docs/art/announcements/sample_pipeline_cc25.png?raw=true)
+
+Additional processing components which could be
 contributed include, in order of priority:
 
 * Image-based deduplication
@@ -25,6 +31,6 @@ contributed include, in order of priority:
 * AI generated image detection
 * Any components that you propose to develop
 
-The Fondant team also invites contributors to the core framework and is looking for feedback on
-the framework’s usability and for suggestions for improvement. Contact us at
+The Fondant team also invites contributors to the core framework and is looking for feedback on
+the framework’s usability and for suggestions for improvement. Contact us at
 [[email protected]](mailto:[email protected]) and/or join our [discord](https://discord.gg/HnTdWhydGp).
Binary file added docs/art/announcements/sample_pipeline_cc25.png
Binary file modified docs/art/guides/component.png
docs/overrides/main.html (2 changes: 1 addition, 1 deletion)
@@ -3,7 +3,7 @@
 {% block announce %}
 <p style="text-align: center">
 We released a 25 million Creative Commons image dataset!
-<a href="announcements/CC_25M_community/"
+<a href="https://fondant.ai/en/latest/announcements/CC_25M_community/"
 style="color: white; text-decoration: underline">Read more</a>
 </p>
 {% endblock %}
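The commit message does not say why the banner link in `docs/overrides/main.html` was made absolute. One plausible reason, offered here as an assumption: the `announce` block is rendered on every page of the docs site, and a relative `href` is resolved against whichever page the reader is currently viewing. The sketch below uses Python's standard `urllib.parse.urljoin` to compare the old and new `href` values. The nested page URL (`guides/example_guide/`) is hypothetical and only illustrates a page one level deeper than the site root.

```python
from urllib.parse import urljoin

# The two href values from the diff of docs/overrides/main.html.
OLD_HREF = "announcements/CC_25M_community/"
NEW_HREF = "https://fondant.ai/en/latest/announcements/CC_25M_community/"

# Pages on which the banner might appear. NESTED_PAGE is a hypothetical path,
# used only to show what happens one directory level below the docs root.
TOP_LEVEL_PAGE = "https://fondant.ai/en/latest/"
NESTED_PAGE = "https://fondant.ai/en/latest/guides/example_guide/"


def resolve(page_url: str, href: str) -> str:
    """Resolve href against page_url using RFC 3986 reference resolution."""
    return urljoin(page_url, href)


if __name__ == "__main__":
    for page in (TOP_LEVEL_PAGE, NESTED_PAGE):
        print(page)
        print("  old href ->", resolve(page, OLD_HREF))
        print("  new href ->", resolve(page, NEW_HREF))
```

On the top-level page both values resolve to the same URL. On the nested page, the old relative `href` resolves to `.../guides/example_guide/announcements/CC_25M_community/`, which would most likely not exist. The absolute `href` resolves to the same target from every page.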