From a6455417c6655357eedd4bfa66f37ed1f7dc7a89 Mon Sep 17 00:00:00 2001
From: Robbe Sneyders <robbe.sneyders@ml6.eu>
Date: Wed, 27 Sep 2023 15:29:38 +0200
Subject: [PATCH 1/3] Add cc-25m announcement to docs

---
 docs/announcements/CC_25M_community.md     | 30 ++++++++++++++++++++++
 docs/announcements/CC_25M_press_release.md | 28 ++++++++++++++++++++
 docs/{ => components}/component_spec.md    |  0
 docs/{ => components}/components.md        |  0
 docs/{ => components}/custom_component.md  |  0
 docs/{ => components}/generic_component.md |  0
 docs/overrides/main.html                   |  9 +++++++
 docs/stylesheets/extra.css                 |  5 ++++
 mkdocs.yml                                 | 13 +++++++---
 9 files changed, 81 insertions(+), 4 deletions(-)
 create mode 100644 docs/announcements/CC_25M_community.md
 create mode 100644 docs/announcements/CC_25M_press_release.md
 rename docs/{ => components}/component_spec.md (100%)
 rename docs/{ => components}/components.md (100%)
 rename docs/{ => components}/custom_component.md (100%)
 rename docs/{ => components}/generic_component.md (100%)
 create mode 100644 docs/overrides/main.html
 create mode 100644 docs/stylesheets/extra.css

diff --git a/docs/announcements/CC_25M_community.md b/docs/announcements/CC_25M_community.md
new file mode 100644
index 000000000..fd0aab41f
--- /dev/null
+++ b/docs/announcements/CC_25M_community.md
@@ -0,0 +1,30 @@
+# 25 million Creative Commons image dataset released
+
+[Fondant](https://fondant.ai) is an open-source project that aims to simplify and speed up
+large-scale data processing by making containerized components reusable across pipelines and
+execution environments and shareable within the community.
+
+A current challenge for generative AI is compliance with copyright laws. For this reason, 
+Fondant has developed a data-processing pipeline to create a dataset of 500 million Creative
+Commons images to train a latent diffusion image generation model that respects copyright. Today,
+as a first step, we are releasing a 25-million-image sample dataset and invite the open-source
+community to collaborate on further refinement steps.
+
+Fondant offers tools to download, explore and process the data. The current example pipeline 
+includes a component for downloading the URLs, a simple file type filter, a component for
+downloading the images and one for deduplicating the URLs. Additional processing components
+that could be contributed include, in order of priority:
+
+* Image-based deduplication
+* Visual quality / aesthetic quality estimation
+* Watermark detection
+* Not safe for work (NSFW) content detection
+* Face detection
+* Personally Identifiable Information (PII) detection
+* Text detection
+* AI-generated image detection
+* Any components that you propose to develop
+
+The Fondant team also invites contributors to the core framework and is looking for feedback on 
+the framework’s usability and for suggestions for improvement. Contact us at 
+[info@fondant.ai](mailto:info@fondant.ai) and/or join our [Discord](https://discord.gg/HnTdWhydGp).
\ No newline at end of file
diff --git a/docs/announcements/CC_25M_press_release.md b/docs/announcements/CC_25M_press_release.md
new file mode 100644
index 000000000..6468a6a94
--- /dev/null
+++ b/docs/announcements/CC_25M_press_release.md
@@ -0,0 +1,28 @@
+# 25 million Creative Commons image dataset released
+
+> Fondant is an open-source project that aims to enable compliant, large-scale processing in a
+> simple and cost-efficient way. As a first step, we have developed a pipeline to create a Creative Commons image dataset and are releasing a first 25-million-image sample with a call to action to help develop additional data processing pipelines.
+
+[Fondant](https://fondant.ai) simplifies and speeds up large-scale data processing by making
+self-contained pipeline components reusable across pipelines and infrastructures and shareable
+within the community. By offering a library of ready-to-use, off-the-shelf components and a
+standardized way of building and combining them with custom components, it significantly reduces
+the time required to build and maintain data processing infrastructure for generative AI
+applications in production.
+
+Supported by [Flanders innovation & entrepreneurship](https://vlaio.be) and European AI Service
+Provider [ML6](https://ml6.eu), Fondant developed a pipeline to create a
+[dataset](https://huggingface.co/datasets/fondantai/fondant-cc-25m) of over 500 million Creative
+Commons-licensed images from Common Crawl to train an image-generation model that respects
+copyright. Now we are releasing a first 25-million-image sample dataset with tools to download,
+explore and process the data. We are inviting developers and data enthusiasts to collaborate on
+large-scale data processing pipelines by building custom components for advanced filtering and
+captioning and to contribute to the core framework. We are also looking for feedback on the
+framework’s usability and for suggestions for improvement. Contact us at
+[info@fondant.ai](mailto:info@fondant.ai) and/or join our [Discord](https://discord.gg/HnTdWhydGp)
+to help realize this vision.
+
+[Creative Commons](https://creativecommons.org) is a non-profit organization which provides
+licenses that allow other creators to reuse one’s work under certain conditions.
+[Common Crawl](https://commoncrawl.org) is a non-profit organization which publishes monthly
+archives of the public Internet.
diff --git a/docs/component_spec.md b/docs/components/component_spec.md
similarity index 100%
rename from docs/component_spec.md
rename to docs/components/component_spec.md
diff --git a/docs/components.md b/docs/components/components.md
similarity index 100%
rename from docs/components.md
rename to docs/components/components.md
diff --git a/docs/custom_component.md b/docs/components/custom_component.md
similarity index 100%
rename from docs/custom_component.md
rename to docs/components/custom_component.md
diff --git a/docs/generic_component.md b/docs/components/generic_component.md
similarity index 100%
rename from docs/generic_component.md
rename to docs/components/generic_component.md
diff --git a/docs/overrides/main.html b/docs/overrides/main.html
new file mode 100644
index 000000000..4ec5b52bc
--- /dev/null
+++ b/docs/overrides/main.html
@@ -0,0 +1,9 @@
+{% extends "base.html" %}
+
+{% block announce %}
+    <p style="text-align: center">
+        We released a 25 million Creative Commons image dataset!
+        <a href="/announcements/CC_25M_community/"
+           style="color: white; text-decoration: underline">Read more</a>
+    </p>
+{% endblock %}
\ No newline at end of file
diff --git a/docs/stylesheets/extra.css b/docs/stylesheets/extra.css
new file mode 100644
index 000000000..861d222cc
--- /dev/null
+++ b/docs/stylesheets/extra.css
@@ -0,0 +1,5 @@
+.md-banner {
+    /* This setting prevents the Material header from intruding into the
+       space of the banner. */
+    background-color: OrangeRed;
+}
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index 610a0635d..050b06cdf 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -23,9 +23,12 @@ theme:
       toggle:
         icon: material/brightness-4
         name: Switch to light mode
+  custom_dir: docs/overrides
   features:
     - content.code.copy
     - navigation.tracking
+extra_css:
+  - stylesheets/extra.css
 nav:
   - Home: index.md
   - Getting Started: getting_started.md
@@ -35,13 +38,15 @@ nav:
     - Implement custom components: guides/implement_custom_components.md
   - Building a pipeline: pipeline.md
   - Components:
-    - Components: components.md
-    - Creating custom components: custom_component.md
-    - Read / write components: generic_component.md
-    - Component spec: component_spec.md
+    - Components: components/components.md
+    - Creating custom components: components/custom_component.md
+    - Read / write components: components/generic_component.md
+    - Component spec: components/component_spec.md
   - Data explorer: data_explorer.md
   - Infrastructure: infrastructure.md
   - Manifest: manifest.md
+  - Announcements:
+    - announcements/CC_25M_community.md
 
 plugins:
   - mkdocstrings

From 7713d663b773a6fed95af697c73356a929b986f1 Mon Sep 17 00:00:00 2001
From: Robbe Sneyders <robbe.sneyders@ml6.eu>
Date: Wed, 27 Sep 2023 15:39:28 +0200
Subject: [PATCH 2/3] Remove prefix slash in banner url

---
 docs/overrides/main.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/overrides/main.html b/docs/overrides/main.html
index 4ec5b52bc..c506c3d98 100644
--- a/docs/overrides/main.html
+++ b/docs/overrides/main.html
@@ -3,7 +3,7 @@
 {% block announce %}
     <p style="text-align: center">
         We released a 25 million Creative Commons image dataset!
-        <a href="/announcements/CC_25M_community/"
+        <a href="announcements/CC_25M_community/"
            style="color: white; text-decoration: underline">Read more</a>
     </p>
 {% endblock %}
\ No newline at end of file

From 32d43efa9d39304b1b399d08f24351f903f24c93 Mon Sep 17 00:00:00 2001
From: Robbe Sneyders <robbe.sneyders@ml6.eu>
Date: Wed, 27 Sep 2023 15:53:31 +0200
Subject: [PATCH 3/3] Replace notes on getting started page by admonition notes

---
 docs/getting_started.md | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/docs/getting_started.md b/docs/getting_started.md
index f11c54e75..9f17acd86 100644
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -1,8 +1,14 @@
 # Getting started
 
-Note: To execute the pipeline locally, you must have docker compose, Python >=3.8 and Git installed on your system.
+!!! note
 
-Note: For Apple M1/M2 ship users: - Make sure that Docker uses linux/amd64 platform and not arm64. - In Docker Dashboards’ Settings<Features in development, make sure to uncheck Use containerid for pulling and storing images .
+    To execute the pipeline locally, you must have Docker Compose, Python >=3.8 and Git
+    installed on your system.
+
+!!! note
+
+    For Apple M1/M2 chip users: make sure that Docker uses the linux/amd64 platform and not
+    arm64. In Docker Dashboard's Settings > Features in development, make sure to uncheck "Use containerd for pulling and storing images".
 
 For demonstration purposes, we provide sample pipelines in the Fondant GitHub repository. A great starting point is the pipeline that loads and filters creative commons images. To follow along with the upcoming instructions, you can clone the [repository](https://github.com/ml6team/fondant) and navigate to the `examples/pipelines/filter-cc-25m` folder.