Skip to content

Commit

Permalink
Add (or update) robots.txt and ai.txt to block AI crawlers (#4077)
Browse files Browse the repository at this point in the history
* Add robots.txt and ai.txt to documentation site

* Add ai.txt and remove trailing slash from robots.txt

* Add robots.txt and ai.txt to frontend
  • Loading branch information
sarayourfriend authored Apr 16, 2024
1 parent 4051522 commit b2f3fd6
Show file tree
Hide file tree
Showing 8 changed files with 289 additions and 3 deletions.
60 changes: 60 additions & 0 deletions api/api/templates/ai.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
# Spawning AI
# Prevent datasets from using the following file types

User-Agent: *
Disallow: *.txt
Disallow: *.pdf
Disallow: *.doc
Disallow: *.docx
Disallow: *.odt
Disallow: *.rtf
Disallow: *.tex
Disallow: *.wks
Disallow: *.wpd
Disallow: *.wps
Disallow: *.html
Disallow: *.bmp
Disallow: *.gif
Disallow: *.ico
Disallow: *.jpeg
Disallow: *.jpg
Disallow: *.png
Disallow: *.svg
Disallow: *.tif
Disallow: *.tiff
Disallow: *.webp
Disallow: *.aac
Disallow: *.aiff
Disallow: *.amr
Disallow: *.flac
Disallow: *.m4a
Disallow: *.mp3
Disallow: *.oga
Disallow: *.opus
Disallow: *.wav
Disallow: *.wma
Disallow: *.mp4
Disallow: *.webm
Disallow: *.ogg
Disallow: *.avi
Disallow: *.mov
Disallow: *.wmv
Disallow: *.flv
Disallow: *.mkv
Disallow: *.py
Disallow: *.js
Disallow: *.java
Disallow: *.c
Disallow: *.cpp
Disallow: *.cs
Disallow: *.h
Disallow: *.css
Disallow: *.php
Disallow: *.swift
Disallow: *.go
Disallow: *.rb
Disallow: *.pl
Disallow: *.sh
Disallow: *.sql
Disallow: /
Disallow: *
33 changes: 33 additions & 0 deletions api/api/templates/robots.txt
Original file line number Diff line number Diff line change
Expand Up @@ -6,3 +6,36 @@ Disallow: /v1/auth/

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: Omgili
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: ImagesiftBot
Disallow: /

User-agent: cohere-ai
Disallow: /
8 changes: 5 additions & 3 deletions api/conf/urls/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,11 +36,13 @@
path("admin/", admin.site.urls),
path("healthcheck/", HealthCheck.as_view(), name="health"),
path("v1/", include(versioned_paths)),
] + [
path(
"robots.txt/",
f"{file}",
TemplateView.as_view(
template_name="robots.txt",
template_name=file,
content_type="text/plain",
),
),
)
for file in ["robots.txt", "ai.txt"]
]
60 changes: 60 additions & 0 deletions documentation/ai.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
# Spawning AI
# Prevent datasets from using the following file types

User-Agent: *
Disallow: *.txt
Disallow: *.pdf
Disallow: *.doc
Disallow: *.docx
Disallow: *.odt
Disallow: *.rtf
Disallow: *.tex
Disallow: *.wks
Disallow: *.wpd
Disallow: *.wps
Disallow: *.html
Disallow: *.bmp
Disallow: *.gif
Disallow: *.ico
Disallow: *.jpeg
Disallow: *.jpg
Disallow: *.png
Disallow: *.svg
Disallow: *.tif
Disallow: *.tiff
Disallow: *.webp
Disallow: *.aac
Disallow: *.aiff
Disallow: *.amr
Disallow: *.flac
Disallow: *.m4a
Disallow: *.mp3
Disallow: *.oga
Disallow: *.opus
Disallow: *.wav
Disallow: *.wma
Disallow: *.mp4
Disallow: *.webm
Disallow: *.ogg
Disallow: *.avi
Disallow: *.mov
Disallow: *.wmv
Disallow: *.flv
Disallow: *.mkv
Disallow: *.py
Disallow: *.js
Disallow: *.java
Disallow: *.c
Disallow: *.cpp
Disallow: *.cs
Disallow: *.h
Disallow: *.css
Disallow: *.php
Disallow: *.swift
Disallow: *.go
Disallow: *.rb
Disallow: *.pl
Disallow: *.sh
Disallow: *.sql
Disallow: /
Disallow: *
1 change: 1 addition & 0 deletions documentation/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,7 @@ def add_ext_to_path():
)

html_static_path = ["_static"]
html_extra_path = ["robots.txt", "ai.txt"]

html_show_copyright = False

Expand Down
35 changes: 35 additions & 0 deletions documentation/robots.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: Omgili
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: ImagesiftBot
Disallow: /

User-agent: cohere-ai
Disallow: /
60 changes: 60 additions & 0 deletions frontend/src/static/ai.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
# Spawning AI
# Prevent datasets from using the following file types

User-Agent: *
Disallow: *.txt
Disallow: *.pdf
Disallow: *.doc
Disallow: *.docx
Disallow: *.odt
Disallow: *.rtf
Disallow: *.tex
Disallow: *.wks
Disallow: *.wpd
Disallow: *.wps
Disallow: *.html
Disallow: *.bmp
Disallow: *.gif
Disallow: *.ico
Disallow: *.jpeg
Disallow: *.jpg
Disallow: *.png
Disallow: *.svg
Disallow: *.tif
Disallow: *.tiff
Disallow: *.webp
Disallow: *.aac
Disallow: *.aiff
Disallow: *.amr
Disallow: *.flac
Disallow: *.m4a
Disallow: *.mp3
Disallow: *.oga
Disallow: *.opus
Disallow: *.wav
Disallow: *.wma
Disallow: *.mp4
Disallow: *.webm
Disallow: *.ogg
Disallow: *.avi
Disallow: *.mov
Disallow: *.wmv
Disallow: *.flv
Disallow: *.mkv
Disallow: *.py
Disallow: *.js
Disallow: *.java
Disallow: *.c
Disallow: *.cpp
Disallow: *.cs
Disallow: *.h
Disallow: *.css
Disallow: *.php
Disallow: *.swift
Disallow: *.go
Disallow: *.rb
Disallow: *.pl
Disallow: *.sh
Disallow: *.sql
Disallow: /
Disallow: *
35 changes: 35 additions & 0 deletions frontend/src/static/robots.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: Omgili
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: ImagesiftBot
Disallow: /

User-agent: cohere-ai
Disallow: /

0 comments on commit b2f3fd6

Please sign in to comment.