Sweary is an R package that contains a database of swear words from different languages, cherry-picked by native speakers.
The development version of this package can be installed using devtools:
devtools::install_github("pdrhlik/sweary")
Language | Language code | Number of swear words |
---|---|---|
Czech | cs | 57 |
German | de | 99 |
English | en | 39 |
French (Canada) | fr-CA | 20 |
Greek | gr | 13 |
Macedonian | mk | 64 |
Polish | pl | 41 |
Romanian | ro | 38 |
Slovak | sk | 28 |
Total | 9 langs | 399 |
All languages are stored in the `swear_words` data frame.
library(sweary)
head(swear_words)
## # A tibble: 6 x 2
## word language
## <chr> <chr>
## 1 buzerant cs
## 2 čubka cs
## 3 čurák cs
## 4 čůrák cs
## 5 debil cs
## 6 dement cs
You can also extract just the one language you are interested in.
en_swear_words <- get_swearwords("en")
head(en_swear_words)
## # A tibble: 6 x 2
## word language
## <chr> <chr>
## 1 arse en
## 2 arsehole en
## 3 ass en
## 4 asshole en
## 5 bitch en
## 6 bollocks en
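The filtering step can also be illustrated in base R. The sketch below is not the package's implementation of `get_swearwords()`, which is not shown in this README. It uses a small stand-in data frame (`swear_words_demo`, a hypothetical name) with the same two columns, so that it runs without the package installed.

```r
# Stand-in for sweary's swear_words with the same two columns
# (hypothetical object, used so the sketch runs without the package).
swear_words_demo <- data.frame(
  word     = c("debil", "dement", "arse", "ass", "bitch"),
  language = c("cs", "cs", "en", "en", "en"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose language code matches.
en_demo <- swear_words_demo[swear_words_demo$language == "en", ]

# Count the words per language code.
counts <- table(swear_words_demo$language)
```

With the package loaded, the same two expressions should work on `swear_words` itself, assuming it keeps the `word` and `language` columns shown above.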
If you are not comfortable with `git` and pull requests, you can just follow steps 1-3. After you create the file, send it to me via email with the subject "New sweary language: {LANG_CODE}". We will acknowledge you in the README after we approve the changes.
1. Choose a new language. Find its two-letter ISO 639-1 code. If the language you are adding is a certain dialect (e.g. Canadian French), find its IETF language tag in a language code table.
2. Create a language file. Place the file in `data-raw/swear-word-lists/{LANG_CODE}_{LANG_NAME}`. Examples:
   - English: `data-raw/swear-word-lists/en_English`
   - Canadian French: `data-raw/swear-word-lists/fr-CA_French (Canada)`

   Note that spaces and parentheses in file names are allowed.
3. Fill in the file with swear words. The following rules must apply:
   - One swear word per line with no trailing whitespace.
   - All words must be lowercase.
   - The list must only contain unique words.
   - The list must be sorted alphabetically.
4. Make sure all the tests pass. You can do that using a development function called `build_sweary()`. It becomes available when you `git clone` the repository and call `devtools::load_all()`, or press Ctrl+Shift+L in RStudio. Learn more about calling this function using `?build_sweary`.
5. Create a pull request.
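Before running the package tests, the four word-list rules can be checked by hand. The helper below, `check_word_list()`, is a hypothetical function and not part of sweary. It is a minimal sketch that assumes the words have already been read into a character vector, and it relies on the sort order of your current locale, which may differ from the one the package's tests use.

```r
# Hypothetical helper, not part of sweary: checks the four list rules
# on a character vector holding one word per element.
check_word_list <- function(words) {
  c(
    no_trailing_whitespace = !any(grepl("\\s+$", words)),
    all_lowercase          = all(words == tolower(words)),
    unique_words           = anyDuplicated(words) == 0,
    # Uses the collation of the current locale (an assumption).
    sorted                 = identical(words, sort(words))
  )
}

good <- c("arse", "ass", "bitch")
bad  <- c("bitch", "Ass ", "bitch")

good_result <- check_word_list(good)
bad_result  <- check_word_list(bad)
```

To check a real file, the words could be read first with something like `readLines("data-raw/swear-word-lists/en_English", encoding = "UTF-8")`. Passing this check does not replace `build_sweary()`, which remains the way to run the actual tests.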
The idea first appeared after the South Park text analysis lightning talk at the Why R? 2018 conference in Wrocław. All the contributors will be acknowledged as the work progresses.
We would like to say a BIG THANKS to the native speakers who helped us with the swear word dictionaries:
- Czech - Patrik Drhlík
- English - Patrik Drhlík
- French (Canada) - Marc-André Désautels
- German - Peter Meißner
- Greek - Anonymous
- Macedonian - novica
- Polish - Michal Czyz
- Romanian - Alexandru Supeanu
- Slovak - Šimon Žďárský