---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
if (requireNamespace("knitr", quietly = TRUE)){
  knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    fig.path = "man/figures/README-",
    dpi = 92,
    fig.retina = 2
  )
}

# Get minimum R requirement
dep <- as.vector(read.dcf('DESCRIPTION')[, 'Depends'])
rvers <- substring(dep, 7, nchar(dep) - 1)
# m <- regexpr('R *\\(>= \\d+.\\d+.\\d+\\)', dep)
# rm <- regmatches(dep, m)
# rvers <- gsub('.*(\\d+.\\d+.\\d+).*', '\\1', dep)

# Function for TOC
# https://gist.github.com/gadenbuie/c83e078bf8c81b035e32c3fc0cf04ee8
```
# cvms <a href='https://github.com/LudvigOlsen/cvms'><img src='man/figures/cvms_logo_242x280_250dpi.png' align="right" height="140" /></a>
**Cross-Validation for Model Selection**
**Authors:** [Ludvig R. Olsen](https://www.ludvigolsen.dk/) ( [email protected] ), Hugh Benjamin Zachariae <br/>
**License:** [MIT](https://opensource.org/license/mit) <br/>
**Started:** October 2016
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/cvms)](https://cran.r-project.org/package=cvms)
[![metacran downloads](https://cranlogs.r-pkg.org/badges/cvms)](https://cran.r-project.org/package=cvms)
[![minimal R version](https://img.shields.io/badge/R%3E%3D-`r rvers`-6666ff.svg)](https://cran.r-project.org/)
[![Codecov test coverage](https://codecov.io/gh/ludvigolsen/cvms/branch/master/graph/badge.svg)](https://app.codecov.io/gh/ludvigolsen/cvms?branch=master)
[![GitHub Actions CI status](https://github.com/ludvigolsen/cvms/actions/workflows/R-check.yaml/badge.svg?branch=master)](https://github.com/ludvigolsen/cvms/actions/workflows/R-check.yaml?branch=master)
[![AppVeyor build status](https://ci.appveyor.com/api/projects/status/github/LudvigOlsen/cvms?branch=master&svg=true)](https://ci.appveyor.com/project/LudvigOlsen/cvms)
[![DOI](https://zenodo.org/badge/71063931.svg)](https://zenodo.org/badge/latestdoi/71063931)
## Overview
R package for model evaluation and comparison.
* **Cross-validate** one or multiple regression or classification models with relevant evaluation metrics in a **tidy** format.
* **Validate** the best model on a test set and compare it to a **baseline** evaluation.
* Perform **hyperparameter tuning** with *grid search*.
* **Evaluate** predictions from an external model.
* Extract the observations that were the **most challenging** to predict.
Currently supports regression (`'gaussian'`), binary classification (`'binomial'`), and (some functions only) multiclass classification (`'multinomial'`). Many of the functions allow **parallelization**, e.g. through the `doParallel` package.
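For instance, parallelization can typically be enabled by registering a parallel backend and setting `parallel = TRUE`. The following is a minimal sketch (not run), assuming the `doParallel` package is installed and that the data has already been folded as in the examples further down:
```{r eval=FALSE}
# Minimal sketch (not run): enable parallelization
# Assumes the doParallel package is installed

# Register a parallel backend with 4 workers
doParallel::registerDoParallel(cores = 4)

# Cross-validate with parallelization enabled
CV <- cross_validate(
  data = data,  # a data frame with a '.folds' column
  formulas = "score ~ diagnosis",
  fold_cols = ".folds",
  family = "gaussian",
  parallel = TRUE
)
```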
**NEW**: Our new [application](https://huggingface.co/spaces/ludvigolsen/plot_confusion_matrix) for plotting confusion matrices with `plot_confusion_matrix()` without any code is now available on [Huggingface Spaces](https://huggingface.co/spaces/ludvigolsen/plot_confusion_matrix).
### Main functions
| Function | Description |
|:----------------------|:-------------------------------------------------------------------|
|`cross_validate()` |Cross-validate linear models with `lm()`/`lmer()`/`glm()`/`glmer()` |
|`cross_validate_fn()` |Cross-validate a custom model function |
|`validate()`           |Validate linear models with `lm()`/`lmer()`/`glm()`/`glmer()`       |
|`validate_fn()` |Validate a custom model function |
|`evaluate()` |Evaluate predictions with a large set of metrics |
|`baseline()`<br />`baseline_gaussian()`<br />`baseline_binomial()`<br />`baseline_multinomial()` |Perform baseline evaluations of a dataset |
### Evaluation utilities
| Function | Description |
|:----------------------|:-------------------------------------------------------------------|
|`confusion_matrix()` |Create a confusion matrix from predictions and targets |
|`evaluate_residuals()` |Evaluate residuals from a regression task |
|`most_challenging()` |Find the observations that were the most challenging to predict |
|`summarize_metrics()` |Summarize numeric columns with a set of descriptors |
### Formula utilities
| Function | Description |
|:------------------------|:-------------------------------------------------------------------|
|`combine_predictors()` |Generate model formulas from a list of predictors |
|`reconstruct_formulas()` |Extract formulas from output tibble |
|`simplify_formula()`     |Remove inline functions (and more) from a formula object            |
### Plotting utilities
| Function | Description |
|:-------------------------|:-------------------------------------------------------------------|
|`plot_confusion_matrix()` |Plot a confusion matrix (see also our [no-code application](https://huggingface.co/spaces/ludvigolsen/plot_confusion_matrix)) |
|`plot_metric_density()` |Create a density plot for a metric column |
|`font()` |Set font settings for plotting functions (currently only `plot_confusion_matrix()`) |
|`sum_tile_settings()` |Set settings for sum tiles in `plot_confusion_matrix()` |
### Custom functions
| Function | Description |
|:--------------------------|:--------------------------------------------------------|
|`model_functions()` |Example model functions for `cross_validate_fn()` |
|`predict_functions()` |Example predict functions for `cross_validate_fn()` |
|`preprocess_functions()` |Example preprocess functions for `cross_validate_fn()` |
|`update_hyperparameters()` |Manage hyperparameters in custom model functions |
### Other utilities
| Function | Description |
|:----------------------|:-------------------------------------------------------------------|
|`select_metrics()` |Select the metric columns from the output |
|`select_definitions()` |Select the model-defining columns from the output |
|`gaussian_metrics()`<br />`binomial_metrics()`<br />`multinomial_metrics()` |Create list of metrics for the common `metrics` argument |
|`multiclass_probability_tibble()` |Generate a multiclass probability tibble |
### Datasets
| Name | Description |
|:--------------------------|:-------------------------------------------------------------------|
|`participant.scores` |Made-up experiment data with 10 participants and two diagnoses |
|`wines` |A list of wine varieties in an approximately Zipfian distribution |
|`musicians` |Made-up data on 60 musicians in 4 groups for multiclass classification |
| `predicted.musicians` |Predictions by 3 classifiers of the 4 classes in the `musicians` dataset |
|`precomputed.formulas` |Fixed effect combinations for model formulas with/without two- and three-way interactions |
|`compatible.formula.terms` |162,660 pairs of compatible terms for building model formulas with up to 15 fixed effects |
## Table of Contents
```{r toc, echo=FALSE}
cvms:::render_toc("README.Rmd")
```
## Important News
>
> Check `NEWS.md` for the full list of changes.
>
* Version `1.2.0` contained **multiple breaking changes**. Please see `NEWS.md`. (18th of October 2020)
## Installation
CRAN:
> `install.packages("cvms")`
Development version:
> `install.packages("devtools")`
>
> `devtools::install_github("LudvigOlsen/groupdata2")`
>
> `devtools::install_github("LudvigOlsen/cvms")`
## Vignettes
`cvms` contains a number of vignettes with relevant use cases and descriptions:
> `vignette(package = "cvms")` # for an overview
# Examples
## Attach packages
```{r warning=FALSE, message=FALSE}
library(cvms)
library(groupdata2) # fold() partition()
library(knitr) # kable()
library(dplyr) # %>% arrange()
```
## Load data
The dataset `participant.scores` comes with `cvms`:
```{r}
data <- participant.scores
```
## Fold data
Create a grouping factor for subsetting the data into folds, using `groupdata2::fold()`. Then, order the dataset by the folds:
```{r}
# Set seed for reproducibility
set.seed(7)
# Fold data
data <- fold(
  data = data, k = 4,
  cat_col = 'diagnosis',
  id_col = 'participant') %>%
  arrange(.folds)
# Show first 15 rows of data
data %>% head(15) %>% kable()
```
## Cross-validate a single model
### Gaussian
```{r warning=FALSE, message=FALSE}
CV1 <- cross_validate(
  data = data,
  formulas = "score ~ diagnosis",
  fold_cols = '.folds',
  family = 'gaussian',
  REML = FALSE
)
# Show results
CV1
# Let's take a closer look at the different parts of the output
# Metrics and formulas
CV1 %>% select_metrics() %>% kable()
# Just the formulas
CV1 %>% select_definitions() %>% kable()
# Nested predictions
# Note that [[1]] picks predictions for the first row
CV1$Predictions[[1]] %>% head() %>% kable()
# Nested results from the different folds
CV1$Results[[1]] %>% kable()
# Nested model coefficients
# Note that you have the full p-values,
# but kable() only shows a certain number of digits
CV1$Coefficients[[1]] %>% kable()
# Additional information about the model
# and the training process
CV1 %>% select(14:19, 21) %>% kable()
CV1$Process[[1]]
```
### Binomial
```{r fig.width = 4.25, fig.height = 4.25, fig.align='center'}
CV2 <- cross_validate(
  data = data,
  formulas = "diagnosis ~ score",
  fold_cols = '.folds',
  family = 'binomial'
)
# Show results
CV2
# Let's take a closer look at the different parts of the output
# We won't repeat the parts too similar to those in Gaussian
# Metrics
CV2 %>% select(1:9) %>% kable(digits = 5)
CV2 %>% select(10:15) %>% kable()
# Confusion matrix
CV2$`Confusion Matrix`[[1]] %>% kable()
# Plot confusion matrix
plot_confusion_matrix(CV2$`Confusion Matrix`[[1]], add_sums = TRUE)
```
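The plot can be customized with `font()` and `sum_tile_settings()`. The sketch below (not run) shows one way this might look; the `font_counts` and `sums_settings` argument names, as well as the `size`, `palette`, and `label` settings, are assumptions on our part, so check `?plot_confusion_matrix`, `?font`, and `?sum_tile_settings` before relying on them:
```{r eval=FALSE}
# Sketch only (not run) - argument names are assumptions,
# see ?plot_confusion_matrix, ?font and ?sum_tile_settings
plot_confusion_matrix(
  CV2$`Confusion Matrix`[[1]],
  add_sums = TRUE,
  # Larger font for the counts (assumed `font_counts` argument)
  font_counts = font(size = 6),
  # Style the sum tiles (assumed `sums_settings` argument)
  sums_settings = sum_tile_settings(palette = "Oranges", label = "Total")
)
```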
## Cross-validate multiple models
### Create model formulas
```{r}
model_formulas <- c("score ~ diagnosis", "score ~ age")
mixed_model_formulas <- c("score ~ diagnosis + (1|session)",
                          "score ~ age + (1|session)")
```
### Cross-validate fixed effects models
```{r}
CV3 <- cross_validate(
  data = data,
  formulas = model_formulas,
  fold_cols = '.folds',
  family = 'gaussian',
  REML = FALSE
)
# Show results
CV3
```
### Cross-validate mixed effects models
```{r}
CV4 <- cross_validate(
  data = data,
  formulas = mixed_model_formulas,
  fold_cols = '.folds',
  family = 'gaussian',
  REML = FALSE
)
# Show results
CV4
```
## Repeated cross-validation
Instead of only dividing our data into folds once, we can do it multiple times and average the results. As the models can be ranked differently with different splits, this is generally preferable.
Let's first add some extra fold columns. We will use the `num_fold_cols` argument to add 3 *unique* fold columns. We tell `fold()` to *keep* the existing fold column and simply add three extra columns. We could also choose to remove the existing fold column if, for instance, we were changing the number of folds (`k`). Note that the original fold column will be renamed to `".folds_1"`.
```{r}
# Set seed for reproducibility
set.seed(2)
# Ungroup data
# Otherwise, we would create folds within the existing folds
data <- dplyr::ungroup(data)
# Fold data
data <- fold(
  data = data,
  k = 4,
  cat_col = 'diagnosis',
  id_col = 'participant',
  num_fold_cols = 3,
  handle_existing_fold_cols = "keep"
)
# Show first 10 rows of data
data %>% head(10) %>% kable()
```
Now, let's cross-validate the four fold columns. We use `paste0()` to create the four column names:
```{r}
CV5 <- cross_validate(
  data = data,
  formulas = c("diagnosis ~ score",
               "diagnosis ~ score + age"),
  fold_cols = paste0(".folds_", 1:4),
  family = 'binomial'
)
# Show results
CV5
# Subset of the results per fold for the first model
CV5$Results[[1]] %>% select(1:8) %>% kable()
```
## Cross-validating custom model functions
`cross_validate_fn()` allows us to cross-validate a custom model function, like a support vector machine or a neural network. It works with regression (`gaussian`), binary classification (`binomial`), and multiclass classification (`multinomial`).
We are required to pass a model function and a predict function. Optionally, we can also pass a preprocessing function and a list of hyperparameter values to test with grid search (a sketch of this follows the SVM example below). You can check the requirements for these functions at `?cross_validate_fn`.
### SVM
Let's cross-validate a support-vector machine using the `svm()` function from the `e1071` package.
First, we will create a model function. You can do anything you want inside it, as long as it takes the arguments `train_data`, `formula`, and `hyperparameters` and returns a fitted model object:
```{r eval=requireNamespace("e1071")}
# Create model function
#
# train_data : tibble with the training data
# formula : a formula object
# hyperparameters : a named list of hyperparameters
svm_model_fn <- function(train_data, formula, hyperparameters){
  # Note that `formula` must be passed first
  # when calling svm(), otherwise it fails
  e1071::svm(
    formula = formula,
    data = train_data,
    kernel = "linear",
    type = "C-classification",
    probability = TRUE
  )
}
```
We also need a predict function. This will usually wrap the `stats::predict()` function. The point is to ensure that the predictions have the correct format. In this case, we want a single column with the probability of the positive class.
Note that you do not need to use the `formula`, `hyperparameters`, and `train_data` arguments within your function. They are there for the few cases where they are needed.
```{r}
# Create predict function
#
# test_data : tibble with the test data
# model : fitted model object
# formula : a formula object
# hyperparameters : a named list of hyperparameters
# train_data : tibble with the training data
svm_predict_fn <- function(test_data, model, formula, hyperparameters, train_data){
  # Predict the test set with the model
  predictions <- stats::predict(
    object = model,
    newdata = test_data,
    allow.new.levels = TRUE,
    probability = TRUE
  )

  # Extract the probabilities
  # Usually the predict function will just
  # output the probabilities directly
  probabilities <- dplyr::as_tibble(
    attr(predictions, "probabilities")
  )

  # Return second column
  # with probabilities of positive class
  probabilities[[2]]
}
```
With these functions defined, we can cross-validate the support-vector machine:
```{r eval=requireNamespace("e1071")}
# Cross-validate svm_model_fn
CV6 <- cross_validate_fn(
  data = data,
  model_fn = svm_model_fn,
  predict_fn = svm_predict_fn,
  formulas = c("diagnosis ~ score", "diagnosis ~ age"),
  fold_cols = '.folds_1',
  type = 'binomial'
)
CV6
```
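As mentioned above, `cross_validate_fn()` can also tune hyperparameters with grid search by passing a named list of values via `hyperparameters`. Below is a hedged sketch (not run) of how the SVM example might be extended to try a few values of svm's `cost` parameter; `svm_cost_model_fn` is a hypothetical variant of the model function above, and `update_hyperparameters()` is used to fall back to a default when `cost` is not supplied:
```{r eval=FALSE}
# Sketch only (not run)
# Hypothetical model function that reads `cost` from the hyperparameters
svm_cost_model_fn <- function(train_data, formula, hyperparameters){
  # Add a default for `cost` if it wasn't supplied
  hyperparameters <- update_hyperparameters(
    cost = 1,
    hyperparameters = hyperparameters
  )
  e1071::svm(
    formula = formula,
    data = train_data,
    kernel = "linear",
    cost = hyperparameters[["cost"]],
    type = "C-classification",
    probability = TRUE
  )
}

# Cross-validate a small grid of `cost` values
CV_grid <- cross_validate_fn(
  data = data,
  model_fn = svm_cost_model_fn,
  predict_fn = svm_predict_fn,
  formulas = "diagnosis ~ score",
  hyperparameters = list("cost" = c(0.1, 1, 10)),
  fold_cols = '.folds_1',
  type = 'binomial'
)
```
A `preprocess_fn` (see `preprocess_functions()` for examples) can be passed in the same way.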
### Naïve Bayes
Let's try with a naïve Bayes classifier as well. First, we will define the model function:
```{r eval=requireNamespace("e1071")}
# Create model function
#
# train_data : tibble with the training data
# formula : a formula object
# hyperparameters : a named list of hyperparameters
nb_model_fn <- function(train_data, formula, hyperparameters){
  e1071::naiveBayes(
    formula = formula,
    data = train_data
  )
}
```
And the predict function:
```{r}
# Create predict function
#
# test_data : tibble with the test data
# model : fitted model object
# formula : a formula object
# hyperparameters : a named list of hyperparameters
# train_data : tibble with the training data
nb_predict_fn <- function(test_data, model, formula, hyperparameters, train_data){
  stats::predict(
    object = model,
    newdata = test_data,
    type = "raw",
    allow.new.levels = TRUE)[, 2]
}
```
With both functions specified, we are ready to cross-validate our naïve Bayes classifier:
```{r eval=requireNamespace("e1071")}
CV7 <- cross_validate_fn(
  data = data,
  model_fn = nb_model_fn,
  predict_fn = nb_predict_fn,
  formulas = c("diagnosis ~ score", "diagnosis ~ age"),
  type = 'binomial',
  fold_cols = '.folds_1'
)
CV7
```
## Extracting the most challenging observations
If we wish to investigate why some observations are harder to predict than others, we should start by identifying the most challenging observations. This can be done with `most_challenging()`.
Let's first extract the predictions from some of the cross-validation results:
```{r eval=requireNamespace("e1071")}
glm_predictions <- dplyr::bind_rows(CV5$Predictions, .id = "Model")
svm_predictions <- dplyr::bind_rows(CV6$Predictions, .id = "Model")
nb_predictions <- dplyr::bind_rows(CV7$Predictions, .id = "Model")
predictions <- dplyr::bind_rows(
  glm_predictions,
  svm_predictions,
  nb_predictions,
  .id = "Architecture"
)
predictions[["Target"]] <- as.character(predictions[["Target"]])
predictions
```
Now, let's find the *overall most difficult to predict* observations. `most_challenging()` calculates the `Accuracy`, `MAE`, and `Cross-Entropy` for each prediction. We can then extract the observations with the ~20% highest `MAE` scores. Note that `most_challenging()` works with grouped data frames as well.
```{r eval=requireNamespace("e1071")}
challenging <- most_challenging(
  data = predictions,
  prediction_cols = "Prediction",
  type = "binomial",
  threshold = 0.20,
  threshold_is = "percentage"
)
challenging
```
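Since `most_challenging()` accepts grouped data frames, we could also find the hardest observations per model architecture. A minimal sketch (not run):
```{r eval=FALSE}
# Sketch only (not run): most challenging observations
# per architecture (glm, svm, naive Bayes)
predictions %>%
  dplyr::group_by(Architecture) %>%
  most_challenging(
    prediction_cols = "Prediction",
    type = "binomial",
    threshold = 0.20,
    threshold_is = "percentage"
  )
```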
We can then extract the difficult observations from the dataset. First, we add an index to the dataset. Then, we perform a right-join to keep only the rows that are in the `challenging` data frame.
```{r eval=requireNamespace("e1071")}
# Index with values 1:30
data[["Observation"]] <- seq_len(nrow(data))
# Add information to the challenging observations
challenging <- data %>%
  # Remove fold columns for clarity
  dplyr::select(-c(.folds_1, .folds_2, .folds_3, .folds_4)) %>%
  # Add the scores
  dplyr::right_join(challenging, by = "Observation")
challenging %>% kable()
```
Note: You may have to scroll to the right in the table.
## Evaluating predictions
We can also evaluate predictions from a model trained outside `cvms`. This works with regression (`'gaussian'`), binary classification (`'binomial'`), and multiclass classification (`'multinomial'`).
### Gaussian evaluation
Extract the targets and predictions from the first cross-validation we performed and evaluate them with `evaluate()`. We group the data frame by the `Fold` column to evaluate each fold separately:
```{r}
# Extract the predictions from the first cross-validation
predictions <- CV1$Predictions[[1]]
predictions %>% head(6) %>% kable()
# Evaluate the predictions per fold
predictions %>%
  group_by(Fold) %>%
  evaluate(
    target_col = "Target",
    prediction_cols = "Prediction",
    type = "gaussian"
  )
```
### Binomial evaluation
We can do the same for the predictions from the second, binomial cross-validation:
```{r}
# Extract the predictions from the second cross-validation
predictions <- CV2$Predictions[[1]]
predictions %>% head(6) %>% kable()
# Evaluate the predictions per fold
predictions %>%
  group_by(Fold) %>%
  evaluate(
    target_col = "Target",
    prediction_cols = "Prediction",
    type = "binomial"
  )
```
### Multinomial evaluation
We will use the `multiclass_probability_tibble()` helper to generate a data frame with predicted probabilities for three classes, along with the predicted class and the target class. Then, we will 1) evaluate the three probability columns against the targets (the preferable format), and 2) evaluate the predicted classes against the targets:
```{r}
# Create dataset for multinomial evaluation
multiclass_data <- multiclass_probability_tibble(
  num_classes = 3, # Also the number of prediction columns
  num_observations = 30,
  apply_softmax = TRUE,
  add_predicted_classes = TRUE,
  add_targets = TRUE)
multiclass_data
# Evaluate probabilities
# One prediction column *per class*
ev <- evaluate(
  data = multiclass_data,
  target_col = "Target",
  prediction_cols = paste0("class_", 1:3),
  type = "multinomial"
)
ev
# The one-vs-all evaluations
ev$`Class Level Results`[[1]]
# Evaluate the predicted classes
# One prediction column with the class names
evaluate(
  data = multiclass_data,
  target_col = "Target",
  prediction_cols = "Predicted Class",
  type = "multinomial"
)
```
## Baseline evaluations
While it's common to compute the chance-level baseline analytically (in classification tasks), random guessing will often achieve a better evaluation than that by chance. Hence, it is useful to check the range of our metrics when the probabilities are guessed at random.
Usually, we use `baseline()` on our test set at the start of our modeling process, so we know what level of performance we should beat.
Note: While `baseline()` works with all three families (`gaussian`, `binomial` and `multinomial`), each family also has a wrapper function (e.g. `baseline_gaussian()`) that is easier to use. We use those wrappers here.
Start by partitioning the dataset:
```{r}
# Set seed for reproducibility
set.seed(1)
# Partition the dataset
partitions <- groupdata2::partition(
  participant.scores,
  p = 0.7,
  cat_col = 'diagnosis',
  id_col = 'participant',
  list_out = TRUE
)
train_set <- partitions[[1]]
test_set <- partitions[[2]]
```
### Binomial baseline
Approach: `n` random sets of predictions are evaluated against the dependent variable in the test set. We also evaluate a set of all `0`s and a set of all `1`s.
Create the baseline evaluations:
```{r}
# Perform binomial baseline evaluation
# Note: It's worth enabling parallelization (see ?baseline examples)
binomial_baseline <- baseline_binomial(
  test_data = test_set,
  dependent_col = "diagnosis",
  n = 100
)
binomial_baseline$summarized_metrics
```
On average, we can expect an `F1` score of approximately `0.481`. The maximum `F1` score achieved by randomly guessing was `0.833` though. That's likely because of the small size of the test set, but it illustrates how such information could be useful in a real-life scenario.
The `All_1` row shows us that we can achieve an `F1` score of `0.667` by always predicting `1`. Some model architectures, like neural networks, have a tendency to always predict the majority class. Such a model is of course quite useless, which is why it is good to be aware of the performance it could achieve. We could also check the confusion matrix for such a pattern.
```{r}
binomial_baseline$random_evaluations
```
We can plot the distribution of `F1` scores from the random evaluations:
```{r fig.width = 6, fig.height = 4.5, fig.align='center'}
# First, remove the NAs from the F1 column
random_evaluations <- binomial_baseline$random_evaluations
random_evaluations <- random_evaluations[!is.na(random_evaluations$F1),]
# Create density plot for F1
plot_metric_density(baseline = random_evaluations,
metric = "F1", xlim = c(0, 1))
```
### Multinomial baseline
Approach: Creates one-vs-all (binomial) baseline evaluations for `n` sets of random predictions against the dependent variable, along with sets of `all class x,y,z,...` predictions.
Create the baseline evaluations:
```{r warning=FALSE}
multiclass_baseline <- baseline_multinomial(
  test_data = multiclass_data,
  dependent_col = "Target",
  n = 100
)
# Summarized metrics
multiclass_baseline$summarized_metrics
```
The `CL_` measures describe the `Class Level Results` (i.e. the one-vs-all evaluations). One of the classes has a maximum `Balanced Accuracy` score of `0.770`, while the maximum `Balanced Accuracy` in the random evaluations is `0.664`.
```{r}
# Summarized class level results for class 1
multiclass_baseline$summarized_class_level_results %>%
  dplyr::filter(Class == "class_1") %>%
  tidyr::unnest(Results)
# Random evaluations
# Note that the class level results for each repetition
# are available as well
multiclass_baseline$random_evaluations
```
### Gaussian baseline
Approach: The baseline model `(y ~ 1)`, where `1` is simply the intercept (i.e. mean of `y`), is fitted on `n` random subsets of the training set and evaluated on the test set. We also perform an evaluation of the model fitted on the entire training set.
We usually wish to establish whether our predictors add anything useful to our model. We should thus at least do better than a model without any predictors.
Create the baseline evaluations:
```{r}
gaussian_baseline <- baseline_gaussian(
  test_data = test_set,
  train_data = train_set,
  dependent_col = "score",
  n = 100
)
gaussian_baseline$summarized_metrics
```
The `All_rows` row tells us the performance when fitting the intercept model on the full training set. It is quite close to the mean of the random evaluations.
```{r}
gaussian_baseline$random_evaluations
```
Plot the density plot for `RMSE`:
```{r fig.width = 6, fig.height = 3.5, fig.align='center'}
plot_metric_density(baseline = gaussian_baseline$random_evaluations,
metric = "RMSE")
```
In this instance, the `All_rows` row might have been enough, as the subsets mainly add higher `RMSE` scores.
## Generate model formulas
Instead of manually typing all possible model formulas for a set of fixed effects (including the possible interactions), `combine_predictors()` can do it for you (with some constraints).
When including interactions, >200k formulas have been precomputed for up to 8 fixed effects, with a maximum interaction size of 3 and a maximum of 5 fixed effects per formula. It's possible to further limit the generated formulas (see the sketch below).
We can also append a random effects structure to the generated formulas.
```{r}
combine_predictors(
dependent = "y",
fixed_effects = c("a", "b", "c"),
random_effects = "(1|d)"
)
```
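To limit the generated formulas further, we could lower the maximum interaction size and the maximum number of fixed effects per formula. The sketch below (not run) assumes the `max_interaction_size` and `max_fixed_effects` arguments described in `?combine_predictors`:
```{r eval=FALSE}
# Sketch only (not run) - see ?combine_predictors for the argument names
combine_predictors(
  dependent = "y",
  fixed_effects = c("a", "b", "c"),
  random_effects = "(1|d)",
  max_fixed_effects = 3,     # at most 3 fixed effects per formula
  max_interaction_size = 2   # allow two-way interactions at most
)
```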
If two or more fixed effects should not be in the same formula, like an effect and its log-transformed version, we can provide them as sublists.
```{r}
combine_predictors(
dependent = "y",
fixed_effects = list("a", list("b", "log_b")),
random_effects = "(1|d)"
)
```