Note: this judge makes extensive use of R environments to separate student code from test code. If you are not familiar with environments, it might be useful to read this.
The test file for a basic exercise can look like this:
context({
  testcase('The correct method was used', {
    testEqual("test$alternative", function(studentEnv) { studentEnv$test$alternative }, 'two.sided')
    testEqual("test$method", function(studentEnv) { studentEnv$test$method }, ' Two Sample t-test')
  })
  testcase('p value is correct', {
    testEqual("test$p.value", function(studentEnv) { studentEnv$test$p.value }, 0.175)
  })
}, preExec = {
  set.seed(20190322)
})

context({
  testcase('x has the correct length', {
    testEqual("length(x)", function(studentEnv) { length(studentEnv$x) }, 100)
  })
})
Let's unpack what happens here.
First of all, something you can't see in the example code above: Dodona groups contexts in tabs. These are represented in the R judge by the files containing the test code. The name of the file (without the .R extension) is used to name the tab. A file should contain one or more calls to context. Tabs are ordered lexicographically by their filename. To make sure that tabs can be in a logical order, leading digits followed by a dash (-) are also stripped from the filename.
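As a rough illustration of the naming rule above, the following base R sketch derives tab names from test file names. The helper tab_name and its regular expressions are assumptions made for this example; the judge's own implementation may differ.

```r
# Hypothetical helper: derive a tab name from a test file name.
# The regular expressions are an assumption based on the rule described above.
tab_name <- function(filename) {
  name <- sub("\\.R$", "", basename(filename))  # drop the .R extension
  sub("^[0-9]+-", "", name)                     # drop leading digits followed by a dash
}

files <- c("02-Plots.R", "01-Basics.R", "Extra.R")

# Order by file name first, then strip the numeric prefix for display.
tabs <- vapply(sort(files), tab_name, character(1), USE.NAMES = FALSE)
print(tabs)
```

Numbering the files this way lets you control the tab order without the numbers showing up in the tab names.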
A context represents one execution of the student code. It is generally desirable to have as many contexts as possible, since students can filter by incorrect contexts. The context function does a few things:
- It creates a clean environment based on the global environment. Students have access to the global environment, but don't have access to the testing code or variables used in the testing code (the testing code is executed in a similar environment that is never bound).
- It executes the code passed through the preExec argument in this clean environment. This can be used for setting the seed (as in this example), but also to set variables or define functions that students can then use. NOTE: the preExec argument is not executed in the environment where the tests are run. If you need this, you will need to do this yourself.
- It executes the student's code in the clean environment. If the code errors out or generates warnings/messages, these are caught and handled. An error will interrupt the execution and set the runtime error state for the submission. It also adds a message to the context containing the error message generated by R. Warnings and messages do not interrupt execution, but are also added as messages to the context.
- It executes the first argument (a code block containing testcases) in the test environment.
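To make these steps more concrete, here is a minimal base R sketch of running code in a clean environment while collecting warnings, messages and errors. It is not the judge's implementation; the names student_code, student_env and messages are made up for this example.

```r
# Minimal sketch (not the judge's implementation) of the steps described above.
student_code <- "x <- rnorm(100)\nwarning('check your input')\ny <- mean(x)"

# A clean environment whose parent is the global environment.
student_env <- new.env(parent = globalenv())

# Rough equivalent of preExec: evaluated in the student environment.
eval(quote(set.seed(20190322)), envir = student_env)

messages <- character(0)
state <- tryCatch(
  withCallingHandlers(
    {
      eval(parse(text = student_code), envir = student_env)
      "ok"
    },
    # Warnings and messages are recorded, then execution continues.
    warning = function(w) {
      messages <<- c(messages, conditionMessage(w))
      invokeRestart("muffleWarning")
    },
    message = function(m) {
      messages <<- c(messages, conditionMessage(m))
      invokeRestart("muffleMessage")
    }
  ),
  # An error interrupts execution and is recorded as well.
  error = function(e) {
    messages <<- c(messages, conditionMessage(e))
    "runtime error"
  }
)

print(state)
print(messages)
print(ls(student_env))
```

Because the warning is muffled after being recorded, the line after it still runs and y ends up in the student environment.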
Note that the student code is executed once for each call to context. Technically, this allows the student to store intermediate results in the global environment. The use of this is limited, so we don't see this as a problem.
The contextWithRmd function does the same as the context function, but it expects the student code to be in the R Markdown format. The R chunks are evaluated as before and the markdown text is ignored during evaluation.
An extra contextWithImage function also exists. This function takes the same arguments, but adds an image to the output if it was generated by the student while their code is executed. By default, this function will make the output wrong if the expected image wasn't generated. This behaviour can be changed by setting the optional failIfAbsent parameter to FALSE.
For introductory exercises students often use R as a calculator and do not store the result of an expression as a variable in their script. For such scripts the eval function that executes the parsed script of the student does not store this result as a variable in the test environment. Instead, it simply returns the value to the caller. The result of the evaluation is therefore injected into the test environment under the name evaluationResult. A simple test using this could look like this:
context({
  testcase('the correct value is calculated', {
    testEqual("Result", function(studentEnv) { studentEnv$evaluationResult }, 42)
  })
})
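The behaviour of eval that makes evaluationResult necessary can be reproduced in plain R. The sketch below is an illustration under assumptions (the variable names env and value are made up); it only shows that evaluating a bare expression returns a value without creating a variable, and how that value could then be stored under a fixed name.

```r
# Illustrative sketch: a script that only computes a value, calculator style.
student_code <- "6 * 7"

env <- new.env(parent = globalenv())
value <- eval(parse(text = student_code), envir = env)

# Nothing was assigned, so the environment is still empty.
print(ls(env))

# Store the returned value under a fixed name so that a test can reach it.
assign("evaluationResult", value, envir = env)
print(env$evaluationResult)
```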
Testcases group a number of related tests. The first argument of the testcase function is a description of that related group. The second argument is a code block (containing tests) which will be executed by the testcase function. There is little functionality in the testcase function. It is mostly used as a wrapper for the Dodona concept.
A test is an actual evaluation of correctness. Multiple test* functions are available and are explained in more detail below. The only constant thing for tests are the first three arguments:
- A description of the test. Preferably, this is something the student can copy-paste into their local R environment (e.g. length(x), test$p.value, etc.).
- A function extracting the value to be tested from the student's environment. This function should take one argument (env) and return a value.
- The expected value. This expected value is compared to the value extracted by the second argument.
The testEqual function uses the base::all.equal function internally to determine whether the two values are equal. Any parameters that can be passed to all.equal can be passed to testEqual (but the first three arguments need to be as described above). In addition, one can pass a comparator argument to testEqual. This comparator should be a function that takes two arguments (generated and expected, in that order) and returns TRUE or FALSE. If this argument is passed, the comparator is used instead of all.equal. Any named arguments passed to testEqual that are not known by testEqual are passed to all.equal or comparator, depending on which is used. There is also an extra formatter argument that can be passed. formatter should be a function that takes a single argument and returns its argument formatted to the test's liking.
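The effect of the extra arguments is easiest to see on all.equal itself. The sketch below uses only base R; round_comparator and three_decimals are hypothetical names showing what a comparator and a formatter could look like.

```r
generated <- 0.1751
expected <- 0.175

# With the default tolerance, all.equal returns a description of the
# difference instead of TRUE, hence the isTRUE() wrapper.
strict <- isTRUE(all.equal(generated, expected))

# tolerance is an all.equal argument, so according to the text above it
# could be passed to testEqual as an extra named argument.
loose <- isTRUE(all.equal(generated, expected, tolerance = 1e-3))

# Hypothetical comparator: takes generated and expected, in that order,
# and returns TRUE or FALSE.
round_comparator <- function(generated, expected) {
  isTRUE(all.equal(round(generated, 3), expected))
}

# Hypothetical formatter: takes one value and returns it formatted for display.
three_decimals <- function(value) {
  format(round(value, 3), nsmall = 3)
}

print(c(strict = strict, loose = loose,
        comparator = round_comparator(generated, expected)))
print(three_decimals(generated))
```

A looser tolerance is usually enough for numeric results; a custom comparator is mainly useful when equality means something other than numeric closeness.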
The testIdentical function uses the base::identical function internally to determine whether the two values are equal. Any parameters that can be passed to identical can be passed to testIdentical (but the first three arguments need to be as described above). The formatter argument described above can also be passed to this function.
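The practical difference between the two underlying base functions is worth keeping in mind when choosing between testEqual and testIdentical. This short base R sketch shows two cases where all.equal accepts values that identical rejects.

```r
# Integer versus double: numerically equal, but of different types.
type_equal <- isTRUE(all.equal(1L, 1))
type_identical <- identical(1L, 1)

# Floating point rounding: 0.1 + 0.2 is not exactly 0.3.
float_equal <- isTRUE(all.equal(0.1 + 0.2, 0.3))
float_identical <- identical(0.1 + 0.2, 0.3)

print(c(type_equal = type_equal, type_identical = type_identical,
        float_equal = float_equal, float_identical = float_identical))
```

In other words, a test based on identical is stricter and fits best when the exact type and value matter.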
The testImage function is a special case, since it won't actually add a test to the output. Instead, it only expects one argument: a function taking the environment, that will generate an image when called. By default, this function will make the output wrong if the expected image wasn't generated. This behaviour can be changed by setting the optional failIfAbsent parameter to FALSE.
The testDF function can be used to test the equality of dataframes. By default row and column order are ignored. If you do not want this, pass the ignore_col_order and ignore_row_order arguments as FALSE (when applicable). Again, a custom comparator can be passed if necessary. The feedback in Dodona will show the first five rows of the dataframe(s).
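To illustrate what ignoring row and column order means, the following base R sketch brings two dataframes into a canonical order before comparing them. The helper normalise_df is an assumption made for this example and is not how testDF is implemented.

```r
# Hypothetical helper: sort columns by name and rows by all column values.
normalise_df <- function(df) {
  df <- df[, sort(names(df)), drop = FALSE]
  df <- df[do.call(order, unname(as.list(df))), , drop = FALSE]
  rownames(df) <- NULL
  df
}

generated <- data.frame(b = c("y", "x"), a = c(2, 1))
expected <- data.frame(a = c(1, 2), b = c("x", "y"))

# A direct comparison is sensitive to row and column order.
same_as_is <- isTRUE(all.equal(generated, expected))

# After normalising, only the content matters.
same_content <- isTRUE(all.equal(normalise_df(generated), normalise_df(expected)))

print(c(same_as_is = same_as_is, same_content = same_content))

# Comparable to the feedback described above: only the first five rows.
print(head(normalise_df(generated), 5))
```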
The testGGPlot function can be used to test the equality of GGPlots. Aside from the usual description, generated and expected arguments it has some optional arguments:
- show_expected = TRUE: A logical value indicating whether the solution plot should be shown to the student if the testGGPlot function determines the solution to be incorrect. If set to FALSE the student won't be able to compare their plot with the solution which could make an exercise really hard to solve. Please handle with care.
- test_data = TRUE: A logical value indicating whether the input data to the ggplot function should be verified. This test will succeed if all columns from the solution have a corresponding column in the given input with the same column name and data.
- test_geom = TRUE: A logical value indicating whether the geometric layers should be checked. For each layer the parameters and aesthetics are verified. This test method also takes into account the default aesthetics set in the ggplot function itself.
- test_facet = TRUE: A logical value indicating whether the facet layer should be tested. This test supports facet_grid and facet_wrap layers and can compare them to one another (e.g. a facet_grid with only 1 row/column can be equal to a facet_wrap and vice versa).
- ignore_facet_type = TRUE: A logical value indicating whether the facet type should be tested when testing the facet layer. Even though facet_grid and facet_wrap return similar graphs, the instructor may want to force facet_grid as it can create clearer graphs when faceting with multiple variables.
- test_label = FALSE: A logical value indicating whether the label layers should be tested.
- test_scale = FALSE: A logical value indicating whether the scale of the axis should be tested.
Note: Because we want the testing of the ggplot to be as flexible as possible, the test functions are all made in a way that the given solution ggplot is the plot that defines the minimal requirements for the student plot. When writing exercises this is a very important aspect to keep in mind. For example, we don't recommend testing plots where you defined parameters to be used in geometric layers in the ggplot function itself. This is because the test function would test for these parameters in every geometric layer in the student plot, even when they are not used.
Note: When testing ggplots we recommend using geom layers instead of stat layers. Both provide the same functionality, but tests written with geom layers will also work for ggplots with stat layers; this is not the case for tests written with stat layers.
The testFunctionUsed function can be used to test if a certain function is used in the student code. The function takes 1 parameter: the name of the function you want to make sure the student used.
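For readers curious how such a check can work in principle, here is a small base R sketch that looks for function calls in parsed code. The helper function_used is hypothetical and only an approximation; the judge may detect function usage differently.

```r
# Hypothetical helper: TRUE if `func` is called somewhere in `code`.
function_used <- function(code, func) {
  parsed <- parse(text = code, keep.source = TRUE)
  tokens <- utils::getParseData(parsed)
  func %in% tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"]
}

called <- function_used("x <- rnorm(10)\nm <- mean(x)", "mean")

# A variable that merely shares the name is not a function call.
only_a_variable <- function_used("mean <- 3", "mean")

print(c(called = called, only_a_variable = only_a_variable))
```

Note that this sketch only sees direct calls: a function passed by name to another function, as in sapply(x, mean), would not be detected.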
The testFunctionUsedInVar function can be used to test if a certain function is used in the assignment of a certain variable in the student code. It can also detect indirect assignments: testFunctionUsedInVar("mean", "a") will add a correct test to the feedback if the student code is a <- b <- mean(1). As you can see in the example, the function takes 2 parameters. The first parameter should be the name of the function you want to test for. The second parameter is the name of the variable where the given function should be used in its assignment.
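The indirect assignment case can be illustrated by walking the parsed code. The sketch below is an assumption about how such a check could be written in base R, not the judge's implementation; function_used_in_var is a hypothetical name.

```r
# Hypothetical helper: TRUE if `func` appears on the right-hand side of an
# assignment to `var`. Kept simple on purpose: calls with empty arguments,
# such as x[, 1], are not handled.
function_used_in_var <- function(code, func, var) {
  check <- function(e) {
    if (!is.call(e)) return(FALSE)
    is_assign <- as.character(e[[1]])[1] %in% c("<-", "=")
    if (is_assign && identical(e[[2]], as.name(var)) &&
        func %in% all.names(e[[3]])) {
      return(TRUE)
    }
    # Descend into the arguments to find nested assignments.
    any(vapply(as.list(e)[-1], check, logical(1)))
  }
  any(vapply(as.list(parse(text = code)), check, logical(1)))
}

chained_a <- function_used_in_var("a <- b <- mean(1)", "mean", "a")
chained_b <- function_used_in_var("a <- b <- mean(1)", "mean", "b")
unrelated <- function_used_in_var("a <- 1\nb <- mean(2)", "mean", "a")

print(c(chained_a = chained_a, chained_b = chained_b, unrelated = unrelated))
```

In a <- b <- mean(1), the right-hand side of the assignment to a is itself the assignment b <- mean(1), which is why searching the whole right-hand side finds mean for both variables.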
The testHtest function can be used to test objects of the htest class (e.g. the result of the t.test function). Aside from the usual description, generated and expected arguments it has some optional arguments:
- test_p_value = TRUE: A logical value indicating whether the p-value should be tested.
- test_interval = TRUE: A logical value indicating whether the confidence interval should be tested.
- test_statistic = FALSE: A logical value indicating whether the test statistic should be tested.
- test_alternative = FALSE: A logical value indicating whether the alternative should be tested.
- test_confidence_level = FALSE: A logical value indicating whether the confidence level should be tested.
- test_method = FALSE: A logical value indicating whether the used method should be tested.
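As a reminder of what an htest object contains, the following base R sketch runs a two-sample t-test and pulls out the components that the options above most plausibly refer to. The mapping from option names to components is an assumption, and the data are made up for the example.

```r
set.seed(20190322)
x <- rnorm(100)
y <- rnorm(100, mean = 0.2)

test <- t.test(x, y, var.equal = TRUE)
print(class(test))

# Assumed correspondence between the testHtest options and htest components.
fields <- list(
  p_value = test$p.value,
  interval = test$conf.int,
  statistic = test$statistic,
  alternative = test$alternative,
  confidence_level = attr(test$conf.int, "conf.level"),
  method = test$method
)
str(fields)
```

Inspecting an htest object with str in this way is also a quick method for finding the exact strings to expect, for instance for the method or the alternative.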
The testMultipleChoice function can be used to test multiple choice questions. This function can handle multiple right answers if they are passed in a vector. Aside from the usual description, generated and expected arguments it has the following arguments:
- possible_answers: A vector/list containing all possible options for the multiple choice question. The options can be integer or character values.
- verify_answer = FALSE: A logical value indicating whether the given answer should be tested. When set to TRUE the judge will tell the student if their answer is wrong or correct. When set to FALSE the judge will only test if the given answer is valid (contained within possible_answers).
- give_feedback = TRUE: A logical value indicating whether the judge should show the student where their mistake is.
- feedback = NULL: A list containing optional extra information about why a certain option is wrong. This should be a named list with the options as names when your options are characters.
- show_expected = FALSE: A logical value indicating whether the judge should show the correct answer to the student.
⚠️ We do not recommend using this test method because it won't deliver an optimal experience for students nor teachers.
If you would like to see multiple choice questions implemented in Dodona you can voice your support in this Dodona issue.