
add support for minimization of amplified tests #54

Open
monperrus opened this issue Jan 31, 2017 · 18 comments

@monperrus
Member

monperrus commented Jan 31, 2017

Motivation: During amplification, there is some neutral test evolution happening. This results in very long and unreadable tests. However, many changes in the amplified test are not required. The goal of minimization is to reduce the size and increase the readability of amplified test cases.

What: Implement a minimization algorithm (such as delta-debugging) to remove useless statements in amplified test cases.

Hints: For instance, useless statements are local variables that are set and never modified, such as Object myObject = null; such a local variable should be in-lined. For tests that expect an exception, every statement after the one that throws it can be removed.
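For illustration, here is the kind of before/after these two hints could produce on a made-up amplified test (the Parser class and the test are hypothetical):

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class MinimizationHintExample {

    // Before minimization: a useless local variable, plus dead code after the
    // statement that throws the expected exception.
    @Test(expected = IllegalArgumentException.class)
    public void testParseAmplified() {
        Object myObject = null;                 // set once, never modified: can be in-lined
        Parser parser = new Parser(myObject);   // becomes new Parser(null)
        parser.parse("not-a-number");           // throws IllegalArgumentException here
        parser.reset();                         // never reached: can be removed
        assertEquals(0, parser.errorCount());   // never reached: can be removed
    }

    // After minimization, the same test reduces to:
    @Test(expected = IllegalArgumentException.class)
    public void testParseAmplifiedMinimized() {
        Parser parser = new Parser(null);
        parser.parse("not-a-number");
    }
}
```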

@monperrus
Member Author

initial attempt in #154

@sbihel
Contributor

sbihel commented Feb 23, 2018

I think it would make sense to remove all amplifications that have no impact on the increase of mutation score.

Simple instrumentation could be used to detect useless generated assertions.

As for input amplification, I think we have to define a limit:

  • an added input can be removed;
  • a modified input can be reverted to its original state (i.e. minimising the amplification/noise).

Because if we apply general unit-test or even source-code minimisation, it might become harder for the developer to identify the original test. And they can apply general-purpose minimisation on their own anyway.

@monperrus
Member Author

monperrus commented Feb 23, 2018 via email

@danglotb
Member

I think it would make sense to remove all amplifications that have no impact on the increase of mutation score.

Yes, that's the idea.

See also the idea of "delta debugging" to minimize.

The major con with this approach is the time consumption: it requires "a lot" of executions of PIT, and so a lot of time.
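To give an idea of the cost, a naive delta-debugging-style loop over one amplified test could look roughly like this (amplifiedTest is a Spoon model of the test; runPit and removeStatement are hypothetical helpers; every iteration is a full PIT analysis):

```java
// Naive greedy minimization against the mutation score: every candidate
// removal triggers a full PIT analysis, which is where the time goes.
CtMethod<?> minimized = amplifiedTest;
double baseline = runPit(minimized);                        // full PIT run #1
for (int i = minimized.getBody().getStatements().size() - 1; i >= 0; i--) {
    CtMethod<?> candidate = removeStatement(minimized, i);  // drop statement i
    if (runPit(candidate) >= baseline) {                    // one more full PIT run
        minimized = candidate;                              // neutral removal: keep it
    }
}
```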

Simple instrumentation could be used to detect useless generated assertions.

What do you suggest?

In addition to this, we introduce comments in amplified tests and I think they create a lot of noise. Maybe we could remove them first, when we aim at presenting amplified tests to developers.

Do you think that this minimization should be done automatically and enabled by default, or should we provide it as an "external service tool" of DSpot?

@monperrus
Member Author

monperrus commented Feb 24, 2018 via email

@sbihel
Contributor

sbihel commented Feb 25, 2018

Simple instrumentation could be used to detect useless generated assertions.

What do you suggest?

I was thinking of adding a call to a counter after each added assertion. The test would be executed on the newly detected mutants, and if an assertion never lowers the counter, that means it never fails and is thus useless.
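Roughly, the kind of probe I have in mind (the class, method names and assertion ids are made up):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical probe inserted right after each generated assertion. When the
// test suite runs against the new mutants, an assertion whose probe is reached
// on every execution never failed, so it never contributed to killing a mutant
// and can be removed.
public final class AssertionProbe {

    private static final Map<String, Integer> reached = new ConcurrentHashMap<>();

    public static void hit(String assertionId) {
        reached.merge(assertionId, 1, Integer::sum);
    }

    public static int count(String assertionId) {
        return reached.getOrDefault(assertionId, 0);
    }
}
```

Each generated assertion would be immediately followed by a call like AssertionProbe.hit("testParse#assert1"), which is only reached when the assertion passes; an assertion whose counter equals the number of runs never failed.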


In addition to this, we introduce comments in amplified tests and I think they create a lot of noise. Maybe we could remove them first, when we aim at presenting amplified tests to developers.

If comments were removed, we (DSpot or the developer) would have to rely on a diff to identify the amplifications, right? Would that be a good solution? By that I mean, does the pretty printer of Spoon generate source code with the same style as the test given as input?


Do you think that this minimization should be done automatically and enabled by default,

yes, I think so, in order to maximize the prettiness of the generated tests, so that people like
them, also by their look'n'feel. (In DSpot, we generate tests for humans, not for machines.)

It would also be easier to interact with the main amplification process, and to have a more powerful interface.

@danglotb
Member

I was thinking of adding a call to a counter after each added assertion. The test would be executed on the newly detected mutants, and if an assertion never lowers the counter, that means it never fails and is thus useless.

The problem is that we execute the mutation analysis through Maven goals, so it runs in a new JVM; we would need serialization to obtain information about the runs, and that is kind of tricky, right?

By that I mean, does the pretty printer of Spoon generate source code with the same style as the test given as input?

I think you can rely on the pretty printer of Spoon.

It would also be easier to interact with the main amplification process, and to have a more powerful interface.

We need to minimize only tests that have been selected.

On the one hand, if there is a selection, it means that the minimization is tied to the selection, right?

On the other hand, some minimization can be done regardless of any test criterion, such as the in-lining of local variables.

I set up some classes and a test about that: #338. I'm going to implement at least this general minimization, using static analysis of the program.
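As a very rough, untested sketch, such a general in-lining could be a Spoon processor along these lines (restricted to variables initialized with a literal and never written again):

```java
import java.util.List;

import spoon.processing.AbstractProcessor;
import spoon.reflect.code.CtLiteral;
import spoon.reflect.code.CtLocalVariable;
import spoon.reflect.code.CtVariableAccess;
import spoon.reflect.code.CtVariableWrite;
import spoon.reflect.declaration.CtMethod;
import spoon.reflect.visitor.filter.VariableAccessFilter;

// Sketch of a Spoon processor that in-lines local variables initialized
// with a literal and never reassigned (the conservative case).
public class InlineLiteralLocalVariables extends AbstractProcessor<CtLocalVariable<?>> {

    @Override
    public void process(CtLocalVariable<?> local) {
        if (!(local.getDefaultExpression() instanceof CtLiteral)) {
            return;                                   // only literals, to stay conservative
        }
        CtMethod<?> method = local.getParent(CtMethod.class);
        if (method == null) {
            return;
        }
        List<CtVariableAccess<?>> accesses =
                method.getElements(new VariableAccessFilter<CtVariableAccess<?>>(local.getReference()));
        // Bail out if the variable is ever written again.
        for (CtVariableAccess<?> access : accesses) {
            if (access instanceof CtVariableWrite) {
                return;
            }
        }
        // Replace every read with a clone of the literal, then drop the declaration.
        for (CtVariableAccess<?> access : accesses) {
            access.replace(local.getDefaultExpression().clone());
        }
        local.delete();
    }
}
```

Restricting it to literal initializers keeps the transformation safe with respect to evaluation order and side effects.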

WDYT?

@sbihel
Contributor

sbihel commented Feb 27, 2018

The problem is that we execute the mutation analysis through Maven goals, so it runs in a new JVM; we would need serialization to obtain information about the runs, and that is kind of tricky, right?

What if each test wrote a report in a file?
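For instance, the AssertionProbe sketch from my earlier comment could dump its counters at JVM shutdown, so that the DSpot process can read them back after the forked Maven/PIT run (the file path and format are made up):

```java
// Hypothetical addition inside the AssertionProbe class sketched above.
static {
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
        try {
            java.nio.file.Path report =
                    java.nio.file.Paths.get("target/dspot/assertion-report.properties");
            java.nio.file.Files.createDirectories(report.getParent());
            java.nio.file.Files.write(report, reached.entrySet().stream()
                    .map(e -> e.getKey() + "=" + e.getValue())
                    .collect(java.util.stream.Collectors.toList()));
        } catch (java.io.IOException e) {
            // best effort: a missing report simply means "no data"
        }
    }));
}
```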


On the other hand, some minimization can be done regardless of any test criterion, such as the in-lining of local variables.

Yes, but what I don't really understand is that it will modify the original test. What if the author of the test thought it was clearer to use a variable?

@danglotb
Member

What if each test wrote a report in a file?

It would be the same as serialization/deserialization. I have some issues here.

During the mutation analysis:
In case an assertion never fails (I am not sure it happens, but w/e), we can remove it.
In case an assertion fails, there are two cases:

  1. it detects a mutant already detected by the original test suite;
  2. it detects a new mutant.

In addition to this, we have another dimension: what do we do with the amplified test?

  1. The amplified test is an improved version of an existing test; in this case, the assertions of case 1 should be kept, since the amplified test is meant to replace the original test.
  2. The amplified test has new semantics, derived from an existing test; in this case, the assertions of case 1 should be removed, since we will keep both the original test and the new test.

I'll think about it.

Yes, but what I don't really understand is that it will modify the original test. What if the author of the test thought it was clearer to use a variable?

You have a point here. Maybe we should only minimize what DSpot added. We may rely on the naming convention of local variables: DSpot names them something like __DSPOT_XX. We may also only in-line local variables initialized with literals.

In any case, we won't be able to satisfy everybody, and need to make choices.

@sbihel
Contributor

sbihel commented Feb 27, 2018

In addition to this, we have another dimension: what do we do with the amplified test?

  1. The amplified test is an improved version of an existing test; in this case, the assertions of case 1 should be kept, since the amplified test is meant to replace the original test.
  2. The amplified test has new semantics, derived from an existing test; in this case, the assertions of case 1 should be removed, since we will keep both the original test and the new test.

I agree. In the second case, would we still want new mutants to be located in the same method?

@sbihel
Contributor

sbihel commented Feb 28, 2018

It would be the same as serialization/deserialization. I have some issues here.

Would INRIA/spoon#1874 be useful?

@danglotb
Member

danglotb commented Mar 7, 2018

Hi @sbihel

Would you mind having a look at #354?

I propose a minimizer for the ChangeDetectorSelector.

The ChangeDetectorSelector runs amplified tests against a "modified" version of the same program and keeps only the amplified tests that fail.

The goal is to have amplified tests that encode a change, e.g. a new feature or a regression bug.

My idea is to perform a delta-diff on assertions, i.e. remove assertions one by one and see if the amplified test still fails.
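Concretely, something along these lines (isGeneratedAssertion, cloneWithoutStatement and stillFailsOnChangedVersion are hypothetical helpers):

```java
// Naive delta-diff for the ChangeDetectorSelector: try the amplified test
// without each generated assertion, and keep the removal only if the test
// still fails on the changed version of the program.
CtMethod<?> minimized = amplifiedTest;
for (int i = minimized.getBody().getStatements().size() - 1; i >= 0; i--) {
    CtStatement statement = minimized.getBody().getStatements().get(i);
    if (!isGeneratedAssertion(statement)) {        // e.g. assert* calls added by DSpot
        continue;
    }
    CtMethod<?> candidate = cloneWithoutStatement(minimized, i);
    if (stillFailsOnChangedVersion(candidate)) {   // compile and run the candidate test
        minimized = candidate;                     // the failure does not depend on this assertion
    }
}
```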

WDYT?

@sbihel
Contributor

sbihel commented Mar 8, 2018

Hi @danglotb,

Wouldn't we need a list of input programs to have all mutants detected by the test case?

Thanks for your efforts 👍

@danglotb
Member

danglotb commented Mar 9, 2018

As I said, some minimizations are related to the test criterion used.

For instance, if I use the mutation score as a test criterion, the minimization must keep the mutation score obtained after the amplification.

Here, I am talking about another test criterion: encoding a behavioral change.

The point of this selector is that we obtain amplified tests that pass on a given version and fail on the other one. Such amplified tests encode the (desired or undesired) behavioral changes.

On the one hand, when I say desired, it means that the developer may want the behavior of the program to change, i.e. to create a new feature or fix something.
On the other hand, when I say undesired, it might be a regression bug: something that was working before, but does not work anymore on the changed version. It means that the amplification is able to capture something that was not captured before.

In both cases, we win, because we can enhance the test suite.

Back to minimization for such a test criterion: do you think that we should only keep assertions that make the amplified test fail? If yes, should the failure be the same?

@sbihel
Contributor

sbihel commented Mar 12, 2018

If a behavioural change is detected, that means we keep both versions in the test suite. And thus we can apply general minimisation on the amplified version, using the improved criterion for the combined tests.

I was thinking that a generated assertion could be a duplicate of an existing one. In that case the new assertion would falsely appear to be useful. But if we focus on amplified assertions, the delta-diff would detect them.

And I think we should only keep amplified assertions that make the test fail because it enforces clarity on the generated test. If we wanted to keep the exact same failures as before, would it not greatly reduce the range of acceptable amplifications?

@monperrus
Member Author

there are two kinds of minimization

  • input minimization (removing method calls, etc)
  • assertion minimization (removing useless assertions or assertions redundant with others)

@monperrus
Member Author

See also: Fine-grained test minimization. Arash Vahabzadeh, Andrea Stocco, Ali Mesbah. ICSE 2018.
URL: https://dblp.org/rec/conf/icse/VahabzadehS018

@monperrus
Member Author

monperrus commented Nov 16, 2018

RW:

  • An empirical study of the effects of minimization on the fault detection capabilities of test suites
  • Test set size minimization and fault detection effectiveness: A case study in a space application
  • On the effect of test-suite reduction on automatically generated model-based tests
  • Regression testing minimization, selection and prioritization: a survey
