Validate state of the test node to help adapters to give always expected information to the extensions/data consumers #3759
Just so that it's documented here, I raised the question of whether it's "illegal" (or, more specifically, troublesome) for the platform if xUnit.net sends an "in-progress" message and then never gets around to sending a result message, which in part inspired this issue.

Our generalized use case is: the test framework (which is extensible) says "I'm starting to run this test", and then says "I've decided not to run this test" (aka, its result state is "NotRun"). This is contractually one of the result states that we support, in addition to the more traditional "Passed", "Failed", and "Skipped". As the runner (which is separate from the test framework), we receive these messages at independent times.

So my original concern was: Is it an error condition to send Microsoft Testing Platform an "in progress" node for a test, and then never send the "execution state" node for that test?

The additional data point that made me ask that is this: Microsoft Testing Platform does not have a "not run" execution state, but the Test Explorer UI most definitely has a "not run" execution state (it's the state all tests are in before they've been run once). Using TPv2 today, we preserve this "not run" execution state successfully; using Microsoft Testing Platform, the only way for us to preserve the "not run" execution state is to simply not send Microsoft Testing Platform an "execution state" node. Sending any of the three legal "execution state" nodes would mark the test visually as either passed, failed, or skipped; none of those is technically correct. So we just don't send anything, in an attempt to preserve the proper Test Explorer UX.

Anyway, that's the background from the xUnit.net perspective. Here is our command line experience vs. the MTP command line experience with respect to not-run tests:
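(To make the state mismatch above concrete, here is a rough sketch. This is not actual xUnit.net adapter or Microsoft.Testing.Platform code; the MTP property names in the comments are my best recollection of the messages API and may be inexact. It only illustrates why "NotRun" has no node the adapter can send.)

```csharp
using System;

// Sketch only: illustrates the state mismatch described above.
enum XunitResultState { Passed, Failed, Skipped, NotRun }

static class ResultStateMapper
{
    // Returns the MTP "execution state" node the adapter would send, or null
    // when the adapter withholds the node entirely (the NotRun workaround).
    public static string? ToMtpExecutionState(XunitResultState state) => state switch
    {
        XunitResultState.Passed  => "PassedTestNodeStateProperty",
        XunitResultState.Failed  => "FailedTestNodeStateProperty",
        XunitResultState.Skipped => "SkippedTestNodeStateProperty",
        // MTP has no "not run" execution state, so the only way to keep
        // Test Explorer showing "not run" is to send nothing at all.
        XunitResultState.NotRun  => null,
        _ => throw new ArgumentOutOfRangeException(nameof(state)),
    };
}
```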
Thanks @bradwilson for the sample.
I can come up with many arguments for and against this approach, so I am undecided. :) Having a "strict" mode where we can account for the number of tests is nice, but as you describe, we don't know how tests are being executed. We can only assume the most basic things, like: a failed test means a failed run, but even that falls apart when we introduce test re-runs. Similarly, for tests bound to dynamic data (like a query from a database), the count can differ per run. So this all only works if we send the list of tests to run and expect to get all of them back, but that still has downsides (as we can see in VS Test Explorer, which adopted this approach some time ago): you have to serialize and send all the test cases, account for dynamic-data tests that might not be the same as before, and account for tests that were added in the meantime.
Unstable data sets are especially problematic for pre-enumerated data, and commented on so frequently that I wrote a FAQ page about it. 😂 https://xunit.net/faq/theory-data-stability-in-vs
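(For readers following along, a minimal hypothetical example of the kind of unstable theory data the FAQ is about: the member data below yields a different row every time it is enumerated, so the test cases discovered by Test Explorer will no longer exist by the time the run starts.)

```csharp
using System;
using System.Collections.Generic;
using Xunit;

public class UnstableDataExample
{
    // Unstable: each enumeration produces brand-new values, so the pre-enumerated
    // test cases shown in Test Explorer will not match the rows produced at run time.
    public static IEnumerable<object[]> UnstableData()
    {
        yield return new object[] { Guid.NewGuid().ToString() };
        yield return new object[] { DateTime.Now.Ticks };
    }

    [Theory]
    [MemberData(nameof(UnstableData))]
    public void DoesSomething(object value) => Assert.NotNull(value);
}
```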
Does this mean my page is no longer necessary? I haven't had anybody complain about unstable data sets in quite a while, so maybe the problem disappeared and I never noticed. 😂 Until TestDriven.NET decided to stop shipping versions, that was my preferred test runner, so my personal use of Test Explorer is fairly recent.
Yes, that is how it currently works in VS with vstest: you send a full list of previously discovered tests to the runner to run them. In your case you serialize the parameters and receive them back, so tests run "as before", and re-running the tests will be deterministic, even though the data will be outdated if the underlying source is dynamic. Testing platform, imho, does not do this, and will re-run everything with a filter, preferring the data being up to date over the tests that TE already knows about being re-runnable.
The new filter will end up skipping new data as well as data that no longer exists, because having only the test case ID and not the serialized test case means we must perform a rediscovery. So unfortunately we will end up in the same scenario as "run all": new data ignored (though at least not run), and old data that no longer exists will not be run despite ostensibly being in the requested filter.
At the moment we don't do it; we decided to rely only on the concept of a test id and avoid all the round-tripping of serialized data, in favor of a "hash id" handled by the adapter, for performance (serialization, deserialization, network, and storage) and simplicity reasons. The adapter should provide a unique id for whatever it decides is "a test". I also think that a "deterministic" run (in the sense of freezing the parameters) is not ideal: if I change my tests and do a "run all", I'd like to see all my tests run, including the new ones, so "run all" should really run all and update the UX accordingly. A rebuild should re-discover the new ids. It's up to the adapter to decide whether to fold unstable tests under a common id or give them different ids (hashed from the args), and if a run is requested for an id that no longer exists, the run will result in 0 tests run, which the UX should take as "there's no test with this id anymore". In our internal prototype we codegen the [id, test] dictionary to be quick at finding a single test (although the balance of how much codegen you have to do + the build time + jitting is not always better than plain reflection, where metadata is already loaded for runtime needs anyway). Let's see if this model will be enough or whether we can think of something different.
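(A minimal sketch of the kind of adapter-computed "hash id" described here; this is my own illustration, not the internal prototype: hash the fully-qualified test method name together with a stable rendering of the arguments, so the same test + arguments always produces the same id without round-tripping serialized test cases.)

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

static class TestIdHasher
{
    // Stable id = SHA-256 over "Namespace.Type.Method(arg1,arg2,...)".
    // Same method + same argument rendering => same id on every discovery/run;
    // a changed argument value yields a different id, which is how an adapter
    // might choose to distinguish (or fold) data-driven cases.
    public static string ComputeId(string fullyQualifiedMethod, params object?[] args)
    {
        string rendered =
            $"{fullyQualifiedMethod}({string.Join(",", args.Select(a => a?.ToString() ?? "<null>"))})";
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(rendered));
        return Convert.ToHexString(hash);
    }
}
```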
That's an opinion, and not one that's shared by everybody. There are people who believe (and I am one of them) that if we discover 5 tests and then say "run all" with no changes to the source, then those 5 tests should run. In the past, that was not true, because Test Explorer would say "Run All" without context, and if the data set isn't stable, it's impossible to get those 5 tests again. The only way to rationalize this was to tell people to turn off pre-enumeration of unstable datasets.
There is a UX difference between "I wrote a new test" or "I changed a test" vs. "I just built my source, I see all the tests, now run them". What you're describing is the former, and what my post was written about was the latter.

For your scenario, this is really up to Test Explorer to do the right thing here with "run all": making tests that no longer exist disappear, and showing the results for new tests that it wasn't informed about ahead of time. As previously noted, simply doing a build + discover + run all isn't going to be sufficient when datasets aren't stable (not to mention that the re-discover pass, aside from being ineffective, would also be a waste of time).

I will say that even now with TPv2, trying to use […]. This is behavior we can't control. Test Explorer is the one that needs to behave better here.
It does. But with unstable datasets, discovery will hand you back something ephemeral: you can't run it unless you give me back the serialized data set. This part does at least work with TPv2, and I'm concerned that with MTP this is going to be a regression for people: they won't be able to look at a single data row test in Test Explorer and know for certain that they'll be able to run it unless they know the dataset is stable.
xUnit.net does not (and cannot) know that data is unstable. Only the developer who writes the data source can know this. That's why we tell people to turn off pre-enumeration with unstable data, because of the way Test Explorer has been broken by this in the past. It sounds like we will continue to have to make this recommendation, because the design of MTP makes it impossible for us to present an unstable data row to Test Explorer and allow the user to run it.
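(For context, the "turn off pre-enumeration" recommendation looks roughly like the sketch below; the flag shown is the DisableDiscoveryEnumeration setting on xUnit.net's member data attribute, and the surrounding test is hypothetical. With it set, the theory shows up as a single test case in Test Explorer instead of one case per data row, side-stepping the unstable-row problem.)

```csharp
using System;
using System.Collections.Generic;
using Xunit;

public class UnstableSourceTests
{
    public static IEnumerable<object[]> RowsFromDatabase()
    {
        // Imagine rows pulled from an external source that can change
        // between discovery and execution.
        yield return new object[] { Guid.NewGuid() };
    }

    [Theory]
    [MemberData(nameof(RowsFromDatabase), DisableDiscoveryEnumeration = true)]
    public void HandlesRow(Guid row) => Assert.NotEqual(Guid.Empty, row);
}
```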
This is true, but is it correct? Suppose I have a database/CSV I discover and I get 5 tests; I go to the data source, remove 2, run all, and 5 will run. Is it correct to say that I'm running the correct set of expected tests? Why do you think that the "set of the input" is not part of the tests? I see the dataset and the source at the same level here: a change in one or the other could break my test assertions. I could remove one test and a bug could manifest, because with the old set I did some mess with "statics" (suppose a code bug) and I'm affecting the other tests.
I think that the UX should behave better in this case, with different icons to notify that something changed. But running a "temporary" snapshot looks to me like it's not running what will run when I open the PR or in CI.
I need to understand the "real gestures" that cause unstable datasets. If we're talking about a sample where I have a non-static source like a CSV or a database that I need to expand on the fly, and I cannot know simply using reflection (or source elements) how many cases I have, my feeling is that the time when we're "unstable" is during the "development" of that specific test: until I've added all my tests I iterate on discovery and run (and here, like above, the UX should be better). Or, if the input is not stable once "committed", it's not clear to me what the user expects to assert: should we assert the deterministic case, or?
Why can't the adapter expand and generate the id from the […]?
I think you're conflating two things:

1. Data rows that change because the developer changed the tests or the data source (new or edited tests), and
2. Data rows that are not reliably stable, yielding different values between discovery and execution even when nothing in the source has changed.
I have no argument that the first is something that should be expected and embraced. What I want is for Test Explorer to be improved to handle it better, so that it doesn't require a manual build & discovery stage just to ensure that all the data rows will be correctly reported.

For the 2nd case, we already tell developers to disable theory pre-enumeration when they know the data is not reliably stable. This will be even more important with MTP than TPv2, because at least with TPv2 users could reliably run individual rows that were discovered; without test case serialization, that capability is no longer available.

I can verify that with MTP and 17.12 Preview 2, Test Explorer still behaves poorly in the face of unstable data sets. If the source code changes, it appears to re-run discovery before "Run All", but if the source does not change, it does not re-perform the discovery, and therefore leaves the outdated data row in place and does not report the new data row at all. That includes data that is random/unpredictable, and I verified the same behavior when reading the data from a data file (marked as "copy if newer", and thus part of the compilation process).

So this goes back to why we tell people to stop pre-enumeration when they know the dataset may be unstable. If Test Explorer can be improved to handle this situation better, then it would certainly be better for users with unstable datasets (and we would be able to soften the language of the documentation page).
We do that. I'm not sure why you think we don't.
Returning a stable ID in the face of an unchanging data row is already how it works today (with both TPv2 and MTP).
This filtering does work when the data has not changed, because the ID is stable. The problem is fundamentally one that has to be fixed in Test Explorer; we can't fix how it handles unexpected situations (like a test being removed and/or a test being added between discovery and execution).
We let the user make that decision, because they're the only ones who can. Otherwise, the only thing we could do would be to disable pre-enumeration entirely, and our users would be very unhappy with this decision. 99%+ of users have stable datasets and we'd be punishing the vast majority of users for the sake of a few.
I need to do another check. I recall that when I was running the xunit test suite, for one dataset test I ran a single test and got back more than one result with the same id; I thought that was this case of folding, but maybe I misread.
Ok, now I get what you mean. I've never seen a case like that in our codebases or in my past; I see it as a bit of an antipattern and I've usually discouraged it. I would expect deterministic dataset values, otherwise it's hard not to have flakiness or to make assertions. But for now I think it's fine, I don't see these cases as very frequent, and if it becomes a problem we can add a specific "metadata" property that an adapter can add and we send back in an opaque way, like today. Today we're paying for that serialization/deserialization + networking + space in every case.
I agree on this, I'd like that too.
Ah, you're talking about the case where we have multiple test results for a single test case. Yes, this most commonly happens when users disable theory pre-enumeration (or the data is not serializable). Because of our need to support TPv2, the serializability requirement still exists in v3, and as previously mentioned, we recommend users disable pre-enumeration for unstable datasets. Even if we threw away TPv2 compatibility and removed the serializability requirement, we're still stuck (worse now, IMO) with the unstable dataset problem that mandates that we return a single test case that ends up with multiple test results.
Personally, I agree, but we don't prevent our users from doing it. 😂
Ref #3478 (comment)
At the moment "the platform" is nothing more than a IPlatformOutputDevice that collects an print in console or in server mode sent states outside. We don't have validation on what adapters are sending. We could think to add a "strict" verification of the state of every test node at least from the perspective of the lifecycle expected by the platform "at minimum"(you can push what you want if other extensions understand your shared model, like trx), like "discovered(non mandatory) -> in-progress(not mandatory) -> executionstate(mandatory)" rejecting with an exception if you push out of this expected state. cc: @Evangelink