-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tapret_wlt_receiving_opret: de-ignore to show bug #6
Conversation
622804a
to
4fe648d
Compare
I'm not sure this is helpful but I've noticed a strange thing: before rebasing this PR the CI was succeeding https://github.com/RGB-WG/rgb-tests/actions/runs/10809438685/job/29984552488 By running the test locally it always failed though, so I'm not sure it was just a strange behavior from the CI or if it means the error is not deterministic |
I did
UPD: The test fails on CI but always succeeds locally |
The test contain an issue: when I save the consignment I see that |
It might be that the issue is this one: RGB-WG/rgb-std#270 |
858ef03
to
27d4fc6
Compare
No, this is not the reason of undeterminism of that test. I've added a commit here that proves the test fails even if the mining method makes sure the TX gets included in a block |
No, I am talking not about the test failure, but about success. The test succeeds and not all of txes are mined. |
The test with the added code (which adds |
I read your comment after seeing your ACK on the other PR and merging it... Also I do not know why github auto-closed this PR! |
The non-determinism is about txes being not mined. I do not see how this can be related to the rgb code... RGB std doesn't touch indexers neither sign nor broadcasts transactions |
In RGB-WG/rgb-std#272 (review) you say:
Here you say:
So which of these statements is the latest one? The problem I see is the fact that somehow |
The fact that the test passes every time with the fixing PR (RGB-WG/rgb-std#272) included while the exact same test is non-deterministic without the fixing PR to me is a proof that the non-deterministic behavior is not caused by the test but by the PR.
The test with the fix always passes. The same test without the fix sometimes passes and sometimes doesn't.
The fact the PR doesn't touch anything wallet- or transaction-related doesn't exclude the possibility that the code changed/removed by the PR was causing the non-determinism. As stated, if you are still convinced the non-determinism is about mining please prove it. |
No it doesn't prove it. Test may remain non-deterministic. Let's say before the fixing PR there were two scenarios how it can run, A and B, and the CI runs them non-deterministically. A always fail, B always succeeds. The PR fixes A scenario failure. Thus, test always succeeds now, while both non-deterministic scenarios are still there.
So this doesn't demonstrate that non-determinism is gone: see the explanation above.
Well, I have seen that After the PR I still see in the YAML just one transaction in the history instead of the three. Thus, the scenario B with no payment 1 happening is still there; it is just the fact that PR fixes scenario A makes test to always succeed.
It is not about being convienced; there is just no other options taking the facts into the account:
So the problem for non-determinism is somewhere in this specific test function. There could be no other logical conclusion. |
27d4fc6
to
15492ce
Compare
I dug deeper, in order to find what was causing the non-deterministic outcome of the test. let mut attempts = 10;
loop {
mine(resume);
if self.get_tx_height(txid).is_some() {
break;
}
attempts -= 1;
if attempts == 0 {
panic!("TX is not getting mined");
}
} A deeper inspection confirms that the issue was not with TX mining as you instead thought. Comparing a successful test run to a failing one what I see is:
This discrepancy clearly doesn't come from the TX not being confirmed. Therefore, I continued debugging and what I found is that the non-deterministic behavior of the test was due to a different order in the calls to Debugging this in depth allowed me to understand that the succeeding case was not actually succeeding and therefore that the test was not complete enough. So I've added a 4th and 5th transfers to this PR, that now make the test always fail. On top of this PR, I re-checked the fix in RGB-WG/rgb-std#272 and it seems we still have a bug. Check this branch to see it yourself. Therefore I've just opened a new issue RGB-WG/rgb-std#275 to tackle this. |
I was not talking about that. I am talking about completely different thing. Please just use the fixed branches which do not ever fail. Than check what you see in the YAML of consignments and you will see the tx is not mined (the last consignment has just one witness!). For a bunch of times (>dozen). Than it appears there (now you will see three witnesses in the last consignment) - for another dozen of times. In both cases the test succeeds. What switches this is not clear. This shows that the test runs and succeeds, but its execution path is non-deterministic and somehow transaction is missed. What you checked is that you confirmed that there is a bug with anchors which was fixed - yes, we already know that. But it is unrelated to the problem I am talking about. |
I haven't seen the behavior you reported and I don't understand how a TX could not get mined. Please prove what you're saying by adding some assertions to the code because it's important to understand if tests are really bugged or if there are other non-deterministic behaviors in RGB (other than the one I've just proved). |
15492ce
to
def083a
Compare
Yes, I was trying to explain the problem of this non-determinism in tx mining since a week but it seems I have failed all the time.
I do not know the codebase structure well enough to understand which assertions have to be added and where - so the only way how I can "prove" this is to propose you to look into the YAML consignments after multiple test runs (the way I did it myself) |
def083a
to
19b79ff
Compare
We always discussed (both here and in private messages) about non-determinism in the context of the test that was sometimes failing and sometimes succeeding. For example here you said:
So yes, communication failed. Anyway we need to make sure there are no more non-deterministic behaviors, both in the tests and in RGB code.
I've looked at consignments several times while I was debugging and I've never seen what you're describing. Anyway, since statements like this needs to be demonstrated with code, I've added some assertions to the test to check the expected number of The result was quite unexpected:
So, getting into details, in both
assert_eq!(consignment.bundles.len(), 1);
assert_eq!(consignment.terminals.len(), 1);
assert_eq!(consignment.bundles.len(), 2);
assert_eq!(consignment.terminals.len(), 0); While for the consignment of the third transfer we have that:
assert_eq!(consignment.bundles.len(), 2);
assert_eq!(consignment.terminals.len(), 1);
assert_eq!(consignment.bundles.len(), 0);
assert_eq!(consignment.terminals.len(), 0); Not sure this is the cause of RGB-WG/rgb-std#275 but this definitely seems like a bug. Anyway, to me these assertions prove there is no such case of a TX not getting mined and that the only non-deterministic behavior was in |
19b79ff
to
fe4642c
Compare
The problem I witnessed is the following: there are two scenarios in which test runs. The scenarios are following:
Before RGB-WG/rgb-std#272 (1) was failing and (2) was suceeding (you haven't witnessed that, but I had and reported you in telegram private messages before doing PR 272). After PR 272 both scenarios become succeeding, but they both are still there (that is what I meant "the non-determinism was there before and is still there, even though the test succeeds"). The strangest part is that test stays on one of the scenarios for ~10-20 runs in a raw, and then "switches" to another scenario, stays there for another ~20 runs, switches back and so on. So this is undetectable with a CI which does just a single run. You need to run many-many times and than eventually you will witness the switch. I have broken my head trying to understand how that can be possible and how to hunt this issue down. I have no suggestions which additonal assetions are required - I will look much more into the new code you've done over the weekend, run it on the server for hundred times and will hunt for the assertions fired. |
Basically this seems to be the bug I am talking about: sometimes the third consignment spends genesis, not seeing the previous transactions! The fact that it doesn't happen all the time. I see you run the first test for 100 times, but you fail on the first assertion. Try to run it on different days and record how many transactions the consignment has in the third transfer - and you will see the two scenarious I am talking about. The same thing should be true also without RGB-WG/rgb-std#275 - I just assume you need to wait some time before runs (few hrs? I do not know). I can't imagine what can influence that thing. Maybe stopping docker containers and starting them again... |
fe4642c
to
4a242d6
Compare
I've opened #16 with the |
So assertions on the number of bundles in the last transfer + 100 times + running different times should work, yes |
I've ran the tests of #16 several times on the CI on different days/hours and as you can see they always passed. So can we now agree the tests have a deterministic behavior? |
The problem with determinism is that it can't be proven through tests: only formally. So for now I agree that we do not have a replicate non-deterministic behavior. But I won't feel safe until I will find out what I was experiencing in the past and why it has gone. BTW introduction of nix builds should really help with determinism. However I have zero experience with it - nor time to understand how to use it. So maybe we will leave that to the future |
Is it possible that you were checking the wrong consignments? I mean, it's really hard to believe that without changing the code a non-deterministic behavior became deterministic out of the blue.
I'm not sure about this. IMO the system determinism shouldn't affect the deterministic (or non-deterministic) behavior of a test. Also in a real world scenario users usually don't use nix, so IMO it's better to run tests in widely used systems |
I run just one test, and checked all produced consignments, so very low probability
That what surprises me and I am tend to believe that this is something with how mining works in dockerized environments on different platforms and in different server restarts. Nix fixes this
From what I know it is actually one of the main reasons for the non-determinisim --- and the reason why Nix and Guix were created first hand
We need both |
There are 2 wallets and 3 consignments involved, so I don't think the probability is so low. Also, speaking about probabilities, what is more likely? A human error or a software bug that only shows up on your machine? Also, to get a better picture now that we have the asserts, does the test in #16 pass every time on the machine where you've seen the non-deterministic behavior?
Docker creates an environment that basically only shares the kernel with the host system, with all other components built deterministically and in a reproducible way, which is why one of its main uses is to have the same behavior between development, test and production systems. Nix does something similar, with one of its advantages being the ability to roll-back changes to system configuration if something goes wrong, but that's not useful in this context. All in all, I fail to see the difference between Docker and Nix in this context and expect Docker to do a perfect job in isolating differences across systems.
No, this is not the reason Nix was created. It was created to address challenges in software package management, reproducibility, and system configuration. So its purpose is for system determinism, not for test determinism.
It makes sense to have both. We have Docker to get a reproducible testing environment, that can be reproduced in CI and locally. We can run tests in different conditions and on different machines, and we already do. It's just that when reporting an issue it makes more sense to have it as reproducible as possible to reduce the debugging effort. |
If you are convinced, than there is no issue, since I have no time or desire for re-convincing you in something you are sure. Since you are convinced and there is nothing to do, we can close this now, right? |
Now that #16 shows the behavior to be deterministic I am convinced, unless you show some new results that tell otherwise. I would be happier if you were now convinced as well, or provided some new data that can let us move forward in the debugging.
This PR cannot be closed since the 4th transfer of the test is failing and the "fixing" PR (RGB-WG/rgb-std#272) produces a wrong consignment on the 3rd transfer. Therefore this needs to stay open until RGB-WG/rgb-std#275 gets solved and a new release containing the fix will be done. |
I think I have found the reason there were less bundles in the consignments: RGB-WG/rgb-std@54f19cd#diff-59abbfa17998502bbb68f7980e463575b31ddeba39bd9f0635ac812575f7013bR1514-R1522 That fragment was removing terminal if there were more than one bundle per witness - which was the case for tapret/opret combined transfer. That's why the consignment was missing the bundles. And yes, transfer with missing bundles were succeeding since it was not invalid; and there are no checks on whether the assets are really moved with the transfer (except attempt to spend them later). |
4a242d6
to
652630f
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #6 +/- ##
==========================================
+ Coverage 36.37% 36.42% +0.05%
==========================================
Files 281 279 -2
Lines 42390 42806 +416
==========================================
+ Hits 15421 15594 +173
- Misses 26969 27212 +243
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚨 Try these New Features:
|
To be merged once we have a fix for this. Run
cargo test tapret_wlt_receiving_opret
to see the issue:i.e.
wlt_2
considers the consignment (created bywlt_1
) invalid, the sender is trying to spend opret and tapret allocations together. This issue is similar to RGB-WG/rgb-std#246, but there the sender was failing to make the transfer