-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test with latest actors version #6998
Conversation
713afa3
to
337c65c
Compare
Rebasing on #7011. The tests now pass locally when doing this. I believe this was a timing issue (I'm now using a faster computer). However, this has me a bit concerned (for the tests, not the network). Now that we're starting from a later actors version, we may have to prove pre-seal sectors immediately where before the proving epoch would start later. In theory, we could start in the middle of of an active proving window. I'm pretty sure that's what was happening, actually... |
Given that the tests pass reliably in CI, I think we're probably fine. |
Hm. So I still see this issue if I enable TestDealWithMarketAndMinerNode (at least locally). Now, this test was disabled for a reason, but this is still interesting. |
337c65c
to
faf893e
Compare
Properly handle genesis in actors tests. Fast-forward upgrading to actors v13 doesn't work because there needs to be at least a day between v0 genesis and v13 (due to differences in miner cron).
faf893e
to
dcff06b
Compare
5000, // after | ||
-1, // before | ||
162, // while sealing | ||
560, // after upgrade deal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was happening a bit too early and causing a test failure.
-1, // before | ||
162, // while sealing | ||
530, // after upgrade deal | ||
5000, // after |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There wasn't really a point to this. It was just testing with the previous actors version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -584,14 +587,23 @@ func currentEpochBlockReward(ctx context.Context, vm *vm.VM, maddr address.Addre | |||
} | |||
|
|||
// TODO: This hack should move to reward actor wrapper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this TODO still represent our current plans? If so I can look into scoping this work on the actors side. However it seems like our idea of this has changed and this is no longer considered a hack. If so let's remove the TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this would move into the reward actor wrapper. That is, the shim in chain/actors/builtin/reward. We have some abstractions there around message sending.
@@ -208,11 +216,15 @@ func (n *Ensemble) Miner(miner *TestMiner, full *TestFullNode, opts ...NodeOpt) | |||
genm *genesis.Miner | |||
) | |||
|
|||
// Default 2KiB sector for the network version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: seems like this should be an ensemble option so the default can be overridden, though maybe this should be left to whoever first wants to use non-2KiB sectors in an itest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm going to leave it to them.
} | ||
|
||
var DefaultEnsembleOpts = ensembleOpts{ | ||
pastOffset: 10000000 * time.Second, // time sufficiently in the past to trigger catch-up mining. | ||
upgradeSchedule: stmgr.UpgradeSchedule{{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the fact that this is so nicely expressed as GenesisNetworkVersion(build.NewestNetworkVersion)
makes me wonder if we'd be slightly better off with a DefaultEnsembleOpts of type []EnsembleOpts
applied to a zero valued ensembleOpts
. The name ensembleOpts
could be changed to something different, ensembleCfg
?, to make this a little less confusing.
Not very important just riffing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I'd much rather have a set of default options (also avoids cases where the default "config" gets modified accidentally). But that'll be a future cleanup.
@@ -63,7 +65,7 @@ func testWindowPostUpgrade(t *testing.T, blocktime time.Duration, nSectors int, | |||
require.NoError(t, err) | |||
|
|||
t.Log("Running one proving period") | |||
waitUntil := di.PeriodStart + di.WPoStProvingPeriod + 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this what fixed the "proving deadline 0 in deadline 1" issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that issue appeared to be "my old laptop was too slow to compute 2k sector proofs and seal 2k sectors at the same time".
Here, I just needed to wait a full proving period instead of waiting until our next proving period started. It likely worked before because the sectors happened to align on the right proving periods for this to "just work". But the new code is "more correct".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see that issue in the "+20". Termination happens at the end of the deadline always. If we need "slack", something is wrong.
We were already trying to do this, but had some issues:
This patch:
TODO figure out why it's failing:When I remove the check for t_aux files (do we need them with fauxseals?), post still doesn't work. I'm seeing an error about trying to prove deadline 0 when we're already in deadline 1, which is suspiciously like Deadlines can get messed up around the start of a new network specs-actors#1453.Answer: in case 2, my machine was just too slow. CI and my new machine are faster.