Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Foundational work to get a cluster of pools for integration testing #1731

Merged
merged 8 commits into from
Jun 9, 2020

Conversation

KtorZ
Copy link
Member

@KtorZ KtorZ commented Jun 8, 2020

Issue Number

N/A

Overview

  • 5a263bc
    📍 move content of 'Test.Utils.Ports' to 'Cardano.Wallet.Network.Ports'

  • 6ed9efc
    📍 rework the shelley launcher code so that we can start a cluster of nodes
    using 'withCluster ... ... 0' will result in the same behavior as before. Right now, pools would
    exists as process but not on chain. In order to exist on chain, pools need to be registered, which
    will be done in a separate commit.

  • 1a8b720
    📍 fix database path

  • 6745ea9
    📍 Regenerate nix

  • 6b7b947
    📍 assign sensible default ordering to 'listAllTransactions'

Comments

This should make it possible to run a cluster of pools as simply as:

        let numberOfPools = 3
        withCluster tr Info numberOfPools $ \socketPath block0 (gp,vData) -> ...

So far, the pools won't have any stake and won't therefore be any useful.. I'll give them some stake in another PR.

KtorZ added 3 commits June 8, 2020 09:17
using 'withCluster ... ... 0' will result in the same behavior as before. Right now, pools would
exists as process but not on chain. In order to exist on chain, pools need to be registered, which
will be done in a separate commit.
@KtorZ KtorZ requested a review from paweljakubas June 8, 2020 08:31
@KtorZ KtorZ self-assigned this Jun 8, 2020
@KtorZ KtorZ requested a review from Anviking June 8, 2020 08:31
-> IO (Either ProcessHasExited a)
withCluster tr severity n action = do
ports <- randomUnusedTCPPorts (n + 1)
print $ permutations ports
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably we want to use our logging infrastructure someday in the future - if not doing this now maybe add TO_DO to not miss it later

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, we have now print - and maybe use our logging here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, something more descriptive than

[(32403,[])]

would be nice

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry. I was really confused by your comment at first and didn't get why. This line is actually an oversight. I removed it 👍

Copy link
Contributor

@paweljakubas paweljakubas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like sound scaffolding for what is coming next

--
-- >>> permutations [1,2,3]
-- [(1,[2,3]), (2, [1,3]), (3, [1,2])]
permutations :: Eq a => [a] -> [(a, [a])]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@paweljakubas paweljakubas Jun 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and then fmap with f(x:xs)=[x,[xs]] ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't work because it's does not actually compute all permutations (unlike Data.List.permutations). Using Data.List.permutations on the example above, we would also get [(1, [3,2]), (2, [3,1]), (3, [2,1])]. So, we should also nub + sort the second element which is arguably simpler than the current function 🤔 ..

Changed for:

    rotate :: Ord a => [a] -> [(a, [a])]
    rotate = nub . fmap (\(x:xs) -> (x, sort xs)) . permutations


forM_ (init $ permutations ports) $ \(port, peers) -> do
withAsync
(withStakePool tr severity (port, peers) $ readChan doneGroup)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not put the n-th withStakePool in the n-1-th's action?

Will all nodes stop if on stake pool goes down?

Copy link
Member Author

@KtorZ KtorZ Jun 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not put the n-th withStakePool in the n-1-th's action?

What?

Will all nodes stop if on stake pool goes down?

No. If one pools doesn't start for some reason, others will be cancelled. But once they've all started, if one dies, the cluster will keep running.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What?

Something like this:

withStakePool args $ do
    withStakePool args $ do
        withStakePool args $ do
            .
              . n times
                .

Slower to start, but maybe simpler, and should stop all if one goes down.

Copy link
Member

@Anviking Anviking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks neat

@KtorZ
Copy link
Member Author

KtorZ commented Jun 8, 2020

bors r+

iohk-bors bot added a commit that referenced this pull request Jun 8, 2020
1731: Foundational work to get a cluster of pools for integration testing r=KtorZ a=KtorZ

# Issue Number

<!-- Put here a reference to the issue this PR relates to and which requirements it tackles -->

N/A

# Overview

<!-- Detail in a few bullet points the work accomplished in this PR -->
- 5a263bc
  📍 **move content of 'Test.Utils.Ports' to 'Cardano.Wallet.Network.Ports'**
  
- 6ed9efc
  📍 **rework the shelley launcher code so that we can start a cluster of nodes**
  using 'withCluster ... ... 0' will result in the same behavior as before. Right now, pools would
exists as process but not on chain. In order to exist on chain, pools need to be registered, which
will be done in a separate commit.

- 1a8b720
  📍 **fix database path**
  
- 6745ea9
  📍 **Regenerate nix**
  
- 6b7b947
  📍 **assign sensible default ordering to 'listAllTransactions'**
  
# Comments

<!-- Additional comments or screenshots to attach if any -->

This should make it possible to run a cluster of pools as simply as:

```hs
        let numberOfPools = 3
        withCluster tr Info numberOfPools $ \socketPath block0 (gp,vData) -> ...
```

So far, the pools won't have any _stake_ and won't therefore be any useful.. I'll give them some stake in another PR. 

<!-- 
Don't forget to:

 ✓ Self-review your changes to make sure nothing unexpected slipped through
 ✓ Assign yourself to the PR
 ✓ Assign one or several reviewer(s)
 ✓ Once created, link this PR to its corresponding ticket
 ✓ Assign the PR to a corresponding milestone
 ✓ Acknowledge any changes required to the Wiki
-->


Co-authored-by: KtorZ <[email protected]>
Co-authored-by: IOHK <[email protected]>
@iohk-bors
Copy link
Contributor

iohk-bors bot commented Jun 8, 2020

Build failed

@KtorZ KtorZ force-pushed the KtorZ/shelley-stake-pool-setup branch from 19559c4 to 771b90f Compare June 8, 2020 17:40
@KtorZ
Copy link
Member Author

KtorZ commented Jun 8, 2020

bors r+

iohk-bors bot added a commit that referenced this pull request Jun 8, 2020
1731: Foundational work to get a cluster of pools for integration testing r=KtorZ a=KtorZ

# Issue Number

<!-- Put here a reference to the issue this PR relates to and which requirements it tackles -->

N/A

# Overview

<!-- Detail in a few bullet points the work accomplished in this PR -->
- 5a263bc
  📍 **move content of 'Test.Utils.Ports' to 'Cardano.Wallet.Network.Ports'**
  
- 6ed9efc
  📍 **rework the shelley launcher code so that we can start a cluster of nodes**
  using 'withCluster ... ... 0' will result in the same behavior as before. Right now, pools would
exists as process but not on chain. In order to exist on chain, pools need to be registered, which
will be done in a separate commit.

- 1a8b720
  📍 **fix database path**
  
- 6745ea9
  📍 **Regenerate nix**
  
- 6b7b947
  📍 **assign sensible default ordering to 'listAllTransactions'**
  
# Comments

<!-- Additional comments or screenshots to attach if any -->

This should make it possible to run a cluster of pools as simply as:

```hs
        let numberOfPools = 3
        withCluster tr Info numberOfPools $ \socketPath block0 (gp,vData) -> ...
```

So far, the pools won't have any _stake_ and won't therefore be any useful.. I'll give them some stake in another PR. 

<!-- 
Don't forget to:

 ✓ Self-review your changes to make sure nothing unexpected slipped through
 ✓ Assign yourself to the PR
 ✓ Assign one or several reviewer(s)
 ✓ Once created, link this PR to its corresponding ticket
 ✓ Assign the PR to a corresponding milestone
 ✓ Acknowledge any changes required to the Wiki
-->


Co-authored-by: KtorZ <[email protected]>
Co-authored-by: IOHK <[email protected]>
@iohk-bors
Copy link
Contributor

iohk-bors bot commented Jun 8, 2020

Build failed

group <- waitAll
if length (catMaybes group) /= n then do
cancelAll
throwIO $ ProcessHasExited
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it is mixing thrown exceptions with IO (Either e a). If the bft node fails, it will return Left. If the others fail to start, they will throw.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've changed that later actually while continuing working on this, but this isn't visible from this PR yet

staking:
slotsPerKESPeriod: 86400
slotLength: 8
slotLength: 0.2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that's a short slot. So if the tests get paused for 100x0.2=20secs (for whatever reason), they will not be able to continue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tweaked according to what the ledger / consensus teams are using for testing as well; it seems fairly okay though. I don't get why the tests wouldn't be able to continue on if they pause for 20s 🤔 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The epoch length is 20s and I think if an entire epoch's worth of slots are missed then it will not be able to elect slot leaders for the next epoch. But it should be ok anyway. Nobody's going to press ctrl-z during integration tests and expect them to pass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cluster is made of one BFT leader + 3 pools though, so there's always the BFT leader capable of producing blocks. I am not sure how that influence the slot leader election however 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nobody's going to press ctrl-z during integration tests and expect them to pass.

I guess that's also a fair assumption :)

Comment on lines +29 to +30
, findPort
, randomUnusedTCPPorts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we wouldn't have test util functions in the core library.
But I think you couldn't put it into the core-integration library due to a package cycle. Is that correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually no. I needed to share them and Cardano.Wallet.Network.Ports felt like a logical place to put them. I see your point about the "non production" aspect of these two. I believe they could fit in core-integration as well 🤔

@rvl
Copy link
Contributor

rvl commented Jun 9, 2020

The error from bors does look like a legitimate test failure:

Failures:

  src/Test/Integration/Scenario/API/Shelley/Transactions.hs:887:5:
  1) API Specifications TRANS_LIST_02,03x - Can limit/order results with start, end and order
       uncaught exception: IOException of type UserError
       user error (expected a successful response but got an error: DecodeFailure "Error in $: parsing [] failed, expected Array, but encountered Object: Response {responseStatus = Status {statusCode = 400, statusMessage = \"Bad Request\"}, responseVersion = HTTP/1.1, responseHeaders = [(\"Transfer-Encoding\",\"chunked\"),(\"Date\",\"Mon, 08 Jun 2020 12:15:56 GMT\"),(\"Server\",\"Warp/3.3.5\")], responseBody = \"{\\\"code\\\":\\\"start_time_later_than_end_time\\\",\\\"message\\\":\\\"The specified start time '2020-06-08T12:15:56.471958Z' is later than the specified end time '2020-06-08T12:15:55.871958Z'.\\\"}\", responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose}")

  To rerun use: --match "/API Specifications/TRANS_LIST_02,03x - Can limit/order results with start, end and order/"

Randomized with seed 1856737488

Finished in 223.8642 seconds
549 examples, 1 failure, 11 pending
builder for '/nix/store/6f339cm8hhd0k0xrdxmwbldkv8awg4yg-cardano-wallet-shelley-test-integration-2020.6.5-check.drv' failed with exit code 1

@KtorZ
Copy link
Member Author

KtorZ commented Jun 9, 2020

@rvl Yes it does, although I thought I fixed it in 6b7b947 but I obviously didn't. I only re-run once locally after this commit and it was probably luck that made it pass :/

One second is actually pretty large and we ended up in cases where,
adding one second to the time of the first transaction would make it
later than the second transaction! Causing some tests to fail for a
wrong reason. This delta needs to be at most the slot length, so I've
made it 100ms.
@KtorZ
Copy link
Member Author

KtorZ commented Jun 9, 2020

bors r+

@iohk-bors
Copy link
Contributor

iohk-bors bot commented Jun 9, 2020

@iohk-bors iohk-bors bot merged commit 0233f53 into master Jun 9, 2020
@iohk-bors iohk-bors bot deleted the KtorZ/shelley-stake-pool-setup branch June 9, 2020 14:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants