Cannot get the bare metal setup working on my Epyc 7313p #944

Open
blenessy opened this issue Oct 20, 2024 · 2 comments

Comments

@blenessy
Contributor

I was able to follow the Bare metal setup on a newly installed Ubuntu 24.10 system (with Linux 6.11, which contains the SNP Host patches) without issues.

AFAIK, I enabled SNP and upgraded to the latest SEV firmware correctly:

root@epyc-7313p:~# dmesg | grep -i 'SEV\|ccp'
[   13.774631] ccp 0000:44:00.1: enabling device (0000 -> 0002)
[   13.778684] ccp 0000:44:00.1: no command queues available
[   13.780132] ccp 0000:44:00.1: sev enabled
[   13.780141] ccp 0000:44:00.1: psp enabled
[   13.841416] ccp 0000:44:00.1: SEV firmware update successful
[   14.807197] ccp 0000:44:00.1: SEV API:1.55 build:21
[   14.807204] ccp 0000:44:00.1: SEV-SNP API:1.55 build:21
[   14.816262] kvm_amd: SEV enabled (ASIDs 509 - 509)
[   14.816265] kvm_amd: SEV-ES enabled (ASIDs 1 - 508)
[   14.816267] kvm_amd: SEV-SNP enabled (ASIDs 1 - 508)

From the Emojivoto guide, I managed to:

  1. Install the runtime with the fix from "node-installer: has too little memory" (#943).
  2. Install the Coordinator without problems
  3. Generate metadata.json and friends with contrast generate
  4. Set the correct MinimumTCB for my system:
       "MinimumTCB": {
         "BootloaderVersion": 3,
         "TEEVersion": 0,
         "SNPVersion": 22,
         "MicrocodeVersion": 211
       }

My problem is that running the following command times out: contrast set -c "${coordinator}:1313" --coordinator-policy-hash c36809d83e5b2c7853e95ed08434ff2b7bca4ae1b471229d66dcf712918fcf6f deployment/

Here are the interesting parts of the coordinator-0 kubectl log:

time=2024-10-20T19:21:52.864Z level=INFO msg="Logger initialized" level=INFO
time=2024-10-20T19:21:52.865Z level=INFO msg="Coordinator started"
time=2024-10-20T19:21:52.874Z level=INFO msg="csi device not identified, assuming first start, formatting"
time=2024-10-20T19:21:53.065Z level=INFO msg="csi device mounted to state disk mount point" dev=/dev/csi0 mountPoint=/mnt/state
time=2024-10-20T19:21:53.067Z level=INFO msg="Coordinator user API listening"
time=2024-10-20T19:21:53.067Z level=INFO msg="Coordinator mesh API listening"
time=2024-10-20T19:22:36.438Z level=INFO msg="Issue called" issuer.tee-type=snp
time=2024-10-20T19:22:36.444Z level=INFO msg="Retrieved report" issuer.tee-type=snp issuer.reportRaw=020000000000000000000300000000000000000000000000000000000000000000000000000000000000000000000000000000000100000003000000000016d301000000000000000000000000000000d3ff55398259bc3b0551013278c170ef3db2db1c64e872f6fb3bee319e895e2c7c4a35a4d692bc284e06c9b031ddb5391bffdc605272042a69b005bed0caec18cde33fb25b0af5f9ae88485ec6c0e86bd1c67d2fe722b38084084b12118a0b1b85d2dd0fd7645b5257ec773f85aba61cc36809d83e5b2c7853e95ed08434ff2b7bca4ae1b471229d66dcf712918fcf6f000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000fd197969dcf013aeb3391edbe5a19c73c3f755daaf6a55d625026782e543df1affffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff03000000000016d30000000000000000000000000000000000000000000000000f8f087080b26bbb05e94cd452344a55cb34c86ad96d31b04e4b9756222b1c40891051636cd27110babb5ca8fa2fc7efaed6d301676daa7834b149222d95ba1e03000000000016d3153701001537010003000000000016d30000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000006a9bed2db23542b11d93627e6ab108d3c23850e2ff4abbe57a62ff1b67995722ba657127a3f5946a9a04cc09298777c50000000000000000000000000000000000000000000000007acf46cd5638ee0fc50dce35b8043bbb0519ba8ad5785d0ef4c0b91a606ff9d54d70e76b3b12987e7c12c0e4f080ee88000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000ecae0c0f950243b1afa20ae2e0d565b6300000000800000000000000000000000000000000000000000000000000000008000000010fa000
time=2024-10-20T19:23:06.439Z level=INFO msg="Issue called" issuer.tee-type=snp
...
@Freax13
Contributor

Freax13 commented Oct 21, 2024

The TCB and the policy hash match the values in the attestation report. Your setup looks good to me.
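For anyone who wants to repeat that check: the 8-byte value `03000000000016d3` appears several times in the `issuer.reportRaw` dump above, and it can be decoded by hand. The sketch below assumes the TCB version layout from the Milan-generation SEV-SNP ABI spec as I understand it (byte 0 bootloader, byte 1 TEE, bytes 2-5 reserved, byte 6 SNP, byte 7 microcode). The helper name `decode_tcb` is made up for illustration and is not part of Contrast.

```python
def decode_tcb(hex_value: str) -> dict:
    """Decode an 8-byte SNP TCB version given as a hex string.

    Assumed layout (Milan-generation parts): byte 0 = bootloader,
    byte 1 = TEE, bytes 2-5 = reserved, byte 6 = SNP, byte 7 = microcode.
    """
    raw = bytes.fromhex(hex_value)
    if len(raw) != 8:
        raise ValueError("a TCB version is expected to be exactly 8 bytes")
    return {
        "BootloaderVersion": raw[0],
        "TEEVersion": raw[1],
        "SNPVersion": raw[6],
        "MicrocodeVersion": raw[7],
    }


# Value copied verbatim from the issuer.reportRaw field in the log above.
reported = decode_tcb("03000000000016d3")
print(reported)
# {'BootloaderVersion': 3, 'TEEVersion': 0, 'SNPVersion': 22, 'MicrocodeVersion': 211}
```

The decoded numbers are the same as the MinimumTCB values in the manifest, and the policy hash `c36809d8...` passed to `--coordinator-policy-hash` also shows up verbatim in the raw report.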

My problem is that running the following command times out: contrast set -c "${coordinator}:1313" --coordinator-policy-hash c36809d83e5b2c7853e95ed08434ff2b7bca4ae1b471229d66dcf712918fcf6f deployment/

Can you set --log-level debug and try again?

Can you double-check that you can reach the port at ${coordinator}:1313 from the machine where you're executing contrast set?
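If no other tool is at hand, a plain TCP connect from the machine running the CLI is enough for that reachability check. This is only a sketch: `can_connect` is a hypothetical helper, and it only tells you whether the TCP handshake succeeds, not whether the Coordinator behind the port is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

Calling `can_connect(coordinator, 1313)` with the same address that is passed to `contrast set -c` should return `True`. If it returns `False`, the timeout is more likely a networking problem than an attestation problem.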

@blenessy
Contributor Author

  1. Having been away for a while, I had shut down my dev Epyc 7313. After a fresh start, the problem in question magically disappeared, i.e. the following command succeeded:
    contrast set -c "${coordinator}:1313" \
      --coordinator-policy-hash c36809d83e5b2c7853e95ed08434ff2b7bca4ae1b471229d66dcf712918fcf6f deployment/ \
      --log-level debug
  2. Then I ran the same command (1) again, and it failed.
  3. Then I did a systemctl restart k3s.
  4. Then I ran the same command (1) once more, and it succeeded.

It does seem like there is some idempotency problem here. Attaching logs as requested.
contrast-cli.log
contrast-0.log
