feat: support TEE logging and support running eliza in Intel SGX #1470

Merged
merged 21 commits into from
Jan 9, 2025


ShuochengWang
Contributor

@ShuochengWang ShuochengWang commented Dec 26, 2024

Relates to:

Keywords: TEE, Intel SGX, Logging, Attestation, Verification, Gramine.

Risks

Low

Background

What does this PR do?

This PR introduces support for TEE (Trusted Execution Environment) logging and enables the Eliza application to run within Intel SGX (Software Guard Extensions).

As Eliza is a fully autonomous AI agent capable of running within a TEE, we need to demonstrate to the outside world that we are indeed operating within a TEE. This allows external parties to verify that our actions are protected by the TEE and that they are entirely executed by Eliza, without any third-party interference. Therefore, it is necessary to leverage TEE's remote attestation and establish a TEE logging mechanism to prove that these operations are entirely and autonomously performed by Eliza within the TEE.

Meanwhile, the existing plugin-tee only supports running Eliza in a dstack TDX CVM. Although TDX is more convenient to use, Intel SGX remains a widely used TEE in production environments. With the help of Gramine LibOS, Eliza can also run in SGX, which enables deployment in a broader range of TEE scenarios.

What kind of change is this?

Features

  1. Support running Eliza in SGX
  2. Add plugin-sgx to support SGX attestation
  3. Add plugin-tee-log to support TEE logging (Gramine SGX and dstack TDX)
  4. Add a REST API in client-direct to support retrieving TEE logs

Details

TEE Logging Mechanism:

  1. Key Pair Generation and Attestation:

    • During startup, each agent generates a key pair and creates a remote attestation for the public key. The private key is securely stored in the TEE's encrypted memory. The agent's relevant information, along with the public key and attestation, is recorded in a local database. A new key pair is generated each time the agent is updated or restarted to ensure key security.
  2. Log Recording:

    • For each log entry, basic information is recorded, including agentId, roomId, userId, type, content, and timestamp. This information is concatenated and signed using the agent's corresponding private key to ensure verifiability. The verification process follows this trust chain:
      • Verify the attestation.
      • Trust the public key contained in the attestation.
      • Use the public key to verify the signature.
      • Trust the complete log record.
  3. Data Storage:

    • All log data must be stored in the TEE's encrypted file system in production environments. Storing data in plaintext is prohibited to prevent tampering.
  4. Log Extraction for Verification:

    • Third parties can extract TEE logs for verification purposes. Two types of information can be extracted:
      • Agent Information: This includes the agent's metadata, public key, and attestation, which can be used to verify the agent's public key.
      • Log Information: Required logs can be extracted, with the agent's attestation and public key used to verify the signature, ensuring that each record remains untampered.
  5. Integrity Protection:

    • When users extract TEE logs via the REST API, the results are hashed, and an attestation is generated. After extraction, users can verify the attestation by comparing the hash value contained within it to the extracted results, thereby ensuring the integrity of the data.
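
The sign-then-verify part of the trust chain above, plus the hash used for integrity protection, can be sketched as follows. This is a minimal illustration, not the code in plugin-tee-log: the `TeeLogEntry` shape, the `|` separator, the choice of Ed25519 and SHA-256, and all function names are assumptions made for the example. Attestation verification is out of scope here, so the public key is simply assumed to have been taken from an attestation that was already verified.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical record shape; the field names follow the list in the description.
interface TeeLogEntry {
    agentId: string;
    roomId: string;
    userId: string;
    type: string;
    content: string;
    timestamp: number;
}

// Concatenate the fields in a fixed order. The "|" separator is an assumption;
// signer and verifier only need to agree on the same encoding.
function serializeEntry(e: TeeLogEntry): Buffer {
    const fields = [e.agentId, e.roomId, e.userId, e.type, e.content, String(e.timestamp)];
    return Buffer.from(fields.join("|"), "utf8");
}

// Sign inside the TEE with the agent's private key.
function signEntry(e: TeeLogEntry, privateKey: KeyObject): string {
    // Ed25519 takes no separate digest algorithm, hence the null argument.
    return sign(null, serializeEntry(e), privateKey).toString("base64");
}

// Verify outside the TEE with the public key taken from the attestation.
function verifyEntry(e: TeeLogEntry, signature: string, publicKey: KeyObject): boolean {
    return verify(null, serializeEntry(e), publicKey, Buffer.from(signature, "base64"));
}

// Hash of an extraction result; a verifier recomputes this and compares it
// with the hash carried in the attestation returned alongside the logs.
function hashExtraction(entries: TeeLogEntry[]): string {
    return createHash("sha256").update(JSON.stringify(entries)).digest("hex");
}

// Usage: a fresh key pair per agent start, as described above.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const entry: TeeLogEntry = {
    agentId: "agent-1",
    roomId: "room-1",
    userId: "user-1",
    type: "Action:CONTINUE",
    content: "example content",
    timestamp: 1735200000000,
};
const signature = signEntry(entry, privateKey);
const tampered: TeeLogEntry = { ...entry, content: "something else" };

console.log("original verifies:", verifyEntry(entry, signature, publicKey));
console.log("tampered verifies:", verifyEntry(tampered, signature, publicKey));
```

Changing any signed field after the fact invalidates the signature, which is what lets a third party trust a record once the public key itself is trusted.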

Documentation changes needed?

Need to add new documentation about TEE logging and how to run Eliza in SGX

Testing

Where should a reviewer start?

  1. SGX Gramine support
  2. plugin-sgx
  3. plugin-tee-log
  4. other parts

Detailed testing steps

Test SGX support

First, you need to prepare the SGX environment and install the Gramine dependencies according to https://gramine.readthedocs.io/en/stable/index.html

Then, start eliza in SGX:

pnpm i
pnpm build

# Start default character
SGX=1 make start
# Start specific character
SGX=1 make start -- --character "character/trump.character.json"

Test TEE logging

To get started, prepare the TEE environment. Both dstack TDX and Gramine SGX are supported.

Next, enable TEE logging by configuring the .env file:

ENABLE_TEE_LOG=true

Logging is not integrated into actions by default; you need to add it to each action you want to log. For example, to log the Continue action of plugin-bootstrap, do the following:

First, add plugin-tee-log to the dependencies of plugin-bootstrap:

"@elizaos/plugin-tee-log": "workspace:*",

Then, add the following code to the Continue action:

import {
    ServiceType,
    type HandlerCallback,
    type IAgentRuntime,
    type ITeeLogService,
    type Memory,
    type State,
} from "@elizaos/core";


// In the handler of the action
    handler: async (
        runtime: IAgentRuntime,
        message: Memory,
        state: State,
        options: any,
        callback: HandlerCallback
    ) => {
        // Continue the action

        // Look up the TEE log service registered by plugin-tee-log
        const teeLogService = runtime
            .getService<ITeeLogService>(ServiceType.TEE_LOG)
            .getInstance();
        // Await the result in case log() is asynchronous; otherwise the
        // condition below would test a Promise, which is always truthy.
        const logged = await teeLogService.log(
            runtime.agentId,
            message.roomId,
            message.userId,
            "The type of the log, for example, Action:CONTINUE",
            "The content that you want to log"
        );
        if (logged) {
            console.log("Logged TEE log successfully");
        }

        // Continue the action
    }
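
If the same logging call is needed in several actions, one option is a small wrapper that keeps a logging failure from aborting the action. The sketch below is hypothetical and is not part of this PR: `TeeLogLike` is a local stand-in for the real `ITeeLogService`, `logActionSafely` is an invented name, and it assumes `log` returns a promise of a boolean.

```typescript
// Minimal stand-in for the service used in this example. The real
// ITeeLogService comes from @elizaos/core; its exact shape is assumed here.
interface TeeLogLike {
    log(
        agentId: string,
        roomId: string,
        userId: string,
        type: string,
        content: string
    ): Promise<boolean>;
}

interface LogIds {
    agentId: string;
    roomId: string;
    userId: string;
}

// Hypothetical helper: returns false instead of throwing, so a missing or
// failing log service never interrupts the action that called it.
async function logActionSafely(
    service: TeeLogLike | null | undefined,
    ids: LogIds,
    type: string,
    content: string
): Promise<boolean> {
    if (!service) {
        // TEE logging is disabled or the service is not registered.
        return false;
    }
    try {
        return await service.log(ids.agentId, ids.roomId, ids.userId, type, content);
    } catch (err) {
        console.warn("TEE log failed:", err);
        return false;
    }
}
```

Inside a handler, this would be called with the service obtained from the runtime, for example `await logActionSafely(teeLogService, { agentId: runtime.agentId, roomId: message.roomId, userId: message.userId }, "Action:CONTINUE", "content")`.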

@HashWarlock HashWarlock self-requested a review December 27, 2024 17:26
@HashWarlock
Collaborator

This looks great! For testing quickly with one of my SGX machines, can I run the gramine docker image and clone this repo to deploy an agent?

@HashWarlock
Collaborator

HashWarlock commented Dec 27, 2024

Steps I took to test on my SGX machine using the Gramine Docker image :) I'm still debugging this, and next I'll try installing Gramine directly on the machine.

docker pull gramineproject/gramine
docker run --device /dev/sgx_enclave --rm -it gramineproject/gramine bash
gramine-sgx-gen-private-key
apt-get update
apt-get install git vim
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
nvm install v23.3.0
npm install -g pnpm
pnpm i
pnpm build
# Here this fails bc when pnpm build runs it does not find the plugin-tee dependency
cd packages/plugin-tee-log
pnpm i
# I also had to edit line for checking if TEE type is TDX in the teeLogService.ts file
cd - # go back to root directory
pnpm build
cp .env.example .env
SGX=1 make start -- --character "character/c3po.character.json"
# Fails now due to permission denied /dev/sgx_enclave

@HashWarlock
Collaborator

@ShuochengWang Were you able to run the agent in SGX without any constraints? What specs was this tested on? I have an SGX1 chip, and I hit an OOM error that crashes my machine. Typically, I've seen OOM errors for agents with up to 8GB of memory, so this may be something to take into account when running Eliza in SGX.

@ShuochengWang
Contributor Author

@ShuochengWang Were you able to run the agent in SGX without any constraints? What specs was this tested on? I have an SGX1 chip, and I hit an OOM error that crashes my machine. Typically, I've seen OOM errors for agents with up to 8GB of memory, so this may be something to take into account when running Eliza in SGX.

Running Eliza in SGX does not have any special restrictions. However, there are two points to note:

  • The Node.js path needs to be configured.
  • A larger EPC memory size needs to be configured. In my tests, running Eliza smoothly in SGX requires configuring 64GB of EPC memory. Such a large memory footprint is caused by some indirect dependencies of Eliza on WASM. WASM requests a significant amount of memory during initialization, and configuring a smaller memory size will result in OOM (Out of Memory) errors.

I will soon add a detailed step-by-step guide for running Eliza in SGX.

@ShuochengWang
Contributor Author

ShuochengWang commented Dec 28, 2024

SGX=1 make start -- --character "character/c3po.character.json"

Hi, I tested it in SGX using a clean project and have added a quick start guide for running it in SGX.

During the following steps, the only issue I encountered was a build error caused by the missing plugin-tee dependency in plugin-tee-log. This error can be resolved by simply running the build command again. However, this is an issue that needs to be addressed later.

Other than that, everything worked smoothly during the test. If you encounter any other issues while following these steps, please feel free to let me know.

Note: Currently, I have set the SGX EPC size (memory size) to 64 GB in the Gramine manifest. In reality, Eliza itself does not require this much memory. However, some dependencies of Eliza rely on WebAssembly (WASM), and initializing WASM demands a significant amount of memory. If the memory is insufficient, you may encounter the following error:

RangeError: WebAssembly.instantiate(): Out of memory: Cannot allocate Wasm memory for a new instance.

To mitigate this issue, I configured the enclave size to 64 GB. This is a temporary workaround, and we need to optimize WASM memory usage in the future.
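
For a rough sense of the units involved: WebAssembly linear memory is sized in 64 KiB pages, and a 32-bit memory can declare at most 65,536 pages (4 GiB). The sketch below only does that arithmetic and allocates a tiny memory to show where the `RangeError` would surface. It is not taken from Eliza or its dependencies, and the helper names are invented for the example. Why the dependencies need so much headroom inside the enclave was not investigated in this thread; an engine reserving more address space than it commits is one possible cause, stated here only as an assumption.

```typescript
// One WebAssembly page is 64 KiB (fixed by the WebAssembly specification).
const WASM_PAGE_BYTES = 64 * 1024;

// Convert a page count into bytes.
function pagesToBytes(pages: number): number {
    return pages * WASM_PAGE_BYTES;
}

// Express a byte count in GiB, for comparison with an enclave size.
function bytesToGiB(bytes: number): number {
    return bytes / 1024 ** 3;
}

// Try to allocate a memory and report failure instead of crashing.
// In a memory-constrained environment the constructor may throw a RangeError.
function tryAllocate(initialPages: number, maximumPages: number): WebAssembly.Memory | null {
    try {
        return new WebAssembly.Memory({ initial: initialPages, maximum: maximumPages });
    } catch (err) {
        if (err instanceof RangeError) {
            console.warn("Wasm memory allocation failed:", err.message);
            return null;
        }
        throw err;
    }
}

// Usage: a single committed page, allowed to grow to 16 pages (1 MiB).
const memory = tryAllocate(1, 16);
if (memory) {
    console.log("committed bytes:", memory.buffer.byteLength);
}
console.log("max 32-bit Wasm memory in GiB:", bytesToGiB(pagesToBytes(65536)));
```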


Quick Start

First, you need to prepare an SGX-enabled machine.

Then, you can use the following command to start a Gramine Docker container:

sudo docker run -it --name eliza_sgx \
    --mount type=bind,source={your_eliza_path},target=/root/eliza \
    --device /dev/sgx/enclave \
    --device /dev/sgx/provision \
    gramineproject/gramine:stable-jammy

After entering the container, you can use the following commands to prepare the Eliza environment:

# Generate the private key for signing the SGX enclave
gramine-sgx-gen-private-key

cd /root/eliza/

# Install nodejs and pnpm
# Node.js will be installed at `/usr/bin/node`.
# Gramine will utilize this path as the default Node.js location to run Eliza.
# If you prefer to use nvm to install Node.js, make sure to specify the Node.js path in the Makefile, because nvm does not install Node.js at `/usr/bin/node`.
apt update
apt install -y build-essential
apt install -y curl
curl -fsSL https://deb.nodesource.com/setup_23.x | bash -
apt install -y nodejs=23.3.0-1nodesource1
npm install -g pnpm

# Build Eliza
pnpm i
# The build may fail on the first attempt due to the missing `plugin-tee` dependency in `plugin-tee-log`. Simply run the build command again to resolve the issue.
# TODO: fix the build issue
pnpm build

# Copy the .env.example file to .env
cp .env.example .env
# Edit the .env file

# Start Eliza in SGX
SGX=1 make start -- --character "character/c3po.character.json"
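
Two device paths appear in this thread (`/dev/sgx_enclave` and `/dev/sgx/enclave`), and one attempt ended with a permission error on the device. A small preflight check run inside the container can show which path, if any, is usable before starting Eliza. This is a hypothetical helper, not part of the PR; the function and constant names are invented for the example.

```typescript
import { accessSync, constants } from "node:fs";

// Device paths mentioned in this thread. Which one exists depends on the
// SGX driver and kernel in use, so both are probed.
const SGX_DEVICE_CANDIDATES = ["/dev/sgx/enclave", "/dev/sgx_enclave"];

type Probe = (path: string) => boolean;

// Default probe: true when the path exists and the current user can read
// and write it.
const canOpen: Probe = (path) => {
    try {
        accessSync(path, constants.R_OK | constants.W_OK);
        return true;
    } catch {
        return false;
    }
};

// Return the first usable device path, or null when none is usable.
// The probe is injectable so the logic can be exercised without SGX hardware.
function findSgxDevice(
    candidates: string[] = SGX_DEVICE_CANDIDATES,
    probe: Probe = canOpen
): string | null {
    for (const path of candidates) {
        if (probe(path)) {
            return path;
        }
    }
    return null;
}

const device = findSgxDevice();
console.log(
    device
        ? `Usable SGX enclave device: ${device}`
        : "No usable SGX enclave device found; check the --device flags and permissions"
);
```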

@HashWarlock
Collaborator

Switch branch to merge to develop

@ShuochengWang ShuochengWang changed the base branch from main to develop December 31, 2024 06:24
@ShuochengWang
Contributor Author

Switch branch to merge to develop

Done, already switched to the develop branch.

@ShuochengWang
Contributor Author

Done, the code has been updated with no conflicts, and all checks have passed successfully. @HashWarlock

@ShuochengWang
Contributor Author

The previous build error has been fixed. The issue was related to the build order. Apart from plugin-tee-log, other plugins such as plugin-evm also depend on plugin-tee. These dependencies need to be managed by Turbo to control the build order.

HashWarlock
HashWarlock previously approved these changes Jan 2, 2025
Collaborator

@HashWarlock HashWarlock left a comment


LGTM

@HashWarlock HashWarlock changed the title Feat: support TEE logging and support running eliza in Intel SGX feat: support TEE logging and support running eliza in Intel SGX Jan 2, 2025
@ShuochengWang
Contributor Author

Due to updates in the develop branch, some conflicts have arisen, and I have just resolved all of them. Please review the changes again (no functional updates, only merging develop branch). I hope we can merge it as soon as possible to avoid new conflicts. Thanks a lot @HashWarlock

HashWarlock
HashWarlock previously approved these changes Jan 3, 2025
@HashWarlock
Collaborator

@ShuochengWang great work! Thanks for updating. @odilitime how do we look here?

@ShuochengWang
Contributor Author

Is there any update? I hope we can merge it as soon as possible. I’ve resolved conflicts several times before, and now new conflicts have emerged again... And resolving conflicts requires another round of review...

@HashWarlock
Collaborator

Is there any update? I hope we can merge it as soon as possible. I’ve resolved conflicts several times before, and now new conflicts have emerged again... And resolving conflicts requires another round of review...

Hm, I can't merge it now because of conflicts 😕 Which timezone are you in? I'll try to sync up on timing so I can merge it as soon as the conflicts are resolved. I think I should be able to merge now.

@HashWarlock
Collaborator

@ShuochengWang reach out to me on telegram (hashwarlock) or ping me here when the conflicts get resolved then I will merge ASAP

@ShuochengWang
Contributor Author

@HashWarlock Sorry for the late reply—been swamped lately. All conflicts are fixed now, so it's good to merge. Mind giving it another look?

@HashWarlock HashWarlock enabled auto-merge January 9, 2025 17:31
Collaborator

@HashWarlock HashWarlock left a comment


@ShuochengWang Last thing is to update lock file based on failed test then it can be merged. Nvm, guess it auto merged when it was fixed

@HashWarlock HashWarlock merged commit 0d532c5 into elizaOS:develop Jan 9, 2025
5 of 6 checks passed
0xpi-ai pushed a commit to 0xpi-ai/NayariAI that referenced this pull request Jan 15, 2025
feat: support TEE logging and support running eliza in Intel SGX

3 participants