feat: Add e2e test for prover, update prover protocol version #2975

Merged Oct 16, 2024 (169 commits)
Commits
42ee7e7
initial commit
Artemka374 Sep 27, 2024
b87c38d
add to CI
Artemka374 Oct 2, 2024
05ac4b9
Merge branch 'refs/heads/main' into afo/prover-e2e
Artemka374 Oct 2, 2024
2541e0e
some more impl
Artemka374 Oct 3, 2024
f5ca585
add binaries, add --dev flag to prover init
Artemka374 Oct 3, 2024
d2509a9
fix condition in init
Artemka374 Oct 3, 2024
9fea585
kinda finalize workflow
Artemka374 Oct 3, 2024
a3c76c5
fix workflow
Artemka374 Oct 3, 2024
87ced96
make it run only e2e workflow
Artemka374 Oct 3, 2024
f9fe8e0
stub changes to push docker images
Artemka374 Oct 3, 2024
8605630
update workflow
Artemka374 Oct 3, 2024
ca8a499
fix workflow
Artemka374 Oct 3, 2024
9127f5b
fix image tag
Artemka374 Oct 3, 2024
aff7bab
add reth data folder
Artemka374 Oct 4, 2024
6ef981b
add foundry to cuda images
Artemka374 Oct 4, 2024
8c4b4b2
allow forge
Artemka374 Oct 4, 2024
5ab5d2e
allow murky
Artemka374 Oct 4, 2024
cc9cbda
add exceptions
Artemka374 Oct 4, 2024
320afa6
add exceptions
Artemka374 Oct 4, 2024
a9bec3d
add exceptions
Artemka374 Oct 4, 2024
a1d6499
add exceptions
Artemka374 Oct 4, 2024
12e76e0
Merge branch 'main' into afo/zk-env-docker-stub
Artemka374 Oct 4, 2024
e14956c
fix
Artemka374 Oct 4, 2024
b07a885
add exceptions
Artemka374 Oct 4, 2024
dc8029e
add exceptions
Artemka374 Oct 4, 2024
aceef71
fix
Artemka374 Oct 4, 2024
2d72661
containers up
Artemka374 Oct 4, 2024
76173e6
add verbose
Artemka374 Oct 4, 2024
b59251d
remove containers
Artemka374 Oct 4, 2024
d189b5d
add reth to docker-compose up
Artemka374 Oct 7, 2024
8f35acb
try adding some logging
Artemka374 Oct 7, 2024
85700c5
use correct flag
Artemka374 Oct 7, 2024
4132824
try with localnet_up
Artemka374 Oct 7, 2024
dc90131
try without localnet up
Artemka374 Oct 7, 2024
cf58a82
try with gpu once again
Artemka374 Oct 7, 2024
7bec998
update docker compose
Artemka374 Oct 7, 2024
eb10bd7
return workflow back
Artemka374 Oct 7, 2024
7f227e5
add profile
Artemka374 Oct 7, 2024
323da40
return exceptions back
Artemka374 Oct 7, 2024
e26377d
remove gcc
Artemka374 Oct 7, 2024
7b78577
install prover CLI
Artemka374 Oct 7, 2024
60dc847
install prover CLI for 11.8
Artemka374 Oct 7, 2024
b8677f1
return gcc back
Artemka374 Oct 7, 2024
923b923
check gcc version
Artemka374 Oct 7, 2024
c62eac9
update images to ubuntu 22.04
Artemka374 Oct 7, 2024
25055cc
return rocksdb deps back
Artemka374 Oct 7, 2024
dad4df6
remove PPA
Artemka374 Oct 7, 2024
1e8d17f
install only liburing and libclang with ppa
Artemka374 Oct 7, 2024
4db4e8b
remove dependencies
Artemka374 Oct 7, 2024
55da64f
move prover CLI to next steps
Artemka374 Oct 7, 2024
2aedc6f
print gcc version
Artemka374 Oct 7, 2024
3963786
use gcc-10
Artemka374 Oct 7, 2024
26cec46
use gcc-10
Artemka374 Oct 7, 2024
aafe5a7
use ubuntu 22.04 for 11.8
Artemka374 Oct 8, 2024
c415e19
remove gcc version
Artemka374 Oct 8, 2024
321661e
Merge branch 'refs/heads/afo/zk-env-docker-stub' into afo/prover-e2e
Artemka374 Oct 8, 2024
0b14bac
Merge branch 'refs/heads/main' into afo/prover-e2e
Artemka374 Oct 8, 2024
0239641
return uring back
Artemka374 Oct 8, 2024
223fe48
fix chain
Artemka374 Oct 8, 2024
5b5df63
make prover db setup with dev flag
Artemka374 Oct 8, 2024
0e970c3
install prover CLI in CI
Artemka374 Oct 8, 2024
707546f
update build prover binaries
Artemka374 Oct 8, 2024
d3dce1d
run PJM
Artemka374 Oct 8, 2024
60c9dd3
fix a few things
Artemka374 Oct 8, 2024
7816900
print current status in prover jobs checker
Artemka374 Oct 8, 2024
8a68666
add sccache
Artemka374 Oct 8, 2024
b06e585
increase timeouts a bit, try fix binaries
Artemka374 Oct 8, 2024
7de1d0a
try fix binaries
Artemka374 Oct 8, 2024
3eb01ce
fix binaries, fix ports
Artemka374 Oct 8, 2024
a208295
fix env vars
Artemka374 Oct 8, 2024
3da7dfc
fix binary
Artemka374 Oct 8, 2024
065f989
increase timeout
Artemka374 Oct 8, 2024
cad5d95
increase timeout
Artemka374 Oct 8, 2024
9c37a0f
increase timeout
Artemka374 Oct 8, 2024
9126508
fix binary
Artemka374 Oct 8, 2024
de98d01
remove prover CLI
Artemka374 Oct 9, 2024
3836907
add waiting for batch
Artemka374 Oct 9, 2024
a7111b7
fix binaries
Artemka374 Oct 9, 2024
3ef8e82
decrease timeout
Artemka374 Oct 9, 2024
57e0fdf
Merge branch 'refs/heads/main' into afo/prover-e2e
Artemka374 Oct 9, 2024
cd63c34
remove skip serializing
Artemka374 Oct 9, 2024
b62e7c0
remove curl
Artemka374 Oct 9, 2024
2d2631a
use circuit prover instead of prover+WVG
Artemka374 Oct 9, 2024
125a045
add wvg count to circuit prover params
Artemka374 Oct 9, 2024
2d54545
add more sleep
Artemka374 Oct 9, 2024
462f6da
add more sleep
Artemka374 Oct 9, 2024
73fcf87
Merge branch 'main' into afo/prover-e2e
Artemka374 Oct 9, 2024
9c8c2a5
run prover from binary
Artemka374 Oct 9, 2024
6c8a366
Merge remote-tracking branch 'origin/afo/prover-e2e' into afo/prover-e2e
Artemka374 Oct 9, 2024
f72f4eb
add sleep again
Artemka374 Oct 9, 2024
a43cfbf
fix server
Artemka374 Oct 9, 2024
42aa5e9
return regular prover back
Artemka374 Oct 9, 2024
9f47209
Merge branch 'main' into afo/prover-e2e
Artemka374 Oct 9, 2024
5452052
increase timeouts
Artemka374 Oct 9, 2024
8a80096
Merge remote-tracking branch 'origin/afo/prover-e2e' into afo/prover-e2e
Artemka374 Oct 9, 2024
6db7558
increase timeout to see logs, add check for database
Artemka374 Oct 9, 2024
e5e0c91
fmt
Artemka374 Oct 10, 2024
af206de
add clean_gpu binary
Artemka374 Oct 10, 2024
53aa050
add uploading logs, fix clean_gpu
Artemka374 Oct 10, 2024
1b23b6d
mkdir logs
Artemka374 Oct 10, 2024
7f163c4
fix ampersand
Artemka374 Oct 10, 2024
f85cd68
try fixing clean_gpu binary
Artemka374 Oct 10, 2024
37e776f
try fix logs
Artemka374 Oct 10, 2024
c9387a9
test gpu binary
Artemka374 Oct 10, 2024
9f043ca
fix prover logs
Artemka374 Oct 10, 2024
870d07d
fix binary
Artemka374 Oct 10, 2024
4f62cc0
fix binary
Artemka374 Oct 10, 2024
1ea3a24
fix binary
Artemka374 Oct 10, 2024
aa6d980
Merge branch 'refs/heads/main' into afo/prover-e2e
Artemka374 Oct 10, 2024
46acabd
log nvidia-smi
Artemka374 Oct 10, 2024
b4ad77e
Merge branch 'refs/heads/main' into afo/prover-e2e
Artemka374 Oct 10, 2024
8136b4c
try other ways to get prover PID
Artemka374 Oct 10, 2024
ec6d212
try other ways to get prover PID
Artemka374 Oct 10, 2024
52c333e
try other ways to get prover PID
Artemka374 Oct 11, 2024
36e79e3
add pid: host in docker compose
Artemka374 Oct 11, 2024
5d13574
try to nvidia-smi from container
Artemka374 Oct 11, 2024
3b0fc67
add some sleep
Artemka374 Oct 11, 2024
355d592
temporarily show logs
Artemka374 Oct 11, 2024
94b29f6
try using ps ax to get PIDs
Artemka374 Oct 11, 2024
84b679a
try fix binaries
Artemka374 Oct 11, 2024
ab949b7
try fix binaries
Artemka374 Oct 11, 2024
80c50d6
add chmod
Artemka374 Oct 11, 2024
120f73f
fix binary
Artemka374 Oct 11, 2024
2319cd7
fix binary
Artemka374 Oct 11, 2024
8725978
make http URL dynamic
Artemka374 Oct 11, 2024
4c4e5eb
add circuit prover to zki
Artemka374 Oct 11, 2024
456e5fe
fix argument name
Artemka374 Oct 11, 2024
46c9726
Merge branch 'refs/heads/afo/add-circuit-prover-to-zki' into afo/prov…
Artemka374 Oct 11, 2024
2b6386c
use circuit prover in workflow
Artemka374 Oct 11, 2024
a3fa769
correct name
Artemka374 Oct 11, 2024
e2a4bc8
Merge branch 'main' into afo/prover-e2e
Artemka374 Oct 11, 2024
fea8e0a
fix clean gpu
Artemka374 Oct 11, 2024
10c0b31
Merge remote-tracking branch 'origin/afo/prover-e2e' into afo/prover-e2e
Artemka374 Oct 11, 2024
b98ea2b
rename binary
Artemka374 Oct 11, 2024
5400339
fmt
Artemka374 Oct 11, 2024
8ce693c
enable logging for gateway
Artemka374 Oct 11, 2024
0333192
fix running components
Artemka374 Oct 11, 2024
f19e123
fix lint, decrease timeouts
Artemka374 Oct 11, 2024
66eaf7c
try producing logs
Artemka374 Oct 11, 2024
0fd772a
print memory usage
Artemka374 Oct 11, 2024
c79b116
print CPU usage
Artemka374 Oct 11, 2024
8c451b2
try with wvg again
Artemka374 Oct 11, 2024
6936b30
wvg as background
Artemka374 Oct 11, 2024
577dc41
use binary for prover, fix url of api
Artemka374 Oct 12, 2024
f08bd51
remove logging CPU
Artemka374 Oct 12, 2024
0ace759
try circuit prover again with less wvg
Artemka374 Oct 12, 2024
35bb0b3
add sleep + runtime logging
Artemka374 Oct 14, 2024
5ffd58e
try another runner
Artemka374 Oct 14, 2024
2769c5f
remove sleep
Artemka374 Oct 14, 2024
317a3f7
some polishing
Artemka374 Oct 14, 2024
2562c3f
NL
Artemka374 Oct 14, 2024
4d41da0
return serialize_if back
Artemka374 Oct 14, 2024
8ae88db
return status back
Artemka374 Oct 14, 2024
2890dcd
Merge branch 'refs/heads/main' into afo/prover-e2e
Artemka374 Oct 14, 2024
eb10619
update to zkstack
Artemka374 Oct 14, 2024
c8b1878
fix kill_prover binary
Artemka374 Oct 14, 2024
2998c25
use more WVGs
Artemka374 Oct 14, 2024
d32c2e8
decrease WVG count
Artemka374 Oct 14, 2024
578bed9
address comments
Artemka374 Oct 14, 2024
bdc5817
address comments
Artemka374 Oct 14, 2024
7fc4eaa
address comments
Artemka374 Oct 15, 2024
be52864
address comments
Artemka374 Oct 15, 2024
7f9e475
address comments
Artemka374 Oct 15, 2024
f7fe1ee
Merge branch 'main' into afo/prover-e2e
Artemka374 Oct 15, 2024
267e664
fix DB URL
Artemka374 Oct 15, 2024
6925bad
Merge remote-tracking branch 'origin/afo/prover-e2e' into afo/prover-e2e
Artemka374 Oct 15, 2024
2cd39ff
fix database
Artemka374 Oct 15, 2024
9636b3c
Merge branch 'refs/heads/main' into afo/prover-e2e
Artemka374 Oct 16, 2024
c106936
update prover protocol version
Artemka374 Oct 16, 2024
127 changes: 127 additions & 0 deletions .github/workflows/ci-prover-e2e.yml
@@ -0,0 +1,127 @@
name: Workflow for testing prover component end-to-end
on:
  workflow_call:

jobs:
  e2e-test:
    runs-on: [ matterlabs-ci-gpu-l4-runner-prover-tests ]
    env:
      RUNNER_COMPOSE_FILE: "docker-compose-gpu-runner-cuda-12-0.yml"

    steps:
      - uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29 # v4
        with:
          submodules: "recursive"
          fetch-depth: 0

      - name: Setup environment
        run: |
          echo ZKSYNC_HOME=$(pwd) >> $GITHUB_ENV
          echo $(pwd)/bin >> $GITHUB_PATH
          echo IN_DOCKER=1 >> .env
          echo "SCCACHE_GCS_BUCKET=matterlabs-infra-sccache-storage" >> .env
          echo "SCCACHE_GCS_SERVICE_ACCOUNT=gha-ci-runners@matterlabs-infra.iam.gserviceaccount.com" >> .env
          echo "SCCACHE_GCS_RW_MODE=READ_WRITE" >> .env
          echo "RUSTC_WRAPPER=sccache" >> .env

          mkdir -p prover_logs

      - name: Start services
        run: |
          run_retried docker-compose -f ${RUNNER_COMPOSE_FILE} pull
          mkdir -p ./volumes/postgres ./volumes/reth/data
          docker-compose -f ${RUNNER_COMPOSE_FILE} --profile runner up -d --wait
          ci_run sccache --start-server

      - name: Init
        run: |
          ci_run git config --global --add safe.directory "*"
          ci_run chmod -R +x ./bin

          ci_run ./zkstack_cli/zkstackup/install -g --path ./zkstack_cli/zkstackup/zkstackup || true
          ci_run zkstackup -g --local

          ci_run zkstack chain create \
              --chain-name proving_chain \
              --chain-id sequential \
              --prover-mode gpu \
              --wallet-creation localhost \
              --l1-batch-commit-data-generator-mode rollup \
              --base-token-address 0x0000000000000000000000000000000000000001 \
              --base-token-price-nominator 1 \
              --base-token-price-denominator 1 \
              --set-as-default true \
              --ignore-prerequisites

          ci_run zkstack ecosystem init --dev --verbose
          ci_run zkstack prover init --dev --verbose

          echo "URL=$(grep "http_url" ./chains/proving_chain/configs/general.yaml | awk '{ print $2 }')" >> $GITHUB_ENV
      - name: Build prover binaries
        run: |
          ci_run cargo build --release --workspace --manifest-path=prover/Cargo.toml
      - name: Prepare prover subsystem
        run: |
          ci_run zkstack prover init-bellman-cuda --clone --verbose
          ci_run zkstack prover setup-keys --mode=download --region=us --verbose
      - name: Run server
        run: |
          ci_run zkstack server --uring --chain=proving_chain --components=api,tree,eth,state_keeper,commitment_generator,proof_data_handler,vm_runner_protective_reads,vm_runner_bwip &>prover_logs/server.log &
      - name: Run Gateway
        run: |
          ci_run zkstack prover run --component=gateway --docker=false &>prover_logs/gateway.log &
      - name: Run Prover Job Monitor
        run: |
          ci_run zkstack prover run --component=prover-job-monitor --docker=false &>prover_logs/prover-job-monitor.log &
      - name: Wait for batch to be passed through gateway
        env:
          DATABASE_URL: postgres://postgres:notsecurepassword@localhost:5432/zksync_prover_localhost_proving_chain
          BATCH_NUMBER: 1
          INTERVAL: 30
          TIMEOUT: 300
        run: |
          PASSED_ENV_VARS="DATABASE_URL,BATCH_NUMBER,INTERVAL,TIMEOUT" \
          ci_run ./bin/prover_checkers/batch_availability_checker
      - name: Run Witness Generator
        run: |
          ci_run zkstack prover run --component=witness-generator --round=all-rounds --docker=false &>prover_logs/witness-generator.log &
      - name: Run Circuit Prover
        run: |
          ci_run zkstack prover run --component=circuit-prover --witness-vector-generator-count=10 --docker=false &>prover_logs/circuit_prover.log &
      - name: Wait for prover jobs to finish
        env:
          DATABASE_URL: postgres://postgres:notsecurepassword@localhost:5432/zksync_prover_localhost_proving_chain
          BATCH_NUMBER: 1
          INTERVAL: 30
          TIMEOUT: 1200
        run: |
          PASSED_ENV_VARS="DATABASE_URL,BATCH_NUMBER,INTERVAL,TIMEOUT" \
          ci_run ./bin/prover_checkers/prover_jobs_status_checker

      - name: Kill prover & start compressor
        run: |
          sudo ./bin/prover_checkers/kill_prover

          ci_run zkstack prover run --component=compressor --docker=false &>prover_logs/compressor.log &
      - name: Wait for batch to be executed on L1
        env:
          DATABASE_URL: postgres://postgres:notsecurepassword@localhost:5432/zksync_prover_localhost_proving_chain
          BATCH_NUMBER: 1
          INTERVAL: 30
          TIMEOUT: 600
        run: |
          PASSED_ENV_VARS="BATCH_NUMBER,DATABASE_URL,URL,INTERVAL,TIMEOUT" \
          ci_run ./bin/prover_checkers/batch_l1_status_checker

      - name: Upload logs
        uses: actions/upload-artifact@50769540e7f4bd5e21e526ee35c689e35e0d6874 # v4.4.0
        if: always()
        with:
          name: prover_logs
          path: prover_logs

      - name: Show sccache logs
        if: always()
        run: |
          ci_run sccache --show-stats || true
          ci_run cat /tmp/sccache_log.txt || true
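Note on the Init step: it derives URL by grepping general.yaml for "http_url" and printing the second field, which would yield several values if more than one line matched. A minimal sketch of a stricter extraction follows, assuming the key appears as `http_url: <value>` on a single line. The function name `extract_http_url` and the sample config contents are hypothetical; the real layout of general.yaml is not shown in this PR.

```shell
#!/usr/bin/env bash
set -o errexit
set -o pipefail

# Hypothetical helper: print the value of the first `http_url:` key in a file.
# awk splits on whitespace, so leading YAML indentation is ignored.
extract_http_url() {
    local config_file="$1"
    awk '$1 == "http_url:" { print $2; exit }' "$config_file"
}

# Illustrative sample only; not the actual contents of general.yaml.
sample_config="$(mktemp)"
cat > "$sample_config" <<'EOF'
api:
  web3_json_rpc:
    http_port: 3050
    http_url: http://127.0.0.1:3050
    ws_url: ws://127.0.0.1:3051
EOF

extract_http_url "$sample_config"
rm -f "$sample_config"
```

Matching on the exact key name avoids picking up keys that merely contain the substring, and `exit` stops after the first match so the result is always a single value.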
6 changes: 6 additions & 0 deletions .github/workflows/ci.yml
@@ -94,6 +94,12 @@ jobs:
     name: CI for Prover Components
     uses: ./.github/workflows/ci-prover-reusable.yml

+  e2e-for-prover:
+    name: E2E Test for Prover Components
+    needs: changed_files
+    if: ${{(needs.changed_files.outputs.prover == 'true' || needs.changed_files.outputs.all == 'true') && !contains(github.ref_name, 'release-please--branches') }}
+    uses: ./.github/workflows/ci-prover-e2e.yml
+
   ci-for-docs:
     needs: changed_files
     if: needs.changed_files.outputs.docs == 'true'
6 changes: 3 additions & 3 deletions .github/workflows/zk-environment-publish.yml
@@ -49,10 +49,10 @@ jobs:
               - docker/zk-environment/Dockerfile
               - .github/workflows/zk-environment-publish.yml
             zk_env_cuda_11_8:
-              - docker/zk-environment/20.04_amd64_cuda_11_8.Dockerfile
+              - docker/zk-environment/22.04_amd64_cuda_11_8.Dockerfile
               - .github/workflows/zk-environment-publish.yml
             zk_env_cuda_12:
-              - docker/zk-environment/20.04_amd64_cuda_12_0.Dockerfile
+              - docker/zk-environment/22.04_amd64_cuda_12_0.Dockerfile
               - .github/workflows/zk-environment-publish.yml

   get_short_sha:
@@ -245,7 +245,7 @@ jobs:
         if: ${{ (steps.condition.outputs.should_run == 'true') || (github.event_name == 'workflow_dispatch' && inputs.build_cuda) }}
         uses: docker/build-push-action@5176d81f87c23d6fc96624dfdbcd9f3830bbe445 # v6.5.0
         with:
-          file: docker/zk-environment/20.04_amd64_cuda_${{ matrix.cuda_version }}.Dockerfile
+          file: docker/zk-environment/22.04_amd64_cuda_${{ matrix.cuda_version }}.Dockerfile
           push: ${{ ( github.event_name == 'push' && github.ref == 'refs/heads/main' ) || (github.event_name == 'workflow_dispatch' && inputs.build_cuda) }}
           tags: |
             us-docker.pkg.dev/matterlabs-infra/matterlabs-docker/zk-environment-cuda-${{ matrix.cuda_version }}:latest
40 changes: 40 additions & 0 deletions bin/prover_checkers/batch_availability_checker
@@ -0,0 +1,40 @@
#!/usr/bin/env bash

set -o errexit
set -o pipefail

# Configuration
# DATABASE_URL - The URL of the prover database to connect to
# BATCH_NUMBER - The batch number to check availability for
# INTERVAL - Time interval for polling in seconds
# TIMEOUT - Timeout of script in seconds

# Start timer
START_TIME=$(date +%s)

# Loop to query periodically
while true; do
    # Calculate the elapsed time
    CURRENT_TIME=$(date +%s)
    ELAPSED_TIME=$((CURRENT_TIME - START_TIME))

    # Check if the timeout has been reached
    if [ $ELAPSED_TIME -ge $TIMEOUT ]; then
        echo "Timeout reached. Failing CI..."
        exit 1 # Exit with non-zero status to fail CI
    fi

    # Run the SQL query and capture the result
    RESULT=$(psql $DATABASE_URL -c "SELECT count(*) FROM witness_inputs_fri WHERE l1_batch_number = $BATCH_NUMBER;" -t -A)

    # Check if the result is 1
    if [ "$RESULT" -eq 1 ]; then
        echo "Query result is 1. Success!"
        exit 0 # Exit with zero status to succeed CI
    else
        echo "Batch is not available yet. Retrying in $INTERVAL seconds..."
    fi

    # Wait for the next interval
    sleep $INTERVAL
done
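Note on the polling pattern: the three checker scripts in this PR share the same timer, timeout check, and sleep logic. One way this could be factored out is sketched below, under the assumption that each check can be expressed as a command that exits 0 on success. The name `poll_until` is hypothetical and not part of the PR. Unlike the scripts above, this sketch runs the check before testing the timeout, so the check always runs at least once.

```shell
#!/usr/bin/env bash
set -o errexit
set -o pipefail

# Hypothetical helper: run a check command every <interval> seconds until it
# succeeds or <timeout> seconds have elapsed since the first attempt.
# Usage: poll_until <timeout_seconds> <interval_seconds> <command> [args...]
poll_until() {
    local timeout="$1" interval="$2"
    shift 2
    local start_time elapsed
    start_time=$(date +%s)

    while true; do
        # Success: the check command exited with status 0.
        if "$@"; then
            return 0
        fi

        # Give up once the elapsed time reaches the timeout.
        elapsed=$(( $(date +%s) - start_time ))
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "Timeout reached after ${elapsed}s." >&2
            return 1
        fi

        sleep "$interval"
    done
}

# Demonstration with stand-in checks instead of a database query.
poll_until 5 1 true && echo "check passed"
poll_until 1 1 false || echo "check timed out"
```

In a checker script, the command passed to `poll_until` could be a small function that wraps the existing psql query and compares its result to 1, with TIMEOUT and INTERVAL supplied from the environment as they are now.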
54 changes: 54 additions & 0 deletions bin/prover_checkers/batch_l1_status_checker
@@ -0,0 +1,54 @@
#!/usr/bin/env bash

set -o errexit
set -o pipefail

# Needs following configuration
# URL - URL of the API endpoint
# INTERVAL - Time interval for polling in seconds
# TIMEOUT - Timeout of script in seconds

# Start timer
START_TIME=$(date +%s)

echo "URL: $URL"

# Loop to query periodically
while true; do
    # Calculate the elapsed time
    CURRENT_TIME=$(date +%s)
    ELAPSED_TIME=$((CURRENT_TIME - START_TIME))

    # Check if the timeout has been reached
    if [ $ELAPSED_TIME -ge $TIMEOUT ]; then
        echo "Timeout reached. Failing CI..."
        exit 1 # Exit with non-zero status to fail CI
    fi

    # Run the curl request and capture the response
    RESPONSE=$(curl --silent --request POST \
        --url $URL \
        --header 'Content-Type: application/json' \
        --data '{
            "jsonrpc": "2.0",
            "id": 1,
            "method": "zks_getBlockDetails",
            "params": [1]
        }')

    # Parse the executedAt field using jq
    EXECUTED_AT=$(echo $RESPONSE | jq -r '.result.executedAt')

    # Check if executedAt is not null
    if [ "$EXECUTED_AT" != "null" ] && [ -n "$EXECUTED_AT" ]; then
        echo "executedAt is not null: $EXECUTED_AT"
        echo "true"
        exit 0 # Exit with zero status to succeed CI
    else
        DATABASE_STATUS=$(psql $DATABASE_URL -c "SELECT status FROM proof_compression_jobs_fri WHERE l1_batch_number = $BATCH_NUMBER;" -t -A)
        echo "executedAt is null, database status is $DATABASE_STATUS, retrying in $INTERVAL seconds..."
    fi

    # Wait for the next interval
    sleep $INTERVAL
done
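Note on the request body: the script above always asks `zks_getBlockDetails` for block 1, independent of any environment variable. If the queried block ever needs to vary, the payload could be built from a parameter, as in the sketch below. The function name `build_block_details_request` is hypothetical, and which block number corresponds to a given BATCH_NUMBER is not established by this PR, so the caller is assumed to know it.

```shell
#!/usr/bin/env bash
set -o errexit
set -o pipefail

# Hypothetical helper: print a JSON-RPC request body for zks_getBlockDetails.
# %d makes printf reject non-numeric input instead of producing invalid JSON.
build_block_details_request() {
    local block_number="$1"
    printf '{"jsonrpc":"2.0","id":1,"method":"zks_getBlockDetails","params":[%d]}\n' "$block_number"
}

# Same request the checker sends today, with the block number as an argument.
build_block_details_request 1
```

The output can be passed to curl through `--data`, in place of the inline literal used in the script.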
12 changes: 12 additions & 0 deletions bin/prover_checkers/kill_prover
@@ -0,0 +1,12 @@
#!/usr/bin/env bash

set -o errexit
set -o pipefail

# Use pkill to find and kill processes using circuit prover
if ! pkill -f 'zksync_circuit_prover|zkstack prover run --component=circuit-prover'; then
    echo "No processes are currently using the GPU."
    exit 0
fi

echo "All GPU-related processes have been killed."
42 changes: 42 additions & 0 deletions bin/prover_checkers/prover_jobs_status_checker
@@ -0,0 +1,42 @@
#!/usr/bin/env bash

set -o errexit
set -o pipefail

# Configuration
# DATABASE_URL - The URL of the prover database to connect to
# BATCH_NUMBER - The batch number to check readiness for
# INTERVAL - Time interval for polling in seconds
# TIMEOUT - Timeout of script in seconds

# Start timer
START_TIME=$(date +%s)

# Loop to query periodically
while true; do
    # Calculate the elapsed time
    CURRENT_TIME=$(date +%s)
    ELAPSED_TIME=$((CURRENT_TIME - START_TIME))

    # Check if the timeout has been reached
    if [ $ELAPSED_TIME -ge $TIMEOUT ]; then
        echo "Timeout reached. Failing CI..."
        exit 1 # Exit with non-zero status to fail CI
    fi

    # Run the SQL query and capture the result
    RESULT=$(psql $DATABASE_URL -c "SELECT count(*) FROM proof_compression_jobs_fri WHERE l1_batch_number = $BATCH_NUMBER AND status = 'queued';" -t -A)

    # Check if the result is 1
    if [ "$RESULT" -eq 1 ]; then
        echo "Query result is 1. Success!"
        exit 0 # Exit with zero status to succeed CI
    else
        STATUS=$(psql $DATABASE_URL -c "SELECT COUNT(*), status FROM prover_jobs_fri WHERE l1_batch_number = $BATCH_NUMBER GROUP BY status;" -t -A)
        echo "Current status is $STATUS"
        echo "Retrying in $INTERVAL seconds..."
    fi

    # Wait for the next interval
    sleep $INTERVAL
done
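Note on the status output: with `-t -A`, psql prints the grouped query as one `count|status` row per line, and the script echoes those rows as they are. A sketch of folding them into a single log line follows. The function name `summarize_statuses` is hypothetical, and the sample rows are illustrative values rather than output captured from a real run.

```shell
#!/usr/bin/env bash
set -o errexit
set -o pipefail

# Hypothetical helper: read `count|status` rows on stdin (psql -t -A format)
# and print them as a single comma-separated `status=count` line.
summarize_statuses() {
    awk -F'|' '
        NF == 2 { printf "%s%s=%s", sep, $2, $1; sep = ", " }
        END     { print "" }
    '
}

# Illustrative rows only.
printf '12|successful\n3|queued\n' | summarize_statuses
```

In the checker, the existing grouped query could be piped through this function before the "Current status is" message, which keeps each polling iteration on one line in the CI log.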
2 changes: 1 addition & 1 deletion core/node/proof_data_handler/src/lib.rs
@@ -30,7 +30,7 @@ pub async fn run_server(
     mut stop_receiver: watch::Receiver<bool>,
 ) -> anyhow::Result<()> {
     let bind_address = SocketAddr::from(([0, 0, 0, 0], config.http_port));
-    tracing::debug!("Starting proof data handler server on {bind_address}");
+    tracing::info!("Starting proof data handler server on {bind_address}");
     let app = create_proof_processing_router(blob_store, connection_pool, config, commitment_mode);

     let listener = tokio::net::TcpListener::bind(bind_address)
13 changes: 10 additions & 3 deletions docker-compose-gpu-runner-cuda-12-0.yml
@@ -3,6 +3,8 @@ services:
   reth:
     restart: always
     image: "ghcr.io/paradigmxyz/reth:v1.0.6"
+    ports:
+      - 127.0.0.1:8545:8545
     volumes:
       - type: bind
         source: ./volumes/reth/data
@@ -12,11 +14,9 @@ services:
         target: /chaindata

     command: node --dev --datadir /rethdata --http --http.addr 0.0.0.0 --http.port 8545 --http.corsdomain "*" --dev.block-time 300ms --chain /chaindata/reth_config
-    ports:
-      - 127.0.0.1:8545:8545

   zk:
-    image: ghcr.io/matter-labs/zk-environment:cuda-12-0-latest
+    image: ghcr.io/matter-labs/zk-environment:cuda-12_0-latest
     depends_on:
       - reth
       - postgres
@@ -49,11 +49,18 @@ services:
       - /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools
     env_file:
       - ./.env
+    extra_hosts:
+      - "host:host-gateway"
+    profiles:
+      - runner
+    network_mode: host
+    pid: host
     deploy:
       resources:
         reservations:
           devices:
             - capabilities: [ gpu ]
+
   postgres:
     image: "postgres:14"
     command: postgres -c 'max_connections=200'
7 changes: 6 additions & 1 deletion docker-compose-gpu-runner.yml
@@ -16,7 +16,7 @@ services:
       - 127.0.0.1:8545:8545

   zk:
-    image: "ghcr.io/matter-labs/zk-environment:cuda-11-8-latest"
+    image: "ghcr.io/matter-labs/zk-environment:cuda-11_8-latest"
     container_name: zk
     depends_on:
       - reth
@@ -40,6 +40,11 @@ services:
       - GITHUB_WORKSPACE=$GITHUB_WORKSPACE
     env_file:
       - ./.env
+    extra_hosts:
+      - "host:host-gateway"
+    profiles:
+      - runner
+    network_mode: host
     deploy:
       resources:
         reservations: