Add support for memory profiling #727

Open
wants to merge 3 commits into
base: main
1 change: 1 addition & 0 deletions .gitignore
@@ -11,6 +11,7 @@
# Memory profiler output
memory-profiling_*
profile-*
profile/memory-profiling_*

# CLI history
.history
5 changes: 5 additions & 0 deletions Cargo.toml
@@ -163,3 +163,8 @@ vergen-gitcl = { version = "1", features = ["build", "cargo", "rustc"] }
codegen-units = 1
lto = true
strip = true

[profile.release-with-debug]
debug = true
inherits = "release"
strip = false
28 changes: 28 additions & 0 deletions profile/Dockerfile.profile
@@ -0,0 +1,28 @@
# Image that wraps Seafowl with a bytehound binary and records
# memory allocations for profiling
#
# To build just the bytehound layer run
# DOCKER_BUILDKIT=1 docker build --target bytehound -f Dockerfile.profile -t splitgraph/bytehound .
#
# To build the full image run
# DOCKER_BUILDKIT=1 docker build -f Dockerfile.profile -t splitgraph/seafowl:profile ..

FROM rust:slim AS bytehound

RUN apt-get update && \
apt-get install -y git protobuf-compiler ca-certificates npm && \
npm install -g yarn && \
git clone https://github.com/koute/bytehound.git && \
cd bytehound && \
cargo build --release -p bytehound-preload && \
cargo build --release -p bytehound-cli

FROM ubuntu AS profile

RUN mkdir profiles && mkdir seafowl-data
COPY target/aarch64-unknown-linux-gnu/release-with-debug/seafowl seafowl
COPY --from=bytehound /bytehound/target/release/libbytehound.so libbytehound.so

ENV MEMORY_PROFILER_OUTPUT=profiles/memory-profiling_%e_%t_%p.dat
ENV LD_PRELOAD=./libbytehound.so
ENTRYPOINT [ "./seafowl" ]
60 changes: 60 additions & 0 deletions profile/README.md
@@ -0,0 +1,60 @@
## Setup

If you're on macOS you'll need cross-compilation tools, since the profiling Seafowl binary is
compiled on the host (compiling inside the container leads to OOMs). Grab a toolchain from
https://github.com/messense/homebrew-macos-cross-toolchains.

Then, to build the bytehound and the profiler (Seafowl wrapped in bytehound) images, run

```shell
$ just build-bytehound
$ just build-profiler
```
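
Before running these recipes it can help to confirm that the prerequisites are on `PATH`. A minimal preflight sketch, assuming the linker is named `aarch64-linux-gnu-gcc` as in the `build-profiler` recipe; `have_tool` and `preflight` are hypothetical helpers, not part of this PR:

```shell
#!/usr/bin/env bash
# Hypothetical preflight check; not part of the repository.

# Succeeds if the named program is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Prints one line per missing prerequisite and fails if any is missing.
preflight() {
    local missing=0 tool
    for tool in "$@"; do
        if ! have_tool "$tool"; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (not run here):
#   preflight just docker cargo aarch64-linux-gnu-gcc
```

The Rust target itself is usually installed separately, for example with `rustup target add aarch64-unknown-linux-gnu`.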

## Measuring

To actually start profiling run

```shell
$ just profile
docker run -p 8080:8080 -p 47470:47470 -v .:/profiles -v `realpath ../seafowl.toml`:/seafowl.toml -v `realpath ../../seafowl-data`:/seafowl-data splitgraph/seafowl:profile -c /seafowl.toml
2024-11-06T14:12:11.519272Z INFO main ThreadId(01) seafowl: Starting Seafowl 0.5.8
2024-11-06T14:12:11.519390Z INFO main ThreadId(01) seafowl: Loading the configuration from /seafowl.toml
2024-11-06T14:12:11.538033Z INFO tokio-runtime-worker ThreadId(12) seafowl: Starting the Arrow Flight frontend on 0.0.0.0:47470
2024-11-06T14:12:11.538268Z INFO tokio-runtime-worker ThreadId(12) seafowl: Starting the PostgreSQL frontend on 127.0.0.1:6432
2024-11-06T14:12:11.538275Z WARN tokio-runtime-worker ThreadId(12) seafowl: The PostgreSQL frontend doesn't have authentication or encryption and should only be used in development!
2024-11-06T14:12:11.538321Z INFO tokio-runtime-worker ThreadId(12) seafowl: Starting the HTTP frontend on 0.0.0.0:8080
...
```

and then run your workload against the HTTP/gRPC endpoint.
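
As an illustration of such a workload, the sketch below sends the same query repeatedly over HTTP. It assumes Seafowl's JSON query endpoint at `/q` on the port published by `just profile`; `query_payload` and `run_workload` are hypothetical helpers, and the payload builder does not escape quotes or backslashes in the SQL.

```shell
#!/usr/bin/env bash
# Hypothetical load generator; the /q endpoint path is an assumption.

# Wraps a SQL string in a JSON body. No escaping is done, so keep the
# query free of double quotes and backslashes.
query_payload() {
    printf '{"query": "%s"}' "$1"
}

# Sends the query in $3 to the HTTP frontend at $1, $2 times in a row.
run_workload() {
    local url="$1" count="$2" sql="$3" i=0
    while [ "$i" -lt "$count" ]; do
        curl -sS -o /dev/null -H "Content-Type: application/json" \
            -d "$(query_payload "$sql")" "$url/q" || return 1
        i=$((i + 1))
    done
}

# Example (not run here):
#   run_workload http://localhost:8080 1000 "SELECT 1"
```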

Bytehound continually dumps the allocation data into a single file for each run

```shell
$ tree -h | grep mem
├── [3.8G] memory-profiling_seafowl_1730902150_1.dat
├── [4.3G] memory-profiling_seafowl_1730902331_1.dat
```
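
The file names follow the `MEMORY_PROFILER_OUTPUT` pattern set in `Dockerfile.profile` (`memory-profiling_%e_%t_%p.dat`). Judging by the names above, `%e` is the executable, `%t` a Unix timestamp and `%p` the process ID. A small sketch that splits a name into those parts; `parse_profile_name` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the repository.

# Prints "<executable> <timestamp> <pid>" for a profile file name.
parse_profile_name() {
    local base="${1##*/}"            # drop any leading directory
    base="${base#memory-profiling_}" # drop the fixed prefix
    base="${base%.dat}"              # drop the extension
    local pid="${base##*_}"          # last field
    base="${base%_*}"
    local ts="${base##*_}"           # second to last field
    local exe="${base%_*}"           # everything before it
    printf '%s %s %s\n' "$exe" "$ts" "$pid"
}

# Example (not run here):
#   parse_profile_name memory-profiling_seafowl_1730902150_1.dat
```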

Once you're done with profiling press Ctrl+C to stop the container.

## Browsing

To see the data in a web UI either explicitly load one or more files

```shell
just view memory-profiling_seafowl_1730902331_1.dat
```

or load all recorded files

```shell
just view
```

and open `localhost:9999`.

Note that you should strive to keep the recorded profiles under 5 GB, since otherwise the server will
fail to load them. If you must profile a long-running process, consider working around this limit by
toggling the recording on and off via `docker kill -s SIGUSR1 seafowl-profiler`.
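
A sketch of how the size limit could be watched during a long run. `oversized_profiles` and `toggle_recording` are hypothetical helpers; the container name and signal come from the note above, and treating 5 GB as 5 * 1000^3 bytes is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of the repository.

# Lists profile files directly inside directory $1 that exceed $2 bytes.
oversized_profiles() {
    find "$1" -maxdepth 1 -type f -name 'memory-profiling_*' -size +"$2"c
}

# Flips bytehound recording on or off in the running profiler container.
toggle_recording() {
    docker kill -s SIGUSR1 seafowl-profiler
}

# Example (not run here): pause recording once any profile passes the limit.
#   limit=$((5 * 1000 * 1000 * 1000))
#   [ -n "$(oversized_profiles . "$limit")" ] && toggle_recording
```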
27 changes: 27 additions & 0 deletions profile/justfile
@@ -0,0 +1,27 @@
build-bytehound:
    DOCKER_BUILDKIT=1 docker build --target bytehound -f Dockerfile.profile -t splitgraph/bytehound .

# TODO: generalize to support amd64
build-profiler:
    # just runs each recipe line in its own shell, so the linker variable
    # is joined to the cargo invocation with a line continuation
    CARGO_TARGET_AARCH64_UNKNOWN_LINUX_GNU_LINKER=aarch64-linux-gnu-gcc \
        cargo build --target aarch64-unknown-linux-gnu --profile release-with-debug
    DOCKER_BUILDKIT=1 docker build -f Dockerfile.profile -t splitgraph/seafowl:profile ..

profile:
    docker run --rm --name seafowl-profiler \
        -p 8080:8080 -p 47470:47470 \
        -v .:/profiles \
        -v `realpath ../seafowl.toml`:/seafowl.toml \
        -v `realpath ../../seafowl-data`:/seafowl-data \
        splitgraph/seafowl:profile -c /seafowl.toml

view *files='memory-profiling_*':
    docker run --rm --name seafowl-profile-server \
        -p 9999:9999 \
        -v .:/profiles \
        -w /profiles \
        splitgraph/bytehound \
        /bytehound/target/release/bytehound server -i 0.0.0.0 -p 9999 {{files}}

clean:
    rm -rf memory-profiling_*
2 changes: 1 addition & 1 deletion seafowl.toml
@@ -15,4 +15,4 @@ bind_port = 6432
bind_host = "0.0.0.0"

[frontend.flight]
-bind_host = "127.0.0.1"
+bind_host = "0.0.0.0"