
KAFKA-16827: Integrate kafka native-image with system tests #16046

Merged: 13 commits, merged on May 30, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -60,4 +60,5 @@ jmh-benchmarks/src/main/generated
storage/kafka-tiered-storage/

docker/test/report_*.html
kafka.Kafka
__pycache__
17 changes: 12 additions & 5 deletions bin/kafka-run-class.sh
@@ -208,7 +208,7 @@ fi

# JMX settings
if [ -z "$KAFKA_JMX_OPTS" ]; then
KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false "
KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false "
Contributor
nit: may not be required, as the default for this system property is true

Contributor Author @kagarwal06 (May 28, 2024)
Yes, that is correct for the JVM, but the native binary requires the property to be set explicitly, hence this addition.
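For illustration, once kafka-run-class.sh expands $KAFKA_JMX_OPTS on the native path, the binary is launched with the property spelled out explicitly. A minimal sketch (the config path is illustrative, and other opts are omitted):

    ./kafka.Kafka start --config config/kraft/server.properties \
        -Dcom.sun.management.jmxremote=true \
        -Dcom.sun.management.jmxremote.authenticate=false \
        -Dcom.sun.management.jmxremote.ssl=false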

fi

# JMX port to use
@@ -340,9 +340,16 @@ CLASSPATH=${CLASSPATH#:}
# If Cygwin is detected, classpath is converted to Windows format.
(( WINDOWS_OS_FORMAT )) && CLASSPATH=$(cygpath --path --mixed "${CLASSPATH}")

# Launch mode
if [ "x$DAEMON_MODE" = "xtrue" ]; then
nohup "$JAVA" $KAFKA_HEAP_OPTS $KAFKA_JVM_PERFORMANCE_OPTS $KAFKA_GC_LOG_OPTS $KAFKA_JMX_OPTS $KAFKA_LOG4J_CMD_OPTS -cp "$CLASSPATH" $KAFKA_OPTS "$@" > "$CONSOLE_OUTPUT_FILE" 2>&1 < /dev/null &
# If KAFKA_MODE=native, it will bring up Kafka in the native mode.
# It expects the Kafka executable binary to be present at $base_dir/kafka.Kafka.
# This is specifically used to run system tests on native Kafka - by bringing up Kafka in the native mode.
if [[ "x$KAFKA_MODE" == "xnative" ]] && [[ "$*" == *"kafka.Kafka"* ]]; then
exec $base_dir/kafka.Kafka start --config "$2" $KAFKA_LOG4J_CMD_OPTS $KAFKA_JMX_OPTS $KAFKA_OPTS
Contributor @omkreddy (May 28, 2024)
Don't we need to pass $KAFKA_HEAP_OPTS to the native run?

Contributor Author
As per the native-image docs:

    When executing a native image, suitable Java heap settings will be determined automatically based on the system configuration and the used GC.

They do provide a mechanism to override this explicitly, but here I am relying on the automatic behaviour.

else
exec "$JAVA" $KAFKA_HEAP_OPTS $KAFKA_JVM_PERFORMANCE_OPTS $KAFKA_GC_LOG_OPTS $KAFKA_JMX_OPTS $KAFKA_LOG4J_CMD_OPTS -cp "$CLASSPATH" $KAFKA_OPTS "$@"
# Launch mode
if [ "x$DAEMON_MODE" = "xtrue" ]; then
nohup "$JAVA" $KAFKA_HEAP_OPTS $KAFKA_JVM_PERFORMANCE_OPTS $KAFKA_GC_LOG_OPTS $KAFKA_JMX_OPTS $KAFKA_LOG4J_CMD_OPTS -cp "$CLASSPATH" $KAFKA_OPTS "$@" > "$CONSOLE_OUTPUT_FILE" 2>&1 < /dev/null &
else
exec "$JAVA" $KAFKA_HEAP_OPTS $KAFKA_JVM_PERFORMANCE_OPTS $KAFKA_GC_LOG_OPTS $KAFKA_JMX_OPTS $KAFKA_LOG4J_CMD_OPTS -cp "$CLASSPATH" $KAFKA_OPTS "$@"
fi
fi
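For context, a sketch of exercising this new path end to end, assuming the native kafka.Kafka binary sits in the Kafka base directory and that kafka-server-start.sh delegates to kafka-run-class.sh as usual (the config path is illustrative):

    export KAFKA_MODE=native
    # kafka-server-start.sh invokes kafka-run-class.sh with "kafka.Kafka <config>",
    # so the branch above execs: $base_dir/kafka.Kafka start --config <config> ...
    bin/kafka-server-start.sh config/kraft/server.properties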
5 changes: 3 additions & 2 deletions core/src/main/scala/kafka/docker/KafkaDockerWrapper.scala
@@ -18,15 +18,15 @@ package kafka.docker

import kafka.Kafka
import kafka.tools.StorageTool
import kafka.utils.Exit
import kafka.utils.{Exit, Logging}
import net.sourceforge.argparse4j.ArgumentParsers
import net.sourceforge.argparse4j.impl.Arguments.store
import net.sourceforge.argparse4j.inf.Namespace

import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths, StandardCopyOption, StandardOpenOption}

object KafkaDockerWrapper {
object KafkaDockerWrapper extends Logging {
def main(args: Array[String]): Unit = {
val namespace = parseArguments(args)
val command = namespace.getString("command")
@@ -48,6 +48,7 @@ object KafkaDockerWrapper {
StorageTool.main(formatCmd)
case "start" =>
val configFile = namespace.getString("config")
info("Starting Kafka server in the native mode.")
Kafka.main(Array(configFile))
case _ =>
throw new RuntimeException(s"Unknown operation $command. " +
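For reference, a quick sketch of driving the start command that this wrapper handles (binary and config paths are illustrative); the added info() call should surface in the logs just before Kafka.main takes over:

    ./kafka.Kafka start --config /opt/kafka/config/kraft/server.properties
    # expected in the log output, from the info() call above:
    #   Starting Kafka server in the native mode.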
38 changes: 12 additions & 26 deletions docker/native/Dockerfile
@@ -20,40 +20,26 @@ ARG kafka_url
WORKDIR /app

ENV KAFKA_URL=$kafka_url
COPY native-image-configs native-image-configs
ENV NATIVE_IMAGE_PATH="native-image"
ENV KAFKA_DIR="/app/kafka"
ENV NATIVE_CONFIGS_DIR="/app/native-image-configs"
ENV KAFKA_LIBS_DIR="$KAFKA_DIR/libs"
ENV TARGET_PATH="$KAFKA_DIR/kafka.Kafka"

RUN mkdir kafka; \
COPY native-image-configs $NATIVE_CONFIGS_DIR
COPY native_command.sh native_command.sh

RUN mkdir $KAFKA_DIR; \
microdnf install wget; \
wget -nv -O kafka.tgz "$KAFKA_URL"; \
wget -nv -O kafka.tgz.asc "$kafka_url.asc"; \
tar xfz kafka.tgz -C kafka --strip-components 1; \
wget -nv -O kafka.tgz.asc "$KAFKA_URL.asc"; \
tar xfz kafka.tgz -C $KAFKA_DIR --strip-components 1; \
wget -nv -O KEYS https://downloads.apache.org/kafka/KEYS; \
gpg --import KEYS; \
gpg --batch --verify kafka.tgz.asc kafka.tgz; \
rm kafka.tgz ; \
cd kafka ; \
# Build the native-binary of the apache kafka using graalVM native-image.
native-image --no-fallback \
--enable-http \
--enable-https \
--allow-incomplete-classpath \
--report-unsupported-elements-at-runtime \
--install-exit-handlers \
--enable-monitoring=jmxserver,jmxclient,heapdump,jvmstat \
-H:+ReportExceptionStackTraces \
-H:+EnableAllSecurityServices \
-H:EnableURLProtocols=http,https \
-H:AdditionalSecurityProviders=sun.security.jgss.SunProvider \
-H:ReflectionConfigurationFiles=/app/native-image-configs/reflect-config.json \
-H:JNIConfigurationFiles=/app/native-image-configs/jni-config.json \
-H:ResourceConfigurationFiles=/app/native-image-configs/resource-config.json \
-H:SerializationConfigurationFiles=/app/native-image-configs/serialization-config.json \
-H:PredefinedClassesConfigurationFiles=/app/native-image-configs/predefined-classes-config.json \
-H:DynamicProxyConfigurationFiles=/app/native-image-configs/proxy-config.json \
--verbose \
-march=compatibility \
-cp "libs/*" kafka.docker.KafkaDockerWrapper \
-o kafka.Kafka
/app/native_command.sh $NATIVE_IMAGE_PATH $NATIVE_CONFIGS_DIR $KAFKA_LIBS_DIR $TARGET_PATH


FROM alpine:latest
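A minimal sketch of building this image locally (the tag is a placeholder, and any Kafka release tarball URL with a matching .asc signature should work; the one below is illustrative):

    docker build -f docker/native/Dockerfile \
        --build-arg kafka_url="https://archive.apache.org/dist/kafka/3.7.0/kafka_2.13-3.7.0.tgz" \
        -t kafka-native:test docker/native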
8 changes: 2 additions & 6 deletions docker/native/native-image-configs/resource-config.json
@@ -25,13 +25,9 @@
}, {
"pattern":"\\Qkafka/kafka-version.properties\\E"
}, {
"pattern":"\\Qlinux/amd64/libzstd-jni-1.5.5-6.so\\E"
"pattern":"\\Qlinux/amd64/libzstd-jni-1.5.6-3.so\\E"
}, {
"pattern":"\\Qlinux/aarch64/libzstd-jni-1.5.5-6.so\\E"
}, {
"pattern":"\\Qlinux/amd64/libzstd-jni-1.5.5-11.so\\E"
}, {
"pattern":"\\Qlinux/aarch64/libzstd-jni-1.5.5-11.so\\E"
"pattern":"\\Qlinux/aarch64/libzstd-jni-1.5.6-3.so\\E"
}, {
"pattern":"\\Qnet/jpountz/util/linux/amd64/liblz4-java.so\\E"
}, {
43 changes: 43 additions & 0 deletions docker/native/native_command.sh
@@ -0,0 +1,43 @@
#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# $1 - The path of the GraalVM native-image. This binary is used to compile Java applications ahead-of-time into a standalone native binary.
# $2 - The path of the directory that contains the native-image configuration files.
# $3 - The path of the directory that contains the Apache Kafka libs.
# $4 - The path of the resulting Kafka native binary after the build process.

$1 --no-fallback \
--enable-http \
--enable-https \
--allow-incomplete-classpath \
--report-unsupported-elements-at-runtime \
--install-exit-handlers \
--enable-monitoring=jmxserver,jmxclient,heapdump,jvmstat \
-H:+ReportExceptionStackTraces \
-H:+EnableAllSecurityServices \
-H:EnableURLProtocols=http,https \
-H:AdditionalSecurityProviders=sun.security.jgss.SunProvider \
-H:ReflectionConfigurationFiles="$2"/reflect-config.json \
-H:JNIConfigurationFiles="$2"/jni-config.json \
-H:ResourceConfigurationFiles="$2"/resource-config.json \
-H:SerializationConfigurationFiles="$2"/serialization-config.json \
-H:PredefinedClassesConfigurationFiles="$2"/predefined-classes-config.json \
-H:DynamicProxyConfigurationFiles="$2"/proxy-config.json \
--verbose \
-march=compatibility \
-cp "$3/*" kafka.docker.KafkaDockerWrapper \
-o "$4"
31 changes: 30 additions & 1 deletion tests/docker/Dockerfile
@@ -14,6 +14,34 @@
# limitations under the License.

ARG jdk_version=openjdk:8
FROM $jdk_version AS build-native-image

WORKDIR /build

COPY native/ native

ARG KAFKA_MODE
ARG GRAALVM_URL="https://github.com/graalvm/graalvm-ce-builds/releases/download/jdk-21.0.1/graalvm-community-jdk-21.0.1_linux-aarch64_bin.tar.gz"

ENV NATIVE_IMAGE_PATH="/build/graalvm/bin/native-image"
ENV NATIVE_CONFIGS_DIR="/build/native/native-image-configs"
ENV KAFKA_LIBS_DIR="/build/kafka/libs"
ENV KAFKA_BIN_DIR="/build/kafka-binary"
ENV TARGET_PATH="$KAFKA_BIN_DIR/kafka.Kafka"

RUN mkdir $KAFKA_BIN_DIR

RUN if [ "$KAFKA_MODE" = "native" ]; then \
apt update && apt install -y sudo build-essential libz-dev zlib1g-dev curl jq coreutils libffi-dev cmake pkg-config libfuse-dev && apt-get -y clean ; \
mkdir graalvm ; \
curl -L "$GRAALVM_URL" -o graalvm.tar.gz ; \
tar -xzf graalvm.tar.gz -C graalvm --strip-components=1 ; \
mkdir kafka ; \
tar xfz native/kafka.tgz -C kafka --strip-components 1 ; \
rm graalvm.tar.gz kafka.tgz ; \
/build/native/native_command.sh $NATIVE_IMAGE_PATH $NATIVE_CONFIGS_DIR $KAFKA_LIBS_DIR $TARGET_PATH ; \
fi

FROM $jdk_version

MAINTAINER Apache Kafka [email protected]
@@ -37,6 +65,7 @@ RUN apt update && apt install -y sudo git netcat iptables rsync unzip wget curl
RUN python3 -m pip install -U pip==21.1.1;
RUN pip3 install --upgrade cffi virtualenv pyasn1 boto3 pycrypto pywinrm ipaddress enum34 debugpy && pip3 install --upgrade "ducktape>0.8"

COPY --from=build-native-image /build/kafka-binary/ /opt/kafka-binary/
# Set up ssh
COPY ./ssh-config /root/.ssh/config
# NOTE: The paramiko library supports the PEM-format private key, but does not support the RFC4716 format.
@@ -107,7 +136,7 @@ ARG KIBOSH_VERSION="8841dd392e6fbf02986e2fb1f1ebf04df344b65a"
ARG UID="1000"

# Install Kibosh
RUN apt-get install fuse
RUN apt-get install fuse -y
RUN cd /opt && git clone -q https://github.com/confluentinc/kibosh.git && cd "/opt/kibosh" && git reset --hard $KIBOSH_VERSION && mkdir "/opt/kibosh/build" && cd "/opt/kibosh/build" && ../configure && make -j 2

# Set up the ducker user.
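A sketch of building this test image directly with the new stage enabled, assuming tests/docker/native has already been populated by ducker-ak's prepare_native_dir (tag and jdk_version are illustrative):

    docker build -f tests/docker/Dockerfile \
        --build-arg jdk_version=openjdk:17 \
        --build-arg KAFKA_MODE=native \
        -t ducker-ak-native:test tests/docker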
42 changes: 42 additions & 0 deletions tests/docker/ducker-ak
@@ -31,6 +31,9 @@ ducker_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
# The absolute path to the root Kafka directory
kafka_dir="$( cd "${ducker_dir}/../.." && pwd )"

# The absolute path to the directory to hold the necessary files to construct a ducker image for executing Kafka in native mode.
tmp_native_dir=${ducker_dir}/native

# The memory consumption to allow during the docker build.
# This does not include swap.
docker_build_memory_limit="3200m"
@@ -47,6 +50,9 @@ default_jdk="openjdk:8"
# The default ducker-ak image name.
default_image_name="ducker-ak"

# The default kafka server mode.
default_kafka_mode="jvm"

# Port to listen on when debugging
debugpy_port=5678

@@ -247,6 +253,7 @@ ducker_build() {
--build-arg "ducker_creator=${user_name}" \
--build-arg "jdk_version=${jdk_version}" \
--build-arg "UID=${UID}" \
--build-arg "KAFKA_MODE=${kafka_mode}" \
-t "${image_name}" \
-f "${ducker_dir}/Dockerfile" ${docker_args} -- .
docker_status=$?
@@ -257,6 +264,7 @@ ducker_build() {
$((${duration} % 60))s."
echo "If this error is unexpected, consider running 'docker system prune -a' \
to clear old images from your local cache."
cleanup_native_dir
exit 1
fi
echo "Successfully built ${what} image in $((${duration} / 60))m \
@@ -305,6 +313,32 @@ setup_custom_ducktape() {
must_do -v docker rm ducker01
}

cleanup_native_dir() {
if [ -d "${tmp_native_dir}" ]; then
echo "Deleting temporary native dir: ${tmp_native_dir}"
rm -rf "${tmp_native_dir}"
fi
}

prepare_native_dir() {
echo "Mode provided for system tests run: $kafka_mode"
must_pushd "${kafka_dir}"
cleanup_native_dir && mkdir "${tmp_native_dir}"

if [ "$kafka_mode" == "native" ]; then
kafka_tarball_filename=(core/build/distributions/kafka*SNAPSHOT.tgz)
if [ ! -e "${kafka_tarball_filename[0]}" ]; then
echo "Kafka tarball not present. Building Kafka tarball for native image."
./gradlew clean releaseTarGz
fi

cp core/build/distributions/kafka*SNAPSHOT.tgz "${tmp_native_dir}"/kafka.tgz
cp -r docker/native/native-image-configs "${tmp_native_dir}"
cp docker/native/native_command.sh "${tmp_native_dir}"
fi
must_popd
}

ducker_up() {
require_commands docker
while [[ $# -ge 1 ]]; do
@@ -314,11 +348,13 @@ ducker_up() {
-n|--num-nodes) set_once num_nodes "${2}" "number of nodes"; shift 2;;
-j|--jdk) set_once jdk_version "${2}" "the OpenJDK base image"; shift 2;;
-e|--expose-ports) set_once expose_ports "${2}" "the ports to expose"; shift 2;;
-m|--kafka_mode) set_once kafka_mode "${2}" "the mode in which kafka will run"; shift 2;;
*) set_once image_name "${1}" "docker image name"; shift;;
esac
done
[[ -n "${num_nodes}" ]] || num_nodes="${default_num_nodes}"
[[ -n "${jdk_version}" ]] || jdk_version="${default_jdk}"
[[ -n "${kafka_mode}" ]] || kafka_mode="${default_kafka_mode}"
[[ -n "${image_name}" ]] || image_name="${default_image_name}-${jdk_version/:/-}"
[[ "${num_nodes}" =~ ^-?[0-9]+$ ]] || \
die "ducker_up: the number of nodes must be an integer."
Expand All @@ -334,7 +370,9 @@ use only ${num_nodes}."

docker ps >/dev/null || die "ducker_up: failed to run docker. Please check that the daemon is started."

prepare_native_dir
ducker_build "${image_name}"
cleanup_native_dir

docker inspect --format='{{.Config.Labels}}' --type=image "${image_name}" | grep -q 'ducker.type'
local docker_status=${PIPESTATUS[0]}
@@ -385,6 +423,10 @@ attempting to start new ones."
[[ $? -ne 0 ]] && die "failed to append to the /etc/hosts file on ${node}"
done

if [ "$kafka_mode" == "native" ]; then
docker exec --user=root ducker01 bash -c 'cp /opt/kafka-binary/kafka.Kafka /opt/kafka-dev/kafka.Kafka'
fi

echo "ducker_up: added the latest entries to /etc/hosts on each node."
generate_cluster_json_file "${num_nodes}" "${ducker_dir}/build/cluster.json"
echo "ducker_up: successfully wrote ${ducker_dir}/build/cluster.json"
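A usage sketch of the new flag (the node count is illustrative):

    # Build the image with the native binary baked in and bring the cluster up:
    tests/docker/ducker-ak up -n 12 -m native
    # Tear the cluster down as usual:
    tests/docker/ducker-ak down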
11 changes: 10 additions & 1 deletion tests/docker/run_tests.sh
@@ -25,12 +25,21 @@ die() {
exit 1
}

if [[ "$_DUCKTAPE_OPTIONS" == *"kafka_mode"* && "$_DUCKTAPE_OPTIONS" == *"native"* ]]; then
export KAFKA_MODE="native"
else
export KAFKA_MODE="jvm"
fi

if [ "$REBUILD" == "t" ]; then
./gradlew clean systemTestLibs
if [ "$KAFKA_MODE" == "native" ]; then
./gradlew clean releaseTarGz
fi
fi

if ${SCRIPT_DIR}/ducker-ak ssh | grep -q '(none)'; then
${SCRIPT_DIR}/ducker-ak up -n "${KAFKA_NUM_CONTAINERS}" || die "ducker-ak up failed"
${SCRIPT_DIR}/ducker-ak up -n "${KAFKA_NUM_CONTAINERS}" -m "${KAFKA_MODE}" || die "ducker-ak up failed"
fi

[[ -n ${_DUCKTAPE_OPTIONS} ]] && _DUCKTAPE_OPTIONS="-- ${_DUCKTAPE_OPTIONS}"
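A sketch of a native-mode system test run through this script, assuming TC_PATHS is picked up by run_tests.sh as usual (the test path is illustrative):

    TC_PATHS="tests/kafkatest/tests/client/quota_test.py" \
    _DUCKTAPE_OPTIONS="--globals '{\"kafka_mode\":\"native\"}'" \
    bash tests/docker/run_tests.sh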
12 changes: 9 additions & 3 deletions tests/kafkatest/services/kafka/kafka.py
@@ -806,7 +806,13 @@ def render_configs(self, configs):
return s

def start_cmd(self, node):
cmd = "export JMX_PORT=%d; " % self.jmx_port
"""
To bring up Kafka using the native image, pass the following in the ducktape options:
--globals '{"kafka_mode": "native"}'
"""
kafka_mode = self.context.globals.get("kafka_mode", "")
cmd = f"export KAFKA_MODE={kafka_mode}; "
cmd += "export JMX_PORT=%d; " % self.jmx_port
cmd += "export KAFKA_LOG4J_OPTS=\"-Dlog4j.configuration=file:%s\"; " % self.LOG4J_CONFIG
heap_kafka_opts = "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=%s" % \
self.logs["kafka_heap_dump_file"]["path"]
@@ -926,7 +932,7 @@ def run_features_command(self, op, new_version):
def pids(self, node):
"""Return process ids associated with running processes on the given node."""
try:
cmd = "jcmd | grep -e %s | awk '{print $1}'" % self.java_class_name()
Contributor Author
The jcmd command won't work if Kafka is running in native mode, so it is replaced with ps ax.
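A quick illustration of the difference on a test node (the output notes are illustrative):

    # jcmd enumerates only JVM processes, so a native kafka.Kafka binary never shows up:
    jcmd | grep -e kafka.Kafka | awk '{print $1}'                    # empty in native mode
    # ps sees both the JVM process and the native binary:
    ps ax | grep -i kafka.Kafka | grep -v grep | awk '{print $1}'    # prints the broker pid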

cmd = "ps ax | grep -i %s | grep -v grep | awk '{print $1}'" % self.java_class_name()
pid_arr = [pid for pid in node.account.ssh_capture(cmd, allow_fail=True, callback=int)]
return pid_arr
except (RemoteCommandError, ValueError) as e:
@@ -994,7 +1000,7 @@ def thread_dump(self, node):
def clean_node(self, node):
JmxMixin.clean_node(self, node)
self.security_config.clean_node(node)
node.account.kill_java_processes(self.java_class_name(),
Contributor Author
kill_java_processes won't work if Kafka is running in native mode, so it is replaced with kill_process.

node.account.kill_process(self.java_class_name(),
clean_shutdown=False, allow_fail=True)
node.account.ssh("sudo rm -rf -- %s" % KafkaService.PERSISTENT_ROOT, allow_fail=False)
