I am wondering if it is possible to build a Docker image that includes llama-cpp-python on a non-GPU host, targeting a GPU host?
We build a base Docker image that contains llama-cpp-python==0.2.53 using the following Dockerfile (relevant portion included for brevity):
```dockerfile
ARG CUDA_IMAGE="12.5.0-devel-ubuntu22.04"
FROM nvidia/cuda:${CUDA_IMAGE} AS base
...
# Build-related environment variables
ENV CUDA_DOCKER_ARCH=all
ENV LLAMA_CUBLAS=1
ENV FORCE_CMAKE=1
ENV CUDACXX="/usr/local/cuda-12.5/bin/nvcc"
ENV CMAKE_CUDA_ARCHITECTURES=80
# Install llama-cpp-python (built with CUDA)
ENV CMAKE_ARGS="-DLLAMA_CURL=on -DGGML_CUDA=on -DLLAMA_CUBLAS=on -DCMAKE_CUDA_FLAGS='-arch=sm_80' -DCMAKE_CXX_FLAGS='-march=znver2'"
RUN pip install "llama-cpp-python==0.2.53" --no-cache-dir --force-reinstall --upgrade
...
```
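As context for the build-host vs. runtime-host split above: whether the deployed container can actually use a GPU depends on the NVIDIA driver being visible at runtime, not at build time. The sketch below is a hypothetical start-up probe (the helper `cuda_driver_available` is our own illustration, not part of llama-cpp-python) that checks whether the driver library can be loaded inside the running container:

```python
# Hypothetical probe (not part of llama-cpp-python): check at container
# start-up whether the NVIDIA driver library is loadable, so a CUDA-built
# image can fail fast or fall back to CPU on a host without a GPU.
import ctypes
import ctypes.util


def cuda_driver_available() -> bool:
    """Return True if the NVIDIA driver library (libcuda) can be loaded."""
    candidates = ("libcuda.so.1", "libcuda.so", ctypes.util.find_library("cuda"))
    for name in candidates:
        if not name:  # find_library may return None
            continue
        try:
            ctypes.CDLL(name)
            return True
        except OSError:
            continue  # library not present; try the next candidate
    return False


if __name__ == "__main__":
    print("CUDA driver visible:", cuda_driver_available())
```

On a GPU host the driver library is typically injected into the container by the NVIDIA container runtime (e.g. when running with `--gpus all`); on a non-GPU host the probe simply reports `False`.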
We then use this as a base image and add our application code in a CI/CD build step. The code uses guidance, if that is important :).
The initial build phase is very manual because we don't yet have GPU hosts available as workers in our CI/CD system, so we need to manually spin up a GPU VPS, log in, pull down the code, build the image, and push it to our repository. This is error-prone and hard to automate, so we have just started moving the process into our CI/CD system. Before we invest resources in integrating a GPU worker into our build system, we would like to completely rule out the possibility of building an image on a non-GPU host that can still utilise a GPU when deployed on a GPU host.
Has this been done? Can it be? If not, can someone point me to the technical background on why not? I'm new to GPU-accelerated ML, so any info is greatly appreciated.
m-o-leary changed the title from "[INFORMATION REQUEST] Is it possible to build for GPU host on non-GPU host?" to "[INFORMATION REQUEST] Is it possible to build for GPU enabled target on non-GPU host?" on Nov 25, 2024.
This is just a question about the blessed path here.