RHOAIENG-10783: fix(rocm): remove more files that instructlab also removes #653

Closed
13 changes: 12 additions & 1 deletion rocm/ubi9-python-3.9/Dockerfile
@@ -19,6 +19,9 @@ WORKDIR /opt/app-root/bin
 ARG ROCM_VERSION=6.1
 ARG AMDGPU_VERSION=6.1
 
+# default: same targets and ROCm version as https://github.com/tiran/instructlab-containers/blob/main/containers/rocm/Containerfile.c9s#L47
+ARG AMDGPU_TARGETS=gfx900;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+;gfx942;gfx1030;gfx1100
Member

Can we get some official set of AMD GPUs we should/need to support? I mean - are we sure that this default is what we want?

Member Author

Looks good, but we should really ensure that the final image isn't missing anything needed for any use case we should support.

It's much better to miss something and add it later than to try to remove things afterwards, when we actually have users, imo.

Member

The plan is to support AMD MI210, MI300, and later on MI250,
based on the strategy we have in https://issues.redhat.com/browse/RHOAISTRAT-135.

I agree, we can add the details if the opportunity presents itself, and try this out for now.
However, what are these gfx files? Are we aware of what functionality this is removing?
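
For context on the question above: the `gfx…` names are the ROCm/LLVM ISA target identifiers for individual GPU architectures, and files under `/opt/rocm/lib/` with `gfx` in their name are, as far as I'm aware, mostly precompiled per-architecture kernel files. In AMD's published architecture tables, MI210 and MI250 are `gfx90a` and MI300 is `gfx942`. The following is only a sketch of what a narrower `AMDGPU_TARGETS` could look like if just those three accelerators had to be covered; the thread does not decide this, and the helper name `gfx_for` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map an accelerator name from the thread to its gfx ISA target.
# Mapping assumed from AMD's ROCm documentation: MI210/MI250 -> gfx90a, MI300 -> gfx942.
gfx_for() {
    case "$1" in
        MI210|MI250) echo "gfx90a" ;;
        MI300)       echo "gfx942" ;;
        *)           echo "unknown accelerator: $1" >&2; return 1 ;;
    esac
}

# Build a semicolon-separated target list, skipping ISAs that are already listed.
targets=""
for gpu in MI210 MI250 MI300; do
    isa=$(gfx_for "$gpu") || exit 1
    case ";$targets;" in
        *";$isa;"*) ;;                               # already in the list
        *) targets="${targets:+$targets;}$isa" ;;   # append with ';' separator
    esac
done

echo "AMDGPU_TARGETS=$targets"
```

A value produced this way could be passed to the build with `--build-arg AMDGPU_TARGETS=...` (quoted, because of the semicolon), leaving the broader default in the Dockerfile untouched.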


+
 # Enable epel-release repositories
 
 # Install the ROCm rpms
@@ -33,7 +36,15 @@ RUN echo "[ROCm]" > /etc/yum.repos.d/rocm.repo && \
 echo "enabled=1" >> /etc/yum.repos.d/amdgpu.repo && \
 echo "gpgcheck=0" >> /etc/yum.repos.d/amdgpu.repo && \
 yum install -y rocm && \
-yum clean all && rm -rf /var/cache/yum
+yum clean all && rm -rf /var/cache/yum && \
+# force remove 'rocm-llvm' from runtime, saves 3.6 GB on disk
+rpm -e --nodeps rocm-llvm && \
+# remove gfx files for unused ISAs, saves about 1.7 GB on disk
+# sed creates regular expression '.*\(gfx900\|gfx906\|...\).*'
+find /opt/rocm/lib/ -type f \
+-and -name '*gfx*' \
+-and -not -regex '.*\('$(echo $AMDGPU_TARGETS | sed -e 's/;/\\|/g' -e 's/:xnack[-+]//g')'\).*' \
+-print0 | xargs -0 rm -v
 
 # Restore notebook user workspace
 USER 1001
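
To see what the `find` step in this hunk would select, the regex construction can be tried outside the image. This is a minimal sketch, assuming GNU `find` (whose default `-regex` syntax accepts `\(`, `\|` and `\)`) and using a throwaway directory with made-up file names instead of `/opt/rocm/lib/`. It only prints the files that would be removed; nothing from the real ROCm tree is touched.

```shell
#!/bin/sh
# Same default as the ARG in the Dockerfile.
AMDGPU_TARGETS='gfx900;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+;gfx942;gfx1030;gfx1100'

# Same sed program as in the RUN step: ';' becomes '\|' and the
# ':xnack-' / ':xnack+' suffixes are dropped.
alternation=$(echo "$AMDGPU_TARGETS" | sed -e 's/;/\\|/g' -e 's/:xnack[-+]//g')
regex='.*\('"$alternation"'\).*'
echo "regex: $regex"

# Throwaway directory with illustrative (made-up) file names.
demo_dir=$(mktemp -d)
touch "$demo_dir/kernels_gfx90a.co" \
      "$demo_dir/kernels_gfx803.co" \
      "$demo_dir/libfoo.so"

# Dry run: list files whose name contains 'gfx' but whose path does not
# match any kept target. -print replaces the 'xargs rm' of the real step.
doomed=$(find "$demo_dir" -type f -name '*gfx*' -not -regex "$regex" -print)
echo "would remove: $doomed"

rm -rf "$demo_dir"
```

In this sketch only `kernels_gfx803.co` is listed: `kernels_gfx90a.co` matches a kept target, and `libfoo.so` is never considered because its name does not contain `gfx`. Note that `gfx90a` appears twice in the generated alternation (once per xnack variant), which is harmless.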