[SPARK-44619][INFRA] Free up disk space for container jobs #42253

Closed
wants to merge 20 commits into from

@zhengruifeng zhengruifeng commented Aug 1, 2023

What changes were proposed in this pull request?

Free up disk space for container jobs

Why are the changes needed?

increase the available disk space

before this PR
image

after this PR
image

Does this PR introduce any user-facing change?

No, infra-only

How was this patch tested?

updated CI

Comment on lines 415 to 419
rm -rf /__t/CodeQL || echo "fail to delete /__t/CodeQL"
rm -rf /__t/go || echo "fail to delete /__t/go"
rm -rf /__t/node || echo "fail to delete /__t/node"
Contributor Author

Deleting those directories in the Dockerfile has no effect ...

Contributor Author

88M	/__e/node12
55M	/__e/node12_alpine
101M	/__e/node16
89M	/__e/node16_alpine
8.2G	/__t/CodeQL
16K	/__t/Java_Temurin-Hotspot_jdk
487M	/__t/PyPy
1.2G	/__t/Python
62M	/__t/Ruby
1.2G	/__t/go
379M	/__t/node
16K	/__w/_PipelineMapping
26M	/__w/_actions
68K	/__w/_temp
681M	/__w/spark
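
A listing like the one above can be produced with du. The following is a minimal sketch, not the command used in this PR (the thread does not show it): report_usage is a hypothetical helper name, and the mount points /__e, /__t and /__w are taken from the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the size of every entry directly under each
# given parent directory, one line per entry (same layout as `du -sh`).
report_usage() {
  local parent
  for parent in "$@"; do
    # Skip parents that do not exist on this runner.
    [ -d "$parent" ] || continue
    # Ignore errors (for example an empty parent) so the job keeps going.
    du -sh "$parent"/* 2>/dev/null || true
  done
  return 0
}

# Paths as seen inside the container job in this thread.
report_usage /__e /__t /__w
```

Outside a container job these paths normally do not exist, in which case the call prints nothing.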

Contributor Author

I cannot find an official way to uninstall CodeQL (like pip, apt-get, etc.)

Contributor Author

cc @Yikun, do you happen to know whether we can remove /__t/CodeQL this way?

Contributor Author

@zhengruifeng zhengruifeng Aug 1, 2023

TBH, I haven't found a good way to uninstall CodeQL/Go/Node

also cc @LuciferYang @HyukjinKwon

Member

According to the link:
https://github.com/actions/runner-images/blob/main/images/linux/Ubuntu2204-Readme.md#installed-apt-packages

It seems only the packages listed there can be uninstalled with apt. The others can only be cleaned up this way, so the approach in this PR seems okay.
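
The cleanup discussed here can be sketched as a small loop around the rm -rf lines quoted at the top of this review. This is an illustration only, not the script merged in this PR: free_up is a hypothetical helper name, and the df calls are an assumed way to compare available space before and after.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete each given directory and report failures
# without aborting the job, mirroring the `rm -rf ... || echo ...` lines
# quoted earlier in this review.
free_up() {
  local dir
  for dir in "$@"; do
    rm -rf "$dir" || echo "fail to delete $dir"
  done
}

# Intended use inside a container job (paths come from this thread):
# df -h /
# free_up /__t/CodeQL /__t/go /__t/node
# df -h /
```

Because rm -rf succeeds silently on a path that does not exist, the echo fires only on a genuine failure such as a permission error.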

Contributor Author

Is CodeQL installed by the Ubuntu 22.04 runner?
I am surprised that I didn't find it in a non-container job (in #42241).

@zhengruifeng zhengruifeng changed the title [SPARK-44619][INFRA] Free up disk space for container jobs [SPARK-44619][INFRA] Free up disk space for PySpark test jobs Aug 1, 2023
@zhengruifeng zhengruifeng changed the title [SPARK-44619][INFRA] Free up disk space for PySpark test jobs [SPARK-44619][INFRA] Free up disk space for pyspark container jobs Aug 1, 2023
@zhengruifeng zhengruifeng marked this pull request as draft August 1, 2023 06:50
@zhengruifeng zhengruifeng changed the title [SPARK-44619][INFRA] Free up disk space for pyspark container jobs [WIP][SPARK-44619][INFRA] Free up disk space for pyspark container jobs Aug 1, 2023
@zhengruifeng zhengruifeng marked this pull request as ready for review August 1, 2023 10:56
@zhengruifeng zhengruifeng changed the title [WIP][SPARK-44619][INFRA] Free up disk space for pyspark container jobs [SPARK-44619][INFRA] Free up disk space for pyspark container jobs Aug 1, 2023
@zhengruifeng zhengruifeng force-pushed the infra_clean_container branch from 0874cf4 to 9b6a16f Compare August 1, 2023 11:04
Comment on lines 80 to 81
RUN apt-get autoremove --purge -y
RUN apt-get clean
Member

According to https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#run

Official Debian and Ubuntu images [automatically run apt-get clean](https://github.com/moby/moby/blob/03e2923e42446dbb830c654d0eec323a0b4ef02a/contrib/mkimage/debootstrap#L82-L105), so explicit invocation is not required.

So the apt cache is already removed. But adding rm -rf /var/lib/apt/lists/* to the end of the apt install line might help.
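
A minimal sketch of that suggestion, written as the shell body of a single install step. The function name install_and_trim and the APT_LISTS_DIR override are hypothetical, introduced only so the commands can be tried outside an image build; they are not part of this PR or of apt.

```shell
#!/usr/bin/env bash
# Sketch of the commands one Dockerfile RUN instruction would chain.
# install_and_trim and APT_LISTS_DIR are illustrative names; apt keeps
# its package lists in /var/lib/apt/lists by default.
install_and_trim() {
  local lists_dir="${APT_LISTS_DIR:-/var/lib/apt/lists}"
  # Update, install and clean up in one step, so the resulting image
  # layer never stores the downloaded package lists.
  apt-get update &&
    apt-get install -y "$@" &&
    rm -rf "$lists_dir"/*
}
```

In a Dockerfile the same three commands would be joined with && inside one RUN instruction, since files deleted in a later RUN still occupy space in the earlier layer.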

Contributor Author

Since this PR aims to uninstall packages, I would like to send a separate PR to follow the rm -rf /var/lib/apt/lists/* guidance.

@zhengruifeng zhengruifeng marked this pull request as draft August 2, 2023 00:57
@zhengruifeng
Contributor Author

Let me take another look to check whether it is safe to uninstall CodeQL and what the proper way to do it is.

@HyukjinKwon
Member

Since the CI passes fine for the time being, we can just drop this too for now if this takes too much time to investigate.

@zhengruifeng
Contributor Author

Since the CI passes fine for the time being, we can just drop this too for now if this takes too much time to investigate.

nice!

@zhengruifeng zhengruifeng force-pushed the infra_clean_container branch from 6b3a9d1 to a175f6b Compare August 2, 2023 03:40
@zhengruifeng zhengruifeng marked this pull request as ready for review August 2, 2023 03:40
@zhengruifeng
Contributor Author

Let me focus on removing the tool directories first and skip the changes in the Dockerfile.

@zhengruifeng zhengruifeng force-pushed the infra_clean_container branch from a175f6b to e66d1f3 Compare August 2, 2023 11:14
@github-actions github-actions bot added BUILD and removed BUILD labels Aug 2, 2023
@zhengruifeng zhengruifeng force-pushed the infra_clean_container branch from 60e5fa7 to bbc96b4 Compare August 3, 2023 01:34
@zhengruifeng zhengruifeng force-pushed the infra_clean_container branch from 00276d4 to 01763c2 Compare August 3, 2023 06:58
@zhengruifeng zhengruifeng changed the title [SPARK-44619][INFRA] Free up disk space for pyspark container jobs [SPARK-44619][INFRA] Free up disk space for container jobs Aug 3, 2023
@zhengruifeng
Contributor Author

Unlike the non-container jobs, the container jobs do not have many unneeded apt libraries.
I dropped the apt-get remove step because:
1. uninstalling libgl1-mesa-dri frees only about 500 MiB of disk;
2. it prevents this new script from working with the lint job.

@zhengruifeng
Contributor Author

cc @HyukjinKwon @Yikun @LuciferYang would you mind taking another look?

@zhengruifeng zhengruifeng deleted the infra_clean_container branch August 4, 2023 01:52
@zhengruifeng
Contributor Author

thanks, merged to master

zhengruifeng added a commit that referenced this pull request Sep 13, 2023
### What changes were proposed in this pull request?
follow the [Best practices for writing Dockerfiles](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#apt-get) :

> Always combine RUN apt-get update with apt-get install in the same RUN statement.

### Why are the changes needed?
1, to address #42253 (comment)
2, when I attempted to change the apt-get install in #41918, the behavior was confusing. By following the best practices, further changes should work immediately.

### Does this PR introduce _any_ user-facing change?
NO, dev-only

### How was this patch tested?
CI

### Was this patch authored or co-authored using generative AI tooling?
NO

Closes #42842 from zhengruifeng/infra_docker_file_opt.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
LuciferYang pushed a commit to LuciferYang/spark that referenced this pull request Oct 16, 2023
### What changes were proposed in this pull request?
Free up disk space for container jobs

### Why are the changes needed?
increase the available disk space

before this PR
![image](https://github.com/apache/spark/assets/7322292/64230324-607b-4c1d-ac2d-84b9bcaab12a)

after this PR
![image](https://github.com/apache/spark/assets/7322292/aafed2d6-5d26-4f7f-b020-1efe4f551a8f)

### Does this PR introduce _any_ user-facing change?
No, infra-only

### How was this patch tested?
updated CI

Closes apache#42253 from zhengruifeng/infra_clean_container.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
ragnarok56 pushed a commit to ragnarok56/spark that referenced this pull request Mar 2, 2024