Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use builder image's objcopy when building with quarkus.native.container-build=true and quarkus.native.debug.enabled=true #13963

Merged
merged 5 commits into from
Apr 22, 2021

Conversation

zakkak
Copy link
Contributor

@zakkak zakkak commented Dec 18, 2020

Closes #13754
Closes #13856

@ghost ghost added the area/core label Dec 18, 2020
@zakkak zakkak force-pushed the objcopy-incontainer branch from c7e95e5 to 4d66b88 Compare December 18, 2020 05:07
@zakkak zakkak marked this pull request as ready for review December 18, 2020 12:33
Copy link
Contributor

@jerboaa jerboaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments...

Comment on lines 180 to 177
objcopy(containerCommand, outputDir, "--only-keep-debug", executableName, symbols);
objcopy(containerCommand, outputDir, String.format("--add-gnu-debuglink=%s", symbols), executableName);
}
// Strip debug symbols regardless, because the underlying JDK might contain them
objcopy(containerCommand, outputDir, "--strip-debug", executableName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this spin up a container 3 times? Ideally, we'd stitch them together so that we can get away with a one time container invocation. I.e. podman run ... --entrypoint=/bin/bash <container> -c 'objcopy --only-keep-debug ... ; objcopy --add-gnu-debuglink=....; objcopy --strip-debug ... or podman run ... --entrypoint=/bin/bash <container> -c 'objcopy --strip-debug ... depending on the -g option.

Copy link
Contributor Author

@zakkak zakkak Dec 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will, but that's fairly fast in comparison to the native-image build step. Keeping it this way we don't have to maintain two different code paths for container and host builds, thus keeping it simple and more readable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tested this myself, but apparently running containers on non-Linux isn't as quick. Perhaps mac and windows use-cases with building with the container should be tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK Mac is almost as fast, and windows docker using WSL2 as the backend should also be pretty fast. I can do some testing on Windows, but unfortunately not on Mac.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This maybe an optimization to do as a later step (once we know this actually is a problem?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR: I tested this on two laptops with similar specs.
I tested the "combined" (single docker container for all 3 processes) vs the "split" (1 container per process) approach and the results were:

Windows Linux
Combined ~16s ~0.8s
Split 3.5 + 13 + 6.5 = ~23s 0.5 + 0.5 + 0.6 = ~1.1s

Although the benefit of using the "combined" approach is pretty clear, the overhead of the "split" approach (~7s) is still relatively low given that the image build took ~260s.

@zakkak zakkak requested a review from gsmet December 22, 2020 02:03
@zakkak zakkak force-pushed the objcopy-incontainer branch from 6b8f277 to eaf9955 Compare January 22, 2021 11:30
@zakkak
Copy link
Contributor Author

zakkak commented Jan 22, 2021

@gsmet can you please review?

@gsmet
Copy link
Member

gsmet commented Jan 26, 2021

@zakkak can you fix the conflict?

Frankly, I can't backport it as is. We can't refactor everything in a PR we want to backport...

@zakkak zakkak force-pushed the objcopy-incontainer branch from eaf9955 to ac8e60e Compare January 27, 2021 13:25
@zakkak
Copy link
Contributor Author

zakkak commented Jan 27, 2021

@zakkak can you fix the conflict?

Done

Frankly, I can't backport it as is. We can't refactor everything in a PR we want to backport...

I understand. I should have actually added this PR to 1.11.0.Final in the first place.
How would you like to proceed with this since we would like it in 1.11.x? Is it OK if I create a new PR with branch 1.11 as the base?
Unfortunately the refactoring is necessary to make this change in a sensible way, otherwise we would have to duplicate a lot of code and some processing.

@gsmet
Copy link
Member

gsmet commented Feb 3, 2021

As I said I'm uncomfortable backporting something that rewrites a whole class. So either we find a finer-grained approach or we don't backport it. The current approach is not safe for a backport. I don't want to break something else to fix an issue.

Now if someone really wants this, they will have to review this VERY closely to be sure everything is safe.

/cc @maxandersen @tqvarnst

@rsvoboda
Copy link
Member

@zakkak can you please rebase?

@zakkak
Copy link
Contributor Author

zakkak commented Feb 22, 2021

@zakkak can you please rebase?

@rsvoboda this is not worth rebasing before #14635 gets merged, unless it's a real blocker for you.

@geoand
Copy link
Contributor

geoand commented Feb 22, 2021

This need a rebase but we need a decision on

  1. Whether or needs to be backported or not (based on what @gsmet mentioned)
  2. If we want this or Add native-source package type #15233 to go in first (that PR is definitely not backport material either)

@zakkak
Copy link
Contributor Author

zakkak commented Feb 22, 2021

This need a rebase but we need a decision on

1. Whether or needs to be backported or not (based on what @gsmet mentioned)

We have concluded that this does not need to be backported.

2. If we want this or #15233 to go in first

I would prefer #15233 to go in first.

@geoand
Copy link
Contributor

geoand commented Feb 22, 2021

OK, then I will take #15233 out of draft so it can get a proper review.

Furthermore, I'll add a test for it

@zakkak
Copy link
Contributor Author

zakkak commented Feb 22, 2021

OK, then I will take #15233 out of draft so it can get a proper review.

Furthermore, I'll add a test for it

Can we please make sure #14635 goes in first, then #15233 and finally #13963 (this PR)?

@geoand
Copy link
Contributor

geoand commented Feb 22, 2021

Ouch... I just saw #14635... Rebasing my PR onto that one won't be fun :(

@zakkak
Copy link
Contributor Author

zakkak commented Feb 22, 2021

Ouch... I just saw #14635... Rebasing my PR onto that one won't be fun :(

I feel you, if it eases your pain it will be more or less the same for this PR ;)

@geoand
Copy link
Contributor

geoand commented Feb 22, 2021

Ouch... I just saw #14635... Rebasing my PR onto that one won't be fun :(

I feel you, if it eases your pain it will be more or less the same for this PR ;)

Yeah, we are in the same boat :)

@geoand
Copy link
Contributor

geoand commented Feb 22, 2021

Turned out to be easier than I thought :)

@zakkak zakkak force-pushed the objcopy-incontainer branch from ac8e60e to efe041c Compare March 2, 2021 14:22
@zakkak
Copy link
Contributor Author

zakkak commented Mar 2, 2021

Turned out to be easier than I thought :)

I wish I could say the same :)

This is now ready for review. I tested it locally with all three configurations and GraalVM CE 21.0:

  1. Host build with -Dquarkus.native.debug.enabled=true -Dquarkus.native.graalvm-home=/opt/jvms/graalvm-ce-java11-21.0.0
  2. Builder image with -Dquarkus.native.debug.enabled=true
  3. Remote buidler image with -Dquarkus.native.remote-container-build=true -Dquarkus.native.debug.enabled=true

@jonathan-meier FYI (It would also be great if you could give this a spin on your "remote" docker setup)

@@ -13,6 +15,7 @@
public class NativeImageBuildRemoteContainerRunner extends NativeImageBuildContainerRunner {

private static final Logger log = Logger.getLogger(NativeImageBuildRemoteContainerRunner.class);
private static final String CONTAINER_BUILD_VOLUME_NAME = "quarkus-native-builder-image-project-volume";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally I would prefer a dynamically generated unique volume name, but I am not sure it's worth the effort.
We could also do something like fixed-name-YYYY-MM-DD-HHMM to avoid potential conflicts with existing volumes.
WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's straightforward to let docker generate the unique volume id, I quickly tested it locally:

final String[] createVolumeCommand = new String[] { containerRuntime.getExecutableName(), "volume", "create" };
volumeId = runCommandAndReadOutput(createVolumeCommand, "Failed to create temp volume.");

Using this as a first step in preBuild and replacing all CONTAINER_BUILD_VOLUME_NAME with volumeId.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it's not that simple. If you manually create the volume before starting the container, the volume (when mount) is owned by root:root and native-image fails to write to it in subsequent steps.

Creating the volume explicitly

$ podman volume create test-volume                         
test-volume
$ podman create --name temp -v test-volume:/project quay.io/quarkus/ubi-quarkus-native-image:21.0.0-java11                         
bfa2406d5fa0a003af99c91f077ed989006eb2441004135773429c425f9ce997
$ podman run --rm --entrypoint /bin/bash -it -v test-volume:/project quay.io/quarkus/ubi-quarkus-native-image:21.0.0-java11 -c "ls -la"
total 0
drwxr-xr-x.  2 root root 6 Mar  3 11:40 .
drwxr-xr-x. 19 root root 6 Mar  3 11:41 ..

Creating the volume implicitly

$ podman create --name temp -v test-volume:/project quay.io/quarkus/ubi-quarkus-native-image:21.0.0-java11
e30ffb76aa320448a3e645d7030c4dcf76b0efb83b68e3bc58a2e655c683d1de
$ podman run --rm --entrypoint /bin/bash -it -v test-volume:/project quay.io/quarkus/ubi-quarkus-native-image:21.0.0-java11 -c "ls -la"
total 0
drwxr-xr-x.  2 quarkus quarkus 6 Mar  3 11:42 .
drwxr-xr-x. 19 root    root    6 Mar  3 11:42 ..

I am not a containers expert though so I'll be happy to be corrected :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! I checked the following:

  • docker 19.03.13 on Ubuntu 20.10
  • docker 20.10.2 on Ubuntu 20.10 in WSL
  • podman 2.0.6 on Ubuntu 20.10
  • podman 1.6.4 on CentOS 7.9
  • podman 3.0.1 on CentOS 7.9

in all but the last case, the /project folder was owned by the quarkus user also when creating the volume explicitly. So something seems to have changed for podman 3. I'm guessing you're running on podman 3 as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed I am using podman 3.0.1 on Fedora 33.

Thanks for the additional info @jonathan-meier, I created containers/podman#9608.

@jonathan-meier
Copy link
Contributor

Works like a charm! I tested various combinations of local/remote builds with debug enabled/disabled both natively on Windows and inside WSL.

@zakkak
Copy link
Contributor Author

zakkak commented Mar 3, 2021

Works like a charm! I tested various combinations of local/remote builds with debug enabled/disabled both natively on Windows and inside WSL.

Thank you @jonathan-meier, I really appreciate it :)

Base automatically changed from master to main March 12, 2021 15:55
@gsmet gsmet force-pushed the objcopy-incontainer branch from efe041c to 57328b0 Compare April 1, 2021 12:49
@gsmet
Copy link
Member

gsmet commented Apr 1, 2021

@zakkak @jonathan-meier I rebased this PR and also added an additional commit to fix (again) the confusion between native image name and executable name. Could you both have a look?

@gsmet gsmet added triage/waiting-for-ci Ready to merge when CI successfully finishes and removed triage/waiting-for-ci Ready to merge when CI successfully finishes labels Apr 1, 2021
@zakkak
Copy link
Contributor Author

zakkak commented Apr 1, 2021

Thanks @gsmet. It looks like the issue (which appears only in podman 3.0.x) I am working around in this PR (see #13963 (comment)) is fixed in podman 3.1.0 (which was released 2 days ago) so I was thinking about waiting for it to land in fedora and rework the PR to avoid the work-around. WDYT?

@gsmet
Copy link
Member

gsmet commented Apr 1, 2021

If the workaround doesn't break Podman 3.1, I would get it in anyway. People don't always keep things updated.

@zakkak
Copy link
Contributor Author

zakkak commented Apr 1, 2021

If the workaround doesn't break Podman 3.1, I would get it in anyway. People don't always keep things updated.

OK, I added a comment to explain that we follow this approach due to the Podman 3.0.x issue. It looks good to me (at least the linux part). It would be nice if someone could confirm that everything works as expected on windows as well.

@jonathan-meier
Copy link
Contributor

Looks good to me too! I again tested various combinations of containerized local/remote docker builds with debug enabled/disabled both natively on Windows and inside WSL.

Please note that I did not test direct local builds on Windows producing a Windows binary (i.e. without docker, as I don't have GraalVM installed on Windows). I did not do any podman tests either.

zakkak and others added 5 commits April 12, 2021 12:39
This patch creates a volume for /project where we can copy files to/from
using `docker cp`. The volume is create in the prebuild step, then used
in the build step, and finally removed in the postbuild step. The new
approach allows us to execute more than one commands using
`docker run --rm` on the same files in /project.
@zakkak zakkak force-pushed the objcopy-incontainer branch from ae176ad to 8008ceb Compare April 12, 2021 10:34
@zakkak
Copy link
Contributor Author

zakkak commented Apr 12, 2021

@gsmet I rebased this (cleanly). I think it's good to go (after the CI completes).

@quarkus-bot
Copy link

quarkus-bot bot commented Apr 12, 2021

Failing Jobs - Building 8008ceb

Status Name Step Test failures Logs Raw logs
JVM Tests - JDK 8 Build Test failures Logs Raw logs

Full information is available in the Build summary check run.

Test Failures

⚙️ JVM Tests - JDK 8 #

📦 integration-tests/kafka

io.quarkus.it.kafka.SaslKafkaConsumerTest.testReception line 48 - More details - Source on GitHub

io.quarkus.it.kafka.SslKafkaConsumerTest.testReception line 56 - More details - Source on GitHub

@zakkak
Copy link
Contributor Author

zakkak commented Apr 22, 2021

CI looks good @gsmet @geoand can you please merge?

@gsmet gsmet merged commit 7299fd0 into quarkusio:main Apr 22, 2021
@quarkus-bot quarkus-bot bot added this to the 2.0 - main milestone Apr 22, 2021
@zakkak zakkak deleted the objcopy-incontainer branch July 12, 2022 12:14
zakkak added a commit to zakkak/quarkus that referenced this pull request Jul 12, 2022
PR quarkusio#13963 mistakenly stopped stripping debug information from the native
executable when debug info generation is enabled resulting in both the
native executable and the corresponding `.debug` file containing the
debug information. Which also results in the native executable being
larger when `-Dquarkus.native.debug.enabled` is used:

```
$ du -hs quarkus-integration-test-main-999-SNAPSHOT-runner-nodebug
108M	quarkus-integration-test-main-999-SNAPSHOT-runner-nodebug

$ du -hs quarkus-integration-test-main-999-SNAPSHOT-runner-debug
182M	quarkus-integration-test-main-999-SNAPSHOT-runner-debug

$ du -hs quarkus-integration-test-main-999-SNAPSHOT-runner-debug.debug
75M	quarkus-integration-test-main-999-SNAPSHOT-runner-debug.debug
```

Inspecting `quarkus-integration-test-main-999-SNAPSHOT-runner-debug` and
`quarkus-integration-test-main-999-SNAPSHOT-runner-debug` with
`readelf --debug-dump=info` we observe that both files are containing
debug information while only the latter should contain them.
gsmet pushed a commit to gsmet/quarkus that referenced this pull request Jul 18, 2022
PR quarkusio#13963 mistakenly stopped stripping debug information from the native
executable when debug info generation is enabled resulting in both the
native executable and the corresponding `.debug` file containing the
debug information. Which also results in the native executable being
larger when `-Dquarkus.native.debug.enabled` is used:

```
$ du -hs quarkus-integration-test-main-999-SNAPSHOT-runner-nodebug
108M	quarkus-integration-test-main-999-SNAPSHOT-runner-nodebug

$ du -hs quarkus-integration-test-main-999-SNAPSHOT-runner-debug
182M	quarkus-integration-test-main-999-SNAPSHOT-runner-debug

$ du -hs quarkus-integration-test-main-999-SNAPSHOT-runner-debug.debug
75M	quarkus-integration-test-main-999-SNAPSHOT-runner-debug.debug
```

Inspecting `quarkus-integration-test-main-999-SNAPSHOT-runner-debug` and
`quarkus-integration-test-main-999-SNAPSHOT-runner-debug` with
`readelf --debug-dump=info` we observe that both files are containing
debug information while only the latter should contain them.

(cherry picked from commit f69d98d)
gsmet pushed a commit to gsmet/quarkus that referenced this pull request Jan 13, 2023
PR quarkusio#13963 mistakenly stopped stripping debug information from the native
executable when debug info generation is enabled resulting in both the
native executable and the corresponding `.debug` file containing the
debug information. Which also results in the native executable being
larger when `-Dquarkus.native.debug.enabled` is used:

```
$ du -hs quarkus-integration-test-main-999-SNAPSHOT-runner-nodebug
108M	quarkus-integration-test-main-999-SNAPSHOT-runner-nodebug

$ du -hs quarkus-integration-test-main-999-SNAPSHOT-runner-debug
182M	quarkus-integration-test-main-999-SNAPSHOT-runner-debug

$ du -hs quarkus-integration-test-main-999-SNAPSHOT-runner-debug.debug
75M	quarkus-integration-test-main-999-SNAPSHOT-runner-debug.debug
```

Inspecting `quarkus-integration-test-main-999-SNAPSHOT-runner-debug` and
`quarkus-integration-test-main-999-SNAPSHOT-runner-debug` with
`readelf --debug-dump=info` we observe that both files are containing
debug information while only the latter should contain them.

(cherry picked from commit f69d98d)
gsmet pushed a commit to gsmet/quarkus that referenced this pull request Jan 18, 2023
PR quarkusio#13963 mistakenly stopped stripping debug information from the native
executable when debug info generation is enabled resulting in both the
native executable and the corresponding `.debug` file containing the
debug information. Which also results in the native executable being
larger when `-Dquarkus.native.debug.enabled` is used:

```
$ du -hs quarkus-integration-test-main-999-SNAPSHOT-runner-nodebug
108M	quarkus-integration-test-main-999-SNAPSHOT-runner-nodebug

$ du -hs quarkus-integration-test-main-999-SNAPSHOT-runner-debug
182M	quarkus-integration-test-main-999-SNAPSHOT-runner-debug

$ du -hs quarkus-integration-test-main-999-SNAPSHOT-runner-debug.debug
75M	quarkus-integration-test-main-999-SNAPSHOT-runner-debug.debug
```

Inspecting `quarkus-integration-test-main-999-SNAPSHOT-runner-debug` and
`quarkus-integration-test-main-999-SNAPSHOT-runner-debug` with
`readelf --debug-dump=info` we observe that both files are containing
debug information while only the latter should contain them.

(cherry picked from commit f69d98d)
gsmet pushed a commit to gsmet/quarkus that referenced this pull request Jan 25, 2023
PR quarkusio#13963 mistakenly stopped stripping debug information from the native
executable when debug info generation is enabled resulting in both the
native executable and the corresponding `.debug` file containing the
debug information. Which also results in the native executable being
larger when `-Dquarkus.native.debug.enabled` is used:

```
$ du -hs quarkus-integration-test-main-999-SNAPSHOT-runner-nodebug
108M	quarkus-integration-test-main-999-SNAPSHOT-runner-nodebug

$ du -hs quarkus-integration-test-main-999-SNAPSHOT-runner-debug
182M	quarkus-integration-test-main-999-SNAPSHOT-runner-debug

$ du -hs quarkus-integration-test-main-999-SNAPSHOT-runner-debug.debug
75M	quarkus-integration-test-main-999-SNAPSHOT-runner-debug.debug
```

Inspecting `quarkus-integration-test-main-999-SNAPSHOT-runner-debug` and
`quarkus-integration-test-main-999-SNAPSHOT-runner-debug` with
`readelf --debug-dump=info` we observe that both files are containing
debug information while only the latter should contain them.

(cherry picked from commit f69d98d)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants