
Updates to docker-maven-plugin configuration so M1 mac build runs (almost) clean with -Dstart-containers #25648

Closed (wanted to merge 14 commits)


holly-cummins
Contributor

Fixes #25428. This is a big PR, sorry! I kept fixing one-more-thing before opening the PR. I've left it unsquashed, partly because the history for each individual file is pretty clean and descriptive, and partly because I made the mistake of merging upstream into my long-running branch before squashing, so there are big merge commits in it which are complicated to squash around.

Here's what's changed:

DB2

  • Set a platform override on M1 to pull the amd64 image (rather than trying to pull aarch64 and then dying). I’m not sure if this affects all M1 machines or just M1 + podman. (Most other databases did not require the explicit platform flag, even if they did not have an arm64 platform-specific image, but some did. I don't fully understand why.)

We may want to do something similar in container-build-jib-with-db2 and also the dev services but I have not touched those.
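For anyone wanting to reason about the override outside the build, here is a minimal Java sketch of the decision it encodes. It assumes Linux container images and a hypothetical `imageHasArm64` flag that you would have to determine yourself (for example from the image's manifest); the class and method names are illustrative and are not part of fabric8 or Quarkus. It maps the JVM's `os.arch` value (`aarch64` on M1) to Docker's architecture name and requests `linux/amd64` when no arm64 variant exists. The component check uses the pattern quoted in the registry error reported later in this thread.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Illustrative helper; not part of docker-maven-plugin or Quarkus.
class DockerPlatform {
    // Pattern a registry reported for valid platform components (see the DB2 error in this thread).
    private static final Pattern COMPONENT = Pattern.compile("^[A-Za-z0-9_-]+$");

    /** Maps a JVM os.arch value to the architecture name Docker images use. */
    static String toDockerArch(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        switch (arch) {
            case "aarch64":
            case "arm64":
                return "arm64";
            case "x86_64":
            case "amd64":
                return "amd64";
            default:
                return arch; // e.g. ppc64le and s390x already use Docker's spelling
        }
    }

    /** True if the string is usable as one component of a platform such as linux/amd64. */
    static boolean isValidComponent(String component) {
        return COMPONENT.matcher(component).matches();
    }

    /**
     * Picks the platform to pull. On an arm64 host, an image without an arm64 variant
     * is requested as linux/amd64 so it runs under emulation instead of failing to pull.
     */
    static String platformFor(String osArch, boolean imageHasArm64) {
        String arch = toDockerArch(osArch);
        if (arch.equals("arm64") && !imageHasArm64) {
            arch = "amd64";
        }
        return "linux/" + arch;
    }

    public static void main(String[] args) {
        // Hypothetical: pretend the image (e.g. DB2) only ships amd64.
        System.out.println(platformFor(System.getProperty("os.arch"), false));
    }
}
```

On an M1 JVM, `platformFor("aarch64", false)` gives `linux/amd64`, while `isValidComponent("${os.family}")` is false, the same kind of rejection the registry produced for an unresolved property.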

MariaDB

I had a problem with tcp-ping-based readiness checks for MariaDB. Only one test used it, so we can switch to using a mysqladmin ping check. Several tests do a ping inside a <postStart> in the <wait>. However, a <postStart> inside a <wait> isn’t a true readiness check: the plugin will always wait for the full <time> and then run the <postStart>. A better (but more verbose) option is to use a container health check.

This was only strictly necessary where the tests were using a tcp ping to check for readiness (which did not work on podman), but I think it’s an improvement elsewhere.

This sped up the tests quite a bit for me since it will cut the <time> wait short if the database is ready before the timeout pops. I saw a factor of two improvement on tests which used mariadb, but on slower hardware the effect may not be so noticeable.
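To make the difference concrete, here is a small Java model of the two waiting strategies described above. It is not the plugin's implementation: `fixedWait` stands in for a `<wait>` that always sleeps for its full `<time>`, while `awaitReady` stands in for waiting on a health check, polling a probe (a hypothetical `BooleanSupplier`) and returning as soon as it succeeds, with the timeout acting only as an upper bound.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Model of two readiness strategies; names are illustrative, not docker-maven-plugin API.
class ReadinessWait {

    /** Always waits the full time, whether or not the service is already up. */
    static void fixedWait(Duration time) {
        sleep(time);
    }

    /**
     * Polls the probe every interval until it succeeds or the timeout passes.
     * Returns how long it actually waited, usually far less than the timeout.
     */
    static Duration awaitReady(BooleanSupplier probe, Duration timeout, Duration interval) {
        long start = System.nanoTime();
        long deadline = start + timeout.toNanos();
        while (!probe.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException("Not ready within " + timeout);
            }
            sleep(interval);
        }
        return Duration.ofNanos(System.nanoTime() - start);
    }

    private static void sleep(Duration d) {
        try {
            Thread.sleep(d.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("Interrupted while waiting", e);
        }
    }
}
```

With a probe that succeeds on its third call and a 10 ms interval, `awaitReady` returns after roughly 20 ms instead of the full timeout, which is the kind of saving described above.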

Be aware when testing healthcheck scripts that caching of named images can cause apparently non-deterministic behaviour. Change the image name whenever you change the healthcheck.

I also found that trying to extract common code to parent poms caused tests not to be able to connect to the containers. I don’t understand the reason - perhaps some crucial fabric8 state ends up in a target directory at the parent level?

MSSQL

This one was challenging. The MS SQL image segfaults on ARM. The best option seemed to be the widely recommended workaround of using Azure SQL Edge. The two products are not functionally interchangeable, but they are similar enough to make it worth running the tests this way.

Note: this change will also affect dev services if built locally, but not from the official releases unless they are built on M1. I think that’s kind of weird but kind of ok, but I’m open to objections if people think that's too inconsistent.

MySQL

Like DB2, I got a 404 if I tried to pull the image without explicitly specifying the architecture. However, just specifying the architecture wasn’t enough for the container to start. It would work on the command line with podman, but not in fabric8 with podman.

Following https://www.emmanuelgautier.com/blog/mysql-docker-arm-m1 and https://stackoverflow.com/questions/65456814/docker-apple-silicon-m1-preview-mysql-no-matching-manifest-for-linux-arm64-v8 I instead changed the image name to one with better behaviour on arm.

Oracle

I hit fabric8io/docker-maven-plugin#1369 with the log readiness check. Based on the discussion, I wonder if it’s a timing issue which my machine exposes by being fast. (I saw similar issues with <log> experimenting with the mariadb readiness check.)

ERRO[20873] accept tcp [::]:1521: use of closed network connection
[ERROR] DOCKER> IO Error while requesting logs: org.apache.http.ConnectionClosedException: Premature end of chunk coded message body: closing chunk expected Thread-6

The image we use has a healthcheck configured, so we can just use that with fabric8.

I also hit a more serious problem, which is that I could not get the database to start on M1. It seemed to be the same issue as https://stackoverflow.com/questions/68605011/oracle-12c-docker-setup-on-apple-m1. Others there reported success with qemu, which podman is using under the covers. I tried with podman 4.0.3 and podman 4.1, without success.

Reluctantly, I disabled the tests. We should re-evaluate with later versions of the Oracle database in the future.

Elasticsearch

According to https://stackoverflow.com/questions/65962810/m1-mac-issue-bringing-up-elasticsearch-cannot-run-jdk-bin-java, we need a version bump to get M1 support, up to 7.10.2. I had to go up two minor increments to 7.11.0 on the logstash version to get arm64 working.

I notice we don’t have a centralised elasticsearch version, unlike other dev services. Is this something we want to change?

I also hit a comedy problem, which I'm noting here in case it helps others. I got a port conflict on port 5000. The process running on this port turns out to be an AirPlay server. You can deactivate it by going to System Preferences › Sharing and unchecking AirPlay Receiver.
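If you suspect a clash like this, a quick pre-flight check is to try binding the port yourself before starting the containers. A rough Java sketch follows; the class name is illustrative. It only tells you whether the port is taken at that moment (another process could grab it straight afterwards) and does not identify the owner; on macOS, `lsof -i :5000` will show that.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Illustrative pre-flight check; not part of the Quarkus build.
class PortCheck {
    /** Returns true if the TCP port could be bound on all interfaces right now. */
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket()) {
            socket.setReuseAddress(false); // must be set before bind to take effect
            socket.bind(new InetSocketAddress(port));
            return true;
        } catch (IOException e) {
            return false; // typically "Address already in use"
        }
    }

    public static void main(String[] args) {
        int port = 5000; // the port that clashed with the macOS AirPlay Receiver
        System.out.println("Port " + port + (isPortFree(port) ? " is free" : " is already in use"));
    }
}
```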

Container Image Invoker

I did not look at #25230, so those tests still fail with podman on M1.

Other issues

I had to restart my podman machine to resolve some unexplained failures. I guess it had got tired. This could affect runs of the whole test suite if it runs end to end.

@@ -114,6 +114,8 @@
</plugins>
</build>
</profile>

Member

No diff? Remove from commit?

Contributor Author

Done, thank you!

@@ -168,29 +185,28 @@
<date>default</date>
<color>cyan</color>
</log>
- <!-- Speed things up a bit by not actually flushing writes to disk -->
+ Speed things up a bit by not actually flushing writes to disk -->
Member

did you mean to remove this comment start?

Contributor Author

I did not. :)
I think my IDE is trolling me, since at one point I was running that test successfully. Will fix.

@maxandersen maxandersen changed the title Updates to fabric8 configuration so M1 mac build runs (almost) clean with -Dstart-containers Updates to docker-maven-plugin configuration so M1 mac build runs (almost) clean with -Dstart-containers May 18, 2022
@maxandersen
Member

The changes look sensible to me, but @Sanne is better placed to verify, and I don't have an M1 :/

I did notice lots of non-meaningful whitespace changes; not sure if that's our formatter being bad or just your IDE being configured differently :) Either way, nothing major, but the line count/diff would be a lot smaller without them.

@holly-cummins
Contributor Author

I noticed that too about the whitespace and was annoyed by it. I thought I'd followed our IDE setup instructions accurately and I was getting a lot of line wrapping, so I made some more updates to my settings beyond what was in CONTRIBUTING.MD, but I probably should have gone back and manually reverted all of the wrapping that did happen. (I reverted some, but not all.)

It might be worth spending an hour or two with a bunch of us in a room with a new starter, seeing if we've got compatible settings + docs across a few IDEs. I think XML is probably hardest to get right and consistent because the tooling support is most fragmented.

@Sanne
Member

Sanne commented May 18, 2022

looks great! I'll need to run it on my workstation at home though - sorry not possible today.

@yrodiere are you ok with the Elasticsearch version changes?

@yrodiere
Member

yrodiere commented May 18, 2022

TL;DR: We'd better use Elasticsearch 7.16, and I'll submit another PR to centralize the version used in every integration tests, because currently it's a bit messy :)

> @yrodiere are you ok with the Elasticsearch version changes?

7.11 is affected by at least one bug that is relevant to Hibernate Search: elastic/elasticsearch#53127

While I don't think we'll experience that bug in Quarkus integration tests, I'd rather upgrade directly to 7.16, which is the version used in Hibernate Search 6.1 integration tests [see below, I'll send a PR to upgrade].

However, be aware that the version bumps in this PR are apparently in integration-tests/logging-gelf/pom.xml only. I have no opinion on those as I barely know about the use case. We'd have to ask @loicmathieu .

I'm surprised the build passes without bumping the version used for Hibernate Search integration tests, though?

> I notice we don’t have a centralised elasticsearch version, unlike other dev services. Is this something we want to change?

Please note that, at least when it comes to Elasticsearch, you updated the configuration of (some) integration tests, not dev services.

integration-tests/logging-gelf apparently uses its own version, which I think we should change, indeed.

Other extensions rely on Maven properties defined in build-parent, and I think logging-gelf should do the same:

        <!-- Defaults for integration tests -->
        <elasticsearch-server.version>7.10.0</elasticsearch-server.version>
        <elasticsearch.image>docker.elastic.co/elasticsearch/elasticsearch-oss:${elasticsearch-server.version}</elasticsearch.image>
        <elasticsearch.protocol>http</elasticsearch.protocol>
        <opensearch-server.version>1.2.3</opensearch-server.version>
        <opensearch.image>docker.io/opensearchproject/opensearch:${opensearch-server.version}</opensearch.image>
        <opensearch.protocol>http</opensearch.protocol>
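For readers less familiar with how these properties compose, here is a simplified, single-pass Java model of Maven-style `${...}` substitution. It is not Maven's own interpolator (which also handles nested and built-in expressions), and the class name is illustrative. Known properties are replaced; an undefined one, such as `${os.family}`, is left in the string verbatim, which matches the literal `${os.family}/amd64` that shows up in a DB2 failure later in this thread.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of ${...} property substitution; not Maven's actual implementation.
class PropertyInterpolation {
    private static final Pattern EXPR = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${key} found in props; unknown keys are kept as literal text. */
    static String interpolate(String value, Map<String, String> props) {
        Matcher m = EXPR.matcher(value);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = props.getOrDefault(m.group(1), m.group());
            // quoteReplacement stops '$' in the result being read as a group reference
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of("elasticsearch-server.version", "7.10.0");
        System.out.println(interpolate(
                "docker.elastic.co/elasticsearch/elasticsearch-oss:${elasticsearch-server.version}", props));
        System.out.println(interpolate("${os.family}/amd64", props)); // no such property: unchanged
    }
}
```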

Elasticsearch dev services use a version hardcoded in their configuration class:

That last part is indeed at odds with the more sophisticated solution we have for relational datasources, which relies on a properties file using interpolation to inject Maven properties:

        .parse(imageName.orElseGet(() -> ConfigureUtil.getDefaultImageNameFor("postgresql")))

... but it's consistent with Kafka dev services, for example:

So to sum up, it's a mess :)

I'll send a PR to make the IT configuration more consistent, and upgrade to 7.16; feel free to merge this PR first and I'll handle the conflicts.

I also added a comment to #25486 so that we eventually solve the problem of Elasticsearch dev services.

> I also hit a comedy problem, which I'm noting here in case it helps others. I got a port conflict on port 5000

That's probably related to logging-gelf rather than Elasticsearch itself, as Elasticsearch uses port 9200 (and sometimes 9300).

@loicmathieu
Contributor

logging-gelf is an extension that allows sending log records to an external log system in the GELF format. It can send them to at least Graylog, EFK and ELK.
The integration test starts an Elasticsearch and a Logstash to test that the ELK scenario works.

The library sends TCP packets directly, so it is not tied to any Elasticsearch version. Feel free to change the version and refactor it to share a common version with the other Elasticsearch-based extensions; it predates the Elasticsearch integration tests, which is why it didn't use the common version from the parent.

Port 5000 is from Logstash. I don't remember exactly, but as we use quarkus.log.handler.gelf.port=12201 in the config file, it seems not to be used, so you may be able to unbind it to prevent the conflict.

@Sanne
Member

Sanne commented May 18, 2022

> I'll send a PR to make the IT configuration more consistent, and upgrade to 7.16; feel free to merge this PR first and I'll handle the conflicts.

Thanks!

@quarkus-bot

quarkus-bot bot commented May 18, 2022

Failing Jobs - Building 6ba6826

| Name | Step |
|---|---|
| Gradle Tests - JDK 11 | Build |
| Gradle Tests - JDK 11 Windows | Build |
| JVM Tests - JDK 11 | Build |
| JVM Tests - JDK 11 Windows | Build |
| JVM Tests - JDK 17 | Build |
| Native Tests - Data2 | Build |
| Native Tests - Data7 | Build |
| Native Tests - HTTP | Build |

Full information is available in the Build summary check run.

Failures

⚙️ Gradle Tests - JDK 11

- Failing: integration-tests/gradle 

📦 integration-tests/gradle

io.quarkus.gradle.devmode.MultiSourceProjectDevModeTest.main line 22

org.awaitility.core.ConditionTimeoutException: Condition with lambda expression in io.quarkus.test.devmode.util.DevModeTestUtils that uses java.util.function.Supplier, java.util.function.Supplierjava.util.concurrent.atomic.AtomicReference, java.util.concurrent.atomic.AtomicReferencejava.lang.String, java.lang.Stringboolean was not fulfilled within 1 minutes.
	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
	at org.awaitility.core.CallableCondition.await(CallableCondition.java:78)

⚙️ Gradle Tests - JDK 11 Windows

- Failing: integration-tests/gradle 

📦 integration-tests/gradle

io.quarkus.gradle.devmode.MultiSourceProjectDevModeTest.main line 22

org.awaitility.core.ConditionTimeoutException: Condition with lambda expression in io.quarkus.test.devmode.util.DevModeTestUtils that uses java.util.function.Supplier, java.util.function.Supplierjava.util.concurrent.atomic.AtomicReference, java.util.concurrent.atomic.AtomicReferencejava.lang.String, java.lang.Stringboolean was not fulfilled within 1 minutes.
	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
	at org.awaitility.core.CallableCondition.await(CallableCondition.java:78)

⚙️ JVM Tests - JDK 11

- Failing: integration-tests/rest-client 

📦 integration-tests/rest-client

io.quarkus.it.rest.client.selfsigned.ExternalSelfSignedTestCase.should_accept_self_signed_certs_java_url line 29

java.lang.AssertionError: 
1 expectation failed.
Expected status code <200> but was <500>.

⚙️ JVM Tests - JDK 11 Windows

- Failing: integration-tests/rest-client 

📦 integration-tests/rest-client

io.quarkus.it.rest.client.selfsigned.ExternalSelfSignedTestCase.should_accept_self_signed_certs_java_url line 29

java.lang.AssertionError: 
1 expectation failed.
Expected status code <200> but was <500>.

⚙️ JVM Tests - JDK 17

- Failing: integration-tests/rest-client 

📦 integration-tests/rest-client

io.quarkus.it.rest.client.selfsigned.ExternalSelfSignedTestCase.should_accept_self_signed_certs_java_url line 29

java.lang.AssertionError: 
1 expectation failed.
Expected status code <200> but was <500>.

⚙️ Native Tests - Data2

- Failing: integration-tests/jpa-db2 

📦 integration-tests/jpa-db2

Failed to execute goal io.fabric8:docker-maven-plugin:0.39.1:start (docker-start) on project quarkus-integration-test-jpa-db2: I/O Error


⚙️ Native Tests - Data7

- Failing: integration-tests/hibernate-reactive-db2 integration-tests/reactive-db2-client 

📦 integration-tests/hibernate-reactive-db2

Failed to execute goal io.fabric8:docker-maven-plugin:0.39.1:start (docker-start) on project quarkus-integration-test-hibernate-reactive-db2: I/O Error

📦 integration-tests/reactive-db2-client

Failed to execute goal io.fabric8:docker-maven-plugin:0.39.1:start (docker-start) on project quarkus-integration-test-reactive-db2-client: I/O Error


⚙️ Native Tests - HTTP

- Failing: integration-tests/rest-client 

📦 integration-tests/rest-client

io.quarkus.it.rest.client.selfsigned.ExternalSelfSignedITCase.should_accept_self_signed_certs_java_url

java.lang.AssertionError: 
1 expectation failed.
Expected status code <200> but was <500>.

@yrodiere
Member

> I'll send a PR to make the IT configuration more consistent, and upgrade to 7.16; feel free to merge this PR first and I'll handle the conflicts.

Here you go: #25663

@geoand
Contributor

geoand commented May 19, 2022

The rest client failures can be addressed by rebasing on main

@Sanne
Member

Sanne commented May 23, 2022

It's a bit big, and I'm struggling to follow the diffs because of the included merge commits. It's also conflicting now with main...

@holly-cummins you think you could rebase it, and perhaps split it in smaller PRs ? Many tests failed: it might be useful to narrow it down, and also merge things in smaller bites.

@Sanne
Member

Sanne commented May 23, 2022

at least one of the errors seems related; found this in section "Data7"

Error: Failed to execute goal io.fabric8:docker-maven-plugin:0.39.1:start (docker-start) on project quarkus-integration-test-hibernate-reactive-db2: I/O Error: Unable to pull 'docker.io/ibmcom/db2:11.5.7.0a' from registry 'docker.io' : {"message":""${os.family}" is an invalid component of "${os.family}/amd64": platform specifier component must match "^[A-Za-z0-9_-]+$": invalid argument"} (Bad Request: 400) -> [Help 1]

Perhaps keep DB2 changes to the side, we can try merging the other goodies first?

@yrodiere
Member

> @holly-cummins you think you could rebase it, and perhaps split it in smaller PRs

FWIW you can probably skip the part about Elasticsearch completely, now: #25663 was merged and it includes an upgrade to Elasticsearch 7.16, so more recent than what you did here and what you need for M1.

@holly-cummins
Contributor Author

> It's a bit big, and I'm struggling to follow the diffs because of the included merge commits. It's also conflicting now with main...

> @holly-cummins you think you could rebase it, and perhaps split it in smaller PRs ? Many tests failed: it might be useful to narrow it down, and also merge things in smaller bites.

Will do. I synced to main just before creating the PR, but that created its own problems because of the merge commit. (I normally prefer to rebase so I'm not sure why I did a merge commit this time.)

And because it's such a big PR it gets merge conflicts quickly. :(

@holly-cummins
Contributor Author

OK, first smaller PR in for @Sanne, with the MariaDB changes: #25805
It's still touching eight files, but it's the same change duplicated across all of the files. I'll wait until it's merged to look at MySQL, MSSQL, Oracle, and Elasticsearch (with the latter hopefully being a no-op because of @yrodiere's changes). Otherwise I'll get a big teetering pile of interconnected PRs, all trying to change the parent pom.

@Sanne
Member

Sanne commented Jun 10, 2022

@holly-cummins should we close this one?

@holly-cummins
Contributor Author

Not quite yet! One more sub-PR to go (DB2), which I'm testing locally now. And then I need to validate elasticsearch, and then do a final run through to make sure I didn't miss anything. This is what's gone in:

@holly-cummins
Contributor Author

holly-cummins commented Jun 10, 2022

Investigation continues. Now the DB2 tests pass without any extra specification of the image name (perhaps I updated podman and that made a difference?). This is good news, because as @Sanne points out, there is no Maven property called ${os.family}, and what comes out of ${os.name} is not useful for selecting Docker images. Doing the manual override on M1 would have been ugly code, so it's nice if (maybe?) we don't have to.

However, the gelf tests are failing even with @yrodiere's changes, so I need to figure out why.

@Sanne
Member

Sanne commented Jun 10, 2022

Remember - rather than wasting too much time on DB2 I'd be ok to move all related tests to a different repository.
It would also help for other reasons, such as the size of the project - and help a little for newcomers needing to import it for the first time.

@holly-cummins
Contributor Author

Noted re. DB2 moving out of the mainline test suite, @Sanne. I reckon even if we move them to a separate repo (or guard them some other way, like with a -Dslow profile), we'd want them to pass on M1. However, my fix was pretty ugly, since it had to explicitly set an architecture that the container provider should be setting for us. And since I now can't make the DB2 tests fail, I agree we should leave it.

I have a feeling that maybe things are working locally now because of some caching (a bit like testcontainers/testcontainers-java#5275 but for architectures?). If so, the DB2 M1 issues may rear their head again. But if things fail in an M1 Actions build, we can revisit this fix.

And apart from the DB2-mystery-non-fix, and one more volume access modifier I had to add, and #25230, this PR has now been dissected and absorbed. So I'll close.

@quarkus-bot quarkus-bot bot added the triage/invalid This doesn't seem right label Jun 24, 2022
@Sanne
Member

Sanne commented Jun 24, 2022

> Noted re. DB2 moving out of the mainline test suite, @Sanne. I reckon even if we move them to a separate repo (or guard them some other way, like with a -Dslow profile), we'd want them to pass on M1.

Yes of course.

Labels
area/dependencies Pull requests that update a dependency file area/hibernate-orm Hibernate ORM area/logging area/persistence OBSOLETE, DO NOT USE area/reactive-sql-clients triage/invalid This doesn't seem right
Development

Successfully merging this pull request may close these issues.

Improve podman compatibility on M1 for build test suites which use fabric8 docker-maven-plugin
6 participants