Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand aarch64 support to all CI images #2380

Merged
merged 16 commits into from
Nov 22, 2022
Merged

Conversation

lloeki
Copy link
Member

@lloeki lloeki commented Nov 17, 2022

What does this PR do?

Add first-class support for development on aarch64-linux.

Motivation

Working from arm64-darwin machines becomes an increasingly common thing, and presumably we want to be able to run all Ruby versions on that.

Our docker-compose relies on images that aren't currently built for aarch64-linux, this commit changes that.

Additional Notes

Special cases:

  • Removed protoc from all Dockerfiles since it appears to be unused now
  • ruby:2.1 has no aarch64 image published, Rebuilt it from a Debian Stretch base
  • ruby:2.2 has an aarch64 image published but is Debian Jessie based, and Jessie had aarch64 upon release but as an "extra" arch, which is not part of LTS, so packages vanished when Debian moved the repos to archive, so any apt-get irrevocably fails. Rebuilt from a Debian Stretch base
  • JRuby 9.2 has no aarch64 image published, was based on openjdk:8-jre, I rebuilt both from the last openjdk:8-jre which now has aarch64 images, these being based on bullseye (there are also Buster-based images upstream if it turns out it matters)
  • TruffleRuby 21 has no aarch64 image published, and it wasn't an official one anyway. Added TruffleRuby 22 from the official Oracle registry, and removed TruffleRuby 21

The last bit of adjustment is Stretch-based ones (so 2.1, 2.2, 2.3, 2.4) which do build until they fail to run the first aarch64 release of docker-compose due to an outdated glibc version. This has been worked around by installing it from stretch-backports, which has a 1.21 version close to 1.29 we presumably want.

2.4 had a note about staying on Stretch instead of Buster because of sequel, which may not be relevant anymore. It was initially moved to Buster to work around the docker-compose failure, but is currently moved back to Stretch using the above backport alternative.

In all cases attempts were made to stay as close as possible to the original Dockerfiles for each version and their base. When things are made to differ it's because there seems to be no realistic alternative. There are opportunistic moves to streamline and clean all of this but this was kept out of scope of this commit to maximise consistency and ease of reviewing.

All images are now building and running for both x86_64 and aarch64.

How to test the change?

  • Build locally, see .circleci/images/primary/README.md
  • On an arm64 machine (or an x86_64 machine using qemu via https://github.com/tonistiigi/binfmt), and specifying the platform to docker with --platform linux/aarch64
  • Final testing needs publishing images for CircleCI to pick them up, so results there won't be relevant. But see below.

A subsequent commit will also add a GitHub Workflow that will build these images in CI, with an option to publish them on the GitHub Container Registry at ghcr.io for everyone (including CircleCI) to use.

Copy link
Member

@marcotc marcotc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested a few of the images on my x86 mac and it works great!

For CI, you can push images to your own dockerhub or GH registry account and change .circleci/config.yml for testing.

@lloeki lloeki force-pushed the lloeki/expand-aarch64-support branch 4 times, most recently from cfa574d to c086b03 Compare November 18, 2022 12:59
@codecov-commenter
Copy link

codecov-commenter commented Nov 18, 2022

Codecov Report

Merging #2380 (23e82b5) into master (4114496) will increase coverage by 0.01%.
The diff coverage is 98.78%.

@@            Coverage Diff             @@
##           master    #2380      +/-   ##
==========================================
+ Coverage   98.34%   98.36%   +0.01%     
==========================================
  Files        1101     1102       +1     
  Lines       58940    59222     +282     
==========================================
+ Hits        57967    58252     +285     
+ Misses        973      970       -3     
Impacted Files Coverage Δ
.../datadog/appsec/contrib/sinatra/gateway/watcher.rb 100.00% <ø> (ø)
lib/datadog/appsec/extensions.rb 88.05% <83.33%> (-0.47%) ⬇️
lib/datadog/appsec/response.rb 88.46% <88.46%> (ø)
lib/datadog/appsec/assets.rb 100.00% <100.00%> (+5.88%) ⬆️
lib/datadog/appsec/configuration.rb 100.00% <100.00%> (ø)
lib/datadog/appsec/configuration/settings.rb 81.39% <100.00%> (+0.44%) ⬆️
...ib/datadog/appsec/contrib/rack/reactive/request.rb 92.50% <100.00%> (+0.39%) ⬆️
lib/datadog/appsec/contrib/rack/request.rb 97.14% <100.00%> (+1.14%) ⬆️
...dog/appsec/contrib/rack/request_body_middleware.rb 100.00% <100.00%> (+5.55%) ⬆️
.../datadog/appsec/contrib/rack/request_middleware.rb 100.00% <100.00%> (+1.47%) ⬆️
... and 23 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@lloeki lloeki force-pushed the lloeki/expand-aarch64-support branch 2 times, most recently from 77bd165 to 12d7cb7 Compare November 18, 2022 14:10
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few notes, but this is looking nice!

One thing I would suggest is to avoid a big PR that tries to do everything in one big step. For instance -- why not merge the Dockerfile changes and then add the rest of the improvements separately?

.circleci/images/primary/Dockerfile-2.7.3 Show resolved Hide resolved
.circleci/images/primary/Dockerfile-2.7.3 Show resolved Hide resolved
.circleci/images/primary/Dockerfile-2.1.10 Outdated Show resolved Hide resolved
.circleci/images/primary/Dockerfile-2.1.10 Outdated Show resolved Hide resolved
.circleci/images/primary/Dockerfile-2.1.10 Outdated Show resolved Hide resolved
.circleci/images/primary/Dockerfile-2.1.10 Show resolved Hide resolved
.circleci/images/primary/Dockerfile-2.2.10 Show resolved Hide resolved
.circleci/images/primary/Dockerfile-2.4.10 Outdated Show resolved Hide resolved
.circleci/images/primary/Dockerfile-2.6.7 Show resolved Hide resolved
@lloeki lloeki force-pushed the lloeki/expand-aarch64-support branch 8 times, most recently from 1291323 to 23e82b5 Compare November 18, 2022 18:55
@lloeki
Copy link
Member Author

lloeki commented Nov 18, 2022

One thing I would suggest is to avoid a big PR that tries to do everything in one big step. For instance -- why not merge the Dockerfile changes and then add the rest of the improvements separately?

Good point. Let's have the updates and cleanups proposed as separate PRs and keep this a minimal drop-in replacement.

I do want to move docker compose and CircleCI to use these images ASAP though, so that it's all synchronised: it doesn't make much sense to have these rely on images that come from this repo but the Dockerfiles don't exist anymore (it'd be awkward if a fix, update, or change is suddenly needed).

I've added changes to that end in this PR.

@lloeki
Copy link
Member Author

lloeki commented Nov 18, 2022

CircleCI failures so far:

Ruby 2.1 and Ruby 2.2

libmysqlclient.so.18: cannot open shared object file: No such file or directory - /usr/local/bundle/gems/mysql2-0.3.21/lib/mysql2/mysql2.so

bundle exec appraisal ruby-2.1.10-contrib rake spec:mysql2
>> BUNDLE_GEMFILE=/app/gemfiles/ruby_2.1.10_contrib.gemfile bundle exec rake spec:mysql2
/usr/local/bin/ruby -I/usr/local/bundle/gems/rspec-core-3.12.0/lib:/usr/local/bundle/gems/rspec-support-3.12.0/lib /usr/local/bundle/gems/rspec-core-3.12.0/exe/rspec --pattern spec/datadog/tracing/contrib/mysql2/\*\*/\*_spec.rb
warning suppressing gem not available, external library warnings will be displayed

An error occurred while loading ./spec/datadog/tracing/contrib/mysql2/patcher_spec.rb.
Hint: Install the `did_you_mean` gem in order to provide suggestions for similarly named files.
Failure/Error: require 'mysql2'

LoadError:
  libmysqlclient.so.18: cannot open shared object file: No such file or directory - /usr/local/bundle/gems/mysql2-0.3.21/lib/mysql2/mysql2.so
# /usr/local/bundle/gems/mysql2-0.3.21/lib/mysql2.rb:31:in `require'
# /usr/local/bundle/gems/mysql2-0.3.21/lib/mysql2.rb:31:in `<top (required)>'
# ./spec/datadog/tracing/contrib/mysql2/patcher_spec.rb:10:in `require'
# ./spec/datadog/tracing/contrib/mysql2/patcher_spec.rb:10:in `<top (required)>'

Failure/Error: raise 'No database adapter found!'

bundle exec appraisal ruby-2.1.10-rails4-mysql2 rake spec:rails
>> BUNDLE_GEMFILE=/app/gemfiles/ruby_2.1.10_rails4_mysql2.gemfile bundle exec rake spec:rails
/usr/local/bin/ruby -I/usr/local/bundle/gems/rspec-core-3.12.0/lib:/usr/local/bundle/gems/rspec-support-3.12.0/lib /usr/local/bundle/gems/rspec-core-3.12.0/exe/rspec --pattern spec/datadog/tracing/contrib/rails/\*\*/\*_spec.rb  --exclude-pattern spec/datadog/tracing/contrib/rails/\*\*/\*\{active_job,disable_env,redis_cache,auto_instrument,semantic_logger\}\*_spec.rb
warning suppressing gem not available, external library warnings will be displayed
/usr/local/bundle/gems/tzinfo-1.2.9/lib/tzinfo/zoneinfo_timezone_info.rb:506: warning: shadowing outer local variable - i
pg gem not found, trying another connector
mysql2 gem not found, trying another connector
activerecord-jdbcpostgresql-adapter gem not found, trying another connector
activerecord-jdbcmysql-adapter gem not found, trying another connector

An error occurred while loading ./spec/datadog/tracing/contrib/rails/action_controller_spec.rb.
Failure/Error: raise 'No database adapter found!'

RuntimeError:
  No database adapter found!
# ./spec/datadog/tracing/contrib/rails/support/database.rb:34:in `load_adapter!'
# ./spec/datadog/tracing/contrib/rails/rails_helper.rb:17:in `<top (required)>'
# ./spec/datadog/tracing/contrib/rails/action_controller_spec.rb:3:in `require'
# ./spec/datadog/tracing/contrib/rails/action_controller_spec.rb:3:in `<top (required)>'

JRuby 9.2.8.0

SIGSEGV on Ethon

Datadog::Tracing::Contrib::Ethon::MultiPatch
  #add
    multi already performing
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f8e3c06445b, pid=1717, tid=0x00007f8e5abaa700
#
# JRE version: OpenJDK Runtime Environment (8.0_342-b07) (build 1.8.0_342-b07)
# Java VM: OpenJDK 64-Bit Server VM (25.342-b07 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libcurl.so.4+0x5645b]
#
# Core dump written. Default location: /app/core or core.1717
#
# An error report file with more information is saved as:
# /app/hs_err_pid1717.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Aborted (core dumped)
/opt/jruby/bin/jruby -I/usr/local/bundle/jruby/2.5.0/gems/rspec-core-3.12.0/lib:/usr/local/bundle/jruby/2.5.0/gems/rspec-support-3.12.0/lib /usr/local/bundle/jruby/2.5.0/gems/rspec-core-3.12.0/exe/rspec --pattern spec/datadog/tracing/contrib/ethon/\*\*/\*_spec.rb failed
rake aborted!

SIGILL on Httprb

Datadog::Tracing::Contrib::Httprb::Instrumentation
  instrumented request
    behaves like instrumented request
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x00007f650101ffff, pid=1836, tid=0x00007f651757d700
#
# JRE version: OpenJDK Runtime Environment (8.0_342-b07) (build 1.8.0_342-b07)
# Java VM: OpenJDK 64-Bit Server VM (25.342-b07 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# Segmentation fault (core dumped)
/opt/jruby/bin/jruby -I/usr/local/bundle/jruby/2.5.0/gems/rspec-core-3.12.0/lib:/usr/local/bundle/jruby/2.5.0/gems/rspec-support-3.12.0/lib /usr/local/bundle/jruby/2.5.0/gems/rspec-core-3.12.0/exe/rspec --pattern spec/datadog/tracing/contrib/httprb/\*\*/\*_spec.rb failed
rake aborted!

JRuby 9.2.18.0

JSON::ParserError: unexpected token at '' on bundle exec appraisal jruby-9.2.18.0-contrib rake spec:ethon

  23) Ethon integration tests with single Multi request behaves like instrumented request instrumented request distributed tracing default behaves like propagating distributed headers propagates the headers
      Failure/Error: headers = JSON.parse(response.body)['headers']
      
      JSON::ParserError:
        unexpected token at ''
      Shared Example Group: "propagating distributed headers" called from ./spec/datadog/tracing/contrib/ethon/shared_examples.rb:165
      Shared Example Group: "instrumented request" called from ./spec/datadog/tracing/contrib/ethon/integration_test_spec.rb:96
      # json/ext/Parser.java:238:in `parse'
      # ./spec/datadog/tracing/contrib/ethon/shared_examples.rb:155:in `block in <main>'
      # ./spec/datadog/tracing/contrib/ethon/integration_context.rb:82:in `block in <main>'
      # ./spec/datadog/tracing/contrib/support/tracer_helpers.rb:96:in `block in TracerHelpers'
      # ./spec/spec_helper.rb:216:in `block in <main>'
      # ./spec/spec_helper.rb:108:in `block in <main>'
      # /usr/local/bundle/jruby/2.5.0/gems/webmock-3.13.0/lib/webmock/rspec.rb:37:in `block in <main>'

HTTP::ConnectionError: error reading from socket: Could not parse data on bundle exec appraisal jruby-9.2.18.0-contrib rake spec:httprb, specifically all spec examples ofDatadog::Tracing::Contrib::Httprb::Instrumentation

57) Datadog::Tracing::Contrib::Httprb::Instrumentation instrumented request behaves like instrumented request created span response has internal server error status has tag with status code
      Failure/Error: res = super(req, options)
      
      HTTP::ConnectionError:
        error reading from socket: Could not parse data
      Shared Example Group: "instrumented request" called from ./spec/datadog/tracing/contrib/httprb/instrumentation_spec.rb:291
      # /usr/local/bundle/jruby/2.5.0/gems/http-4.4.1/lib/http/response/parser.rb:31:in `add'
      # /usr/local/bundle/jruby/2.5.0/gems/http-4.4.1/lib/http/connection.rb:217:in `read_more'
      # /usr/local/bundle/jruby/2.5.0/gems/http-4.4.1/lib/http/connection.rb:103:in `read_headers!'
      # /usr/local/bundle/jruby/2.5.0/gems/http-4.4.1/lib/http/client.rb:75:in `perform'
      # /usr/local/bundle/jruby/2.5.0/gems/webmock-3.13.0/lib/webmock/http_lib_adapters/http_rb/client.rb:8:in `block in perform'
      # /usr/local/bundle/jruby/2.5.0/gems/webmock-3.13.0/lib/webmock/http_lib_adapters/http_rb/webmock.rb:51:in `perform'
      # /usr/local/bundle/jruby/2.5.0/gems/webmock-3.13.0/lib/webmock/http_lib_adapters/http_rb/webmock.rb:10:in `exec'
      # /usr/local/bundle/jruby/2.5.0/gems/webmock-3.13.0/lib/webmock/http_lib_adapters/http_rb/client.rb:8:in `perform'
      # ./lib/datadog/tracing/contrib/httprb/instrumentation.rb:41:in `block in perform'
      # ./lib/datadog/tracing/trace_operation.rb:193:in `block in measure'
      # ./lib/datadog/tracing/span_operation.rb:151:in `measure'
      # ./lib/datadog/tracing/trace_operation.rb:193:in `measure'
      # ./lib/datadog/tracing/tracer.rb:382:in `start_span'
      # ./lib/datadog/tracing/tracer.rb:162:in `block in trace'
      # ./lib/datadog/tracing/context.rb:45:in `activate!'
      # ./lib/datadog/tracing/tracer.rb:161:in `trace'
      # ./lib/datadog/tracing.rb:18:in `trace'
      # ./lib/datadog/tracing/contrib/httprb/instrumentation.rb:27:in `perform'
      # /usr/local/bundle/jruby/2.5.0/gems/http-4.4.1/lib/http/client.rb:31:in `request'
      # /usr/local/bundle/jruby/2.5.0/gems/http-4.4.1/lib/http/chainable.rb:75:in `request'
      # /usr/local/bundle/jruby/2.5.0/gems/http-4.4.1/lib/http/chainable.rb:27:in `post'
      # ./spec/datadog/tracing/contrib/httprb/instrumentation_spec.rb:83:in `block in response'
      # ./spec/datadog/tracing/contrib/httprb/instrumentation_spec.rb:155:in `block in <main>'
      # ./spec/datadog/tracing/contrib/httprb/instrumentation_spec.rb:70:in `block in <main>'
      # ./spec/datadog/tracing/contrib/support/tracer_helpers.rb:96:in `block in TracerHelpers'
      # ./spec/spec_helper.rb:216:in `block in <main>'
      # ./spec/spec_helper.rb:108:in `block in <main>'
      # /usr/local/bundle/jruby/2.5.0/gems/webmock-3.13.0/lib/webmock/rspec.rb:37:in `block in <main>'
      # ------------------
      # --- Caused by: ---
      # IOError:
      #   Could not parse data
      #   /usr/local/bundle/jruby/2.5.0/gems/http-4.4.1/lib/http/response/parser.rb:31:in `add'

@lloeki lloeki marked this pull request as ready for review November 18, 2022 20:45
@lloeki lloeki requested a review from a team November 18, 2022 20:45
Working from `arm64-darwin` machines becomes an increasingly common
thing, and presumably we want to be able to run all Ruby versions on
that.

Our `docker-compose` relies on images that aren't currently built for
`aarch64-linux`, this commit changes that.

Special cases:

* Removed `protoc` from all `Dockerfile`s since it appears to be unused now
* `ruby:2.1` has no aarch64 image published, Rebuilt it from a Debian
  Stretch base
* `ruby:2.2` has an aarch64 image published but is Debian Jessie based,
  and Jessie had aarch64 upon release but as an "extra" arch, which is
  not part of LTS, so packages vanished when Debian moved the repos to
  `archive`, so any `apt-get` irrevocably fails. Rebuilt from a Debian
  Stretch base
* JRuby 9.2 has no aarch64 image published, was based on
  `openjdk:8-jre`, I rebuilt both from the last `openjdk:8-jre` which
  now has aarch64 images, these being based on bullseye (there are also
  Buster-based images upstream if it turns out it matters)
* TruffleRuby 21 has no aarch64 image published, and it wasn't an
  official one anyway. Added TruffleRuby 22 from the official
  Oracle registry, and removed TruffleRuby 21

The last bit of adjustment is Stretch-based ones (so 2.1, 2.2, 2.3, 2.4)
which do build until they fail to run the first aarch64 release of
`docker-compose` due to an outdated glibc version. This has been worked
around by installing it from `stretch-backports`, which has a 1.21 version close to
1.29 we presumably want.

2.4 had a note about staying on Stretch instead of Buster because
of `sequel`, which may not be relevant anymore. It was initially moved
to Buster to work around the `docker-compose` failure, but is currently
moved back to Stretch using the above backport alternative.

In all cases attempts were made to stay as close as possible to the original
`Dockerfile`s for each version and their base. When things are made to
differ it's because there seems to be no realistic alternative. There
are opportunistic moves to streamline and clean all of this but this was
kept out of scope of this commit to maximise consistency and ease of
reviewing.

All images are now building and running for both `x86_64` and `aarch64`.
We try to use GHA caching to go faster on subsequent runs.
GHA cache is limited to 10GB, we overshoot that by a lot with Docker
layers. Also, while cache appears to store and restore correctly, it
seems like it fails to match.

Therefore we stop using the GHA cache type and attempt to use images
pushed in the registry instead.
@lloeki lloeki force-pushed the lloeki/expand-aarch64-support branch from 67923a3 to 510be4c Compare November 18, 2022 20:50
It seems like there's an issue with JRE 8 + Debian bullseye, which is
what openjdk:8-jre is based on nowadays.

- jruby:9.2.18.0-jre8 was based on openjdk:8-jre-buster, moving it back
  to that
- jruby:9.2.8.0-jre was based on stretch but had no aarch64, so moving
  it to buster as well

Both `rake spec:httprb` and `rake spec:ethon` now pass on JRuby 9.2
MySQL 8.0.4 changed the authentication set at user creation, defaulting
to `caching_sha2_password`, and client libs predating 8.0.3, which
includes MariaDB, cannot connect. MariaDB finally updated but took some
time to catch up, so there's a time gap. Another failure mode is when
client libs are built against GnuTLS.

MariaDB libs are what is installed by default, so when running against a
recently pulled MySQL 8 server, outdated clients will fail to connect.

This affects Ruby images in the following way:

- 2.1, 2.2, 2.3, 2.4: Authentication plugin 'caching_sha2_password'
  cannot be loaded
- 2.5, 2.6, 2.7: RSA Encryption not supported - caching_sha2_password
  plugin was built with GnuTLS support

This works around it by reverting to `mysql_native_password` as a
default.

We are warned about this now:

> 'default_authentication_plugin' is deprecated and will be removed in a
> future release. Please use authentication_policy instead

But `authentication_policy` appears to work differently. This will have
to do to get us out of the wood.
@lloeki
Copy link
Member Author

lloeki commented Nov 19, 2022

Ruby 2.1 and 2.2

Errors above could not be reproduced locally: I have a feeling they may be due to a bundle mismatch. Indeed for these versions, binary gem builds were cached when installed on Jessie, but these are now Stretch-based.

I did find another interesting issue with MySQL 8.0.4 though regarding authentication plugins.

JRuby

These crashes appear to echo back to @ivoanjo's jruby/jruby#7033 and I reproduced with:

bundle exec appraisal jruby-9.2.8.0-contrib rake spec:httprb
bundle exec appraisal jruby-9.2.8.0-contrib rake spec:ethon
bundle exec appraisal jruby-9.2.18.0-contrib rake spec:httprb
bundle exec appraisal jruby-9.2.18.0-contrib rake spec:ethon

I too thought they were maybe tied to the JVM:

$ docker run --rm -it ivoanjo/docker-library:ddtrace_rb_jruby_9_2_8_0 java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-b09)
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)

$ docker run --rm -it ivoanjo/docker-library:ddtrace_rb_jruby_9_2_18_0 java -version
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)

$ docker run --rm -it ghcr.io/datadog/dd-trace-rb/jruby:9.2.8.0-dd java -version
openjdk version "1.8.0_342"
OpenJDK Runtime Environment (build 1.8.0_342-b07)
OpenJDK 64-Bit Server VM (build 25.342-b07, mixed mode)

$ docker run --rm -it ghcr.io/datadog/dd-trace-rb/jruby:9.2.18.0-dd java -version
openjdk version "1.8.0_342"
OpenJDK Runtime Environment (build 1.8.0_342-b07)
OpenJDK 64-Bit Server VM (build 25.342-b07, mixed mode)

But that would be in these patch updates, not 8 vs 11... So checking the base userland:

$ docker run --rm -it ivoanjo/docker-library:ddtrace_rb_jruby_9_2_18_0 cat /etc/debian_version
10.9 # buster
$ docker run --rm -it ivoanjo/docker-library:ddtrace_rb_jruby_9_2_8_0 cat /etc/debian_version 
9.11 # stretch
$ docker run --rm -it ghcr.io/datadog/dd-trace-rb/jruby:9.2.8.0-dd cat /etc/debian_version
11.4 # bullseye
$ docker run --rm -it ghcr.io/datadog/dd-trace-rb/jruby:9.2.18.0-dd cat /etc/debian_version
11.4 # bullseye

In that issue, @ivoanjo commented:

Interesting. Thanks for extra info! It looks like I may have been too quick to dismiss this as a JRE 8 issue.

As it turns out, it may indeed depend not entirely on the JVM version but on the base image from which the JVM image was built.

It's hard to know what was exactly the version JRuby 9.2.19.0 (JRE 8) was built at the time. But today it is:

$ docker run --rm -it jruby:9.2.19.0-jre8 cat /etc/debian_version
11.1

And the following is consistent with @ivoanjo's images:

$ docker run --rm -it jruby:9.2.18.0-jre8 cat /etc/debian_version
10.9
$ docker run --rm -it jruby:9.2.8.0-jre cat /etc/debian_version
9.11

This may point to jre8 + debian 11 being a problem!

So, basing 9.2.8.0 on 8-jre-stretch... ethon => pass! httprb => pass!

And basing 9.2.18.0 on 8-jre-buster... ethon => pass! httprb => pass!

8-jre-stretch has not been updated in 3 years though, and lacks aarch64

Basing 9.2.8.0 on 8-jre-buster... ethon => pass! httprb => pass!

@lloeki lloeki force-pushed the lloeki/expand-aarch64-support branch 2 times, most recently from 935382b to d5caf13 Compare November 20, 2022 01:29
@lloeki lloeki force-pushed the lloeki/expand-aarch64-support branch from d5caf13 to 8f88885 Compare November 20, 2022 02:20
@lloeki
Copy link
Member Author

lloeki commented Nov 21, 2022

Added caching information to the published images, and now the jobs are FAST (when cached layers match).

Before:

Screenshot 2022-11-21 at 12 15 23

After:

Screenshot 2022-11-21 at 12 15 35

Images are still built on every push, but are not pushed.
@lloeki
Copy link
Member Author

lloeki commented Nov 21, 2022

I have a feeling they may be due to a bundle mismatch. Indeed for these versions, binary gem builds were cached when installed on Jessie, but these are now Stretch-based.

I experienced this in another form:

LoadError:
  Could not open library '/usr/local/bundle/jruby/2.5.0/gems/rdkafka-0.10.0/lib/rdkafka/../../ext/librdkafka.so' : /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by /usr/local/bundle/jruby/2.5.0/gems/rdkafka-0.10.0/lib/rdkafka/../../ext/librdkafka.so)

Worked around to confirm my hunch by shooting the cache using a hack in lib/ddtrace/version.rb which is hashed to produce the CircleCI cache key, thus using a fresh cache yet not breaking anything for anyone else.

@lloeki lloeki force-pushed the lloeki/expand-aarch64-support branch from 0b6dd70 to 99cd2b3 Compare November 21, 2022 15:10
@lloeki lloeki force-pushed the lloeki/expand-aarch64-support branch from 99cd2b3 to 44188f4 Compare November 21, 2022 15:16
@lloeki
Copy link
Member Author

lloeki commented Nov 21, 2022

Interetingly enough, two system test apps are unrelatedly failing because they're based on Jessie, which has a dead GPG key. These images can be reused to fix the corresponding system test ruby apps.

@lloeki lloeki force-pushed the lloeki/expand-aarch64-support branch 2 times, most recently from a908596 to 3cddbe7 Compare November 21, 2022 16:41
Because CIRCLE_CACHE_VERSION is globally defined in CircleCI, bumping it
means starting from a blank universe for all subsequent job runs.

If a new PR that breaks binary compatibility ran + saved to cache first
then it would produce a matchable key from which other branches would
try to build, and fail for older jobs.

This also serves as a signal to developers that if they pull new images
past a change to this file they should clear their bundle cache volumes
and run bundle install and appraisal install again.
@lloeki lloeki force-pushed the lloeki/expand-aarch64-support branch from 3cddbe7 to ed862b7 Compare November 21, 2022 16:47
Copy link
Member

@marcotc marcotc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work! Thank you for also making it easier to rebuild images in the future!

Comment on lines +11 to +13
push:
branches:
- "**"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we don't want to keep push as one of the triggers, right? I'd think we only want manual (workflow_dispatch).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Loic may have been thinking of having docker cache handle this, but it may get slightly annoying -- e.g. we already have like 80 jobs that happen on push between github actions and circleci and it's getting somewhat confusing to find the ones that fail in the github UI, it was definitely not made to scale in that way...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do want to keep the push trigger.

it's getting somewhat confusing to find the ones that fail in the github UI

It's mostly confusing because CircleCI does not show up in the Checks tab of a PR so we have to check the summary :/ (I just jump to a tab to CircleCI to really check, much easier)

I do wish GitHub would sort jobs by state: (in the order: failed, cancelled, success, running, pending)

That said, since these images are essentially independent of dd-trace-rb and can be reused to great effect, I initially wanted these images to be defined and built outside of this repo, but @delner was not too keen about that.

Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few more notes, but I see no reason not to get this merged and leave other changes to follow-up PRs.

@@ -0,0 +1 @@
1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the contents of this file can be anything, what do you think of including an explanation at the top of what this file is for? I'm thinking that after we merge this a file called binary_version with a 1 sounds kinda mysterious but not very self-explanatory ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the whole file contents is checksum'd by CircleCI I refrained from adding non-semantic things as any change would also trigger a cache bust.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm but right now coming in from the outside it's kinda cryptic what this file does -- you need to "reverse engineer" when it was added and maybe find this discussion :/

image: ivoanjo/docker-library:ddtrace_rb_2_5_9
image: ghcr.io/datadog/dd-trace-rb/ruby:2.5.9-dd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Since the repo name already includes dd-trace-rb, consider perhaps dropping the -dd suffix at the end?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suffix is intentional, I will need to have neutral, "bare" ruby images to base myself upon at DataDog/system-tests-apps-ruby, the -dd suffix means that it's ruby but with some Datadog-specific additions.

Picture it like this:

  • ghcr.io/datadog/dd-trace-rb/ruby:2.5.9 should be like ruby:2.5.9 from Docker hub (except more up to date WRT bundler + base Debian).
  • ghcr.io/datadog/dd-trace-rb/ruby:2.5.9-dd is the above but with all the DataDog stuff added in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough! Consider documenting that somewhere ;)

\
&& gem update --system "$RUBYGEMS_VERSION"

# TODO: update only once
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my suggested alternative for this comment (that appears in a few places):

# TODO: This is not the version that ends up in the image; at the bottom of the Dockerfile we upgrade it to a later one. To be fixed in the future

I think it's important to spell out that we update more than once to different versions :)

Comment on lines +16 to +17
# To build AND push multi-arch images (but DON'T DO THAT IN GENERAL, unless e.g CI is down):
$ docker buildx build . --platform linux/x86_64,linux/aarch64 -f Dockerfile-3.1.1 -t ghcr.io/datadog/dd-trace-rb/ruby:3.1.1-dd --push
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Since people tend to copy-paste commands and not always edit them, I suggest removing the --push from the actual command example, and instead just stating on the text "If you ever need to push (....) add --push to the above command".

Comment on lines +11 to +13
push:
branches:
- "**"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Loic may have been thinking of having docker cache handle this, but it may get slightly annoying -- e.g. we already have like 80 jobs that happen on push between github actions and circleci and it's getting somewhat confusing to find the ones that fail in the github UI, it was definitely not made to scale in that way...

@lloeki lloeki merged commit 693a8f9 into master Nov 22, 2022
@lloeki lloeki deleted the lloeki/expand-aarch64-support branch November 22, 2022 11:28
@github-actions github-actions bot added this to the 1.7.0 milestone Nov 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants