PXE: coreos-installer error "end of file before message length reached" #439

Closed
fclerg opened this issue Mar 29, 2020 · 17 comments

@fclerg

fclerg commented Mar 29, 2020

Host Operating System Version: Proxmox 5.2-12
Target Operating System Version: fedora-coreos-31.20200210.3.0
coreos-installer Version: coreos-installer 0.1.2

Expected Behavior

Successful installation of FCOS

Actual Behavior

[Screenshot: coreos-installer-error]

The relevant part seems to be:
Caused by: end of file before message length reached

Reproduction Steps

PXE boot. Here is my iPXE config (it is provided by Matchbox to match a configuration with a machine):

	"boot": {
		"kernel": "http://10.20.30.100:8000/image/fedora-coreos-31.20200210.3.0-live-kernel-x86_64",
		"initrd": [
			"http://10.20.30.100:8000/image/fedora-coreos-31.20200210.3.0-live-initramfs.x86_64.img"
		],
		"args": [
			"ip=dhcp",
			"rd.neednet=1",
			"initrd=fedora-coreos-31.20200210.3.0-live-initramfs.x86_64.img",
			"coreos.inst.image_url=http://10.20.30.100:8000/image/fedora-coreos-31.20200210.3.0-metal.x86_64.raw.xz",
			"coreos.inst.ignition_url=http://matchbox.yuzu.local:8080/ignition?uuid=${uuid}\u0026mac=${mac:hexhyp}",
			"coreos.inst.install_dev=sda",
			"random.trust_cpu=on",
			"console=tty0"
		]
	}

Other Information

  • After it fails I can access the emergency mode for maintenance.
    From there I could confirm that the guest gets its IP config with DHCP and that it has access to the internet.

  • I re-tried the PXE install but always get the same error

  • I didn't find anything obvious about the failure. I am not sure where to look, based on this "end of file before message length reached" error message.

  • The image I use is fedora-coreos-31.20200210.3.0-metal.x86_64.raw.xz.
    I use a copy of the image in a local web repo and I can confirm its signature and integrity:

# gpg --verify fedora-coreos-31.20200210.3.0-metal.x86_64.raw.xz.sig  fedora-coreos-31.20200210.3.0-metal.x86_64.raw.xz
gpg: Signature made Mon Feb 24 18:03:55 2020 UTC using RSA key ID 3C3359C4
gpg: Good signature from "Fedora (31) <fedora-31-primary@fedoraproject.org>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 7D22 D586 7F2A 4236 474B  F7B8 50CB 390B 3C33 59C4
#
# curl -s https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/31.20200210.3.0/x86_64/meta.json \
>   | jq -r '.images.metal | select(.path=="fedora-coreos-31.20200210.3.0-metal.x86_64.raw.xz") | .sha256'
8ed20e48375ea2c0c09dace2a06d366d80990f97f2a79757dbc81078424725c6
# sha256sum  fedora-coreos-31.20200210.3.0-metal.x86_64.raw.xz
8ed20e48375ea2c0c09dace2a06d366d80990f97f2a79757dbc81078424725c6  fedora-coreos-31.20200210.3.0-metal.x86_64.raw.xz
#
@dustymabe
Member

Hmm, that is interesting for sure. As an experiment, could you use the same setup (with 31.20200210.3.0), but try the following command line (which will download the latest stable from the Fedora hosted repository):

		"args": [
			"ip=dhcp",
			"rd.neednet=1",
			"initrd=fedora-coreos-31.20200210.3.0-live-initramfs.x86_64.img",
			"coreos.inst.ignition_url=http://matchbox.yuzu.local:8080/ignition?uuid=${uuid}\u0026mac=${mac:hexhyp}",
			"coreos.inst.install_dev=sda",
			"coreos.inst.stream=stable",
			"random.trust_cpu=on",
			"console=tty0"
		]

This should give us at least another data point and help us narrow down the field of possible issues.

@fclerg
Author

fclerg commented Mar 29, 2020

Thanks @dustymabe! Removing the coreos.inst.image_url so that the latest FCOS stable image is downloaded fixed the issue.

@dustymabe
Member

> Thanks @dustymabe! Removing the coreos.inst.image_url so that the latest FCOS stable image is downloaded fixed the issue.

That's good news. Now I'm wondering why it wasn't working for you with locally copied images. Do you want to try again to copy down a raw.xz and raw.xz.sig (maybe of the latest stable this time) and see if you see the problem again?

@fclerg
Author

fclerg commented Mar 30, 2020

I did try with "fedora-coreos-31.20200310.3.0-metal.x86_64.raw.xz" but got the same error.

Note that the gpg key ID I imported and used to successfully verify the signature locally was "3C3359C4".
But the screenshot I posted above shows a "BAD signature" error and coreos-installer seems to be trying to use a key with ID "50CB390B3C3359C4".

By letting coreos-installer download the latest image, I can see that it also uses the "50CB390B3C3359C4" gpg key ID and successfully verifies the signature.

@dustymabe
Member

IIUC "3C3359C4" is just a short notation for "50CB390B3C3359C4". "3C3359C4" is the last 8 characters of "50CB390B3C3359C4".
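
To illustrate the relationship between the two IDs, here is a minimal Python sketch. It assumes the usual convention for v4 OpenPGP keys, where the long key ID is the last 16 hex digits of the fingerprint and the short ID is the last 8. The function name `key_ids` is made up for this example.

```python
from typing import Tuple


def key_ids(fingerprint: str) -> Tuple[str, str]:
    """Return (long_id, short_id) derived from a v4 OpenPGP fingerprint."""
    # gpg prints the fingerprint in space-separated groups; strip the spaces.
    hex_digits = "".join(fingerprint.split()).upper()
    if len(hex_digits) != 40:
        raise ValueError("expected a 40-hex-digit (v4) fingerprint")
    return hex_digits[-16:], hex_digits[-8:]


# Fingerprint copied from the gpg --verify output earlier in this issue.
long_id, short_id = key_ids("7D22 D586 7F2A 4236 474B  F7B8 50CB 390B 3C33 59C4")
print(long_id, short_id)
```

Both IDs therefore refer to the same key, so the differing notation alone would not explain a "BAD signature" result.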

@fclerg
Author

fclerg commented Mar 30, 2020

Ok, so the cause of the issue was probably the integrity of the image downloaded in the PXE environment. I'll have a look at the configuration of my local nginx server.
The kernel and initramfs come from the same local repo and seemed alright though.

@lucab
Contributor

lucab commented Mar 31, 2020

> The relevant part seems to be:
> Caused by: end of file before message length reached

This means that the server at 10.20.30.100:8000 is sending a response size value (Content-Length header), but the actual data that it sends is shorter than that.
The client (coreos-installer) receives all the data (OS image) made available from the server, but at the end it detects the size mismatch and halts.
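
The kind of check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how coreos-installer (or its HTTP library) implements it; the function name `check_body_length` and the sample data are hypothetical.

```python
from typing import Iterable


def check_body_length(content_length: int, chunks: Iterable[bytes]) -> int:
    """Consume a response body and fail if it is shorter than advertised."""
    received = 0
    for chunk in chunks:
        received += len(chunk)
    if received < content_length:
        # The stream ended before the promised number of bytes arrived.
        raise EOFError(
            f"body ended after {received} bytes, "
            f"but Content-Length promised {content_length}"
        )
    return received


# Hypothetical data: the server promises 10 bytes but only delivers 7.
try:
    check_body_length(10, [b"abcd", b"efg"])
except EOFError as err:
    print("truncated:", err)
```

Note that the error can only be raised once the stream ends, which is consistent with the installer failing at the end of the download rather than at the start.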

I'm not sure what the underlying root cause is, but the symptoms likely point to a server misconfiguration or a network issue.

@dustymabe
Member

@fclerg any updates here? Were you able to investigate your local setup to see what the problem was?

@fclerg
Author

fclerg commented Apr 5, 2020

I looked at a few things in my local setup, but no luck.
Using curl, I can confirm that the Content-Length header matches the downloaded file size:

$ curl -s -v  http://10.20.30.100:8000/image/fedora-coreos-31.20200310.3.0-metal.x86_64.raw.xz --output test
*   Trying 10.20.30.100:8000...
* TCP_NODELAY set
* Connected to 10.20.30.100 (10.20.30.100) port 8000 (#0)
> GET /image/fedora-coreos-31.20200310.3.0-metal.x86_64.raw.xz HTTP/1.1
> Host: 10.20.30.100:8000
> User-Agent: curl/7.66.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx/1.10.3 (Ubuntu)
< Date: Sun, 05 Apr 2020 13:33:15 GMT
< Content-Type: application/x-xz
< Content-Length: 476192788
< Last-Modified: Wed, 25 Mar 2020 21:15:09 GMT
< Connection: keep-alive
< ETag: "5e7bc9dd-1c622014"
< Accept-Ranges: bytes
<
{ [43203 bytes data]
* Connection #0 to host 10.20.30.100 left intact
$
$ ls -l --b=1 test | cut -d " " -f5
476192788
$

In case the issue comes from other headers, I got to a point where the response from my local Nginx server matches the one from the Fedora hosted repo:

[core@knode1 ~]$ curl -s  -v  https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/31.20200310.3.0/x86_64/fedora-coreos-31.20200310.3.0-metal.x86_64.raw.xz  --output test
*   Trying 13.249.8.42:443...
* TCP_NODELAY set
* Connected to builds.coreos.fedoraproject.org (13.249.8.42) port 443 (#0)
[ ... I REMOVED CERTIFICATE VALIDATION DETAILS ... ]
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* Using Stream ID: 1 (easy handle 0x55a53c905130)
} [5 bytes data]
> GET /prod/streams/stable/builds/31.20200310.3.0/x86_64/fedora-coreos-31.20200310.3.0-metal.x86_64.raw.xz HTTP/2
> Host: builds.coreos.fedoraproject.org
> User-Agent: curl/7.66.0
> Accept: */*
>
{ [5 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
} [5 bytes data]
< HTTP/2 200
< content-type: application/x-xz
< content-length: 476192788
< date: Sun, 05 Apr 2020 22:35:18 GMT
< last-modified: Wed, 25 Mar 2020 21:15:09 GMT
< etag: "211654e2a2e650ddf0e939e6ca997357-57"
< cache-control: max-age=31536000
< accept-ranges: bytes
< server: AmazonS3
< x-cache: Hit from cloudfront
< via: 1.1 ef76486b8b2194781e7708296c3d455c.cloudfront.net (CloudFront)
< x-amz-cf-pop: CDG53-C1
< x-amz-cf-id: G_sdcbJgCWAAv2dcRcsXY24k3aeO8f-LeQsc7OANI3p1kZntOcCebw==
< age: 62
<
{ [14480 bytes data]
* Connection #0 to host builds.coreos.fedoraproject.org left intact
[core@knode1 ~]$

The only difference I can see at this point is that I am using HTTP locally instead of HTTPS when using the Fedora hosted repo.
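
For reference, the two sets of response headers can be compared mechanically. The Python sketch below copies only a subset of the headers from the two curl traces above (names lower-cased for comparison); the helper `diff_headers` is hypothetical.

```python
# Subset of the response headers from the local nginx server.
local = {
    "content-type": "application/x-xz",
    "content-length": "476192788",
    "last-modified": "Wed, 25 Mar 2020 21:15:09 GMT",
    "accept-ranges": "bytes",
    "etag": '"5e7bc9dd-1c622014"',
    "server": "nginx/1.10.3 (Ubuntu)",
}
# The same subset from the Fedora hosted repo.
remote = {
    "content-type": "application/x-xz",
    "content-length": "476192788",
    "last-modified": "Wed, 25 Mar 2020 21:15:09 GMT",
    "accept-ranges": "bytes",
    "etag": '"211654e2a2e650ddf0e939e6ca997357-57"',
    "server": "AmazonS3",
}


def diff_headers(a, b):
    """Return {name: (a_value, b_value)} for headers that differ or are missing."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


differences = diff_headers(local, remote)
for name, (ours, theirs) in differences.items():
    print(f"{name}: local={ours} remote={theirs}")
```

Within this subset, only `etag` and `server` differ; the headers that describe the payload itself (type, length, modification time) are identical.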

@fclerg
Author

fclerg commented Apr 10, 2020

I wanted to inspect the downloaded archive from emergency mode, but I couldn't find it there. Do you know where it is supposed to be?

@bgilbert
Contributor

coreos-installer install streams the archive directly to the target disk and doesn't persist it to a file. You could run coreos-installer download from the emergency shell, but I'm not sure that'll be helpful: it should exercise the same logic, but will delete the downloaded artifacts on error.

@dustymabe
Member

@fclerg are you still able to reproduce this issue with the latest released artifacts?

@fclerg
Author

fclerg commented May 9, 2020

I tried with the latest release and the install runs well. Has something been done?

However, this "Read Disk" stage takes around 20 minutes to complete. What exactly is it doing?

@dustymabe
Member

> I tried with the latest release and the install runs well. Has something been done?

That's good to hear, though I don't know of anything specific that has been done to address the issue you were seeing. We did do a new release of coreos-installer, but so far it has only made it into the testing stream. What exact version of FCOS did you try?

Earlier in the discussion here we were thinking it could be related to your environment. Maybe an issue in your environment resolved itself?

> However, this "Read Disk" stage takes around 20 minutes to complete. What exactly is it doing?

Yeah we have an open issue where we are trying to investigate slow performance of coreos-installer: coreos/coreos-installer#184.

Can you test by downloading the latest testing stream ISO and do an install without --stream or --image-url? This will cause the install to use the contents from the ISO for the install and should be faster (in my experience).

@fclerg
Author

fclerg commented May 13, 2020

The problem really seems to have been fixed with the initramfs 31.20200420.3.0. Basically the signature error shows up when coreos-installer fetches a local image with initramfs 31.20200310.3.0 (and before). Here is a recap of some use cases I tried:

+------------------+-----------------+-------------+---------+
| kernel/initramfs |   FCOS image    | image from  | install |
|                  |                 | local repo? |         |
+------------------+-----------------+-------------+---------+
|  31.20200310.3.0 | 31.20200310.3.0 | yes         | FAILURE |
|  31.20200310.3.0 | 31.20200420.3.0 | yes         | FAILURE |
|  31.20200310.3.0 | 31.20200310.3.0 | no          | SUCCESS |
|  31.20200420.3.0 | 31.20200420.3.0 | no          | SUCCESS |
|  31.20200420.3.0 | 31.20200420.3.0 | yes         | SUCCESS |
|  31.20200420.3.0 | 31.20200310.3.0 | yes         | SUCCESS |
|  31.20200420.3.0 | 31.20200420.3.0 | no          | SUCCESS |
|  31.20200420.3.0 | 31.20200310.3.0 | no          | SUCCESS |
+------------------+-----------------+-------------+---------+

It must be a combination of both my environment and something that has been done in the latest version.
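
The pattern in the recap table can be checked with a short Python sketch. The rows are copied from the table above (including its one repeated row); the grouping logic is only an illustration.

```python
from collections import defaultdict

# (kernel/initramfs, FCOS image, image from local repo?, install succeeded?)
runs = [
    ("31.20200310.3.0", "31.20200310.3.0", True, False),
    ("31.20200310.3.0", "31.20200420.3.0", True, False),
    ("31.20200310.3.0", "31.20200310.3.0", False, True),
    ("31.20200420.3.0", "31.20200420.3.0", False, True),
    ("31.20200420.3.0", "31.20200420.3.0", True, True),
    ("31.20200420.3.0", "31.20200310.3.0", True, True),
    ("31.20200420.3.0", "31.20200420.3.0", False, True),
    ("31.20200420.3.0", "31.20200310.3.0", False, True),
]

# Group outcomes by (initramfs version, local repo?), ignoring the image version.
outcomes = defaultdict(set)
for initramfs, _image, local, ok in runs:
    outcomes[(initramfs, local)].add(ok)

# Combinations for which every recorded run failed.
failing = sorted(key for key, results in outcomes.items() if results == {False})
print(failing)
```

Only the combination of the older initramfs with the local repo fails in every run, regardless of which FCOS image version is installed.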


Removing the --stream or --image-url didn't seem to make any difference in the installation time for me.

@dustymabe
Member

> The problem really seems to have been fixed with the initramfs 31.20200420.3.0. Basically the signature error shows up when coreos-installer fetches a local image with initramfs 31.20200310.3.0 (and before).

There were a lot of packages that changed from 31.20200310.3.0 to 31.20200420.3.0. I'm guessing it was probably some fix in the coreos-installer package upgrade that helped.

[dustymabe@media fedora-ostree-repo-mirror]$ rpm-ostree --repo=./ db diff 436592e6eb93e899bebab8dbd17514c85be683390ef8bbce8c6d96069ce4c543 b3fc3a3e8513d7e424d0ced1e2517484cb766d238951f2fdec3da2fed3522efb
ostree diff commit from: 436592e6eb93e899bebab8dbd17514c85be683390ef8bbce8c6d96069ce4c543
ostree diff commit to:   b3fc3a3e8513d7e424d0ced1e2517484cb766d238951f2fdec3da2fed3522efb
Upgraded:
  afterburn 4.3.1-2.fc31 -> 4.3.2-1.fc31
  afterburn-dracut 4.3.1-2.fc31 -> 4.3.2-1.fc31
  btrfs-progs 5.4-1.fc31 -> 5.6-1.fc31
  bubblewrap 0.4.0-1.fc31 -> 0.4.1-1.fc31
  c-ares 1.15.0-5.module_f31+7521+8d6677fc -> 1.16.0-1.module_f31+8331+2cfeb415
  conmon 2:2.0.10-2.fc31 -> 2:2.0.15-1.fc31
  containerd 1.2.6-2.20190627gitd68b593.fc31 -> 1.3.3-1.fc31
  containers-common 1:0.1.41-1.fc31 -> 1:0.2.0-1.fc31
  coreos-installer 0.1.2-1.fc31 -> 0.1.3-1.fc31
  coreos-installer-systemd 0.1.2-1.fc31 -> 0.1.3-1.fc31
  coreutils 8.31-6.fc31 -> 8.31-9.fc31
  coreutils-common 8.31-6.fc31 -> 8.31-9.fc31
  crun 0.12.2.1-1.fc31 -> 0.13-2.fc31
  cups-libs 1:2.2.12-3.fc31 -> 1:2.2.12-6.fc31
  cyrus-sasl-gssapi 2.1.27-2.fc31 -> 2.1.27-3.fc31
  cyrus-sasl-lib 2.1.27-2.fc31 -> 2.1.27-3.fc31
  device-mapper 1.02.165-1.fc31 -> 1.02.171-1.fc31
  device-mapper-event 1.02.165-1.fc31 -> 1.02.171-1.fc31
  device-mapper-event-libs 1.02.165-1.fc31 -> 1.02.171-1.fc31
  device-mapper-libs 1.02.165-1.fc31 -> 1.02.171-1.fc31
  dracut 049-27.git20181204.fc31.1 -> 050-26.git20200316.fc31
  dracut-network 049-27.git20181204.fc31.1 -> 050-26.git20200316.fc31
  elfutils-default-yama-scope 0.178-7.fc31 -> 0.179-1.fc31
  elfutils-libelf 0.178-7.fc31 -> 0.179-1.fc31
  elfutils-libs 0.178-7.fc31 -> 0.179-1.fc31
  fedora-gpg-keys 31-1 -> 31-3
  fedora-repos 31-1 -> 31-3
  fedora-repos-ostree 31-1 -> 31-3
  fuse-overlayfs 0.7.5-2.fc31 -> 0.7.8-1.fc31
  fuse-sshfs 3.7.0-2.fc31 -> 3.7.0-3.fc31
  gdisk 1.0.4-5.fc31 -> 1.0.5-1.fc31
  git-core 2.24.1-1.fc31 -> 2.25.3-1.fc31
  glib2 2.62.5-1.fc31 -> 2.62.6-1.fc31
  glibc 2.30-10.fc31 -> 2.30-11.fc31
  glibc-all-langpacks 2.30-10.fc31 -> 2.30-11.fc31
  glibc-common 2.30-10.fc31 -> 2.30-11.fc31
  gnutls 3.6.11-1.fc31 -> 3.6.13-1.fc31
  grub2-common 1:2.02-105.fc31 -> 1:2.02-107.fc31
  grub2-efi-x64 1:2.02-105.fc31 -> 1:2.02-107.fc31
  grub2-pc 1:2.02-105.fc31 -> 1:2.02-107.fc31
  grub2-pc-modules 1:2.02-105.fc31 -> 1:2.02-107.fc31
  grub2-tools 1:2.02-105.fc31 -> 1:2.02-107.fc31
  grub2-tools-extra 1:2.02-105.fc31 -> 1:2.02-107.fc31
  grub2-tools-minimal 1:2.02-105.fc31 -> 1:2.02-107.fc31
  ignition 2.1.1-5.git40c0b57.fc31 -> 2.2.1-3.git2d3ff58.fc31
  kernel 5.5.8-200.fc31 -> 5.5.17-200.fc31
  kernel-core 5.5.8-200.fc31 -> 5.5.17-200.fc31
  kernel-modules 5.5.8-200.fc31 -> 5.5.17-200.fc31
  libgcc 9.2.1-1.fc31 -> 9.3.1-2.fc31
  libgomp 9.2.1-1.fc31 -> 9.3.1-2.fc31
  libldb 2.0.8-1.fc31 -> 2.0.9-1.fc31
  libnghttp2 1.40.0-2.module_f31+7692+42f50940 -> 1.40.0-2.module_f31+8198+a4049931
  libpcap 14:1.9.1-1.fc31 -> 14:1.9.1-2.fc31
  librepo 1.11.1-1.fc31 -> 1.11.3-1.fc31
  libsmbclient 2:4.11.6-0.fc31 -> 2:4.11.7-0.fc31
  libssh 0.9.3-1.fc31 -> 0.9.4-2.fc31
  libssh-config 0.9.3-1.fc31 -> 0.9.4-2.fc31
  libstdc++ 9.2.1-1.fc31 -> 9.3.1-2.fc31
  libwbclient 2:4.11.6-0.fc31 -> 2:4.11.7-0.fc31
  libxcrypt 4.4.15-1.fc31 -> 4.4.16-1.fc31
  linux-firmware 20200122-105.fc31 -> 20200316-106.fc31
  linux-firmware-whence 20200122-105.fc31 -> 20200316-106.fc31
  lmdb-libs 0.9.23-3.fc31 -> 0.9.24-1.fc31
  lvm2 2.03.06-1.fc31 -> 2.03.09-1.fc31
  lvm2-libs 2.03.06-1.fc31 -> 2.03.09-1.fc31
  mdadm 4.1-1.fc31 -> 4.1-4.fc31
  ostree 2019.5-2.fc31 -> 2020.3-2.fc31
  ostree-libs 2019.5-2.fc31 -> 2020.3-2.fc31
  pcre 8.43-3.fc31 -> 8.44-1.fc31
  pcre2 10.34-7.fc31 -> 10.34-9.fc31
  podman 2:1.8.1-2.fc31 -> 2:1.9.0-1.fc31
  podman-plugins 2:1.8.1-2.fc31 -> 2:1.9.0-1.fc31
  rpm-ostree 2020.1-1.fc31 -> 2020.1.21.ge9011530-2.fc31
  rpm-ostree-libs 2020.1-1.fc31 -> 2020.1.21.ge9011530-2.fc31
  samba-client-libs 2:4.11.6-0.fc31 -> 2:4.11.7-0.fc31
  samba-common 2:4.11.6-0.fc31 -> 2:4.11.7-0.fc31
  samba-common-libs 2:4.11.6-0.fc31 -> 2:4.11.7-0.fc31
  selinux-policy 3.14.4-49.fc31 -> 3.14.4-50.fc31
  selinux-policy-targeted 3.14.4-49.fc31 -> 3.14.4-50.fc31
  shadow-utils 2:4.6-17.fc31 -> 2:4.6-18.fc31
  skopeo 1:0.1.41-1.fc31 -> 1:0.2.0-1.fc31
  slirp4netns 0.4.0-20.1.dev.gitbbd6f25.fc31 -> 1.0.0-1.fc31
  sudo 1.9.0-0.1.b1.fc31 -> 1.9.0-0.1.b4.fc31
  systemd 243.7-1.fc31 -> 243.8-1.fc31
  systemd-container 243.7-1.fc31 -> 243.8-1.fc31
  systemd-libs 243.7-1.fc31 -> 243.8-1.fc31
  systemd-pam 243.7-1.fc31 -> 243.8-1.fc31
  systemd-rpm-macros 243.7-1.fc31 -> 243.8-1.fc31
  systemd-udev 243.7-1.fc31 -> 243.8-1.fc31
  vim-minimal 2:8.2.348-1.fc31 -> 2:8.2.587-1.fc31
  zincati 0.0.6-1.fc31 -> 0.0.9-1.fc31
Removed:
  dhcp-client-12:4.4.1-19.fc31.x86_64
  dhcp-common-12:4.4.1-19.fc31.noarch
  ipcalc-0.3.0-1.fc31.x86_64
  libmodulemd1-1.8.16-1.fc31.x86_64
  whois-nls-5.5.6-1.fc31.noarch
Added:
  NetworkManager-tui-1:1.20.10-1.fc31.x86_64
  libmodulemd-2.9.3-1.fc31.x86_64
  libslirp-4.1.0-1.fc31.x86_64
  libsss_sudo-2.2.3-13.fc31.x86_64
  newt-0.52.21-2.fc31.x86_64
  pcre2-syntax-10.34-9.fc31.noarch
  slang-2.3.2-6.fc31.x86_64
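
With a diff this long, it can help to filter for the packages of interest. The following Python sketch parses an excerpt of the "Upgraded:" section above; the regular expression assumes the `name old -> new` layout shown in that output, and `parse_upgrades` is a hypothetical helper.

```python
import re

# Excerpt of the "Upgraded:" section above; only a few lines are copied here.
diff_excerpt = """\
Upgraded:
  afterburn 4.3.1-2.fc31 -> 4.3.2-1.fc31
  coreos-installer 0.1.2-1.fc31 -> 0.1.3-1.fc31
  coreos-installer-systemd 0.1.2-1.fc31 -> 0.1.3-1.fc31
  dracut 049-27.git20181204.fc31.1 -> 050-26.git20200316.fc31
  libnghttp2 1.40.0-2.module_f31+7692+42f50940 -> 1.40.0-2.module_f31+8198+a4049931
"""

# Indented "name old-version -> new-version" lines; section headers do not match.
UPGRADE_LINE = re.compile(r"^\s+(\S+) (\S+) -> (\S+)$")


def parse_upgrades(text):
    """Map package name to (old_version, new_version)."""
    upgrades = {}
    for line in text.splitlines():
        match = UPGRADE_LINE.match(line)
        if match:
            name, old, new = match.groups()
            upgrades[name] = (old, new)
    return upgrades


upgrades = parse_upgrades(diff_excerpt)
print(upgrades["coreos-installer"])
```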

> Removing the --stream or --image-url didn't seem to make any difference in the installation time for me.

You'd have to use 31.20200505.2.0. Are you sure you were doing an "offline install"? It turns out that there is an --offline flag you can pass to force it. It looks like this: coreos/coreos-installer#187 (comment)

@dustymabe
Member

dustymabe commented May 14, 2020

Since this issue has been resolved one way or another I'll close it out. Feel free to continue the conversation, though.
