bad packages/x86_64/APKINDEX can cause hang #1645

Open
smoser opened this issue Nov 14, 2024 · 2 comments

Comments

@smoser
Contributor

smoser commented Nov 14, 2024

This is quite possibly in the realm of user error or "don't do that".

I got my wolfi-dev/os tree into a state that would not build packages.

This should be enough information to reproduce it.

Note: ncurses does depend on itself.

$ melange version | grep ^[A-Z]
GitVersion:    v0.15.7
GitCommit:     997f9fd699767f784c2879272b12546cbdb709cc
GitTreeState:  clean
BuildDate:     '2024-11-14T01:52:15Z'
GoVersion:     go1.23.3
Compiler:      gc
Platform:      linux/amd64

$ git log HEAD^.. --oneline  --no-decorate 
2e01d074b py3-flask/3.1.0 package update (#34105)

$ make clean
$ make package/ncurses
# ... happily builds ...
...
2024/11/14 10:18:53 INFO wrote packages/x86_64/ncurses-terminfo-6.5_p20241006-r4.apk
2024/11/14 10:18:53 INFO generating apk index from packages in packages/x86_64
2024/11/14 10:18:53 INFO processing package packages/x86_64/ncurses-doc-6.5_p20241006-r4.apk
2024/11/14 10:18:53 INFO processing package packages/x86_64/ncurses-dev-6.5_p20241006-r4.apk
2024/11/14 10:18:53 INFO processing package packages/x86_64/ncurses-terminfo-base-6.5_p20241006-r4.apk
2024/11/14 10:18:53 INFO processing package packages/x86_64/ncurses-static-6.5_p20241006-r4.apk
2024/11/14 10:18:53 INFO processing package packages/x86_64/ncurses-terminfo-6.5_p20241006-r4.apk
2024/11/14 10:18:53 INFO processing package packages/x86_64/ncurses-6.5_p20241006-r4.apk
2024/11/14 10:18:53 INFO updating index at packages/x86_64/APKINDEX.tar.gz with new packages: [ncurses-6.5_p20241006-r4 ncurses-static-6.5_p20241006-r4 ncurses-dev-6.5_p20241006-r4 ncurses-doc-6.5_p20241006-r4 ncurses-terminfo-base-6.5_p20241006-r4 ncurses-terminfo-6.5_p20241006-r4]
2024/11/14 10:18:53 INFO signing apk index at packages/x86_64/APKINDEX.tar.gz
2024/11/14 10:18:53 INFO signing index packages/x86_64/APKINDEX.tar.gz with key local-melange.rsa
2024/11/14 10:18:53 INFO appending signature RSA to index packages/x86_64/APKINDEX.tar.gz
2024/11/14 10:18:53 INFO writing signed index to packages/x86_64/APKINDEX.tar.gz
...

# now break the APKINDEX with 'rm' rather than 'make clean'
# probably because i wanted to build again after a change.
$ rm packages/x86_64/ncurses-*

# now try again to build
$ make package/ncurses
...
2024/11/14 10:20:56 INFO installing build-base (1-r8)
2024/11/14 10:20:56 INFO installing libcrypt1 (2.40-r3)
2024/11/14 10:20:56 INFO installing busybox (1.37.0-r0)
<hang here forever>
<give up, hit ctrl-c>
2024/11/14 10:21:35 INFO deleting guest dir /home/user/tmp/melange-guest-3021240161
2024/11/14 10:21:35 INFO deleting workspace dir /home/user/tmp/melange-workspace-2950730389
2024/11/14 10:21:35 ERRO failed to build package: unable to build guest: 
   unable to generate image: 
     installing apk packages: 
       installing packages: 
         expanding ncurses-terminfo-base (ver:6.5_p20241006-r4 arch:x86_64): 
         fetching package "ncurses-terminfo-base": 
            failed to read repository package apk /home/user/src/wolfi-os/packages/x86_64/ncurses-terminfo-base-6.5_p20241006-r4.apk:
              open /home/user/src/wolfi-os/packages/x86_64/ncurses-terminfo-base-6.5_p20241006-r4.apk: 
                  no such file or directory: context canceled
make[1]: *** [Makefile:125: packages/x86_64/ncurses-6.5_p20241006-r4.apk] Error 1

I prettied up the final ERROR message (the one printed after the ctrl-c) a bit. It actually does give a reasonable error, but I don't know that I've ever read a program's error messages after hitting ctrl-c, so I feel justified in the time I lost debugging why it was hung.

@smoser
Contributor Author

smoser commented Nov 14, 2024

OK. Here is a simple reproduction that rules out the red herrings of a self-dependent package and the bootstrap archive.

  • test-me-dep.yaml

    package:
      name: test-me-dep
      version: 1.0
      epoch: 0
    
    environment:
      contents:
        packages:
          - busybox
    
    pipeline:
      - runs: |
          printf '#!/bin/busybox sh\necho hello world\n' > greet-me
          install -D -m755 -t "${{targets.contextdir}}/usr/bin" greet-me

  • test-me.yaml

    package:
      name: test-me
      version: 1.0
      epoch: 0
    
    environment:
      contents:
        packages:
          - busybox
          - test-me-dep
    
    pipeline:
      - runs: |
          mkdir ${{targets.contextdir}}/etc
          greet-me > ${{targets.contextdir}}/etc/greeting

Then, from the wolfi-dev/os tree (at the commit listed above):

$ rm -Rf ~/.cache/dev.chainguard.go-apk/ 
$ make clean
$ make package/test-me-dep
$ rm packages/x86_64/test-me-dep-1.0-r0.apk
$ make package/test-me
melange build test-me.yaml --repository-append /home/user/src/wolfi-os/packages 
   --keyring-append local-melange.rsa.pub --signing-key local-melange.rsa 
   --arch x86_64 --env-file build-x86_64.env --namespace wolfi 
   --license 'Apache-2.0' 
   --git-repo-url 'https://github.com/wolfi-dev/os' 
   --generate-index false  
   --pipeline-dir ./pipelines/  
   -k https://packages.wolfi.dev/os/wolfi-signing.rsa.pub 
   -r https://packages.wolfi.dev/os
2024/11/14 10:37:19 INFO git commit for build config not provided, attempting to detect automatically
2024/11/14 10:37:19 WARN SOURCE_DATE_EPOCH is specified but empty, setting it to 1969-12-31 19:00:00 -0500 EST
2024/11/14 10:37:19 INFO melange is building:
2024/11/14 10:37:19 INFO   configuration file: test-me.yaml
2024/11/14 10:37:19 INFO   workspace dir: /home/user/tmp/melange-workspace-496249841
2024/11/14 10:37:19 INFO evaluating pipelines for package requirements
2024/11/14 10:37:19 INFO --cache-dir ./melange-cache/ not a dir; skipping
2024/11/14 10:37:19 INFO populating workspace /home/user/tmp/melange-workspace-496249841 from ./test-me/
2024/11/14 10:37:19 INFO building workspace in '/home/user/tmp/melange-guest-3256530904' with apko
2024/11/14 10:37:19 INFO setting apk repositories: [/home/user/src/wolfi-os/packages https://packages.wolfi.dev/os]
2024/11/14 10:37:19 INFO image configuration:
2024/11/14 10:37:19 INFO   contents:
2024/11/14 10:37:19 INFO     build repositories: []
2024/11/14 10:37:19 INFO     runtime repositories: []
2024/11/14 10:37:19 INFO     keyring:      []
2024/11/14 10:37:19 INFO     packages:     [busybox test-me-dep]
2024/11/14 10:37:19 INFO   accounts:
2024/11/14 10:37:19 INFO     runas:  
2024/11/14 10:37:19 INFO     users:
2024/11/14 10:37:19 INFO       - uid=1000(build) gid=1000
2024/11/14 10:37:19 INFO     groups:
2024/11/14 10:37:19 INFO       - gid=1000(build) members=[build]
2024/11/14 10:37:19 INFO auth configured for: []
2024/11/14 10:37:19 INFO installing ca-certificates-bundle (20241010-r2)
2024/11/14 10:37:19 INFO installing wolfi-baselayout (20230201-r15)
2024/11/14 10:37:19 INFO installing glibc (2.40-r3)
2024/11/14 10:37:19 INFO installing ld-linux (2.40-r3)
2024/11/14 10:37:19 INFO installing libgcc (14.2.0-r5)
2024/11/14 10:37:19 INFO installing glibc-locale-posix (2.40-r3)
2024/11/14 10:37:19 INFO installing libxcrypt (4.4.36-r8)
2024/11/14 10:37:19 INFO installing libcrypt1 (2.40-r3)
2024/11/14 10:37:19 INFO installing busybox (1.37.0-r0)

<hang here>
^C
2024/11/14 10:38:42 INFO deleting guest dir /home/user/tmp/melange-guest-3256530904
2024/11/14 10:38:42 INFO deleting workspace dir /home/user/tmp/melange-workspace-496249841
2024/11/14 10:38:42 ERRO failed to build package: 
  unable to build guest: unable to generate image: 
    installing apk packages: installing packages: 
      expanding test-me-dep (ver:1.0-r0 arch:x86_64):
        fetching package "test-me-dep": 
          failed to read repository package apk /home/user/src/wolfi-os/packages/x86_64/test-me-dep-1.0-r0.apk: 
            open /home/user/src/wolfi-os/packages/x86_64/test-me-dep-1.0-r0.apk: 
              no such file or directory: context canceled
make[1]: *** [Makefile:125: packages/x86_64/test-me-1.0-r0.apk] Error 1
make: *** [Makefile:115: package/test-me] Interrupt

@smoser
Contributor Author

smoser commented Nov 14, 2024

I think the problem is https://github.com/chainguard-dev/apko/blob/b93f0a2bc55f4dc8a07c373fc338e37dec193a24/pkg/apk/apk/implementation.go#L703 . When expandPackage returns an error, the channel is never closed, so the side waiting on it is never signaled and blocks forever.

smoser added a commit to smoser/apko that referenced this issue Nov 14, 2024
This problem was found in melange
 chainguard-dev/melange#1645

Any time a download failed, we did not communicate the error
up the stack.

Signed-off-by: Scott Moser <[email protected]>
smoser added a commit to smoser/apko that referenced this issue Nov 14, 2024
This problem was found in melange
 chainguard-dev/melange#1645

Any time a download failed, we would hang waiting for a close
that would never occur.

Signed-off-by: Scott Moser <[email protected]>
smoser added a commit to chainguard-dev/apko that referenced this issue Nov 14, 2024
Make sure to close channel in InstallPackages, cleanup in CalculateWorld
    
This problem was found in melange
chainguard-dev/melange#1645
    
Any time a download failed, we would hang waiting for a close
that would never occur.

Signed-off-by: Scott Moser <[email protected]>