Pulling images from https://registry.cn-zhangjiakou.aliyuncs.com fails or hangs #1065

Closed
anjia0532 opened this issue Feb 14, 2022 · 4 comments · Fixed by #1067, #1069, #1073 or #1074

Comments

@anjia0532
Contributor

anjia0532 commented Feb 14, 2022

Installed from https://github.com/dragonflyoss/helm-charts/tree/main/charts/dragonfly 0.5.39 (Docker image version v2.0.2-rc.5) onto a k8s cluster (rke2 https://github.com/rancher/rke2/releases/tag/v1.21.7+rke2r2, containerd 1.4.12).

values.yaml

cdn:
  replicas: 1
  config:
    console: true
    verbose: true
containerRuntime:
  containerd:
    configPathDir: /var/lib/rancher/rke2/agent/etc/containerd
    enable: true
    injectConfigPath: false
    registries:
    - https://registry.cn-zhangjiakou.aliyuncs.com
dfdaemon:
  replicas: 1
  config:
    console: true
    proxy:
      registryMirror:
        url: https://registry.cn-zhangjiakou.aliyuncs.com
    verbose: true
manager:
  replicas: 1
  config:
    console: true
    verbose: true
scheduler:
  replicas: 1
  config:
    console: true
    verbose: true
mysql:
  enable: true
redis:
  enable: true

config.toml (/var/lib/rancher/rke2/agent/etc/containerd/config.toml)

[plugins.opt]
  path = "/var/lib/rancher/rke2/agent/containerd"

[plugins.cri]
  stream_server_address = "127.0.0.1"
  stream_server_port = "10010"
  enable_selinux = false
  sandbox_image = "index.docker.io/rancher/pause:3.6"

[plugins.cri.containerd]
  disable_snapshot_annotations = true
  snapshotter = "overlayfs"

[plugins.cri.containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

[plugins.cri.registry.mirrors]

[plugins.cri.registry.mirrors."docker.io"]
  endpoint = ["https://xxx.mirror.aliyuncs.com"]

[plugins.cri.registry.mirrors."index.docker.io"]
  endpoint = ["https://xxx.mirror.aliyuncs.com"]

[plugins.cri.registry.mirrors."registry.cn-zhangjiakou.aliyuncs.com"]
  endpoint = ["http://127.0.0.1:65001", "https://registry.cn-zhangjiakou.aliyuncs.com"]
[plugins.cri.registry.configs."registry.cn-qingdao.aliyuncs.com".auth]
  username = "xxx"
  password = "xxx"
[plugins.cri.registry.configs."registry.cn-zhangjiakou.aliyuncs.com".auth]
  username = "xxxx"
  password = "xxxx"

Beforehand, copy an image from Docker Hub to registry.cn-zhangjiakou.aliyuncs.com. Replace xxx with your actual namespace (namespaces can be created at https://cr.console.aliyun.com/cn-zhangjiakou/instance/namespaces).

docker login registry.cn-zhangjiakou.aliyuncs.com
docker pull nacos/nacos-server:v2.0.4
docker tag nacos/nacos-server:v2.0.4 registry.cn-zhangjiakou.aliyuncs.com/xxx/nacos-server:v2.0.4
docker push registry.cn-zhangjiakou.aliyuncs.com/xxx/nacos-server:v2.0.4

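The first pull succeeds, but the daemon log shows a 401, so the pull presumably went through the fallback path. The download finished in 23 seconds.
[Screenshot 1]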

sudo /var/lib/rancher/rke2/bin/crictl --config=/var/lib/rancher/rke2/agent/etc/crictl.yaml pull registry.cn-zhangjiakou.aliyuncs.com/xxx/nacos-server:v2.0.4
Image is up to date for sha256:ea54f31c46e4720fb558829ffa41b2db6c2b6b6502221abee15fed2294bed664

real    0m39.154s
user    0m0.018s
sys     0m0.026s

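After clearing the local image cache and pulling again, the pull loops forever: it neither reports an error nor succeeds, it just hangs.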

sudo /var/lib/rancher/rke2/bin/crictl --config=/var/lib/rancher/rke2/agent/etc/crictl.yaml rmi --prune

sudo /var/lib/rancher/rke2/bin/crictl --config=/var/lib/rancher/rke2/agent/etc/crictl.yaml pull registry.cn-zhangjiakou.aliyuncs.com/xxx/nacos-server:v2.0.4

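The daemon floods its log:
[Screenshot 2]

The CDN node floods its log as well:
[Screenshot 3]

So does the scheduler:
[Screenshot 4]

[Screenshot 5]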

@gaius-qi
Member

Please upload the complete logs from each component.

@anjia0532
Contributor Author

For each component, the logs from both pull attempts are in the same file.

cdn.log
scheduler.log
dfdaemon.log

@gaius-qi
Member

Two problems:

  1. Pulling a private image returns 401.
  2. When the download on the CDN peer fails, the scheduler does not mark that peer as failed.

@anjia0532
Contributor Author

  1. The containerd config.toml needs auth configured for 127.0.0.1:65001, not for the registry. (I suspect containerd selects the auth to pass along by matching the host; this can be confirmed from the containerd log.)
  2. Use v2.0.2-rc.6 or later.
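
For point 1, the change could look roughly like the fragment below. This is a sketch, not a verified configuration: it assumes the suspicion above is right, that is, that containerd picks credentials by the host of the endpoint it actually contacts. The table name for 127.0.0.1:65001 follows the same pattern as the existing registry entries, and keeping the registry's own auth table for the fallback endpoint is an added assumption rather than part of the fix described above.

```toml
# Mirror list unchanged: the local dfdaemon proxy first, the registry as fallback.
[plugins.cri.registry.mirrors."registry.cn-zhangjiakou.aliyuncs.com"]
  endpoint = ["http://127.0.0.1:65001", "https://registry.cn-zhangjiakou.aliyuncs.com"]

# Assumption: credentials are matched against the endpoint host, so the
# dfdaemon address needs its own auth table.
[plugins.cri.registry.configs."127.0.0.1:65001".auth]
  username = "xxxx"
  password = "xxxx"

# Assumption: kept so that a direct fallback pull can still authenticate.
[plugins.cri.registry.configs."registry.cn-zhangjiakou.aliyuncs.com".auth]
  username = "xxxx"
  password = "xxxx"
```

With only the registry's auth table present, as in the original config.toml, requests sent to 127.0.0.1:65001 would carry no credentials under this assumption, which would match the 401 seen in the daemon log.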

This issue has been fixed, so I am closing it.
