[mtls_br] enable_meta_ssl,br restore failed #263

Closed
jinyingsunny opened this issue Sep 8, 2023 · 3 comments
Labels: affects/master, process/fixed, ready-for-testing, severity/none, type/bug, WeBank

Comments

@jinyingsunny

As the title says. The error is as follows:

root@k8s-master:/home/sunny.liu/k8s_file/br_bakeup# kubectl -n nebula apply -f restore.yaml
nebularestore.apps.nebula-graph.io/restore1 created

root@k8s-master:/home/sunny.liu/k8s_file# kubectl -n nebula get rt
NAME       STATUS   STARTED   COMPLETED   AGE
restore1            91m                   91m

root@k8s-master:/home/sunny.liu/k8s_file# kubectl -n nebula get rt restore1 -o yaml
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaRestore
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps.nebula-graph.io/v1alpha1","kind":"NebulaRestore","metadata":{"annotations":{},"name":"restore1","namespace":"nebula"},"spec":{"br":{"backupName":"BACKUP_2023_09_08_09_52_32","clusterName":"nebulazcert","concurrency":5,"s3":{"bucket":"test-qa","endpoint":"http://192.168.8.202:32000","region":"us-west-2","secretName":"aws-s3-secret"}}}}
  creationTimestamp: "2023-09-08T10:45:59Z"
  generation: 1
  name: restore1
  namespace: nebula
  resourceVersion: "11566839"
  uid: 1abdc0de-9eb5-414f-ada8-3ba5371a3da7
spec:
  br:
    backupName: BACKUP_2023_09_08_09_52_32
    clusterName: nebulazcert
    concurrency: 5
    s3:
      bucket: test-qa
      endpoint: http://192.168.8.202:32000
      region: us-west-2
      secretName: aws-s3-secret
status:
  clusterName: ng9mrm
  conditions:
  - lastTransitionTime: "2023-09-08T10:47:20Z"
    message: 'restore metad service ng9mrm-metad-0.ng9mrm-metad-headless.nebula.svc.cluster.local:9559
      failed: EOF'
    reason: ExecuteFailed
    status: "True"
    type: Failed
  timeStarted: "2023-09-08T10:45:59Z"

Restore configuration:

apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaRestore
metadata:
  name: restore1
  namespace: nebula-sc
spec:
  br:
    clusterName: nebula
    backupName: "BACKUP_2023_08_10_10_08_44"
    concurrency: 5
    s3:
      region: "us-west-2"
      bucket: "test-qa"
      endpoint: "http://192.168.8.202:32000"
      secretName: "aws-s3-secret"

Nebula cluster configuration:

apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  annotations:
  creationTimestamp: "2023-09-05T11:37:16Z"
  generation: 14
  name: nebulazcert
  namespace: nebula
  resourceVersion: "11556204"
  uid: a09b9b3a-56ae-4648-aac1-12ae44513178
spec:
  agent:
    image: reg.vesoft-inc.com/cloud-dev/nebula-agent
    version: v3.5.1-sc
  alpineImage: reg.vesoft-inc.com/cloud-dev/nebula-alpine:latest
  enableBR: true
  enablePVReclaim: true
  exporter:
    image: vesoft/nebula-stats-exporter
    maxRequests: 20
    replicas: 1
    version: latest
  graphd:
    config:
      accept_partial_success: "true"
      ca_client_path: certs/root.crt
      ca_path: certs/root.crt
      cert_path: certs/server.crt
      enable_intra_zone_routing: "true"
      enable_meta_ssl: "true"
      key_path: certs/server.key
      logtostderr: "1"
      redirect_stdout: "false"
      stderrthreshold: "0"
      stick_to_intra_zone_on_failure: "true"
      timestamp_in_logfile_name: "false"
    image: reg.vesoft-inc.com/rc/nebula-graphd-ent
    initContainers:
    - args:
      - cp /certs/* /credentials/
      command:
      - /bin/sh
      - -c
      image: reg.vesoft-inc.com/cloud-dev/nebula-certs:v0.1
      imagePullPolicy: Always
      name: init-auth-sidecar
      volumeMounts:
      - mountPath: /credentials
        name: credentials
    replicas: 3
    resources:
      limits:
        cpu: "1"
        memory: 500Mi
      requests:
        cpu: 200m
        memory: 400Mi
    sidecarContainers:
    - image: reg.vesoft-inc.com/cloud-dev/nebula-certs:latest
      imagePullPolicy: Always
      name: auth-sidecar
      volumeMounts:
      - mountPath: /credentials
        name: credentials
    version: v3.5.0-sc
    volumeMounts:
    - mountPath: /usr/local/nebula/certs
      name: credentials
    volumes:
    - emptyDir:
        medium: Memory
      name: credentials
  imagePullPolicy: Always
  imagePullSecrets:
  - name: image-nebula-ent-sc-secret
  metad:
    config:
      ca_client_path: certs/root.crt
      ca_path: certs/root.crt
      cert_path: certs/server.crt
      enable_meta_ssl: "true"
      key_path: certs/server.key
      timestamp_in_logfile_name: "false"
      v: "2"
      zone_list: us-east-2a,us-east-2b,us-east-2c
    dataVolumeClaim:
      resources:
        requests:
          storage: 1Gi
      storageClassName: local-path
    image: reg.vesoft-inc.com/rc/nebula-metad-ent
    initContainers:
    - args:
      - cp /certs/* /credentials/
      command:
      - /bin/sh
      image: reg.vesoft-inc.com/cloud-dev/nebula-certs:v0.1
      imagePullPolicy: Always
      name: init-auth-sidecar
      volumeMounts:
      - mountPath: /credentials
        name: credentials
    licenseManagerURL: nebula-license-manager.nebula-license-manager.svc.cluster.local:9119
    logVolumeClaim:
      resources:
        requests:
          storage: 1Gi
      storageClassName: local-path
    replicas: 1
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 300m
        memory: 500Mi
    sidecarContainers:
    - image: reg.vesoft-inc.com/cloud-dev/nebula-certs:latest
      imagePullPolicy: Always
      name: auth-sidecar
      volumeMounts:
      - mountPath: /credentials
        name: credentials
    version: v3.5.0-sc
    volumeMounts:
    - mountPath: /usr/local/nebula/certs
      name: credentials
    volumes:
    - emptyDir:
        medium: Memory
      name: credentials
  nodeSelector:
    nebula: cloud
  reference:
    name: statefulsets.apps
    version: v1
  schedulerName: default-scheduler
  sslCerts:
    caCert: tls.crt
    caSecret: ca-s1
    clientCACert: ca.crt
    clientCert: tls.crt
    clientKey: tls.key
    clientSecret: client-s1
    insecureSkipVerify: true
    serverCert: tls.crt
    serverKey: tls.key
  storaged:
    config:
      ca_client_path: certs/root.crt
      ca_path: certs/root.crt
      cert_path: certs/server.crt
      enable_meta_ssl: "true"
      key_path: certs/server.key
      timestamp_in_logfile_name: "false"
    dataVolumeClaims:
    - resources:
        requests:
          storage: 1Gi
      storageClassName: local-path
    enableAutoBalance: true
    image: reg.vesoft-inc.com/rc/nebula-storaged-ent
    initContainers:
    - args:
      - cp /certs/* /credentials/
      command:
      - /bin/sh
      - -c
      image: reg.vesoft-inc.com/cloud-dev/nebula-certs:v0.1
      imagePullPolicy: Always
      name: init-auth-sidecar
      volumeMounts:
      - mountPath: /credentials
        name: credentials
    logVolumeClaim:
      resources:
        requests:
          storage: 1Gi
      storageClassName: local-path
    replicas: 12
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 300m
        memory: 500Mi
    sidecarContainers:
    - image: reg.vesoft-inc.com/cloud-dev/nebula-certs:latest
      imagePullPolicy: Always
      name: auth-sidecar
      volumeMounts:
      - mountPath: /credentials
        name: credentials
    version: v3.5.0-sc
    volumeMounts:
    - mountPath: /usr/local/nebula/certs
      name: credentials
    volumes:
    - emptyDir:
        medium: Memory
      name: credentials
  topologySpreadConstraints:
  - topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
status:
  conditions:
  - lastTransitionTime: "2023-09-08T09:39:13Z"
    lastUpdateTime: "2023-09-08T09:39:13Z"
    message: Nebula cluster is running
    reason: Ready
    status: "True"
    type: Ready
  graphd:
    phase: Running
    version: v3.5.0-sc
    workload:
      availableReplicas: 3
      collisionCount: 0
      currentReplicas: 3
      currentRevision: nebulazcert-graphd-779b944956
      observedGeneration: 33
      readyReplicas: 3
      replicas: 3
      updateRevision: nebulazcert-graphd-779b944956
      updatedReplicas: 3
  metad:
    phase: Running
    version: v3.5.0-sc
    workload:
      availableReplicas: 1
      collisionCount: 0
      currentReplicas: 1
      currentRevision: nebulazcert-metad-7f78dd6fb7
      observedGeneration: 13
      readyReplicas: 1
      replicas: 1
      updateRevision: nebulazcert-metad-7f78dd6fb7
      updatedReplicas: 1
  observedGeneration: 14
  storaged:
    phase: Update
    version: v3.5.0-sc
    workload:
      availableReplicas: 12
      collisionCount: 0
      currentReplicas: 12
      currentRevision: nebulazcert-storaged-75f5d8cb6c
      observedGeneration: 43
      readyReplicas: 12
      replicas: 12
      updateRevision: nebulazcert-storaged-75f5d8cb6c
      updatedReplicas: 12
  version: 3.5.0-sc-ent

Your Environments (required)

operator: reg.vesoft-inc.com/cloud-dev/nebula-operator:snap-1.12

jinyingsunny added the type/bug and affects/master labels on Sep 8, 2023
github-actions bot added the severity/none label on Sep 8, 2023
jinyingsunny added this to the v1.6.x milestone on Sep 19, 2023
kqzh self-assigned this on Sep 27, 2023
@kqzh

kqzh commented Oct 9, 2023

Could not reproduce; the restore succeeded.
[image]

Known issue:
[image]

MuYiYong changed the milestone from v1.6.x to v3.6.0 on Oct 12, 2023
Sophie-Xie changed the milestone from v3.6.0 to v1.7.x on Oct 12, 2023
MegaByte875 changed the milestone from v1.7.x to v1.8.x on Oct 16, 2023
@jinyingsunny

Update (Oct 24): the problem recurred; @kqzh is following up.

kqzh mentioned this issue on Oct 25, 2023 (3 tasks)
@kqzh

kqzh commented Oct 25, 2023

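> Update (Oct 24): the problem recurred; @kqzh is following up.

Root cause: when the operator connects to meta and finds that the node is not the leader, it triggers a reconnect. The reconnect did not carry the TLS credentials, so the connection failed, which in turn made the restore fail. This has been fixed.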
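
To make the failure mode concrete (a reconnect after a leader redirect that drops the TLS settings), here is a minimal, self-contained sketch. It is not the operator's code: `MetaClient`, `NotLeaderError`, `FakeConn`, `make_dialer`, and the `metad-0:9559` / `metad-1:9559` addresses are all hypothetical names made up for illustration, and the fake connection simply raises `ConnectionError("EOF")` when dialed without a TLS context, loosely mimicking the `EOF` seen in the restore status. The point it illustrates is that the reconnect path should reuse the TLS configuration stored on the client instead of dialing with defaults.

```python
import ssl
from dataclasses import dataclass
from typing import Callable, Optional


class NotLeaderError(Exception):
    """Raised by a (hypothetical) meta node that is not the leader."""

    def __init__(self, leader_addr: str):
        super().__init__(f"not leader, leader is {leader_addr}")
        self.leader_addr = leader_addr


# A dialer takes (address, tls_context) and returns a connection object.
Dialer = Callable[[str, Optional[ssl.SSLContext]], object]


@dataclass
class MetaClient:
    addr: str
    tls_context: Optional[ssl.SSLContext]
    dial: Dialer
    conn: object = None

    def connect(self) -> None:
        # Always dial with the stored TLS context, on first connect and on reconnect.
        self.conn = self.dial(self.addr, self.tls_context)

    def call(self, request: str, max_redirects: int = 3):
        if self.conn is None:
            self.connect()
        for _ in range(max_redirects + 1):
            try:
                return self.conn.send(request)
            except NotLeaderError as e:
                # Follow the redirect; connect() reuses self.tls_context.
                self.addr = e.leader_addr
                self.connect()
        raise RuntimeError("too many leader redirects")


class FakeConn:
    """Stand-in for an mTLS-only server: plaintext dials fail with EOF."""

    def __init__(self, addr: str, tls_context: Optional[ssl.SSLContext], leader: str):
        if tls_context is None:
            raise ConnectionError("EOF")
        self.addr = addr
        self.leader = leader

    def send(self, request: str) -> str:
        if self.addr != self.leader:
            raise NotLeaderError(self.leader)
        return f"ok: {request}"


def make_dialer(leader: str, log: list) -> Dialer:
    """Build a dialer that records each dial as (address, used_tls)."""

    def dial(addr: str, tls_context: Optional[ssl.SSLContext]) -> FakeConn:
        log.append((addr, tls_context is not None))
        return FakeConn(addr, tls_context, leader)

    return dial
```

With this shape, a client that first reaches a follower is redirected to the leader and dials it with the same `ssl.SSLContext`. A reconnect path that built a fresh connection without the context would instead hit the `ConnectionError("EOF")` branch in `FakeConn`.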
