[Scaling-sc] With a 3-replica space, changing the storage replica count to 2 reports an error; after changing it back to 3, the storage ConfigMap content is wrong #268

Closed
jinyingsunny opened this issue Sep 13, 2023 · 2 comments
Labels
affects/none PR/issue: this bug affects none version. process/done Process of bug severity/none Severity of bug type/bug Type: something is unexpected

Comments

@jinyingsunny

As the title says, the ConfigMap and the cluster are now inconsistent:

# kubectl -n nebula get cm nebulazcert-storaged-zone -oyaml
apiVersion: v1
data:
  nebulazcert-storaged: ""
  nebulazcert-storaged-0: us-east-2c
  nebulazcert-storaged-1: us-east-2b
kind: ConfigMap

Output from the console:

(root@nebula) [baske3s]> show hosts
+---------------------------------------------------------------------------------+------+----------+--------------+----------------------------+------------------------------+----------------+
| Host                                                                            | Port | Status   | Leader count | Leader distribution        | Partition distribution       | Version        |
+---------------------------------------------------------------------------------+------+----------+--------------+----------------------------+------------------------------+----------------+
| "nebulazcert-storaged-0.nebulazcert-storaged-headless.nebula.svc.cluster.local" | 9779 | "ONLINE" | 8            | "baske3s:4, baske3s_int:4" | "baske3s:12, baske3s_int:12" | "3.5.0-sc-ent" |
| "nebulazcert-storaged-1.nebulazcert-storaged-headless.nebula.svc.cluster.local" | 9779 | "ONLINE" | 8            | "baske3s:4, baske3s_int:4" | "baske3s:12, baske3s_int:12" | "3.5.0-sc-ent" |
| "nebulazcert-storaged-2.nebulazcert-storaged-headless.nebula.svc.cluster.local" | 9779 | "ONLINE" | 8            | "baske3s:4, baske3s_int:4" | "baske3s:12, baske3s_int:12" | "3.5.0-sc-ent" |
+---------------------------------------------------------------------------------+------+----------+--------------+----------------------------+------------------------------+----------------+
Got 3 rows (time spent 1.756ms/3.070185ms)

Wed, 13 Sep 2023 16:10:56 CST

(root@nebula) [baske3s]> show zones
+--------------+---------------------------------------------------------------------------------+------+
| Name         | Host                                                                            | Port |
+--------------+---------------------------------------------------------------------------------+------+
| "us-east-2a" | "nebulazcert-storaged-2.nebulazcert-storaged-headless.nebula.svc.cluster.local" | 9779 |
| "us-east-2b" | "nebulazcert-storaged-1.nebulazcert-storaged-headless.nebula.svc.cluster.local" | 9779 |
| "us-east-2c" | "nebulazcert-storaged-0.nebulazcert-storaged-headless.nebula.svc.cluster.local" | 9779 |
+--------------+---------------------------------------------------------------------------------+------+
Got 3 rows (time spent 1.858ms/4.44409ms)

Wed, 13 Sep 2023 16:19:27 CST
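
The mismatch between the ConfigMap and the SHOW ZONES output above can be checked mechanically. The following is a minimal sketch, not part of nebula-operator: the helper names `pod_name` and `diff_zone_mapping` are hypothetical, and the inputs are copied by hand from the two outputs above rather than fetched from the cluster.

```python
def pod_name(host: str) -> str:
    """Return the pod name, i.e. the first DNS label of a headless-service FQDN."""
    return host.split(".", 1)[0]


def diff_zone_mapping(cm_data: dict, zone_rows: list) -> list:
    """Compare ConfigMap pod->zone entries with (zone, host) rows from SHOW ZONES."""
    actual = {pod_name(host): zone for zone, host in zone_rows}
    problems = []
    # Every host the cluster knows about should have a matching ConfigMap entry.
    for pod, zone in sorted(actual.items()):
        recorded = cm_data.get(pod)
        if recorded is None:
            problems.append(f"{pod}: missing from ConfigMap (cluster says {zone})")
        elif recorded != zone:
            problems.append(f"{pod}: ConfigMap says {recorded!r}, cluster says {zone!r}")
    # ConfigMap keys that do not correspond to any host are suspicious too.
    for key in sorted(set(cm_data) - set(actual)):
        problems.append(f"{key}: in ConfigMap but not a known storaged host")
    return problems


# Data copied from the ConfigMap and SHOW ZONES output in this issue.
SUFFIX = ".nebulazcert-storaged-headless.nebula.svc.cluster.local"
CM_DATA = {
    "nebulazcert-storaged": "",
    "nebulazcert-storaged-0": "us-east-2c",
    "nebulazcert-storaged-1": "us-east-2b",
}
ZONE_ROWS = [
    ("us-east-2a", "nebulazcert-storaged-2" + SUFFIX),
    ("us-east-2b", "nebulazcert-storaged-1" + SUFFIX),
    ("us-east-2c", "nebulazcert-storaged-0" + SUFFIX),
]

problems = diff_zone_mapping(CM_DATA, ZONE_ROWS)
for line in problems:
    print(line)
```

On this data the check flags two things: nebulazcert-storaged-2 has no entry in the ConfigMap, and the empty-valued key nebulazcert-storaged does not match any host.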

nebula-operator reports errors:

E0913 08:07:04.439568       1 pvc.go:64] get PVC [nebula/storaged-log-nebulazcert-storaged-3] failed: persistentvolumeclaims "storaged-log-nebulazcert-storaged-3" not found
E0913 08:07:04.441791       1 pvc.go:64] get PVC [nebula/storaged-data-nebulazcert-storaged-3] failed: persistentvolumeclaims "storaged-data-nebulazcert-storaged-3" not found

Your Environments (required)

nebula-operator:reg.vesoft-inc.com/cloud-dev/nebula-operator:snap-1.13
pushed time 9/12/23, 10:20 PM

How To Reproduce(required)

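1. A space with 3 replicas exists (3 storaged nodes).
2. Run kubectl -n nebula edit nc nebulazcert and change the storaged replica count from 3 to 2.
3. After checking the data, change the replica count from 2 back to 3.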

Expected behavior
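The zone assignments for storage shown by kubectl -n nebula get cm nebulazcert-storaged-zone -oyaml stay consistent with the actual cluster state.
A change that sets the storage replica count to 2 is rejected immediately as invalid and does not take effect.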
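
The second expectation amounts to a pre-flight check on the requested replica count. The following is a minimal sketch of such a check, assuming the replica factor of each space is already known (for example, read from the space definition). The function name `validate_storaged_replicas` is hypothetical, the replica factors in `SPACES` are assumed from the issue title, and this is not the operator's actual implementation.

```python
def validate_storaged_replicas(desired: int, replica_factors: dict) -> None:
    """Raise ValueError if `desired` storaged replicas cannot hold every space's replicas."""
    # A space with replica factor N needs at least N storage hosts.
    too_big = {name: rf for name, rf in replica_factors.items() if rf > desired}
    if too_big:
        details = ", ".join(
            f"{name} (replica factor {rf})" for name, rf in sorted(too_big.items())
        )
        raise ValueError(
            f"storaged replicas={desired} is below the replica factor of: {details}"
        )


# Space names come from the SHOW HOSTS output; the factor 3 is an assumption
# based on the "3-replica space" in the issue title.
SPACES = {"baske3s": 3, "baske3s_int": 3}

validate_storaged_replicas(3, SPACES)  # accepted, no exception

try:
    validate_storaged_replicas(2, SPACES)
except ValueError as exc:
    print(f"rejected: {exc}")
```

Rejecting the change before any scale-in step starts would leave both the StatefulSet and the zone ConfigMap untouched, which is the behavior the report asks for.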

@jinyingsunny jinyingsunny added the type/bug Type: something is unexpected label Sep 13, 2023
@github-actions github-actions bot added affects/none PR/issue: this bug affects none version. severity/none Severity of bug labels Sep 13, 2023
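@jinyingsunny jinyingsunny changed the title from "[Zone scaling] With a 3-replica space, changing the storage replica count to 2 reports an error; after changing it back to 3, the storage ConfigMap content is wrong" to "[Scaling-sc] With a 3-replica space, changing the storage replica count to 2 reports an error; after changing it back to 3, the storage ConfigMap content is wrong" Sep 13, 2023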
@MegaByte875
Contributor

#272

@jinyingsunny
Author

jinyingsunny commented Sep 14, 2023

Checked on reg.vesoft-inc.com/cloud-dev/nebula-operator:snap-1.14, rebuilt today.

  1. The cluster is not scaled in to 2 nodes; storaged stays in phase: ScaleIn (kubectl -n nebula get cm nebulazcert-graphd-zone -o yaml):
  storaged:
    balancedSpaces:
    - 2
    lastBalanceJob:
      jobID: 8     # the balance job ID
      spaceID: 2   # the space ID
    phase: ScaleIn
    version: v3.5.0-sc
    workload:
      availableReplicas: 32
      collisionCount: 0
      currentReplicas: 32
      currentRevision: nebulazcert-storaged-58cb5cb744
      observedGeneration: 5
      readyReplicas: 32
      replicas: 32
      updateRevision: nebulazcert-storaged-58cb5cb744
      updatedReplicas: 32
  2. The hosts are not removed; the balance data remove job fails:
> show job 8
+------------------------+-------------------+------------+----------------------------+----------------------------+-------------------+
| Job Id(spaceId:partId) | Command(src->dst) | Status     | Start Time                 | Stop Time                  | State             |
+------------------------+-------------------+------------+----------------------------+----------------------------+-------------------+
| 8                      | "DATA_BALANCE"    | "FAILED"   | 2023-09-14T12:37:19.000000 | 2023-09-14T12:37:19.000000 | "E_KEY_NOT_FOUND" |
| "Total:0"              | "Succeeded:0"     | "Failed:0" | "In Progress:0"            | "Invalid:0"                | ""                |
+------------------------+-------------------+------------+----------------------------+----------------------------+-------------------+
Got 2 rows (time spent 2.95ms/4.77751ms)
  3. The operator log shows the failed remove hosts call:
E0914 12:36:16.985392       1 storaged_scaler.go:185] remove hosts [HostAddr({Host:nebulazcert-storaged-35.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-34.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-33.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-32.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-31.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-30.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-29.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-28.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-27.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-26.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-25.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-24.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-23.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-22.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-21.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-20.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-19.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-18.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) 
HostAddr({Host:nebulazcert-storaged-17.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-16.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-15.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-14.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-13.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-12.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-11.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-10.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-9.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-8.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-7.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-6.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-5.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-4.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-3.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779}) HostAddr({Host:nebulazcert-storaged-2.nebulazcert-storaged-headless.nebula.svc.cluster.local Port:9779})] failed: Balance job still in progress, jobID 8, spaceID 2

@github-actions github-actions bot added the process/fixed Process of bug label Sep 14, 2023
@jinyingsunny jinyingsunny added process/done Process of bug and removed process/fixed Process of bug labels Sep 14, 2023