rocksdb_db_options won't apply #309

Closed
wenhaocs opened this issue Oct 3, 2023 · 3 comments
wenhaocs commented Oct 3, 2023

My own simulation (v1.6.3)

(1) I checked that max_background_jobs is at the RocksDB default value of 2.

(2) Then I changed it to 3 via kubectl edit.
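For the record, the change was made roughly like this. The spec.storaged.config field path is my reading of the NebulaCluster spec, so treat it as an assumption:

# sketch only: open the NebulaCluster (namespace and name taken from the operator logs below)
kubectl -n nebula edit nebulacluster nebula

# then set the flag under the storaged config section (field path assumed):
#   spec:
#     storaged:
#       config:
#         rocksdb_db_options: '{"max_background_jobs":"3"}'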

(3) Since this flag is dynamic, it does not trigger a restart. The options of the existing RocksDB instance stay the same.

(4) Now I create a space; the newly created RocksDB instance picks up the new value (works as expected).

(5) I then add a new static flag. Why is there no restart? (Update: because the default value is already 4096.)

When I use curl to check the flags in storaged, the value is updated. But this is not a dynamic flag, so it should have triggered a restart.
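The check is a plain GET against the storaged HTTP port (19779), the same /flags endpoint the operator polls in the logs further down; something like the following, with grep only to narrow the output:

# same /flags endpoint the operator queries on port 19779
curl -s http://nebula-storaged-2.nebula-storaged-headless.nebula.svc.cluster.local:19779/flags | grep rocksdb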

(6) After I update the static flag, a restart is triggered.

I1003 23:01:45.341053       1 nebula_cluster_controller.go:162] Start to reconcile NebulaCluster
I1003 23:01:45.377849       1 storaged_updater.go:185] pod [nebula/nebula-storaged-2] leader count is 0, ready for rolling update
I1003 23:01:45.377925       1 helper.go:231] dynamic flags: map[rocksdb_db_options:{"max_subcompactions":"3","max_background_jobs":"3"} wal_ttl:500]
E1003 23:01:45.380769       1 nebula_cluster_control.go:124] reconcile storaged cluster failed: update storaged cluster nebula-storaged dynamic flags failed: get http://nebula-storaged-2.nebula-storaged-headless.nebula.svc.cluster.local:19779/flags response body is empty
I1003 23:01:45.395742       1 nebulacluster.go:119] NebulaCluster [nebula/nebula] updated successfully
I1003 23:01:45.395768       1 nebula_cluster_controller.go:173] NebulaCluster [nebula/nebula] reconcile details: waiting for nebulacluster ready
E1003 23:01:45.395777       1 nebula_cluster_controller.go:184] NebulaCluster [nebula/nebula] reconcile failed: update storaged cluster nebula-storaged dynamic flags failed: get http://nebula-storaged-2.nebula-storaged-headless.nebula.svc.cluster.local:19779/flags response body is empty
I1003 23:01:45.395781       1 nebula_cluster_controller.go:143] Finished reconciling NebulaCluster [nebula/nebula] (54.860001ms), result: {false 5s}
I1003 23:01:50.396939       1 nebula_cluster_controller.go:162] Start to reconcile NebulaCluster
I1003 23:01:50.503747       1 storaged_updater.go:185] pod [nebula/nebula-storaged-2] leader count is 0, ready for rolling update
I1003 23:01:50.504058       1 helper.go:231] dynamic flags: map[rocksdb_db_options:{"max_subcompactions":"3","max_background_jobs":"3"} wal_ttl:500]
E1003 23:01:50.506426       1 nebula_cluster_control.go:124] reconcile storaged cluster failed: update storaged cluster nebula-storaged dynamic flags failed: get http://nebula-storaged-2.nebula-storaged-headless.nebula.svc.cluster.local:19779/flags response body is empty
I1003 23:01:50.522574       1 nebulacluster.go:119] NebulaCluster [nebula/nebula] updated successfully
I1003 23:01:50.522594       1 nebula_cluster_controller.go:173] NebulaCluster [nebula/nebula] reconcile details: waiting for nebulacluster ready
E1003 23:01:50.522601       1 nebula_cluster_controller.go:184] NebulaCluster [nebula/nebula] reconcile failed: update storaged cluster nebula-storaged dynamic flags failed: get http://nebula-storaged-2.nebula-storaged-headless.nebula.svc.cluster.local:19779/flags response body is empty
I1003 23:01:50.522607       1 nebula_cluster_controller.go:143] Finished reconciling NebulaCluster [nebula/nebula] (125.732283ms), result: {false 5s}
I1003 23:01:55.523212       1 nebula_cluster_controller.go:162] Start to reconcile NebulaCluster
I1003 23:01:55.559036       1 storaged_updater.go:185] pod [nebula/nebula-storaged-2] leader count is 0, ready for rolling update
I1003 23:01:55.559397       1 helper.go:231] dynamic flags: map[rocksdb_db_options:{"max_subcompactions":"3","max_background_jobs":"3"} wal_ttl:500]
E1003 23:01:55.561028       1 nebula_cluster_control.go:124] reconcile storaged cluster failed: update storaged cluster nebula-storaged dynamic flags failed: get http://nebula-storaged-2.nebula-storaged-headless.nebula.svc.cluster.local:19779/flags response body is empty
I1003 23:01:55.576263       1 nebulacluster.go:119] NebulaCluster [nebula/nebula] updated successfully
I1003 23:01:55.576283       1 nebula_cluster_controller.go:173] NebulaCluster [nebula/nebula] reconcile details: waiting for nebulacluster ready
E1003 23:01:55.576290       1 nebula_cluster_controller.go:184] NebulaCluster [nebula/nebula] reconcile failed: update storaged cluster nebula-storaged dynamic flags failed: get http://nebula-storaged-2.nebula-storaged-headless.nebula.svc.cluster.local:19779/flags response body is empty

But the rocksdb_db_options are not actually applied in RocksDB, even though the gflags are set. (BUG)
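To show the mismatch, compare the gflag reported by storaged with the options RocksDB actually logged when opening the DB. The data path below is my assumption about the default storaged layout; adjust it for your deployment:

# 1) what the gflag says (same /flags endpoint as above)
curl -s http://nebula-storaged-2.nebula-storaged-headless.nebula.svc.cluster.local:19779/flags | grep rocksdb_db_options

# 2) what RocksDB actually opened with, taken from its LOG file (path assumed)
kubectl -n nebula exec nebula-storaged-2 -- \
  sh -c 'grep -m1 max_background_jobs /usr/local/nebula/data/storage/nebula/*/data/LOG'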

(7) Thereafter, I remove the static flag I just added. The restart happens, but the rocksdb_db_options value still does not take effect in RocksDB, even though the gflags have been set. (BUG)

User's scenario

What the user was doing was running kubectl apply to update flags. After the update, they saw very high ingestion latency while disk load stayed low, and we saw a lot of write stalls. We then examined the RocksDB LOG and found that the compaction-related options were still at their defaults. It seems rocksdb_db_options only takes effect if it is set before the cluster is created.

Issues

  1. Why can the user still see the correct values in the RocksDB LOG on some storaged instances, while others are using the default values?
  2. Why is rocksdb_db_options not reflected in RocksDB? Note: the flag must already be set when a RocksDB instance is created; if you use curl to modify the flag after the RocksDB instance is created, it will not take effect (see the sketch after this list).
  3. Please make sure all changes via kubectl apply and kubectl edit are equivalent: if people use kubectl apply first and then kubectl edit to update the same static or dynamic flag, the value should still be picked up.
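
On point 2, this is how I understand the hot-update path: the Operator updates the gflag over HTTP after the process is up, but an already-opened RocksDB instance never re-reads it. The PUT method and JSON body below are an assumption about the web-service API; only the GET on /flags appears in the operator logs.

# assumed hot-update call; method and body format are not confirmed by the logs
curl -s -X PUT -H "Content-Type: application/json" \
  -d '{"rocksdb_db_options":"{\"max_subcompactions\":\"3\",\"max_background_jobs\":\"3\"}"}' \
  http://nebula-storaged-2.nebula-storaged-headless.nebula.svc.cluster.local:19779/flags

# the gflag reads back with the new value...
curl -s http://nebula-storaged-2.nebula-storaged-headless.nebula.svc.cluster.local:19779/flags | grep rocksdb_db_options

# ...but RocksDB instances opened before the update keep the options they were created with;
# only spaces created afterwards pick up the new settings.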

Here are the operator logs:
downloaded-logs-20231003-122906.csv

wenhaocs added the type/bug label Oct 3, 2023
github-actions bot added the affects/none and severity/none labels Oct 3, 2023
wenhaocs commented Oct 3, 2023

When testing, please make sure kubectl apply behaves as expected (it is how users update configs).

Sophie-Xie added this to the v1.6.x milestone Oct 4, 2023
MegaByte875 (Contributor) commented

#310

wenhaocs commented Oct 4, 2023

Regarding "Why can the user still see the correct values in the RocksDB LOG on some storaged instances, while others are using the default values?": it must be because some storaged processes restarted. Once a storaged restarts, its RocksDB instances are created with the default value of rocksdb_db_options; the Operator then sends a curl command to update rocksdb_db_options, which does not take effect on RocksDB instances that have already been created. Even on the storaged processes that seem to work fine and do not show high compaction latency, not all RocksDB instances have the right value set: the RocksDB instance for the default space still uses the default value, because it was created before the dynamic flag was set. All other RocksDB instances are created when a space is created, at which point the dynamic flag has already been applied. That is why all RocksDB instances on that storaged appear to pick up the correct value except for one.
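A quick way to see which instances on one storaged picked up the value (same data-path assumption as above) is to grep every space's RocksDB LOG:

# print the effective max_background_jobs per space directory on one storaged pod
kubectl -n nebula exec nebula-storaged-0 -- sh -c \
  'for f in /usr/local/nebula/data/storage/nebula/*/data/LOG; do
     echo "$f: $(grep -m1 max_background_jobs "$f")"
   done'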

The process of updating static and dynamic flags is the same whether it is a first start or a restart: the static flags are always updated first and passed as the startup flags of graphd/metad/storaged. After the whole StatefulSet is in the Running state, the Operator sends curl commands to update the dynamic flags.
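In other words, as I understand the ordering (illustration only, not the Operator's actual code):

# 1. render the static flags into the startup configuration of graphd/metad/storaged
# 2. (re)start the StatefulSet so the processes come up with those static flags
# 3. once every pod is Running, hot-apply the dynamic flags over HTTP, e.g.
#    (PUT method and body format assumed, not confirmed):
curl -s -X PUT -H "Content-Type: application/json" -d '{"wal_ttl":"500"}' \
  http://nebula-storaged-2.nebula-storaged-headless.nebula.svc.cluster.local:19779/flags
# 4. any RocksDB instance already opened in step 2 never re-reads rocksdb_db_options,
#    which is where the values get lost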
