Server ID not being set when configured under spec.configuration.clusters[].layout.replicas[].settings #1610

Open
dashashutosh24 opened this issue Jan 7, 2025 · 0 comments

@dashashutosh24

Hello, I am facing an issue with the latest CHK (ClickHouseKeeperInstallation) kind resource, using operator version 24.2 and Keeper version 24.8.8.17. Here is the manifest that I am using:

apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
  name: chkeeper
  labels:
    app: clickhouse-keeper
    app.kubernetes.io/version: "24.8.8.17"
    app.kubernetes.io/managed-by: Helm
  annotations:
    meta.helm.sh/release-namespace: default
  namespace: default
spec:
  configuration:
    clusters:
      - name: chkeeper
        layout:
          replicas:
            - settings:
                keeper_server/raft_configuration/server/id: 1
                keeper_server/server_id: 1
    settings:
      logger/level: "trace"
      logger/console: "true"
      listen_host: "0.0.0.0"
      keeper_server/four_letter_word_white_list: "*"
      keeper_server/coordination_settings/raft_logs_level: "information"
      keeper_server/path: "/var/lib/clickhouse-keeper"
      keeper_server/snapshot_storage_path: /var/lib/clickhouse-keeper/snapshots
      keeper_server/log_storage_path: /var/lib/clickhouse-keeper/logs
      # keeper_server/server_id: 1
      prometheus/endpoint: "/metrics"
      prometheus/port: "7000"
      prometheus/metrics: "true"
      prometheus/events: "true"
      prometheus/asynchronous_metrics: "true"
      prometheus/status_info: "false"
  defaults:
    templates:
      # Templates are specified as default for all clusters
      podTemplate: pod
      dataVolumeClaimTemplate: datadir-volume
      serviceTemplate: svc
  templates:
    # set serviceTemplate to generate service with desired spec as part of chk installation
    serviceTemplates:
      - name: svc
        generateName: clickhouse-chk-svc
        metadata:
          labels:
            app: clickhouse-keeper
            app.kubernetes.io/version: "24.8.8.17"
            app.kubernetes.io/managed-by: Helm
          annotations:
            meta.helm.sh/release-namespace: default
            prometheus.io/port: metrics
            prometheus.io/scrape: "true"
        spec:
          type: ClusterIP
          selector:
            app: clickhouse-keeper
          ports:
            - name: client
              port: 2181
            - name: keeper-metrics
              port: 7000
    podTemplates:
      - name: pod
        metadata:
          labels:
            app: clickhouse-keeper
            app.kubernetes.io/version: "24.8.8.17"
            app.kubernetes.io/managed-by: Helm
          annotations:
            meta.helm.sh/release-namespace: default
            prometheus.io/port: '7000'
            prometheus.io/scrape: 'true'
        spec:
          affinity:
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
              - podAffinityTerm:
                  labelSelector:
                    matchLabels:
                      app: clickhouse-keeper
                  topologyKey: kubernetes.io/hostname
                weight: 1
          priorityClassName: app-high-priority
          containers:
            - name: clickhouse-keeper
              imagePullPolicy: IfNotPresent
              image: "clickhouse/clickhouse-keeper:24.8.8.17"
              resources:
                limits:
                  cpu: "0.33"
                  memory: 1Gi
                requests:
                  cpu: "0.33"
                  memory: 1Gi
          securityContext:
            fsGroup: 101
    volumeClaimTemplates:
      - name: datadir-volume
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi

The pod fails to come up with the following error:

2025.01.07 04:37:15.823745 [ 1 ] {} <Error> Application: Code: 568. DB::Exception: Our server id 1 not found in raft_configuration section. (RAFT_ERROR), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000a9229db
1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000005508b6c
2. DB::Exception::Exception<int const&>(int, FormatStringHelperImpl<std::type_identity<int const&>::type>, int const&) @ 0x0000000006027d4b
3. DB::KeeperStateManager::parseServersConfiguration(Poco::Util::AbstractConfiguration const&, bool, bool) const @ 0x000000000c8fcd09
4. DB::KeeperStateManager::KeeperStateManager(int, String const&, String const&, Poco::Util::AbstractConfiguration const&, std::shared_ptr<DB::KeeperContext>) @ 0x000000000c8fe7d9
5. DB::KeeperServer::KeeperServer(std::shared_ptr<DB::KeeperConfigurationAndSettings> const&, Poco::Util::AbstractConfiguration const&, ConcurrentBoundedQueue<DB::KeeperStorageBase::ResponseForSession>&, ConcurrentBoundedQueue<DB::CreateSnapshotTask>&, std::shared_ptr<DB::KeeperContext>, DB::KeeperSnapshotManagerS3&, std::function<void (unsigned long, DB::KeeperStorageBase::RequestForSession const&)>) @ 0x000000000c8779a6
6. DB::KeeperDispatcher::initialize(Poco::Util::AbstractConfiguration const&, bool, bool, std::shared_ptr<DB::Macros const> const&) @ 0x000000000c8605d2
7. DB::Context::initializeKeeperDispatcher(bool) const @ 0x000000000b2a0c38
8. DB::Keeper::main(std::vector<String, std::allocator<String>> const&) @ 0x00000000054fe61f
9. Poco::Util::Application::run() @ 0x000000001046b846
10. DB::Keeper::run() @ 0x00000000054fb990
11. Poco::Util::ServerApplication::run(int, char**) @ 0x0000000010473b47
12. mainEntryClickHouseKeeper(int, char**) @ 0x00000000054fa5f2
13. main @ 0x00000000054f9550
14. ? @ 0x00007f2c5b5da083
15. _start @ 0x0000000004d7002e
 (version 24.8.8.17 (official build))

However, when keeper_server/server_id: 1 is set under spec.configuration.settings, the pod comes up healthy. It appears that keeper_server/raft_configuration/server/id: 1 is taken into account when set under the cluster layout settings, but keeper_server/server_id is not when set in the same place. I verified this by keeping keeper_server/raft_configuration/server/id under spec.configuration.clusters[].layout.replicas[].settings and keeper_server/server_id under spec.configuration.settings (sketch below).
[Screenshot attached: 2025-01-07, 10:12 AM]
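
For reference, this is roughly the split that works in my testing, trimmed down to just the relevant keys from the manifest above (a minimal sketch, not the full manifest):

spec:
  configuration:
    clusters:
      - name: chkeeper
        layout:
          replicas:
            - settings:
                # honored when set per replica
                keeper_server/raft_configuration/server/id: 1
    settings:
      # only honored here, not under replicas[].settings
      keeper_server/server_id: 1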

The requirement is to assign a specific server ID to each replica rather than relying on the default numbering, so this cannot be achieved via spec.configuration.settings and must be done via spec.configuration.clusters[].layout.replicas[].settings (see the sketch below). If not, is there another way to achieve this?
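
For clarity, the shape I am hoping to use is roughly the following. This is only a sketch of the intended configuration (which currently fails as described above); the second replica with server ID 2 is illustrative and not part of the manifest posted earlier:

spec:
  configuration:
    clusters:
      - name: chkeeper
        layout:
          replicas:
            # each replica carries its own explicit server ID
            - settings:
                keeper_server/server_id: 1
                keeper_server/raft_configuration/server/id: 1
            - settings:
                keeper_server/server_id: 2
                keeper_server/raft_configuration/server/id: 2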
