Server ID not being set when configured under spec.configuration.clusters[].layout.replicas[].settings #1610

Open
dashashutosh24 opened this issue Jan 7, 2025 · 0 comments

@dashashutosh24

Hello, I am facing an issue with the latest CHK (ClickHouseKeeperInstallation) kind resource, using operator version 24.2 and Keeper version 24.8.8.17. Here is the manifest that I am using:

apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
  name: chkeeper
  labels:
    app: clickhouse-keeper
    app.kubernetes.io/version: "24.8.8.17"
    app.kubernetes.io/managed-by: Helm
  annotations:
    meta.helm.sh/release-namespace: default
  namespace: default
spec:
  configuration:
    clusters:
      - name: chkeeper
        layout:
          replicas:
            - settings:
                keeper_server/raft_configuration/server/id: 1
                keeper_server/server_id: 1
    settings:
      logger/level: "trace"
      logger/console: "true"
      listen_host: "0.0.0.0"
      keeper_server/four_letter_word_white_list: "*"
      keeper_server/coordination_settings/raft_logs_level: "information"
      keeper_server/path: "/var/lib/clickhouse-keeper"
      keeper_server/snapshot_storage_path: /var/lib/clickhouse-keeper/snapshots
      keeper_server/log_storage_path: /var/lib/clickhouse-keeper/logs
      # keeper_server/server_id: 1
      prometheus/endpoint: "/metrics"
      prometheus/port: "7000"
      prometheus/metrics: "true"
      prometheus/events: "true"
      prometheus/asynchronous_metrics: "true"
      prometheus/status_info: "false"
  defaults:
    templates:
      # Templates are specified as default for all clusters
      podTemplate: pod
      dataVolumeClaimTemplate: datadir-volume
      serviceTemplate: svc
  templates:
    # set serviceTemplate to generate service with desired spec as part of chk installation
    serviceTemplates:
      - name: svc
        generateName: clickhouse-chk-svc
        metadata:
          labels:
            app: clickhouse-keeper
            app.kubernetes.io/version: "24.8.8.17"
            app.kubernetes.io/managed-by: Helm
          annotations:
            meta.helm.sh/release-namespace: default
            prometheus.io/port: metrics
            prometheus.io/scrape: "true"
        spec:
          type: ClusterIP
          selector:
            app: clickhouse-keeper
          ports:
            - name: client
              port: 2181
            - name: keeper-metrics
              port: 7000
    podTemplates:
      - name: pod
        metadata:
          labels:
            app: clickhouse-keeper
            app.kubernetes.io/version: "24.8.8.17"
            app.kubernetes.io/managed-by: Helm
          annotations:
            meta.helm.sh/release-namespace: default
            prometheus.io/port: '7000'
            prometheus.io/scrape: 'true'
        spec:
          affinity:
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
              - podAffinityTerm:
                  labelSelector:
                    matchLabels:
                      app: clickhouse-keeper
                  topologyKey: kubernetes.io/hostname
                weight: 1
          priorityClassName: app-high-priority
          containers:
            - name: clickhouse-keeper
              imagePullPolicy: IfNotPresent
              image: "clickhouse/clickhouse-keeper:24.8.8.17"
              resources:
                limits:
                  cpu: "0.33"
                  memory: 1Gi
                requests:
                  cpu: "0.33"
                  memory: 1Gi
          securityContext:
            fsGroup: 101
    volumeClaimTemplates:
      - name: datadir-volume
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi

The pod fails to come up with the following error:

2025.01.07 04:37:15.823745 [ 1 ] {} <Error> Application: Code: 568. DB::Exception: Our server id 1 not found in raft_configuration section. (RAFT_ERROR), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000a9229db
1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000005508b6c
2. DB::Exception::Exception<int const&>(int, FormatStringHelperImpl<std::type_identity<int const&>::type>, int const&) @ 0x0000000006027d4b
3. DB::KeeperStateManager::parseServersConfiguration(Poco::Util::AbstractConfiguration const&, bool, bool) const @ 0x000000000c8fcd09
4. DB::KeeperStateManager::KeeperStateManager(int, String const&, String const&, Poco::Util::AbstractConfiguration const&, std::shared_ptr<DB::KeeperContext>) @ 0x000000000c8fe7d9
5. DB::KeeperServer::KeeperServer(std::shared_ptr<DB::KeeperConfigurationAndSettings> const&, Poco::Util::AbstractConfiguration const&, ConcurrentBoundedQueue<DB::KeeperStorageBase::ResponseForSession>&, ConcurrentBoundedQueue<DB::CreateSnapshotTask>&, std::shared_ptr<DB::KeeperContext>, DB::KeeperSnapshotManagerS3&, std::function<void (unsigned long, DB::KeeperStorageBase::RequestForSession const&)>) @ 0x000000000c8779a6
6. DB::KeeperDispatcher::initialize(Poco::Util::AbstractConfiguration const&, bool, bool, std::shared_ptr<DB::Macros const> const&) @ 0x000000000c8605d2
7. DB::Context::initializeKeeperDispatcher(bool) const @ 0x000000000b2a0c38
8. DB::Keeper::main(std::vector<String, std::allocator<String>> const&) @ 0x00000000054fe61f
9. Poco::Util::Application::run() @ 0x000000001046b846
10. DB::Keeper::run() @ 0x00000000054fb990
11. Poco::Util::ServerApplication::run(int, char**) @ 0x0000000010473b47
12. mainEntryClickHouseKeeper(int, char**) @ 0x00000000054fa5f2
13. main @ 0x00000000054f9550
14. ? @ 0x00007f2c5b5da083
15. _start @ 0x0000000004d7002e
 (version 24.8.8.17 (official build))

However, when keeper_server/server_id: 1 is set under spec.configuration.settings, the pod comes up healthy. It appears that keeper_server/raft_configuration/server/id: 1 is taken into account when set under the cluster layout settings, but keeper_server/server_id is not when set in the same place. I verified this by keeping keeper_server/raft_configuration/server/id under spec.configuration.clusters[].layout.replicas[].settings and keeper_server/server_id under spec.configuration.settings (sketch below).
[Screenshot attached: 2025-01-07, 10:12 AM]
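
For reference, this is roughly the split that works in my testing, trimmed down to just the relevant keys from the manifest above (a minimal sketch, not the full manifest):

spec:
  configuration:
    clusters:
      - name: chkeeper
        layout:
          replicas:
            - settings:
                # honored when set per replica
                keeper_server/raft_configuration/server/id: 1
    settings:
      # only honored here, not under replicas[].settings
      keeper_server/server_id: 1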

The requirement is to assign a specific server ID to each replica rather than relying on the default numbering, so this cannot be achieved via spec.configuration.settings and must be done via spec.configuration.clusters[].layout.replicas[].settings (see the sketch below). If not, is there another way to achieve this?
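
For clarity, the shape I am hoping to use is roughly the following. This is only a sketch of the intended configuration (which currently fails as described above); the second replica with server ID 2 is illustrative and not part of the manifest posted earlier:

spec:
  configuration:
    clusters:
      - name: chkeeper
        layout:
          replicas:
            # each replica carries its own explicit server ID
            - settings:
                keeper_server/server_id: 1
                keeper_server/raft_configuration/server/id: 1
            - settings:
                keeper_server/server_id: 2
                keeper_server/raft_configuration/server/id: 2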
