Panic on startup (nil dereference) running on Nomad #133

cottand · 2023-08-07T23:59:09Z

I am running this as a CSI plugin on Nomad. I followed this example, except that:

I use Nomad service discovery (not Consul)
The filer uses the leveldb2 store, not Postgres
The master is a single instance

The CSI plugin fails on every Nomad client (any pod), so I think the trace is not specific to the host machine, although all my machines are configured very similarly. The CSI image is on latest; filer, volumes, master, etc. are on 3.55.

Logs:

I0807 23:46:18.075502 driver.go:105 starting
I0807 23:46:18.075881 server.go:94 Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xcbb35e]

goroutine 72 [running]:
github.com/seaweedfs/seaweedfs-csi-driver/pkg/driver.(*ControllerServer).ControllerGetCapabilities(0x0, {0xc000125940?, 0x40da07?}, 0x10?)
	/go/src/github.com/seaweedfs/seaweedfs-csi-driver/pkg/driver/controllerserver.go:179 +0x5e
github.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerGetCapabilities_Handler.func1({0x101cef0, 0xc0003f2ea0}, {0xe3f5e0?, 0xc000446200})
	/go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:6546 +0x78
github.com/seaweedfs/seaweedfs-csi-driver/pkg/driver.logGRPC({0x101cef0, 0xc0003f2ea0}, {0xe3f5e0, 0xc000446200}, 0xc000446220, 0xc0000a8318)
	/go/src/github.com/seaweedfs/seaweedfs-csi-driver/pkg/driver/utils.go:64 +0x132
github.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerGetCapabilities_Handler({0xe6fe20?, 0x0}, {0x101cef0, 0xc0003f2ea0}, 0xc0002a4310, 0xf2ef48)
	/go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:6548 +0x138
google.golang.org/grpc.(*Server).processUnaryRPC(0xc000356000, {0x1021760, 0xc0003fa4e0}, 0xc000336360, 0xc00031b410, 0x168cba8, 0x0)
	/go/pkg/mod/google.golang.org/[email protected]/server.go:1360 +0xe23
google.golang.org/grpc.(*Server).handleStream(0xc000356000, {0x1021760, 0xc0003fa4e0}, 0xc000336360, 0x0)
	/go/pkg/mod/google.golang.org/[email protected]/server.go:1737 +0xa36
google.golang.org/grpc.(*Server).serveStreams.func1.1()
	/go/pkg/mod/google.golang.org/[email protected]/server.go:982 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
	/go/pkg/mod/google.golang.org/[email protected]/server.go:980 +0x18c
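
A note on reading the trace: in a Go traceback the first argument of a method frame is the receiver, so `ControllerGetCapabilities(0x0, ...)` means the handler ran on a nil `*ControllerServer`. A minimal sketch of that failure mode follows. The `ControllerServer` type here, its `caps` field, and the `safeCall` helper are made up for illustration and are not the driver's actual code.

```go
package main

import "fmt"

// ControllerServer is a hypothetical stand-in, not the driver's real type.
type ControllerServer struct {
	caps []string // hypothetical field
}

// GetCapabilities reads a field through the receiver, so it panics
// when the receiver is nil.
func (cs *ControllerServer) GetCapabilities() []string {
	return cs.caps
}

// safeCall invokes GetCapabilities and converts a panic into an error,
// so the demo can report the failure instead of crashing.
func safeCall(cs *ControllerServer) (caps []string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return cs.GetCapabilities(), nil
}

func main() {
	var cs *ControllerServer // never constructed, like the 0x0 receiver in the trace
	_, err := safeCall(cs)
	fmt.Println(err)
}
```

Calling a method on a nil pointer is legal in Go; the panic only happens once the method dereferences the receiver, which is why the crash surfaces on the first RPC rather than at registration time.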

CSI plugin job:

job "seaweedfs-plugin" {
  datacenters = ["dc1"]
  type        = "system"
  update {
    max_parallel = 1
    stagger      = "60s"
  }

  # only one plugin of a given type and ID should be deployed on
  # any given client node
  constraint {
    operator = "distinct_hosts"
    value    = true
  }

  group "nodes" {
    ephemeral_disk {
      migrate = false
      size    = 5000
      sticky  = false
    }
    restart {
      interval = "5m"
      attempts = 10
      delay    = "15s"
      mode     = "delay"
    }
    # does not need to run on a client with seaweed, only needs docker privileged
    task "plugin" {
      driver = "docker"

      template {
        destination = "config/.env"
        change_mode = "restart"
        env         = true
        data        = <<-EOF
{{ range $i, $s := nomadService "seaweedfs-filer-http" }}
{{- if eq $i 0 -}}
SEAWEEDFS_FILER_IP_http={{ .Address }}
SEAWEEDFS_FILER_PORT_http={{ .Port }}
{{- end -}}
{{ end }}
{{ range $i, $s := nomadService "seaweedfs-filer-grpc" }}
{{- if eq $i 0 -}}
SEAWEEDFS_FILER_IP_grpc={{ .Address }}
SEAWEEDFS_FILER_PORT_grpc={{ .Port }}
{{- end -}}
{{ end }}
EOF
      }

      config {
        network_mode = "host"
        image        = "chrislusf/seaweedfs-csi-driver:latest"
        force_pull   = "true"

        args = [
          "--endpoint=unix://csi/csi.sock",
          "--filer=${SEAWEEDFS_FILER_IP_http}:${SEAWEEDFS_FILER_PORT_http}.${SEAWEEDFS_FILER_PORT_grpc}",
          "--nodeid=${node.unique.name}",
          "--cacheCapacityMB=1000",
          "--cacheDir=${NOMAD_TASK_DIR}/cache_dir",
        ]

        privileged = true
      }

      csi_plugin {
        id        = "seaweedfs"
        type      = "monolith"
        mount_dir = "/csi"
      }
      resources {
        cpu        = 100
        memory     = 512
        memory_max = 2048
      }
    }
  }
}

Let me know if I should provide more info.

chrislusf · 2023-08-08T00:25:03Z

cc @kvaster possibly related to recent PRs? Or the doc needs changes?

kvaster · 2023-08-08T05:37:20Z

I'm investigating. It looks really strange.

kvaster · 2023-08-08T05:39:33Z

Yes, it really is related to my changes. I will make one more PR in about 30 minutes. The problem is that I introduced an incompatibility with previous setups. As of now, you should run with either --controller or --node, or with both of them at the same time.

cottand · 2023-08-08T08:21:19Z

If this is the result of a breaking change, I would ideally expect:

guidance on the releases page, possibly with a 'Breaking Changes' section - which I did actually look for!
possibly a minor version bump, depending on when the breaking change happened
an update to the documentation, specifically the Nomad example I was using

thanks!

kvaster · 2023-08-08T08:24:26Z

It was not supposed to be a breaking change. I've made a PR which fixes the problem. Previous installs were supposed to keep working without any changes.
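
One way such options can stay backward compatible, sketched under assumptions: treat "no role flag given" as "run both roles", which matches the old single-process behaviour. The `resolveRoles` helper is hypothetical. Only the `--controller` and `--node` flag names come from this thread, and the actual PR may implement this differently.

```go
package main

import (
	"flag"
	"fmt"
)

// resolveRoles is a hypothetical helper. With neither flag set it
// enables both roles, so setups that pass no role flags keep working.
func resolveRoles(controller, node bool) (runController, runNode bool) {
	if !controller && !node {
		return true, true
	}
	return controller, node
}

func main() {
	// Flag names are taken from the thread; the defaults are an assumption.
	controller := flag.Bool("controller", false, "run the controller server")
	node := flag.Bool("node", false, "run the node server")
	flag.Parse()

	runController, runNode := resolveRoles(*controller, *node)
	fmt.Printf("controller=%v node=%v\n", runController, runNode)
}
```

With this shape, a server that is not selected should also not be registered with the gRPC server, otherwise a call to it would still hit a nil receiver.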

cottand · 2023-08-08T08:26:08Z

I see, no worries then. In that case I would appreciate some docs on what --controller and --node do, and on the other available options.

kvaster · 2023-08-08T08:31:36Z

It was a big refactoring for running the driver in Kubernetes. The controller server should run separately from the node server. The node server is a daemon which runs on every node that can mount seaweedfs, while the controller should just be fail-safe and HA.

kvaster · 2023-08-08T08:32:12Z

It's all about CSI.

cottand · 2023-08-08T08:56:00Z

To achieve the same behaviour as before, can I use both options on all boxes safely? Or will the controllers need to speak to each other, or will that increase gossip somehow?

I.e., is the example Nomad deployment unchanged (I might need to run a controller separately) or do I have better options now, for HA or performance?

Edit: I still get a nil dereference when using both options on my existing setup.

cottand · 2023-08-08T13:11:05Z

@chrislusf you marked this as completed, but in #134 you did not update the Nomad example (but updated the Helm charts). Do the default options for Nomad remain unchanged?

kvaster · 2023-08-13T04:01:18Z

Yes. Default options remain unchanged now - as it should be.

kvaster added a commit to kvaster/seaweedfs-csi-driver that referenced this issue Aug 8, 2023

Refactor options to be backward compatible, fixes seaweedfs#133

21e23d1

kvaster mentioned this issue Aug 8, 2023

Refactor options to be backward compatible #134

Merged

kvaster added a commit to kvaster/seaweedfs-csi-driver that referenced this issue Aug 8, 2023

Refactor options to be backward compatible, fixes seaweedfs#133

6770b37

kvaster added a commit to kvaster/seaweedfs-csi-driver that referenced this issue Aug 8, 2023

Refactor options to be backward compatible, fixes seaweedfs#133

3cdd18a

chrislusf closed this as completed in #134 Aug 8, 2023

chrislusf pushed a commit that referenced this issue Aug 8, 2023

Refactor options to be backward compatible, fixes #133

aed2235
