Panic on startup (nil dereference) running on Nomad #133

Closed
cottand opened this issue Aug 7, 2023 · 11 comments · Fixed by #134

Comments

@cottand

cottand commented Aug 7, 2023

I am running this as a CSI plugin to Nomad. I followed this example, except

  • I use Nomad service discovery (not consul)
  • Filer is using leveldb2 store not Postgres
  • Master is single instance

The CSI plugin fails on every Nomad client (any pod), so I think the trace is not specific to the host machine, although all my machines are configured very similarly. The CSI image is on latest; filer, volumes, master, etc. are on 3.55.

Logs:

I0807 23:46:18.075502 driver.go:105 starting
I0807 23:46:18.075881 server.go:94 Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xcbb35e]

goroutine 72 [running]:
github.com/seaweedfs/seaweedfs-csi-driver/pkg/driver.(*ControllerServer).ControllerGetCapabilities(0x0, {0xc000125940?, 0x40da07?}, 0x10?)
	/go/src/github.com/seaweedfs/seaweedfs-csi-driver/pkg/driver/controllerserver.go:179 +0x5e
github.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerGetCapabilities_Handler.func1({0x101cef0, 0xc0003f2ea0}, {0xe3f5e0?, 0xc000446200})
	/go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:6546 +0x78
github.com/seaweedfs/seaweedfs-csi-driver/pkg/driver.logGRPC({0x101cef0, 0xc0003f2ea0}, {0xe3f5e0, 0xc000446200}, 0xc000446220, 0xc0000a8318)
	/go/src/github.com/seaweedfs/seaweedfs-csi-driver/pkg/driver/utils.go:64 +0x132
github.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerGetCapabilities_Handler({0xe6fe20?, 0x0}, {0x101cef0, 0xc0003f2ea0}, 0xc0002a4310, 0xf2ef48)
	/go/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/csi.pb.go:6548 +0x138
google.golang.org/grpc.(*Server).processUnaryRPC(0xc000356000, {0x1021760, 0xc0003fa4e0}, 0xc000336360, 0xc00031b410, 0x168cba8, 0x0)
	/go/pkg/mod/google.golang.org/[email protected]/server.go:1360 +0xe23
google.golang.org/grpc.(*Server).handleStream(0xc000356000, {0x1021760, 0xc0003fa4e0}, 0xc000336360, 0x0)
	/go/pkg/mod/google.golang.org/[email protected]/server.go:1737 +0xa36
google.golang.org/grpc.(*Server).serveStreams.func1.1()
	/go/pkg/mod/google.golang.org/[email protected]/server.go:982 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
	/go/pkg/mod/google.golang.org/[email protected]/server.go:980 +0x18c
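
Reading the top frame: the first argument to (*ControllerServer).ControllerGetCapabilities is 0x0, which means the method ran on a nil receiver. A minimal Go sketch of that failure mode follows. The controllerServer type and callSafely helper are hypothetical stand-ins for illustration, not the driver's real code.

```go
package main

import "fmt"

// controllerServer is a hypothetical stand-in for the driver's ControllerServer.
type controllerServer struct {
	caps []string
}

// getCapabilities reads a field through the receiver, so it panics
// with a nil pointer dereference when cs is nil.
func (cs *controllerServer) getCapabilities() []string {
	return cs.caps
}

// callSafely invokes getCapabilities and converts a panic into an error,
// so the failure can be observed without crashing the process.
func callSafely(cs *controllerServer) (caps []string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return cs.getCapabilities(), nil
}

func main() {
	var cs *controllerServer // declared but never constructed, so it is nil
	_, err := callSafely(cs)
	fmt.Println(err) // prints the recovered runtime error
}
```

Calling a method on a nil pointer is legal in Go; the panic only happens once the method touches a field, which matches a crash inside the handler body rather than at registration time.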

CSI plugin job:

job "seaweedfs-plugin" {
  datacenters = ["dc1"]
  type        = "system"
  update {
    max_parallel = 1
    stagger      = "60s"
  }

  # only one plugin of a given type and ID should be deployed on
  # any given client node
  constraint {
    operator = "distinct_hosts"
    value    = true
  }

  group "nodes" {
    ephemeral_disk {
      migrate = false
      size    = 5000
      sticky  = false
    }
    restart {
      interval = "5m"
      attempts = 10
      delay    = "15s"
      mode     = "delay"
    }
    # does not need to run on a client with seaweed, only needs docker privileged
    task "plugin" {
      driver = "docker"

      template {
        destination = "config/.env"
        change_mode = "restart"
        env         = true
        data        = <<-EOF
{{ range $i, $s := nomadService "seaweedfs-filer-http" }}
{{- if eq $i 0 -}}
SEAWEEDFS_FILER_IP_http={{ .Address }}
SEAWEEDFS_FILER_PORT_http={{ .Port }}
{{- end -}}
{{ end }}
{{ range $i, $s := nomadService "seaweedfs-filer-grpc" }}
{{- if eq $i 0 -}}
SEAWEEDFS_FILER_IP_grpc={{ .Address }}
SEAWEEDFS_FILER_PORT_grpc={{ .Port }}
{{- end -}}
{{ end }}
EOF
      }

      config {
        network_mode = "host"
        image        = "chrislusf/seaweedfs-csi-driver:latest"
        force_pull   = "true"

        args = [
          "--endpoint=unix://csi/csi.sock",
          "--filer=${SEAWEEDFS_FILER_IP_http}:${SEAWEEDFS_FILER_PORT_http}.${SEAWEEDFS_FILER_PORT_grpc}",
          "--nodeid=${node.unique.name}",
          "--cacheCapacityMB=1000",
          "--cacheDir=${NOMAD_TASK_DIR}/cache_dir",
        ]

        privileged = true
      }

      csi_plugin {
        id        = "seaweedfs"
        type      = "monolith"
        mount_dir = "/csi"
      }
      resources {
        cpu        = 100
        memory     = 512
        memory_max = 2048
      }
    }
  }
}

Let me know if I should provide more info.

@chrislusf
Contributor

chrislusf commented Aug 8, 2023

cc @kvaster possibly related to recent PRs? Or the doc needs changes?

@kvaster
Contributor

kvaster commented Aug 8, 2023

I'm investigating. It looks really strange.

@kvaster
Contributor

kvaster commented Aug 8, 2023

Yes, it really is related to my changes. I will make one more PR in about 30 minutes. The problem is that I introduced an incompatibility with previous setups: from now on you have to run with --controller, --node, or both at the same time.

kvaster added a commit to kvaster/seaweedfs-csi-driver that referenced this issue Aug 8, 2023
@cottand
Author

cottand commented Aug 8, 2023

If this is the result of a breaking change, I would ideally expect:

  • guidance on the releases page, possibly with a 'Breaking Changes' section - which I did actually look for!
  • possibly a minor version bump, depending on when the breaking change happened
  • an update to the documentation, specifically, the Nomad example I was using

thanks!

@kvaster
Contributor

kvaster commented Aug 8, 2023

It was not supposed to be a breaking change. I've made a PR which fixes the problem. Previous installs were meant to keep working without any changes.

@cottand
Author

cottand commented Aug 8, 2023

I see, no worries then. In that case I would appreciate some docs on what --controller and --node do, and on the other available options.

@kvaster
Contributor

kvaster commented Aug 8, 2023

It was a big refactoring for running the driver in Kubernetes. The controller server should run separately from the node server. The node server is a daemon that runs on every node that can mount SeaweedFS, while the controller should just be fail-safe and HA.
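
To make the split concrete, here is a rough Go sketch of how flags such as --controller and --node could gate which servers get constructed, so that a role that was never built is never registered. Every name below (parseRoles, newDriver, the driver struct) is hypothetical, and the default of true for both flags is an assumption chosen so that a setup passing neither flag behaves like a monolith. This is not the driver's actual code, nor the fix in #134.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// Hypothetical placeholders for the two CSI roles.
type controllerServer struct{ role string }
type nodeServer struct{ role string }

// driver holds only the servers that were enabled; a disabled role stays nil.
type driver struct {
	controller *controllerServer
	node       *nodeServer
}

// parseRoles reads the two role flags. Both default to true here (an assumption),
// so callers that pass neither flag still get controller and node together.
func parseRoles(args []string) (runController, runNode bool, err error) {
	fs := flag.NewFlagSet("csi-driver", flag.ContinueOnError)
	fs.BoolVar(&runController, "controller", true, "run the controller server")
	fs.BoolVar(&runNode, "node", true, "run the node server")
	err = fs.Parse(args)
	return runController, runNode, err
}

// newDriver constructs only the enabled roles and rejects an empty selection.
func newDriver(runController, runNode bool) (*driver, error) {
	if !runController && !runNode {
		return nil, errors.New("enable at least one of --controller or --node")
	}
	d := &driver{}
	if runController {
		d.controller = &controllerServer{role: "controller"}
	}
	if runNode {
		d.node = &nodeServer{role: "node"}
	}
	return d, nil
}

// roles lists the servers that are safe to register, skipping nil ones.
func (d *driver) roles() []string {
	var out []string
	if d.controller != nil {
		out = append(out, d.controller.role)
	}
	if d.node != nil {
		out = append(out, d.node.role)
	}
	return out
}

func main() {
	runController, runNode, err := parseRoles(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	d, err := newDriver(runController, runNode)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("serving roles:", d.roles())
}
```

In this sketch, a node-only daemon would be started with --controller=false, and a standalone controller with --node=false.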

@kvaster
Contributor

kvaster commented Aug 8, 2023

It's all about CSI.

@cottand
Author

cottand commented Aug 8, 2023

To achieve the same behaviour as before, can I safely use both options on all boxes? Or will the controllers need to speak to each other, or will that increase gossip somehow?

I.e., is the example Nomad deployment unchanged (I might need to run a controller separately), or do I have better options now for HA or performance?

Edit: I still get the nil dereference when using both options on my existing setup.

@cottand
Author

cottand commented Aug 8, 2023

@chrislusf you marked this as completed, but in #134 you did not update the Nomad example (only the helm charts). Do the default options for Nomad remain unchanged?

@kvaster
Contributor

kvaster commented Aug 13, 2023

Yes. Default options remain unchanged now, as they should be.


3 participants