installation is failed #308

Closed
sarwanjassi opened this issue Nov 2, 2023 · 25 comments
@sarwanjassi

OS: Fedora release 37 (Thirty Seven)
Kernel: Linux dcmir25ds21-01 6.5.8-100.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Oct 20 16:11:27 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
SELinux status: disabled

Installation fails with the following error. Please advise.
[root@ ceph-nvmeof]# make setup
Setup core dump pattern as /tmp/coredump/core.*
mkdir -p /tmp/coredump
sudo bash -c 'echo "|/usr/bin/env tee /tmp/coredump/core.%e.%p.%h.%t" > /proc/sys/kernel/core_pattern'
sudo bash -c 'echo 2048 > "/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages"'
Actual Hugepages allocation: 2048
[root@ ceph-nvmeof]# make pull
docker-compose pull spdk bdevperf nvmeof nvmeof-devel nvmeof-cli discovery ceph
Pulling spdk ... done
Pulling bdevperf ... done
Pulling ceph ... done
Pulling nvmeof ... done
Pulling discovery ... done
Pulling nvmeof-devel ... done
Pulling nvmeof-cli ... done
WARNING: Some service image(s) must be built from source by running:
docker-compose build nvmeof-devel bdevperf discovery nvmeof nvmeof-cli
[root@ ceph-nvmeof]# docker-compose build nvmeof-devel bdevperf discovery nvmeof nvmeof-cli
Building bdevperf
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
Install the buildx component to build images with BuildKit:
https://docs.docker.com/go/buildx/

Sending build context to Docker daemon 142.6MB
Step 1/41 : FROM quay.io/centos/centos:stream9 AS build
---> 3e44e66a723b
Step 2/41 : ARG SPDK_CEPH_VERSION SPDK_VERSION
---> Using cache
---> cef4960f0666
Step 3/41 : COPY <<EOF /etc/yum.repos.d/ceph.repo
COPY failed: no source files were specified
ERROR: Service 'bdevperf' failed to build : Build failed

@sarwanjassi
Author

nvmeof-cli create_bdev --pool rbd --image demo_image --bdev demo_bdev
Creating ceph-nvmeof_nvmeof-cli_run ... done
usage: python3 -m control.cli [-h] [--server-address SERVER_ADDRESS] [--server-port SERVER_PORT] [--client-key CLIENT_KEY] [--client-cert CLIENT_CERT]
[--server-cert SERVER_CERT]
{create_bdev,delete_bdev,create_subsystem,delete_subsystem,add_namespace,remove_namespace,add_host,remove_host,create_listener,delete_listener,get_subsystems}
...
python3 -m control.cli: error: create_bdev failed: code=StatusCode.INTERNAL message=bdev_rbd_register_cluster() got an unexpected keyword argument 'core_mask'
ERROR: 2

@gbregman
Contributor

gbregman commented Nov 2, 2023

@sarwanjassi can you try building using the command:

DOCKER_BUILDKIT=1 COMPOSE_DOCKER_CLI_BUILD=1 docker-compose  build  spdk bdevperf nvmeof nvmeof-devel nvmeof-cli discovery ceph
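
The "COPY failed: no source files were specified" error at "COPY <<EOF" is consistent with the Dockerfile using heredoc syntax, which the legacy builder does not parse; heredocs need BuildKit, which the two environment variables above switch on. A minimal sketch of a pre-build check follows. The function names needs_buildkit and build_env_for are hypothetical helpers, not part of the repository:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a Dockerfile contains a heredoc
# instruction (e.g. "COPY <<EOF /path"), which needs BuildKit to parse.
needs_buildkit() {
    grep -Eq '^[[:space:]]*(COPY|RUN|ADD)[[:space:]].*<<-?[[:space:]]*["'\'']?[A-Za-z_]' "$1"
}

# Print the environment prefix to put in front of "docker-compose build".
# Prints an empty line when the Dockerfile has no heredoc instructions.
build_env_for() {
    if needs_buildkit "$1"; then
        echo "DOCKER_BUILDKIT=1 COMPOSE_DOCKER_CLI_BUILD=1"
    else
        echo ""
    fi
}
```

For example, build_env_for Dockerfile prints the two variables when the file contains a line such as "COPY <<EOF /etc/yum.repos.d/ceph.repo".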

@sarwanjassi
Author

Thanks @gbregman

We hit the next issue at #3 while following the demo document "Usage Demo: Configuring the NVMe-oF Gateway"

and manual steps:

Create a bdev (Block Device) from an RBD image:

nvmeof-cli create_bdev --pool rbd --image demo_image --bdev demo_bdev

nvmeof-cli create_bdev --pool rbd --image demo_image --bdev demo_bdev

Creating ceph-nvmeof_nvmeof-cli_run ... done
usage: python3 -m control.cli [-h] [--server-address SERVER_ADDRESS] [--server-port SERVER_PORT] [--client-key CLIENT_KEY] [--client-cert CLIENT_CERT]
[--server-cert SERVER_CERT]
{create_bdev,delete_bdev,create_subsystem,delete_subsystem,add_namespace,remove_namespace,add_host,remove_host,create_listener,delete_listener,get_subsystems}
...
python3 -m control.cli: error: create_bdev failed: code=StatusCode.INTERNAL message=bdev_rbd_register_cluster() got an unexpected keyword argument 'core_mask'
ERROR: 2

@sarwanjassi
Author

"make demo" now fails with an error. We are making progress with your help; please advise on the next step.

Creating ceph-nvmeof_nvmeof-cli_run ... done
usage: python3 -m control.cli create_listener [-h] -n SUBNQN -g GATEWAY_NAME [-t TRTYPE] [-f ADRFAM] -a TRADDR -s TRSVCID
python3 -m control.cli create_listener: error: argument -g/--gateway-name: expected one argument
ERROR: 2
make: *** [mk/demo.mk:17: demo] Error 2
[root@dcmir25ds21-01 ceph-nvmeof]#

sarwanjassi added a commit to sarwanjassi/ceph-nvmeof that referenced this issue Nov 3, 2023
Refer installation issue at Fedora37 : ceph#308

Signed-off-by: Sarwan Singh Jassi <[email protected]>
@sarwanjassi
Author

"make demo" fails with an error, which I reproduced by running the manual steps:

]# nvmeof-cli add_namespace --subnqn nqn.2016-06.io.spdk:cnode1 --bdev demo_bdev
Creating ceph-nvmeof_nvmeof-cli_run ... done
usage: python3 -m control.cli [-h] [--server-address SERVER_ADDRESS] [--server-port SERVER_PORT] [--client-key CLIENT_KEY] [--client-cert CLIENT_CERT]
[--server-cert SERVER_CERT]
{create_bdev,delete_bdev,create_subsystem,delete_subsystem,add_namespace,remove_namespace,add_host,remove_host,create_listener,delete_listener,get_subsystems}
...
python3 -m control.cli: error: add_namespace failed: code=StatusCode.INTERNAL message=request:
{
"nqn": "nqn.2016-06.io.spdk:cnode1",
"namespace": {
"bdev_name": "demo_bdev"
},
"method": "nvmf_subsystem_add_ns",
"req_id": 12
}
Got JSON-RPC error response
response:
{
"code": -32602,
"message": "Invalid parameters"
}
ERROR: 2

@sarwanjassi
Author

Docker version 24.0.5, build
OS: Fedora release 37 (Thirty Seven)
Kernel: Linux dcmir25ds21-01 6.5.8-100.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Oct 20 16:11:27 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
SELinux status: disabled

@gbregman
Contributor

gbregman commented Nov 3, 2023

@sarwanjassi the "make demo" issue was fixed in PR #280. In the meantime you can issue the failing command manually:

  • Run "demo ps" and see the container id of nvmeof
  • Run:
 docker-compose  run --rm nvmeof-cli --server-address <ADDRESS> --server-port 5500 create_listener --subnqn "nqn.2016-06.io.spdk:cnode1" --gateway-name <CONTAINER-ID> --traddr <IP-ADDR> --trsvcid 4420

Where the addresses and the container id are the ones for your environment. Then you'll need to also run the last command which comes after the create_listener:

docker-compose  run --rm nvmeof-cli --server-address <ADDRESS> --server-port 5500 add_host --subnqn "nqn.2016-06.io.spdk:cnode1" --host "*"
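
The "expected one argument" failure happens because "make demo" passes --gateway-name with nothing after it. A minimal sketch of a wrapper that refuses to build the create_listener command when the gateway name is empty; listener_cmd is a hypothetical helper, and the subsystem NQN, server port and service id are simply the ones used in the commands above:

```shell
#!/bin/sh
# Hypothetical helper: print the create_listener command, refusing to
# emit it when the gateway name (the nvmeof container id) is empty.
# Arguments: <server-address> <gateway-name> <traddr>
listener_cmd() {
    if [ -z "$2" ]; then
        echo "error: gateway name (nvmeof container id) is empty" >&2
        return 1
    fi
    printf 'docker-compose run --rm nvmeof-cli --server-address %s --server-port 5500 create_listener --subnqn "nqn.2016-06.io.spdk:cnode1" --gateway-name %s --traddr %s --trsvcid 4420\n' \
        "$1" "$2" "$3"
}
```

Calling listener_cmd with the address, the container id and the listener IP prints the full command line for review before you run it; an empty second argument returns a non-zero status instead.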

@gbregman
Contributor

gbregman commented Nov 3, 2023

@sarwanjassi about the "core_mask" error, it's probably because you don't use the latest SPDK. Could you run "git log -1 spdk" ?
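
A complementary check, assuming spdk is tracked as a git submodule of this repository (an assumption; the thread does not say so explicitly): "git submodule status spdk" prefixes its output line with a character that tells whether the checked-out SPDK matches what the repository records. The sketch below classifies that prefix; submodule_state is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: classify one line of "git submodule status" output
# by its first character, as documented for git-submodule.
submodule_state() {
    case "$1" in
        "-"*) echo "not initialized" ;;
        "+"*) echo "checked-out commit differs from the recorded commit" ;;
        "U"*) echo "merge conflicts" ;;
        " "*) echo "in sync" ;;
        *)    echo "unknown" ;;
    esac
}
```

Usage would be along the lines of submodule_state "$(git submodule status spdk)"; the quotes matter because the "in sync" case is signalled by a leading space.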

@sarwanjassi
Author

spdk

ceph-nvmeof]# git log -a spdk
commit d0f0b40
Author: Alexander Indenbaum [email protected]
Date: Tue Sep 19 10:39:01 2023 +0300

spdk cherry pick: bdev/rbd: Do not submit IOs through thread sending.

Signed-off-by: Alexander Indenbaum <[email protected]>

commit a5cab1d
Author: Alexander Indenbaum [email protected]
Date: Mon Sep 11 20:10:06 2023 +0300

Issue #28 Use bdev_rbd_register_cluster

Signed-off-by: Alexander Indenbaum <[email protected]>

commit a74318c
Author: Ernesto Puerta [email protected]
Date: Tue Sep 5 10:59:40 2023 +0200

revert(spdk): recover spdk v23.01.1

Fixes #214
Fixes #179

Signed-off-by: Ernesto Puerta <[email protected]>

commit 5c6ac22
Author: Ernesto Puerta [email protected]
Date: Fri Aug 11 10:27:43 2023 +0200

fix(package): move proto dir inside control

Fixes: #178

Signed-off-by: Ernesto Puerta <[email protected]>

commit 0dbeac1
Author: Alexander Indenbaum [email protected]

@gbregman
Contributor

gbregman commented Nov 3, 2023

@sarwanjassi the "core_mask" error suggests that the code you are running does not include the latest SPDK changes that add "core_mask". To be on the safe side, try running:

  • make clean
  • make setup

Then build from scratch using the command line I gave you earlier.

@sarwanjassi
Author

"make demo" failed at the create_listener step:

[root@dcmir25ds21-01 ceph-nvmeof]# make demo
docker-compose exec ceph bash -c "rbd -p rbd info demo_image || rbd -p rbd create demo_image --size 10M"
2023-11-03T10:08:02.149+0000 7f5027a87c00 -1 WARNING: all dangerous and experimental features are enabled.
2023-11-03T10:08:02.149+0000 7f5027a87c00 -1 WARNING: all dangerous and experimental features are enabled.
2023-11-03T10:08:02.161+0000 7f5027a87c00 -1 WARNING: all dangerous and experimental features are enabled.
rbd: error opening image demo_image: (2) No such file or directory
2023-11-03T10:08:02.256+0000 7ffb5cad7c00 -1 WARNING: all dangerous and experimental features are enabled.
2023-11-03T10:08:02.256+0000 7ffb5cad7c00 -1 WARNING: all dangerous and experimental features are enabled.
2023-11-03T10:08:02.268+0000 7ffb5cad7c00 -1 WARNING: all dangerous and experimental features are enabled.
docker-compose run --rm nvmeof-cli --server-address 192.168.13.3 --server-port 5500 create_bdev --pool rbd --image demo_image --bdev demo_bdev
Creating ceph-nvmeof_nvmeof-cli_run ... done
INFO:main:Created bdev demo_bdev: True
docker-compose run --rm nvmeof-cli --server-address 2001:db8::3 --server-port 5500 create_bdev --pool rbd --image demo_image --bdev demo_bdev_ipv6
Creating ceph-nvmeof_nvmeof-cli_run ... done
INFO:main:Created bdev demo_bdev_ipv6: True
docker-compose run --rm nvmeof-cli --server-address 192.168.13.3 --server-port 5500 create_subsystem --subnqn "nqn.2016-06.io.spdk:cnode1"
Creating ceph-nvmeof_nvmeof-cli_run ... done
INFO:main:Created subsystem nqn.2016-06.io.spdk:cnode1: True
docker-compose run --rm nvmeof-cli --server-address 192.168.13.3 --server-port 5500 add_namespace --subnqn "nqn.2016-06.io.spdk:cnode1" --bdev demo_bdev
Creating ceph-nvmeof_nvmeof-cli_run ... done
INFO:main:Added namespace 1 to nqn.2016-06.io.spdk:cnode1, ANA group id None : True
docker-compose run --rm nvmeof-cli --server-address 192.168.13.3 --server-port 5500 add_namespace --subnqn "nqn.2016-06.io.spdk:cnode1" --bdev demo_bdev_ipv6
Creating ceph-nvmeof_nvmeof-cli_run ... done
INFO:main:Added namespace 2 to nqn.2016-06.io.spdk:cnode1, ANA group id None : True
docker-compose run --rm nvmeof-cli --server-address 192.168.13.3 --server-port 5500 create_listener --subnqn "nqn.2016-06.io.spdk:cnode1" --gateway-name --traddr 192.168.13.3 --trsvcid 4420
Creating ceph-nvmeof_nvmeof-cli_run ... done
usage: python3 -m control.cli create_listener [-h] -n SUBNQN -g GATEWAY_NAME [-t TRTYPE] [-f ADRFAM] -a TRADDR -s TRSVCID
python3 -m control.cli create_listener: error: argument -g/--gateway-name: expected one argument
ERROR: 2
make: *** [mk/demo.mk:17: demo] Error 2

container detail

docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
20f34762c9cf IP: 192.168.13.3 quay.io/ceph/nvmeof:0.0.5 "python3 -m control …" 59 seconds ago Up 58 seconds 0.0.0.0:32800->4420/tcp, :::32800->4420/tcp, 0.0.0.0:32799->5500/tcp, :::32799->5500/tcp, 0.0.0.0:32798->8009/tcp, :::32798->8009/tcp ceph-nvmeof_nvmeof_1

d85a5c307d3d IP: 192.168.13.2 quay.io/ceph/vstart-cluster:18.2.0 "sh -c './vstart.sh …" About a minute ago Up About a minute (healthy)

@gbregman
Contributor

gbregman commented Nov 3, 2023

@sarwanjassi , as I said, the problem with "make demo" was fixed in a later version. So until you get the fixed version, just run the failing commands manually:

docker-compose run --rm nvmeof-cli --server-address 192.168.13.3 --server-port 5500 create_listener --subnqn "nqn.2016-06.io.spdk:cnode1" --gateway-name 20f34762c9cf --traddr 192.168.13.3 --trsvcid 4420
docker-compose  run --rm nvmeof-cli --server-address 192.168.13.3 --server-port 5500 add_host --subnqn "nqn.2016-06.io.spdk:cnode1" --host "*"
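
The gateway name above is the container id of the nvmeof container, copied by hand from "docker ps". A minimal sketch of looking it up by container name instead; container_id_for is a hypothetical helper, and the name ceph-nvmeof_nvmeof_1 is taken from the "docker ps" output earlier in this thread, so it may differ in other environments:

```shell
#!/bin/sh
# Hypothetical helper: read "ID NAME" pairs on stdin and print the ID of
# the container whose name equals the first argument. Exits non-zero if
# no such container is listed.
container_id_for() {
    awk -v name="$1" '
        $2 == name { print $1; found = 1; exit }
        END { if (!found) exit 1 }
    '
}

# Intended use (requires a running Docker daemon, so it is not run here):
#   docker ps --format '{{.ID}} {{.Names}}' | container_id_for ceph-nvmeof_nvmeof_1
```

The result can then be passed as the --gateway-name value instead of pasting the id manually.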

@sarwanjassi
Author

All done, and big thanks to you, @gbregman.

The next error is:

]# nvme discover -t tcp -a 192.168.13.3 -s 4420
failed to add controller, error Unknown error -1

@sarwanjassi
Author

[58033.032159] br-df10d3ad8bb7: port 3(veth28a082f) entered disabled state
[58054.353502] nvme_fabrics: found same hostid 5bc24f86-febe-41f6-a178-59fe82e55bc4 but different hostnqn nqn.2014-08.org.nvmexpress:uuid:00000000-0000-0000-0000-ac1f6b586878
[58174.619811] nvme_fabrics: found same hostid 5bc24f86-febe-41f6-a178-59fe82e55bc4 but different hostnqn nqn.2014-08.org.nvmexpress:uuid:00000000-0000-0000-0000-ac1f6b586878
[58280.044435] nvme_fabrics: found same hostid 5bc24f86-febe-41f6-a178-59fe82e55bc4 but different hostnqn nqn.2014-08.org.nvmexpress:uuid:00000000-0000-0000-0000-ac1f6b586878
ceph-nvmeof]# cat /etc/nvme/hostnqn
nqn.2014-08.org.nvmexpress:uuid:00000000-0000-0000-0000-ac1f6b586878

@gbregman
Contributor

gbregman commented Nov 3, 2023

@sarwanjassi can you send the output of:

grep discovery_controller ceph-nvmeof.conf

@gbregman
Contributor

gbregman commented Nov 3, 2023

@sarwanjassi please try:

nvme discover -t tcp -a 192.168.13.3 -s 8009

@sarwanjassi
Author

1 ceph-nvmeof]# grep discovery_controller ceph-nvmeof.conf
enable_spdk_discovery_controller = False
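
A minimal sketch of reading that setting and choosing the discovery port from it. The mapping is an assumption inferred from this thread, not confirmed behaviour: with the SPDK discovery controller disabled, the separate discovery service on port 8009 is the one to query; with it enabled, discovery is assumed to be answered on the listener port 4420. conf_value and discovery_port are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helper: print the value of "key = value" from an INI-style
# file (first match), with surrounding whitespace removed.
conf_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p" "$2" | head -n 1
}

# ASSUMPTION: True -> discover on 4420, anything else -> discover on 8009.
discovery_port() {
    case "$(conf_value enable_spdk_discovery_controller "$1")" in
        [Tt]rue) echo 4420 ;;
        *)       echo 8009 ;;
    esac
}
```

With the configuration shown above, discovery_port ceph-nvmeof.conf selects 8009, which matches the port suggested for "nvme discover" in this thread.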

@gbregman
Contributor

gbregman commented Nov 3, 2023

@sarwanjassi did you try using port 8009?

@sarwanjassi
Author

[root@-01 ceph-nvmeof]# nvme discover -t tcp -a 192.168.13.3 -s 8009
failed to add controller, error Unknown error -1
[root@-01 ceph-nvmeof]#

dmesg error:
[61250.362255] nvme_fabrics: found same hostid 5bc24f86-febe-41f6-a178-59fe82e55bc4 but different hostnqn nqn.2014-08.org.nvmexpress:uuid:00000000-0000-0000-0000-ac1f6b586878

@gbregman
Contributor

gbregman commented Nov 3, 2023

@leonidc can you have a look?

@gbregman
Contributor

gbregman commented Nov 3, 2023

@sarwanjassi please send us the output of:

cat /etc/nvme/hostnqn
cat /etc/nvme/hostid
nvme gen-hostnqn

@sarwanjassi
Author

[-01 ~]$ nvme gen-hostnqn
nqn.2014-08.org.nvmexpress:uuid:ce4d3d84-6109-416f-99f3-a2b299bfc176
[@-01 ~]$ cat /etc/nvme/hostid
cat: /etc/nvme/hostid: No such file or directory
[@-01 ~]$ cat /etc/nvme/hostnqn
nqn.2014-08.org.nvmexpress:uuid:00000000-0000-0000-0000-ac1f6b586878
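
The kernel message says the same hostid was seen with a different hostnqn, and /etc/nvme/hostid does not exist on this machine. The thread does not record how this was eventually resolved, so the following is only a diagnostic sketch. It checks whether a hostid equals the UUID embedded in a hostnqn, which is one common way to keep the pair consistent (an assumption here, not something stated by the maintainers). hostnqn_uuid and ids_match are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helper: extract the UUID part of a hostnqn of the form
# nqn.2014-08.org.nvmexpress:uuid:<UUID>. Fails for other formats.
hostnqn_uuid() {
    case "$1" in
        *:uuid:*) printf '%s\n' "${1##*:uuid:}" ;;
        *)        return 1 ;;
    esac
}

# Succeed when the hostid (second argument) equals the UUID embedded in
# the hostnqn (first argument).
ids_match() {
    [ "$(hostnqn_uuid "$1")" = "$2" ]
}
```

With the values from this thread, the hostnqn in /etc/nvme/hostnqn and the hostid 5bc24f86-febe-41f6-a178-59fe82e55bc4 from dmesg do not match under this check. nvme-cli also accepts --hostnqn and --hostid on "nvme discover" and "nvme connect", which allows a consistent pair to be tried without editing files.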

@sarwanjassi
Author

@gbregman I was able to fix it

[root@-02 ~]# nvme list
Node          Generic     SN                  Model                 Namespace  Usage                Format     FW Rev
/dev/nvme7n1  /dev/ng7n1  SPDK00000000000001  SPDK bdev Controller  1          10.49 MB / 10.49 MB  512 B + 0

The 10 MB block device that was created is now visible.

My next question: I need to create OSDs and a pool. What is the best way to use cephadm to set up a cluster and share the pool here?

@gbregman
Contributor

gbregman commented Nov 3, 2023

@sarwanjassi I'm afraid I don't know much about OSD. Are we done with this issue? If you got all the issues fixed can you close it?

@caroav
Collaborator

caroav commented Nov 5, 2023

@sarwanjassi the original problem in this issue seems to be resolved. If there is any other problem, please open a new issue. Closing, thanks.

@caroav caroav closed this as completed Nov 5, 2023
@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in NVMe-oF Nov 5, 2023