
[v0.0.5] [nvmeof restart] Failed to operate write op for oid nvmeof.state during service restart #290

Closed
rahullepakshi opened this issue Oct 25, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@rahullepakshi
Contributor

Tracks https://tracker.ceph.com/issues/63317

The main issue is below:

python3 -m control.cli: error: create_subsystem failed: code=StatusCode.UNKNOWN message=Exception calling application: [errno 125] Failed to operate write op for oid nvmeof.state -- during nvmeof service restart

Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]: INFO:control.grpc:Received request to create subsystem nqn.2016-06.io.spdk:cnode2, ana reporting: False  
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]: INFO:control.grpc:create_subsystem nqn.2016-06.io.spdk:cnode2: True
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]: ERROR:control.state:Unable to add key to omap: [errno 125] Failed to operate write op for oid nvmeof.state. Exiting!
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]: ERROR:control.grpc:Error persisting create_subsystem nqn.2016-06.io.spdk:cnode2: [errno 125] Failed to operate write op for oid nvmeof.state
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]: ERROR:grpc._server:Exception calling application: [errno 125] Failed to operate write op for oid nvmeof.state
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]: Traceback (most recent call last):
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]:   File "/src/__pypackages__/3.9/lib/grpc/_server.py", line 494, in _call_behavior
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]:     response_or_iterator = behavior(argument, context)
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]:   File "/src/control/grpc.py", line 301, in create_subsystem
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]:     return self.create_subsystem_safe(request, context)
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]:   File "/src/control/grpc.py", line 290, in create_subsystem_safe
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]:     self.gateway_state.add_subsystem(request.subsystem_nqn,
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]:   File "/src/control/state.py", line 367, in add_subsystem
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]:     self.omap.add_subsystem(subsystem_nqn, val)
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]:   File "/src/control/state.py", line 70, in add_subsystem
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]:     self._add_key(key, val)
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]:   File "/src/control/state.py", line 250, in _add_key
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]:     self.ioctx.operate_write_op(write_op, self.omap_name)
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]:   File "rados.pyx", line 3654, in rados.Ioctx.operate_write_op
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]: rados.OSError: [errno 125] Failed to operate write op for oid nvmeof.state

#258 prevents corruption on any gRPC call, and I am not seeing any corruption. However, during this restart phase, the addition of a GW entity (in my case create_subsystem) was issued multiple times after command failures and was finally acknowledged: subsystem cnode2 was created, but the corresponding entry was never written to the OMAP.
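For context, a minimal sketch (not the actual control/state.py code) of what the _add_key path in the traceback above roughly does, assuming the gateway guards each write against the omap_version key that is visible in the OMAP dump further below; a failed compare guard is the usual way librados ends up returning ECANCELED (errno 125):

    import rados

    def add_key_guarded(ioctx, omap_name, key, val, version):
        # Sketch: add one key to the state object while asserting that
        # omap_version still holds the value we last read.  If another writer
        # (e.g. the restarting gateway) bumped the version first, the guard
        # fails and operate_write_op raises rados.OSError with errno 125.
        with rados.WriteOpCtx() as write_op:
            write_op.omap_cmp("omap_version", str(version),
                              rados.LIBRADOS_CMPXATTR_OP_EQ)
            ioctx.set_omap(write_op, (key,), (val,))
            ioctx.set_omap(write_op, ("omap_version",), (str(version + 1),))
            ioctx.operate_write_op(write_op, omap_name)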

The user tries to create a subsystem during the nvmeof service restart; it subsequently succeeds (the subsystem does get created, even though both CLI attempts report errors):

[root@ceph-nvmf3-ryxj6t-node5 cephuser]# podman run quay.io/ceph/nvmeof-cli:v0.0.5 --server-address 10.0.209.152 --server-port 5500 create_subsystem  --subnqn nqn.2016-06.io.spdk:cnode2 --serial 2 --max-namespaces 500
usage: python3 -m control.cli [-h] [--server-address SERVER_ADDRESS]
                              [--server-port SERVER_PORT]
                              [--client-key CLIENT_KEY]
                              [--client-cert CLIENT_CERT]
                              [--server-cert SERVER_CERT]
                              {create_bdev,delete_bdev,create_subsystem,delete_subsystem,add_namespace,remove_namespace,add_host,remove_host,create_listener,delete_listener,get_subsystems}
                              ...
python3 -m control.cli: error: create_subsystem failed: code=StatusCode.UNKNOWN message=Exception calling application: [errno 125] Failed to operate write op for oid nvmeof.state
[root@ceph-nvmf3-ryxj6t-node5 cephuser]# podman run quay.io/ceph/nvmeof-cli:v0.0.5 --server-address 10.0.209.152 --server-port 5500 create_subsystem  --subnqn nqn.2016-06.io.spdk:cnode2 --serial 2 --max-namespaces 500
usage: python3 -m control.cli [-h] [--server-address SERVER_ADDRESS]
                              [--server-port SERVER_PORT]
                              [--client-key CLIENT_KEY]
                              [--client-cert CLIENT_CERT]
                              [--server-cert SERVER_CERT]
                              {create_bdev,delete_bdev,create_subsystem,delete_subsystem,add_namespace,remove_namespace,add_host,remove_host,create_listener,delete_listener,get_subsystems}
                              ...
python3 -m control.cli: error: create_subsystem failed: code=StatusCode.INTERNAL message=request:
{
  "nqn": "nqn.2016-06.io.spdk:cnode2",
  "serial_number": "2",
  "model_number": "SPDK bdev Controller",
  "max_namespaces": 500,
  "min_cntlid": 1,
  "max_cntlid": 65519,
  "method": "nvmf_create_subsystem",
  "req_id": 189
}
Got JSON-RPC error response
response:
{
  "code": -32603,
  "message": "Unable to create subsystem nqn.2016-06.io.spdk:cnode2" 
}

Somewhere between the 1st and 2nd create-subsystem requests, subsystem nqn.2016-06.io.spdk:cnode2 is created, as journalctl says it already exists - search for "already exists" in http://magna002.ceph.redhat.com/cephci-jenkins/nvmeof_5_upstream.log

get_subsystems output -

[root@ceph-nvmf3-ryxj6t-node5 cephuser]# podman run quay.io/ceph/nvmeof-cli:v0.0.5 --server-address 10.0.209.152 --server-port 5500 get_subsystems
INFO:__main__:Get subsystems:
[
    {
        "nqn": "nqn.2016-06.io.spdk:cnode2",
        "subtype": "NVMe",
        "listen_addresses": [],
        "allow_any_host": false,
        "hosts": [],
        "serial_number": "2",
        "model_number": "SPDK bdev Controller",
        "max_namespaces": 500,
        "min_cntlid": 1,
        "max_cntlid": 65519,
        "namespaces": []
    },
    {
        "nqn": "nqn.2016-06.io.spdk:cnode1",
        "subtype": "NVMe",
        "listen_addresses": [
            {
                "transport": "TCP",
                "trtype": "TCP",
                "adrfam": "IPv4",
                "traddr": "10.0.209.152",
                "trsvcid": "5001" 
            }
        ],
        "allow_any_host": true,
        "hosts": [],
        "serial_number": "1",
        "model_number": "SPDK bdev Controller",
        "max_namespaces": 500,
        "min_cntlid": 1,
        "max_cntlid": 65519,
        "namespaces": [
            {
                "nsid": 1,
                "bdev_name": "UWI1-bdev1",
                "name": "UWI1-bdev1",
                "nguid": "65228668C41842058561AFBBC15C9C96",
                "uuid": "65228668-c418-4205-8561-afbbc15c9c96" 
            },
            {
                "nsid": 2,
                "bdev_name": "UWI1-bdev2",
                "name": "UWI1-bdev2",
                "nguid": "BBF19AD5F17240F9AA7A42B33376D35A",
                "uuid": "bbf19ad5-f172-40f9-aa7a-42b33376d35a" 

But the OMAP does not have its entry - http://magna002.ceph.redhat.com/cephci-jenkins/nvmeof_5_restart_omap.log

namespace_nqn.2016-06.io.spdk:cnode1_98
value (95 bytes) :
00000000  7b 0a 20 20 22 73 75 62  73 79 73 74 65 6d 5f 6e  |{.  "subsystem_n|
00000010  71 6e 22 3a 20 22 6e 71  6e 2e 32 30 31 36 2d 30  |qn": "nqn.2016-0|
00000020  36 2e 69 6f 2e 73 70 64  6b 3a 63 6e 6f 64 65 31  |6.io.spdk:cnode1|
00000030  22 2c 0a 20 20 22 62 64  65 76 5f 6e 61 6d 65 22  |",.  "bdev_name"|
00000040  3a 20 22 55 57 49 31 2d  62 64 65 76 39 38 22 2c  |: "UWI1-bdev98",|
00000050  0a 20 20 22 6e 73 69 64  22 3a 20 39 38 0a 7d     |.  "nsid": 98.}|
0000005f

namespace_nqn.2016-06.io.spdk:cnode1_99
value (95 bytes) :
00000000  7b 0a 20 20 22 73 75 62  73 79 73 74 65 6d 5f 6e  |{.  "subsystem_n|
00000010  71 6e 22 3a 20 22 6e 71  6e 2e 32 30 31 36 2d 30  |qn": "nqn.2016-0|
00000020  36 2e 69 6f 2e 73 70 64  6b 3a 63 6e 6f 64 65 31  |6.io.spdk:cnode1|
00000030  22 2c 0a 20 20 22 62 64  65 76 5f 6e 61 6d 65 22  |",.  "bdev_name"|
00000040  3a 20 22 55 57 49 31 2d  62 64 65 76 39 39 22 2c  |: "UWI1-bdev99",|
00000050  0a 20 20 22 6e 73 69 64  22 3a 20 39 39 0a 7d     |.  "nsid": 99.}|
0000005f

omap_version
value (3 bytes) :
00000000  35 31 34                                          |514|
00000003

subsystem_nqn.2016-06.io.spdk:cnode1
value (100 bytes) :
00000000  7b 0a 20 20 22 73 75 62  73 79 73 74 65 6d 5f 6e  |{.  "subsystem_n|
00000010  71 6e 22 3a 20 22 6e 71  6e 2e 32 30 31 36 2d 30  |qn": "nqn.2016-0|
00000020  36 2e 69 6f 2e 73 70 64  6b 3a 63 6e 6f 64 65 31  |6.io.spdk:cnode1|
00000030  22 2c 0a 20 20 22 73 65  72 69 61 6c 5f 6e 75 6d  |",.  "serial_num|
00000040  62 65 72 22 3a 20 22 31  22 2c 0a 20 20 22 6d 61  |ber": "1",.  "ma|
00000050  78 5f 6e 61 6d 65 73 70  61 63 65 73 22 3a 20 35  |x_namespaces": 5|
00000060  30 30 0a 7d                                       |00.}|
00000064
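
For reference, a dump like the above can also be produced with the rados Python bindings; the pool name "rbd" and object name "nvmeof.state" below are assumptions based on the service name in the logs:

    import rados

    # Illustrative only: print every omap key/value on the gateway state object.
    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("rbd")  # assumed gateway pool
        try:
            with rados.ReadOpCtx() as read_op:
                omap_iter, _ = ioctx.get_omap_vals(read_op, "", "", -1)
                ioctx.operate_read_op(read_op, "nvmeof.state")
                for key, val in omap_iter:
                    print(key, val)
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()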

Further, on restarting the GW, subsystem nqn.2016-06.io.spdk:cnode2, which was displayed earlier, is no longer displayed.
get_subsystems output:

[root@ceph-nvmf3-ryxj6t-node5 cephuser]# podman run quay.io/ceph/nvmeof-cli:v0.0.5 --server-address 10.0.209.152 --server-port 5500 get_subsystems
INFO:__main__:Get subsystems:
[
    {
        "nqn": "nqn.2016-06.io.spdk:cnode1",
        "subtype": "NVMe",
        "listen_addresses": [
            {
                "transport": "TCP",
                "trtype": "TCP",
                "adrfam": "IPv4",
                "traddr": "10.0.209.152",
                "trsvcid": "5001" 
            }
        ],
        "allow_any_host": true,
        "hosts": [],
        "serial_number": "1",
        "model_number": "SPDK bdev Controller",
        "max_namespaces": 500,
        "min_cntlid": 1,
        "max_cntlid": 65519,
        "namespaces": [
            {
                "nsid": 1,
                "bdev_name": "UWI1-bdev1",
                "name": "UWI1-bdev1",
                "nguid": "65228668C41842058561AFBBC15C9C96",
                "uuid": "65228668-c418-4205-8561-afbbc15c9c96" 
            },
            {
                "nsid": 2,
                "bdev_name": "UWI1-bdev2",
                "name": "UWI1-bdev2",
                "nguid": "BBF19AD5F17240F9AA7A42B33376D35A",
                "uuid": "bbf19ad5-f172-40f9-aa7a-42b33376d35a" 
            },
@rahullepakshi rahullepakshi added the bug Something isn't working label Oct 25, 2023
@gbregman
Contributor

gbregman commented Oct 25, 2023

The error comes from Ceph. From the file src/pybind/rados/rados.pyx:

    def operate_write_op(self,
                         write_op: WriteOp,
                         oid: str,
                         mtime: int = 0,
                         flags: int = LIBRADOS_OPERATION_NOFLAG):
        """
        execute the real write operation
        :para write_op: write operation object
        :para oid: object name
        :para mtime: the time to set the mtime to, 0 for the current time
        :para flags: flags to apply to the entire operation
        """

        oid_raw = cstr(oid, 'oid')
        cdef:
            WriteOp _write_op = write_op
            char *_oid = oid_raw
            time_t _mtime = mtime
            int _flags = flags

        with nogil:
            ret = rados_write_op_operate(_write_op.write_op, self.io, _oid, &_mtime, _flags)
        if ret != 0:
            raise make_ex(ret, "Failed to operate write op for oid %s" % oid)

The error we get is ECANCELED.
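
For anyone mapping the number: errno 125 on Linux is ECANCELED, which librados typically returns when a guard on the write op (such as an omap compare) fails. A quick check:

    import errno
    import os

    # On Linux, errno.ECANCELED is 125 ("Operation canceled").
    print(errno.ECANCELED, os.strerror(errno.ECANCELED))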

@gbregman
Contributor

Looking at the log, we can see that there was indeed an error adding the subsystem to the OMAP:

Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]: INFO:control.grpc:Received request to create subsystem nqn.2016-06.io.spdk:cnode2, ana reporting: False
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]: INFO:control.grpc:create_subsystem nqn.2016-06.io.spdk:cnode2: True
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]: ERROR:control.state:Unable to add key to omap: [errno 125] Failed to operate write op for oid nvmeof.state. Exiting!
Oct 25 06:41:40 ceph-nvmf3-ryxj6t-node5 ceph-7ffac0a2-730f-11ee-9dde-fa163e47fc24-nvmeof-rbd-ceph-nvmf3-ryxj6t-node5-ywspzj[118520]: ERROR:control.grpc:Error persisting create_subsystem nqn.2016-06.io.spdk:cnode2: [errno 125] Failed to operate write op for oid nvmeof.state

So it was added to SPDK, which is why it is seen in get_subsystems, but not to the OMAP file.

@rahullepakshi
Contributor Author

@gbregman in this case, considering #258, what is the order of the failure sequence?
Is it

  1. restart -> acquire_lock -> update_complete -> release_lock -> create_subsystem? or
  2. restart -> acquire_lock, but create_subsystem got allowed somehow -> release_lock -> OMAP update failure because the lock was taken?

@gbregman
Contributor

@rahullepakshi the GRPC lock is taken when calling the GRPC function. So, when create_subsystem() is called, we take the lock, perform the subsystem creation logic, and release the lock. This logic first calls SPDK to create the subsystem and, if that was successful, adds the subsystem to the OMAP. So, on an update we won't hold the lock for the entire duration of the update. Each individual GRPC function called by the update takes the lock and then releases it.
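
For illustration, a minimal sketch of that per-call locking pattern (the names here are hypothetical, not the actual control/grpc.py code): the lock covers a single gRPC call, i.e. the SPDK call plus the OMAP persist for that call, but never a whole restart/update sequence.

    import threading

    class GatewaySketch:
        """Hypothetical sketch of the lock scope described above."""

        def __init__(self, spdk_client, gateway_state):
            self.rpc_lock = threading.Lock()
            self.spdk = spdk_client
            self.gateway_state = gateway_state

        def create_subsystem(self, request, context):
            # The lock is taken per gRPC call ...
            with self.rpc_lock:
                # ... first the subsystem is created in SPDK ...
                ok = self.spdk.nvmf_create_subsystem(request.subsystem_nqn)
                if ok:
                    # ... and only if that succeeded is it persisted to the
                    # OMAP.  If this write fails (e.g. errno 125), SPDK and
                    # the OMAP are left out of sync.
                    self.gateway_state.add_subsystem(request.subsystem_nqn,
                                                     request)
                return ok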

@rahullepakshi
Contributor Author

Thanks @gbregman.
IMO, this is just one write-op scenario (addition of a new GW entity) among many; I am concerned about similar scenarios, such as modification or deletion of an existing entity, where the OMAP not getting updated would be a real major bug.

So, would it be best to either not allow any such operations until the GW is "UP" and the OMAP is intact and "healthy" enough to accept writes (or any operations, for that matter), or to acquire the lock for both the GRPC call and the OMAP update so everything stays consistent? WDYT?

@caroav @baum ^^
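
For illustration, one possible shape of the "do not allow operations until the GW is UP" option (names are hypothetical, not an actual ceph-nvmeof API): reject gRPC calls with UNAVAILABLE until the gateway has finished replaying the OMAP state after a restart. A servicer would call gate.check(context) at the top of each RPC and gate.mark_ready() once the replay completes.

    import threading

    import grpc

    class ReadinessGate:
        """Hypothetical sketch: reject calls until the restart replay is done."""

        def __init__(self):
            self._ready = threading.Event()

        def mark_ready(self):
            # Called once the OMAP state has been replayed into SPDK.
            self._ready.set()

        def check(self, context: grpc.ServicerContext):
            if not self._ready.is_set():
                context.abort(grpc.StatusCode.UNAVAILABLE,
                              "gateway is restarting; state not yet replayed")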

@gbregman
Contributor

@rahullepakshi we have a separate issue for that. We should deal with it soon. See issue #56

@gbregman
Contributor

@rahullepakshi the code has changed completely since the creation of this issue. Could you check whether it's still relevant, and either reproduce it with the current code or close it?

@rahullepakshi
Contributor Author

Not seeing this issue anymore, closing it @gbregman
