Ceph RBD primary storage fails connection and renders node unusable #2611
Comments
I've marked this as a blocker applicable to Ceph users; we'll try to get this fixed. @giorgiomassar8 can you advise any documentation on how to set up a small Ceph storage pool to test this against KVM?
Can you increase the logging to DEBUG on the Agent and show the XML which the Agent generates to define the storage pool? The error you are seeing comes from libvirt+librados underneath, so I'd need to see the XML it generates.
I have set debugging on one of the agents and this is what I get:
In the generated XML there isn't the secret part; perhaps the agent is trying to use another authentication mechanism which this version of libvirt does not support? This is what I have been using as a workaround, and it works: defining the pool directly in libvirt with the cephx secret included (the pool XML is shown under WORKAROUND at the end of this issue).
Ok, I see. The "Operation not supported" error is odd, but it seems to be missing the cephx part indeed. At the moment I do not have a 4.11 environment to test this with, but I'll look at the code later.
fix issue where kvm / ceph cannot create volumes apache#2611
@wido The issue seems to be that for any KVM host, the disk format is always hard-coded to QCOW2 here: This gets sent along to the agent. I guess this is a reasonable default for KVM, but obviously for RBD it needs to be RAW, and the agent won't know that until it looks up the storage pool. In the short term, I think hard-coding the disk format to RAW on the agent side for RBD is valid, since it can't be anything but that, but it might also be good for the management server to send along the correct disk type.
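To illustrate that short-term fix, here is a minimal sketch of the agent-side fallback being proposed; the enum and class names are made up for the example and are not CloudStack's own types. The idea is simply to force RAW whenever the backing pool is RBD, regardless of the format the management server sent.

```java
// Minimal sketch of the agent-side fallback described above.
// PoolType/DiskFormat are illustrative stand-ins, not CloudStack's own enums.
enum PoolType { NFS, LVM, RBD }
enum DiskFormat { QCOW2, RAW }

class DiskFormatPolicy {

    // RBD volumes can only be RAW, so override whatever was requested;
    // file-based pools keep the format the management server sent.
    static DiskFormat effectiveFormat(PoolType poolType, DiskFormat requested) {
        return poolType == PoolType.RBD ? DiskFormat.RAW : requested;
    }

    public static void main(String[] args) {
        // The management server currently hard-codes QCOW2 for KVM;
        // on an RBD pool the agent would correct that to RAW.
        System.out.println(effectiveFormat(PoolType.RBD, DiskFormat.QCOW2)); // RAW
        System.out.println(effectiveFormat(PoolType.NFS, DiskFormat.QCOW2)); // QCOW2
    }
}
```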
On second thought, this might be a separate bug we were chasing. We were seeing this error in the logs:
@giorgiomassar8 were you seeing similar messages? We were receiving these messages on a freshly created pool when attempting to deploy a VM.
@kiwiflyer Have you seen this one? The Agent is not providing the secret for the RBD storage pool to the libvirt XML definition.
I will see if I can deploy a test environment to reproduce this issue in the meantime.
@nathanejohnson nope, I don't see qcow2/raw mismatch errors here; I do believe the issue is related to the fact that, for some reason, no authentication information is put in the pool's XML sent to libvirt.
@giorgiomassar8 here are the logs from when the rbd pool was configured on our host:
Do you see the key go across before it attempts to create the libvirt storage pool?
So working with a user, it does appear that the original issue outlined by @giorgiomassar8 is different from the one we found. The user in question is running Ubuntu 16.04 and no authentication is being passed to libvirt. This particular installation is a 4.9.3 -> 4.11.1 upgrade.
Are there by any chance slashes (/) in the secret of this RBD pool? |
Funny you should mention that. There is a single slash and I did notice a very old open bug report on Jira for that issue. This user had upgraded a prod system, so I had a very limited window to take a look.
Could you check that for me? I have seen issues with slashes in the secret information and URL parsing inside Java. Otherwise I wouldn't know where this is coming from.
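For context on why a slash could matter, here is a generic sketch (not CloudStack code): cephx keys are base64 and can contain '/', '+' and '=', so if the secret is embedded in a URL/URI-style string without percent-encoding, those characters change how the string is parsed. The key below is made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Generic illustration of the concern above: percent-encode the cephx key
// before placing it in any URL/URI-style string, and decode it on the way out.
class SecretEncoding {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String secret = "AQAB/Cdef+examplekey==";                  // made-up key containing '/'
        String encoded = URLEncoder.encode(secret, "UTF-8");
        System.out.println(encoded);                               // AQAB%2FCdef%2Bexamplekey%3D%3D
        System.out.println(URLDecoder.decode(encoded, "UTF-8").equals(secret)); // true
    }
}
```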
The pool secret apparently does not have any slashes. The encrypted version in the DB does, but obviously that shouldn't matter. I'm trying to work back through the code to see what might have changed between 4.9.x and 4.11.x.
ok...well it's definitely not decrypting the userInfo before it gets sent to the agent in the upgrade case above. |
In the Ubuntu case above, it looks like the userInfo field contains the encrypted DB payload, rather than the decrypted version with the colon delimiter that is expected in the code here: Line 245 in 4809fe7
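To make the failure mode concrete, here is a hypothetical sketch (made-up names, not the agent code referenced above): the agent expects userInfo in the decrypted "user:secret" form, and when it instead receives the encrypted blob there is no colon to split on, so the cephx auth section is silently skipped.

```java
import java.util.Arrays;

// Hypothetical sketch of the userInfo handling described above; it only
// illustrates why an encrypted payload makes the agent skip cephx auth.
class UserInfoSplit {

    // Expected decrypted form: "<cephx user>:<cephx secret>".
    // Returns null when no colon is found, i.e. no auth gets configured.
    static String[] parse(String userInfo) {
        if (userInfo == null || !userInfo.contains(":")) {
            // An encrypted (base64) blob contains no colon, so it ends up
            // here and no <auth>/<secret> element is added to the pool XML.
            return null;
        }
        return userInfo.split(":", 2);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("cloudstack:AQDexamplekey==")));   // [cloudstack, AQDexamplekey==]
        System.out.println(parse("dGhpcyBpcyBub3QgYSByZWFsIHBheWxvYWQ="));          // made-up blob -> null
    }
}
```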
Note that my test lab is CentOS 7.
@wido or @GabrielBrascher, do either of you have both an older 4.9/4.10 lab as well as a 4.11 lab on Ubuntu 16.04? It would be interesting to try a manual decrypt of the user_info field from the storage_pool table using a 4.9.x payload, but running the decrypt on a 4.11 installation, e.g. java -classpath /usr/share/cloudstack-common/lib/jasypt-1.9.2.jar org.jasypt.intf.cli.JasyptPBEStringDecryptionCLI decrypt.sh input=encrypted_payload_from_user_info password=encrypt_password verbose=true
@kiwiflyer All our test and production systems are 4.11 at the moment, running on Ubuntu 16.04. They were all upgraded from 4.9 to 4.10 and then to 4.11. In none of those cases did we run into this.
@wido @kiwiflyer is this still a bug wrt 4.11.2.0 or the next 4.11.3.0/4.12.0.0?
I have not been able to verify this bug, so I can't tell you.
I saw this problem in an ACS mailing list user's system after attempting an upgrade from 4.9.3 -> 4.11.1. Unfortunately, the bug was in production, so I only had an extremely limited window to take a look before they restored their previous 4.9.3 version. At the present time, I'm not sure what caused it and I haven't been able to reproduce this in a lab environment.
So, @nathanejohnson and I just started some upgrade testing on some lab environments going from our 4.8 production branch to 4.12, and we may have just run into this same bug. We'll triage it and update everyone early next week.
@wido I can second what @kiwiflyer mentioned earlier: the reason pools aren't being created has to do with the user_info being passed along encrypted; the split doesn't find a colon, so it skips the secret section of the pool XML. It looks like the only time it's sent along decrypted is when the pool is initially created. Obviously I'm missing something; I can't find anything significant that's different between 4.8 and 4.12's handling of this logic-wise, but this is failing consistently now in our lab environment. Would love some ideas on troubleshooting.
As a follow-up, I cannot reproduce this in our freshly installed lab, only in the one we upgraded. The userInfo comes across decrypted there.
This seems like a management server thing, where it doesn't send the data properly, because I can't think of a reason why this happens on the Agent/KVM side. I honestly have no idea.
I narrowed down the issue: at some point during the upgrade process the db.properties file had switched from using encryption to not using it, so the user_info field was stored encrypted in the DB but was then being sent to the agent without being decrypted. I see in the spec file that db.properties is listed as config(noreplace), but I think in some situations it will still end up overwritten. At one point during the upgrade I completely removed the RPMs and had to downgrade; even though I left the /etc/cloudstack/management directory in place, I think that when I then reinstalled the RPMs it overwrote db.properties, since the file wasn't marked as owned by a previous installation, if that makes sense. At any rate, I suspect something similar might have occurred with the original bug. I'm going to try upgrading another cluster, paying attention to the db.properties file, and see if this happens again.
@nathanejohnson any update on this one?
We are still unable to reproduce this issue. Thus, I am removing this as a Blocker for 4.12.
Sorry for the late reply; I have not had a chance to upgrade our other development cluster, but I agree this shouldn't be a blocker.
@nathanejohnson @giorgiomassar8 @GabrielBrascher @wido is this still an issue? Should we close it, or is there a PR coming?
@nathanejohnson @giorgiomassar8 @GabrielBrascher @wido kindly discuss and advise a milestone in case a PR is coming soon. I've removed this issue from 4.13.0.0.
@giorgiomassar8 there's been no activity on this for two years. I'm closing the ticket; please reopen it if you feel it still should be open.
Any date? #5741
ISSUE TYPE
Bug Report
COMPONENT NAME
CloudStack agent
CLOUDSTACK VERSION
4.11
CONFIGURATION
KVM cluster with Ceph-backed RBD primary storage
OS / ENVIRONMENT
Ubuntu 16.04 / 14.04
SUMMARY
On a perfectly working 4.10 node with the KVM hypervisor and Ceph RBD primary storage, after upgrading to 4.11 the CloudStack agent is unable to connect the RBD pool in libvirt, giving just a generic "Operation not supported" error in its logs:
2018-04-06 16:27:37,650 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:91b4e1df) Attempting to create storage pool be80af6a-7201-3410-8da4-9b3b58c4954f (RBD) in libvirt
2018-04-06 16:27:37,652 WARN [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:91b4e1df) Storage pool be80af6a-7201-3410-8da4-9b3b58c4954f was not found running in libvirt. Need to create it.
2018-04-06 16:27:37,653 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:91b4e1df) Didn't find an existing storage pool be80af6a-7201-3410-8da4-9b3b58c4954f by UUID, checking for pools with duplicate paths
2018-04-06 16:27:37,664 ERROR [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:91b4e1df) Failed to create RBD storage pool: org.libvirt.LibvirtException: failed to connect to the RADOS monitor on: storagepool1:6789,: Operation not supported
2018-04-06 16:27:42,762 INFO [cloud.agent.Agent] (Agent-Handler-4:null) (logid Lost connection to the server. Dealing with the remaining commands...
Exactly the same pool was working before the upgrade:
2018-04-06 12:53:52,847 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-3:null) (logid:14dace5e) Attempting to create storage pool be80af6a-7201-3410-8da4-9b3b58c4954f (RBD) in libvirt
2018-04-06 12:53:52,850 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-3:null) (logid:14dace5e) Found existing defined storage pool be80af6a-7201-3410-8da4-9b3b58c4954f, using it.
2018-04-06 12:53:52,850 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-3:null) (logid:14dace5e) Trying to fetch storage pool be80af6a-7201-3410-8da4-9b3b58c4954f from libvirt
2018-04-06 12:53:53,171 INFO [cloud.agent.Agent] (agentRequest-Handler-2:null) (logid:14dace5e) Proccess agent ready command, agent id = 46
STEPS TO REPRODUCE
Take an existing, working CloudStack 4.10 cluster with RBD primary storage and Ubuntu 14.04-based agents, and upgrade the agents to version 4.11.
EXPECTED RESULTS
The cluster should keep working: the agents should connect and the RBD pool should be correctly opened in libvirt.
ACTUAL RESULTS
The CloudStack agent fails to start with a generic "Failed to create RBD storage pool: org.libvirt.LibvirtException: failed to connect to the RADOS monitor on: storagepool1:6789,: Operation not supported" error and loops in a failed state, rendering the machine unusable.
WORKAROUND
To work around the issue I have used the following XML config (dumped from another node where it is correctly running) to define the pool directly in libvirt, and it worked as expected:
(pool definition for be80af6a-7201-3410-8da4-9b3b58c4954f; the full XML was not preserved in this excerpt)
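For reference, the workaround amounts to a libvirt RBD pool definition along the following lines; the pool name/UUID and the monitor host come from the logs above, while the Ceph source pool name, auth username, and secret UUID are placeholders rather than this installation's actual values. The key part is the <auth>/<secret> element, which is exactly what the agent-generated XML was missing.

```xml
<!-- Illustrative libvirt RBD pool definition; the source pool name, auth
     username, and secret UUID below are placeholders. -->
<pool type='rbd'>
  <name>be80af6a-7201-3410-8da4-9b3b58c4954f</name>
  <uuid>be80af6a-7201-3410-8da4-9b3b58c4954f</uuid>
  <source>
    <!-- Ceph monitor, as seen in the agent logs above -->
    <host name='storagepool1' port='6789'/>
    <!-- Ceph pool backing this primary storage (placeholder name) -->
    <name>cloudstack</name>
    <!-- cephx authentication: the element missing from the agent-generated XML -->
    <auth username='cloudstack' type='ceph'>
      <secret uuid='00000000-0000-0000-0000-000000000000'/>
    </auth>
  </source>
</pool>
```

Such a definition can be loaded manually with virsh pool-define and virsh pool-start, which is what defining the pool "directly in libvirt" refers to here; the referenced secret must already exist in libvirt (created with virsh secret-define / secret-set-value) and hold the cephx key.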