Duplicate initiator exceptions on fresh systems and single NIC exceptions #41

Closed
msilcher opened this issue May 6, 2023 · 16 comments
Labels
bug Something isn't working next release This will be closed in the next release

Comments

@msilcher

msilcher commented May 6, 2023

Hi there,

I gave truenas-csp a try but can't get a pod to mount a volume. The pod description at startup shows the following:

AttachVolume.Attach failed for volume "pvc-8a464e26-5d41-499a-a151-4d99f895a25c" : rpc error: code = Internal desc = Failed to add ACL to volume Data_K8s_pvc-8a464e26-5d41-499a-a151-4d99f895a25c for node &{ debian-k8s cc2cb9e8-24c5-16f2-766e-3c0059f3be1c [0xc0005a9e00] [0xc000803110 0xc000803120 0xc000803130 0xc000803140 0xc000803150 0xc000803160 0xc000803170 0xc000803180 0xc000803190 0xc0008031a0 0xc0008031b0 0xc0008031c0 0xc0008031d0] [] } via CSP, err: Request failed with status code 500 and errors Error code (Exception) and message (Traceback (most recent call last):
  File "/app/truenascsp.py", line 151, in on_put
    'initiator': initiator.get('id')
AttributeError: 'list' object has no attribute 'get'

The same message is seen on the truenas-csp provisioner pod:

Sat, 06 May 2023 17:27:05 +0000 backend INFO Volume found: Data_K8s_pvc-8a464e26-5d41-499a-a151-4d99f895a25c
Sat, 06 May 2023 17:27:06 +0000 backend INFO Volume found: Data_K8s_pvc-ad63e81c-52d0-48e7-ad82-54893b606451
Sat, 06 May 2023 17:27:06 +0000 backend INFO Host updated: cc2cb9e8-24c5-16f2-766e-3c0059f3be1c
Sat, 06 May 2023 17:27:06 +0000 backend INFO Host updated: cc2cb9e8-24c5-16f2-766e-3c0059f3be1c
Sat, 06 May 2023 17:27:06 +0000 backend ERROR Exception: Traceback (most recent call last):
  File "/app/truenascsp.py", line 151, in on_put
    'initiator': initiator.get('id')
AttributeError: 'list' object has no attribute 'get'.

I'm using Kubernetes 1.27.1 on a Debian 11.7 VM, with the latest version of the hpe-storage and truenas-csp manifests. This happens on both TrueNAS Core and TrueNAS Scale (always latest versions) via iSCSI (as the guide shows).

By the way: I see the provisioner is using APIv1, is there a way to force/set APIv2? Would it make any difference?

Thank you!

@datamattsson
Collaborator

Thanks for reporting this. It appears this call returns more than one initiator. I found one occurrence this could happen and fixed that, but this is yet another corner case.

If you don't mind, would you want to test this image: quay.io/datamattsson/truenas-csp:v2.3.0-initfix?
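
For context, the `AttributeError` in the traceback is what Python raises when `.get()` is called on a list, which is consistent with a lookup returning several initiators instead of one. Below is a minimal sketch of handling both shapes. The helper name `initiator_id` and the record layout (`id`, `initiators`) are assumptions for illustration; this is not the actual patch in the test image.

```python
def initiator_id(result):
    """Return the id of an initiator lookup result.

    `result` may be a single dict, a list of dicts (duplicates), or None.
    """
    if isinstance(result, list):
        # Duplicates exist: settle on the first entry instead of failing.
        result = result[0] if result else None
    if result is None:
        return None
    return result.get('id')


# Hypothetical lookup results for the same node IQN.
single = {'id': 3, 'initiators': ['iqn.1993-08.org.debian:01:example']}
duplicated = [single, {'id': 4, 'initiators': ['iqn.1993-08.org.debian:01:example']}]

print(initiator_id(single))      # 3
print(initiator_id(duplicated))  # 3
```

Picking the first entry is only one possible policy; the point is that the caller no longer assumes a single dict.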

@datamattsson
Collaborator

Looking at the logs there I'm assuming that you're provisioning a workload with two PVCs attached on a completely fresh system?

Another workaround is to delete one of the duplicate initiators that got created in the publishing process.
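
To find which entry to delete, one option is to group the initiator records by their IQN list and keep the lowest id in each group. The sketch below works on an in-memory list; the record layout (`id`, `initiators`) is an assumption about what the backend returns, and `duplicate_initiator_ids` is a hypothetical helper, not part of truenas-csp.

```python
from collections import defaultdict


def duplicate_initiator_ids(records):
    """Return the ids of records whose IQN list repeats an earlier record.

    The record with the lowest id in each group is treated as the one to keep.
    """
    groups = defaultdict(list)
    for record in records:
        # Sort the IQNs so that ordering differences do not hide a duplicate.
        key = tuple(sorted(record.get('initiators', [])))
        groups[key].append(record['id'])

    duplicates = []
    for ids in groups.values():
        duplicates.extend(sorted(ids)[1:])  # everything except the lowest id
    return sorted(duplicates)


# Hypothetical records: ids 1 and 2 describe the same node.
records = [
    {'id': 1, 'initiators': ['iqn.1993-08.org.debian:01:example']},
    {'id': 2, 'initiators': ['iqn.1993-08.org.debian:01:example']},
    {'id': 3, 'initiators': ['iqn.1993-08.org.debian:01:other']},
]
print(duplicate_initiator_ids(records))  # [2]
```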

@datamattsson datamattsson added bug Something isn't working next release This will be closed in the next release labels May 6, 2023
@msilcher
Author

msilcher commented May 6, 2023

That's correct! I just tested the storage provisioner with a pihole instance that requests 2 PVCs for the same pod.

@msilcher
Author

msilcher commented May 6, 2023

> Thanks for reporting this. It appears this call returns more than one initiator. I found one occurrence this could happen and fixed that, but this is yet another corner case.
>
> If you don't mind, would you want to test this image: quay.io/datamattsson/truenas-csp:v2.3.0-initfix?

Sure, I'll test it and give you feedback!

@datamattsson
Collaborator

> That's correct! I just tested the storage provisioner with a pihole instance that requests 2 PVCs for the same pod.

Makes sense. If the initiator doesn't exist on the backend (TrueNAS) and two or more requests come in at the same time, the same initiator gets created twice, which causes problems in the staging phase.
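
The sequence described here is a check-then-create race. The following sketch reproduces it deterministically with two threads and a barrier that forces both requests to finish their existence check before either one creates anything. The in-memory list stands in for the backend's initiator table, and all names are hypothetical; it only illustrates the timing, not the CSP's real code path.

```python
import threading


def publish_concurrently(iqn, requests=2):
    """Simulate `requests` simultaneous publish calls for the same node IQN."""
    initiators = []                           # stand-in for the backend table
    barrier = threading.Barrier(requests)

    def publish():
        # Step 1: check whether the initiator already exists.
        exists = any(entry['iqn'] == iqn for entry in initiators)
        # Force every request to complete its check before anyone creates.
        barrier.wait()
        # Step 2: create it if the (now stale) check said it was missing.
        if not exists:
            initiators.append({'iqn': iqn})

    threads = [threading.Thread(target=publish) for _ in range(requests)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return initiators


created = publish_concurrently('iqn.1993-08.org.debian:01:example')
print(len(created))  # 2: the same initiator was created twice
```

On a system where the initiator already exists, the check succeeds for every request and no duplicate is created, which matches the observation that this shows up on fresh systems.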

@msilcher
Author

msilcher commented May 6, 2023

Tested with the new image you mentioned, but it still fails. Two identical initiators are created again.

This time the provisioner complains about something else:

Sat, 06 May 2023 20:59:18 +0000 backend INFO Volume found: Data_K8s_pvc-ea371f8c-1175-408c-a581-ee9ca8a27215
Sat, 06 May 2023 20:59:18 +0000 backend INFO Volume found: Data_K8s_pvc-97f910f6-454d-43d4-9b47-1cdaae1233ad
Sat, 06 May 2023 20:59:18 +0000 backend INFO Host updated: cc2cb9e8-24c5-16f2-766e-3c0059f3be1c
Sat, 06 May 2023 20:59:18 +0000 backend INFO Host updated: cc2cb9e8-24c5-16f2-766e-3c0059f3be1c
Sat, 06 May 2023 20:59:18 +0000 backend ERROR Exception: Traceback (most recent call last):
  File "/app/truenascsp.py", line 167, in on_put
    req_backend['auth_networks'] = api.ipaddrs_to_networks(discovery_ips)
  File "/app/backend.py", line 120, in ipaddrs_to_networks
    for alias in interface['aliases']:
TypeError: string indices must be integers

@datamattsson
Collaborator

> Tested with new image you mentioned but still fails. I see again 2 identical initiators created:

The duplicate initiator being created is a race I don't think I can mitigate. Living with it is what the patched image fixed.

> This time the provisioner complains about something else:

Oh, I think I know what this is. Either you have the hpe-csi portal misconfigured or just one IP address assigned to it?

@msilcher
Author

msilcher commented May 6, 2023

It is a homelab; only 1 IP address is assigned to the TrueNAS/iSCSI portal.

I was not aware that there must be more than 1 IP available. It would make sense in a PROD env though. Is there a workaround for this?

P.S: I could add a second IP to TrueNAS I guess

@msilcher
Author

msilcher commented May 6, 2023

Added a second IP to the portal, but the issue persists.

Is there something else I need to do on the provisioner side?

@datamattsson
Collaborator

> Added a second IP to the portal but issue persists:

I'm trying to reproduce this on my end. Do you only have one NIC on this system? (configured or not)

@datamattsson
Collaborator

> I'm trying to reproduce this on my end.

I've reproduced the breakage now. It's the single NIC that is causing issues. I'll have a new image shortly.
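
The `TypeError: string indices must be integers` in the earlier traceback is what Python raises when `interface` is a string. One way that can happen, and this is an assumption since the thread does not show the fix, is that the interface query yields a single object rather than a list on a one-NIC system, so the loop iterates over the dict's keys. A minimal sketch of a defensive version follows; the alias layout (`address`, `netmask`) and the `as_list` helper are hypothetical.

```python
import ipaddress


def as_list(value):
    """Wrap a single object in a list; pass lists through unchanged."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def ipaddrs_to_networks(interfaces, discovery_ips):
    """Map the portal's discovery IPs to the networks they belong to."""
    networks = []
    for interface in as_list(interfaces):
        for alias in interface.get('aliases', []):
            if alias.get('address') not in discovery_ips:
                continue
            # ip_interface() parses "address/prefix"; .network drops the host bits.
            network = ipaddress.ip_interface(
                f"{alias['address']}/{alias['netmask']}").network
            if str(network) not in networks:
                networks.append(str(network))
    return networks


# Hypothetical single-NIC response: one dict instead of a list of dicts.
single_nic = {'name': 'em0', 'aliases': [{'address': '192.168.1.10', 'netmask': 24}]}

print(ipaddrs_to_networks(single_nic, ['192.168.1.10']))    # ['192.168.1.0/24']
print(ipaddrs_to_networks([single_nic], ['192.168.1.10']))  # ['192.168.1.0/24']
```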

@msilcher
Author

msilcher commented May 6, 2023

> > Added a second IP to the portal but issue persists:
>
> I'm trying to reproduce this on my end. Do you only have one NIC on this system? (configured or not)

Yes, only one NIC at the moment. I added a second IP to the same NIC but it's still not working. Everything is running in a virtual environment, I could add a second NIC and split IPs (one per NIC) for testing purposes, but to be honest this exceeds the idea of having a simple homelab setup :)

@datamattsson
Collaborator

Ok, this image quay.io/datamattsson/truenas-csp:v2.3.0-initfix-sif should work. You can remove your additional IP address, it's not needed.

@msilcher
Author

msilcher commented May 6, 2023

> quay.io/datamattsson/truenas-csp:v2.3.0-initfix-sif

Yes sir!!! Pod mounted both volumes without issues. Provisioner logs are clean:

Sat, 06 May 2023 22:25:52 +0000 backend ERROR Not found: Volume with name pvc-444aa725-5c30-4984-a051-79249897721f not found.
Sat, 06 May 2023 22:25:52 +0000 backend ERROR Not found: Volume with name pvc-367e95d4-3dc0-4276-bee7-0c9bb1048ce3 not found.
Sat, 06 May 2023 22:25:53 +0000 backend INFO Volume created: pvc-367e95d4-3dc0-4276-bee7-0c9bb1048ce3
Sat, 06 May 2023 22:25:53 +0000 backend INFO Volume created: pvc-444aa725-5c30-4984-a051-79249897721f
Sat, 06 May 2023 22:26:33 +0000 backend INFO Volume found: Data_K8s_pvc-444aa725-5c30-4984-a051-79249897721f
Sat, 06 May 2023 22:26:33 +0000 backend INFO Volume found: Data_K8s_pvc-367e95d4-3dc0-4276-bee7-0c9bb1048ce3
Sat, 06 May 2023 22:26:33 +0000 backend INFO Host updated: cc2cb9e8-24c5-16f2-766e-3c0059f3be1c
Sat, 06 May 2023 22:26:33 +0000 backend INFO Host updated: cc2cb9e8-24c5-16f2-766e-3c0059f3be1c
Sat, 06 May 2023 22:26:34 +0000 backend INFO Volume published: Data_K8s_pvc-444aa725-5c30-4984-a051-79249897721f
Sat, 06 May 2023 22:26:34 +0000 backend INFO Volume published: Data_K8s_pvc-367e95d4-3dc0-4276-bee7-0c9bb1048ce3
Sat, 06 May 2023 22:26:42 +0000 backend INFO Token created (not logged)
Sat, 06 May 2023 22:26:42 +0000 backend INFO Volume found: Data_K8s_pvc-444aa725-5c30-4984-a051-79249897721f
Sat, 06 May 2023 22:26:44 +0000 backend INFO Volume found: Data_K8s_pvc-367e95d4-3dc0-4276-bee7-0c9bb1048ce3

Thanks a lot for your quick support!!!

@datamattsson
Collaborator

> Thanks a lot for your quick support!!!

You're most welcome and thank you for working with me on this! These fixes will be part of the next release.

@datamattsson datamattsson changed the title Failed to add ACL to volume Duplicate initiator exceptions on fresh systems and single NIC exceptions May 6, 2023
datamattsson added a commit to datamattsson/truenas-csp that referenced this issue May 8, 2023
Signed-off-by: Michael Mattsson <[email protected]>
datamattsson added a commit that referenced this issue May 8, 2023
Signed-off-by: Michael Mattsson <[email protected]>
@datamattsson
Collaborator

Fixed in #42
