recheck from db if service is reported as down. #45
base: stable/xena-m3
Conversation
Only shrink share is expected to decrease max_files. This is flawed in multiple ways:
- setting max_files_multiplier to < 1 may not work as intended in this case
- NetApp does some rounding, so not every max inode number can be set

Change-Id: I94215b212ceccdba151e64cb38db9f26f7fbc1d2
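A minimal sketch of the guard described above, not the actual driver code: max_files is only lowered on an explicit shrink, and small differences are ignored because NetApp rounds the stored value. The option and helper names (max_files_multiplier, get_max_files, set_max_files) are hypothetical.

```python
def update_max_files(client, volume_name, new_size_gb, multiplier, shrinking=False):
    """Scale the volume's max_files with its size, but never lower it
    unless the caller is explicitly shrinking the share."""
    desired = int(new_size_gb * 1024 * multiplier)   # rough inode budget
    current = client.get_max_files(volume_name)

    # NetApp rounds the value it actually stores, so treat small
    # differences as "already applied" instead of re-setting them.
    if abs(desired - current) < 1000:
        return current

    if desired < current and not shrinking:
        # Only a shrink operation is expected to decrease max_files.
        return current

    client.set_max_files(volume_name, desired)
    return desired
```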
Change-Id: I80b030c39b5a328ad212674880ffcd4b4725aff6
we look at the host (NetApp cluster) instead of pool (NetApp node) Change-Id: I2d4b51aa78e9aa7fda800b99a6d9a6e6a42c55f2
It can happen that the share has been deleted in the meantime. Change-Id: Id01618a490f6bf4e55304471d4dc3388582e8e2b
The errors we have seen are temporary. Once they were in error state, they got stuck there, whereas a simple re-apply would have helped. So we go this way: re-apply and log the event. Change-Id: Id0a3d50bf0f82cb6d079988514ecff282c90de41
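A hedged sketch of the re-apply-and-log idea above; apply_fn, resource_id and the attempt count are illustrative, not names from this patch.

```python
import logging

LOG = logging.getLogger(__name__)

def apply_with_retry(apply_fn, resource_id, attempts=3):
    """Re-apply an operation that tends to fail transiently instead of
    parking the resource in an error state."""
    last_exc = None
    for attempt in range(1, attempts + 1):
        try:
            return apply_fn(resource_id)
        except Exception as exc:  # the errors observed were temporary
            last_exc = exc
            LOG.warning("Apply failed for %s (attempt %d/%d): %s",
                        resource_id, attempt, attempts, exc)
    # Surface the last error only after all re-apply attempts failed.
    raise last_exc
```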
was added back then in 4b07b64 for easier error handling, but in the meantime we catch reexport aka ensure errors more centrally Change-Id: Id2a4547400bd630168268250e9ef8ee05cd93ae2
Follows 01b9d7c. We need to use the backend cluster client to check for the existence of a vserver in a different backend. Change-Id: I8eb35e714a9a5c02f40b50fc226b27e05ed9c2f7
Applies to access rules which are older than 7 days, i.e. in the first week it will be re-tried on every ensure run. Log known errors at a lower level. Change-Id: I359bf99e092c35f0b2785a74ec7a9a5afbd181d5
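A hedged sketch of the age gating and lower-level logging described above; the field and marker names (rule_updated_at, KNOWN_ERRORS) are assumptions, not the driver's API, and timestamps are assumed to be timezone-aware.

```python
import logging
from datetime import datetime, timedelta, timezone

LOG = logging.getLogger(__name__)
RETRY_WINDOW = timedelta(days=7)
KNOWN_ERRORS = ("object-missing", "duplicate-entry")  # illustrative markers

def should_retry(rule_updated_at):
    """Retry on every ensure run only while the rule is younger than 7 days."""
    age = datetime.now(timezone.utc) - rule_updated_at
    return age < RETRY_WINDOW

def log_apply_error(rule_id, error_text):
    """Known, expected errors are logged at a lower level than surprises."""
    if any(marker in error_text for marker in KNOWN_ERRORS):
        LOG.debug("Known error while applying rule %s: %s", rule_id, error_text)
    else:
        LOG.error("Error while applying rule %s: %s", rule_id, error_text)
```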
according to https://urllib3.readthedocs.io/en/latest/reference/urllib3.util.html#module-urllib3.util.retry to fix "Temporary failure in name resolution". Retry is visible in log like:
```
WARNING urllib3.connectionpool [req-d92bd8d6-f05f-404c-8ded-087d29c9bf9f] Retrying (Retry(total=4, connect=4, read=2, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fcde1cfb3a0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution')': /servlets/netapp.servlets.admin.XMLrequest_filer
WARNING urllib3.connectionpool [req-d92bd8d6-f05f-404c-8ded-087d29c9bf9f] Retrying (Retry(total=3, connect=3, read=2, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fcddbf223a0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution')': /servlets/netapp.servlets.admin.XMLrequest_filer
WARNING urllib3.connectionpool [req-d92bd8d6-f05f-404c-8ded-087d29c9bf9f] Retrying (Retry(total=2, connect=2, read=2, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fcde0443340>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution')': /servlets/netapp.servlets.admin.XMLrequest_filer
WARNING urllib3.connectionpool [req-d92bd8d6-f05f-404c-8ded-087d29c9bf9f] Retrying (Retry(total=1, connect=1, read=2, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fcde0144a00>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution')': /servlets/netapp.servlets.admin.XMLrequest_filer
```
Change-Id: Ic9ff8208f10df9dbed09717d6b218f6293d2338a
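For reference, a hedged sketch of the kind of urllib3 retry configuration the commit message points to. The values and the mounting on a requests session are illustrative, not necessarily what the change itself does.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry = Retry(
    total=5,           # overall cap on retries
    connect=5,         # retries for connection errors, incl. DNS failures
    read=2,            # retries for requests that got no answer back
    backoff_factor=1,  # exponential backoff between attempts
)

session = requests.Session()
# All HTTPS requests made through this session now retry transient
# connection failures such as "Temporary failure in name resolution".
session.mount("https://", HTTPAdapter(max_retries=retry))
```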
Change-Id: Ibc21b6c72d76a3a804f67e66e7604b3d0be4373f Closes-Bug: #1971710
The filter_properties operates with pools. A set intersection would limit affinity to a certain pool, but we want to allow a set of pools within the same backend cluster. E.g. 'same_host': ['HostA@BackendA#Pool1', 'HostA@BackendA#Pool2'] is fine in filter_properties. Change-Id: If96192f25f03e517a78fefa65db91e02d0ba20e9
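A minimal standalone sketch of the comparison described above: affinity is checked at the backend level (host@backend) rather than the pool level. The helper below is illustrative; Manila ships its own host-extraction utility.

```python
def backend_of(pool_spec: str) -> str:
    """Reduce 'Host@Backend#Pool' to 'Host@Backend'."""
    return pool_spec.split('#', 1)[0]

def same_backend(candidate_pool: str, same_host_list: list[str]) -> bool:
    """A candidate pool satisfies 'same_host' if it lives on the same
    backend cluster as any of the listed pools."""
    return backend_of(candidate_pool) in {backend_of(p) for p in same_host_list}

# e.g. 'HostA@BackendA#Pool3' matches
# ['HostA@BackendA#Pool1', 'HostA@BackendA#Pool2'], because the backend is the same.
```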
is required to get it into volume comment Change-Id: I657c19a74eb561f441dc6749210bfa906687a002
…ities configure dns before everything else for security services, this is a prereq configure certs and use signed sessions for active directory fail unit test if certs would expire in less than 60 days Change-Id: Id50894f9dda06741d05949e41817ba340f17dd2c
When trying to compare two values that are non-numeric using the driver filter, the filter function will give an error. This is not desirable as it might be interesting to support comparatives with non-numeric values provided by the filter objects (share, host, etc). For example, the following formula failed before the fix: filter_function = '(share.project_id == "bb212f09317a4f4a8952ef3f729c2551")' Copied from cinder https://opendev.org/openstack/cinder/commit/87a7e80a2cbc4c8abcf4394242a02fcc5140e44b Change-Id: Icbfabb3bc0f608ebdd0784337db0921cc7763c53
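A hedged sketch of the behaviour the copied fix enables: when operands of a comparison are not numbers, compare them as values of a common type instead of raising. This mirrors the idea of the cinder change, not its exact code.

```python
def _to_comparable(value):
    """Use a float when possible, otherwise fall back to a string."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return str(value)

def filter_equals(left, right):
    left, right = _to_comparable(left), _to_comparable(right)
    if type(left) is not type(right):
        # Mixed types: compare as strings rather than erroring out.
        left, right = str(left), str(right)
    return left == right

# With this, a formula like
#   share.project_id == "bb212f09317a4f4a8952ef3f729c2551"
# evaluates in the driver filter instead of failing.
```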
Fall back to setting up CIFS without LDAPS and session signing after 3 failed attempts. ensure: don't re-apply CIFS security settings, to allow manual override. Change-Id: I3a5341e2bd5c6343cff6fc50f05d855dc4f09312
The 'reserved_share_extend_percentage' backend config option allows Manila to consider a different reservation percentage for the share extend operation. With this option, under the existing limit of 'reserved_share_percentage', we do not want the user to create a new share once the limit is hit, but we do allow the user to extend an existing share. DocImpact Closes-Bug: #1961087 Change-Id: I000a7f530569ff80495b1df62a91981dc5865023 (cherry picked from commit 6431b86)
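A minimal sketch of how a separate reservation for extends can behave, using simplified capacity math; the option names follow the commit message, the rest is illustrative.

```python
def has_capacity(free_gb, total_gb, requested_gb,
                 reserved_share_percentage,
                 reserved_share_extend_percentage,
                 is_extend=False):
    """Create uses the stricter reservation; extend may use a lower one,
    so an existing share can still grow after the create limit is reached."""
    reserved = (reserved_share_extend_percentage if is_extend
                else reserved_share_percentage)
    usable_gb = free_gb - total_gb * reserved / 100.0
    return requested_gb <= usable_gb
```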
If the customer is configuring "Servers" as AD Server in the Security Service then the domain controller discovery mode should be changed to "none" and only these servers should be used.
NFS v4.0 was previously handled differently in NetApp driver. We want to explicitly disable v4.0 if it is not set in 'netapp_enabled_share_protocols'. Also enable v4.1 options like read/write delegation, pnfs, acls.
During the share network create API, if a failure occurs the quota is not rolled back and is usable again only after the quota reservations have timed out (waiting conf.reservation_expire seconds). Closes-bug: #1975483 Change-Id: I3de8f5bfa6ac4580da9b1012caa25657a6df71ec (cherry picked from commit 8c854a1)
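A hedged sketch of the rollback-on-failure pattern such a fix introduces; 'quotas' and 'db' stand in for the quota engine and database API and are used here only illustratively.

```python
def create_share_network(context, quotas, db, values):
    """Reserve quota, create the network, and release the reservation
    immediately on failure instead of waiting for it to expire."""
    reservations = quotas.reserve(context, share_networks=1)
    try:
        network = db.share_network_create(context, values)
    except Exception:
        # Without this rollback the reservation stays held until
        # conf.reservation_expire passes.
        quotas.rollback(context, reservations)
        raise
    quotas.commit(context, reservations)
    return network
```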
- For all projects, enable logical space reporting.
- For neo, disable dedupe and compression.
- For share replica, share from snapshot, and share modify (extend/shrink), retain the behaviour of the parent share.
In regions with high load & payload it becomes necessary to check once against the latest 'created_at' or 'updated_at' from the db, if the earlier fetched value reports the service as down.
This fix is based on observation of the bug; I was not able to reproduce the issue on devstack. Though, since we do reproduce it in some regions, we can try this change there.
After checking all call invocations of service_is_up(), all calls are made with a service fetched from the db, and the default value of 60 seconds is good enough to rule out that we need to fetch db values again. The issue seems more like the service taking more than 60 seconds to come up. The current workaround we have in production is 10 minutes (600 seconds). This leaves as the only solution a call to service_is_up() where the caller passes the threshold value (if None is passed, take the conf default).
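A hedged sketch of the two ideas discussed in this thread: a caller-supplied threshold and a db recheck before declaring the service down. Names such as db.service_get follow the usual OpenStack pattern but are illustrative here, not this patch's code; timestamps are assumed timezone-aware.

```python
from datetime import datetime, timedelta, timezone

def service_is_up(service, threshold_seconds=60):
    """A service is 'up' if its last heartbeat is within the threshold.
    The caller may pass a larger threshold (e.g. 600s) instead of the default."""
    last_heartbeat = service['updated_at'] or service['created_at']
    elapsed = datetime.now(timezone.utc) - last_heartbeat
    return elapsed <= timedelta(seconds=threshold_seconds)

def service_is_up_with_recheck(db, context, service, threshold_seconds=60):
    """Re-read the latest heartbeat from the db before declaring the
    service down, in case the in-memory record is stale."""
    if service_is_up(service, threshold_seconds):
        return True
    fresh = db.service_get(context, service['id'])
    return service_is_up(fresh, threshold_seconds)
```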
please rebase