
recheck from db if service is reported as down. #45

Open

Wants to merge 21 commits into base: stable/xena-m3
Conversation

kpawar-sap

In regions with high load and large payloads it becomes necessary to check
once more against the latest 'created_at' or 'updated_at' from the DB if the
earlier fetched value reports the service as down.
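
For illustration, a minimal sketch of this idea, assuming Manila-style helpers such as db.service_get and utils.service_is_up (this is not the exact change in this PR):

```python
from manila import db
from manila import utils


def service_is_up_with_recheck(context, service):
    if utils.service_is_up(service):
        return True
    # Under high load the row we hold may be stale, so re-read the latest
    # 'created_at'/'updated_at' from the DB once before reporting down.
    fresh_service = db.service_get(context, service['id'])
    return utils.service_is_up(fresh_service)
```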

Carthaca and others added 21 commits April 25, 2022 17:59
Only a share shrink is expected to decrease max_files.
This is flawed in multiple ways:
- setting max_files_multiplier to < 1 may not work as intended in this case
- NetApp does some rounding, so not every max inode number can be set

Change-Id: I94215b212ceccdba151e64cb38db9f26f7fbc1d2
Change-Id: I80b030c39b5a328ad212674880ffcd4b4725aff6
We look at the host (NetApp cluster) instead of the pool (NetApp node).

Change-Id: I2d4b51aa78e9aa7fda800b99a6d9a6e6a42c55f2
It can happen that the share has been deleted in the meantime.

Change-Id: Id01618a490f6bf4e55304471d4dc3388582e8e2b
The errors we have seen are temporary.
Once they were in error state, they got stuck there, whereas a simple
re-apply would have helped. So we go this way: re-apply and log
the event.

Change-Id: Id0a3d50bf0f82cb6d079988514ecff282c90de41
This was added back in 4b07b64
for easier error handling, but in the meantime we catch
reexport (aka ensure) errors more centrally.

Change-Id: Id2a4547400bd630168268250e9ef8ee05cd93ae2
follows 01b9d7c

We need to use the backend cluster client to check for existence
of a vserver in a different backend.

Change-Id: I8eb35e714a9a5c02f40b50fc226b27e05ed9c2f7
This applies to access rules that are older than 7 days,
i.e. in the first week a rule will be re-tried on every ensure run.

Log known errors at a lower level.

Change-Id: I359bf99e092c35f0b2785a74ec7a9a5afbd181d5
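
A minimal sketch of such an age-based cutoff; the 7-day constant comes from the description above, while the helper name is illustrative:

```python
import datetime

# Access rules younger than this are re-applied on every ensure run;
# older ones are treated as known failures and only logged.
RETRY_CUTOFF = datetime.timedelta(days=7)


def should_retry(rule_created_at, now=None):
    now = now or datetime.datetime.utcnow()
    return (now - rule_created_at) < RETRY_CUTOFF
```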
according to https://urllib3.readthedocs.io/en/latest/reference/urllib3.util.html#module-urllib3.util.retry

to fix
"Temporary failure in name resolution"

Retries are visible in the log like:
```
WARNING urllib3.connectionpool [req-d92bd8d6-f05f-404c-8ded-087d29c9bf9f] Retrying (Retry(total=4, connect=4, read=2, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fcde1cfb3a0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution')': /servlets/netapp.servlets.admin.XMLrequest_filer
WARNING urllib3.connectionpool [req-d92bd8d6-f05f-404c-8ded-087d29c9bf9f] Retrying (Retry(total=3, connect=3, read=2, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fcddbf223a0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution')': /servlets/netapp.servlets.admin.XMLrequest_filer
WARNING urllib3.connectionpool [req-d92bd8d6-f05f-404c-8ded-087d29c9bf9f] Retrying (Retry(total=2, connect=2, read=2, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fcde0443340>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution')': /servlets/netapp.servlets.admin.XMLrequest_filer
WARNING urllib3.connectionpool [req-d92bd8d6-f05f-404c-8ded-087d29c9bf9f] Retrying (Retry(total=1, connect=1, read=2, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fcde0144a00>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution')': /servlets/netapp.servlets.admin.XMLrequest_filer
```

Change-Id: Ic9ff8208f10df9dbed09717d6b218f6293d2338a
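
A sketch of the kind of retry configuration the urllib3 docs describe, with illustrative values (the actual driver wiring may differ):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry transient connection errors such as
# "Temporary failure in name resolution" before giving up.
retries = Retry(total=5, connect=5, read=2, backoff_factor=1)
session = requests.Session()
session.mount('https://', HTTPAdapter(max_retries=retries))
```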
Change-Id: Ibc21b6c72d76a3a804f67e66e7604b3d0be4373f
Closes-Bug: #1971710
The filter_properties operate on pools. A set intersection would limit
affinity to a certain pool, but we want to allow a set of pools within
the same backend cluster.

E.g. 'same_host': [
    'HostA@BackendA#Pool1',
    'HostA@BackendA#Pool2']
is fine in filter_properties.

Change-Id: If96192f25f03e517a78fefa65db91e02d0ba20e9
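
A sketch of the backend-level comparison using Manila's share_utils.extract_host helper; the usage here is illustrative, not the exact filter code:

```python
from manila.share import utils as share_utils

same_hosts = ['HostA@BackendA#Pool1', 'HostA@BackendA#Pool2']

# Reduce pool-level host strings to backend level before comparing,
# so different pools of the same backend still satisfy the affinity.
backends = {share_utils.extract_host(h, level='backend') for h in same_hosts}
assert backends == {'HostA@BackendA'}
```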
This is required to get it into the volume comment.

Change-Id: I657c19a74eb561f441dc6749210bfa906687a002
…ities

Configure DNS before everything else for security services;
this is a prerequisite.

Configure certs and use signed sessions for Active Directory.

Fail the unit test if certs would expire in less than 60 days.

Change-Id: Id50894f9dda06741d05949e41817ba340f17dd2c
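
A minimal sketch of a cert-expiry guard like the one described, assuming the cryptography library (not the actual test code):

```python
import datetime

from cryptography import x509


def assert_cert_valid_for_days(pem_data, min_days=60):
    # Fail if the certificate expires in less than `min_days` days.
    cert = x509.load_pem_x509_certificate(pem_data)
    remaining = cert.not_valid_after - datetime.datetime.utcnow()
    assert remaining >= datetime.timedelta(days=min_days)
```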
When trying to compare two non-numeric values using the driver
filter, the filter function will give an error. This is not desirable, as
it might be interesting to support comparisons with non-numeric values
provided by the filter objects (share, host, etc.). For example, the
following formula failed before the fix:

filter_function = '(share.project_id == "bb212f09317a4f4a8952ef3f729c2551")'

Copied from cinder https://opendev.org/openstack/cinder/commit/87a7e80a2cbc4c8abcf4394242a02fcc5140e44b

Change-Id: Icbfabb3bc0f608ebdd0784337db0921cc7763c53
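
A sketch of the idea behind the fix: fall back to string comparison when an operand is not numeric (simplified, not the actual evaluator code):

```python
def _to_operand(value):
    # Prefer numeric comparison, but fall back to strings
    # instead of raising for non-numeric operands.
    try:
        return float(value)
    except (TypeError, ValueError):
        return str(value)


def equals(a, b):
    return _to_operand(a) == _to_operand(b)


assert equals('bb212f09317a4f4a8952ef3f729c2551',
              'bb212f09317a4f4a8952ef3f729c2551')
```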
Fall back to setting up CIFS without LDAPS and session signing after 3 failed attempts.
ensure: don't re-apply CIFS security settings, to allow manual override.

Change-Id: I3a5341e2bd5c6343cff6fc50f05d855dc4f09312
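
A sketch of the fallback behaviour, with a hypothetical `configure` callable standing in for the driver's CIFS setup method:

```python
def setup_cifs(configure, max_secure_attempts=3):
    # `configure` is a hypothetical callable that sets up the CIFS server.
    for _ in range(max_secure_attempts):
        try:
            return configure(use_ldaps=True, session_signing=True)
        except Exception:
            continue
    # After repeated failures, fall back to plain CIFS setup.
    return configure(use_ldaps=False, session_signing=False)
```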
The 'reserved_share_extend_percentage' backend config option allows Manila
to consider a different reservation percentage for the share extend
operation. With this option, under the existing limit of
'reserved_share_percentage', we do not want the user to create a new share if
the limit is hit, but we do allow the user to extend an existing share.

DocImpact

Closes-Bug: #1961087
Change-Id: I000a7f530569ff80495b1df62a91981dc5865023
(cherry picked from commit 6431b86)
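
A sketch of how a scheduler capacity check could apply a separate reservation percentage for extends; the names and formula are illustrative, not the exact scheduler code:

```python
def has_enough_free_space(total_gb, used_gb, requested_gb,
                          reserved_share_percentage,
                          reserved_share_extend_percentage,
                          is_extend=False):
    # Use the (typically lower) extend reservation for grow operations,
    # so existing shares can still be extended when creates are blocked.
    pct = (reserved_share_extend_percentage if is_extend
           else reserved_share_percentage)
    free_gb = total_gb - used_gb - total_gb * pct / 100.0
    return free_gb >= requested_gb
```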
If the customer configures "Servers" as the AD Server in the Security
Service, then the domain controller discovery mode should be changed
to "none" and only these servers should be used.
NFS v4.0 was previously handled differently in the NetApp driver.
We want to explicitly disable v4.0 if it is not set in 'netapp_enabled_share_protocols'.

Also enable v4.1 options like read/write delegation, pNFS, and ACLs.
During the share network create API, if a failure occurs the quota is not
rolled back, and it is usable only after the quota reservations time out
(waiting conf.reservation_expire seconds).

Closes-bug: #1975483
Change-Id: I3de8f5bfa6ac4580da9b1012caa25657a6df71ec
(cherry picked from commit 8c854a1)
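
A sketch of the standard reserve/commit/rollback pattern the fix relies on (simplified; not the exact API code):

```python
from manila import db
from manila import quota

QUOTAS = quota.QUOTAS


def create_share_network(context, values):
    reservations = QUOTAS.reserve(context, share_networks=1)
    try:
        network = db.share_network_create(context, values)
    except Exception:
        # Roll back right away instead of leaving the reservation to
        # expire after conf.reservation_expire seconds.
        QUOTAS.rollback(context, reservations)
        raise
    QUOTAS.commit(context, reservations)
    return network
```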
For all projects enable logical space reporting.
- For neo, disable dedupe and compression.
- For share replica, share from snapshot, and share modify (extend/shrink),
  retain the behaviour of the parent share.
In regions with high load and large payloads it becomes necessary to check
once more against the latest 'created_at' or 'updated_at' from the DB if the
earlier fetched value reports the service as down.
@kpawar-sap
Author

This fix is based on observation of the bug; I was not able to reproduce the issue on devstack. We did reproduce it in some regions, though, so we can try this change there.

@kpawar-sap kpawar-sap force-pushed the stable/xena-m3 branch 2 times, most recently from 26107fa to 650fa9c Compare August 18, 2022 16:16
@kpawar-sap
Author

After checking all call invocations of service_is_up(), all calls are made with a service fetched from the DB, and the default value of 60 seconds is good enough to rule out that we need to fetch the DB values again. The issue seems more like the service is taking more than 60 seconds to come up. The current workaround we have in production is 10 minutes (600 seconds).

This makes the only solution a call to service_is_up() where the caller passes the threshold value (if None is passed, take the conf default).
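
A sketch of that option, assuming a signature close to Manila's utils.service_is_up (illustrative, not the exact proposed code):

```python
from oslo_config import cfg
from oslo_utils import timeutils

CONF = cfg.CONF


def service_is_up(service, down_time=None):
    # Callers may pass their own threshold; fall back to the conf default.
    down_time = down_time or CONF.service_down_time
    last_heartbeat = service['updated_at'] or service['created_at']
    elapsed = (timeutils.utcnow() - last_heartbeat).total_seconds()
    return abs(elapsed) <= down_time
```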

@chuan137 chuan137 force-pushed the stable/xena-m3 branch 2 times, most recently from bddb4dc to 2e4560f Compare August 22, 2022 12:43
@kpawar-sap kpawar-sap force-pushed the stable/xena-m3 branch 2 times, most recently from 3ebf07c to 0d528e5 Compare September 8, 2022 11:46
@Carthaca
Collaborator

please rebase
