Deadlock at mqmetric.collectQueueStatus() #170

Closed
liurui-1 opened this issue Jun 15, 2021 · 3 comments

Comments

@liurui-1

liurui-1 commented Jun 15, 2021

We have a question about the Golang SDK for IBM MQ ( https://github.com/ibm-messaging/mq-golang/tree/79e82b431c9febfc4791fb8b2b37f1c33dab017f ). Our agent has been blocked in mqmetric.CollectQueueStatus() for days. The stack trace follows:

goroutine 1202789 [syscall]:
github.ibm.com/Unified-Agent/ibmmq/vendor/github.com/ibm-messaging/mq-golang/v5/ibmmq._Cfunc_MQGET(0x687e000011, 0xc001052000, 0xc002b043f0, 0x2800, 0xc000c69000, 0xc002381834, 0xc002381830, 0xc0023817ec)
	_cgo_gotypes.go:1179 +0x4a
github.ibm.com/Unified-Agent/ibmmq/vendor/github.com/ibm-messaging/mq-golang/v5/ibmmq.MQObject.getInternal.func1(0xc000000068, 0xc000558c00, 0xc00217ede0, 0x14, 0xc001052000, 0xc002b043f0, 0x2800, 0xc000c69000, 0xc002381834, 0xc002381830, ...)
	github.ibm.com/Unified-Agent/ibmmq/vendor/github.com/ibm-messaging/mq-golang/v5/ibmmq/mqi.go:690 +0x130
github.ibm.com/Unified-Agent/ibmmq/vendor/github.com/ibm-messaging/mq-golang/v5/ibmmq.MQObject.getInternal(0xc000000068, 0xc000558c00, 0xc00217ede0, 0x14, 0xc0005ca640, 0xc00044b1f0, 0xc000c69000, 0x2800, 0x2800, 0xc000494200, ...)
	github.ibm.com/Unified-Agent/ibmmq/vendor/github.com/ibm-messaging/mq-golang/v5/ibmmq/mqi.go:690 +0x271
github.ibm.com/Unified-Agent/ibmmq/vendor/github.com/ibm-messaging/mq-golang/v5/ibmmq.MQObject.Get(...)
	github.ibm.com/Unified-Agent/ibmmq/vendor/github.com/ibm-messaging/mq-golang/v5/ibmmq/mqi.go:616
github.ibm.com/Unified-Agent/ibmmq/vendor/github.com/ibm-messaging/mq-golang/v5/mqmetric.statusGetReply(0xc000558c00, 0x0, 0x0, 0x0, 0x0, 0x2700, 0x7f8a136e0a20, 0xc001a49ce0)
	github.ibm.com/Unified-Agent/ibmmq/vendor/github.com/ibm-messaging/mq-golang/v5/mqmetric/status.go:206 +0xf2
github.ibm.com/Unified-Agent/ibmmq/vendor/github.com/ibm-messaging/mq-golang/v5/mqmetric.collectQueueStatus(0xc000558c00, 0x7f8a130e7009, 0x1, 0x1, 0xffffffffffffffff, 0x0)
	github.ibm.com/Unified-Agent/ibmmq/vendor/github.com/ibm-messaging/mq-golang/v5/mqmetric/queue.go:238 +0x37a
github.ibm.com/Unified-Agent/ibmmq/vendor/github.com/ibm-messaging/mq-golang/v5/mqmetric.CollectQueueStatus(0xc000558c00, 0x7f8a130e7009, 0x1, 0xc0008e4120, 0x0)
	github.ibm.com/Unified-Agent/ibmmq/vendor/github.com/ibm-messaging/mq-golang/v5/mqmetric/queue.go:189 +0x2d8
github.ibm.com/Unified-Agent/ibmmq.(*Ibmmq).GatherQueues(0xc0000bc410, 0xc000d8c720, 0xd, 0xc0008e4101, 0x1)
	github.ibm.com/Unified-Agent/ibmmq/ibmmq.go:584 +0x27c5
github.ibm.com/Unified-Agent/ibmmq.(*Ibmmq).Gather.func1(0xc0020ee8f0, 0xc0000bc410, 0xc000d8c720, 0xd)
	github.ibm.com/Unified-Agent/ibmmq/ibmmq.go:247 +0x3c5
created by github.ibm.com/Unified-Agent/ibmmq.(*Ibmmq).Gather
	github.ibm.com/Unified-Agent/ibmmq/ibmmq.go:232 +0x198

There were no error messages from ibmmq during this deadlock, so we suspect that some special scenario caused MQ to return 2033 (MQRC_NO_MSG_AVAILABLE) repeatedly while collecting queue status. Please review the code at

GatherQueues() in github.com/ibm-messaging/mq-golang/v5/mqmetric/queue.go
and
statusGetReply() in github.com/ibm-messaging/mq-golang/v5/mqmetric/status.go

Could this be the root cause of the deadlock? Even in the latest code of the MQ Golang SDK, this logic is the same.

liurui-1 changed the title from "Deadlock at mqmetric.collectQueueStatus" to "Deadlock at mqmetric.collectQueueStatus()" on Jun 15, 2021
@ibmmqmet
Collaborator

  • I don't see how that stack trace indicates a deadlock. For one thing, there's only a single thread showing, and a deadlock would involve at least two.
  • If you really are using v5.0.0-beta level of code then you are very out of date and need to upgrade. The word "beta" ought to give some urgency to that.
  • Newer versions of the package better deal with some bad configurations and larger-scale responses including reply queues filling up. The newer versions also have more error reporting and a Trace level of logging for additional diagnostics. Turning on tracing should show the return values from statusGetReply.
  • The GatherQueues function is not part of this package; it's one of yours. So I can't tell what it is doing or returning.
  • Remember that this is an as-is project with no guaranteed support. Sending the same question via multiple routes is not going to get it looked at faster.

@liurui-1
Author

liurui-1 commented Jun 16, 2021

Hi @ibmmqmet ,

  • The MQ agent has been running without producing any logs for 3 days, even though there is logging throughout the agent code. We took stack traces and they are always the same, so we think it is a deadlock.
  • GatherQueues is in our agent code, but it has no effect here: the code is stuck inside the Golang SDK and never returns from it.

In the following code, the loop only exits when err == nil and cfh.Control == ibmmq.MQCFC_LAST, so in some special scenarios it can get stuck and never return.
GatherQueues() in github.com/ibm-messaging/mq-golang/v5/mqmetric/queue.go
and
statusGetReply() in github.com/ibm-messaging/mq-golang/v5/mqmetric/status.go

We will upgrade to the new SDK version, but we can see that the above logic has not changed.

ibmmqmet added a commit that referenced this issue Aug 4, 2021
 mqmetric - Add qmgr_status metric so that Prometheus collector can report it even when qmgr is unavailable
 mqmetric - Check more failure scenarios (#170)
 mqmetric - Add a cluster_suspend metric
@ibmmqmet
Collaborator

Should be fixed in current releases.
