Link shutdown routine and sigterm handler to main thread #5555

Merged

@khushboobhatia01 khushboobhatia01 commented Jan 26, 2022

Currently, when a worker or the workflow engine receives a SIGTERM signal, the sigterm handler runs in a new thread. The handler raises a SystemExit exception, which should be caught in the main thread so that the shutdown routine is executed.
(https://github.com/StackStorm/st2/blob/master/st2actions/st2actions/cmd/actionrunner.py#L74 https://github.com/StackStorm/st2/blob/master/st2actions/st2actions/cmd/workflow_engine.py#L75)

However, this expected behaviour doesn't always happen. When other green threads are processing messages, the SystemExit exception can be caught by one of those threads. If that thread's try-except block doesn't re-raise the exception, the shutdown routine is never executed.
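The failure mode described above can be illustrated with a minimal, standard-library-only sketch. It is not the StackStorm code: `process_message` and `run_service` are hypothetical names, and the signal is simulated by calling the handler directly from inside the processing code rather than by delivering a real SIGTERM to a green thread.

```python
import sys


def sigterm_handler(signum=None, frame=None):
    # Same shape as the handlers in the tracebacks: raise SystemExit
    # in whatever code happens to be running when the signal arrives.
    sys.exit(0)


def process_message(signal_arrives, reraise):
    """Hypothetical message processor with a broad try-except."""
    try:
        if signal_arrives:
            # Simulate SIGTERM arriving in the middle of processing.
            sigterm_handler()
        return "processed"
    except BaseException as exc:
        if reraise and isinstance(exc, SystemExit):
            raise
        # Logs and carries on: SystemExit never reaches the caller.
        return "swallowed " + type(exc).__name__


def run_service(messages, reraise=False):
    """Hypothetical main routine that owns the shutdown handling."""
    events = []
    try:
        for signal_arrives in messages:
            events.append(process_message(signal_arrives, reraise))
    except (KeyboardInterrupt, SystemExit):
        events.append("shutdown routine ran")
    return events
```

With `run_service([False, True, False])` the exit is swallowed and processing simply continues; with `reraise=True` the exception reaches the main routine and the shutdown branch runs.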

This PR raises the exception in the main thread, so that it can be caught there and the shutdown routine can complete properly. Ref https://eventlet.net/doc/modules/greenthread.html#eventlet.greenthread.GreenThread.kill
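The underlying idea, making sure the exit surfaces in the code path that owns the shutdown routine, can be sketched without eventlet. Note that this swaps in a different technique from the one the PR references: instead of killing the main green thread, the handler only sets a flag and the main loop raises SystemExit itself. The `Service` class and its method names are assumptions made for illustration.

```python
import threading


class Service:
    """Hypothetical service whose main loop owns the shutdown routine."""

    def __init__(self):
        self._stop_requested = threading.Event()
        self.events = []

    def sigterm_handler(self, signum=None, frame=None):
        # Only record the request. Raising here could land inside any
        # worker's try-except and be swallowed.
        self._stop_requested.set()

    def _process(self, message):
        try:
            if message == "TERM":
                # Simulate SIGTERM arriving in the middle of processing.
                self.sigterm_handler()
            self.events.append("processed " + message)
        except BaseException:
            self.events.append("swallowed")

    def run(self, messages):
        try:
            for message in messages:
                self._process(message)
                if self._stop_requested.is_set():
                    # Raised in the main flow, outside the worker's try-except.
                    raise SystemExit(0)
        except (KeyboardInterrupt, SystemExit):
            self.events.append("shutdown")
        return self.events
```

Calling `Service().run(["a", "TERM", "b"])` lets the in-flight message finish, then runs the shutdown branch before `"b"` is picked up.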

Issues seen locally when multiple executions are being processed:

2022-01-26 13:17:18,487 ERROR [-] ActionsQueueConsumer failed to process message: LiveActionDB(action="core.echo", action_is_workflow=False, callback={}, context={'pack': 'core', 'user': 'stanley', 'parent': {'execution_id': '61f0fc4e6d622ab35002ca72', 'user': 'stanley', 'pack': 'examples'}, 'orquesta': {'workflow_execution_id': '61f0fc5062a1a5d4e9affdce', 'task_execution_id': '61f0fc52420502aa20934e3d', 'task_name': 'task1', 'task_id': 'task1', 'task_route': 0, 'item_id': 18}}, delay=None, end_timestamp=None, id=61f0fc73420502aa20934e78, notify=None, parameters={'message': 's, resistance is futile!'}, result={}, runner_info={}, start_timestamp="2022-01-26 07:46:59.554497+00:00", status="scheduled", task_execution="61f0fc52420502aa20934e3d", workflow_execution="61f0fc5062a1a5d4e9affdce")
Traceback (most recent call last):
File "/home/ubuntu/st2/st2common/st2common/transport/consumers.py", line 77, in _process_message
self._handler.process(body)
File "/home/ubuntu/st2/st2actions/st2actions/worker.py", line 138, in process
return dispatchersliveaction.status
File "/home/ubuntu/st2/st2actions/st2actions/worker.py", line 207, in _run_action
result = self.container.dispatch(liveaction_db)
File "/home/ubuntu/st2/st2actions/st2actions/container/base.py", line 88, in dispatch
liveaction_db = funcsliveaction_db.status
File "/home/ubuntu/st2/st2actions/st2actions/container/base.py", line 161, in _do_run
runner.liveaction = self._update_status(
File "/home/ubuntu/st2/st2actions/st2actions/container/base.py", line 365, in _update_status
liveaction_db, state_changed = self._update_live_action_db(
File "/home/ubuntu/st2/st2actions/st2actions/container/base.py", line 329, in _update_live_action_db
liveaction_db = update_liveaction_status(
File "/home/ubuntu/st2/st2common/st2common/util/action_db.py", line 304, in update_liveaction_status
liveaction_db = LiveAction.add_or_update(liveaction_db)
File "/home/ubuntu/st2/st2common/st2common/persistence/base.py", line 185, in add_or_update
model_object = cls._get_impl().add_or_update(model_object, validate=True)
File "/home/ubuntu/st2/st2common/st2common/models/db/__init__.py", line 602, in add_or_update
instance.save(validate=validate)
File "/home/ubuntu/st2/virtualenv/lib/python3.8/site-packages/mongoengine/document.py", line 393, in save
self.ensure_indexes()
File "/home/ubuntu/st2/virtualenv/lib/python3.8/site-packages/mongoengine/document.py", line 894, in ensure_indexes
collection.create_index(fields, background=background, **opts)
File "/home/ubuntu/st2/virtualenv/lib/python3.8/site-packages/pymongo/collection.py", line 2059, in create_index
return self.__create_indexes([index], session, **cmd_options)[0]
File "/home/ubuntu/st2/virtualenv/lib/python3.8/site-packages/pymongo/collection.py", line 1919, in __create_indexes
with self._socket_for_writes(session) as sock_info:
File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/home/ubuntu/st2/virtualenv/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1243, in _get_socket
@contextlib.contextmanager
File "/home/ubuntu/st2/st2actions/st2actions/cmd/actionrunner.py", line 45, in sigterm_handler
sys.exit(0)
SystemExit: 0

2022-01-26 13:21:25,699 ERROR [-] Publish failed.
Traceback (most recent call last):
File "/home/ubuntu/st2/virtualenv/lib/python3.8/site-packages/amqp/connection.py", line 512, in channel
return self.channels[channel_id]
KeyError: None
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/st2/st2common/st2common/persistence/base.py", line 208, in add_or_update
cls.publish_create(model_object)
File "/home/ubuntu/st2/st2common/st2common/persistence/base.py", line 281, in publish_create
publisher.publish_create(model_object)
File "/home/ubuntu/st2/st2common/st2common/transport/publishers.py", line 126, in publish_create
self._publisher.publish(payload, self._exchange, CREATE_RK)
File "/home/ubuntu/st2/st2common/st2common/transport/publishers.py", line 91, in publish
retry_wrapper.run(connection=connection, wrapped_callback=do_publish)
File "/home/ubuntu/st2/st2common/st2common/transport/connection_retry_wrapper.py", line 131, in run
channel = connection.channel()
File "/home/ubuntu/st2/virtualenv/lib/python3.8/site-packages/kombu/connection.py", line 283, in channel
chan = self.transport.create_channel(self.connection)
File "/home/ubuntu/st2/virtualenv/lib/python3.8/site-packages/kombu/transport/pyamqp.py", line 98, in create_channel
return connection.channel()
File "/home/ubuntu/st2/virtualenv/lib/python3.8/site-packages/amqp/connection.py", line 514, in channel
channel = self.Channel(self, channel_id, on_open=callback)
File "/home/ubuntu/st2/virtualenv/lib/python3.8/site-packages/amqp/channel.py", line 104, in __init__
AMQP_LOGGER.debug('using channel_id: %s', channel_id)
File "/usr/lib/python3.8/logging/__init__.py", line 1434, in debug
self._log(DEBUG, msg, args, **kwargs)
File "/usr/lib/python3.8/logging/__init__.py", line 1589, in _log
self.handle(record)
File "/usr/lib/python3.8/logging/__init__.py", line 1599, in handle
self.callHandlers(record)
File "/usr/lib/python3.8/logging/__init__.py", line 1661, in callHandlers
hdlr.handle(record)
File "/usr/lib/python3.8/logging/__init__.py", line 954, in handle
self.emit(record)
File "/usr/lib/python3.8/logging/__init__.py", line 1088, in emit
stream.write(msg + self.terminator)
File "/home/ubuntu/st2/st2actions/st2actions/cmd/workflow_engine.py", line 47, in sigterm_handler
sys.exit(0)
SystemExit: 0

@pull-request-size pull-request-size bot added the size/S PR that changes 10-29 lines. Very easy to review. label Jan 26, 2022
@cognifloyd cognifloyd left a comment

Wow. That's obscure. Nice catch!

@cognifloyd cognifloyd added this to the 3.7.0 milestone Jan 26, 2022
@pull-request-size pull-request-size bot added size/M PR that changes 30-99 lines. Good size to review. and removed size/S PR that changes 10-29 lines. Very easy to review. labels Feb 2, 2022
@cognifloyd cognifloyd merged commit 7636d35 into StackStorm:master Feb 2, 2022
@arm4b arm4b left a comment

Good stuff!

@khushboobhatia01 @cognifloyd the PR is just missing a Changelog here, which is an important ledger for the community to navigate the fixes and changes.
Could you folks please add the missing Changelog in a subsequent PR?

@khushboobhatia01

@armab Yes, I'll add it.
