Connection reset during workflow after stage completes: pika setting is probably still needed #137

Closed
Weiming-Hu opened this issue Feb 8, 2021 · 4 comments

@Weiming-Hu
Contributor

I experimented with running the workflow without the pika settings, specifically without the following two lines in my user code:

pika.connection.Parameters.DEFAULT_HEARTBEAT_INTERVAL = 0
pika.connection.Parameters.DEFAULT_HEARTBEAT_TIMEOUT = 0
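The two lines above overwrite class-level defaults on `pika.connection.Parameters` for the whole process. If you would rather limit the workaround to the part of the script that runs the workflow, a generic scoped override could look like the sketch below. It is plain standard-library Python; `override_class_defaults` is a hypothetical helper (not part of pika or EnTK), and it assumes the defaults only need to hold while the connections are created and used inside the `with` block.

```python
import contextlib

_MISSING = object()  # sentinel: the attribute was not defined on the class itself


@contextlib.contextmanager
def override_class_defaults(cls, **overrides):
    """Temporarily set class-level attributes on ``cls`` (hypothetical helper).

    On exit, attributes that ``cls`` defined itself are restored to their
    previous values; attributes it did not define (absent or inherited)
    are deleted again so that normal attribute lookup resumes.
    """
    saved = {name: cls.__dict__.get(name, _MISSING) for name in overrides}
    for name, value in overrides.items():
        setattr(cls, name, value)
    try:
        yield cls
    finally:
        # Runs even if the body raises, so the class is never left patched.
        for name, old in saved.items():
            if old is _MISSING:
                delattr(cls, name)
            else:
                setattr(cls, name, old)
```

To apply it, you would wrap whatever call starts the EnTK workflow in `with override_class_defaults(pika.connection.Parameters, DEFAULT_HEARTBEAT_INTERVAL=0, DEFAULT_HEARTBEAT_TIMEOUT=0):`. Because attributes the class did not define itself are removed on exit, `DEFAULT_HEARTBEAT_INTERVAL` would disappear again if your pika version does not define it.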

But then I got the following error in my client-side sandbox:

1612651593.608 : radical.entk.task_manager.0000 : 15773 : 140166928791296 : INFO     : Transition task.0000 to EXECUTED
1612651593.608 : radical.entk.task_manager.0000 : 15773 : 140166928791296 : DEBUG    : task.0000 (EXECUTED) to sync with amgr
1612651593.609 : radical.entk.task_manager.0000 : 15773 : 140166928791296 : ERROR    : Transition task.0000 to state EXECUTED failed, error: (-1, "ConnectionResetError(104, 'Connection reset by peer')")
Traceback (most recent call last):
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/radical/entk/execman/base/task_manager.py", line 206, in _advance
    self._sync_with_master(obj, obj_type, channel, queue)
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/radical/entk/execman/base/task_manager.py", line 153, in _sync_with_master
    properties=pika.BasicProperties(correlation_id=corr_id))
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/pika/adapters/blocking_connection.py", line 2120, in basic_publish
    mandatory, immediate)
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/pika/adapters/blocking_connection.py", line 2207, in publish
    self._flush_output()
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/pika/adapters/blocking_connection.py", line 1292, in _flush_output
    *waiters)
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/pika/adapters/blocking_connection.py", line 477, in _flush_output
    result.reason_text)
pika.exceptions.ConnectionClosed: (-1, "ConnectionResetError(104, 'Connection reset by peer')")
1612651593.611 : radical.entk.task_manager.0000 : 15773 : 140166928791296 : DEBUG    : task.0000 (DESCRIBED) to sync with amgr
1612651593.611 : radical.entk.task_manager.0000 : 15773 : 140166928791296 : ERROR    : Error in RP callback thread:
Traceback (most recent call last):
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/radical/entk/execman/base/task_manager.py", line 206, in _advance
    self._sync_with_master(obj, obj_type, channel, queue)
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/radical/entk/execman/base/task_manager.py", line 153, in _sync_with_master
    properties=pika.BasicProperties(correlation_id=corr_id))
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/pika/adapters/blocking_connection.py", line 2120, in basic_publish
    mandatory, immediate)
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/pika/adapters/blocking_connection.py", line 2207, in publish
    self._flush_output()
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/pika/adapters/blocking_connection.py", line 1292, in _flush_output
    *waiters)
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/pika/adapters/blocking_connection.py", line 477, in _flush_output
    result.reason_text)
pika.exceptions.ConnectionClosed: (-1, "ConnectionResetError(104, 'Connection reset by peer')")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/radical/entk/execman/rp/task_manager.py", line 251, in unit_state_cb
    mq_channel, '%s-cb-to-sync' % self._sid)
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/radical/entk/execman/base/task_manager.py", line 213, in _advance
    self._sync_with_master(obj, obj_type, channel, queue)
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/radical/entk/execman/base/task_manager.py", line 153, in _sync_with_master
    properties=pika.BasicProperties(correlation_id=corr_id))
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/pika/adapters/blocking_connection.py", line 2120, in basic_publish
    mandatory, immediate)
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/pika/adapters/blocking_connection.py", line 2206, in publish
    immediate=immediate)
  File "/glade/u/home/wuh20/venv_Predictability/lib/python3.7/site-packages/pika/channel.py", line 415, in basic_publish
    raise exceptions.ChannelClosed()
pika.exceptions.ChannelClosed

When I added those lines back to my user code, things worked again. I suppose there are still some leftover issues with the pika handling?

Thank you

@Weiming-Hu Weiming-Hu self-assigned this Feb 8, 2021
@lee212

lee212 commented Feb 8, 2021

@Weiming-Hu, can you confirm the versions in your radical-stack?

We might have to reopen this: radical-cybertools/radical.entk#509

@Weiming-Hu
Contributor Author

Sure thing. Please see below:

(venv_Predictability) wuh20@cheyenne2:~> radical-stack

  python               : /glade/u/home/wuh20/venv_Predictability/bin/python3
  pythonpath           :
  version              : 3.7.9
  virtualenv           : /glade/u/home/wuh20/venv_Predictability

  radical.analytics    : 1.5.0
  radical.entk         : 1.5.8
  radical.gtod         : 1.5.0
  radical.pilot        : 1.5.12
  radical.saga         : 1.5.9
  radical.utils        : 1.5.9

(venv_Predictability) wuh20@cheyenne2:~>

@lee212

lee212 commented Feb 8, 2021

Thank you, @Weiming-Hu. Can you try again after updating EnTK to v1.5.12?
The version string reported by radical-stack has become more complicated than a plain number, but if you see 1.5.12 in it, you should have the updated pika handling.
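To check this quickly, a small standard-library sketch can parse the `radical-stack` output and test whether `radical.entk` reports at least 1.5.12. The function names and the regex are assumptions made for illustration; the parser only looks at the leading numeric part of the version because the full string may carry extra suffixes.

```python
import re

FIXED_IN = (1, 5, 12)  # EnTK release suggested in this thread


def entk_version(stack_output):
    """Return the leading numeric radical.entk version as a tuple, or None."""
    for line in stack_output.splitlines():
        # radical-stack prints "name : value"; split on the first colon only.
        name, sep, value = line.partition(":")
        if sep and name.strip() == "radical.entk":
            match = re.match(r"\s*(\d+(?:\.\d+)*)", value)
            if match:
                return tuple(int(part) for part in match.group(1).split("."))
    return None


def has_pika_fix(stack_output):
    """True if radical.entk is at least FIXED_IN (tuple comparison)."""
    version = entk_version(stack_output)
    return version is not None and version >= FIXED_IN
```

Comparing integer tuples avoids the string-comparison trap where "1.5.8" would sort after "1.5.12".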

@Weiming-Hu
Contributor Author

Got it. I will try again with the updated version and report back here. Thanks.

@mturilli mturilli closed this as completed Mar 5, 2021

3 participants