Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Master thread safe loader #54878

Closed
wants to merge 14 commits into from
Closed

Conversation

Akm0d
Copy link
Contributor

@Akm0d Akm0d commented Oct 3, 2019

What does this PR do?

Merge PR from develop #50655 which was a new version of #39948, #48598, and #50648 so the description has been copied from there. This PR will hopefully resolve the problems that existed in the original PRs. The code is identical to #50648, but this PR is based on the develop branch instead of the 2018.3.3 branch because Jenkins obviously cannot run tests for branches based on the 2018.3.3 branch.

There was a race condition in the salt loader when injecting global values (e.g. "pillar" or "salt") into modules. One effect of this race condition was that in a setup with multiple threads, some threads may see pillar data intended for other threads or the pillar data seen by a thread might even change spuriously.

There have been earlier attempts to fix this problem (#27937, #29397). These patches tried to fix the problem by storing the dictionary that keeps the relevant data in a thread-local variable and referencing this thread-local variable from the variables that are injected into the modules.

These patches did not fix the problem completely because they only work when a module is loaded through a single loader instance only. When there is more than one loader, there is more than one thread-local variable and the variable injected into a module is changed to point to another thread-local variable when the module is loaded again. Thus, the problem resurfaced while working on #39670.

This patch attempts to solve the problem from a slightly different angle, complementing the earlier patches: The value injected into the modules now is a proxy that internally uses a thread-local variable to decide to which object it points. This means that when loading a module
again through a different loader (possibly passing different pillar data), the data is actually only changed in the thread in which the loader is used. Other threads are not affected by such a change.

This means that it will work correctly in the current situation where loaders are possibly created by many different modules and these modules do not necessary know in which context they are executed. Thus it is much more flexible and reliable than the more explicit approach
used by the two earlier patches.
What issues does this PR fix or reference?

This PR fixes problems that surfaced when developing the parallel runners feature (#39670), but is also related to problems reported earlier (#23373). The problems described in #29028 might also be related, though this needs further confirmation.
Previous Behavior

Changes to pillar data in one thread might have influenced pillar data in a different thread. Whether or when this problem actually appeared was unpredictable (due to the nature of a race condition).
New Behavior

Changes to pillar data in one thread will never affect the pillar data in a different thread.
Tests written?

Yes
Regression Potential

The change to the loader code itself is rather small, but it affects a central component. So if there is a problem, the impact might affect nearly all components.

@Akm0d Akm0d requested a review from a team as a code owner October 3, 2019 23:57
@ghost ghost requested a review from xeacott October 3, 2019 23:57
@Akm0d Akm0d requested a review from dwoz October 4, 2019 00:39
@Akm0d Akm0d added the Needs-Testcase PR needs test cases written, or the issue is about a bug/feature that needs test cases label Oct 4, 2019
@max-arnold
Copy link
Contributor

@Akm0d FYI: the backported PR also needs #51499 to fix the infinite recursion bug.

@Akm0d
Copy link
Contributor Author

Akm0d commented Oct 11, 2019

I'm getting this error on all the tests now:

 File "/tmp/kitchen/testing/tests/integration/output/test_output.py", line 97, in test_output_yaml_namespaced_dict_wrapper
   assert isinstance(loaded_yaml, dict)
AssertionError```

@Akm0d
Copy link
Contributor Author

Akm0d commented Oct 11, 2019

It looks like #49269 is an issue again even though all it's fixes are present in master

@Akm0d
Copy link
Contributor Author

Akm0d commented Oct 15, 2019

The threadsafe loader has trouble representing undefined data types in the yamldumper (similar to #43694).

Copy link
Collaborator

@s0undt3ch s0undt3ch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's more than the thread proxy stuff right?
Separate changes?

@Akm0d
Copy link
Contributor Author

Akm0d commented Oct 31, 2019

There's more than the thread proxy stuff right?
Separate changes?

This is all the threadsafe loader + other changes required to make tests pass that failed with the threadsafe loader

@Akm0d
Copy link
Contributor Author

Akm0d commented Nov 4, 2019

On this PR the Py3/macosxmojave build hangs right after test_unsupported_kwarg. On a PR where the same build is passing, the subsequent test gives the following log:

[2019-10-31T22:51:56.696Z]          test_pam_auth_invalid_user (integration.shell.test_auth.AuthTest) ... 16:51:55,145 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/localaccounts', 'GroupMembership', 'saltdev']' failed with return code: 54
[2019-10-31T22:51:56.696Z]        16:51:55,146 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:51:56.696Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:51:56.696Z]        16:51:55,146 [ERROR   ] retcode: 54
[2019-10-31T22:51:56.696Z]        16:51:55,178 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/everyone', 'GroupMembership', 'saltdev']' failed with return code: 54
[2019-10-31T22:51:56.696Z]        16:51:55,179 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:51:56.696Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:51:56.696Z]        16:51:55,179 [ERROR   ] retcode: 54
[2019-10-31T22:51:56.696Z]        16:51:55,210 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/_lpoperator', 'GroupMembership', 'saltdev']' failed with return code: 54
[2019-10-31T22:51:56.696Z]        16:51:55,211 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:51:56.696Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:51:56.696Z]        16:51:55,211 [ERROR   ] retcode: 54
[2019-10-31T22:51:56.696Z]        16:51:55,243 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/com.apple.sharepoint.group.1', 'GroupMembership', 'saltdev']' failed with return code: 54
[2019-10-31T22:51:56.696Z]        16:51:55,243 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:51:56.696Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:51:56.696Z]        16:51:55,244 [ERROR   ] retcode: 54
[2019-10-31T22:52:02.974Z]        16:52:01,936 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/everyone', 'GroupMembership', 'saltadm']' failed with return code: 54
[2019-10-31T22:52:02.974Z]        16:52:01,936 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:52:02.974Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:52:02.974Z]        16:52:01,937 [ERROR   ] retcode: 54
[2019-10-31T22:52:02.974Z]        16:52:01,971 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/com.apple.sharepoint.group.1', 'GroupMembership', 'saltadm']' failed with return code: 54
[2019-10-31T22:52:02.974Z]        16:52:01,971 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:52:02.974Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:52:02.974Z]        16:52:01,971 [ERROR   ] retcode: 54
[2019-10-31T22:52:02.974Z]        16:52:02,007 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/_lpoperator', 'GroupMembership', 'saltadm']' failed with return code: 54
[2019-10-31T22:52:02.974Z]        16:52:02,007 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:52:02.974Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:52:02.974Z]        16:52:02,008 [ERROR   ] retcode: 54
[2019-10-31T22:52:02.974Z]        16:52:02,077 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/localaccounts', 'GroupMembership', 'saltadm']' failed with return code: 54
[2019-10-31T22:52:02.974Z]        16:52:02,078 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:52:02.974Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:52:02.974Z]        16:52:02,078 [ERROR   ] retcode: 54
[2019-10-31T22:52:09.109Z]        OK (64.618s)
[2019-10-31T22:53:11.101Z]          test_pam_auth_valid_group (integration.shell.test_auth.AuthTest) ... 16:53:03,447 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/localaccounts', 'GroupMembership', 'saltdev']' failed with return code: 54
[2019-10-31T22:53:11.101Z]        16:53:03,447 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:53:11.101Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:53:11.101Z]        16:53:03,447 [ERROR   ] retcode: 54
[2019-10-31T22:53:11.101Z]        16:53:03,479 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/everyone', 'GroupMembership', 'saltdev']' failed with return code: 54
[2019-10-31T22:53:11.102Z]        16:53:03,480 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:53:11.102Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:53:11.102Z]        16:53:03,480 [ERROR   ] retcode: 54
[2019-10-31T22:53:11.102Z]        16:53:03,512 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/_lpoperator', 'GroupMembership', 'saltdev']' failed with return code: 54
[2019-10-31T22:53:11.102Z]        16:53:03,512 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:53:11.102Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:53:11.102Z]        16:53:03,513 [ERROR   ] retcode: 54
[2019-10-31T22:53:11.102Z]        16:53:03,549 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/com.apple.sharepoint.group.1', 'GroupMembership', 'saltdev']' failed with return code: 54
[2019-10-31T22:53:11.102Z]        16:53:03,549 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:53:11.102Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:53:11.102Z]        16:53:03,550 [ERROR   ] retcode: 54
[2019-10-31T22:53:11.102Z]        16:53:10,109 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/_lpoperator', 'GroupMembership', 'saltadm']' failed with return code: 54
[2019-10-31T22:53:11.102Z]        16:53:10,110 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:53:11.102Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:53:11.102Z]        16:53:10,110 [ERROR   ] retcode: 54
[2019-10-31T22:53:11.102Z]        16:53:10,145 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/localaccounts', 'GroupMembership', 'saltadm']' failed with return code: 54
[2019-10-31T22:53:11.102Z]        16:53:10,145 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:53:11.102Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:53:11.102Z]        16:53:10,146 [ERROR   ] retcode: 54
[2019-10-31T22:53:11.102Z]        16:53:10,181 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/com.apple.sharepoint.group.1', 'GroupMembership', 'saltadm']' failed with return code: 54
[2019-10-31T22:53:11.102Z]        16:53:10,181 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:53:11.102Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:53:11.102Z]        16:53:10,181 [ERROR   ] retcode: 54
[2019-10-31T22:53:11.102Z]        16:53:10,248 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/everyone', 'GroupMembership', 'saltadm']' failed with return code: 54
[2019-10-31T22:53:11.102Z]        16:53:10,249 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:53:11.102Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:53:11.102Z]        16:53:10,249 [ERROR   ] retcode: 54
[2019-10-31T22:53:17.380Z]        OK (67.844s)
[2019-10-31T22:54:18.909Z]          test_pam_auth_valid_user (integration.shell.test_auth.AuthTest) ... 16:54:11,486 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/localaccounts', 'GroupMembership', 'saltdev']' failed with return code: 54
[2019-10-31T22:54:18.909Z]        16:54:11,486 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:54:18.909Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:54:18.909Z]        16:54:11,486 [ERROR   ] retcode: 54
[2019-10-31T22:54:18.909Z]        16:54:11,519 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/everyone', 'GroupMembership', 'saltdev']' failed with return code: 54
[2019-10-31T22:54:18.909Z]        16:54:11,520 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:54:18.909Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:54:18.909Z]        16:54:11,520 [ERROR   ] retcode: 54
[2019-10-31T22:54:18.909Z]        16:54:11,551 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/_lpoperator', 'GroupMembership', 'saltdev']' failed with return code: 54
[2019-10-31T22:54:18.909Z]        16:54:11,552 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:54:18.909Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:54:18.909Z]        16:54:11,552 [ERROR   ] retcode: 54
[2019-10-31T22:54:18.909Z]        16:54:11,582 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/com.apple.sharepoint.group.1', 'GroupMembership', 'saltdev']' failed with return code: 54
[2019-10-31T22:54:18.909Z]        16:54:11,583 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:54:18.909Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:54:18.909Z]        16:54:11,583 [ERROR   ] retcode: 54
[2019-10-31T22:54:18.909Z]        16:54:18,339 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/everyone', 'GroupMembership', 'saltadm']' failed with return code: 54
[2019-10-31T22:54:18.910Z]        16:54:18,339 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:54:18.910Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:54:18.910Z]        16:54:18,340 [ERROR   ] retcode: 54
[2019-10-31T22:54:18.910Z]        16:54:18,373 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/_lpoperator', 'GroupMembership', 'saltadm']' failed with return code: 54
[2019-10-31T22:54:18.910Z]        16:54:18,373 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:54:18.910Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:54:18.910Z]        16:54:18,373 [ERROR   ] retcode: 54
[2019-10-31T22:54:18.910Z]        16:54:18,408 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/com.apple.sharepoint.group.1', 'GroupMembership', 'saltadm']' failed with return code: 54
[2019-10-31T22:54:18.910Z]        16:54:18,409 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:54:18.910Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:54:18.910Z]        16:54:18,409 [ERROR   ] retcode: 54
[2019-10-31T22:54:18.910Z]        16:54:18,478 [ERROR   ] Command '['dscl', 'localhost', '-delete', '/Local/Default/Groups/localaccounts', 'GroupMembership', 'saltadm']' failed with return code: 54
[2019-10-31T22:54:18.910Z]        16:54:18,479 [ERROR   ] stderr: <main> attribute status: eDSAttributeNotFound
[2019-10-31T22:54:18.910Z]        <dscl_cmd> DS Error: -14134 (eDSAttributeNotFound)
[2019-10-31T22:54:18.910Z]        16:54:18,479 [ERROR   ] retcode: 54
[2019-10-31T22:54:26.304Z]        OK (68.509s)```

@Akm0d
Copy link
Contributor Author

Akm0d commented Nov 4, 2019

Verified, skipping test_pam_auth_invalid_user causes py3 mac tests to pass

@Akm0d
Copy link
Contributor Author

Akm0d commented Nov 5, 2019

Is this issue is a problem again? saltstack/salt-ci-images#504
Maybe workaround it used may not be necessary since: python/cpython@e359bc2#diff-064208ccd6ee72e507818c0676619d07

@Akm0d
Copy link
Contributor Author

Akm0d commented Nov 5, 2019

Found it! This is the line of code causing all the trouble on mac:
self.run_call('group.info {0}'.format(self.group))

@Akm0d
Copy link
Contributor Author

Akm0d commented Nov 7, 2019

The last commit did it, there might be a problem with the macosxmojave build or with the version of the grp module, but for now I think everything is working :)

@Akm0d
Copy link
Contributor Author

Akm0d commented Nov 7, 2019

re-run full centos7

@Akm0d
Copy link
Contributor Author

Akm0d commented Nov 7, 2019

re-run full opensuse15

salt/log/handlers/fluent_mod.py Outdated Show resolved Hide resolved
salt/log/handlers/fluent_mod.py Outdated Show resolved Hide resolved
@Akm0d Akm0d requested a review from s0undt3ch November 7, 2019 20:45
s0undt3ch
s0undt3ch previously approved these changes Nov 7, 2019
@Akm0d
Copy link
Contributor Author

Akm0d commented Nov 18, 2019

Since this is such a large PR we'll break it up into smaller PRs to get it into salt. The msgpack stuff will come out first.

@dwoz
Copy link
Contributor

dwoz commented Jan 16, 2020

re-run full all

@Akm0d
Copy link
Contributor Author

Akm0d commented Jan 17, 2020

The centos7 test failure is a serious issue with this PR:

  File "/tmp/kitchen/testing/tests/integration/scheduler/test_skip.py", line 301, in test_run_seconds_skip
    self.schedule.eval(now=run_time)
  File "/tmp/kitchen/testing/salt/utils/schedule.py", line 1593, in eval
    now)
  File "/tmp/kitchen/testing/salt/utils/schedule.py", line 224, in _check_max_running
    current_jobs = salt.utils.minion.running(self.opts)
  File "/tmp/kitchen/testing/salt/utils/minion.py", line 33, in running
    data = _read_proc_file(path, opts)
  File "/tmp/kitchen/testing/salt/utils/minion.py", line 73, in _read_proc_file
    data = serial.loads(buf)
  File "/tmp/kitchen/testing/salt/payload.py", line 140, in loads
    exc,
  File "/tmp/kitchen/testing/salt/ext/six.py", line 725, in raise_from
    raise value
SaltDeserializationError: Could not deserialize msgpack message. See log for more info.

@sagetherage sagetherage added the ZRelease-Sodium retired label label Mar 23, 2020
@dwoz dwoz self-requested a review April 11, 2020 07:38
dwoz
dwoz previously approved these changes Apr 11, 2020
@dwoz dwoz self-requested a review April 11, 2020 07:40
@Akm0d Akm0d dismissed stale reviews from dwoz, s0undt3ch, and garethgreenaway via 7b50486 April 24, 2020 09:47
@dwoz
Copy link
Contributor

dwoz commented Apr 26, 2020

@Akm0d I believe this is being addressed in a more comprehensive way for the proxy minion work @garethgreenaway is working on. We should hold off on this to see how that work pans out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Core relates to code central or existential to Salt master-port
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants