Fix test failure of file role store auto-reload #56398

Merged
merged 9 commits into from
May 13, 2020

ywangd
Member

@ywangd ywangd commented May 8, 2020

Ensure file content is replaced atomically to prevent the file watcher from reading an incomplete or empty file.

Resolves: #52955

@ywangd ywangd added the >test, :Security/Authorization, v8.0.0, v7.8.1, v7.9.0 labels May 8, 2020
@ywangd ywangd requested a review from albertzaharovits May 8, 2020 00:35
@elasticmachine
Collaborator

Pinging @elastic/es-security (:Security/Authorization)

@elasticmachine elasticmachine added the Team:Security label May 8, 2020
Contributor

@albertzaharovits albertzaharovits left a comment

I am on the fence about the proposed fix, but it does fix the test failures and is minimally invasive, hence LGTM.

The tests expect a file write with truncation to be atomic. That expectation is wrong: even an ordinary file write is not atomic.

To get around it, the test is not using write with truncate anymore, but instead uses file replace.
I believe we are testing a slightly different thing in this case. But because, from the core code's perspective, this difference is not relevant, I believe the proposed test is valid.

Ideally, I would like us to test the scenario where the file under observation is edited, not replaced, but I don't have a good suggestion for how to go about it.
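
As an illustration of the "file replace" idea discussed above, here is a minimal sketch using only the JDK. It is not the code from this PR: the class name `AtomicReplaceSketch`, the helper `replaceAtomically`, and the placeholder file contents are all hypothetical. It assumes a file system that supports atomic moves (typical for a local POSIX file system, where the move also replaces an existing target).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch, not the PR's test code.
class AtomicReplaceSketch {

    /**
     * Writes the new content to a temporary file in the same directory as the target,
     * then moves it over the target. A concurrent reader should see either the old
     * content or the new content, but not a truncated intermediate state.
     */
    static void replaceAtomically(Path target, String newContent) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "roles", ".tmp");
        Files.write(tmp, newContent.getBytes(StandardCharsets.UTF_8));
        // ATOMIC_MOVE throws AtomicMoveNotSupportedException if the file system cannot
        // perform the move atomically. Whether an existing target is replaced is
        // implementation specific; on POSIX file systems it normally is.
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("roles-sketch");
        Path roles = dir.resolve("roles.yml");
        // Placeholder contents; the real role definitions are not reproduced here.
        Files.write(roles, "old placeholder content\n".getBytes(StandardCharsets.UTF_8));
        replaceAtomically(roles, "new placeholder content\n");
        System.out.print(new String(Files.readAllBytes(roles), StandardCharsets.UTF_8));
    }
}
```

The trade-off raised in the review still applies: a test written this way exercises a replaced file rather than a file edited in place.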

@ywangd
Member Author

ywangd commented May 12, 2020

To get around it, the test is not using write with truncate anymore, but instead uses file replace.

@albertzaharovits Your analysis is accurate. I don't see an easy way to have atomic file modification behaviour unless resorting to some in-memory FileSystem, which feels overkill for this purpose.

Alternatively, we could fix only the symptom. Let me elaborate: the failure occurs in the following assertion:

descriptors = store.roleDescriptors(Collections.singleton("role5"));
assertThat(descriptors, notNullValue());

This failure is due to two reasons: 1) file modification is not atomic; 2) the changed role reported by the FileWatcher is role5 for both the file truncation and subsequent write.

Currently the PR tries to fix item 1. But we could also fix it with item 2. Given the original file content is:

role5:
  cluster: ...
    - 'MONITOR'

We could modify it to be

role5x:
  cluster: ...
    - 'ALL'

Note that we change the role name from role5 to role5x so that FileWatcher will report the file truncation with role5 and the subsequent write with role5x. The code would then be changed to something like the following:

store = new FileRolesStore(settings, env, watcherService, roleSet -> {
    modifiedFileRolesModified.addAll(roleSet);
    if (roleSet.contains("role5x")) {
        modifyLatch.countDown();
    }
}, new XPackLicenseState(Settings.EMPTY), xContentRegistry());
...
modifyLatch.await(1, TimeUnit.SECONDS);
assertEquals(2, modifiedFileRolesModified.size());
assertTrue(modifiedFileRolesModified.contains("role5x"));
descriptors = store.roleDescriptors(Collections.singleton("role5x"));
assertThat(descriptors, notNullValue());

This version is still slightly different from the current one, but is less so compared to the file move fix. Would you be more comfortable with this approach?
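
To make the latch idea above concrete without depending on Elasticsearch classes, here is a self-contained sketch. The class `RoleNameLatchSketch`, the `listener` helper, and the simulated watcher thread are assumptions for illustration; only the role names role5 and role5x come from the discussion. The listener records every reported role but releases the latch only when the awaited name shows up, so the notification caused by the truncation does not wake the test early.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch; the watcher is simulated by a plain thread.
class RoleNameLatchSketch {

    /** Records every reported role, but counts down only for the awaited role name. */
    static Consumer<Set<String>> listener(Set<String> seen, CountDownLatch latch, String awaited) {
        return roleSet -> {
            seen.addAll(roleSet);
            if (roleSet.contains(awaited)) {
                latch.countDown();
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        Set<String> seen = ConcurrentHashMap.newKeySet();
        CountDownLatch latch = new CountDownLatch(1);
        Consumer<Set<String>> onChange = listener(seen, latch, "role5x");

        // Simulated notifications: the first stands for the truncation (role5 changed),
        // the second for the completed write (role5x appeared).
        Thread watcher = new Thread(() -> {
            onChange.accept(Collections.singleton("role5"));
            onChange.accept(Collections.singleton("role5x"));
        });
        watcher.start();

        // await returns false on timeout rather than throwing, so check the result.
        boolean released = latch.await(5, TimeUnit.SECONDS);
        watcher.join();
        System.out.println("released=" + released + ", seen=" + seen);
    }
}
```

Checking the boolean returned by `await` is a small addition over the snippet in the comment: if the latch times out, the test can fail with a clear message instead of proceeding to assertions on stale state.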

@albertzaharovits
Contributor

I like the idea to count down the latch based on the role name @ywangd !

As a further improvement, I would suggest we always append a dummy marker role at the end of the role file and count down the latch when we spot it, even in the append cases.
This relies on the fact that roles are parsed in the order they are defined in the file, so that we can be assured that the roles preceding the dummy role have all been parsed completely (there is currently the theoretical risk that a role is not read completely, even in the append without truncate case).

This is different from what you're suggesting because the dummy marker role has the sole purpose of counting down the latch, and it is not used to verify the parsing.
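
The ordering argument can be sketched with a toy model. Everything here is hypothetical: `MarkerRoleSketch`, its deliberately simplistic line-based "parser", the marker name `dummy`, and the placeholder role bodies are not the real FileRolesStore parsing logic. The sketch only shows that, when names are read in file order, seeing the marker in a partially written file implies that everything before it has already been written.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; this is not how FileRolesStore parses role files.
class MarkerRoleSketch {

    /** A top-level line ending with ':' starts a role; names are returned in file order. */
    static List<String> roleNamesInOrder(String content) {
        List<String> names = new ArrayList<>();
        for (String line : content.split("\n")) {
            boolean topLevel = !line.isEmpty() && !Character.isWhitespace(line.charAt(0));
            if (topLevel && line.endsWith(":")) {
                names.add(line.substring(0, line.length() - 1));
            }
        }
        return names;
    }

    /** True once the marker role name is visible in the (possibly partial) content. */
    static boolean markerSeen(String content, String marker) {
        return roleNamesInOrder(content).contains(marker);
    }

    public static void main(String[] args) {
        // Placeholder bodies; the marker role is appended last.
        String full = "role5:\n  body-placeholder\ndummy:\n  body-placeholder\n";
        int markerLineEnd = full.indexOf("dummy:") + "dummy:".length();

        // Simulate a reader observing every possible partially written prefix.
        for (int i = 0; i <= full.length(); i++) {
            String partial = full.substring(0, i);
            if (markerSeen(partial, "dummy") && i < markerLineEnd) {
                throw new AssertionError("marker seen too early at prefix length " + i);
            }
        }
        System.out.println("role names in order: " + roleNamesInOrder(full));
    }
}
```

Note that the marker's own body may still be incomplete when the marker is first seen, which is harmless because the marker exists only to count down the latch and is never asserted on.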

Let me know what you think about this.

@ywangd
Member Author

ywangd commented May 12, 2020

@albertzaharovits I updated the PR with the countdown latch change as discussed. I didn't add dummy marker roles since I don't think they are necessary.

I understand the intention of the marker role is to ensure that anything that comes before it is already parsed when the marker role appears in the changeset. However, if we can guarantee that each role name is reported only once by the FileWatcher, we can already be sure that when a name appears in the changeset, it is fully parsed and ready to be asserted.

The original issue arose because the same name was sometimes reported twice, once for the truncation and once for the modification. We also cannot tell the two reports apart because, other than the name, no context is attached to them.

  • For tests that only append, each appended role is unique. So when it is in the changeset, it is ready to be asserted. If FileWatcher is triggered before the newly appended role is fully written, the parsing fails and nothing will be reported. The FileWatcher will be triggered again when the write is fully completed, and the appended role will then be reported.
  • Similarly, for tests that use truncation and then add new role names, when the new role name is reported, it must have been fully written and recognised by the FileWatcher and parser. We can just assert it without the need for a marker role.

I hope this makes sense. Thanks!

Contributor

@albertzaharovits albertzaharovits left a comment

LGTM Thanks Yang!

@ywangd
Member Author

ywangd commented May 12, 2020

After some more thought, I decided to use dummy marker roles for the two tests that involve truncation. Even though they are not technically necessary, they help preserve the tests' semantics. The two tests were intended to ensure that an existing role is preserved if it is not part of the truncation, or is updated if it is modified. That implies the role names are the same before and after the file truncation/update, so the marker roles let us keep using the same role name, i.e. role5. I also dropped the code asserting the exact content of the change set, because it can differ due to the non-atomic file operations described above. Sorry for the back and forth. I appreciate all the discussion and I believe this should be the final version.

modifiedFileRolesModified.addAll(roleSet);
modifyLatch.countDown();
if (roleSet.contains("dummy2")) {
modifyLatch.countDown();
Contributor

Not a big deal but you can maintain the modifiedFileRolesModified.addAll(roleSet); from before and assert it contains role5.

Member Author

Added back.

@albertzaharovits
Contributor

The two tests were intended to ensure that an existing role is preserved if it is not part of the truncation, or is updated if it is modified

Good point!

@albertzaharovits
Contributor

LGTM Thank you Yang!

@ywangd
Member Author

ywangd commented May 13, 2020

@elasticmachine update branch

@ywangd ywangd merged commit 23095d4 into elastic:master May 13, 2020
ywangd added a commit to ywangd/elasticsearch that referenced this pull request May 15, 2020
Ensure assertion is only performed when we can be sure that the desired changes are picked up by the file watcher.
ywangd added a commit to ywangd/elasticsearch that referenced this pull request May 15, 2020
Ensure assertion is only performed when we can be sure that the desired changes are picked up by the file watcher.
ywangd added a commit that referenced this pull request May 15, 2020
Ensure assertion is only performed when we can be sure that the desired changes are picked up by the file watcher.
ywangd added a commit that referenced this pull request May 15, 2020
Ensure assertion is only performed when we can be sure that the desired changes are picked up by the file watcher.
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Jun 10, 2020
Ensure assertion is only performed when we can be sure that the desired changes are picked up by the file watcher.
ywangd added a commit that referenced this pull request Jun 11, 2020
Ensure assertion is only performed when we can be sure that the desired changes are picked up by the file watcher.
Labels
:Security/Authorization, Team:Security, >test, v7.8.1, v7.9.0, v8.0.0-alpha1
Development

Successfully merging this pull request may close these issues.

FileRolesStoreTests#testReload fails
4 participants