-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure Session token remains safe with multiple processes. #236
Comments
Refer to https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/210#note_664718851 for some context. |
The primary thing to understand here is that the multi-process situation here is about the PK client, no the PK agent. We have a client-server architecture now between clients and agents. Every single CLI call This means there are going to be multiple PK client processes running simultaneously. The PK GUI is also a PK client, it's just a long-running one. Therefore we need to ensure that the session token usage by the PK clients are safe across multiple simultaneous process executions and that means we need simulation tests inside |
A diagram should be made for this and ported to the sessions wiki that shows the component of multiple clients, a single agent, and the usage of a common session key stored at the HOME directory or client state directory that is being used to authenticate. A second component diagram can be used to represent a multi-step process (similar to a sequence diagram) but with labelled ordinal arrows. |
This should illustrate the entire architecture. In addition to this, one should use the same diagram, but illustrate a labelled ordinal actions. I have to redo the diagram cause asciiflow broke.. so the above is a snapshot. The labelled ordinal action diagram is actually a "top-down" sequence diagram. It's the application of the sequence of steps to a component diagram. An example is this: https://datatracker.ietf.org/doc/html/rfc6749#section-1.2. The advantage of using this over a sequence diagram are:
Disadvantage is:
So the normal sequence diagram is better for fewer agents and complex sequences and sequence nesting. While top down sequence is better for multiple agents, and in order to map the a high level sequence to multiple components. |
Because each The token is updated on every single call. The PK agent returns a new valid token to be used. In our case, we expect that any call holds a lock to the token, and if another process discovers that the lock is held, it simply drops and cancels the session token update request. That way the token is only updated by the first process that is starting a call, one the process finishes their call, they should release the token file lock. All of this logic should be inside the |
Diagram fixed up and with extra details.
|
The https://github.com/mafintosh/fd-lock library is a much simpler locking library. Since the concurrency management only requires you to check if the lock is in fact locked and not have to wait/block for the lock to be unlocked, then it can be a much simpler library to use instead of proper-lockfile. BTW I don't think proper-lockfile uses fd-lock. The proper-lockfile is much older, bigger attack surface. So it's better to use something simpler. |
@CMCDragonkai I've remade your diagram above with sequence numbers included. This is my understanding of what safety of the session token should look like across multiple processes. Please let me know if I'm off about anything.
Note that the second and third PolykeyClients must also lock |
Just a quick comment, in this diagram, you don't need the other 2 PolykeyClient. Focus on just the interaction between client and agent. Then you can bring up the second client to show the locking interaction. |
So just like this? Or are the arrows labelled 4 and 5 also not necessary?
|
Also I had a look at |
As 2 separate diagrams. |
That's fine. We don't need to wait for it to be locked. As soon as it is locked we cancel our update operation. No need to determine who locked it, you just have to ensure that there's a try finally block that unlocks the file.
You only update the token when you have the lock. You don't wait if it is already locked.
On 29 September 2021 10:27:16 am AEST, emmacasolin ***@***.***> wrote:
Also I had a look at `fd-lock` and I don't think it will work for our purposes. If a file is already locked then you can't lock it again, and there's no way to determine who locked the file and what processes are still using it. All it offers is locking and unlocking so we would need another way to track which processes are still reliant on the file before we unlock and refresh the token.
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#236 (comment)
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
|
But what if the first client (who locks the file) finishes before the second client?
In this case, wouldn't B now be using an invalid token? |
No the JWTs are valid on a time-basis. So B client still works, it doesn't need to wait. The JWTs only become invalid if we reset the session key.
On 29 September 2021 11:21:10 am AEST, emmacasolin ***@***.***> wrote:
But what if the first client (who locks the file) finishes before the second client?
For example:
1. A locks the file
2. B starts running a command using the token stored in the locked file
3. A finishes its process, unlocks the file, and refreshes the token. It has no way of knowing that B is still using the original token
In this case, wouldn't B now be using an invalid token?
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#236 (comment)
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
|
We may be using a file locking in the agent for the agent lockfile. It might be using the same library. Even in that case fd-lock should be sufficient too so we reduce the complexity of our dependencies. Although I wonder how difficult it is to extend it in the future to enable blocking waits. |
Ahh ok I think I understand now. So in the first diagram we have a single agent and client interaction:
And then in the second diagram is two clients operating at the same time:
|
The MR for this has been merged now. There were a few things that I'll be continuing over in #240 (tests for renewal of the session token) but otherwise there are only a couple outstanding issues:
|
Regarding the second kind of tests, why not just do a unit test where you lock the session file and then use fd-lock and test whether the file is locked or not. That should work.
For concurrent, it's considered experimental but is it expected to be standard later? We're using a bunch of experimental features in js-async-init anyway.
On 8 October 2021 1:18:30 pm AEDT, emmacasolin ***@***.***> wrote:
The MR for this has been merged now. There were a few things that I'll be continuing over in #240 (tests for renewal of the session token) but otherwise there are only a couple outstanding issues:
- The tests for concurrent commands use jest's `concurrent` feature at the moment - this feature is considered experimental so we may run into issues with it in the future. It works really well at the moment but we should eventually look into alternatives.
- The current tests are unable to test the actual locking and unlocking of the session file - I looked into mocking the `Session` class in order to achieve this, however decided to leave it for now. We'll need to write tests for this in the future. At the moment we're assuming that the locking function is working due to the concurrent process tests passing, but we don't know for sure.
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#236 (comment)
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
|
@emmacasolin before closing this issue, can you update the session wiki reference with the diagrams you did above. Suggest you update to using arrow labels on the side rather than putting labels inline. |
Then confirm and close. |
In the future remember to cross reference the MR and PR. |
Yeah I could add in a test that does that, but I was referring more to the order in which the locking and unlocking happens and being able to test that it's happening while a process is occurring - for example being able to pause the process during its execution and check that the session file is locked, then letting the process finish and checking again that the session file is unlocked.
I couldn't find anything confirming or denying this, but if I had to guess I'd say it will probably eventually become standard. There are still a few outstanding issues with it (for example, concurrent tests currently don't wait for And yep I'll fix up the diagram labels and then add them to the session wiki right now. |
The Sessions wiki (https://github.com/MatrixAI/js-polykey/wiki/Sessions) has been updated with the diagrams from above, so this issue can now be closed with this comment. |
In terms of pausing the process, this can be done by executing a separate node script that holds the lock and then prompts the user. But this we can assume works already upstream.
Will be more of an issue if we extend the C++ to enable blocking locking.
On 11 October 2021 8:20:28 am AEDT, emmacasolin ***@***.***> wrote:
>Regarding the second kind of tests, why not just do a unit test where you lock the session file and then use fd-lock and test whether the file is locked or not. That should work.
Yeah I could add in a test that does that, but I was referring more to the order in which the locking and unlocking happens and being able to test that it's happening while a process is occurring - for example being able to pause the process during its execution and check that the session file is locked, then letting the process finish and checking again that the session file is unlocked.
>For concurrent, it's considered experimental but is it expected to be standard later? We're using a bunch of experimental features in js-async-init anyway.
I couldn't find anything confirming or denying this, but if I had to guess I'd say it will probably eventually become standard. There are still a few outstanding issues with it (for example, concurrent tests currently don't wait for `beforeAll()` or `afterAll()` blocks so I had to create my own async functions that are awaited at the beginning of each concurrent test and resolved at the end of each before/afterAll block) so it's very likely that there will be changes to the way it works that might affect the tests I wrote, but we can deal with that if/when it becomes an issue.
And yep I'll fix up the diagram labels and then add them to the session wiki right now.
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#236 (comment)
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
|
Regarding wiki...
I don't think the "Agent starts up a new Client" is correct. The Client is started from the CLI, starting clients have nothing to do with the Agent. Can you update this? |
The session doesn't request a valid token from the session manager. The session only just reads and writes tokens. It also contains the session token data. The PolykeyClient is what requests the token from the agent. |
It actually unlocks the session file as soon as the write is completed. PK clients can send repeated calls to the agent, each time we get a new token from the agent, and the PK client can initiate a write to the session file. |
I'm writing these comments in relation to the code by the way. I know I wrote before:
It turns out that this isn't necessary. It's only when the returned metadata exists with the new session token data do we try to acquire a lock and write the updated session token file. Only at that point is the lock required. |
Note that with https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/213 merged, the behaviour of locking in |
An MR has been created for this issue here: https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/212
Specification
The Session
Session.writeToken
insrc/sessions/Session.ts:116
makes use ofproper-lockfile
. However we need to create a test to make sure this functions properly when multiple processes are attempting to update the token file.The
proper-lockfile
library is fairly old, and is more complex than what we require. As such, we should switch to usingfd-lock
instead.fd-lock
only provides two methods:lock
andunlock
. There is no explicit method to check whether a file is already locked, however, since a file can only be locked if it is not already locked then we can simply attempt to lock the file - if we fail, the file is already locked.Additional context
imagine 2 calls
that's in the case of a serial calls but in parallel calls
so at the end of parallel calls, the token state us still T2, but that's fine as T2 is still valid, and T3 is simply dropped
Information on
fd-lock
can be found here.Tasks
fd-lock
as an alternative toproper-lockfile
tests/client
and/ortests/bin
).The text was updated successfully, but these errors were encountered: