[Feature/Extensions] Follow-up: Implement Encryption/Decryption for principal identifier token #4485

Closed
DarshitChanpura opened this issue Sep 12, 2022 · 10 comments
Labels
enhancement Enhancement or improvement to existing feature or request extensions

Comments

@DarshitChanpura
Member

The current iteration (part of #4299) for generating the extension request identifier token returns a simple string.

This should be replaced with a two-way encrypted token.

@peternied
Member

@scrawfor99 I think this would be a good first issue to engage with around the new identity systems. Please feel free to tag @DarshitChanpura or me if you have questions.

@cwperks
Member

cwperks commented Oct 6, 2022

Should we use a different set of certificates for internal nodes <-> extension nodes communication?

@peternied
Member

> Should we use a different set of certificates for internal nodes <-> extension nodes communication?

Yes, I would say that every extension's communication channel should have its own cert/exchanged secret to secure communications. See the Communication Security section of the extensions security considerations.

That said, I'm not sure how certificates relate to the encrypted tokens: the symmetric encryption process for tokens isn't making the 'token' context available to an extension, but instead concealing it.

@cwperks
Member

cwperks commented Oct 6, 2022

Thank you, Peter. I want to make sure I am thinking about this issue the correct way. Could we put the principal in a JWT and try out a library like JJWT (https://github.com/jwtk/jjwt) to create a cryptographically signed token?

We can pass the identity token in the payload of the JWT, similar to what is shown on the JWT.io site:

{
  "sub": "1234567890",
  "name": "John Doe",
  "iat": 1516239022
}

Here `sub` would be the non-volatile identifier token of the subject, and `iat` is the "issued at" time.
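As a rough illustration of what a signed token carrying that payload looks like, here is a minimal HS256 (HMAC-SHA256) sketch. It deliberately uses only the JDK instead of JJWT so it runs without extra dependencies; the class name and demo key are hypothetical. One property worth noting for this issue: a JWT like this is signed, not encrypted. Anyone holding the token can Base64url-decode the payload and read `sub`, so signing alone would not conceal the principal from an extension.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: HS256 JWS compact serialization (RFC 7515) using only the JDK.
final class Hs256Sketch {
    private static final Base64.Encoder B64URL = Base64.getUrlEncoder().withoutPadding();
    private static final String HEADER_JSON = "{\"alg\":\"HS256\",\"typ\":\"JWT\"}";

    // Produces base64url(header) + "." + base64url(payload) + "." + base64url(HMAC-SHA256 of the first two parts).
    static String sign(String payloadJson, byte[] key) throws Exception {
        String signingInput = B64URL.encodeToString(HEADER_JSON.getBytes(StandardCharsets.UTF_8))
                + "." + B64URL.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        return signingInput + "." + B64URL.encodeToString(hmac(key, signingInput));
    }

    // True only if the signature matches the header and payload; compared in constant time.
    static boolean verify(String jwt, byte[] key) throws Exception {
        int lastDot = jwt.lastIndexOf('.');
        if (lastDot < 0) return false;
        byte[] actual;
        try {
            actual = Base64.getUrlDecoder().decode(jwt.substring(lastDot + 1));
        } catch (IllegalArgumentException badEncoding) {
            return false;
        }
        return MessageDigest.isEqual(hmac(key, jwt.substring(0, lastDot)), actual);
    }

    // The payload is only encoded, not encrypted: no key is needed to read it.
    static String readPayloadWithoutKey(String jwt) {
        String payloadSegment = jwt.split("\\.")[1];
        return new String(Base64.getUrlDecoder().decode(payloadSegment), StandardCharsets.UTF_8);
    }

    private static byte[] hmac(byte[] key, String data) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        byte[] key = "demo-key-demo-key-demo-key-demo!".getBytes(StandardCharsets.UTF_8);
        String jwt = sign("{\"sub\":\"1234567890\",\"name\":\"John Doe\",\"iat\":1516239022}", key);
        System.out.println(jwt);
        System.out.println(verify(jwt, key));          // true
        System.out.println(readPayloadWithoutKey(jwt)); // the original claims, readable without the key
    }
}
```

With JJWT itself the builder would replace `sign`, but the readability of the payload is the same, which matters for requirement 3 discussed further down the thread.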

@cwperks
Member

cwperks commented Oct 7, 2022

Looking at the security plugin codebase, we already depend on io.jsonwebtoken:jjwt-api:0.10.8 (see: https://github.com/opensearch-project/security/blob/main/build.gradle#L305).

@peternied peternied changed the title [Feature/Extensions] Follow-up: Implement Encryption/Decryption for extension identifer token [Feature/Extensions] Follow-up: Implement Encryption/Decryption for principal identifier token Oct 7, 2022
@peternied
Member

Gah, that title was slightly misleading. Restating the core problems we are trying to solve with the principal identity token:

  1. Extensions need a way to differentiate User A from User B.
  2. This mechanism needs to survive being (de)serialized.
  3. The extension should not be able to use this token to identify the user unless the cluster operators have allowed it.

We already have the principal as a string (e.g. an email address) within the context of OpenSearch, which could be passed to the extension, satisfying 1 and 2.

To make sure the token isn't usable on its own (3), we need a transformation process that obfuscates the value. Hashing the principal would mostly work; however, we would have no way to convert the hash back into the original value without storing the principal and hash in some kind of cache or structure, and that could get big for large sets of identities.
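To make that trade-off concrete, here is a minimal sketch of the hashing approach, assuming a keyed hash (HMAC-SHA256 over the extension ID and principal) as the obfuscation step. The class and method names are hypothetical. Because the hash is one-way, the only way to map a token back to a principal is to remember every (token, principal) pair that was issued.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of hash-based principal tokens and the reverse-lookup table they require.
final class HashedPrincipalRegistry {
    private final byte[] secret;
    // Reverse-lookup table: needed because the hash itself cannot be inverted.
    private final Map<String, String> tokenToPrincipal = new HashMap<>();

    HashedPrincipalRegistry(byte[] secret) {
        this.secret = secret.clone();
    }

    // Deterministic, one-way token for an (extension, principal) pair.
    String tokenFor(String extensionId, String principal) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        // A NUL separator keeps, e.g., ("ab", "c") and ("a", "bc") from producing the same input.
        byte[] digest = mac.doFinal((extensionId + "\u0000" + principal).getBytes(StandardCharsets.UTF_8));
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
        tokenToPrincipal.put(token, principal); // the storage cost described above
        return token;
    }

    // Only resolves tokens that were issued earlier and are still remembered.
    String principalFor(String token) {
        return tokenToPrincipal.get(token);
    }

    int rememberedEntries() {
        return tokenToPrincipal.size();
    }
}
```

Every new (extension, principal) pair adds one entry to `tokenToPrincipal`, so the table grows with identities times extensions.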

By encrypting the principal plus a salt (unique for each extension) with a cryptographic secret and providing the result to the extension, we satisfy 1, 2, and 3, because the encrypted value can be decrypted when the extension hands it back to OpenSearch for a user lookup.
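One way to get all three properties is sketched below; it is an assumption, not the agreed design. It uses a SIV-style (synthetic IV) construction in the spirit of RFC 5297, but built from JDK primitives with HMAC-SHA256 in place of AES-CMAC: the IV is an HMAC of the extension ID and principal, and the principal is then encrypted with AES-CTR under that IV. Determinism matters here. A scheme with a random nonce (plain AES-GCM, for example) would give the same user a different token on every request, and an extension could never recognize a returning user. The extension ID plays the role of the per-extension salt. `PrincipalTokenCodec` and its methods are hypothetical names; a vetted AES-SIV or AES-GCM-SIV library would be preferable in production.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical deterministic token codec: SIV-style (HMAC-SHA256 synthetic IV + AES-CTR).
final class PrincipalTokenCodec {
    private static final int IV_LEN = 16; // AES block size, used as the CTR initial counter block
    private final SecretKeySpec macKey;
    private final SecretKeySpec encKey;

    // Assumed: two independent 32-byte secrets known only to OpenSearch.
    PrincipalTokenCodec(byte[] macSecret, byte[] encSecret) {
        this.macKey = new SecretKeySpec(macSecret, "HmacSHA256");
        this.encKey = new SecretKeySpec(encSecret, "AES");
    }

    // Same (extension, principal) always yields the same token; other extensions get unrelated tokens.
    String encrypt(String extensionId, String principal) throws GeneralSecurityException {
        byte[] plaintext = principal.getBytes(StandardCharsets.UTF_8);
        byte[] iv = syntheticIv(extensionId, plaintext);
        byte[] ciphertext = ctr(Cipher.ENCRYPT_MODE, iv, plaintext);
        byte[] token = new byte[IV_LEN + ciphertext.length];
        System.arraycopy(iv, 0, token, 0, IV_LEN);
        System.arraycopy(ciphertext, 0, token, IV_LEN, ciphertext.length);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(token);
    }

    // Recovers the principal; rejects tokens that were altered or issued for a different extension.
    String decrypt(String extensionId, String token) throws GeneralSecurityException {
        byte[] raw;
        try {
            raw = Base64.getUrlDecoder().decode(token);
        } catch (IllegalArgumentException badEncoding) {
            throw new GeneralSecurityException("token is not base64url", badEncoding);
        }
        if (raw.length < IV_LEN) throw new GeneralSecurityException("token too short");
        byte[] iv = Arrays.copyOfRange(raw, 0, IV_LEN);
        byte[] plaintext = ctr(Cipher.DECRYPT_MODE, iv, Arrays.copyOfRange(raw, IV_LEN, raw.length));
        // Recomputing the synthetic IV doubles as the authenticity check.
        if (!MessageDigest.isEqual(iv, syntheticIv(extensionId, plaintext))) {
            throw new GeneralSecurityException("invalid token for this extension");
        }
        return new String(plaintext, StandardCharsets.UTF_8);
    }

    private byte[] syntheticIv(String extensionId, byte[] plaintext) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(macKey);
        byte[] ext = extensionId.getBytes(StandardCharsets.UTF_8);
        // Length-prefix the extension ID so the (extension, principal) boundary is unambiguous.
        mac.update(ByteBuffer.allocate(4).putInt(ext.length).array());
        mac.update(ext);
        return Arrays.copyOf(mac.doFinal(plaintext), IV_LEN);
    }

    private byte[] ctr(int mode, byte[] iv, byte[] input) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, encKey, new IvParameterSpec(iv));
        return cipher.doFinal(input);
    }
}
```

Deterministic encryption intentionally reveals when two tokens belong to the same user within one extension, which is exactly requirement 1. It also reveals the principal's length, since CTR mode does not pad; padding principals to a fixed size before encryption would hide that.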

We need no additional library, since the PrincipalIdentityToken class DC built can be used to distinguish arbitrary strings from the specific strings that have gone through this process.
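The idea of distinguishing processed strings from arbitrary ones can be illustrated with a small value type. This is a hypothetical sketch, not the actual PrincipalIdentityToken class from #4299, whose fields and methods may differ. Wrapping the token in its own type means a method declared as `lookup(PrincipalIdentityTokenSketch pit)` cannot be handed a raw principal `String` by mistake; the compiler rejects it.

```java
import java.util.Objects;

// Hypothetical stand-in for the PrincipalIdentityToken class; the real class from #4299 may differ.
final class PrincipalIdentityTokenSketch {
    private final String value;

    private PrincipalIdentityTokenSketch(String value) {
        this.value = value;
    }

    // Intended to be called only by the encryption step, so a raw principal never becomes a token by accident.
    static PrincipalIdentityTokenSketch fromEncryptedValue(String encryptedValue) {
        Objects.requireNonNull(encryptedValue, "encryptedValue");
        if (encryptedValue.isEmpty()) {
            throw new IllegalArgumentException("token must not be empty");
        }
        return new PrincipalIdentityTokenSketch(encryptedValue);
    }

    String value() {
        return value;
    }

    // Two tokens are equal when their encrypted values match; a plain String is never equal to a token.
    @Override
    public boolean equals(Object other) {
        return other instanceof PrincipalIdentityTokenSketch
                && ((PrincipalIdentityTokenSketch) other).value.equals(value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }
}
```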

@peternied
Copy link
Member

Here is a diagram of how these tokens would be passed around with extensions:

sequenceDiagram
   autonumber
   participant User1
   participant OpenSearch
   participant EchoExtension
   participant OtherExtension
   User1 ->> OpenSearch: User signs in, OpenSearch looks up the principal `1`
   OpenSearch ->> User1: Authentication flow complete
   User1 ->> OpenSearch: Call `echoOnce` with the value 'abc'
   OpenSearch ->> EchoExtension: `echoOnce` is handled by this extension, <br/> converts the principal `1` into the principal identifier token 'qwerty', sends value 'abc' and PIT 'qwerty'
   EchoExtension ->> EchoExtension: Processes the request, stores the PIT
   EchoExtension ->> User1: (Via OpenSearch) returns response 'abc'
   User1 ->> OpenSearch: Call `echoOnce` with the value 'abc'
   OpenSearch ->> EchoExtension: `echoOnce` is handled by this extension, <br/> converts the principal into the principal identifier token, sends value 'abc' and PIT 'qwerty'
   EchoExtension ->> EchoExtension: Processes the request, PIT found
   EchoExtension ->> User1: (Via OpenSearch) returns response 'We already echoed that'
   opt Scenario where principal identifier tokens are different for different extensions
      User1 ->> OpenSearch: Call `other` with no parameters
      OpenSearch ->> OtherExtension: `other` is handled by this extension, <br/> converts the principal into the principal identifier token 'l337', sends PIT 'l337'
      OtherExtension ->> User1: (Via OpenSearch) returns response
   end
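The EchoExtension side of that flow needs nothing more than equality on tokens. A minimal sketch with hypothetical names, treating the PIT as an opaque string received from OpenSearch:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical EchoExtension state: it only ever sees the opaque PIT, never the principal.
final class EchoExtensionSketch {
    private final Set<String> seenPits = new HashSet<>();

    // Echoes the value the first time a PIT is seen; afterwards, reports that it already echoed.
    String echoOnce(String pit, String value) {
        // Set.add returns false when an earlier call already stored this PIT.
        return seenPits.add(pit) ? value : "We already echoed that";
    }

    public static void main(String[] args) {
        EchoExtensionSketch echo = new EchoExtensionSketch();
        System.out.println(echo.echoOnce("qwerty", "abc")); // abc
        System.out.println(echo.echoOnce("qwerty", "abc")); // We already echoed that
    }
}
```

This only works if OpenSearch sends the same PIT for the same user on every call, so whatever produces the token has to be deterministic per extension; a randomized encryption scheme would make the second call look like a new user.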

@stephen-crawford
Contributor

stephen-crawford commented Oct 7, 2022

> Here is a diagram of how these tokens would be passed around with extensions

I am wondering whether encryption/decryption is required here, or just hashing. It seems we only ever check the contents on the core side, which would mean we could simply hash deterministically and check as needed. The extension needs to be able to recognize a principal, but that does not mean the principal cannot be a hash; it just has to be deterministic.

From above, it seems the concern with hashing is the storage or memory needed to keep the principal and hash, but I am not sure how this consumes more space than any other method on the core side. We would need to either:

  1. Store the principal, extension ID, and secret, and rehash to check when needed (slower, but less storage)
    or
  2. Store the past hash and just compare.
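Option 1 could look roughly like this: OpenSearch keeps only its secret, recomputes the keyed hash for a candidate principal, and compares in constant time. This is a hypothetical sketch that assumes HMAC-SHA256 as the hash. It confirms that a token belongs to a principal OpenSearch already has in hand, but answering "whose token is this?" without a table means trying every principal.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of option 1: store only the secret and rehash on demand.
final class RehashVerifier {
    private final SecretKeySpec key;

    RehashVerifier(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    String tokenFor(String extensionId, String principal) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(key);
        byte[] digest = mac.doFinal((extensionId + "\u0000" + principal).getBytes(StandardCharsets.UTF_8));
        return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
    }

    // Nothing stored per user: recompute and compare in constant time.
    boolean matches(String extensionId, String principal, String token) throws Exception {
        byte[] expected = tokenFor(extensionId, principal).getBytes(StandardCharsets.US_ASCII);
        return MessageDigest.isEqual(expected, token.getBytes(StandardCharsets.US_ASCII));
    }

    // Reverse lookup without a table costs one HMAC per known principal on every call.
    String findPrincipal(String extensionId, String token, Iterable<String> allPrincipals) throws Exception {
        for (String candidate : allPrincipals) {
            if (matches(extensionId, candidate, token)) return candidate;
        }
        return null;
    }
}
```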

I do not know how much more expensive these options are than the alternative of storing the principal, extension, and secret and decrypting when needed. That would use the same storage as option 1, just with the computation running in the opposite direction.

Are we sure that hashing is more expensive than decryption?

@peternied
Member

Consider a scenario where the extension is looking up the user for some kind of display purpose. The only information it will have is the PIT (principal identifier token), so it will need to ask OpenSearch to provide the display name, which could change at any time.

This value can be looked up by the principal, but if we only have a hash, we will have to hash every possible principal until we find a match, or keep a copy of each hash to look up the value. That might seem like a small cost, but some of our connected user systems have 100,000 people associated with them, and a couple of bytes per user adds up very fast for each extension they use.
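To put rough numbers on that, take N = 100,000 principals from the comment above and assume, purely for illustration, E = 10 extensions and a 32-byte SHA-256 digest (h) per token. Counting only the raw digest bytes of a hash-to-principal table:

$$
N \cdot E \cdot h = 100{,}000 \times 10 \times 32\ \text{B} = 3.2 \times 10^{7}\ \text{B} \approx 32\ \text{MB}
$$

That is before storing the principal strings themselves and the per-entry overhead of a structure such as a Java `HashMap`, which typically adds more per user than the digest does. The table-free alternative trades that memory for up to N = 100,000 hash computations per lookup, while decrypting a token costs one cipher operation regardless of N.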

sequenceDiagram
   autonumber
   participant User2
   participant OpenSearch
   participant EchoExtension
   User2 ->> OpenSearch: User signs in, OpenSearch looks up the principal `2`
   OpenSearch ->> User2: Authentication flow complete
   User2 ->> OpenSearch: Call `whoEchoOnce` with the value 'abc'
   OpenSearch ->> EchoExtension: `whoEchoOnce` is handled by this extension, <br/> converts the principal `2` into the principal identifier token 'tubba', sends value 'abc' and PIT 'tubba'
   EchoExtension ->> EchoExtension: Processes the request, finds the PIT for User1, `qwerty`
   EchoExtension ->> OpenSearch: Call `whoIs` with the value 'qwerty'
   OpenSearch ->> EchoExtension: Returns 'User1'
   EchoExtension ->> User2: (Via OpenSearch) returns response 'User1'

@cwperks
Member

cwperks commented Oct 10, 2022

This diagram resembles an OAuth 2.0 flow, where the first time a user interacts with an extension, they should be able to grant or deny the extension access to the info it requests, such as the display name.
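Layering that consent step onto the `whoIs` call might look like the following. It is a hypothetical sketch: the consent registry, the scope name `display_name`, and the directory map are assumptions, and the PIT is assumed to have already been decrypted back into a principal by OpenSearch before this check runs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical OpenSearch-side handler for an extension's whoIs request, gated on user consent.
final class WhoIsWithConsent {
    private final Map<String, String> displayNames = new HashMap<>(); // principal -> display name
    private final Map<String, Set<String>> grants = new HashMap<>();  // (principal, extension) -> granted scopes

    void setDisplayName(String principal, String displayName) {
        displayNames.put(principal, displayName);
    }

    // Recorded when the user approves the extension, similar to an OAuth 2.0 consent screen.
    void grant(String principal, String extensionId, String scope) {
        grants.computeIfAbsent(key(principal, extensionId), k -> new HashSet<>()).add(scope);
    }

    // Returns the current display name only if this user granted this extension the display_name scope.
    Optional<String> whoIs(String extensionId, String principal) {
        Set<String> scopes = grants.getOrDefault(key(principal, extensionId), Set.of());
        return scopes.contains("display_name")
                ? Optional.ofNullable(displayNames.get(principal))
                : Optional.empty();
    }

    private static String key(String principal, String extensionId) {
        return principal + "\u0000" + extensionId;
    }
}
```

Because the display name is read at lookup time rather than embedded in the token, a renamed user is reported correctly on the next `whoIs` call.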
