Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auth post-mortem (Pyramid 2.0) #3422

Closed
luhn opened this issue Nov 18, 2018 · 53 comments · Fixed by #3465
Closed

Auth post-mortem (Pyramid 2.0) #3422

luhn opened this issue Nov 18, 2018 · 53 comments · Fixed by #3465

Comments

@luhn
Copy link
Contributor

luhn commented Nov 18, 2018

Hi guys! I'm interested in taking a crack at reworking the auth system for Pyramid 2.0, based on the Auth API Post-Mortem. Would you be open to starting a discussion?

I like Pyramid a lot and would love to see a new auth API in 2.0, as mentioned in #2362. Hopefully I'm not being too brash, barging through the front door as a first-time contributor and asking to remodel kitchen. I want to be respectful of your time; once we hash out the details I'm confident I can make it happen independently with minimal time investment from you.

@digitalresistor
Copy link
Member

We'd be more than happy to see someone take it on and try it. The only concerns that we have from core is that we want to make sure that it is backwards compatible and that may be an issue since the API's don't map well to each other.

It's something that has been discussed quite a bit on IRC, and in private, but there are some other open issues around authn/authz, mostly related to being able to access the request object.

@luhn
Copy link
Contributor Author

luhn commented Nov 18, 2018

Hmm... Backwards compatibility will be a bit awkward, but I think doable. I'll see if I can sketch something up.

@luhn
Copy link
Contributor Author

luhn commented Nov 18, 2018

Okay, here's a quick sketch: master...luhn:auth-sketch

  • Old-style auth interfaces are kept the same
  • New-style auth interfaces are INewAuthenticationPolicy and INewAuthorizationPolicy (awkward, but inevitable for strict bw compat)
  • Secure view deriver exclusively uses new-style auth
  • New-style compatibility policies are implemented (and would be the default) which shims the old-style policies
  • The final product would include an ACL helper as discussed in the post-mortem.

The new interfaces deviate from the post-mortem, since I figured this sketch was a chance to exercise some creativity/opinions. The sketch could be adjusted to better match the post-mortem if that's what the core wants.

  • I did not drop AuthenticationPolicy in favor of IdentityPolicy, it helps minimize the differences between the new and old auth.
  • I merged unauthenticated_userid and authenticated_userid into get_user. Not sure how this is going to be received... I personally have never found unauthenticated identity (i.e. unauthenticated_userid) useful. Plus, it's the authentication API, isn't unauthed anything out of scope?
  • I used get_user instead of get_userid because whether the auth policy returns an ID or a full user object is IMO an implementation detail. (Really just semantics, it doesn't influence the implementation)
  • I ignored authorized_userid because I don't get it. The user is authorized... to what? Authorization is only meaningful within context, which this method is lacking. Furthermore, the post-mortem argues that simplicity is key, so why have two methods in the authz policy when you only need one for the framework to do it's job?

Apologies if I'm overstepping my bounds.

@digitalresistor
Copy link
Member

Some comments:

  • request.user is not something that we are likely to accept. That is something that plenty of authors are already using for extension points for their own apps, and it is unlikely we want to take that for the framework.
  • remember/forget don't work well for API's. This is something that has been something we've wanted to fix for a while, specifically, right now you can't use an authentication policy with remember where you want to send the information not via a header but instead want to do so in the response body. We would want to provide some way to change that/upgrade it so that it can be used as such.
  • get_user returning an object is not something we would want to codify unless the framework itself makes use of it.
  • authenticated_userid/unauthenticated_userid do have a purpose, and should likely continue existing. I personally don't use the distinction and only use authenticated_userid. That distinction might allow you to parse a JWT token for example and return an userid, whereas authenticated_userid does a database call to verify that the user is who they say they are and are still validly authenticated (they haven't been fired, or removed for example). This allows you to have some paths where you may be returning less sensitive data where you need to see if the user has a token vs a valid token.
  • authorized_userid in the proposal allows the authorization policy to decide whether the user that is authenticated according to the authentication policy has the rights required to be considered "authorized" for this request. Just because a user is "logged in" does not mean they are authorized to do what they are requesting to do.

@luhn
Copy link
Contributor Author

luhn commented Nov 19, 2018

Thank you for your comments!

  • I figured request.user was a long shot, I'll take it out. If we implement IdentityPolicy.identify, will we be adding a request.identity? I also assume we'd be adding request.authorized_userid?
  • I saw that comment on Pyramid 2.0 possible feature list #2362 about remember/forget. I'm definitely open to changing that while I'm working at this, but I didn't have any ideas myself. Suggestions?

After reading your feedback, I do think I got ahead of myself with that sketch. My apologies. I should make sure I understand the post-mortem a bit better.

So, looking at the sample application in the post-mortem, it shows authorized_userid checking identity against a dictionary. I assume in a real-world application it might be checking identity against a database. So it seems authorized_userid is taking over the responsibility of authenticated_userid?

@luhn
Copy link
Contributor Author

luhn commented Nov 19, 2018

Edit: You can probably skip this, I don't think I agree with it anymore.

Okay, I’ve spent quite a bit of time revisiting your feedback, the post-mortem, and my own opinions.

I think the post-mortem is on track with the Identity interface, the framework should be able to provide built-ins that cover the majority of use cases (SessionIdentityPolicy, AuthTicketIdentityPolicy, AuthBasicIdentityPolicy, etc.) What to do with remember and forget can be discussed further later.

The idea of Authorization being driven by a permits function that takes in an identity and a context and returns Allow or Deny is simple and elegant, made powerful by the availability of an ACL helper. It’s very much what I expect from and love about Pyramid.

The thing I do take issue with is authorized_userid. As I argued previously, you can’t have authorization without context, which that method does not have. If you look at the sample application, the method checks the identity against the database (in this case a dictionary) and is used to populate request.user. So what we have is authenticated_userid by another name.

Having authenticated_userid method in an interface called IAuthorizationPolicy is obviously at odds. What I see as solutions:

  • Make a generic IAuthPolicy interface that handles both authentication (authenticated_userid) and authorization (permits). Personally my least favorite option. However, this does result in something that is the same as the post-mortem, just with different names.
  • Merge authenticated_userid and IIdentityPolicy to make IAuthenticationPolicy. In my mind, this does make some sense because identity and authentication tend to come as a pair. It also means IAuthenticationPolicy doesn’t change in 2.0 except for the removal of effective_principals, which makes backwards compatibility much more easy and elegant. Despite being fairly similar to the legacy authentication, I think it still addresses the primary concerns of the post-mortem.
  • Split authenticated_userid into IAuthenticationPolicy. This means the auth framework is now three interfaces/knobs (identity, authn, authz), which may seem excessive. However, this means authentication and authorization can now be implemented as single functions (and identity can be plug-and-play), so I think this can actually lower the barrier to entry for auth.

@luhn
Copy link
Contributor Author

luhn commented Nov 19, 2018

Reviewing the implementation of the builtin authentication policies, it looks like authenticated_userid is implemented using the groupfinder callback. This means option two does not address the concerns of the post-mortem.

@luhn
Copy link
Contributor Author

luhn commented Nov 19, 2018

One thing that I find hard to square is Pyramid's idea that authenticated_userid should involve querying the database for the user. The idea, as far as I understand it, is that maybe the user has been deleted. I don't think that's the responsibility of the authentication. The user's identity claim is still true, it's just no longer relevant to the system, and therefore they no longer allowed access, which is the concern of authorization.

I guess that's what the post-mortem is getting at with authorized_userid, so I suppose I agree with it more than I think.

Look at me, saying "minimal time invest from you" and here I am writing walls of text only to reach where you guys have been all along. Sigh.

@luhn
Copy link
Contributor Author

luhn commented Nov 19, 2018

Just in case I haven't annoyed everybody yet, I'm going to try again.

I propose we remove authorized_userid entirely, for the following reasons:

  • As I'm sure you can tell by my wall of text, it can be confusing.
  • As far as I can tell from the sample application, it's not used by the auth framework at all. It seems rather "un-Pyramid-like" to enforce something that isn't used. A user can easily implement request.authorized_userid or request.user or whatever they please fairly easily (and most are probably already doing this).
  • By reducing IAuthorizationPolicy to one function, an auth policy can be written with a simple function. I find that idea quite appealing.

Whether or not it's included has negligible effect on the implementation, so I'll get started now regardless.

@digitalresistor
Copy link
Member

@mmerickel may have some ideas and thoughts too.

I'd also like to ping @dstufft if he's willing to give his feedback on how the current authn/authz is fairing in warehouse.

@mmerickel
Copy link
Member

Well there's a lot to talk about with the security apis.

I think one of the major questions we need to ask is whether we want to redefine the protocol between the authn and authz policies. Right now they communicate via prinicipals (a list of strings). This is restrictive, things like macaroons can not be distilled down into a declarative list. A more flexible system could support a richer protocol here. I think a common request would be for some form of opaque object specific to the identity policy. For example, there are things right now that are lost when using repoze.who or authtkt, etc because they are capable of encoding extra data alongside the userid but when we use request.authenticated_userid that is not data that is available without using some policy-specific extra apis.

The system we have right now:

 +----------+                       
 | request  |                       
 +----------+                                         
       |                            
       v                            
+------------+     +---------------+
| principals |---->| authorization |
+------------+     +---------------+

A possible improvement:

+----------+                        
| request  |---------------+        
+----------+               |        
      |                    |        
      v                    v        
+----------+       +---------------+
| identity |------>| authorization |
+----------+       +---------------+

This is one of the major recognitions of the post-mortem. That the identity policies should be highly reusable, more opaque and that the authorization policy is the extension point where users should write their own code to handle the identities and translate them into something meaningful.

The second major goal I have is that remember / forget should be more useful for non-cookie-based authentication. I'd like to come up with some way to define them so they can return the token or body instead of only headers. For example a dict with headers, token, and body, content-type keys. There is not a standard way to construct a response using the token key, but headers or body+content-type could be used to define a response object.

Basically all of this is in line with the post mortem, and the issue has always been how to implement this in a bw-compatible-ish way. It does not need to be backward compatible, but there must be a clear migration path for people to use their old setup with pyramid without rewriting the actual policies.

Anyway this is just me getting some thoughts down to get involved. I hope it is somewhat helpful.

@mmerickel
Copy link
Member

I would think request.authorized_userid would be a thing or we would just keep it named authenticated_userid and request.has_permission would remain unchanged. request.identity or request.user_identity or request.authenticated_user might also be a thing we'd want to expose.

request.unauthenticated_userid is a thing that's hard to just drop... we use it heavily in our tutorials as the primary extension point for an authentication policy. We'd need to look at what the new patterns would be in the new api and then evaluate how to get from here to there.

@luhn
Copy link
Contributor Author

luhn commented Nov 20, 2018

I'm happy to help make something happen with remember/forget as well. I think I'll open up a new issue for that discussion, so in my mind it's a separate beast.

I glad to hear strict bw compat isn't a requirement, I'd like to break it a little if only so we don't have to append "new_" to everything we do.

I agree on the identity being an opaque object. One thing I really like about the post-mortem API is IdentityPolicy.identify doesn't dictate what kind of object should be returned, just that it should identify the user.

@luhn
Copy link
Contributor Author

luhn commented Nov 20, 2018

Or maybe instead of an opaque object, there could be a UserIdentity object, which has an userid property but otherwise behaves like a dictionary? Or an IUserIdentity which just defines id and the rest of the implementation is up to the policy? Let's the framework know one pertinent thing about the user (the ID), while giving flexibility for policies to pass along all the data it has.

@luhn
Copy link
Contributor Author

luhn commented Nov 20, 2018

Okay, I took some ideas swimming around my head (splitting auth into three parts; user identity object) and vomited them into Python: https://gist.github.com/luhn/4cba3eeb26c295ce70e2feb4da04e757

I'm not sure how much I like it myself. Are there too many knobs? Is IUserIdentity really the way to go? Does map nicely to the old auth API.

@dstufft
Copy link
Contributor

dstufft commented Nov 20, 2018

Forgive me for the organization of this, it's very brain dumpy.

Personally, I've never quite understood the use case of request.unauthenticated_userid compared to request.authenticated_userid. This particular API has always felt somewhat dangerous to me, because depending on your IAuthenticationPolicy this may be coming directly from the user with little or no validation that the user didn't just YOLO enter anything they wanted. The most egregious case of this is with basic auth, where anyplace you're using request.unauthenticated_userid you may be just using something that the user typed into their basic prompt with zero actual validation.

In Warehouse, we've made an effort to remove any security footguns like that, and for Basic Auth request.unauthenticated_userid does the same thing as request.authenticated_userid. Ultimately having both of them doesn't really bother me I guess? I can work around the potential issues , I just never understood in what cases it was actually desirable to have the current semantics.

One thing we've added to all of the IAuthenticationPolicy implementations in Warehouse, is that attempting to authenticate with them always adds a Vary header specific to that authentication policy (Vary: Authorization for basic, Vary: Cookie for session). This works currently just by adding a response callback, but it may make sense to bake this idea into "core" interface more concretely. OTOH this may be a place where the distinction between "I'm getting a userid to make meaningful decisions based on it" and "I'm getting a userid for logging or something" actually is helpful.

In Warehouse we're using pyramid_multiauth to handle the fact that we have multiple ways to authenticate (session, basic, macaroon in the near future). I'm not sure if keeping this implemented as a meta IAuthenticationPolicy makes sense, or if it deserves to be baked into the core framework more. One shortcoming of using it and the current setup is that you don't really know how a given userid was found. Switching to allowing some sort of object instead of effectively only strings does mean that it could return an object with some sort of policy attribute attached to it.

One major concern I have with splitting up of IIdentifyPolicy and IAuthenticationPolicy, is that I keep coming back to not really understanding the point of splitting the concept of an unauthenticated userid from an authenticated one. The most recent example by @luhn shows a TrustAuthenticationPolicy which is documented as being usable when your identity provider is tamper-proof. This feels like a recipe for disaster when switch your IIdentifyPolicy around (or add one) and then forgot to update your IAuthenticationPolicy. It seems to me that if there is some sort of a dependency between the two interfaces such that you have to be careful what combination you're using things in, then those items belong together rather than separate.

I think I feel like overall, the current IAuthenticationPolicy is not actually that bad, and the main thing I'd really agitate for change in, is getting ready of the idea of principals, and instead just returning some sort of object that is intended to act as a stand in for the user's identity, which can have additional data associated with it (such as what policy was used to authenticate the user). Thus I would look at trying to arrive at something like (forgive the non z.i interface, it's early and excuse the naming):

class IUserIdentity:
    id: Any

class IAuthenticationPolicy:
    # It's possible we could just say that this returns ``Any``, and treat the object
    # it returns as completely opaque. The only reason not to do that is if any part
    # of Pyramid needs to *actually* know the structure of this object.
    def __call__(self, request: Request) -> Optional[IUserIdentity]:
        ...

This simplifies the authentication policy, making it dead easy to write your own. Of course Pyramid can provide some stand in ones (like it does currently), but those policies wouldn't need to be extensible like they currently are, because they're simple if someone needs something special from them, they can just write their own.

Now this does get rid of the distinction between unauthenticated_userid and authenticated_userid. I personally feel like that is a distinction worth getting rid of. However, if it's decided that Pyramid should still have this concept, then I would just extend the IAuthenticationPolicy to have two methods (instead of one) like it does currently, one for each. Doing that does make implementing an IAuthenticationPolicy more challenging though, so perhaps it could be an optional, additional method for unauthenticated_userid?

A quick look at how we might implement these things as they stand.

@attr.s(auto_attribs=True, frozen=True, slots=True)
class UserIdentity:
    id: Any

class SessionAuthenticationPolicy:
    def __init__(self, prefix='auth.'):
       self._prefix = prefix
        self._userid_key = prefix + 'userid'

    def __call__(self, request) -> UserIdentity:
         if self._userid_key in request.session:
            return UserIdentity(id=request.session.get(self._userid_key))

class BasicAuthenticationPolicy:
    def __init__(self, check: Callable[[str, str, Request], UserIdentity], realm='Realm'):
        self._check = check
        self._realm = realm

    def __call__(self, request) -> UserIdentity:
        credentials = extract_http_basic_credentials(request)
        if (
            credentials
            and self._check(credentials.username, credentials.password, request)
        ):
            return self._check(UserIdentity)

@attr.s(auto_attribs=True, frozen=True, slots=True)
class MacaroonUserIdentity:
    id: Any
    macaroon: Macaroon

class MacaroonAuthenticationPolicy:
    def __init__(self, get_key: Callable[[Macaroon, request], bytes]):
        self._get_key = get_key

    def __call__(self, request) -> UserIdentity:
        # extract_http_macaroon is a fake method that returns a high
        # level macaroon object that has a number of supported functionality
        # that none of the current Macaroon libraries currently implement.
        macaroon = extract_http_macaroon(request)
        # For @mmerickel, verifies the data serialized in the identifier, but not
        # the entire Macaroon or the caveat language.
        macaroon.verify_data(key=self._get_key(macaroon))
        return MacaroonUserIdentity(
            id=macaroon.data["user.id"],
            # Macaroon's user identity object also includes the macaroon that
            # we authenticated the user from, this is because attempting to
            # authorize the user has to be further attenuated by the Macaroon
            # itself, so simply using the user's ID is not enough information.
            macaroon=macaroon,
        )

As I write this, I realize that this most closely resembles the IIdentityPolicy in the given gist. However, I feel like it differs in the intent. The idea here is that the user id given back to us from this API has been verified for authenticity in whatever way that policy allows (checking password for Basic, unneeded for Session, verifying the data in the Macaroon). We never, ever return a user identity that hasn't been verified authentic (this is different from having been verified it still exists in some persistent data store).

Now, there is the fact we no longer have a way for user's to specify they want to specifically access the user, but only if it still exists in their persistent data store. Personally I feel like this probably can roughly be equated into two categories:

  • A site that never wants to be able to access a userid that may no longer exist in their persistent data store.
  • A site that sometimes wants to check that, and sometimes doesn't.

For the first case, a very trivial wrapper policy can implement that easily, for instance in Warehouse we could implement it like:

class DatabaseUserAuthenticationPolicy:
    def __init__(self, policy):
        self._policy = policy

    def __call__(self, request):
        identity = self._policy(request)
        if identity is not None:
            if request.db.query(User).filter_by(id=identity.id).first():
                return identity

For the second case, It seems like that distinction could just as easily be determined by calling a function instead of a property on the request. So something like:

def identity_in_db(request, identity):
    return bool(request.db.query(User).filter_by(id=identity.id).first())

def my_view(request):
    # Some code where I don't care if the user exists in the database anymore.
    print(f"The user {request.identity.id!r} attempted to do something")
    
    # Some code where I *do* care if the user exists in the database:
    if identity_in_db(request, request.identity):
        return {"user.id": request.identity.id}
    else:
        return {}

I feel like this is a lot clearer and will involve a lot less custom code having to be dealt with inside of the IAuthenticationPolicy because while determining if something still exists in some persistent data store is pretty much always going to be very specific to the site itself, pulling a user identity out of the session, basic auth, a macaroon, an authtkt, or whatever is unlikely ever going to be very specific to that site. I feel like ditching the distinction between unauthenticated and authenticated user id's makes the whole API a lot easier to reason about (it's always "safe" to user the user id given in the identity, it just might not always still exist in the database, versus now where unauthenticated_userid is sometimes "safe" and sometimes completely YOLO).

This is somewhat similar to how the above proposal works, except it wants to standardize some sort of API (and thus an interface) that will provide a standard way to access the request identity, but after first verifying it still exists in the database. I can understand that desire, if there is an API that we need to provide in Pyramid that needs to check that. Otherwise I feel like the above is sufficient, and we don't need to provide an interface to let people avoid writing a 2 line request property if they really want it.

Moving on to Authorization. The biggest deficiency in Warehouse's view in the current API is that it more or less assumes that given a set of principles, the permissions granted to those principles are more or less static, or at least are properties only of the context the permission is being asked of, and that the request itself cannot further attenuate the permissions.

Smaller complaints about the AuthZ framework is that much like the AuthN policy, the use of principles feels particularly constraining here, and I feel like it would be much nicer to simply operate on an identity, and let AuthZ policies implement principles if they make sense. The fact that principles are generally strings, means that even if yo work around the request problem above with a thread local, you're likely going to be duplicating work (and possibly getting things wrong if if you're combining something like pyramid_multiauth with a AuthZ policy that is only willing to work with a particular type of user identity).

The API in the proposal above for AuthZ looks reasonable to me honestly, something like:

class IAuthorizationPolicy:
    def __call__(context, identity, permission, request):
        ...

I would probably argue over the ordering of the arguments ;) but otherwise it looks sane to me. You can easily implement a current gen Pyramid style ACL policy on top of this by moving the groupfinder function into the ACLAuthorizationPolicy. More importantly I think this allows more complex cases like:

class MacaroonAuthorizationPolicy:
    def __init__(self, policy):
        self._policy = policy 

    def __call__(context, identity, permission, request):
        if hasattr(identity, "macaroon"):
            macaroon = identity.macaroon
            # Actually go through and provide all of the information
            # needed to verify our caveats, this could be something
            # like checking that the given permission is in the list of
            # permissions in a permissions caveat, checking the request
            # IP address to ensure it came from the correct time, checking
            # some property of the object, or whatever.
            if not macaroon.verify():
                return Denied
        
        # Dispatch to the underlying AuthZ policy, that implements AuthZ in
        # whatever way makes sense.
        return self._policy(context, identity, permission, request)

Hopefully this sprawling brain dump has been generally useful!

@mmerickel
Copy link
Member

@dstufft thanks, this is great. I think everyone pretty much agrees we should stop requiring the policy to return a list of principals for flexibility. A user identity object seems like a good way to go here so long as the framework still has some way to get a string representing the user which it can use for logging/debugging purposes.

I feel like ditching the distinction between unauthenticated and authenticated user id's makes the whole API a lot easier to reason about (it's always "safe" to user the user id given in the identity, it just might not always still exist in the database, versus now where unauthenticated_userid is sometimes "safe" and sometimes completely YOLO).

The entire point of unauthenticated_userid is that it is not like this at all. Unauthenticated just means it hasn't been verified by the application, but it should be trusted by the policy. The distinction is important. The goal is to make an authentication policy reusable. There are two steps 1) pull the identity from the request (standard, reusable logic) 2) decide if you actually want to trust it (extensible per-application logic). As such there is the unvalidated identity pulled from the request which is exactly unauthenticated. You can then extend the policy and define what authenticated means to your application, while at the same time knowing the policy that's giving you the identity to know what unauthenticated really means. Your example where you only check the identity_in_db in one spot is what we're trying to avoid - this is strictly more insecure-by-default than what we propose where there's a clear extension point to validate it on every request. Most applications would want to check that on every request, even if in some apps you can be more relaxed about it due to policy. This is the split between the identity pulled from the transport and the identity you want to use in the app. You are not the only one who keeps saying unauthenticated_userid serves no purpose and I can't get my head around that line of thinking if you realize it's just the part of the policy that is focused on transport. Maybe there could've been a better name for it but it serves a real purpose in the abstraction. I agree that all applications should only use authenticated_userid in their code other than the one spot that converted from unauthenticated to authenticated maybe even by just copying it. This decision usually depends on which policy the identity came from in the first place.

I think the goal of the post mortem rings true here, and would help all of us get over this naming issue just by tweaking the location of responsibilities. The goal in my mind is that the IdentityPolicy is 100% reusable and focuses solely on the transport whereas the AuthorizationPolicy is 100% written by the user with the application constraints in mind. In this scenario it might not be necessary to expose the unverified identity on the request anywhere and instead Pyramid is responsible for just querying the value and passing it into the AuthorizationPolicy. With this in mind it might also be easier to find some better name for this policy to avoid clashing with the current AuthorizationPolicy for compatibility.

Another thing to consider is some form of multi-policy support integrated into Pyramid such that if we do keep an identity-vs-authorization split there is a frameworky way for the authorizer to know which policy the identity is from without relying on some attribute in the identity. It's just a thought and is probably too much for Pyramid to require right now.

@luhn
Copy link
Contributor Author

luhn commented Nov 20, 2018

Thanks @dstufft! I pretty much agree with you entirely. Your criticisms of my gist are valid and I think the proposal you lay out is quite elegant.

@luhn
Copy link
Contributor Author

luhn commented Nov 20, 2018

@mmerickel How do you feel about the IUserIdentity interface?

Edit: Nevermind, skipped over your bit "A user identity object seems like a good way to go here..."

@dstufft
Copy link
Contributor

dstufft commented Nov 20, 2018

I think the goal of the post mortem rings true here, and would help all of us get over this naming issue just by tweaking the location of responsibilities. The goal in my mind is that the IdentityPolicy is 100% reusable and focuses solely on the transport whereas the AuthorizationPolicy is 100% written by the user with the application constraints in mind. In this scenario it might not be necessary to expose the unverified identity on the request anywhere and instead Pyramid is responsible for just querying the value and passing it into the AuthorizationPolicy. With this in mind it might also be easier to find some better name for this policy to avoid clashing with the current AuthorizationPolicy for compatibility.

Yea, I would feel a lot less confused if request.unauthenticated_userid didn't exist, and that was solely used as the input to some interface that validated the user id. Perhaps that's the right way of looking at it? You end up with 3 types of interfaces:

  • An authentication or identity object whose goal it is to retrieve a user id from a given request.
    • This may just be a poor choice in the current BasicAuthenticationPolicy, but I feel like there should at least be a documented assumption that this is authenticated in some way. Not in the way that the current API means for unauthenticated vs authenticated, but like you should ideally not just blindly accept a value from the client. The example I can give here is that BasicAuthenticationPolicy currently just returns whatever username exists in the credentials without checking the password first, and I feel that is wrong. While we might not have run all of the validations against it, we should be assured that the userid coming from this API was, at one point at least, "real".
    • The Gist API calls this the identity interface. I don't really care about naming, but I feel like authentication is a better name for it.
  • A validator object, that takes the userid from the auth/identity interface, and validates it.
    • It might even be nice to support multiple validators, one common one would be the checking if the user-id still exists in the database (and presumably hasn't been disabled or something), but there may be additional validators that make sense as well. For instance, maybe the site has a maintenance mode, and when the site is in maintenance mode only certain users are allowed to be considered "logged in".
  • An authorization object, which takes a now validated user identity and checks if it is authorized to perform some action, on some object, in the context of some request.

I would then only expose off of the request, the validated identity object and not the unvalidated object. If someone really needs to get at the unvalidated object, they can call the authentication policy APIs themselves, but that should be a special case I think, not a use case to design the primary API around.

The distinction here between request.unauthenticated_userid and the output of the authentication/identity policy (the first object in my list) is that I think you should still perform some sort of authentication on the value prior to returning it. For session based, this would just be the fact the secure session ID exists (since guessing that is ~impossible) and might even include a signed cookie for the session id. For authtkt that would be the signature involved in that cookie, for remote users, that would be ~blindly accepting the value (because the assumption is that remote users already have been authenticated), for basic auth, that means actually checking the password to ensure that the credentials are accurate.

Basically the idea here is that the output should be more than just "claimed to be so and so" (as the current docs for request.unauthenticated_userid state, but should be something like "has valid credentials for user id XYZ", where credentials might be a session ID, an authtkt token, a macaroon, or a username/password and then the validator(s) that do things like check the database and such are there to ensure that those credentials are still valid (e.g. the user hasn't been deleted/disabled, the site hasn't had logins disabled, or whatever else).

@mmerickel
Copy link
Member

If you look through all the existing policies (basic, repoze.who, authtkt) they all have some larger object they could return instead of the lowly userid. I think that's the first step is just allowing them to return that object in a semi-standard format. In basic's case it'd be the (username, password) instead of just the username.

The validator object is what we currently call the groupfinder and it has definitely been a source of frustration to document. In the past few years I've modified the documentation to demonstrate subclassing instead. In this way you just define authenticated_userid and effective_principals yourself, directly, and it's been way more obvious (I think).

class MyAuthenticationPolicy(AuthTktAuthenticationPolicy):
    def authenticated_userid(self, request):
        unauthenticated_userid = self.unauthenticated_userid(request)
        if unauthenticated_userid is not None:
            # ... do stuff to authenticate and return a userid (it doesn't need to match the unauthenticated_userid)

    def effective_principals(self, request):
        principals = [Everyone]
        authenticated_userid = self.authenticated_userid(request)
        if authenticated_userid is not None:
            # grab some extra principals here
            principals.append(Authenticated)
            principals.append(f'user:{authenticated_userid}')
        return principals

The api surface that Pyramid really cares about to do its job is this ISecurityPolicy below. Maybe it was a mistake for Pyramid to break them apart and helpers should've just been defined to help a user compose one of these objects:

class ISecurityPolicy:
    def identify(self, request):
        """ Return a trusted and verified identity object."""

    def remember(self, request, identity):
        """ Return a dict containing headers, body, token that can be used in a
        response to inform the client how to keep a user logged in."""

    def forget(self, request, identity):
        """ Return a dict containing headers, body, token that can be used in a
        response to inform the client that they are logged out."""

    def permits(self, identity, context, permission):
        """ Return ``Allowed`` or ``Denied`` for the ``permission`` acting on ``context``."""

For example the ACLAuthorizationHelper mentioned in the post mortem, as well as the AuthTktCookieHelper that already exists could be combined into an implementation here. Note that when I define this api surface I'm intentionally ignoring IAuthorizationPolicy.principals_allowed_by_permission which I'd like to see removed.

@mmerickel
Copy link
Member

If the identity is a fully-fleged user object does anyone have compelling arguments why permits should accept the request object? I'd need to go through old issues and see, because certainly people have asked for it but nothing has stuck in my brain as being a compelling reason.

@dstufft
Copy link
Contributor

dstufft commented Nov 20, 2018 via email

@mmerickel
Copy link
Member

I'm interested if anyone has any opinions how a system would work that is more stateful. Some policy in charge of privilege escalation around login/logout, invalidating request.session, setting vary headers. I always handle these directly in my login/logout views and if identify is used then something should be setting Vary: Cookie on any response generated from that request. It may not belong in Pyramid, but it's a good thought experiment. Otherwise I think the ISecurityPolicy I defined above is fairly promising going forward. I'd want to look at some example implementations before making any final decisions though.

@luhn
Copy link
Contributor Author

luhn commented Nov 21, 2018

I think ISecurityPolicy generally addresses all the concerns brought forth, so I'm good with it.

Hard to imagine a stateful general-purpose security API, as authentication can be handled in such a myriad of ways. imo, even remember/forget is too much to be in the Pyramid API.

You make a good point about Vary, though. Maybe give the policy a hook to the response? authenticated_response(response), called after the response is generated iff request.authenticated_userid (or whatever secedes it) is invoked. Although that just feels icky... it seems surprising that a security policy would mutate the response.

@dstufft
Copy link
Contributor

dstufft commented Nov 21, 2018

@luhn
Copy link
Contributor Author

luhn commented Nov 21, 2018

I take back everything I said about Vary, @dstufft's example feels like the right way to do it.

Should something like request.add_response_callback(add_vary_callback("Authorization")) be included as part of the stock security policies?

@dstufft
Copy link
Contributor

dstufft commented Nov 21, 2018

The argument against it, is that you technically only need to Vary the response if the output of the response itself is altered by accessing the user id. If you're accessing the user id to do something like send an email, log the current user or something, then you're needlessly adding a Vary.

The argument for it, is that if you forget to Vary, you're allowing caches in between you and the client cache things and serve them to multiple users even if they contain private information.

@dstufft
Copy link
Contributor

dstufft commented Nov 21, 2018

For warehouse's specific case, You can't access the current user without decorating the view to state you're doing so, so it was easy for us to take the safe path and always add the Vary. Pyramid is a lot more general so that may not be a reasonable stipulation to make there (or maybe it is!).

FWIW, I think Django adds the Vary anytime you access request.user.

@mmerickel
Copy link
Member

@luhn I tend to agree about the remember / forget APIs, it's a hard problem. @mcdonc spent some time back around 1.5 trying to talk himself into request.remember(...) and such, but ultimately from what I recall it just doesn't map very well because the way the current APIs are structured it's not actually clear whether the user is logging in or not, or if they're just calling the function for fun. The main reason for that is that the policy has no concept of what response is being returned or whether it's related to the remember call. Similarly as @dstufft mentioned, it's not clear whether the response actually depends on the userid or not just because the request accessed it. I don't think there's a right answer here which is why it's really hard to define the behavior in Pyramid.

@mmerickel
Copy link
Member

mmerickel commented Nov 21, 2018

I think an interesting approach would be to define some api around privilege escalation that gets invoked. There could even be an event emitted. The event/api would require the response object that is triggering the change to be passed to it. Systems could listen for this event and do things like invalidate the current session.

@mmerickel
Copy link
Member

Does anyone think it's a bad idea to merge the two policies into one? I think it's the only sane thing to do when relaxing the contract to something more general than a list of strings. The advantage before was that they could be configured separately but that only works well if there is a standard protocol for them to communicate.

@dstufft
Copy link
Contributor

dstufft commented Nov 21, 2018

Can you be clear which two policies you're talking about merging together? There are a number of proposals in this thread, with varying sets of policies, so I just want to be clear :)

@mmerickel
Copy link
Member

Sorry, my focus was on the ISecurityPolicy I proposed above which replaces the IAuthenticationPolicy and IAuthorizationPolicy currently in Pyramid. That is not the final policy, for example it may be worth exposing a method to convert from identity to a userid string, and the remember/forget may wish to use a userid instead of identity object, but the basic idea is there.

@dstufft
Copy link
Contributor

dstufft commented Nov 21, 2018

My only concern with joining them is it would make multi policy harder I think? IOW in Warehouse the user id can come from multiple sources, but we really only want to have one set of authorization for the user (although that's not entirely true with Macaroons, but it is for session vs basic auth).

@mmerickel
Copy link
Member

In a system like that I'd expect the multiauth policy to wrap the identity in something and then communicate back to the appropriate policy to invoke permits. It depends on what contract the multiauth policy imposes on the identities. It's really up to the multiauth policy but in this system where we no longer have a list of strings, the format of the identity object is opaque and thus there is not a standard permits that can handle any identity. The multiauth policy could define its own permits assuming all identities conform to some contract but that's up to the multiauth policy.

@digitalresistor
Copy link
Member

While we are here... one of the things that regularly throws me off is that the auth policy is created once, and is not a factory.

If the auth policy was a per request factory, then we could store values on self instead of adding them somewhere in request.

This would also allow you to do isinstance type checking instead of returning some sort of type in the identity object for example.

It would also allow multi-auth more easily, since now your factory can determine what policy to use, and that can then be used to influence the authz policy more easily.

@luhn
Copy link
Contributor Author

luhn commented Nov 22, 2018

I can see what you mean. Due to the lack of reify on authenticated_userid (which I assume is by design), my auth policies are usually implemented to awkwardly squirrel away the user object in the request.

@luhn
Copy link
Contributor Author

luhn commented Nov 22, 2018

Kinda spitballing here, if there was a security policy factory, we could have ISecurityPolicy double as IUserIdentity. Something like:

class ISecurityPolicy:
    id = Attribute('The ID of the user authenticated in the current request')
    def permits(context, permission):
        "Return `Allow` if this user is allowed this permission"

class MySecurityPolicy:
    def __init__(self, request):
        self.request = request

    @property
    def id(self):
        return request.session['id']

    @reify
    def _user_obj(self):
        return db.query(User).filter(User.id == self.id).one()

    @property
    def email(self):
        return._user_obj.email

     def permits(self, context, permission):
         principles = self._user_obj.principles
         return acl_helper(context, principles, permission)

Configurator(security_policy_factory=MySecurityPolicy)

def view(context, request):
    return {'email': request.identity.email}

@mcdonc
Copy link
Member

mcdonc commented Nov 25, 2018

FWIW, even as a "Pyramid 2.0" sort of thing, I don't think the bw compat problems implied by changing to a single policy is worth it.

@mmerickel
Copy link
Member

FWIW, even as a "Pyramid 2.0" sort of thing, I don't think the bw compat problems implied by changing to a single policy is worth it.

I think it's too early to decide until we have a concrete proposal and it's still early days for that. I agree it's not something to be done lightly. The current system does have some limitations that have been identified and there are currently some definitively wonky patterns recommended to get around them. Even the current pattern used in the tutorials to achieve a request.user object and convert from unauthenticated_userid to authenticated_userid are circular and non-obvious. If we can clean those things up in a way we think is 100% better then I think it's worth it. I'll also note that the ISecurityPolicy proposal is backward compatible at least so far. One can be constructed under the hood that uses the policies configured in a 1.0 app.

@luhn
Copy link
Contributor Author

luhn commented Nov 27, 2018

The conversation has dried up, any further issues that need to be discussed?

I personally think it's worth discussing @bertjwregeer's suggestion of the security factory.

If everybody's had their say, it seems @mmerickel proposal of ISecurityPolicy with IUserIdentity is the path forward.

@mmerickel
Copy link
Member

I personally think it's worth discussing @bertjwregeer's suggestion of the security factory.

The issue with caching anything is what to do at the privilege boundaries and it is why, at least in part I suspect, that Pyramid takes the approach right now of not caching anything in the auth system. However, I do agree that it's worth exploring how this might work but it is tricky and ties directly into remember/forget.

If everybody's had their say, it seems @mmerickel proposal of ISecurityPolicy with IUserIdentity is the path forward.

I'm not really clear at all what the IUserIdentity is here, but I'm happy to see it play out and deal with it later.

@luhn
Copy link
Contributor Author

luhn commented Dec 3, 2018

The issue with caching anything is what to do at the privilege boundaries and it is why, at least in part I suspect, that Pyramid takes the approach right now of not caching anything in the auth system.

Yeah, I figured that was why authenticated_userid isn't reify'd. However, if using a security factory, the policy can implement caching as it sees fit (or not at all).

I'm not really clear at all what the IUserIdentity is here, but I'm happy to see it play out and deal with it later.

That would be the object an auth policy returns, instead of the ID returned now.

class IUserIdentity(Interface):
    id = Attribute("""ID of the user.""")

Good for policies that include extra metadata, like macaroons or (apparently) authticket. Also allows users to have the auth policy return a full user object if they so desire.

@mmerickel
Copy link
Member

If the only purpose of the identity is to standardize a single attribute then I'd probably push for a method on the policy like authorized_userid(user) that could be user-defined and take an object returned from identify and convert it to a string identifier. Notice how the policy is almost exactly the post-mortem except merged into a single object at this point, allowing the user to define their own contracts between authn and authz.

@ztane
Copy link
Contributor

ztane commented Dec 5, 2018

btw, to remind that permits should still get request one way or the other. Now it is kinda implied that context knows its request, or there's a threadlocal.

@luhn
Copy link
Contributor Author

luhn commented Dec 5, 2018

@mmerickel, I feel like that method is going to cause confusion, since it's not obvious what it's being used for.

Now that I think about it, what's the reason we need a user ID at all? The framework itself doesn't care (afaict) if the identity object is fully opaque, especially now that there's no boundary between authn and authz.

@mmerickel
Copy link
Member

@luhn I think it's highly useful for debugging but I don't have a problem with starting without it and seeing how things go?

@luhn
Copy link
Contributor Author

luhn commented Jan 18, 2019

Hey guys, I haven't forgotten about this, sorry it's been so long. Between holidays and work I've had my plate full, hoping to pick this up again soon.

@luhn
Copy link
Contributor Author

luhn commented Jan 20, 2019

I've reviewed the conversation and sketched this: master...luhn:auth-sketch-3

Authn+authz have been merged into ISecurityPolicy as discussed. I've included SessionSecurityPolicy as an example implementation and CompatibilitySecurityPolicy to show bwcompat. (It works out quite nicely.)

ISecurityPolicy.permits includes the request object, as requested by @ztane.

ISecurityPolicy.identify returns an opaque identity for "seeing how things go" with @mmerickel.

remember and forgot have been left mostly untouched. Please go to #3423 with any opinions.

@mmerickel
Copy link
Member

Awesome. I'd really like to hear from anyone who thinks this is worse than the current design which separates the two policies and enforces a transfer of principals between them.

@ztane
Copy link
Contributor

ztane commented Sep 23, 2019

Probably not worse than the current, but I expect to see frameworks have their own pluggable thingies to these... my biggest problem has always been that the __acl__ has never worked well in any use case that I've had; if the security policy gets tied tight then I expect to have to subclass the security policies to get rid of the __acl__ in permits.

@ztane
Copy link
Contributor

ztane commented Sep 23, 2019

But... at least with ISecurityPolicy and getting request into permit I can replace the entire thing wholesale...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants