HasAPIKey adds huge performance (speed) hit #173

Closed · jorgenbuilder opened this issue Aug 23, 2021 · 32 comments

@jorgenbuilder

I'm seeing a huge performance hit when adding the HasAPIKey permission class. I've isolated it down to toggling the following setting:

REST_FRAMEWORK = {
    'DEFAULT_PERMISSION_CLASSES': [],
    # 'DEFAULT_PERMISSION_CLASSES': ['rest_framework_api_key.permissions.HasAPIKey'],
}

Which gives the following results:

# Without API Keys

Testing Resource #1...
	Done in 0.09s

Testing Resource #2...
	Avg after 10 attempts: 0.35s

Testing Resource #3...
	Avg after 10 attempts: 0.11s
# With HasAPIKey

Testing Resource #1...
	Done in 0.55s

Testing Resource #2...
	Avg after 10 attempts: 0.83s

Testing Resource #3...
	Avg after 10 attempts: 0.46s

I wasn't able to find any similar tickets. Has anyone seen this behaviour before?

@florimondmanca added the "help wanted" and "question" labels Aug 24, 2021
@florimondmanca (Owner)

Hello,

Thanks for putting this up.

Some questions for consideration:

  • What's the scale of your situation? (E.g. dealing with 10 API keys versus 10,000.)
  • Is it usual or unusual for Django permissions to cause performance drops like this? E.g. is this visible with regular user-based permissions?
  • If so, how do people deal with this in general?
  • If not, maybe the DB query performed in our case is suboptimal, and we'd need to dig in to find out more.

@timsluis commented Oct 3, 2021

I'm seeing the same behaviour: roughly a 5x performance drop. Scale: just a few API keys.

@hyzyla commented Dec 14, 2021

I have the same problem. The silk profiler hints that the problem is related to cryptography:

from django.http import HttpRequest
from rest_framework.renderers import BrowsableAPIRenderer
from rest_framework.views import APIView
from rest_framework_api_key.permissions import HasAPIKey
from silk.profiling.profiler import silk_profile

class TerritoryAPIPermission(HasAPIKey):

    @silk_profile(name='TerritoryAPIPermission.has_permission')
    def has_permission(self, request: HttpRequest, view: APIView) -> bool:

        # always allow the request when the browsable API renderer is selected
        renderer = getattr(request, 'accepted_renderer', None)
        if isinstance(renderer, BrowsableAPIRenderer):
            return True

        # in other cases, verify the API token
        return super().has_permission(request, view)

[silk profiler screenshots]

@hyzyla commented Dec 14, 2021

So, one option for improving the performance of key verification: you can tweak PASSWORD_HASHERS to use a different hash method.

https://security.stackexchange.com/questions/246062/pbkdf2-usage-will-slow-rest-api-down

https://docs.djangoproject.com/en/4.0/ref/settings/#std:setting-PASSWORD_HASHERS
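
For instance, a minimal settings sketch that makes Argon2 the preferred hasher (this assumes the argon2-cffi package is installed; keeping the other hashers listed lets Django still verify existing hashes):

# settings.py
PASSWORD_HASHERS = [
    "django.contrib.auth.hashers.Argon2PasswordHasher",
    "django.contrib.auth.hashers.PBKDF2PasswordHasher",
    "django.contrib.auth.hashers.PBKDF2SHA1PasswordHasher",
    "django.contrib.auth.hashers.BCryptSHA256PasswordHasher",
]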

@florimondmanca (Owner) commented Dec 18, 2021

@hyzyla Right, that makes sense, thanks a lot for the profiler run.

I think there's a meaningful discussion to have on whether that "hash API keys so they aren't readable in the database" idea was a good one. This is done for passwords, but users log into a website once and then reuse a session cookie. Or they log into an API and get an access token they can reuse for subsequent calls. Checking API keys by hash is indeed inefficient in this regard. It's possible we may want to drop this (and store API keys in the clear in the database) in a future major release… Along with solving #128 and some other touchy issues.

It's also possible we decide this is OK enough, but should be documented… In the end, 100ms added to a given API call might only be a problem if an API user is hitting it more often than every second or so (a frontend would be expected to do this depending on the app, but I think we document that this package is not aimed at frontend auth?).

And/or we could turn to a slightly different storage pattern, e.g. something based on HMAC, and see how it compares?
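
For illustration, a rough sketch of what an HMAC-based check could look like (hypothetical, not this package's API; the server-side secret would live outside the database, e.g. in settings):

import hashlib
import hmac

SERVER_SECRET = b"kept-outside-the-database"  # hypothetical, e.g. from settings

def hmac_digest(key: str) -> str:
    # A single keyed hash: fast to compute, yet useless to an attacker
    # who obtains the database but not the server secret.
    return hmac.new(SERVER_SECRET, key.encode(), hashlib.sha256).hexdigest()

def verify(key: str, stored_digest: str) -> bool:
    return hmac.compare_digest(hmac_digest(key), stored_digest)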

@florimondmanca (Owner) commented Jun 4, 2022

It's possible about half of the effect could come from #150, which likely results in performing the API key checking twice in most cases.

Still, the bulk of this behavior is indeed explained by hash checks between the given API key and the hash stored in the database. I still believe hashed API keys are the way to go, as they're effectively passwords.

At this point, we'd probably want to compare the behavior across different hashers, such as PBKDF2 vs argon2.

@florimondmanca removed the "help wanted" and "question" labels Jun 4, 2022
@florimondmanca (Owner) commented Jun 4, 2022

I did some basic benchmarking on my end using the test_project in this repo, slightly edited to add a few routes:

# views.py
from rest_framework.request import Request
from rest_framework.response import Response
from rest_framework.views import APIView

from test_project.heroes.permissions import HasHeroAPIKey


class PublicAPIView(APIView):
    def get(self, request: Request) -> Response:
        return Response({"message": "Hello, world!"})


class ProtectedAPIView(APIView):
    permission_classes = [HasHeroAPIKey]

    def get(self, request: Request) -> Response:
        return Response({"message": "Hello, world!"})


class ProtectedObjectAPIView(APIView):
    permission_classes = [HasHeroAPIKey]

    def get(self, request: Request) -> Response:
        self.check_object_permissions(request, object())
        return Response({"message": "Hello, world!"})
# urls.py
from django.contrib import admin
from django.urls import path

from . import views

urlpatterns = [
    path("admin/", admin.site.urls),
    path("api/public/", views.PublicAPIView.as_view()),
    path("api/protected/", views.ProtectedAPIView.as_view()),
    path("api/protected/object/", views.ProtectedObjectAPIView.as_view()),
]

And running this script:

import httpx
import statistics

api_key = "<Hero API key obtained via test_project admin>"


def api_key_auth(request):
    request.headers["Authorization"] = f"Api-Key {api_key}"
    return request


def timeit(url, auth):
    times = []

    with httpx.Client() as client:
        for _ in range(10):
            response = client.get(url, auth=auth)
            times.append(response.elapsed.total_seconds())

    m = statistics.mean(times)
    s = statistics.stdev(times)

    print(f"{url=:<20}: {m=:.3f} seconds ({s=:.3f} seconds)")


timeit("http://localhost:8000/api/public/", auth=None)
timeit("http://localhost:8000/api/protected/", auth=api_key_auth)
timeit("http://localhost:8000/api/protected/object/", auth=api_key_auth)

Results (in seconds):

  • /api/public/ (no API key checking): 0.005s (std=0.003) (very low because of being on localhost)
  • /api/protected/:
    • Argon2: 0.197, std=0.040
    • PBKDF2: 0.363, std=0.093
    • BcryptSHA256: 0.425, std=0.050
  • /api/protected/object/ (Argon2): 0.316 (std=0.039)


So:

  • The performance hit is indeed reduced (by about 40-50%) when using Argon2 instead of the default PBKDF2 hasher (i.e. placing Argon2 higher in PASSWORD_HASHERS), as @hyzyla suggested.
  • As I assumed, the included .has_object_permission() implementation does result in duplicate work, which would explain the larger performance hit on detail API views. This hints that we should move forward on #150 (Default has_object_permission implementation in BaseHasAPIKey is redundant).

@davidfischer (Contributor)

These password hashers are slow (very roughly ~100ms+) by design to prevent brute forcing. However, #150 isn't the only place where the hasher is called a second time. Something like the guide's recommendation for getting the API key in the view will also hash the key again:

def get(self, request):
    """Retrieve a project based on the request API key."""
    key = request.META["HTTP_AUTHORIZATION"].split()[1]
    api_key = APIKey.objects.get_from_key(key)
    project = Project.objects.get(api_key=api_key)

HasAPIKey should probably cache the key somehow (on the request?) if the permission check is successful.

@florimondmanca (Owner) commented May 23, 2023

Right. Everything that calls api_key.is_valid(...) will trigger a hash computation and result in the added latency.

A few possibilities...

  • Pass data from permission to request, and from request to the manager. I'm a bit far from Django now. Is there an idiomatic way in Django for permission classes to make data available to requests? Like a "state" on the request?
  • Cache API keys internally on the manager (inside .get_from_key()), using a cache key that spans a given request (set via a middleware). "I've already seen this key during this request, and I know it's valid/invalid."
  • Add a simple LRU cache of size 1 on .get_from_key()? This is viable if we know that the method will be called consecutively with the same key during request handling — twice per permission check, once if the key is retrieved in a view.

I think the LRU cache option is the most compelling?
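
To illustrate the last option, a minimal sketch (hypothetical, not the package's actual code) of memoizing the most recent lookup:

from functools import lru_cache

from rest_framework_api_key.models import APIKey

@lru_cache(maxsize=1)
def get_from_key_cached(key: str) -> APIKey:
    # Consecutive calls with the same key reuse the result instead of
    # re-running the expensive hash check.
    return APIKey.objects.get_from_key(key)

Note that lru_cache does not cache exceptions, so invalid keys (which get_from_key reports by raising) would still be re-checked on every call.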

@davidfischer (Contributor)

If you're sure that the key has already been checked, then in the worst case you can just do a database lookup by prefix, which will be much faster than re-hashing.

Is there an idiomatic way in Django for permission classes to make data available to requests? Like a "state" on the request?

I wouldn't say there's one way to do this. Probably the simplest would be to set request.api_key or you could use a thread local or something like that.
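
Combining the two suggestions, a sketch of a view helper (hypothetical, not the package's API) that avoids re-hashing once the permission check has granted the request; keys in this package have the form prefix.secret_key, with the prefix stored unhashed in a unique column:

from rest_framework_api_key.models import APIKey

def get_api_key_from_granted_request(request):
    # Only safe after HasAPIKey has validated the full key: we resolve
    # the model row by its plain prefix and skip the hash check.
    key = request.META["HTTP_AUTHORIZATION"].split()[1]
    prefix, _, _ = key.partition(".")
    return APIKey.objects.get(prefix=prefix)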

@davidfischer (Contributor)

I think there's a meaningful discussion to have on whether that "hash API keys so they aren't readable in the database" idea was a good one.

The more I think about this, the more I think this is the real question we should be answering and I think using a much faster hasher is probably the answer. The purpose of very slow password hashers is to protect the passwords if the database leaks somehow. These hashers will make brute forcing any passwords from the hash much more costly. However, this module is generating those keys randomly and the secret part not stored in the DB is 32 alphanumeric characters long (62^32 ~= 190 bits of entropy). That's too much entropy to be brute forced or precomputed.

As a result, you could probably just use sha256 or sha512 on the generated key. That would drastically mitigate the performance hit, and reversing one of those hashes to a 190-bit key should be impossible (famous last words). That might remove the need for caching the key on the request. For reference, django-rest-knox (a similar module, though it ties keys to users) just uses sha512 on the generated key and stores it.
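
For reference, the knox-style approach boils down to something like this sketch (hypothetical helper names; the raw key is shown to the user once, and only the digest is stored):

import hashlib
import hmac
import secrets

def generate():
    key = secrets.token_urlsafe(32)  # high-entropy secret, shown once
    digest = hashlib.sha512(key.encode()).hexdigest()  # stored in the DB
    return key, digest

def verify(key: str, stored_digest: str) -> bool:
    candidate = hashlib.sha512(key.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_digest)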

@florimondmanca (Owner) commented May 25, 2023

@davidfischer Now, that is a great point. At the level of entropy of our randomly generated secret keys, even a regular SHA hash algorithm would do the trick as far as preventing access to the secret keys from someone with access to the database is concerned.

So, it looks like switching hashers could resolve this issue…

The remaining questions to figure out are:

  • How much of a performance boost can we expect (some testing would be appropriate)
  • How to ensure hashes are properly migrated. Do we just switch the hasher, issue a new release and Django will magically upgrade hashes? Or do we need something else either from us, or from users with extra docs?

@davidfischer (Contributor)

  • How much of a performance boost can we expect (some testing would be appropriate)

Testing is appropriate but I'd expect that almost all of the performance hit you measured above (#173 (comment)) would be mitigated.

Performing a sha512 is very fast:

$ python -m timeit -s "import hashlib" "h=hashlib.sha512();h.digest()"
500000 loops, best of 5: 701 nsec per loop

You might still save the auth token model on the request if successful. For example, django-rest-knox does that.

  • How to ensure hashes are properly migrated. Do we just switch the hasher, issue a new release and Django will magically upgrade hashes? Or do we need something else either from us, or from users with extra docs?

This will get a little tricky, I think. The KeyGenerator class probably needs to be changed. It needs to detect if the hash is in the old format and then call Django's check_password. If successful, it'll save it in the new format. KeyGenerator becomes something like this:

import hashlib

from django.contrib.auth.hashers import check_password
from django.utils.crypto import constant_time_compare

class KeyGenerator:
    # ... other methods

    def hash(self, value: str) -> str:
        hash = hashlib.sha512(value.encode()).hexdigest()
        return "sha512$$%s" % hash

    def verify(self, key: str, hashed_key: str) -> bool:
        if hashed_key.startswith("sha512$$"):
            # New simpler hasher
            return constant_time_compare(self.hash(key), hashed_key)
        else:
            # Slower password hashers from Django
            result = check_password(key, hashed_key)
            if result:
                # UPDATE the stored hash to use self.hash()
                # OR maybe we can pass a custom `setter` option to `check_password`
                pass
            return result
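
For what it's worth, a hedged sketch of what that placeholder could do, assuming the surrounding API key model instance (called api_key here, hypothetical) is in scope at that point:

result = check_password(key, hashed_key)
if result:
    # Re-hash with the new fast format and persist it, mirroring
    # Django's own password-upgrading strategy.
    api_key.hashed_key = self.hash(key)
    api_key.save(update_fields=["hashed_key"])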

I considered a custom sha512 password hasher, but I don't think it actually does much. You can't use check_password, since you can't pass a custom hasher to it, and you can't add the hasher to Django's PASSWORD_HASHERS setting because it's not appropriate as an actual password hasher. It does let you use make_password, though:

import hashlib
from django.contrib.auth.hashers import UnsaltedSHA1PasswordHasher, make_password

# The parent will be removed in Django 5.1
# so you probably don't want to rely on it.
class SHA512Hasher(UnsaltedSHA1PasswordHasher):
    algorithm = "sha512"
    digest = hashlib.sha512

    def encode(self, password, salt):
        if salt != "":
            raise ValueError("salt is unnecessary for high entropy API tokens.")
        hash = self.digest(password.encode()).hexdigest()
        return "%s$$%s" % (self.algorithm, hash)

encoded = make_password('mysecrets', hasher=SHA512Hasher())

At this stage, I feel like I'm writing a bunch of custom security critical code and getting many more eyes on this would be good before we go too far.

@AlexIvanchyk commented Jul 14, 2023

The issue lies in the BaseHasAPIKey permission class, particularly in the has_permission method where an API key is validated. In high-traffic applications, this validation process can be CPU-intensive, especially considering the fact that Django Rest Framework checks permissions multiple times per request. This results in significantly slowed request times.

To alleviate this issue, I propose implementing a caching mechanism within the permission check. Below is a sample implementation:

import typing

from django.core.cache import cache
from django.http import HttpRequest
from rest_framework import permissions

from rest_framework_api_key.models import AbstractAPIKey
from rest_framework_api_key.permissions import KeyParser

class BaseHasAPIKey(permissions.BasePermission):
    model: typing.Optional[typing.Type[AbstractAPIKey]] = None
    key_parser = KeyParser()
    CACHE_TIMEOUT = 60 * 5  # 5 minutes, adjust as necessary.

    def get_key(self, request: HttpRequest) -> typing.Optional[str]:
        return self.key_parser.get(request)

    def has_permission(self, request: HttpRequest, view: typing.Any) -> bool:
        assert self.model is not None, (
            "%s must define `.model` with the API key model to use"
            % self.__class__.__name__
        )
        key = self.get_key(request)
        if not key:
            return False
        # Add caching
        cache_key = f'api_key_validity:{key}'
        is_valid = cache.get(cache_key)
        if is_valid is None:
            is_valid = self.model.objects.is_valid(key)
            cache.set(cache_key, is_valid, self.CACHE_TIMEOUT)
        return is_valid

    def has_object_permission(
        self, request: HttpRequest, view: typing.Any, obj: AbstractAPIKey
    ) -> bool:
        return self.has_permission(request, view)

In addition, we'd need to implement cache invalidation via Django signals whenever the API key is updated, deleted, or revoked:

from django.core.cache import cache
from django.db.models.signals import pre_delete, pre_save
from django.dispatch import receiver

# Note: Django does not dispatch signals for abstract senders, so in
# practice these receivers would have to be connected for each concrete
# subclass of AbstractAPIKey.
@receiver(pre_save, sender=AbstractAPIKey)
def invalidate_api_key_on_save(sender, instance, **kwargs):
    cache_key = f'api_key_validity:{instance.key}'
    cache.delete(cache_key)

@receiver(pre_delete, sender=AbstractAPIKey)
def invalidate_api_key_on_delete(sender, instance, **kwargs):
    cache_key = f'api_key_validity:{instance.key}'
    cache.delete(cache_key)

This proposed solution employs Django's caching framework to reduce the frequency of CPU-intensive password checking operations, and automatically invalidates the cache whenever necessary.

Please note that caching sensitive data such as API keys should be considered carefully with regard to potential security implications. In this case, we are caching the validation result (a boolean value) rather than the key itself, which should mitigate most risks.

I hope you find this information and proposal useful. I believe this change could significantly improve the performance of Django applications using djangorestframework-api-key, particularly under high traffic loads.

@florimondmanca florimondmanca added the help wanted Extra attention is needed label Jul 14, 2023
@florimondmanca (Owner)

Thanks @AlexIvanchyk

Two questions from this:

  • Would it require action from users to set up caching if it hasn't been set up already? Or is an in-memory cache used by default?
  • Perhaps the CACHE_TIMEOUT option should be retrieved from a setting by default, e.g. API_KEY_CACHE_TIMEOUT.

I have a low amount of bandwidth to move things forward on this repo, but I’d be happy to review PRs if folks would like to put in the work to resolve this issue. I do think this would be a significant improvement.

@AlexIvanchyk

@florimondmanca Thank you for considering the proposed enhancement and your willingness to review PRs related to this issue.

Regarding your questions:

  1. If caching is not set up already: Django includes a simple memory-based cache backend (using a Python dictionary) out of the box, which doesn't require any setup (see the settings sketch below). However, it's local-memory based and hence doesn't persist between server restarts or across different server instances, which makes it less ideal for production environments. Users who haven't set up caching yet would have to do so if they wish to make full use of this performance improvement.
  2. Regarding CACHE_TIMEOUT: I absolutely agree. The cache timeout should ideally be a configurable setting to provide flexibility based on the use case. The code could default to a pre-determined value if no setting is provided. Here's how it could look:
from django.conf import settings

API_KEY_CACHE_TIMEOUT = getattr(settings, 'API_KEY_CACHE_TIMEOUT', 60*5)
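
Regarding point 1, the zero-setup default is equivalent to configuring the local-memory backend explicitly; a sketch (per-process only, so multi-worker production deployments would want a shared backend such as Redis or Memcached):

# settings.py (effectively Django's default cache configuration)
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
    }
}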

In addition, the PR for this feature would need to include:

  • Changes to the permission class and the signals as proposed.
  • Documentation updates, including a section on the need for caching and instructions on how to set it up.
  • Tests to ensure the caching mechanism works correctly and invalidates as expected.

I appreciate your openness to this potential improvement and look forward to contributing further towards this project.

@florimondmanca (Owner)

@AlexIvanchyk Hey, so coming back to this…

  1. Great; so it doesn’t require an intentional cache setup to work at least in degraded mode, which avoids any breaking change or heavy requirements for users. That’s perfect.
  2. Yup, agreed with your proposed implementation of retrieving the setting.

Overall I think this is well thought out and a very beneficial change for all users.

We should be ready to move to implementation. Are you up for giving it a go? I don't have much time to dedicate to the project in terms of new code developments, but I'll be there for reviews.

@AlexIvanchyk

@florimondmanca Hey,
Got it! I'll implement the described solution and once it's ready, I'll provide the final version for your review. Let's get this done!

@florimondmanca modified the milestone: v2.3.0 Aug 15, 2023
@AlexIvanchyk

@florimondmanca: Implement API key caching and update documentation

Pull request summary:

  1. Dynamic Caching Mechanism: Introduced the CacheMixin, enabling dynamic key generation based on the class name. This facilitates model/permissions inheritance, allowing users to extend the model, e.g., adding a company field for data access restrictions.
  2. Dynamic Cache Invalidation Signals: Implemented signals for cache invalidation, which automatically bind to all models derived from the AbstractAPIKey model.
  3. Testing: Incorporated tests to validate cache invalidation mechanisms.
  4. Documentation: Updated the relevant documentation to reflect these changes.
  5. Base Class Refactoring: Made improvements and revisions to the BaseHasAPIKey class.

I'd also like to share some results from tests conducted on my active microservices. Notably, multiple permissions are assigned to each endpoint in my setup. These permissions encompass both the validation of a global key with read-access permissions and specific company-access permissions. Remarkably, on an entry-level instance with a modest processor, I've achieved over a 15-fold reduction in query execution time using this approach. I believe this enhancement is significant and warrants prompt integration.

@beda42 commented Aug 30, 2023

Hi, I have just stumbled upon the performance issue as well.

From my point of view, the issue seems quite clear: the PBKDF2 and other password hashers are really not a good match for the task at hand. Switching to sha2, sha3 or blake2 hashing (something bundled with Python) should get rid of the whole problem without any drawback, as already pointed out by @davidfischer. Caching the keys may sidestep part of the issue, but it does feel like a potential source of security issues.

I would be willing to cooperate on making the switch to "normal" hashing possible. @davidfischer, do you plan to create a PR? It seemed like you had something going a while ago...

@AlexIvanchyk

Hi @beda42, thank you for your input on this issue.

Backward Compatibility: Switching the hashing algorithm is an interesting suggestion but it introduces a backward compatibility issue. Existing applications using the current hashing mechanism would require migration, and this could be a non-trivial task for large-scale, distributed systems.

Security Concerns with Caching: I appreciate your concern about security issues related to caching. However, it's important to note that we're caching the validation result (a boolean value), not the API key itself. This approach aims to balance performance and security. While no system can be 100% secure, this method minimizes the attack surface.

Multiple Permission Checks in DRF: You're correct that using a less CPU-intensive hashing algorithm would improve performance. However, this doesn't address the issue of Django Rest Framework checking permissions multiple times per request, which is the crux of the performance bottleneck. Implementing caching would mitigate this problem more comprehensively, without requiring users to switch hashing algorithms and thereby lose backward compatibility.

@AlexIvanchyk

Hi again,

I appreciate the thoughtful discussion. To strike a balance between performance and security, what if we introduce a configurable setting for the hashing algorithm and caching? This would provide the flexibility to choose a hashing algorithm based on the application's requirements.

Here's what I envision:

Configurable Hashing Algorithm: A setting that allows users to select from a range of hashing algorithms (PBKDF2, sha2, sha3, blake2, etc.). This will enable users to opt for a less CPU-intensive hashing mechanism if they so choose.

Optional Caching: Another setting that would allow users to enable or disable caching based on their needs. This way, if someone is particularly concerned about the potential security implications of caching, they can simply disable it.

This approach would offer maximum flexibility and let developers choose the right configuration for their specific use case, be it higher security or better performance.

Would love to hear your thoughts on this.

@beda42 commented Aug 30, 2023

Backward Compatibility: Switching the hashing algorithm is an interesting suggestion but it introduces a backward compatibility issue. Existing applications using the current hashing mechanism would require migration, and this could be a non-trivial task for large-scale, distributed systems.

Well, as the key itself is not stored in the database, changing the hash for existing records is not possible unless a request containing the key is made. So the hashes will be updated on the fly and no migration will be required. Or the new hashing algorithm may be only used for new keys leaving the old ones completely untouched. Having this configurable via settings would probably be the best option.

Security Concerns with Caching: I appreciate your concern about security issues related to caching. However, it's important to note that we're caching the validation result (a boolean value), not the API key itself. This approach aims to balance performance and security. While no system can be 100% secure, this method minimizes the attack surface.

The fact that only a boolean result of the validation is stored in the cache is exactly why it feels like a potential security problem to me. Normally only the database is the source of truth when it comes to authentication, but here a second source of truth is created and an attacker capable of modifying the cache can open access to the system. I am not sure if all developers pay the same attention to securing their redis cache as they do with their database.

Anyway, making this optional sounds like the best solution.

Multiple Permission Checks in DRF: You're correct that using a less CPU-intensive hashing algorithm would improve performance. However, this doesn't address the issue of Django Rest Framework checking permissions multiple times per request, which is the crux of the performance bottleneck. Implementing caching would mitigate this problem more comprehensively, without requiring users to switch hashing algorithms and thereby lose backward compatibility.

My expectation is that once the hashing algorithm is changed, hashing will be faster than the database lookup for the key, which will effectively take it out of play. Please note that PBKDF2 is designed to be slow by repeatedly hashing the input (more than 200,000 times in current versions of Django), so the actual cost of hashing can be brought down by several orders of magnitude by using a plain hash instead of a "repeated" one. So this is not about swapping one hash for another: it is about completely changing the method from a deliberately slow one to a super fast one.

So I would guess that if a simple hash is used, the caching will become effectively obsolete and will have no measurable performance benefit.

@AlexIvanchyk commented Aug 30, 2023

@beda42
Thank you for your detailed response. Your points are valid and worthy of consideration. Let's go through them:

Backward Compatibility & Hashing Algorithm: Having a configurable hashing algorithm could be a good middle-ground solution. You're correct that changing the hashing algorithm may not require a database migration if implemented carefully. However, we'd have to ensure a seamless transition for existing applications, perhaps by supporting both the new and old algorithms in parallel for a period of time.
We could store an additional field in the key model to denote the hashing algorithm used, defaulting to the old one. This way, upon key lookup, the system could use the appropriate algorithm for validation, and after a successful validation with the old algorithm, update the hash on the fly.

Security Concerns with Caching: Your point about the cache as a potential attack surface is well-taken. Ideally, the database should remain the single source of truth for authentication. Making caching optional through settings could be a way to give developers more control based on their security requirements.

Multiple Permission Checks & Performance: You make a compelling argument about the dramatic speed differences between the current hashing algorithm and a simpler one. While I agree that a simpler hashing algorithm would speed up the process, it won't solve the problem of multiple permission checks by DRF within a single request. Though, I acknowledge that if the hashing becomes faster, the impact of multiple checks might become negligible. It's worth noting that when an application uses multiple permission classes, the total number of hashing operations can become quite substantial. This could lead to a noticeable performance hit, even with a highly optimized hashing algorithm. Caching offers a way to eliminate the computational cost of these verifications altogether. This holds true even when using the most efficient hashing algorithms available. By caching the validation result, we can drastically reduce, if not entirely remove, the overhead introduced by the numerous hashing operations, making the system more resilient to high traffic loads.

In light of your valuable input, I propose a nuanced strategy that incorporates multiple solutions:

  1. Add an extra field to the key model indicating the hashing algorithm in use, with the default set to the current algorithm. This would allow dynamic algorithm selection during key validation. For keys using the older algorithm, we could update the hash in real time after successful validation. BLAKE2 seems to be the most suitable option: it's fast without compromising on security. My only concern is that users will need to run a migrate command.
  2. Provide an optional setting for caching, letting users with stringent security concerns disable this feature as needed.

@florimondmanca (Owner)

This discussion is extremely interesting, thank you everyone.

I’d like to note that I think the current implementation makes use of Django’s password management helpers, which include a hash upgrade if the configured hasher changes.

I don’t remember if the hasher used is the one used for regular user passwords, or if PBKDF2 was used explicitly. One would need to dive into the code.

But this could reduce the migration burden for users if we go for a faster hash algorithm.

@beda42 commented Aug 30, 2023

@AlexIvanchyk

In light of your valuable input, I propose a nuanced strategy that incorporates multiple solutions:

1. Add an extra field to the key model indicating the hashing algorithm in use, with the default set to the current algorithm. This would allow dynamic algorithm selection during key validation. For keys using the older algorithm, we could update the hash in real time after successful validation. `BLAKE2` seems to be the most suitable option: it's fast without compromising on security. My only concern is that users will need to run a migrate command.

The hashed_key field, which is currently populated by the Django password hashers, already has information about the used hasher. The format is algorithm$iterations$salt$hash, so something like pbkdf2_sha256$260000$OZ5OPvHiRZfIZ3nMBBAcZc$qjVVTR7vPwe3/9ifFfRk4zI+dH5yv…. So we can easily identify the hasher which was used to create the hash and check it correctly. We could then adopt the same approach and store new records like, for example, blake2$$IZ3nMBBAcZc$qjVVTR7vPwe3/9ifFfRk4zI+dH5yv…. This way, it would be possible to change hashers during the lifetime of the app without any need for migration. In fact, this is how Django deals with hasher (and hasher parameter) changes when managing passwords (https://docs.djangoproject.com/en/3.2/topics/auth/passwords/#password-upgrading), so it is a tried and proven strategy.
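
To make the dispatch idea concrete, a small sketch (hypothetical values, assuming an unsalted algorithm$$digest layout along the lines of the sha512$$ sketch above):

import hashlib

# Django password-hasher format: algorithm$iterations$salt$hash
legacy = "pbkdf2_sha256$260000$OZ5OPvHiRZfIZ3nMBBAcZc$..."
# Proposed fast format: the algorithm identifier, then the digest
proposed = "blake2$$" + hashlib.blake2b(b"the-generated-key").hexdigest()

# Either way, everything before the first "$" identifies the hasher:
assert legacy.split("$", 1)[0] == "pbkdf2_sha256"
assert proposed.split("$", 1)[0] == "blake2"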

BTW, @davidfischer already proposed a very good piece of code above which demonstrates exactly this approach.

As for hash algorithm selection, I also like BLAKE2, but ideally it should be configurable to give users an option while choosing a good default.

@AlexIvanchyk Do you plan on working on this part of the task?

2. Provide an optional setting for caching, letting users with stringent security concerns disable this feature as needed.

Sounds good. Even though I would much prefer to have it opt-in rather than opt-out: both because it is more prudent not to silently introduce stuff with potential security consequences, and because it relies on the cache, which may be unexpected in an external library.

@AlexIvanchyk

@florimondmanca and @beda42,

  1. If the existing codebase makes use of Django's built-in password management utilities, it should already have a built-in mechanism for upgrading hash algorithms when the configured hasher changes. This feature could simplify the migration process, allowing for a smoother transition to a faster hashing algorithm. I would appreciate it if either @davidfischer or @beda42 could create a pull request to explore this further.

  2. Just to update everyone, I submitted a pull request nearly two weeks ago with the cache mechanism implementation. I can modify it to include a setting that disables caching by default. Once that's done, perhaps we can proceed to the review stage, followed by migration planning and eventually releasing a new version.

Since these are two completely independent solutions, I would like users to be able to use the caching solution already.

@beda42 mentioned this issue Aug 30, 2023
@beda42 commented Aug 30, 2023

I have created a draft PR for the "plain" hashing. It is intended more as a starting point for discussion than a complete solution. I would be happy for any feedback.

@AlexIvanchyk

Hey @florimondmanca,

I've submitted a new commit to the pull request that introduces optional settings for caching. I believe this feature could offer more flexibility for users who want to make use of caching in different scenarios.

Could you please take a moment to review the changes? Here's the link to the PR: #242

I'm looking forward to your feedback.

@florimondmanca (Owner) commented Aug 31, 2023 via email

@davidfischer (Contributor)

I created a PR (#244) for using faster hashers than the existing password hashers. It also transparently upgrades any old-format-hashes it finds in a similar way to what Django does.

@florimondmanca (Owner) commented Sep 12, 2023

Given that #244 has been merged and will soon be released, I believe we can consider this issue closed now.

Feedback will be gladly appreciated when 2.4.0 is out (in the coming days).

Big thanks to everyone involved. It's been long overdue for sure, but a great final collaboration effort.
