-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] CIP-2: Auth Providers Proposal #986
Changes from 7 commits
f5fc3f2
394416c
30c5ee0
df2bb86
b90a525
a479d61
6644104
ea0f148
eced90d
1756a93
8b67cb8
e66edaa
9476c81
b446120
4b6485f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
# Chroma Server Middlewares | ||
|
||
## Basic Auth | ||
|
||
This is very rudimentary security middleware that checks Authorization headers for basic auth credentials. | ||
|
||
The basic auth user and pass are configured on the server side using `CHROMA_SERVER_AUTH_PROVIDER_CONFIG`. Make sure to | ||
also define the auth provider `CHROMA_SERVER_AUTH_PROVIDER="chromadb.auth.BasicAuthServerProvider"`. | ||
|
||
### Usage | ||
|
||
Start the server: | ||
|
||
```bash | ||
CHROMA_SERVER_AUTH_PROVIDER="chromadb.BasicAuthServerProvider" \ | ||
CHROMA_SERVER_AUTH_PROVIDER_CONFIG='{"username":"admin","password":"admin"}' \ | ||
ALLOW_RESET=1 \ | ||
IS_PERSISTENT=1 \ | ||
uvicorn chromadb.app:app --workers 1 --host 0.0.0.0 --port 8000 --proxy-headers --log-config log_config.yml --reload | ||
``` | ||
|
||
Test request with `curl`: | ||
|
||
```bash | ||
curl http://localhost:8000/api/v1 -v -H "Authorization: Basic YWRtaW46YWRtaW4=" | ||
``` | ||
|
||
Test with client side auth provider: | ||
|
||
```python | ||
import chromadb | ||
from chromadb import Settings | ||
|
||
client = chromadb.HttpClient(settings=Settings(chroma_client_auth_provider="chromadb.auth.BasicAuthClientProvider", | ||
chroma_client_auth_provider_config={"username": "admin", "password": "admin"})) | ||
client.heartbeat() | ||
``` | ||
|
||
Test with Http Client and basic auth header: | ||
|
||
```python | ||
import chromadb | ||
|
||
client = chromadb.HttpClient(host="localhost", port="8000", headers={"Authorization": "Basic YWRtaW46YWRtaW4="}) | ||
client.heartbeat() | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,12 +1,22 @@ | ||
from pydantic import BaseSettings | ||
from typing import Optional, List, Any, Dict, TypeVar, Set, cast, Iterable, Type | ||
from typing_extensions import Literal | ||
from abc import ABC | ||
import base64 | ||
import importlib | ||
import inspect | ||
import logging | ||
from overrides import EnforceOverrides, override | ||
import os | ||
from abc import ABC, abstractmethod | ||
from graphlib import TopologicalSorter | ||
import inspect | ||
from typing import Optional, List, Any, Dict, Set, Iterable, Union | ||
from typing import Type, TypeVar, cast | ||
|
||
import requests | ||
from overrides import override | ||
from overrides import overrides, EnforceOverrides | ||
from pydantic import BaseSettings, SecretStr | ||
from starlette.middleware.base import BaseHTTPMiddleware, RequestResponseEndpoint | ||
from starlette.requests import Request | ||
from starlette.responses import Response, JSONResponse | ||
from starlette.types import ASGIApp | ||
from typing_extensions import Literal | ||
|
||
# The thin client will have a flag to control which implementations to use | ||
is_thin_client = False | ||
|
@@ -15,10 +25,8 @@ | |
except ImportError: | ||
is_thin_client = False | ||
|
||
|
||
logger = logging.getLogger(__name__) | ||
|
||
|
||
LEGACY_ERROR = """\033[91mYou are using a deprecated configuration of Chroma. | ||
|
||
\033[94mIf you do not have data you wish to migrate, you only need to change how you construct | ||
|
@@ -59,7 +67,25 @@ | |
} | ||
|
||
|
||
class Settings(BaseSettings): | ||
class ClientAuthProvider(ABC, EnforceOverrides): | ||
def __init__(self, settings: "Settings") -> None: | ||
self._settings = settings | ||
|
||
@abstractmethod | ||
def authenticate(self, session: requests.Session) -> None: | ||
pass | ||
|
||
|
||
class ServerAuthProvider(ABC, EnforceOverrides): | ||
def __init__(self, settings: "Settings") -> None: | ||
self._settings = settings | ||
|
||
@abstractmethod | ||
def authenticate(self, request: Request) -> Union[Response, None]: | ||
pass | ||
|
||
|
||
class Settings(BaseSettings): # type: ignore | ||
environment: str = "" | ||
|
||
# Legacy config has to be kept around because pydantic will error on nonexisting keys | ||
|
@@ -88,7 +114,13 @@ class Settings(BaseSettings): | |
chroma_server_ssl_enabled: Optional[bool] = False | ||
chroma_server_grpc_port: Optional[str] = None | ||
chroma_server_cors_allow_origins: List[str] = [] # eg ["http://localhost:3000"] | ||
# eg ["chromadb.api.fastapi.middlewares.auth.AuthMiddleware"] | ||
chroma_server_middlewares: List[str] = [] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is this used? |
||
|
||
chroma_server_auth_provider: Optional[str] = None | ||
chroma_server_auth_provider_config: Optional[Union[str, Dict[str, Any]]] = None | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. could we strongly type this instead of leaving it to be an arbitrary dict? (TypedDict) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For example, I have to read the code to know "ignore_auth_paths" is a valid config. I'd prefer if this was self-documenting. |
||
chroma_client_auth_provider: Optional[str] = None | ||
chroma_client_auth_provider_config: Optional[Union[str, Dict[str, Any]]] = None | ||
anonymized_telemetry: bool = True | ||
|
||
allow_reset: bool = False | ||
|
@@ -106,7 +138,7 @@ def require(self, key: str) -> Any: | |
def __getitem__(self, key: str) -> Any: | ||
val = getattr(self, key) | ||
# Error on legacy config values | ||
if val in _legacy_config_values: | ||
if isinstance(val, str) and val in _legacy_config_values: | ||
raise ValueError(LEGACY_ERROR) | ||
return val | ||
|
||
|
@@ -158,7 +190,7 @@ def reset_state(self) -> None: | |
|
||
class System(Component): | ||
settings: Settings | ||
|
||
auth_provider: Optional["ClientAuthProvider"] | ||
_instances: Dict[Type[Component], Component] | ||
|
||
def __init__(self, settings: Settings): | ||
|
@@ -169,7 +201,15 @@ def __init__(self, settings: Settings): | |
"Chroma is running in http-only client mode, and can only be run with 'chromadb.api.fastapi.FastAPI' as the chroma_api_impl. \ | ||
see https://docs.trychroma.com/usage-guide?lang=py#using-the-python-http-only-client for more information." | ||
) | ||
|
||
if settings.chroma_client_auth_provider is not None: | ||
logger.debug( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should we just debug all settings if we are adding this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe I'll remove it for now but as per our side discussion it would be useful to make things traceable either via logging or tracers (jaeger) |
||
f"Client Auth Provider: {settings.chroma_client_auth_provider}" | ||
) | ||
self.auth_provider = get_class( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the auth provider should be a Component, and then the classes that need it should require() it to match the pattern elsewhere in the codebase. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @HammadB, I refactored it so that it works for client auth provider however the server side is a bit different as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nevermind, I stand corrected; we have in the constructor: self._api: chromadb.api.API = chromadb.Client(settings) So naturally I did: if settings.chroma_server_auth_provider:
self._auth_middleware = self._api.require(FastAPIChromaAuthMiddleware)
self._app.add_middleware(FastAPIChromaAuthMiddlewareWrapper, auth_middleware=self._auth_middleware) |
||
settings.chroma_client_auth_provider, ClientAuthProvider | ||
)(settings) | ||
else: | ||
self.auth_provider = None | ||
# Validate settings don't contain any legacy config values | ||
for key in _legacy_config_keys: | ||
if settings[key] is not None: | ||
|
@@ -225,11 +265,141 @@ def stop(self) -> None: | |
def reset_state(self) -> None: | ||
"""Reset the state of this system and all constituents in reverse dependency order""" | ||
if not self.settings.allow_reset: | ||
raise ValueError("Resetting is not allowed by this configuration (to enable it, set `allow_reset` to `True` in your Settings() or include `ALLOW_RESET=TRUE` in your environment variables)") | ||
raise ValueError( | ||
"Resetting is not allowed by this configuration (to enable it, set `allow_reset` to `True` in your Settings() or include `ALLOW_RESET=TRUE` in your environment variables)" | ||
) | ||
for component in reversed(list(self.components())): | ||
component.reset_state() | ||
|
||
|
||
class BasicAuthClientProvider(ClientAuthProvider): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same comment - move to /auth There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. All moved to auth. |
||
_basic_auth_token: SecretStr | ||
|
||
def __init__(self, settings: "Settings") -> None: | ||
super().__init__(settings) | ||
self._settings = settings | ||
if os.environ.get("CHROMA_CLIENT_AUTH_BASIC_USERNAME") and os.environ.get( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. lets move all the env variables into one place and define an enum and use the enum, string hardcoding is error prone and harder to track. I imagine we'll add a bunch of env vars in the future and keeping them in one place makes it better self-documentation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also the settings can get automatically populated by environment variables (thats how docker passes settings via env), should we just rely on that instead of doing .get()? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One of the challenges I thought of here was polluting the Settings with tons of params that are specific to each Auth provider. But let me mull it over. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah settings is becoming a sort of 'junk drawer' There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The problem is that the settings object has no concept of hierarchy - some of the settings only belong to certain components. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, in "theory" you can have another netsted settings inside Settings ... or can you 🤔 (Inception) |
||
"CHROMA_CLIENT_AUTH_BASIC_PASSWORD" | ||
): | ||
self._basic_auth_token = _create_token( | ||
os.environ.get("CHROMA_CLIENT_AUTH_BASIC_USERNAME", ""), | ||
os.environ.get("CHROMA_CLIENT_AUTH_BASIC_PASSWORD", ""), | ||
) | ||
elif isinstance( | ||
self._settings.chroma_client_auth_provider_config, str | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: Comment the cases with the case for ease of reading.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @HammadB Actually, I am refactoring this whole mess. It won't scale/adapt to new ways of providing configuration and credentials. Instead, now we'll have two new abstractions (driven by settings with certain, hard-coded for now, precedence - env, file, ... anything else). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ah, yeah a "loader" interface makes sense. |
||
) and os.path.exists(self._settings.chroma_client_auth_provider_config): | ||
with open(self._settings.chroma_client_auth_provider_config) as f: | ||
# read first line of file which should be user:password | ||
_auth_data = f.readline().strip().split(":") | ||
# validate auth data | ||
if len(_auth_data) != 2: | ||
raise ValueError("Invalid auth data") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add test for this case - can we make error message more descriptive? Expected username:password got {}? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will move this whole code into an abstraction for credentials loading where depending on the "loader" (e.g. htpasswd, DB, env etc.) we'll have a more specific message to help the user understand why credentials loading was no successful. |
||
self._basic_auth_token = _create_token(_auth_data[0], _auth_data[1]) | ||
elif self._settings.chroma_client_auth_provider_config and isinstance( | ||
self._settings.chroma_client_auth_provider_config, dict | ||
): | ||
self._basic_auth_token = _create_token( | ||
self._settings.chroma_client_auth_provider_config["username"], | ||
self._settings.chroma_client_auth_provider_config["password"], | ||
) | ||
else: | ||
raise ValueError("Basic auth credentials not found") | ||
|
||
@overrides | ||
def authenticate(self, session: requests.Session) -> None: | ||
session.headers.update( | ||
{"Authorization": f"Basic {self._basic_auth_token.get_secret_value()}"} | ||
) | ||
|
||
|
||
class ChromaAuthMiddleware(BaseHTTPMiddleware): # type: ignore | ||
def __init__(self, app: ASGIApp, settings: "Settings") -> None: | ||
super().__init__(app) | ||
self._settings = settings | ||
self._settings.require("chroma_server_auth_provider") | ||
if settings.chroma_server_auth_provider: | ||
_cls = get_class(settings.chroma_server_auth_provider, ServerAuthProvider) | ||
logger.debug(f"Server Auth Provider: {_cls}") | ||
self._auth_provider = _cls(settings) | ||
|
||
async def dispatch( | ||
self, request: Request, call_next: RequestResponseEndpoint | ||
) -> Response: | ||
response = self._auth_provider.authenticate(request) | ||
if response is not None: | ||
return response | ||
return await call_next(request) | ||
|
||
|
||
def _create_token(username: str, password: str) -> SecretStr: | ||
return SecretStr( | ||
base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("utf-8") | ||
) | ||
|
||
|
||
class BasicAuthServerProvider(ServerAuthProvider): | ||
_basic_auth_token: SecretStr | ||
_ignore_auth_paths: List[str] = ["/api/v1", "/api/v1/heartbeat"] | ||
|
||
def __init__(self, settings: "Settings") -> None: | ||
super().__init__(settings) | ||
self._settings = settings | ||
self._basic_auth_token = SecretStr("") | ||
if os.environ.get("CHROMA_SERVER_AUTH_BASIC_USERNAME") and os.environ.get( | ||
"CHROMA_SERVER_AUTH_BASIC_PASSWORD" | ||
): | ||
self._basic_auth_token = _create_token( | ||
os.environ.get("CHROMA_SERVER_AUTH_BASIC_USERNAME", ""), | ||
os.environ.get("CHROMA_SERVER_AUTH_BASIC_PASSWORD", ""), | ||
) | ||
self._ignore_auth_paths = os.environ.get( | ||
"CHROMA_SERVER_AUTH_IGNORE_PATHS", ",".join(self._ignore_auth_paths) | ||
).split(",") | ||
elif isinstance( | ||
self._settings.chroma_server_auth_provider_config, str | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we parse it as a path in Settings() and then store it, so we can error in settings inside of here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes and furthermore I think it makes sense to pull this up in the Auth middleware instead having each provider implement it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Refactored. It looks way cleaner and more secure in the new code. chroma_server_auth_ignore_paths: Dict[str, List[str]] = {
"/api/v1": ["GET"],
"/api/v1/heartbeat": ["GET"],
"/api/v1/version": ["GET"]} then we have: class FastAPIChromaAuthMiddleware(ChromaAuthMiddleware):
...
def authenticate(self, request: Request) -> Optional[Response]:
if (request.url.path in self._ignore_auth_paths.keys() and
request.method.upper() in self._ignore_auth_paths[request.url.path]):
logger.debug(f"Skipping auth for path {request.url.path} and method {request.method}")
return None
... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Great |
||
) and os.path.exists(self._settings.chroma_server_auth_provider_config): | ||
with open(self._settings.chroma_server_auth_provider_config) as f: | ||
# read first line of file which should be user:password | ||
_auth_data = f.readline().strip().split(":") | ||
# validate auth data | ||
if len(_auth_data) != 2: | ||
raise ValueError("Invalid auth data") | ||
self._basic_auth_token = _create_token(_auth_data[0], _auth_data[1]) | ||
self._ignore_auth_paths = os.environ.get( | ||
"CHROMA_SERVER_AUTH_IGNORE_PATHS", ",".join(self._ignore_auth_paths) | ||
).split(",") | ||
elif self._settings.chroma_server_auth_provider_config and isinstance( | ||
self._settings.chroma_server_auth_provider_config, dict | ||
): | ||
# encode the username and password base64 | ||
self._basic_auth_token = _create_token( | ||
self._settings.chroma_server_auth_provider_config["username"], | ||
self._settings.chroma_server_auth_provider_config["password"], | ||
) | ||
if "ignore_auth_paths" in self._settings.chroma_server_auth_provider_config: | ||
self._ignore_auth_paths = ( | ||
self._settings.chroma_server_auth_provider_config[ | ||
"ignore_auth_paths" | ||
] | ||
) | ||
else: | ||
raise ValueError("Basic auth credentials not found") | ||
|
||
@overrides | ||
def authenticate(self, request: Request) -> Union[Response, None]: | ||
auth_header = request.headers.get("Authorization", "").split() | ||
# Check if the header exists and the token is correct | ||
if request.url.path in self._ignore_auth_paths: | ||
logger.debug(f"Skipping auth for path {request.url.path}") | ||
return None | ||
if ( | ||
len(auth_header) != 2 | ||
or auth_header[1] != self._basic_auth_token.get_secret_value() | ||
): | ||
return JSONResponse({"error": "Unauthorized"}, status_code=401) | ||
return None | ||
|
||
|
||
C = TypeVar("C") | ||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we put this in /auth? Feels cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I had it like this at the beginning but the problem I faced was circular imports - auth imports settings from config and config imports auth stuff but I figure a way to remove downward deps