Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] CIP-2: Auth Providers Proposal #986

Merged
merged 15 commits into from
Aug 23, 2023
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions chromadb/api/fastapi.py
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,8 @@ def __init__(self, system: System):
self._session = requests.Session()
if self._header is not None:
self._session.headers.update(self._header)
if system.auth_provider is not None:
system.auth_provider.authenticate(self._session)

@override
def heartbeat(self) -> int:
Expand Down
46 changes: 46 additions & 0 deletions chromadb/auth/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
# Chroma Server Middlewares

## Basic Auth

This is very rudimentary security middleware that checks Authorization headers for basic auth credentials.

The basic auth user and pass are configured on the server side using `CHROMA_SERVER_AUTH_PROVIDER_CONFIG`. Make sure to
also define the auth provider `CHROMA_SERVER_AUTH_PROVIDER="chromadb.auth.BasicAuthServerProvider"`.

### Usage

Start the server:

```bash
CHROMA_SERVER_AUTH_PROVIDER="chromadb.BasicAuthServerProvider" \
CHROMA_SERVER_AUTH_PROVIDER_CONFIG='{"username":"admin","password":"admin"}' \
ALLOW_RESET=1 \
IS_PERSISTENT=1 \
uvicorn chromadb.app:app --workers 1 --host 0.0.0.0 --port 8000 --proxy-headers --log-config log_config.yml --reload
```

Test request with `curl`:

```bash
curl http://localhost:8000/api/v1 -v -H "Authorization: Basic YWRtaW46YWRtaW4="
```

Test with client side auth provider:

```python
import chromadb
from chromadb import Settings

client = chromadb.HttpClient(settings=Settings(chroma_client_auth_provider="chromadb.auth.BasicAuthClientProvider",
chroma_client_auth_provider_config={"username": "admin", "password": "admin"}))
client.heartbeat()
```

Test with Http Client and basic auth header:

```python
import chromadb

client = chromadb.HttpClient(host="localhost", port="8000", headers={"Authorization": "Basic YWRtaW46YWRtaW4="})
client.heartbeat()
```
4 changes: 4 additions & 0 deletions chromadb/auth/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
from chromadb.config import BasicAuthClientProvider, BasicAuthServerProvider

# Re-export types from chromadb
__all__ = ["BasicAuthClientProvider", "BasicAuthServerProvider"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if its ok if this is chromadb.auth.X - it feels ok to live one level nested

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HammadB, I moved things around now all the abstractions will live in chromadb.auth and all the concrete implementations (will live in e.g. chromadb.auth.basic or chromadb.auth.oidc etc)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome thanks

199 changes: 186 additions & 13 deletions chromadb/config.py
Original file line number Diff line number Diff line change
@@ -1,12 +1,22 @@
from pydantic import BaseSettings
from typing import Optional, List, Any, Dict, TypeVar, Set, cast, Iterable, Type
from typing_extensions import Literal
from abc import ABC
import base64
import importlib
import inspect
import logging
from overrides import EnforceOverrides, override
import os
from abc import ABC, abstractmethod
from graphlib import TopologicalSorter
import inspect
from typing import Optional, List, Any, Dict, Set, Iterable, Union
from typing import Type, TypeVar, cast

import requests
from overrides import override
from overrides import overrides, EnforceOverrides
from pydantic import BaseSettings, SecretStr
from starlette.middleware.base import BaseHTTPMiddleware, RequestResponseEndpoint
from starlette.requests import Request
from starlette.responses import Response, JSONResponse
from starlette.types import ASGIApp
from typing_extensions import Literal

# The thin client will have a flag to control which implementations to use
is_thin_client = False
Expand All @@ -15,10 +25,8 @@
except ImportError:
is_thin_client = False


logger = logging.getLogger(__name__)


LEGACY_ERROR = """\033[91mYou are using a deprecated configuration of Chroma.

\033[94mIf you do not have data you wish to migrate, you only need to change how you construct
Expand Down Expand Up @@ -59,7 +67,25 @@
}


class Settings(BaseSettings):
class ClientAuthProvider(ABC, EnforceOverrides):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we put this in /auth? Feels cleaner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I had it like this at the beginning but the problem I faced was circular imports - auth imports settings from config and config imports auth stuff but I figure a way to remove downward deps

def __init__(self, settings: "Settings") -> None:
self._settings = settings

@abstractmethod
def authenticate(self, session: requests.Session) -> None:
pass


class ServerAuthProvider(ABC, EnforceOverrides):
def __init__(self, settings: "Settings") -> None:
self._settings = settings

@abstractmethod
def authenticate(self, request: Request) -> Union[Response, None]:
pass


class Settings(BaseSettings): # type: ignore
environment: str = ""

# Legacy config has to be kept around because pydantic will error on nonexisting keys
Expand Down Expand Up @@ -88,7 +114,13 @@ class Settings(BaseSettings):
chroma_server_ssl_enabled: Optional[bool] = False
chroma_server_grpc_port: Optional[str] = None
chroma_server_cors_allow_origins: List[str] = [] # eg ["http://localhost:3000"]
# eg ["chromadb.api.fastapi.middlewares.auth.AuthMiddleware"]
chroma_server_middlewares: List[str] = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this used?


chroma_server_auth_provider: Optional[str] = None
chroma_server_auth_provider_config: Optional[Union[str, Dict[str, Any]]] = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we strongly type this instead of leaving it to be an arbitrary dict? (TypedDict)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, I have to read the code to know "ignore_auth_paths" is a valid config. I'd prefer if this was self-documenting.

chroma_client_auth_provider: Optional[str] = None
chroma_client_auth_provider_config: Optional[Union[str, Dict[str, Any]]] = None
anonymized_telemetry: bool = True

allow_reset: bool = False
Expand All @@ -106,7 +138,7 @@ def require(self, key: str) -> Any:
def __getitem__(self, key: str) -> Any:
val = getattr(self, key)
# Error on legacy config values
if val in _legacy_config_values:
if isinstance(val, str) and val in _legacy_config_values:
raise ValueError(LEGACY_ERROR)
return val

Expand Down Expand Up @@ -158,7 +190,7 @@ def reset_state(self) -> None:

class System(Component):
settings: Settings

auth_provider: Optional["ClientAuthProvider"]
_instances: Dict[Type[Component], Component]

def __init__(self, settings: Settings):
Expand All @@ -169,7 +201,18 @@ def __init__(self, settings: Settings):
"Chroma is running in http-only client mode, and can only be run with 'chromadb.api.fastapi.FastAPI' as the chroma_api_impl. \
see https://docs.trychroma.com/usage-guide?lang=py#using-the-python-http-only-client for more information."
)

if (
settings.chroma_client_auth_provider is not None
and settings.chroma_client_auth_provider.strip() != ""
Copy link
Collaborator

@HammadB HammadB Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if downstream code expects it to be stripped, we should strip and store it to avoid bugs if anyone else is checking? wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HammadB I hope you don't mind, but I have added annotated pedantic validator, that ensures the value is not empty already at setting load time.

):
logger.debug(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just debug all settings if we are adding this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe I'll remove it for now but as per our side discussion it would be useful to make things traceable either via logging or tracers (jaeger)

f"Client Auth Provider: {settings.chroma_client_auth_provider}"
)
self.auth_provider = get_class(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the auth provider should be a Component, and then the classes that need it should require() it to match the pattern elsewhere in the codebase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HammadB, I refactored it so that it works for client auth provider however the server side is a bit different as FastAPI(chromadb.server.Server): is not a component itself so it can't "require" other deps (perhaps refactor that in future PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I stand corrected; we have in the constructor:

self._api: chromadb.api.API = chromadb.Client(settings)

So naturally I did:

if settings.chroma_server_auth_provider:
            self._auth_middleware = self._api.require(FastAPIChromaAuthMiddleware)
            self._app.add_middleware(FastAPIChromaAuthMiddlewareWrapper, auth_middleware=self._auth_middleware)

settings.chroma_client_auth_provider, ClientAuthProvider
)(settings)
else:
self.auth_provider = None
# Validate settings don't contain any legacy config values
for key in _legacy_config_keys:
if settings[key] is not None:
Expand Down Expand Up @@ -225,11 +268,141 @@ def stop(self) -> None:
def reset_state(self) -> None:
"""Reset the state of this system and all constituents in reverse dependency order"""
if not self.settings.allow_reset:
raise ValueError("Resetting is not allowed by this configuration (to enable it, set `allow_reset` to `True` in your Settings() or include `ALLOW_RESET=TRUE` in your environment variables)")
raise ValueError(
"Resetting is not allowed by this configuration (to enable it, set `allow_reset` to `True` in your Settings() or include `ALLOW_RESET=TRUE` in your environment variables)"
)
for component in reversed(list(self.components())):
component.reset_state()


class BasicAuthClientProvider(ClientAuthProvider):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment - move to /auth

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All moved to auth.

_basic_auth_token: SecretStr

def __init__(self, settings: "Settings") -> None:
super().__init__(settings)
self._settings = settings
if os.environ.get("CHROMA_CLIENT_AUTH_BASIC_USERNAME") and os.environ.get(
Copy link
Collaborator

@HammadB HammadB Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets move all the env variables into one place and define an enum and use the enum, string hardcoding is error prone and harder to track. I imagine we'll add a bunch of env vars in the future and keeping them in one place makes it better self-documentation.

Copy link
Collaborator

@HammadB HammadB Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the settings can get automatically populated by environment variables (thats how docker passes settings via env), should we just rely on that instead of doing .get()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the challenges I thought of here was polluting the Settings with tons of params that are specific to each Auth provider. But let me mull it over.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah settings is becoming a sort of 'junk drawer'

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the settings object has no concept of hierarchy - some of the settings only belong to certain components.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, in "theory" you can have another netsted settings inside Settings ... or can you 🤔 (Inception)

"CHROMA_CLIENT_AUTH_BASIC_PASSWORD"
):
self._basic_auth_token = _create_token(
os.environ.get("CHROMA_CLIENT_AUTH_BASIC_USERNAME", ""),
os.environ.get("CHROMA_CLIENT_AUTH_BASIC_PASSWORD", ""),
)
elif isinstance(
self._settings.chroma_client_auth_provider_config, str
Copy link
Collaborator

@HammadB HammadB Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Comment the cases with the case for ease of reading.

# User supplied a path
# User supplied env vars
# User supplied a config

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HammadB Actually, I am refactoring this whole mess. It won't scale/adapt to new ways of providing configuration and credentials. Instead, now we'll have two new abstractions (driven by settings with certain, hard-coded for now, precedence - env, file, ... anything else).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, yeah a "loader" interface makes sense.

) and os.path.exists(self._settings.chroma_client_auth_provider_config):
with open(self._settings.chroma_client_auth_provider_config) as f:
# read first line of file which should be user:password
_auth_data = f.readline().strip().split(":")
# validate auth data
if len(_auth_data) != 2:
raise ValueError("Invalid auth data")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add test for this case - can we make error message more descriptive?

Expected username:password got {}?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will move this whole code into an abstraction for credentials loading where depending on the "loader" (e.g. htpasswd, DB, env etc.) we'll have a more specific message to help the user understand why credentials loading was no successful.

self._basic_auth_token = _create_token(_auth_data[0], _auth_data[1])
elif self._settings.chroma_client_auth_provider_config and isinstance(
self._settings.chroma_client_auth_provider_config, dict
):
self._basic_auth_token = _create_token(
self._settings.chroma_client_auth_provider_config["username"],
self._settings.chroma_client_auth_provider_config["password"],
)
else:
raise ValueError("Basic auth credentials not found")

@overrides
def authenticate(self, session: requests.Session) -> None:
session.headers.update(
{"Authorization": f"Basic {self._basic_auth_token.get_secret_value()}"}
)


class ChromaAuthMiddleware(BaseHTTPMiddleware): # type: ignore
def __init__(self, app: ASGIApp, settings: "Settings") -> None:
super().__init__(app)
self._settings = settings
self._settings.require("chroma_server_auth_provider")
if settings.chroma_server_auth_provider:
_cls = get_class(settings.chroma_server_auth_provider, ServerAuthProvider)
logger.debug(f"Server Auth Provider: {_cls}")
self._auth_provider = _cls(settings)

async def dispatch(
self, request: Request, call_next: RequestResponseEndpoint
) -> Response:
response = self._auth_provider.authenticate(request)
if response is not None:
return response
return await call_next(request)


def _create_token(username: str, password: str) -> SecretStr:
return SecretStr(
base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("utf-8")
)


class BasicAuthServerProvider(ServerAuthProvider):
_basic_auth_token: SecretStr
_ignore_auth_paths: List[str] = ["/api/v1", "/api/v1/heartbeat", "/api/v1/version"]

def __init__(self, settings: "Settings") -> None:
super().__init__(settings)
self._settings = settings
self._basic_auth_token = SecretStr("")
if os.environ.get("CHROMA_SERVER_AUTH_BASIC_USERNAME") and os.environ.get(
"CHROMA_SERVER_AUTH_BASIC_PASSWORD"
):
self._basic_auth_token = _create_token(
os.environ.get("CHROMA_SERVER_AUTH_BASIC_USERNAME", ""),
os.environ.get("CHROMA_SERVER_AUTH_BASIC_PASSWORD", ""),
)
self._ignore_auth_paths = os.environ.get(
"CHROMA_SERVER_AUTH_IGNORE_PATHS", ",".join(self._ignore_auth_paths)
).split(",")
elif isinstance(
self._settings.chroma_server_auth_provider_config, str
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we parse it as a path in Settings() and then store it, so we can error in settings inside of here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and furthermore I think it makes sense to pull this up in the Auth middleware instead having each provider implement it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored. It looks way cleaner and more secure in the new code.

chroma_server_auth_ignore_paths: Dict[str, List[str]] = {
        "/api/v1": ["GET"],
        "/api/v1/heartbeat": ["GET"],
        "/api/v1/version": ["GET"]}

then we have:

class FastAPIChromaAuthMiddleware(ChromaAuthMiddleware):
...
    def authenticate(self, request: Request) -> Optional[Response]:
        if (request.url.path in self._ignore_auth_paths.keys() and
                request.method.upper() in self._ignore_auth_paths[request.url.path]):
            logger.debug(f"Skipping auth for path {request.url.path} and method {request.method}")
            return None
...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great

) and os.path.exists(self._settings.chroma_server_auth_provider_config):
with open(self._settings.chroma_server_auth_provider_config) as f:
# read first line of file which should be user:password
_auth_data = f.readline().strip().split(":")
# validate auth data
if len(_auth_data) != 2:
raise ValueError("Invalid auth data")
self._basic_auth_token = _create_token(_auth_data[0], _auth_data[1])
self._ignore_auth_paths = os.environ.get(
"CHROMA_SERVER_AUTH_IGNORE_PATHS", ",".join(self._ignore_auth_paths)
).split(",")
elif self._settings.chroma_server_auth_provider_config and isinstance(
self._settings.chroma_server_auth_provider_config, dict
):
# encode the username and password base64
self._basic_auth_token = _create_token(
self._settings.chroma_server_auth_provider_config["username"],
self._settings.chroma_server_auth_provider_config["password"],
)
if "ignore_auth_paths" in self._settings.chroma_server_auth_provider_config:
self._ignore_auth_paths = (
self._settings.chroma_server_auth_provider_config[
"ignore_auth_paths"
]
)
else:
raise ValueError("Basic auth credentials not found")

@overrides
def authenticate(self, request: Request) -> Union[Response, None]:
auth_header = request.headers.get("Authorization", "").split()
# Check if the header exists and the token is correct
if request.url.path in self._ignore_auth_paths:
logger.debug(f"Skipping auth for path {request.url.path}")
return None
if (
len(auth_header) != 2
or auth_header[1] != self._basic_auth_token.get_secret_value()
):
return JSONResponse({"error": "Unauthorized"}, status_code=401)
return None


C = TypeVar("C")


Expand Down
8 changes: 7 additions & 1 deletion chromadb/server/fastapi/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,6 @@
from fastapi import HTTPException, status
from uuid import UUID


import chromadb
from chromadb.api.models.Collection import Collection
from chromadb.api.types import GetResult, QueryResult
Expand Down Expand Up @@ -110,6 +109,13 @@ def __init__(self, settings: Settings):
allow_origins=settings.chroma_server_cors_allow_origins,
allow_methods=["*"],
)
if (
settings.chroma_server_auth_provider is not None
and settings.chroma_server_auth_provider.strip() != ""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah let's store it stripped if are going to strip in all calling code

):
self._app.add_middleware(
chromadb.config.ChromaAuthMiddleware, settings=settings
)

self.router = ChromaAPIRouter()

Expand Down
Loading