-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] CIP-2: Auth Providers Proposal #986
Conversation
- Basic Auth provider for both server and client implemented
# Conflicts: # chromadb/test/conftest.py
- Updated Basic Auth Provider to also load credentials from file or from env vars (CHROMA_BASIC_AUTH_USERNAME and CHROMA_BASIC_AUTH_PASSWORD)
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
@HammadB , Saw some issues:
Gonna look at it later tonight. |
In the meantime let me convert to draft. |
Taz - we can move this to "Accepted" and merge the CIP separately. My only feedback is on the storage of credentials in plaintext, but we can refine this as we build the implementation. Congrats on landing the first community contributed CIP! |
@HammadB, I agree plain text is not great. I can add support for htpasswd style file. This would give all in all 4 alternatives on how to configure basic auth. As a side node, in cloud settings often credentials, as passed in plain text through secret injection - file and env, are generally accepted approaches for the injection. I will make it a point to state in the docs that while credentials might not be encrypted, for prod use, we recommend using secret management (Vault, AWS KMS etc.) to inject the credentials at runtime. |
- Basic Auth implemented for both server and client side using 3 ways of getting the auth config - params passed to client/server, env vars and file based ('user:pass' style). - Docs will be updated later today with an example jupyter notebook - Tests are passing including the new ones for testing negative (auth failures) and variations of positive test cases. Note: Needs a bit of a cleanup for mypy errors warnings.
@HammadB I think we're nearly there. Just a bit of cleanup for mypy errors and it should be good for merge. |
chromadb/test/test_api.py
Outdated
@@ -1361,3 +1362,14 @@ def test_invalid_embeddings(api): | |||
with pytest.raises(ValueError) as e: | |||
collection.upsert(**invalid_records) | |||
assert "embedding" in str(e.value) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we treat test_api.py as deprecated. lets break this out into its own test file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just occurred to me that I need to add a generator for Auth Providers instead of having many fixtures for each provider with each config type (e.g. file, env, params)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hahahahahahahahahahaah
- Fixed an issue where if empty strings are passed to params an exception is thrown - Updated `client_auth.ipynb` notebook to with relevant documentation and examples - Added exports of both client and server Basic Auth providers from `chromadb.auth` - Updated docker-compose to include server-side configuration (it works fine if env vars are not defined)
chromadb/auth/__init__.py
Outdated
from chromadb.config import BasicAuthClientProvider, BasicAuthServerProvider | ||
|
||
# Re-export types from chromadb | ||
__all__ = ["BasicAuthClientProvider", "BasicAuthServerProvider"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if its ok if this is chromadb.auth.X - it feels ok to live one level nested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HammadB, I moved things around now all the abstractions will live in chromadb.auth and all the concrete implementations (will live in e.g. chromadb.auth.basic
or chromadb.auth.oidc
etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome thanks
chromadb/config.py
Outdated
@@ -59,7 +67,25 @@ | |||
} | |||
|
|||
|
|||
class Settings(BaseSettings): | |||
class ClientAuthProvider(ABC, EnforceOverrides): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we put this in /auth? Feels cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I had it like this at the beginning but the problem I faced was circular imports - auth imports settings from config and config imports auth stuff but I figure a way to remove downward deps
chromadb/config.py
Outdated
@@ -88,7 +114,13 @@ class Settings(BaseSettings): | |||
chroma_server_ssl_enabled: Optional[bool] = False | |||
chroma_server_grpc_port: Optional[str] = None | |||
chroma_server_cors_allow_origins: List[str] = [] # eg ["http://localhost:3000"] | |||
# eg ["chromadb.api.fastapi.middlewares.auth.AuthMiddleware"] | |||
chroma_server_middlewares: List[str] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this used?
chromadb/config.py
Outdated
|
||
chroma_server_auth_provider: Optional[str] = None | ||
chroma_server_auth_provider_config: Optional[Union[str, Dict[str, Any]]] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we strongly type this instead of leaving it to be an arbitrary dict? (TypedDict)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, I have to read the code to know "ignore_auth_paths" is a valid config. I'd prefer if this was self-documenting.
chromadb/config.py
Outdated
|
||
if ( | ||
settings.chroma_client_auth_provider is not None | ||
and settings.chroma_client_auth_provider.strip() != "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if downstream code expects it to be stripped, we should strip and store it to avoid bugs if anyone else is checking? wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HammadB I hope you don't mind, but I have added annotated pedantic validator, that ensures the value is not empty already at setting load time.
chromadb/config.py
Outdated
os.environ.get("CHROMA_CLIENT_AUTH_BASIC_PASSWORD", ""), | ||
) | ||
elif isinstance( | ||
self._settings.chroma_client_auth_provider_config, str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Comment the cases with the case for ease of reading.
# User supplied a path
# User supplied env vars
# User supplied a config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HammadB Actually, I am refactoring this whole mess. It won't scale/adapt to new ways of providing configuration and credentials. Instead, now we'll have two new abstractions (driven by settings with certain, hard-coded for now, precedence - env, file, ... anything else).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, yeah a "loader" interface makes sense.
chromadb/server/fastapi/__init__.py
Outdated
@@ -110,6 +109,13 @@ def __init__(self, settings: Settings): | |||
allow_origins=settings.chroma_server_cors_allow_origins, | |||
allow_methods=["*"], | |||
) | |||
if ( | |||
settings.chroma_server_auth_provider is not None | |||
and settings.chroma_server_auth_provider.strip() != "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah let's store it stripped if are going to strip in all calling code
chromadb/test/conftest.py
Outdated
|
||
|
||
def fastapi_server_auth_file() -> Generator[System, None, None]: | ||
server_auth_file = os.path.abspath(os.path.join(".", "server-auth")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets use a tmpdir (can search the code base for how we do this)
Also does the path have to be absolute? It can be relative too right?
chromadb/test/conftest.py
Outdated
@@ -183,26 +279,62 @@ def sqlite_persistent() -> Generator[System, None, None]: | |||
|
|||
|
|||
def system_fixtures() -> List[Callable[[], Generator[System, None, None]]]: | |||
fixtures = [fastapi, fastapi_persistent, sqlite, sqlite_persistent] | |||
fixtures = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to add the auth'ed server here? It increases test times by quite a bit because many downstream tests use it.
I think we can probably do more minimal integration testing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, i added it there as shortcut of running both regression and coverage for all possible auth API calls that need auth but I can remove it and instead create tests for each endpoint separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks.
@pytest.fixture(scope="function") | ||
def api(system: System) -> Generator[API, None, None]: | ||
system.reset_state() | ||
api = system.instance(API) | ||
yield api | ||
|
||
|
||
@pytest.fixture(scope="function") | ||
def api_wrong_cred( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a rule, we only add fixtures to this file if they are intended to be shared across tests.
I think we can encapsulate all the testing for this into one test file and maybe just want to share the logic there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will create a new subdir for all auth tests as those are expected to grow with additional auth providers
I agree with @HammadB 's comments. my only other question, which I think the answer is yes, is how we can re-compose this same functionality into other middleware in the future (eg a grpc service) |
@jeffchuber, I will add an abstraction for the middleware (on top of ChromaAuthMiddleware) so we can uniformly plug auth across APIs with minimal glue code specific to the API (e.i. gRPC). |
- Redoing the whole architecture with additional abstractions and moving auth stuff in `/auth` Note: Not ready for review
- Added Auth credentials and Auth Configuration server-side abstractions.
- Updated abstractions - Updated CIP document (still unfinished)
Updated client and server abstractions in docs. Refs: chroma-core#986
|
||
- Auth Provider - an abstraction that provides authentication functionality for either client or server-side. The | ||
provider is responsible for validating client credentials using (if available) configuration and credentials | ||
providers. The auth provider is also responsible for carrying the Croma-leg of any authentication flow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
providers. The auth provider is also responsible for carrying the Croma-leg of any authentication flow. | |
providers. The auth provider is also responsible for carrying the Chroma-leg of any authentication flow. |
- Basic Auth implemented - Simple tests for valid/invalid credentials added Refs: chroma-core#986
@HammadB, I think this is now mostly done. Going to catch some sleep and clean up all the mypy errors. Local tests are passing. |
- Added new provider configuration Htpasswd provider to read from config value (env or passed via settings) - Added shorthands for providers (e.g. `basic` for both Client and Server auths) - Added registry for auth providers to simplify shorthand lookups - Fixed mypy errors Refs: chroma-core#986
- Basic usage added to usage-guide.md Refs: chroma-core/chroma#986
@HammadB docs updated - chroma-core/docs#117 |
- Added missing JS tab Refs: chroma-core/chroma#986
return False | ||
|
||
@override | ||
def instrument_server(self, app: ASGIApp) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused abstraction - remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gonna use this in the authZ part (coming up next). Also, if we want to implement basic auth with a challenge-response style can also be useful.
- Removed unused code. Refs: chroma-core#986
Looks great! Thanks for the work! |
- Merged all three tables into what which only lists first-party clients + server-side support for auth methods - Added Section for the Basic Auth (in the future we'll add more of those as auth methods are added) - Added Windows commands where applicable [DX]. - Added a common subsection how to create .htpasswd file Refs: chroma-core/chroma#986
- Transposed supported auth methods table for readability. Refs: chroma-core/chroma#986
- Removed platform specific commands. Now things are platform independent - Removed server verification with curl - no value - The default way of running Chroma is via docker compose now Refs: chroma-core/chroma#986
Description of changes
Summarize the changes made by this PR.
Test plan
TBD
Documentation Changes
Docs should change to describe how to use auth providers on the client and server. CIP added in
docs/