Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] CIP-2: Auth Providers Proposal #986

Merged
merged 15 commits into from
Aug 23, 2023

Conversation

tazarov
Copy link
Contributor

@tazarov tazarov commented Aug 14, 2023

Description of changes

Summarize the changes made by this PR.

  • New functionality
    • Auth Provide Client and Server Side Abstractions
    • Basic Auth Provider

Test plan

TBD

Documentation Changes

Docs should change to describe how to use auth providers on the client and server. CIP added in docs/

- Basic Auth provider for both server and client implemented
# Conflicts:
#	chromadb/test/conftest.py
- Updated Basic Auth Provider to also load credentials from file or from env vars (CHROMA_BASIC_AUTH_USERNAME and CHROMA_BASIC_AUTH_PASSWORD)
@github-actions
Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readbility, Modularity, Intuitiveness)

@tazarov tazarov changed the title Feature/simple auth [ENH] CIP-2: Auth Providers Proposal Aug 14, 2023
@tazarov
Copy link
Contributor Author

tazarov commented Aug 15, 2023

@HammadB , Saw some issues:

chromadb/config.py:85: in ServerAuthProvider
    def authenticate(self, request: Request) -> Response | None:
E   TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

Gonna look at it later tonight.

@tazarov
Copy link
Contributor Author

tazarov commented Aug 15, 2023

In the meantime let me convert to draft.

@tazarov tazarov marked this pull request as draft August 15, 2023 18:00
@HammadB
Copy link
Collaborator

HammadB commented Aug 15, 2023

Taz - we can move this to "Accepted" and merge the CIP separately. My only feedback is on the storage of credentials in plaintext, but we can refine this as we build the implementation. Congrats on landing the first community contributed CIP!

@tazarov
Copy link
Contributor Author

tazarov commented Aug 15, 2023

@HammadB, I agree plain text is not great. I can add support for htpasswd style file. This would give all in all 4 alternatives on how to configure basic auth.

As a side node, in cloud settings often credentials, as passed in plain text through secret injection - file and env, are generally accepted approaches for the injection.

I will make it a point to state in the docs that while credentials might not be encrypted, for prod use, we recommend using secret management (Vault, AWS KMS etc.) to inject the credentials at runtime.

- Basic Auth implemented for both server and client side using 3 ways of getting the auth config - params passed to client/server, env vars and file based ('user:pass' style).
- Docs will be updated later today with an example jupyter notebook
- Tests are passing including the new ones for testing negative (auth failures) and variations of positive test cases.

Note: Needs a bit of a cleanup for mypy errors warnings.
@tazarov
Copy link
Contributor Author

tazarov commented Aug 16, 2023

@HammadB I think we're nearly there. Just a bit of cleanup for mypy errors and it should be good for merge.

@@ -1361,3 +1362,14 @@ def test_invalid_embeddings(api):
with pytest.raises(ValueError) as e:
collection.upsert(**invalid_records)
assert "embedding" in str(e.value)

Copy link
Collaborator

@HammadB HammadB Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we treat test_api.py as deprecated. lets break this out into its own test file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just occurred to me that I need to add a generator for Auth Providers instead of having many fixtures for each provider with each config type (e.g. file, env, params)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scratch that!

image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hahahahahahahahahahaah

- Fixed an issue where if empty strings are passed to params an exception is thrown
- Updated `client_auth.ipynb` notebook to with relevant documentation and examples
- Added exports of both client and server Basic Auth providers from `chromadb.auth`
- Updated docker-compose to include server-side configuration (it works fine if env vars are not defined)
from chromadb.config import BasicAuthClientProvider, BasicAuthServerProvider

# Re-export types from chromadb
__all__ = ["BasicAuthClientProvider", "BasicAuthServerProvider"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if its ok if this is chromadb.auth.X - it feels ok to live one level nested

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HammadB, I moved things around now all the abstractions will live in chromadb.auth and all the concrete implementations (will live in e.g. chromadb.auth.basic or chromadb.auth.oidc etc)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome thanks

@@ -59,7 +67,25 @@
}


class Settings(BaseSettings):
class ClientAuthProvider(ABC, EnforceOverrides):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we put this in /auth? Feels cleaner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I had it like this at the beginning but the problem I faced was circular imports - auth imports settings from config and config imports auth stuff but I figure a way to remove downward deps

@@ -88,7 +114,13 @@ class Settings(BaseSettings):
chroma_server_ssl_enabled: Optional[bool] = False
chroma_server_grpc_port: Optional[str] = None
chroma_server_cors_allow_origins: List[str] = [] # eg ["http://localhost:3000"]
# eg ["chromadb.api.fastapi.middlewares.auth.AuthMiddleware"]
chroma_server_middlewares: List[str] = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this used?


chroma_server_auth_provider: Optional[str] = None
chroma_server_auth_provider_config: Optional[Union[str, Dict[str, Any]]] = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we strongly type this instead of leaving it to be an arbitrary dict? (TypedDict)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, I have to read the code to know "ignore_auth_paths" is a valid config. I'd prefer if this was self-documenting.


if (
settings.chroma_client_auth_provider is not None
and settings.chroma_client_auth_provider.strip() != ""
Copy link
Collaborator

@HammadB HammadB Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if downstream code expects it to be stripped, we should strip and store it to avoid bugs if anyone else is checking? wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HammadB I hope you don't mind, but I have added annotated pedantic validator, that ensures the value is not empty already at setting load time.

os.environ.get("CHROMA_CLIENT_AUTH_BASIC_PASSWORD", ""),
)
elif isinstance(
self._settings.chroma_client_auth_provider_config, str
Copy link
Collaborator

@HammadB HammadB Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Comment the cases with the case for ease of reading.

# User supplied a path
# User supplied env vars
# User supplied a config

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HammadB Actually, I am refactoring this whole mess. It won't scale/adapt to new ways of providing configuration and credentials. Instead, now we'll have two new abstractions (driven by settings with certain, hard-coded for now, precedence - env, file, ... anything else).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, yeah a "loader" interface makes sense.

@@ -110,6 +109,13 @@ def __init__(self, settings: Settings):
allow_origins=settings.chroma_server_cors_allow_origins,
allow_methods=["*"],
)
if (
settings.chroma_server_auth_provider is not None
and settings.chroma_server_auth_provider.strip() != ""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah let's store it stripped if are going to strip in all calling code



def fastapi_server_auth_file() -> Generator[System, None, None]:
server_auth_file = os.path.abspath(os.path.join(".", "server-auth"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets use a tmpdir (can search the code base for how we do this)

Also does the path have to be absolute? It can be relative too right?

@@ -183,26 +279,62 @@ def sqlite_persistent() -> Generator[System, None, None]:


def system_fixtures() -> List[Callable[[], Generator[System, None, None]]]:
fixtures = [fastapi, fastapi_persistent, sqlite, sqlite_persistent]
fixtures = [
Copy link
Collaborator

@HammadB HammadB Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to add the auth'ed server here? It increases test times by quite a bit because many downstream tests use it.

I think we can probably do more minimal integration testing here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, i added it there as shortcut of running both regression and coverage for all possible auth API calls that need auth but I can remove it and instead create tests for each endpoint separately.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks.

@pytest.fixture(scope="function")
def api(system: System) -> Generator[API, None, None]:
system.reset_state()
api = system.instance(API)
yield api


@pytest.fixture(scope="function")
def api_wrong_cred(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as a rule, we only add fixtures to this file if they are intended to be shared across tests.

I think we can encapsulate all the testing for this into one test file and maybe just want to share the logic there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will create a new subdir for all auth tests as those are expected to grow with additional auth providers

@jeffchuber
Copy link
Contributor

I agree with @HammadB 's comments. my only other question, which I think the answer is yes, is how we can re-compose this same functionality into other middleware in the future (eg a grpc service)

@tazarov
Copy link
Contributor Author

tazarov commented Aug 17, 2023

I agree with @HammadB 's comments. my only other question, which I think the answer is yes, is how we can re-compose this same functionality into other middleware in the future (eg a grpc service)

@jeffchuber, I will add an abstraction for the middleware (on top of ChromaAuthMiddleware) so we can uniformly plug auth across APIs with minimal glue code specific to the API (e.i. gRPC).

- Redoing the whole architecture with additional abstractions and moving auth stuff in `/auth`

Note:  Not ready for review
- Added Auth credentials and Auth Configuration server-side abstractions.
- Updated abstractions
- Updated CIP document (still unfinished)
Updated client and server abstractions in docs.

Refs: chroma-core#986

- Auth Provider - an abstraction that provides authentication functionality for either client or server-side. The
provider is responsible for validating client credentials using (if available) configuration and credentials
providers. The auth provider is also responsible for carrying the Croma-leg of any authentication flow.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
providers. The auth provider is also responsible for carrying the Croma-leg of any authentication flow.
providers. The auth provider is also responsible for carrying the Chroma-leg of any authentication flow.

- Basic Auth implemented
- Simple tests for valid/invalid credentials added

Refs: chroma-core#986
@tazarov
Copy link
Contributor Author

tazarov commented Aug 22, 2023

@HammadB, I think this is now mostly done. Going to catch some sleep and clean up all the mypy errors. Local tests are passing.

@tazarov tazarov marked this pull request as ready for review August 22, 2023 01:58
- Added new provider configuration Htpasswd provider to read from config value (env or passed via settings)
- Added shorthands for providers (e.g. `basic` for both Client and Server auths)
- Added registry for auth providers to simplify shorthand lookups
- Fixed mypy errors

Refs: chroma-core#986
tazarov added a commit to amikos-tech/chroma-docs that referenced this pull request Aug 22, 2023
- Basic usage added to usage-guide.md

Refs: chroma-core/chroma#986
@tazarov
Copy link
Contributor Author

tazarov commented Aug 22, 2023

@HammadB docs updated - chroma-core/docs#117

tazarov added a commit to amikos-tech/chroma-docs that referenced this pull request Aug 22, 2023
chromadb/auth/providers.py Outdated Show resolved Hide resolved
chromadb/auth/providers.py Outdated Show resolved Hide resolved
return False

@override
def instrument_server(self, app: ASGIApp) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused abstraction - remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gonna use this in the authZ part (coming up next). Also, if we want to implement basic auth with a challenge-response style can also be useful.

- Removed unused code.

Refs: chroma-core#986
@HammadB
Copy link
Collaborator

HammadB commented Aug 23, 2023

Looks great! Thanks for the work!

@HammadB HammadB merged commit 48700dd into chroma-core:main Aug 23, 2023
tazarov added a commit to amikos-tech/chroma-docs that referenced this pull request Aug 23, 2023
- Merged all three tables into what which only lists first-party clients + server-side support for auth methods
- Added Section for the Basic Auth (in the future we'll add more of those as auth methods are added)
- Added Windows commands where applicable [DX].
- Added a common subsection how to create .htpasswd file

Refs: chroma-core/chroma#986
tazarov added a commit to amikos-tech/chroma-docs that referenced this pull request Aug 23, 2023
- Transposed supported auth methods table for readability.

Refs: chroma-core/chroma#986
tazarov added a commit to amikos-tech/chroma-docs that referenced this pull request Aug 23, 2023
- Removed platform specific commands. Now things are platform independent
- Removed server verification with curl - no value
- The default way of running Chroma is via docker compose now

Refs: chroma-core/chroma#986
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants