Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implemented auth_check #2497

Merged
merged 13 commits into from
Sep 11, 2024
Merged

implemented auth_check #2497

merged 13 commits into from
Sep 11, 2024

Conversation

cjfghk5697
Copy link
Contributor

@cjfghk5697 cjfghk5697 commented Aug 30, 2024

Related to #2466

I have implemented auth_check related to the issue #2466.

  • Implemented auth_check
  • Added docstring for auth_check
  • Wrote test code for auth_check

Please review the implementation and let me know if any further changes are required.

@cjfghk5697 cjfghk5697 changed the title check auth implemented auth_check Aug 30, 2024
Copy link
Contributor

@hanouticelina hanouticelina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @cjfghk5697, thank you for this PR 🤗 I left few comments, otherwise everything looks good to me. The indentation fix should resolve both the documentation build and the code quality check in the CI pipelines

If the repository is gated or does not exist, the respective error is raised.
"""
headers = self._build_hf_headers(token=token)
path = f"{self.endpoint}/api/datasets/{repo_id}/auth-check"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation only handles the datasets repo type. It should handle all repo types supported by the Hugging Face Hub (i.e. datasets, models and spaces). You can add a repo_type argument to the method.

Suggested change
path = f"{self.endpoint}/api/datasets/{repo_id}/auth-check"
if repo_type is None:
repo_type = constants.REPO_TYPE_MODEL
if repo_type not in constants.REPO_TYPES:
raise ValueError(
f"Invalid repo type, must be one of {constants.REPO_TYPES}"
)
path = f"{self.endpoint}/api/{repo_type}s/{repo_id}/auth-check"

@@ -9525,6 +9529,51 @@ def list_user_following(self, username: str) -> Iterable[User]:
for followed_user in r.json():
yield User(**followed_user)

def auth_check(self, repo_id: str, token: Union[bool, str, None] = None) -> None:
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small indentation fix needed on this line

@@ -9525,6 +9529,51 @@ def list_user_following(self, username: str) -> Iterable[User]:
for followed_user in r.json():
yield User(**followed_user)

def auth_check(self, repo_id: str, token: Union[bool, str, None] = None) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more convenient if the method returns True when the user has access. This change would allow users to use the boolean return in conditional statements

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While a boolean value is convenient to use in a if/else statement, it will nonetheless be hiding some information to the end user. For now there are 3 cases:

  1. passing correctly
  2. RepositoryNotFoundError error if private or not existing
  3. GatedRepoError error if exists but wrong permission

The end user will eventually want to deal with these errors differently depending on their use cases. Letting the correct error be raised makes it flexible.

However what would be nice is to document how to use the method:

from huggingface_hub import auth_check
from huggingface_hub.utils import  GatedRepoError, RepositoryNotFoundError

try:
    auth_check(repo_id, repo_type=repo_type)
except GatedRepoError:
    # Handle gated repo
    ...
except RepositoryNotFoundError:
    # Handle missing repo
    ...

This snippet can be added as an example in the docstring.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, auth_check is for more advanced usage of the lib'. For power users it's fine to have slightly more verbose boilerplate code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree!


@patch.object(HfApi, "auth_check", return_value=None)
def test_auth_check_success(self, mock_auth_check):
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If auth_check raises any exception, the test will automatically fail without needing self.fail(), The try/except block with self.fail() is redundant in this case.

Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR @cjfghk5697! Agree with most comments @hanouticelina above :) I've added a few comment especially related to how we should test this feature. Let us know if you have any question!

Comment on lines 9558 to 9560
Example:
>>> api = HfApi(token="your_token")
>>> api.auth_check(repo_id="user/my-cool-model")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Example:
>>> api = HfApi(token="your_token")
>>> api.auth_check(repo_id="user/my-cool-model")
Example:
```py
>>> from huggingface_hub import HfApi
>>> api = HfApi(token="your_token")
>>> api.auth_check(repo_id="user/my-cool-model")
```

src/huggingface_hub/hf_api.py Outdated Show resolved Hide resolved
@@ -9525,6 +9529,51 @@ def list_user_following(self, username: str) -> Iterable[User]:
for followed_user in r.json():
yield User(**followed_user)

def auth_check(self, repo_id: str, token: Union[bool, str, None] = None) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While a boolean value is convenient to use in a if/else statement, it will nonetheless be hiding some information to the end user. For now there are 3 cases:

  1. passing correctly
  2. RepositoryNotFoundError error if private or not existing
  3. GatedRepoError error if exists but wrong permission

The end user will eventually want to deal with these errors differently depending on their use cases. Letting the correct error be raised makes it flexible.

However what would be nice is to document how to use the method:

from huggingface_hub import auth_check
from huggingface_hub.utils import  GatedRepoError, RepositoryNotFoundError

try:
    auth_check(repo_id, repo_type=repo_type)
except GatedRepoError:
    # Handle gated repo
    ...
except RepositoryNotFoundError:
    # Handle missing repo
    ...

This snippet can be added as an example in the docstring.

@@ -9525,6 +9529,51 @@ def list_user_following(self, username: str) -> Iterable[User]:
for followed_user in r.json():
yield User(**followed_user)

def auth_check(self, repo_id: str, token: Union[bool, str, None] = None) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, auth_check is for more advanced usage of the lib'. For power users it's fine to have slightly more verbose boilerplate code.

@@ -4239,3 +4231,27 @@ def test_upload_large_folder(self, repo_url: RepoUrl) -> None:
for j in range(N_FILES_PER_FOLDER):
assert f"subfolder_{i}/file_lfs_{i}_{j}.bin" in uploaded_files
assert f"subfolder_{i}/file_regular_{i}_{j}.txt" in uploaded_files


class TestHfApiAuthCheck(HfApiCommonTest):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding tests as well. Mocked tests are useful in many cases but for the core HfApi features, we want to test the behavior "for real" on the staging environment. Here is how you can do that!

class TestHfApiAuthCheck(HfApiCommonTest):
    @use_tmp_repo(repo_type="dataset")
    def test_auth_check_success(self, repo_url: RepoUrl) -> None:
        self._api.auth_check(repo_id=repo_url.repo_id, repo_type=repo_url.repo_type)

    def test_auth_check_repo_missing(self) -> None:
        with self.assertRaises(RepositoryNotFoundError):
            self._api.auth_check(repo_id="username/missing_repo_id")

In the first case, you test what happens if the repo exists (using @use_tmp_repo decorator). In the second case, you test what happens if the repo is missing (with a fake url). Finally, you'll need a last case where you create a new repo + set it as gated (as done here) + try auth_check with a different token. Since a user has always access to its own repo, no matter if it's gated, you need to create a repo with user A and then test access with user B. You can use TOKEN and OTHER_TOKEN for that.

@cjfghk5697
Copy link
Contributor Author

cjfghk5697 commented Sep 4, 2024

@Wauplin @hanouticelina Thank you so much for your reviews! I've refactored the code based on your feedback. Could you please review it again?

  • I really appreciate your work. Thanks to you, I feel like I've learned how to write better code😊🤗

Copy link
Contributor

@hanouticelina hanouticelina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @cjfghk5697, thanks for the refactoring :) I've left a comment related to the return type of auth_check, otherwise everything else looks good to me

@@ -9525,6 +9529,83 @@ def list_user_following(self, username: str) -> Iterable[User]:
for followed_user in r.json():
yield User(**followed_user)

def auth_check(self, repo_id: str, repo_type: Optional[str] = None, token: Union[bool, str, None] = None) -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed by @Wauplin here, using a boolean for auth_check combines different error types into a single 'False' result. Instead, allowing specific exceptions (GatedRepoError, RepositoryNotFoundError) to propagate provides users with more detailed information and enables more precise error handling in their code.

tests/test_hf_api.py Outdated Show resolved Hide resolved
@cjfghk5697
Copy link
Contributor Author

@Wauplin @hanouticelina
I've completed the updates according to review. The auth_check function now works as expected, and the test case properly verifies access denial for gated repositories. Thank you for guidance and support!

@Wauplin
Copy link
Contributor

Wauplin commented Sep 10, 2024

Hi @cjfghk5697, thanks for the update. As discussed in #2497 (comment) and #2497 (comment) we would prefer for auth_check to raise an error instead of returning a boolean to check authorization. Could you take care of it (see more details in above comments). Thanks!

@cjfghk5697
Copy link
Contributor Author

Hi @Wauplin,

I apologize for the mistake, and thank you for pointing it out. I've made the necessary adjustments as per your feedback. Could you please review it again when you have a moment?

Thank you!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Contributor

@hanouticelina hanouticelina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @cjfghk5697, thanks a lot for the iterations! seems good to me 👍
cc @Wauplin

Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @cjfghk5697! I have left final comments regarding code styling and docstrings that I will merge right away. Other than that, we are good to merge! 🎉

src/huggingface_hub/hf_api.py Outdated Show resolved Hide resolved
src/huggingface_hub/hf_api.py Outdated Show resolved Hide resolved
src/huggingface_hub/hf_api.py Outdated Show resolved Hide resolved
with self.assertRaises(RepositoryNotFoundError):
self._api.auth_check(repo_id="username/missing_repo_id")

def test_auth_check_gated_repo(self) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test looks good!

src/huggingface_hub/hf_api.py Outdated Show resolved Hide resolved
src/huggingface_hub/hf_api.py Outdated Show resolved Hide resolved
src/huggingface_hub/hf_api.py Outdated Show resolved Hide resolved
src/huggingface_hub/hf_api.py Outdated Show resolved Hide resolved
@Wauplin
Copy link
Contributor

Wauplin commented Sep 11, 2024

All green! Thanks a lot @cjfghk5697 🤗 🎉

@Wauplin Wauplin merged commit 855755b into huggingface:main Sep 11, 2024
16 checks passed
@cjfghk5697
Copy link
Contributor Author

@Wauplin @hanouticelina Thank you so much for all your help. If there's anything else that needs to be handled, please let me know!

@cjfghk5697 cjfghk5697 deleted the auth_check branch September 12, 2024 05:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants