Pagination broken in get_tags for ECR when limit None is given #173

Closed
natefaerber opened this issue Dec 5, 2024 · 0 comments · Fixed by #174

get_tags fails when limit is not given or is given explicitly as None (which is the same thing).

from oras.client import OrasClient
from oras.logger import setup_logger, logger

# Basic auth against a private ECR registry (hostname snipped).
client = OrasClient('__SNIPPED_BY_ME__.dkr.ecr.us-west-2.amazonaws.com', auth_backend='basic')
pw = "..."
client.login(password=pw, username="AWS", hostname="__SNIPPED_BY_ME__.dkr.ecr.us-west-2.amazonaws.com")

setup_logger(quiet=False, debug=True)

# No limit given, so get_tags tries to paginate through all tags -- and fails.
client.get_tags('my/app')

my/app is a "namespaced" repository with over 1000 images, which I suspect is part of the issue: I don't see the problem with smaller repos, or when I pass N<=1000. A possible stopgap is shown below.
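
In the meantime, passing an explicit cap works for me, since the report above only fails when no limit is given (assuming the N keyword as used here; whether this is viable obviously depends on how many tags you actually need back):

# Hypothetical workaround: cap the number of tags requested instead of fetching everything.
tags = client.get_tags('my/app', N=1000)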

Error seen after enabling debug logging:

Traceback (most recent call last):
  File "__SNIPPED_BY_ME__/.venv/lib/python3.12/site-packages/oras/decorator.py", line 40, in inner
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "__SNIPPED_BY_ME__/.venv/lib/python3.12/site-packages/oras/provider.py", line 967, in do_request
    response = self.session.request(
               ^^^^^^^^^^^^^^^^^^^^^
  File "__SNIPPED_BY_ME__/.venv/lib/python3.12/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "__SNIPPED_BY_ME__/.venv/lib/python3.12/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "__SNIPPED_BY_ME__/.venv/lib/python3.12/site-packages/requests/adapters.py", line 700, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='__SNIPPED_BY_ME__.dkr.ecr.us-west-2.amazonaws.comhttps', port=443): Max retries exceeded with url: /__SNIPPED_BY_ME__.dkr.ecr.us-west-2.amazonaws.com/v2/my/app/tags/list?last=ukD72mdD%2FmC8b5xV3susmJzzaTgp3hKwR9nRUW1yZZ45rnWqcRvUcjdSjqGstiFS1nz2HtUlMNI14iKrrj%2F35AU1TdEkjIZ<snipped>A8VLpN5xHZ3%2BeRlKHJ7d%2FbioNLy3R5jOon7X61YbIG%2BRHzgQyJieYh5fCaZoH8fw%3D%3D (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x104e53ef0>: Failed to resolve '__SNIPPED_BY_ME__.dkr.ecr.us-west-2.amazonaws.comhttps' ([Errno 8] nodename nor servname provided, or not known)"))

Walking through the _do_paginated_request function, I can get a sense of the issue.

The response from do_request gives me a links object like this:

{'next': {'url': 'https://__SNIPPED_BY_ME__.dkr.ecr.us-west-2.amazonaws.com/v2/my/app/tags/list?last=ukD72mdD%2FmC8b5xV3susmJzzaTgp3hKwR9nRUW1yZZ45rnWqcRvUcjdSjqGstiFS1nz2HtUlMNI14iKrrj%2F35AU1TdEkjIZ<snipped>A8VLpN5xHZ3%2BeRlKHJ7d%2FbioNLy3R5jOon7X61YbIG%2BRHzgQyJieYh5fCaZoH8fw%3D%3D', 'rel': 'next'}}
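
(For context, that dict is just requests' parsed view of the registry's Link response header. I'm sketching the extraction here rather than quoting oras-py, but the next-page URL presumably comes from something like this:)

import requests

def next_link(response: requests.Response) -> str | None:
    # requests parses 'Link: <url>; rel="next"' into response.links;
    # ECR puts an absolute URL here, while other registries may return only a path.
    return response.links.get("next", {}).get("url")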

If I don't hit a condition to break out of the while loop, I get to this part of the code at https://github.com/oras-project/oras-py/blob/0.2.25/oras/provider.py#L394-L395:

# use link + base url to continue with next page
url = f"{base_url}{link}"

where link is https://__SNIPPED_BY_ME__.dkr.ecr.us-west-2.amazonaws.com/v2/my/app/tags/list?last=ukD72mdD%2FmC8b5xV3susmJzzaTgp3hKwR9nRUW1yZZ45rnWqcRvUcjdSjqGstiFS1nz2HtUlMNI14iKrrj%2F35AU1TdEkjIZ<snipped>A8VLpN5xHZ3%2BeRlKHJ7d%2FbioNLy3R5jOon7X61YbIG%2BRHzgQyJieYh5fCaZoH8fw%3D%3D, i.e. already an absolute URL. Appending that to base_url (https://__SNIPPED_BY_ME__.dkr.ecr.us-west-2.amazonaws.com) produces the mangled host __SNIPPED_BY_ME__.dkr.ecr.us-west-2.amazonaws.comhttps seen in the traceback above.

I suspect a solution could be as simple as changing

url = f"{base_url}{link}"

to

url = urllib.parse.urljoin(base_url, link)

However, I don't have the bandwidth or context to verify that this won't break the registries this pagination was originally written for (and presumably works against). Based on #68 (comment), I have a good feeling urljoin will also do the right thing for registries that return just a path as the link instead of a full URL.
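
A quick sanity check with just the standard library (placeholder host and query string, not ECR's actual values) suggests urljoin handles both shapes:

from urllib.parse import urljoin

base_url = "https://example.dkr.ecr.us-west-2.amazonaws.com"

# ECR-style: the Link header carries an absolute URL. Plain concatenation glues
# the host onto the front of it; urljoin just keeps the absolute URL.
absolute_link = "https://example.dkr.ecr.us-west-2.amazonaws.com/v2/my/app/tags/list?last=abc"
print(f"{base_url}{absolute_link}")      # broken: ...amazonaws.comhttps://...
print(urljoin(base_url, absolute_link))  # https://example.dkr.ecr.us-west-2.amazonaws.com/v2/my/app/tags/list?last=abc

# Path-only style: the case the current code was presumably written for.
relative_link = "/v2/my/app/tags/list?last=abc"
print(urljoin(base_url, relative_link))  # https://example.dkr.ecr.us-west-2.amazonaws.com/v2/my/app/tags/list?last=abc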

I will follow up with a PR.
