Update remote artifact urls on sync if the url of the artifact has changed #1623

Merged 1 commit on Sep 22, 2021
1 change: 1 addition & 0 deletions CHANGES/9395.bugfix
@@ -0,0 +1 @@
Fixed an issue where on_demand content might not be downloaded properly if the remote URL was changed (even if re-synced).
Member:

How can we write an automated test for this?

Member:

I think if we had a pulp_file fixture that had a different layout but the same content, we could test this by:

  1. sync one of the fixtures with on_demand
  2. change the remote.url to the other fixture
  3. observe the 404
  4. sync again
  5. observe that the 404s go away

Contributor Author:

I think we cannot, because a change to the layout would yield a different set of content units: relative_path is part of their identity.
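To make that concrete, here is a rough, hedged illustration. It assumes, as the comment implies, that pulp_file identifies a content unit by its relative_path together with its digest; the class below is a hypothetical stand-in for that natural key, not pulp_file's actual model.

    from dataclasses import dataclass
    from hashlib import sha256

    @dataclass(frozen=True)
    class FileContentKey:
        """Hypothetical stand-in for a (relative_path, digest) natural key."""
        relative_path: str
        digest: str

    data = b"identical bytes served by both fixtures"
    digest = sha256(data).hexdigest()

    # The same bytes served under two different layouts yield two different keys,
    # i.e. two different content units, so a "different layout, same content"
    # fixture pair is not really possible.
    a = FileContentKey("files/1.iso", digest)
    b = FileContentKey("mirror/1.iso", digest)
    assert a != b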

Contributor Author:

@bmbouter Test added

Member:

Thank you! +1

30 changes: 20 additions & 10 deletions pulpcore/plugin/stages/artifact_stages.py
@@ -276,13 +276,11 @@ async def run(self):
The coroutine for this stage.
"""
async for batch in self.batches():
await sync_to_async(RemoteArtifact.objects.bulk_get_or_create)(
await self._needed_remote_artifacts(batch)
)
await self._handle_remote_artifacts(batch)
for d_content in batch:
await self.put(d_content)

async def _needed_remote_artifacts(self, batch):
async def _handle_remote_artifacts(self, batch):
"""
Build a list of only :class:`~pulpcore.plugin.models.RemoteArtifact` that need
to be created for the batch.
@@ -318,7 +316,8 @@ async def _needed_remote_artifacts(self, batch):
#
# We can end up with duplicates (diff pks, same sha256) in the sequence below,
# so we store by-sha256 and then return the final values
needed_ras = {} # { str(<sha256>): RemoteArtifact, ... }
ras_to_create = {} # { str(<sha256>): RemoteArtifact, ... }
ras_to_update = {}
for d_content in batch:
for d_artifact in d_content.d_artifacts:
if not d_artifact.remote:
@@ -379,14 +378,25 @@
async for remote_artifact in sync_to_async_iterable(
content_artifact._remote_artifact_saver_ras
):
if remote_artifact.remote_id == d_artifact.remote.pk:
if d_artifact.url == remote_artifact.url:
break

if d_artifact.remote.pk == remote_artifact.remote_id:
key = f"{content_artifact.pk}-{remote_artifact.remote_id}"
remote_artifact.url = d_artifact.url
Contributor Author:

TL;DR:

If the relative path of the artifact within the repo has changed, or if the base path of the repo has changed, we update the RemoteArtifact to match the new URL
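A small illustration of the two cases, with made-up URLs: a RemoteArtifact stores the full URL, so a change to either the base path of the repository or the relative path of the file inside it yields a new d_artifact.url that no longer matches the stored one.

    from urllib.parse import urljoin

    stored = urljoin("https://fixtures.example/repoA/", "files/1.iso")

    # Case 1: the base path of the repository changed.
    base_changed = urljoin("https://fixtures.example/repoA-copy/", "files/1.iso")

    # Case 2: the relative path of the artifact inside the repository changed.
    path_changed = urljoin("https://fixtures.example/repoA/", "packages/1/1.iso")

    # Either way, the URL stored on the RemoteArtifact is stale and must be updated.
    assert stored != base_changed
    assert stored != path_changed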

ipanova (Member), Sep 20, 2021:

Here are some specific examples of how this problem can affect plugins:

  1. RPM plugin: part of remote_artifact.url is composed of the location_href specified in the repodata. That location_href can change in the remote repo; on the next sync no new RemoteArtifacts are created because of the uniqueness constraint, so the old RemoteArtifacts keep pointing to an invalid URL and the user gets a 404.
  2. File plugin: create and sync a local_repoA with the on_demand policy from https://remote.repos.com/repoA/PULP_MANIFEST, then update remote.url to point to https://remote.repos.com/exact_copy_of_repoA/PULP_MANIFEST. Imagine that remote.repos.com/repoA has been removed or broken and only remote.repos.com/exact_copy_of_repoA is left. Re-sync local_repoA: users will get 404s afterwards, because remote_artifact.url still points to the old, unavailable URL and no new RemoteArtifacts are created.
  3. Container plugin: similar to the File plugin workflow, except it is the URL of the registry that would change.

I'd prefer to change the uniqueness constraint of RemoteArtifact so that it also contains the url in addition to content_artifact and remote; however, for the reasons @dralley outlined, that change would not be backportable, so the approach he has taken makes sense. (A rough sketch of that alternative follows after this diff.)

ras_to_update[key] = remote_artifact
break
else:
remote_artifact = self._create_remote_artifact(d_artifact, content_artifact)
key = f"{str(content_artifact.pk)}-{str(d_artifact.remote.pk)}"
needed_ras[key] = remote_artifact

return list(needed_ras.values())
key = f"{content_artifact.pk}-{d_artifact.remote.pk}"
ras_to_create[key] = remote_artifact

if ras_to_create:
await sync_to_async(RemoteArtifact.objects.bulk_create)(list(ras_to_create.values()))
if ras_to_update:
await sync_to_async(RemoteArtifact.objects.bulk_update)(
list(ras_to_update.values()), fields=["url"]
)

@staticmethod
def _create_remote_artifact(d_artifact, content_artifact):
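As a side note on the alternative @ipanova mentions above (widening the RemoteArtifact uniqueness constraint instead of updating URLs in place), here is a minimal, hypothetical sketch of what that schema might have looked like; it is not part of this PR, and the field list is reduced to the ones discussed in the thread.

    # Hypothetical models.py sketch only; NOT what this PR implements.
    from django.db import models

    class RemoteArtifactSketch(models.Model):
        """Illustrates widening the natural key to include the url."""

        content_artifact = models.ForeignKey("core.ContentArtifact", on_delete=models.CASCADE)
        remote = models.ForeignKey("core.Remote", on_delete=models.CASCADE)
        url = models.TextField()

        class Meta:
            # With url in the key, a re-sync against a relocated repo could create a
            # second RemoteArtifact row instead of silently skipping it; the downside
            # is a schema migration that cannot be backported.
            unique_together = ("content_artifact", "remote", "url")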
7 changes: 7 additions & 0 deletions pulpcore/tests/functional/api/using_plugin/constants.py
@@ -47,6 +47,13 @@
FILE_FIXTURE_URL = urljoin(PULP_FIXTURES_BASE_URL, "file/")
"""The URL to a file repository."""

FILE_FIXTURE_WITH_MISSING_FILES_URL = urljoin(PULP_FIXTURES_BASE_URL, "file-manifest/")
"""The URL to a file repository with missing files."""

FILE_FIXTURE_WITH_MISSING_FILES_MANIFEST_URL = urljoin(
FILE_FIXTURE_WITH_MISSING_FILES_URL, "PULP_MANIFEST"
)
"""The URL to a file repository with missing files manifest."""

FILE_CHUNKED_FIXTURE_URL = urljoin(PULP_FIXTURES_BASE_URL, "file-chunked/")
"""The URL to a file repository."""
@@ -5,7 +5,7 @@
from urllib.parse import urljoin

from pulp_smash import api, config, utils
from pulp_smash.pulp3.bindings import delete_orphans
from pulp_smash.pulp3.bindings import delete_orphans, monitor_task, PulpTestCase
from pulp_smash.pulp3.constants import ON_DEMAND_DOWNLOAD_POLICIES
from pulp_smash.pulp3.utils import (
download_content_unit,
@@ -16,16 +16,26 @@
)
from requests import HTTPError

from pulpcore.client.pulp_file import (
PublicationsFileApi,
RemotesFileApi,
RepositoriesFileApi,
RepositorySyncURL,
DistributionsFileApi,
)
from pulpcore.tests.functional.api.using_plugin.constants import (
FILE_CONTENT_NAME,
FILE_DISTRIBUTION_PATH,
FILE_FIXTURE_URL,
FILE_FIXTURE_MANIFEST_URL,
FILE_FIXTURE_WITH_MISSING_FILES_MANIFEST_URL,
FILE_REMOTE_PATH,
FILE_REPO_PATH,
)
from pulpcore.tests.functional.api.using_plugin.utils import (
create_file_publication,
gen_file_remote,
gen_file_client,
)
from pulpcore.tests.functional.api.using_plugin.utils import ( # noqa:F401
set_up_module as setUpModule,
@@ -105,3 +115,70 @@ def test_content_remote_delete(self):
).hexdigest()

self.assertEqual(pulp_hash, fixtures_hash)


class RemoteArtifactUpdateTestCase(PulpTestCase):
@classmethod
def setUpClass(cls):
"""Clean out Pulp before testing."""
delete_orphans()
client = gen_file_client()
cls.repo_api = RepositoriesFileApi(client)
cls.remote_api = RemotesFileApi(client)
cls.publication_api = PublicationsFileApi(client)
cls.distributions_api = DistributionsFileApi(client)
cls.cfg = config.get_config()

def tearDown(self):
"""Clean up Pulp after testing."""
self.doCleanups()
delete_orphans()

def test_remote_artifact_url_update(self):
"""Test that downloading on_demand content works after a repository layout change."""

FILE_NAME = "1.iso"

# 1. Create a remote, repository and distribution - remote URL has links that should 404
remote_config = gen_file_remote(
policy="on_demand", url=FILE_FIXTURE_WITH_MISSING_FILES_MANIFEST_URL
)
remote = self.remote_api.create(remote_config)
self.addCleanup(self.remote_api.delete, remote.pulp_href)

repo = self.repo_api.create(gen_repo(autopublish=True, remote=remote.pulp_href))
self.addCleanup(self.repo_api.delete, repo.pulp_href)

body = gen_distribution(repository=repo.pulp_href)
distribution_response = self.distributions_api.create(body)
created_resources = monitor_task(distribution_response.task).created_resources
distribution = self.distributions_api.read(created_resources[0])
self.addCleanup(self.distributions_api.delete, distribution.pulp_href)

# 2. Sync the repository, verify that downloading artifacts fails
repository_sync_data = RepositorySyncURL(remote=remote.pulp_href)

sync_response = self.repo_api.sync(repo.pulp_href, repository_sync_data)
monitor_task(sync_response.task)

with self.assertRaises(HTTPError):
download_content_unit(self.cfg, distribution.to_dict(), FILE_NAME)

# 3. Update the remote URL with one that works, sync again, check that downloading
# artifacts works.
update_response = self.remote_api.update(
remote.pulp_href, gen_file_remote(policy="on_demand", url=FILE_FIXTURE_MANIFEST_URL)
)
monitor_task(update_response.task)

sync_response = self.repo_api.sync(repo.pulp_href, repository_sync_data)
monitor_task(sync_response.task)

content = download_content_unit(self.cfg, distribution.to_dict(), FILE_NAME)
pulp_hash = hashlib.sha256(content).hexdigest()

fixtures_hash = hashlib.sha256(
utils.http_get(urljoin(FILE_FIXTURE_URL, FILE_NAME))
).hexdigest()

self.assertEqual(pulp_hash, fixtures_hash)