Fix a bug in the Archivematica transfer monitor #117

Merged: alexwlchan merged 2 commits into main from look-at-more-xml-files on Dec 5, 2022

alexwlchan
Contributor

The transfer monitor provides end-to-end testing of Archivematica – it looks at the files uploaded to the S3 bucket as Archivematica inputs, and checks to see if they've successfully reached the storage service. It does that by looking for the METS file in the reporting cluster. Unfortunately, the name of that file is structured in a way that makes it slightly difficult to query:

data/objects/submissionDocumentation/transfer-ARTCOOB14-46f0a350-d18d-4123-a162-70589bf9786e/METS.xml

So the monitor has to query the list of storage service files in an approximate way:

SELECT files
WHERE externalIdentifier={archivematicaIdentifier}
AND suffix=".xml"

which can fail if a bag contains lots of XML files and the METS file isn't in the first page of results.
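
To illustrate the failure mode, here is a minimal sketch of picking the METS file out of one page of results. The regular expression, the function name, and the `catalogue.xml` entry are made up for this example; only the METS path comes from the description above.

```python
import re

# Hypothetical pattern: matches a METS file under submissionDocumentation,
# whatever transfer name and UUID appear in the directory name.
METS_PATTERN = re.compile(
    r"^data/objects/submissionDocumentation/transfer-[^/]+/METS\.xml$"
)


def find_mets_file(file_names):
    """Return the first name that looks like the transfer METS file, or None."""
    for name in file_names:
        if METS_PATTERN.match(name):
            return name
    return None


# One "page" of XML file names (illustrative data). The METS file can only
# be found if it happens to be included in this page.
first_page = [
    "data/objects/catalogue.xml",
    "data/objects/submissionDocumentation/"
    "transfer-ARTCOOB14-46f0a350-d18d-4123-a162-70589bf9786e/METS.xml",
]

mets = find_mets_file(first_page)
```

If the bag holds more XML files than fit in one page, the METS entry may be missing from `first_page`, and `find_mets_file` returns `None` even though the transfer succeeded.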

We can fix this in the storage service, but we haven't done that work yet. See wellcomecollection/storage-service#1028

This PR raises the result window from 100 to 10,000 results, so the METS file is much more likely to appear in that first page. If that still isn't enough, we can go back to the storage service fix.
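
As a rough sketch of the change, assuming the reporting cluster accepts an Elasticsearch-style search body (the description does not say which query API the monitor uses): the field names `externalIdentifier` and `suffix` are taken from the pseudo-query above, while the function name, the `result_window` parameter, and the exact query shape are hypothetical.

```python
def build_file_query(archivematica_identifier, result_window=10_000):
    """Build a search body for the XML files in one bag.

    Assumption: an Elasticsearch-style body where "size" sets how many
    results come back in the first page.
    """
    return {
        "size": result_window,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"externalIdentifier": archivematica_identifier}},
                    {"term": {"suffix": ".xml"}},
                ]
            }
        },
    }


# "example-id" is a placeholder identifier, not a real transfer.
old_query = build_file_query("example-id", result_window=100)
new_query = build_file_query("example-id")
```

Only the page size differs between `old_query` and `new_query`; the filter itself is still approximate, which is why the storage service fix remains the longer-term option.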


Currently the two stacks are fighting over whether this should be tagged
as prod or staging, which creates unnecessary diff churn. Create a new rule
per stack to avoid the churn.

@alexwlchan alexwlchan merged commit 3dbaa52 into main Dec 5, 2022
@alexwlchan alexwlchan deleted the look-at-more-xml-files branch December 5, 2022 08:55