Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address liteary form control field bug #208

Merged
merged 1 commit into from
Nov 1, 2024
Merged

Conversation

ghukill
Copy link
Contributor

@ghukill ghukill commented Oct 31, 2024

Purpose and background context

MARC record alma:9935219630906761 has come through Transmogrifier a handful of times where control field 008 is not 34 characters, and therefore throws an error when we attempt to grab the 33rd character. This stops the Transmogrifier run completely.

How this addresses that need:

  • If control field 008 is < 34 chars, return None for literary form

Side effects of this change:

  • Transmogrifier runs for large swaths of files, e.g. full runs, will not catastrophically fail for this single bug.
  • This might suggest another layer of error handling to avoid full run failure for a single record, will create at ticket for this.

How can a reviewer manually see the effects of these changes?

New test test_get_literary_form_returns_none_if_control_field_too_short tests this edge case.

If interested, the file in production S3 s3://timdex-extract-prod-300442551476/alma/alma-2023-08-18-daily-extracted-records-to-index.xml is known to contain this record and has thrown this error in the past. Running transmogrifier locally against this file should not fail anymore:

pipenv run transform --verbose \
-s alma \
-i s3://timdex-extract-prod-300442551476/alma/alma-2023-08-18-daily-extracted-records-to-index.xml \
-o output/nofail.json

In the output, you should be able to search for string "could not parse literary form" in the verbose debug output and find this record received the value None.

Includes new or updated dependencies?

NO

Changes expectations for external applications?

YES: reduced Transmogrifier run failures

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed and verified
  • New dependencies are appropriate or there were no changes

Why these changes are being introduced:

There is at least one MARC record that comes through
where control field 008 is not 34 characters, and therefore
was throwing an error when we attempt to grab the 33rd character.

How this addresses that need:
* If control field 008 is < 34 chars, return None for literary form

Side effects of this change:
* Transmogrifier runs for large swaths of files, e.g. full runs, will
not catastrophically fail for this single bug.
* This might suggest another layer of error handling to avoid full
run failure for a single record, will create at ticket for this.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-355
@ghukill ghukill force-pushed the TIMX-355-control-field-index branch from 9d99e85 to 775458b Compare October 31, 2024 15:34
@ghukill ghukill marked this pull request as ready for review October 31, 2024 15:38
Copy link
Contributor

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great fix!

@ghukill ghukill merged commit 3f77c7c into main Nov 1, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants