Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update post-transform methods to handle None values #147

Merged
merged 1 commit into from
Mar 12, 2024

Conversation

jonavellecuerdo
Copy link
Contributor

@jonavellecuerdo jonavellecuerdo commented Mar 12, 2024

Purpose and background context

An error was encountered when trying to run the latest updates to the transformer against the full gisogm source:

File "/Users/jcuerdo/Documents/repos/transmogrifier/transmogrifier/sources/transformer.py", line 256, in _transform
    fields = self.create_locations_from_spatial_subjects(fields)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jcuerdo/Documents/repos/transmogrifier/transmogrifier/sources/transformer.py", line 417, in create_locations_from_spatial_subjects
    if subject_location not in fields.setdefault("locations", []):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: argument of type 'NoneType' is not iterable

When a field method returns None values (resulting from or None statements), the fields dictionary will be populated with a key-value pair formatted as <field_name>: None. The fields.setdefault(<field_name>, []) method will only set values to an empty list if the key is not present. The solution proposed is to have post-transform methods (a) check if the value for a field in fields is None and (b) if None, set the value to an empty list.

How can a reviewer manually see the effects of these changes?

  1. Temporarily set AWS credentials for TimdexManagers for Dev1 in your terminal.

  2. Run transform on full set of gisogm records:

    • Run the following command from your local clone of transmogrifier:
      pipenv run transform -i s3://timdex-extract-dev-222053980223/gisogm/gisogm-2024-03-01-full-extracted-records-to-index.jsonl -o output/output_gisogm.json -s gisogm -v
      
  3. Verify completion without error!

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed and verified
  • New dependencies are appropriate or there were no changes

Why these changes are being introduced:
* When a field method returns "None", the resulting "fields" dictionary will
set a key-value pair, where key = <field_name> and value = None. This will result
in the .setdefault() method being ignored during the post-transform methods
as it will only set the value in the "fields" dictionary to an empty list if
the key is not present.

How this addresses that need:
* Check if field methods that are updated by post-transform methods are
already set to None. If None, set to an empty list.

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/GDT-217
* https://mitlibraries.atlassian.net/browse/GDT-210
@jonavellecuerdo jonavellecuerdo self-assigned this Mar 12, 2024
@jonavellecuerdo jonavellecuerdo marked this pull request as ready for review March 12, 2024 14:26
Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved!

I like the added tests, specifically that they test these new "enrichment" (name still TBD) methods specifically.

And as mentioned in discussions, I think this is a good solution given our current architecture, and we can revisit (if needed) in any future refactorings.

Copy link
Contributor

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great tests!

@jonavellecuerdo jonavellecuerdo merged commit dd2c4ac into main Mar 12, 2024
5 checks passed
@jonavellecuerdo jonavellecuerdo deleted the GDT-217-handle-none-values branch March 12, 2024 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants