Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gdt 82 transformer class refactor #106

Merged
merged 9 commits into from
Nov 27, 2023
Merged

Conversation

ehanson8
Copy link
Contributor

Helpful background context

The addition of the Transformer base class and associated refactoring will be followed by a new JsonTransformer format class that will be used for upcoming aardvark transforms.

How can a reviewer manually see the effects of these changes?

Run a transform on a test fixture to ensure that the base class and associated refactor hasn't broken the application

pipenv run transform -i tests/fixtures/ead/ead_record_all_fields.xml -o output/aspace-records.json -s aspace

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer

  • The commit message is clear and follows our guidelines
    (not just this pull request message)
  • There are appropriate tests covering any new functionality
  • The documentation has been updated or is unnecessary
  • The changes have been verified
  • New dependencies are appropriate or there were no changes

Includes new or updated dependencies?

YES

* Rename Transformer > XmlTransformer to prepare for new base class
* Rename xml arg to source_record for XmlTransformer methods
Why these changes are being introduced:
* A format-agnostic Transformer base class is needed for deriving both XmlTransformer and JsonTransformer format classes

How this addresses that need:
* Create a Transformer base class
* Add JSON type for validation
* Refactor XmlTransformer to derive from Transformer class
* Rename arg xml > source_record and update docstrings

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/GDT-82
Why these changes are being introduced:
* parse_xml_records is a function in the helpers module that will be replaced by a method in the Transformer base class

How this addresses that need:
* Add parse_source_records method to Transformer base class as an abstractmethod
* Add parse_source_records to XmlTransformer class and corresponding unit test

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/GDT-82
* Remove parse_xml_records function from helpers module and corresponding unit test
* Replace all calls of that function with the appropriate class method
* Shift get_transformer from config module to Transformer class staticmethod along with corresponding unit tests
* Update CLI to call new method
Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think this looks fantastic.

The requested changes are mostly quite small, just some rough edges from a fairly big refactoring process.

And, acknowledging per conversations that some more changes may be coming related to getting the appropriate transformer class type and parsing input records.

transmogrifier/helpers.py Outdated Show resolved Hide resolved
transmogrifier/sources/transformer.py Outdated Show resolved Hide resolved
transmogrifier/sources/transformer.py Outdated Show resolved Hide resolved
transmogrifier/sources/transformer.py Outdated Show resolved Hide resolved
tests/test_datacite.py Show resolved Hide resolved
transmogrifier/helpers.py Show resolved Hide resolved
transmogrifier/cli.py Outdated Show resolved Hide resolved
transmogrifier/sources/transformer.py Show resolved Hide resolved
* Remove instance from transformer_instance variable name
* Update write_timdex_records_to_json type hinting
* Update get_transformer type hinting
* Update docstrings
* Add load and write_timdex_records_to_json methods to Transformer class
* Remove write_timdex_records_to_json function from helpers module
* Update CLI command with new methods
* Update variable names and docstrings for consistency
Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved!

I did have one question, but perhaps this is slated for a future PR. Any reason to not move helpers.write_deleted_records_to_file() (link) to a Transformer method as well?

If so, I think it calls into question the nature of the method refactored method Transformer.write_timdex_records_to_json().

I wonder if a higher level method like Transformer.write_output_files() could encapsulate the logic and processing of both? It could:

  • write all normal records to output_file
  • if deleted records, rename output_file and write
  • log all information that was formerly in CLI

Still marking as approved, as I think this PR represents a big and good change, but just floating that idea for future PRs.

@ehanson8
Copy link
Contributor Author

  • write all normal records to output_file
  • if deleted records, rename output_file and write
  • log all information that was formerly in CLI

Exactly what I was thinking, saving for a future PR since this was already so big!

@ehanson8 ehanson8 merged commit ee4da86 into main Nov 27, 2023
3 of 5 checks passed
@ehanson8 ehanson8 deleted the gdt-82-transformer-class-refactor branch April 23, 2024 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants