Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timx 285 aardvark fmr #198

Merged
merged 4 commits into from
Jul 12, 2024
Merged

Timx 285 aardvark fmr #198

merged 4 commits into from
Jul 12, 2024

Conversation

ehanson8
Copy link
Contributor

Purpose and background context

Refactors the first set of Aardvark field methods to align with the conventions developed since this class was initially created.

How can a reviewer manually see the effects of these changes?

Run the following command to see that the Aardvark class transform still transforms a source file:

pipenv run transform -o output/gismit.json -i  tests/fixtures/aardvark_record_all_fields.jsonl -s gismit

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed and verified
  • New dependencies are appropriate or there were no changes

* Reorder Aardvark unit tests to match method order
* Add get_timdex_record_id unit test
* Remove unnecessary non-field method unit test
Why these changes are being introduced:
* Update Aardvark class to use current field method conventions

How this addresses that need:
* Add field methods and associated private methods for alternate_titles, content_type, contributors, and dates
* Update unit tests with current conventions

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-285
def test_aardvark_get_alternate_titles_success(aardvark_record_all_fields):
assert MITAardvark.get_alternate_titles(next(aardvark_record_all_fields)) == [
def test_aardvark_get_alternate_titles_success(aardvark_records):
source_record = next(aardvark_records)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stub records are a lot different with JSON vs. XML so I don't fully approve of this particular approach but figure we would discuss in this PR before I go further. All suggestions welcome so we can properly set our JSON source testing conventions going forward!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd agree that JSON records are easier to manipulate, given that once was parsed it's just a dictionary we can easily modify.

Instead of using the overhead of an iterator and next(...), what about a similar stub method -- for symmetry -- that just returns a barebones dictionary? Then the test is free to add/remove keys as needed.

While I think a test could just do:

source_record = {}

The benefit of a stub might be that we could set some shared defaults, maybe bits and bobs that we know are commonly accessed. But at the end of the day, it is just a dictionary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct about the value of symmetry, I'll create a method!

Comment on lines +211 to +214
def _issued_dates(cls, source_record: dict) -> Iterator[timdex.Date]:
if issued_date := source_record.get("dct_issued_s"): # noqa: SIM102
if validate_date(issued_date, cls.get_source_record_id(source_record)):
yield (timdex.Date(value=issued_date, kind="Issued"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm shifting the date validation before the Date instance is created, which seems more inline with how we're doing it in other transforms but again, I want to hear everyone's thoughts on this!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this pattern! This is very easy to parse and understand. With a nod to the oustanding date parsing issue, I think this looks great.

Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than a couple of conversational comments, overall looks great! Thinks it's a testament to the quality of the original pass.

def test_aardvark_get_alternate_titles_success(aardvark_record_all_fields):
assert MITAardvark.get_alternate_titles(next(aardvark_record_all_fields)) == [
def test_aardvark_get_alternate_titles_success(aardvark_records):
source_record = next(aardvark_records)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd agree that JSON records are easier to manipulate, given that once was parsed it's just a dictionary we can easily modify.

Instead of using the overhead of an iterator and next(...), what about a similar stub method -- for symmetry -- that just returns a barebones dictionary? Then the test is free to add/remove keys as needed.

While I think a test could just do:

source_record = {}

The benefit of a stub might be that we could set some shared defaults, maybe bits and bobs that we know are commonly accessed. But at the end of the day, it is just a dictionary.

Comment on lines +211 to +214
def _issued_dates(cls, source_record: dict) -> Iterator[timdex.Date]:
if issued_date := source_record.get("dct_issued_s"): # noqa: SIM102
if validate_date(issued_date, cls.get_source_record_id(source_record)):
yield (timdex.Date(value=issued_date, kind="Issued"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this pattern! This is very easy to parse and understand. With a nod to the oustanding date parsing issue, I think this looks great.

* Add create_aardvark_source_record_stub function
* Update unit tests to call stub function
* Refactor Aardvark.record_is_deleted method and update corresponding unit test
@ehanson8 ehanson8 requested a review from ghukill July 11, 2024 19:01
@ehanson8
Copy link
Contributor Author

@ghukill @jonavellecuerdo Pushed a new commit and re-requesting review

Comment on lines +10 to +11
def create_aardvark_source_record_stub() -> dict:
return {"id": "123", "dct_title_s": "Test title 1"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines -105 to +110
if isinstance(source_record["gbl_suppressed_b"], bool):
if isinstance(source_record.get("gbl_suppressed_b"), bool):
return source_record["gbl_suppressed_b"]

message = (
f"Record ID '{cls.get_source_record_id(source_record)}': "
"'gbl_suppressed_b' value is not a boolean"
"'gbl_suppressed_b' value is not a boolean or missing"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice updates here! Good catch.

@ehanson8 ehanson8 merged commit ec77d65 into main Jul 12, 2024
5 checks passed
@ehanson8 ehanson8 deleted the TIMX-285-aardvark-fmr branch July 12, 2024 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants