-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Timx 287 ead field method refactor 2 #187
Conversation
def test_ead_transform_with_multiple_unitid_gets_valid_ids(): | ||
ead_xml_records = Ead.parse_source_file( | ||
"tests/fixtures/ead/ead_record_attribute_and_subfield_variations.xml" | ||
) | ||
output_record = next(Ead("aspace", ead_xml_records)) | ||
for identifier in output_record.identifiers: | ||
assert identifier.value != "unitid-that-should-not-be-identifier" | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ehanson8 Related to #185 (comment). I removed the test and in its place are the tests for the new get_identifiers
field method. 🤓
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks great! I'm requesting the dreaded changes just for consistency in the positional/keyword usage. It's ultimately not that big of a deal, but while we're on the topic, seems worth mentioning.
source_record_id = cls.get_source_record_id(source_record) | ||
dates.extend( | ||
cls._parse_date_elements(collection_description_did, source_record_id) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we're not addressing this now, but just want to continue to note that pulling the source_record_id
, just so it can be logged for bad date parsing, feels awkward.
Apologies if an issue or note exists elsewhere, but just in case, created this one: #188.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed and thanks for creating the issue!
def test_get_contributors_success(): | ||
source_record = create_ead_source_record_stub( | ||
metadata_insert=( | ||
""" | ||
<origination label="Creator"> | ||
<persname> | ||
Author, Best E. | ||
<part>( <emph> Best <emph>Ever</emph> </emph> )</part> | ||
</persname> | ||
</origination> | ||
""" | ||
), | ||
parent_element="did", | ||
) | ||
assert Ead.get_contributors(source_record) == [ | ||
timdex.Contributor(value="Author, Best E. ( Best Ever )", kind="Creator") | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a random example, but really digging these tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great but a few suggestions!
] | ||
|
||
|
||
def test_get_dates_success(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a non-range date value as well so we're testing everything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change has been made in commit 95b6b47)
transmogrifier/sources/xml/ead.py
Outdated
# get valid date ranges or dates | ||
if "/" in date_value: | ||
gte_date, lte_date = date_value.split("/") | ||
if gte_date != lte_date: | ||
if validate_date_range( | ||
gte_date, | ||
lte_date, | ||
source_record_id, | ||
): | ||
date_instance.range = timdex.DateRange( | ||
gte=gte_date, | ||
lte=lte_date, | ||
) | ||
else: | ||
# get valid date (if dates in ranges are the same) | ||
date_string = gte_date | ||
if validate_date( | ||
date_string, | ||
source_record_id, | ||
): | ||
date_instance.value = date_string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be broken out as a separate method to improve readability, Datacite
has a _parse_date_range
method that would be very similar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please review the changes in latest commit!
29e6d6e
to
95b6b47
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Approved.
elif validate_date(date_value, source_record_id): | ||
date_instance.value = date_value | ||
if "/" in date_string: | ||
date_instance = cls._parse_date_range(date_string, source_record_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a nice change. I almost commented in the previous PR that this field method was a bit unwieldy, but struggling a bit with how much of that refactoring -- particularly around dates -- to suggest in this first pass of just getting to a field method approach.
Either way, much easier to scan and understand now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, great work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates!
elif validate_date(date_value, source_record_id): | ||
date_instance.value = date_value | ||
if "/" in date_string: | ||
date_instance = cls._parse_date_range(date_string, source_record_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, great work!
def _parse_date_range(cls, date_string: str, source_record_id: str) -> timdex.Date: | ||
date_instance = timdex.Date() | ||
gte_date, lte_date = date_string.split("/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly different approach from Datacite
to instantiate a separate date instance in the method rather than pass the existing one from get_dates
but as Graham said we'll certainly be evaluating and hopefully centralizing a lot of the date parsing functionality from out of the source transforms. We can decide on the preferred approach then!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
* Add field methods and corresponding unit tests: contents, contributors, dates, and identifiers * Update generate_name_identifier_url * Clarify that it is a private method associated with 'get_contributors' * Update syntax for 'create_string_*' and 'create_list_*' methods to use keyword args
95b6b47
to
9a736b9
Compare
Purpose and background context
Field method refactor for transform class Ead (Part 2).
contents
,contributors
,dates
,and
identifiers
.get_dates
How can a reviewer manually see the effects of these changes?
make test
and verify all unit tests are passing.Includes new or updated dependencies?
NO
Changes expectations for external applications?
NO
What are the relevant tickets?
Developer
Code Reviewer(s)