-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add 1st set of DspaceDim field methods #194
Conversation
Why these changes are being introduced: * Refactor DspaceDim to use field methods How this addresses that need: * Add create_dspace_dim_source_record_stub function to DspaceDim test module * Rename param xml > source_record * Add field methods and associated private methods for alternate_titles, citation, and content_type * Refactor Whoas.get_content_type method to raise exception for more explicit flow control * Add unit tests for new field methods Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/TIMX-282
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All in all, I think it looks great! Left a couple comments about yielding + list extending, but I'm finding these really easy to parse and reason about. Nice work on the changes, and the PR chunking.
yield timdex.Contributor( | ||
value=str(creator.string), | ||
kind="Creator", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious (and same for _get_contributors...()
below), why the yield
?
On one hand, I kind of like it: no need for list comprehensions, just yield Contributor
objects as they are found. On the other hand: feels a bit different than other places?
FWIW, I do kind of like the yield
? Wonder -- not pressing by any means -- if opportunity to move in this direction for other methods as we encounter them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually copying the pattern from Datacite
, I believe we've used it in a few places but we could check for consistency when this refactor is done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, those might make for a couple of nice PRs when the first pass is over: "Normalize yield..", "Normalize BS4 string casting...", etc. Though I think by and large things are pretty good.
alternate_titles = [ | ||
timdex.AlternateTitle( | ||
value=str(alternate_title.string), | ||
kind=alternate_title["qualifier"], | ||
) | ||
for alternate_title in source_record.find_all( | ||
"dim:field", element="title", string=True | ||
) | ||
if alternate_title.get("qualifier") | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether we ultimately keep BS4, or move into Xpath/lxml, I'm finding this pattern increasingly easy to read.
kind="coverage", | ||
) | ||
else: | ||
yield timdex.Date(note=coverage_value, kind="coverage") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, more yields
! Maybe I just missed them in other transforms. Digging this pattern.
timdex.Contributor(value="Jamerson, James", kind="Creator"), | ||
timdex.Contributor( | ||
value="LaFountain, James R.", | ||
kind="author", | ||
), | ||
timdex.Contributor( | ||
value="Oldenbourg, Rudolf", | ||
kind="author", | ||
), | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting now, but don't think we should make these changes in this refactor, but this feels like a good opportunity at some point to normalized kind
values. Seeing Creator
(title cased) and author
(lowercase). I know author
comes from the XML itself, and it's always risky to programatically convert case.
Maybe normalization is some kind of post-processing / post record transform? Again, not needed now, just noting for the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was almost certain we had a Jira ticket for this but we don't, I'll create an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for creating the ticket. Maybe when this field method refactor is done, before we launch into orchestration, we could meet and survey the tickets created? Get a feel for anything we may want to also tackle during the orchestration? or after?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before sounds like a good idea!
if cls.valid_content_types(content_types): | ||
return content_types | ||
message = f'Record skipped based on content type: "{content_types}"' | ||
raise SkippedRecordEvent(message, cls.get_source_record_id(source_record)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
* Remove list() from .extend calls
@ghukill Had to add some type hinting in |
Ahhh, didn't think of that. Please feel free to disregard the removal of |
I prefer it without |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
|
||
@classmethod | ||
def get_contributors(cls, source_record: Tag) -> list[timdex.Contributor] | None: | ||
contributors: list[timdex.Contributor] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it. I'm also a little surprised that mypy needed this to understand what what was coming from those methods which are both clearly typed.
yield timdex.Contributor( | ||
value=str(creator.string), | ||
kind="Creator", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, those might make for a couple of nice PRs when the first pass is over: "Normalize yield..", "Normalize BS4 string casting...", etc. Though I think by and large things are pretty good.
Purpose and background context
Refactors the first set of
DspaceDim
fields as separate field methods.How can a reviewer manually see the effects of these changes?
Run the following command to see that the
DspaceDim
transform still transforms a source file and skips an invalid content type according to theWhoas
transform:Includes new or updated dependencies?
NO
Changes expectations for external applications?
NO
What are the relevant tickets?
Developer
Code Reviewer(s)