-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Timx 232 springshare ids #101
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -117,11 +117,16 @@ def get_required_fields(self, xml: Tag) -> dict: | |
xml: A BeautifulSoup Tag representing a single OAI-PMH XML record. | ||
""" | ||
source_record_id = self.get_source_record_id(xml) | ||
|
||
# run methods to generate required fields | ||
source_link = self.get_source_link(self.source_base_url, source_record_id, xml) | ||
timdex_record_id = self.get_timdex_record_id(self.source, source_record_id, xml) | ||
title = self.get_valid_title(source_record_id, xml) | ||
|
||
return { | ||
"source": self.source_name, | ||
"source_link": self.source_base_url + source_record_id, | ||
"timdex_record_id": f"{self.source}:{source_record_id.replace('/', '-')}", | ||
"source_link": source_link, | ||
"timdex_record_id": timdex_record_id, | ||
"title": title, | ||
} | ||
|
||
|
@@ -180,7 +185,7 @@ def get_valid_title(cls, source_record_id: str, xml: Tag) -> str: | |
source_record_id, | ||
all_titles, | ||
) | ||
if all_titles and type(all_titles[0]) == str: | ||
if all_titles and isinstance(all_titles[0], str): | ||
title = all_titles[0] | ||
elif all_titles and all_titles[0].string: | ||
title = all_titles[0].string | ||
|
@@ -191,3 +196,41 @@ def get_valid_title(cls, source_record_id: str, xml: Tag) -> str: | |
) | ||
title = "Title not provided" | ||
return title | ||
|
||
@classmethod | ||
def get_source_link( | ||
cls, source_base_url: str, source_record_id: str, xml: Tag | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nice catch, and I had also paused on this. I think if these are getting decoupled with the understanding transformers may override them, it's plausible they might need the document XML, so I included here. Because the base transform calls this method, it will always use the same signature, so a sub-class transformer couldn't add We could adopt a See pros and cons both ways. My thinking was to model these after other methods that are designed to overridden, like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But if helpful, we could add the docstring comment:
which There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's just add the comment, I did some more reading and see your point on losing the type hinting |
||
) -> str: | ||
""" | ||
Class method to set the source link for the item. | ||
|
||
May be overridden by source subclasses if needed. | ||
|
||
Default behavior is to concatenate the source base URL + source record id. | ||
|
||
Args: | ||
source_base_url: Source base URL. | ||
source_record_id: Record identifier for the source record. | ||
xml: A BeautifulSoup Tag representing a single XML record. | ||
- not used by default implementation, but could be useful for subclass | ||
overrides | ||
""" | ||
return source_base_url + source_record_id | ||
|
||
@classmethod | ||
def get_timdex_record_id(cls, source: str, source_record_id: str, xml: Tag) -> str: | ||
""" | ||
Class method to set the TIMDEX record id. | ||
|
||
May be overridden by source subclasses if needed. | ||
|
||
Default behavior is to concatenate the source name + source record id. | ||
|
||
Args: | ||
source: Source name. | ||
source_record_id: Record identifier for the source record. | ||
xml: A BeautifulSoup Tag representing a single XML record. | ||
- not used by default implementation, but could be useful for subclass | ||
overrides | ||
""" | ||
return f"{source}:{source_record_id.replace('/', '-')}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: you could call these directly in the dict
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to change if you'd like, and I considered that too. But at the risk of a being a bit more verbose, I thought it helped break it up a bit. There is some important stuff happening here -- establishing record title, primary timdex id, link, etc. -- and I found it helpful to see those steps as distinct statements vs embedded in a dictionary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional is always just a suggestion 🙂