Gdt 116 update locations #113
* Add functionality to record_is_deleted method as well as corresponding unit tests to reflect possible data scenarios
* Add unit tests for write_output_files method now that deleted records are properly processed
* Update aardvark_records fixtures to include deleted records
Why these changes are being introduced:
* OpenSearch can parse the WKT strings that are found in MITAardvark records so the values don't require the previously expected parsing.

How this addresses that need:
* Rename Location.geodata attribute to geoshape as well as update corresponding unit tests and fixtures
* Refactor get_locations method to store WKT strings as well as update corresponding unit tests and fixtures
* Remove parse_geodata_string from helpers.py and corresponding unit tests
* Add default kind value to get_identifiers method

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/GDT-116
Overall, I think it's looking great. I requested changes mostly for the namespacing by way of semicolons (;) in the subjects. I did leave a couple of questions, but nothing very blocking. Nice work getting into the fiddly bits of the transformation!
* Update aardvark fixtures
* Refactor record_is_deleted method and update corresponding unit test
* Update kind values for get_subjects method
Looks good to me! Thanks for the updates here.
timdex.Subject(value=["Country"], kind="DCAT; Keyword"),
timdex.Subject(value=["Political boundaries"], kind="DCAT; Theme"),
timdex.Subject(value=["Some city, Some country"], kind="Dublin Core; Spatial"),
timdex.Subject(value=["Geography"], kind="Dublin Core; Subject"),
timdex.Subject(value=["Earth"], kind="Dublin Core; Subject"),
I'm digging it. I've seen semicolons used for delineating namespaces in free-text values like this. Feels like a decent middle ground.
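To make the "Namespace; Term" convention concrete, here is a minimal sketch of how such kind values might be assigned and later split apart. The Subject dataclass is a stand-in for timdex.Subject, and the field-to-kind mapping, the Aardvark field names, and the split_kind helper are assumptions for illustration, not the code in this pull request.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    """Hypothetical stand-in for timdex.Subject, trimmed to two fields."""

    value: list[str]
    kind: str


# Assumed mapping of Aardvark field names to "Namespace; Term" kind values.
KIND_BY_FIELD = {
    "dcat_keyword_sm": "DCAT; Keyword",
    "dcat_theme_sm": "DCAT; Theme",
    "dct_spatial_sm": "Dublin Core; Spatial",
    "dct_subject_sm": "Dublin Core; Subject",
}


def get_subjects(source_record: dict) -> list[Subject]:
    """Emit one Subject per value, tagged with its namespaced kind."""
    subjects = []
    for field, kind in KIND_BY_FIELD.items():
        for value in source_record.get(field, []):
            subjects.append(Subject(value=[value], kind=kind))
    return subjects


def split_kind(kind: str) -> tuple[str, str]:
    """Split a 'Namespace; Term' string on the first semicolon."""
    namespace, _, term = kind.partition(";")
    return namespace.strip(), term.strip()
```

Because the namespace sits before the first semicolon, a consumer can recover it with a single partition while the value remains readable free text.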
if isinstance(source_record["gbl_suppressed_b"], bool):
    return source_record["gbl_suppressed_b"]
else:
    message = (
        f"Record ID '{cls.get_source_record_id(source_record)}': "
        "'gbl_suppressed_b' value is not a boolean"
    )
    raise ValueError(message)
I think this is good. If we can't tell definitively whether the record is deleted, it raises an error.
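The behavior discussed here can be exercised with a standalone sketch. It mirrors the snippet above as a plain function rather than a classmethod; get_source_record_id and the "id" key are hypothetical stand-ins for whatever the transform actually uses.

```python
def get_source_record_id(source_record: dict) -> str:
    # Hypothetical stand-in: assumes the record ID lives under "id".
    return source_record["id"]


def record_is_deleted(source_record: dict) -> bool:
    """Return the suppression flag, refusing to guess on non-boolean values."""
    if isinstance(source_record["gbl_suppressed_b"], bool):
        return source_record["gbl_suppressed_b"]
    message = (
        f"Record ID '{get_source_record_id(source_record)}': "
        "'gbl_suppressed_b' value is not a boolean"
    )
    raise ValueError(message)


# A string such as "true" is rejected rather than coerced.
try:
    record_is_deleted({"id": "abc:123", "gbl_suppressed_b": "true"})
except ValueError as error:
    print(error)
```

Raising instead of coercing keeps an ambiguous value from silently marking a record as deleted or as live.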
Hiya' Eric! Small request for change re: syntax for BBOX (...) strings in tests to align with the MITAardvark.get_locations() method. It might be safer to include that space between characters in the event that OpenSearch requires it.
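One way to keep fixtures consistent is sketched below, assuming "that space" means the one between the shape keyword and the opening parenthesis. The normalize_wkt_prefix helper is hypothetical and not part of this pull request, and whether OpenSearch actually requires the space is not confirmed here.

```python
import re


def normalize_wkt_prefix(wkt: str) -> str:
    """Ensure exactly one space between the leading keyword and '('."""
    # Only the leading keyword is touched; the coordinate list is left as-is.
    return re.sub(r"^\s*([A-Za-z]+)\s*\(", r"\1 (", wkt)


print(normalize_wkt_prefix("BBOX(-111.1, -104.0, 45.0, 40.9)"))
```

Strings that already contain the space pass through unchanged, so the helper is safe to apply to every fixture value.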
Also had one clarifying question!
Otherwise, I think this is looking great. :)
@@ -75,31 +75,6 @@ def parse_date_from_string(
    return None


def parse_geodata_string(geodata_string: str, source_record_id: str) -> list[float]:
What is the reason for deleting this helper function? Just want to understand. 🤓
Since we're just taking the WKT string as is, given that OpenSearch can handle it, we don't need this function to do the parsing that we originally thought was needed. Does that make sense?
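A rough sketch of what storing the WKT string as is could look like follows. The Location dataclass is a trimmed stand-in for the real model, and the source field names (dcat_bbox, locn_geometry) and kind labels are assumptions rather than values taken from this pull request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    """Hypothetical stand-in for the Location model, trimmed to two fields."""

    kind: Optional[str] = None
    geoshape: Optional[str] = None


def get_locations(source_record: dict) -> list[Location]:
    """Copy WKT strings straight into geoshape without parsing them."""
    locations = []
    # Field names and kind labels are assumed for illustration.
    for field, kind in (("dcat_bbox", "Bounding Box"), ("locn_geometry", "Geometry")):
        wkt = source_record.get(field)
        if wkt:
            locations.append(Location(kind=kind, geoshape=wkt))
    return locations
```

With no float parsing in the transform, any validation of the geometry is left to OpenSearch at index time.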
Helpful background context
Includes several other minor bug fixes related to recently discovered issues with the MITAardvark transform.

How can a reviewer manually see the effects of these changes?
Run the following command to run the transform on the fixture:
Review output/gismit.json to see that WKT values are mapped to locations > geoshape.

What are the relevant tickets?
Developer
Code Reviewer
(not just this pull request message)
Includes new or updated dependencies?
YES