Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gdt 116 update locations #113

Merged
merged 5 commits into from
Jan 22, 2024
Merged

Gdt 116 update locations #113

merged 5 commits into from
Jan 22, 2024

Conversation

ehanson8
Copy link
Contributor

Helpful background context

Includes several other minor bug fixes related to recently discovered issues with the MITAardvark transform.

How can a reviewer manually see the effects of these changes?

Run the following command to run the transform on the fixture:

pipenv run transform -o output/gismit.json -i  tests/fixtures/aardvark/aardvark_record_all_fields.jsonl -s gismit

Review output/gismit.json to see that WKT values are mapped to locations > geoshape.

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer

  • The commit message is clear and follows our guidelines
    (not just this pull request message)
  • There are appropriate tests covering any new functionality
  • The documentation has been updated or is unnecessary
  • The changes have been verified
  • New dependencies are appropriate or there were no changes

Includes new or updated dependencies?

YES

* Add functionality to record_is_deleted method as well as corresponding unit tests to reflect possible data scenarios
* Add unit tests for write_output_files method now that deleted records are properly processed
* Update aardvark_records fixtures to include deleted records
Why these changes are being introduced:
* OpenSearch can parse the WKT strings that are found in MITAardvark records so the values don't require the previously expected parsing.

How this addresses that need:
* Rename Location.geodata attribute to geoshape as well as update corresponding unit tests and fixtures
* Refactor get_locations method to store WKT strings as well as update corresponding unit tests and fixtures
* Remove parse_geodata_string from helpers.py and corresponding unit tests
* Add default kind value to get_identifiers method

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/GDT-116
Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, think it's looking great. I requested changes mostly for the namespacing by way of semicolons ; in the subjects. I did leave a couple questions, but nothing very blocking. Nice work! Getting into the fiddly bits of the transformation.

transmogrifier/sources/json/aardvark.py Outdated Show resolved Hide resolved
tests/conftest.py Show resolved Hide resolved
tests/sources/json/test_aardvark.py Outdated Show resolved Hide resolved
transmogrifier/sources/json/aardvark.py Outdated Show resolved Hide resolved
* Update aardvark fixtures
* Refactor record_is_deleted method and update corresponding unit test
* Update kind values for get_subjects method
@ehanson8 ehanson8 requested a review from ghukill January 17, 2024 18:18
Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thanks for the updates here.

Comment on lines +262 to +266
timdex.Subject(value=["Country"], kind="DCAT; Keyword"),
timdex.Subject(value=["Political boundaries"], kind="DCAT; Theme"),
timdex.Subject(value=["Some city, Some country"], kind="Dublin Core; Spatial"),
timdex.Subject(value=["Geography"], kind="Dublin Core; Subject"),
timdex.Subject(value=["Earth"], kind="Dublin Core; Subject"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm digging it. I've seen semicolons used for dilineating namespaces in freetext values like this. Feels like a decent middle ground.

Comment on lines +93 to +100
if isinstance(source_record["gbl_suppressed_b"], bool):
return source_record["gbl_suppressed_b"]
else:
message = (
f"Record ID '{cls.get_source_record_id(source_record)}': "
"'gbl_suppressed_b' value is not a boolean"
)
raise ValueError(message)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good. If we don't know definitively if the record is deleted or not, raises an error.

Copy link
Contributor

@jonavellecuerdo jonavellecuerdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hiya' Eric! Small request for change re: syntax for BBOX (...) strings in tests to align with MITAardvark.get_locations() method. Might be safer to include that space between characters in the event that OpenSearch requires it.

Also had one clarifying question!

Otherwise, I think this is looking great. :)

@@ -75,31 +75,6 @@ def parse_date_from_string(
return None


def parse_geodata_string(geodata_string: str, source_record_id: str) -> list[float]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for deleting this helper function? Just want to understand. 🤓

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're just taking the WKT string as is given than OpenSearch can handle it, we don't need this function to do the parsing that we originally though was needed. Does that make sense?

@ehanson8 ehanson8 merged commit 4f83410 into main Jan 22, 2024
5 checks passed
@ehanson8 ehanson8 deleted the gdt-116-update-locations branch January 22, 2024 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants