-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/ted 87 metadata normaliser - extractor #14
Conversation
…ED-87 � Conflicts: � tests/conftest.py
Codecov Report
@@ Coverage Diff @@
## main #14 +/- ##
==========================================
+ Coverage 95.87% 96.37% +0.49%
==========================================
Files 22 26 +4
Lines 461 744 +283
==========================================
+ Hits 442 717 +275
- Misses 19 27 +8
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like your efforts in figuring out XPaths! Also good start in organising the domain and service layers. We need some refactoring and simplification.
Extracting metadata from xml manifestation | ||
""" | ||
|
||
def __init__(self, manifestation_root, namespaces): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to leave out XML parsing or can we start (i.e. receive as a constructor parameter) from the XMLManifestation object?
I think operating with "domain" objects might improve the code readability.
If you agree, then I would invite you to reconsider having namespaces as a constructor parameter. Can this be determined from the XML manifestation?
Scenario: Extracting metadata | ||
Given a notice | ||
When the extracting process is executed | ||
Then a extracted metadata is available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to list here ALL the keys you expect to extract.
This way you are on the safe side with testing. Turn this into a Scenario overview
and list the expected keys as "examples"
tests/unit/domain/model/conftest.py
Outdated
@@ -31,7 +31,7 @@ def publicly_available_notice(fetched_notice_data) -> Notice: | |||
xml_manifestation=xml_manifestation) | |||
notice._rdf_manifestation = RDFManifestation(object_data="RDF manifestation content", validation=validation) | |||
notice._mets_manifestation = METSManifestation(object_data="METS manifestation content") | |||
notice._normalised_metadata = NormalisedMetadata(title="a never known title") | |||
notice._normalised_metadata = NormalisedMetadata(title=["a never known title"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the title!
|
||
|
||
def test_metadata(): | ||
metadata = TEDMetadata(**{"AA": "Value here", "No_key": "Value"}) | ||
assert metadata.AA == "Value here" | ||
assert "No_key" not in metadata.dict().keys() | ||
|
||
print(metadata.dict().keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we want to skip prints in the final version of tests;
assertions are best while testing.
extracted_metadata_dict = metadata_extractor.dict() | ||
assert isinstance(metadata_extractor, ExtractedMetadata) | ||
assert extracted_metadata_dict.keys() == ExtractedMetadata.__fields__.keys() | ||
assert notice_2018.ted_id in extracted_metadata_dict["notice_publication_number"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why only this key?
You see, This is the reason why I recommended to list the keys as variables in the "feature file"
from ted_sws.metadata_normaliser.services.metadata_normalizer import MetadataNormaliser | ||
|
||
|
||
def test_metadata_extractor(raw_notice): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might be a bit more sophisticated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is good to go after all the previous comments are addressed.
from ted_sws.domain.model.metadata import Metadata | ||
|
||
|
||
class LanguageTaggedString(NamedTuple): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is generic enough to be moved into the common
package.
title_country: LanguageTaggedString = None | ||
|
||
|
||
class EncodedValue(NamedTuple): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be moved into common
module
:param element: | ||
:return: str | ||
""" | ||
if element is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really?!
what is the difference between if element is not None:
and 'if element:`
:return: | ||
""" | ||
if element is not None: | ||
return EncodedValue(code=extract_attribute_from_element(element=element, attrib_key="CODE"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Warning: we might not always have "CODE" attribute; it is worth having it as a default variable.
No description provided.