Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/ted 87 metadata normaliser - extractor #14

Merged
merged 12 commits into from
Mar 11, 2022
Merged

Conversation

Dragos0000
Copy link
Collaborator

No description provided.

@codecov
Copy link

codecov bot commented Feb 24, 2022

Codecov Report

Merging #14 (58608bb) into main (843ccb7) will increase coverage by 0.49%.
The diff coverage is 96.83%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #14      +/-   ##
==========================================
+ Coverage   95.87%   96.37%   +0.49%     
==========================================
  Files          22       26       +4     
  Lines         461      744     +283     
==========================================
+ Hits          442      717     +275     
- Misses         19       27       +8     
Impacted Files Coverage Δ
ted_sws/domain/model/metadata.py 100.00% <ø> (ø)
...etadata_normaliser/services/metadata_normalizer.py 76.92% <76.92%> (ø)
...r/services/xml_manifestation_metadata_extractor.py 98.52% <98.52%> (ø)
...sws/metadata_normaliser/services/xpath_registry.py 98.82% <98.82%> (ø)
ted_sws/metadata_normaliser/model/metadata.py 100.00% <100.00%> (ø)
ted_sws/metadata_normaliser/__init__.py 100.00% <0.00%> (+100.00%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c4367cb...58608bb. Read the comment docs.

Copy link
Collaborator

@costezki costezki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your efforts in figuring out XPaths! Also good start in organising the domain and service layers. We need some refactoring and simplification.

ted_sws/domain/model/metadata.py Outdated Show resolved Hide resolved
ted_sws/metadata_normaliser/services/__init__.py Outdated Show resolved Hide resolved
ted_sws/metadata_normaliser/services/__init__.py Outdated Show resolved Hide resolved
ted_sws/metadata_normaliser/services/__init__.py Outdated Show resolved Hide resolved
Extracting metadata from xml manifestation
"""

def __init__(self, manifestation_root, namespaces):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to leave out XML parsing or can we start (i.e. receive as a constructor parameter) from the XMLManifestation object?

I think operating with "domain" objects might improve the code readability.

If you agree, then I would invite you to reconsider having namespaces as a constructor parameter. Can this be determined from the XML manifestation?

Scenario: Extracting metadata
Given a notice
When the extracting process is executed
Then a extracted metadata is available
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to list here ALL the keys you expect to extract.

This way you are on the safe side with testing. Turn this into a Scenario overview and list the expected keys as "examples"

@@ -31,7 +31,7 @@ def publicly_available_notice(fetched_notice_data) -> Notice:
xml_manifestation=xml_manifestation)
notice._rdf_manifestation = RDFManifestation(object_data="RDF manifestation content", validation=validation)
notice._mets_manifestation = METSManifestation(object_data="METS manifestation content")
notice._normalised_metadata = NormalisedMetadata(title="a never known title")
notice._normalised_metadata = NormalisedMetadata(title=["a never known title"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the title!



def test_metadata():
metadata = TEDMetadata(**{"AA": "Value here", "No_key": "Value"})
assert metadata.AA == "Value here"
assert "No_key" not in metadata.dict().keys()

print(metadata.dict().keys())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we want to skip prints in the final version of tests;
assertions are best while testing.

extracted_metadata_dict = metadata_extractor.dict()
assert isinstance(metadata_extractor, ExtractedMetadata)
assert extracted_metadata_dict.keys() == ExtractedMetadata.__fields__.keys()
assert notice_2018.ted_id in extracted_metadata_dict["notice_publication_number"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why only this key?

You see, This is the reason why I recommended to list the keys as variables in the "feature file"

from ted_sws.metadata_normaliser.services.metadata_normalizer import MetadataNormaliser


def test_metadata_extractor(raw_notice):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be a bit more sophisticated

Copy link
Collaborator

@costezki costezki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is good to go after all the previous comments are addressed.

from ted_sws.domain.model.metadata import Metadata


class LanguageTaggedString(NamedTuple):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is generic enough to be moved into the common package.

title_country: LanguageTaggedString = None


class EncodedValue(NamedTuple):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be moved into common module

:param element:
:return: str
"""
if element is not None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really?!

what is the difference between if element is not None: and 'if element:`

:return:
"""
if element is not None:
return EncodedValue(code=extract_attribute_from_element(element=element, attrib_key="CODE"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning: we might not always have "CODE" attribute; it is worth having it as a default variable.

@costezki costezki merged commit 0286ae3 into main Mar 11, 2022
@costezki costezki deleted the feature/TED-87 branch March 11, 2022 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants