Feature/ted 87 metadata normaliser - extractor #14

Dragos0000 · 2022-02-24T17:17:49Z

No description provided.

…ED-87 Conflicts: tests/conftest.py

codecov · 2022-02-24T17:46:22Z

Codecov Report

Merging #14 (58608bb) into main (843ccb7) will increase coverage by 0.49%.
The diff coverage is 96.83%.

@@            Coverage Diff             @@
##             main      #14      +/-   ##
==========================================
+ Coverage   95.87%   96.37%   +0.49%     
==========================================
  Files          22       26       +4     
  Lines         461      744     +283     
==========================================
+ Hits          442      717     +275     
- Misses         19       27       +8

Impacted Files	Coverage Δ
ted_sws/domain/model/metadata.py	`100.00% <ø> (ø)`
...etadata_normaliser/services/metadata_normalizer.py	`76.92% <76.92%> (ø)`
...r/services/xml_manifestation_metadata_extractor.py	`98.52% <98.52%> (ø)`
...sws/metadata_normaliser/services/xpath_registry.py	`98.82% <98.82%> (ø)`
ted_sws/metadata_normaliser/model/metadata.py	`100.00% <100.00%> (ø)`
ted_sws/metadata_normaliser/__init__.py	`100.00% <0.00%> (+100.00%)`	⬆️


Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c4367cb...58608bb.

costezki

I like your efforts in figuring out the XPaths! Also, a good start in organising the domain and service layers. We still need some refactoring and simplification.

ted_sws/domain/model/metadata.py

ted_sws/metadata_normaliser/services/__init__.py

costezki · 2022-02-24T17:30:10Z

ted_sws/metadata_normaliser/services/__init__.py

+      Extracting metadata from xml manifestation
+    """
+
+    def __init__(self, manifestation_root, namespaces):


Do we need to keep XML parsing outside, or can we start from the XMLManifestation object (i.e. receive it as a constructor parameter)?

I think operating with "domain" objects might improve the code readability.

If you agree, then I would invite you to reconsider having namespaces as a constructor parameter. Can this be determined from the XML manifestation?
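A minimal sketch of what this could look like, assuming the XMLManifestation domain object exposes the raw XML as `object_data` (as `RDFManifestation(object_data=...)` does in this PR's test fixtures). The stand-in dataclass, the `extract_namespaces` helper and the example URIs are hypothetical. The namespaces are read from the document's own `xmlns` declarations using the "start-ns" events of the standard library's `iterparse`, so they no longer need to be passed in.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict


@dataclass
class XMLManifestation:
    """Hypothetical stand-in for the domain object; assumes the XML text is in `object_data`."""
    object_data: str


def extract_namespaces(xml_text: str) -> Dict[str, str]:
    """Collect prefix -> URI declarations from the document ("" is the default namespace)."""
    namespaces: Dict[str, str] = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        # A prefix re-declared deeper in the tree keeps its last binding here.
        namespaces[prefix] = uri
    return namespaces


class XMLManifestationMetadataExtractor:
    """Hypothetical extractor that starts from the domain object, not from a parsed root."""

    def __init__(self, xml_manifestation: XMLManifestation):
        xml_bytes = xml_manifestation.object_data.encode("utf-8")
        self.manifestation_root = ET.fromstring(xml_bytes)
        self.namespaces = extract_namespaces(xml_manifestation.object_data)


# Usage with made-up namespace URIs.
manifestation = XMLManifestation(object_data=(
    '<TED_EXPORT xmlns="http://example.org/ted" xmlns:n2016="http://example.org/n2016">'
    "<n2016:CODED_DATA/></TED_EXPORT>"
))
extractor = XMLManifestationMetadataExtractor(manifestation)
# extractor.namespaces maps "" (default) and "n2016" to their URIs
coded = extractor.manifestation_root.find("n2016:CODED_DATA", extractor.namespaces)
```

The constructor now depends only on the domain object, so callers cannot pass a namespace map that disagrees with the document.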

costezki · 2022-02-24T17:58:11Z

tests/features/metadata_normaliser/notice_extractor.feature

+  Scenario: Extracting metadata
+    Given a notice
+    When the extracting process is executed
+    Then a extracted metadata is available


I think we want to list ALL the keys you expect to extract here.

This way you are on the safe side with testing. Turn this into a Scenario Outline and list the expected keys as Examples.
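A Gherkin Scenario Outline runs the same steps once per row of its Examples table, so every key gets its own pass or fail instead of one aggregate check. The sketch below mimics that idea in plain Python rather than through a BDD plugin. The key names and the sample dict are illustrative assumptions, not the extractor's real output.

```python
from typing import Dict, Iterable, List

# Illustrative rows; in the feature file these would be the rows of the Examples table.
EXPECTED_KEYS = ["title", "notice_publication_number", "title_country"]


def check_key(extracted: Dict[str, object], key: str) -> bool:
    """One 'scenario': the key is present and holds a non-empty value."""
    value = extracted.get(key)
    return value is not None and value != [] and value != ""


def run_outline(extracted: Dict[str, object], keys: Iterable[str]) -> List[str]:
    """Run the same check once per example row and return the keys that failed."""
    return [key for key in keys if not check_key(extracted, key)]


# Made-up extraction result with two empty fields.
extracted_metadata = {
    "title": ["a never known title"],
    "notice_publication_number": "",
    "title_country": None,
}
print(run_outline(extracted_metadata, EXPECTED_KEYS))
# ['notice_publication_number', 'title_country']
```

Reporting the failing keys individually is what makes the outline form safer: a missing field shows up by name rather than hiding behind a single passing assertion.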

costezki · 2022-02-24T17:58:47Z

tests/unit/domain/model/conftest.py

@@ -31,7 +31,7 @@ def publicly_available_notice(fetched_notice_data) -> Notice:
                    xml_manifestation=xml_manifestation)
    notice._rdf_manifestation = RDFManifestation(object_data="RDF manifestation content", validation=validation)
    notice._mets_manifestation = METSManifestation(object_data="METS manifestation content")
-    notice._normalised_metadata = NormalisedMetadata(title="a never known title")
+    notice._normalised_metadata = NormalisedMetadata(title=["a never known title"])


I like the title!

costezki · 2022-02-24T17:59:33Z

tests/unit/domain/model/test_metadata.py



 def test_metadata():
    metadata = TEDMetadata(**{"AA": "Value here", "No_key": "Value"})
    assert metadata.AA == "Value here"
    assert "No_key" not in metadata.dict().keys()
-
+    print(metadata.dict().keys())


We want to avoid prints in the final version of the tests; assertions are better for testing.
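The `print(metadata.dict().keys())` line can be turned into an assertion that pins the exact key set. `TEDMetadata` looks like a pydantic-style model that drops unknown keys; to keep this sketch dependency-free, a small hypothetical dataclass stand-in emulates that behaviour. Only the test body mirrors the PR.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class TEDMetadataStandIn:
    """Hypothetical stand-in: keeps declared fields and silently drops unknown keys."""
    AA: Optional[str] = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TEDMetadataStandIn":
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in data.items() if key in known})

    def dict(self) -> Dict[str, Any]:
        # Mirrors the `.dict()` call used in the original test.
        return asdict(self)


def test_metadata():
    metadata = TEDMetadataStandIn.from_dict({"AA": "Value here", "No_key": "Value"})
    assert metadata.AA == "Value here"
    # Instead of print(metadata.dict().keys()), assert the exact set of keys.
    assert set(metadata.dict().keys()) == {"AA"}


test_metadata()
```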

costezki · 2022-02-24T18:00:47Z

tests/unit/notice_normaliser/test_metadata_extractor.py

+    extracted_metadata_dict = metadata_extractor.dict()
+    assert isinstance(metadata_extractor, ExtractedMetadata)
+    assert extracted_metadata_dict.keys() == ExtractedMetadata.__fields__.keys()
+    assert notice_2018.ted_id in extracted_metadata_dict["notice_publication_number"]


Why only this key?

You see, this is why I recommended listing the keys as variables in the feature file.

costezki · 2022-02-24T18:01:30Z

tests/unit/notice_normaliser/test_metadata_normaliser.py

+from ted_sws.metadata_normaliser.services.metadata_normalizer import MetadataNormaliser
+
+
+def test_metadata_extractor(raw_notice):


This might be a bit more sophisticated.

costezki

It is good to go after all the previous comments are addressed.

costezki · 2022-03-03T07:58:21Z

ted_sws/metadata_normaliser/model/metadata.py

+from ted_sws.domain.model.metadata import Metadata
+
+
+class LanguageTaggedString(NamedTuple):


This is generic enough to be moved into the common package.

costezki · 2022-03-03T07:58:44Z

ted_sws/metadata_normaliser/model/metadata.py

+    title_country: LanguageTaggedString = None
+
+
+class EncodedValue(NamedTuple):


This can be moved into the common module as well.
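A possible shape for the two shared types once they move into a common module. Only the names `LanguageTaggedString` and `EncodedValue` and the `code` field appear in the PR; the `text`, `language` and `value` fields and the module path in the comment are assumptions for illustration.

```python
# Hypothetical common module, e.g. ted_sws/domain/model/common.py (path is an assumption).
from typing import NamedTuple, Optional


class LanguageTaggedString(NamedTuple):
    """A literal paired with its language tag (field names assumed)."""
    text: Optional[str] = None
    language: Optional[str] = None


class EncodedValue(NamedTuple):
    """A coded value, e.g. a country code with its label (`value` field assumed)."""
    code: Optional[str] = None
    value: Optional[str] = None


title_country = LanguageTaggedString(text="Romania", language="EN")
country = EncodedValue(code="RO", value="Romania")
print(title_country.language, country.code)  # EN RO
```

Because neither type refers to metadata-normaliser concepts, other packages could import them without depending on `ted_sws.metadata_normaliser`. `NamedTuple` also keeps them immutable and hashable, which suits small value objects.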

costezki · 2022-03-03T07:59:53Z

ted_sws/metadata_normaliser/services/xml_manifestation_metadata_extractor.py

+    :param element:
+    :return: str
+    """
+    if element is not None:


Really?!

What is the difference between `if element is not None:` and `if element:`?
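For ElementTree the difference matters. An `Element`'s truth value comes from its number of children, not from whether it exists, so `if element:` would skip a leaf element that `find()` did return. The Python documentation also warns against truth-testing elements. The helper name below is hypothetical; it mirrors the guarded snippet in this thread.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def extract_text(element: Optional[ET.Element]) -> Optional[str]:
    """Hypothetical helper: return the element's text, or None if it was not found."""
    # `is not None` separates "not found" from "found but has no children".
    if element is not None:
        return element.text
    return None


root = ET.fromstring("<NOTICE><TITLE>Construction works</TITLE></NOTICE>")
title = root.find("TITLE")
print(len(title))                          # 0: a leaf element, so `if title:` would be falsy
print(extract_text(title))                 # Construction works
print(extract_text(root.find("MISSING")))  # None
```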

costezki · 2022-03-03T08:00:46Z

ted_sws/metadata_normaliser/services/xml_manifestation_metadata_extractor.py

+    :return:
+    """
+    if element is not None:
+        return EncodedValue(code=extract_attribute_from_element(element=element, attrib_key="CODE"),


Warning: we might not always have a "CODE" attribute; it is worth making the attribute key a parameter with "CODE" as its default value.
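One way to follow this advice is to make the attribute key a keyword parameter that defaults to "CODE". The PR only shows the call `extract_attribute_from_element(element=element, attrib_key="CODE")` and `EncodedValue(code=...)`. The helper body, the `value` field, the `extract_encoded_value` name and the sample XML are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional


class EncodedValue(NamedTuple):
    code: Optional[str] = None
    value: Optional[str] = None  # assumed field


def extract_attribute_from_element(element: Optional[ET.Element], attrib_key: str) -> Optional[str]:
    """Sketch of the helper: the attribute value, or None if element or attribute is missing."""
    return element.get(attrib_key) if element is not None else None


def extract_encoded_value(element: Optional[ET.Element],
                          attrib_key: str = "CODE") -> Optional[EncodedValue]:
    """Hypothetical extractor; callers override attrib_key when the code lives elsewhere."""
    if element is not None:
        return EncodedValue(code=extract_attribute_from_element(element=element, attrib_key=attrib_key),
                            value=element.text)
    return None


# Made-up sample elements.
country = ET.fromstring('<COUNTRY CODE="RO">Romania</COUNTRY>')
language = ET.fromstring('<LANGUAGE VALUE="EN">English</LANGUAGE>')
print(extract_encoded_value(country))                       # EncodedValue(code='RO', value='Romania')
print(extract_encoded_value(language, attrib_key="VALUE"))  # EncodedValue(code='EN', value='English')
print(extract_encoded_value(language))                      # EncodedValue(code=None, value='English')
```

A missing attribute then yields `code=None` instead of an exception, and the common "CODE" case needs no extra argument.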

Dragos0000 added 3 commits February 21, 2022 11:35

wip

77cbed8

metadata normaliser

fbd930a

Merge branch 'main' of github.com:meaningfy-ws/ted-sws into feature/T…

0fe9179

…ED-87 Conflicts: tests/conftest.py

Dragos0000 requested review from costezki and CaptainOfHacks February 24, 2022 17:17

Dragos0000 added 2 commits February 24, 2022 17:32

fixed test with new implementation from master

0b4220e

fixed test with new implementation from master 2

836eb78

costezki requested changes Feb 24, 2022


Dragos0000 added 5 commits February 27, 2022 10:54

first review changes

d2b360e

refactored metadata extractor

fd588a9

added xpath registry

0060777

Changes by lps

c305fe1

added test and removed print statement

347cab9

costezki approved these changes Mar 3, 2022


costezki added 2 commits March 11, 2022 15:25

minor refactoring

e48ae78

Merge branch 'main' into feature/TED-87

58608bb

costezki merged commit 0286ae3 into main Mar 11, 2022

costezki deleted the feature/TED-87 branch March 11, 2022 14:30
