Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix/faciliate redownload on source url change #39

Merged
merged 5 commits into from
Apr 13, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 18 additions & 4 deletions HOW_TO_UPDATE_TESTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,20 +22,34 @@ Build the docker image locally

make build_test


MAKE SURE YOU HAVE THE CORRECT AWS CREDENTIALS SET UP.

export AWS_PROFILE=${PROFILE_NAME}

Set up the test buckets

python setup_test_buckets ${document_bucket} ${pipeline_bucket} ${region}
python -m integration_tests.setup_test_buckets ${document_bucket} ${pipeline_bucket} ${region}

Sync the test data to the s3 bucket

aws s3 sync integration_tests/data/pipeline_in s3://${pipeline_bucket}

Run the docker image
Run the docker image. If you are trying to figure out what the variables are look in the env var section of the following file: .github/workflows/integration-tests.yml. Also note that the prefixes used must match the subdirectory names of the data/pipeline_in directory.

docker run -e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} -e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} -e API_HOST="" -e MACHINE_USER_EMAIL="" -e MACHINE_USER_PASSWORD="" navigator-data-ingest-test --pipeline-bucket ${PIPELINE_BUCKET} --document-bucket ${DOCUMENT_BUCKET} --input-file ${TEST_DATA_UPLOAD_PATH} --output-prefix ${OUTPUT_PREFIX} --embeddings-input-prefix ${EMBEDDINGS_INPUT_PREFIX} --indexer-input-prefix ${INDEXER_INPUT_PREFIX}

Assert that the output is correct and if so snyc the data locally to the pipeline_out directory
Example:

docker run -e AWS_ACCESS_KEY_ID=XXXX -e AWS_SECRET_ACCESS_KEY=XXXX -e API_HOST="" -e MACHINE_USER_EMAIL="" -e MACHINE_USER_PASSWORD="" navigator-data-ingest-test --pipeline-bucket pipbucket123123123 --document-bucket docbucket123123123 --input-file input/new_and_updated_documents.json --output-prefix ingest_unit_test_parser_input --embeddings-input-prefix ingest_unit_test_embeddings_input --indexer-input-prefix ingest_unit_test_indexer_input


Assert that the output is correct and if so manually delete all the files in the pipeline_out directory and sync the data locally to the pipeline_out directory

cd integration_tests/data/pipeline_out

aws s3 sync s3://${pipeline_bucket}/ .
aws s3 sync s3://${pipeline_bucket}/ .

Remove the test buckets

python -m integration_tests.remove_test_buckets ${document_bucket} ${pipeline_bucket} ${region}
Original file line number Diff line number Diff line change
Expand Up @@ -838,6 +838,102 @@
}
],
"slug": "european-union_2013_decision-no-13862013eu-of-the-european-parliament-and-of-the-council-of-20-november-2013-on-a-general-union-environment-action-programme-to-2020-living-well-within-the-limits-of-our-planet_8570_3017"
},
{
"publication_ts": "2013-01-01T00:00:00",
"name": "DECISION No 1386/2013/EU OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 20 November 2013 on a General Union Environment Action Programme to 2020 \u2018Living well, within the limits of our planet\u2019",
"description": "The Decision no 1386/2013/EU sets up the General Union Environment Action Programme to 2020 \u2018Living well, within the limits of our planet'. It adopts the '7th Environment Action programme' or \u20187th EAP'. The priority objectives of the 7th EAP are: (a) to protect, conserve and enhance the Union's natural capital; (b) to turn the Union into a resource-efficient, green and competitive low-carbon economy; (c) to safeguard the Union's citizens from environment-related pressures and risks to health and well-being; (d) to maximise the benefits of Union environment legislation by improving implementation; (e) to improve the knowledge and evidence base for Union environment policy; (f) to secure investment for environment and climate policy and address environmental externalities; (g) to improve environmental integration and policy coherence; (h) to enhance the sustainability of the Union's cities; (i) to increase the Union's effectiveness in addressing inter\u00ad national environmental and climate-related challenges.",
"source_url": "http://existing.com",
"url": null,
"md5_sum": null,
"type": "EU Decision",
"source": "CCLW",
"import_id": "TESTCCLW.executive.3.3",
"category": "Law",
"frameworks": [],
"geography": "EUR",
"hazards": [],
"instruments": [
"Capacity building|Governance",
"Education, training and knowledge dissemination|Information"
],
"keywords": [
"Adaptation",
"Institutions / Administrative Arrangements",
"Research And Development",
"Energy Supply",
"Energy Demand",
"REDD+ And LULUCF",
"Transport"
],
"languages": [
"English"
],
"sectors": [
"Economy-wide",
"Health",
"Transport"
],
"topics": [
"Adaptation",
"Mitigation"
],
"events": [
{
"name": "Law passed",
"description": "",
"created_ts": "2013-11-20T00:00:00"
}
],
"slug": "european-union_2013_decision-no-13862013eu-of-the-european-parliament-and-of-the-council-of-20-november-2013-on-a-general-union-environment-action-programme-to-2020-living-well-within-the-limits-of-our-planet_8570_3017"
},
{
"publication_ts": "2013-01-01T00:00:00",
"name": "DECISION No 1386/2013/EU OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 20 November 2013 on a General Union Environment Action Programme to 2020 \u2018Living well, within the limits of our planet\u2019",
"description": "description",
"source_url": "http://existing.com",
"url": null,
"md5_sum": null,
"type": "EU Decision",
"source": "CCLW",
"import_id": "TESTCCLW.executive.4.4",
"category": "Law",
"frameworks": [],
"geography": "EUR",
"hazards": [],
"instruments": [
"Capacity building|Governance",
"Education, training and knowledge dissemination|Information"
],
"keywords": [
"Adaptation",
"Institutions / Administrative Arrangements",
"Research And Development",
"Energy Supply",
"Energy Demand",
"REDD+ And LULUCF",
"Transport"
],
"languages": [
"English"
],
"sectors": [
"Economy-wide",
"Health",
"Transport"
],
"topics": [
"Adaptation",
"Mitigation"
],
"events": [
{
"name": "Law passed",
"description": "",
"created_ts": "2013-11-20T00:00:00"
}
],
"slug": "european-union_2013_decision-no-13862013eu-of-the-european-parliament-and-of-the-council-of-20-november-2013-on-a-general-union-environment-action-programme-to-2020-living-well-within-the-limits-of-our-planet_8570_3017"
}
],
"updated_documents": {
Expand Down

This file was deleted.

Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
{
"document_name": "name",
"document_description": "description",
"document_id": "TESTCCLW.executive.3.3",
"document_source_url": "http://existing.com",
"document_cdn_object": null,
"document_content_type": "text/html",
"document_md5_sum": null,
"document_metadata": {},
"document_slug": "fake_slug",
"languages": [
"en"
],
"translated": false,
"html_data": {
"detected_title": "One Stop Shop Service",
"detected_date": null,
"has_valid_text": true,
"text_blocks": [
{
"text": [
"Why use a One Stop Shop"
],
"text_block_id": "b0",
"language": "en",
"type": "Text",
"type_confidence": 1.0
}
]
},
"pdf_data": null
}

This file was deleted.

Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"document_name": "name", "document_description": "new description", "document_id": "TESTCCLW.executive.4.4", "document_source_url": "http://existing.com", "document_cdn_object": null, "document_content_type": "text/html", "document_md5_sum": null, "document_metadata": {}, "document_slug": "fake_slug", "languages": ["en"], "translated": false, "html_data": {"detected_title": "One Stop Shop Service", "detected_date": null, "has_valid_text": true, "text_blocks": [{"text": ["Why use a One Stop Shop"], "text_block_id": "b0", "language": "en", "type": "Text", "type_confidence": 1.0}]}, "pdf_data": null}

This file was deleted.

Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
{
"document_name": "name",
"document_description": "description",
"document_id": "TESTCCLW.executive.3.3",
"document_source_url": "http://existing.com",
"document_cdn_object": null,
"document_content_type": "text/html",
"document_md5_sum": null,
"document_metadata": {},
"document_slug": "fake_slug",
"languages": [
"en"
],
"translated": false,
"html_data": {
"detected_title": "One Stop Shop Service",
"detected_date": null,
"has_valid_text": true,
"text_blocks": [
{
"text": [
"Why use a One Stop Shop"
],
"text_block_id": "b0",
"language": "en",
"type": "Text",
"type_confidence": 1.0
}
]
},
"pdf_data": null
}

This file was deleted.

Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"document_name": "name", "document_description": "new description", "document_id": "TESTCCLW.executive.4.4", "document_source_url": "http://existing.com", "document_cdn_object": null, "document_content_type": "text/html", "document_md5_sum": null, "document_metadata": {}, "document_slug": "fake_slug", "languages": ["en"], "translated": false, "html_data": {"detected_title": "One Stop Shop Service", "detected_date": null, "has_valid_text": true, "text_blocks": [{"text": ["Why use a One Stop Shop"], "text_block_id": "b0", "language": "en", "type": "Text", "type_confidence": 1.0}]}, "pdf_data": null}
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
{
"document_name": "name",
"document_description": "description",
"document_id": "TESTCCLW.executive.3.3",
"document_source_url": "http://existing.com",
"document_cdn_object": null,
"document_content_type": "text/html",
"document_md5_sum": null,
"document_metadata": {},
"document_slug": "fake_slug"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"document_name": "name", "document_description": "new description", "document_id": "TESTCCLW.executive.4.4", "document_source_url": "http://existing.com", "document_cdn_object": null, "document_content_type": "text/html", "document_md5_sum": null, "document_metadata": {}, "document_slug": "fake_slug"}
Original file line number Diff line number Diff line change
@@ -1 +1,56 @@
{"document_name": "name", "document_description": "description", "document_id": "TESTCCLW.executive.3.3", "document_source_url": "http://new.com", "document_cdn_object": null, "document_content_type": "text/html", "document_md5_sum": null, "document_metadata": {}, "document_slug": "fake_slug"}
{
"document_name": "DECISION No 1386/2013/EU OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 20 November 2013 on a General Union Environment Action Programme to 2020 \u2018Living well, within the limits of our planet\u2019",
"document_description": "The Decision no 1386/2013/EU sets up the General Union Environment Action Programme to 2020 \u2018Living well, within the limits of our planet'. It adopts the '7th Environment Action programme' or \u20187th EAP'. The priority objectives of the 7th EAP are: (a) to protect, conserve and enhance the Union's natural capital; (b) to turn the Union into a resource-efficient, green and competitive low-carbon economy; (c) to safeguard the Union's citizens from environment-related pressures and risks to health and well-being; (d) to maximise the benefits of Union environment legislation by improving implementation; (e) to improve the knowledge and evidence base for Union environment policy; (f) to secure investment for environment and climate policy and address environmental externalities; (g) to improve environmental integration and policy coherence; (h) to enhance the sustainability of the Union's cities; (i) to increase the Union's effectiveness in addressing inter\u00ad national environmental and climate-related challenges.",
"document_id": "TESTCCLW.executive.3.3",
"document_source_url": "http://existing.com",
"document_cdn_object": null,
"document_content_type": null,
"document_md5_sum": null,
"document_metadata": {
"name": "DECISION No 1386/2013/EU OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 20 November 2013 on a General Union Environment Action Programme to 2020 \u2018Living well, within the limits of our planet\u2019",
"description": "The Decision no 1386/2013/EU sets up the General Union Environment Action Programme to 2020 \u2018Living well, within the limits of our planet'. It adopts the '7th Environment Action programme' or \u20187th EAP'. The priority objectives of the 7th EAP are: (a) to protect, conserve and enhance the Union's natural capital; (b) to turn the Union into a resource-efficient, green and competitive low-carbon economy; (c) to safeguard the Union's citizens from environment-related pressures and risks to health and well-being; (d) to maximise the benefits of Union environment legislation by improving implementation; (e) to improve the knowledge and evidence base for Union environment policy; (f) to secure investment for environment and climate policy and address environmental externalities; (g) to improve environmental integration and policy coherence; (h) to enhance the sustainability of the Union's cities; (i) to increase the Union's effectiveness in addressing inter\u00ad national environmental and climate-related challenges.",
"import_id": "TESTCCLW.executive.3.3",
"slug": "european-union_2013_decision-no-13862013eu-of-the-european-parliament-and-of-the-council-of-20-november-2013-on-a-general-union-environment-action-programme-to-2020-living-well-within-the-limits-of-our-planet_8570_3017",
"publication_ts": "2013-01-01T00:00:00",
"source_url": "http://existing.com",
"type": "EU Decision",
"source": "CCLW",
"category": "Law",
"geography": "EUR",
"frameworks": [],
"instruments": [
"Capacity building|Governance",
"Education, training and knowledge dissemination|Information"
],
"hazards": [],
"keywords": [
"Adaptation",
"Institutions / Administrative Arrangements",
"Research And Development",
"Energy Supply",
"Energy Demand",
"REDD+ And LULUCF",
"Transport"
],
"languages": [
"English"
],
"sectors": [
"Economy-wide",
"Health",
"Transport"
],
"topics": [
"Adaptation",
"Mitigation"
],
"events": [
{
"name": "Law passed",
"description": "",
"created_ts": "2013-11-20T00:00:00"
}
]
},
"document_slug": "european-union_2013_decision-no-13862013eu-of-the-european-parliament-and-of-the-council-of-20-november-2013-on-a-general-union-environment-action-programme-to-2020-living-well-within-the-limits-of-our-planet_8570_3017"
}
Loading