

Refactor Elasticsearch management commands #5367

Merged: 6 commits merged into develop from feature/create_ao_index_es on Mar 30, 2023


fec-jli (Contributor) commented Mar 11, 2023

Summary (required)

This PR refactors all ES management commands, fixes several functional bugs, and makes the functions more reliable and easier to maintain.
It also fixes the case document category description display bug (#5369).

Completion criteria

  • Refactor and review all Elasticsearch management commands
  • Add or update automated test cases for Case and AO
  • Test all management commands locally and in a cloud.gov space
  • Update README
  • Update wiki

Required reviewers

1-2 devs

Impacted areas of the application

Elasticsearch management (index, legal data upload, repository, snapshot)
[Figure: ES_flow_chart]

Update ES mapping and reload data flowchart
[Figure: update_mapping_chart]
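The flowchart images are not reproduced here. As a rough substitute, the in-memory Python model below walks through an update-mapping-and-reload flow reconstructed from the log messages quoted later in this thread (aliases 'ao_alias' and 'search_alias' switched to 'ao_swap_index', documents reindexed from the swap index back to the original index, aliases moved back). InMemoryES and update_mapping_and_reload are hypothetical stand-ins, not the project's code or the Elasticsearch client API; the real command's step order may differ, and the toy maps each alias to a single index even though search_alias appears to span several indexes in practice.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryES:
    """Toy stand-in for an Elasticsearch cluster (hypothetical, illustration only)."""
    indices: dict = field(default_factory=dict)  # index name -> {"mapping": ..., "docs": {...}}
    aliases: dict = field(default_factory=dict)  # alias name -> index name (simplified)

    def create_index(self, name, mapping):
        self.indices[name] = {"mapping": mapping, "docs": {}}

    def delete_index(self, name):
        del self.indices[name]

    def reindex(self, source, dest):
        # Copy every document from source into dest; dest keeps its own mapping.
        self.indices[dest]["docs"].update(self.indices[source]["docs"])

    def switch_alias(self, alias, to_index):
        self.aliases[alias] = to_index


def update_mapping_and_reload(es, index, swap_index, aliases, new_mapping):
    """Sketch of a mapping change that keeps every alias pointing at a populated index."""
    es.create_index(swap_index, new_mapping)   # 1. temporary index with the new mapping
    es.reindex(index, swap_index)              # 2. copy (or reload) the current documents
    for alias in aliases:                      # 3. serve traffic from the swap index
        es.switch_alias(alias, swap_index)
    es.delete_index(index)                     # 4. drop and recreate the original index
    es.create_index(index, new_mapping)
    es.reindex(swap_index, index)              # 5. copy the documents back
    for alias in aliases:                      # 6. point the aliases at the original name again
        es.switch_alias(alias, index)
    es.delete_index(swap_index)                # 7. clean up the temporary index


es = InMemoryES()
es.create_index("ao_index", {"version": 1})
es.indices["ao_index"]["docs"]["2022-25"] = {"no": "2022-25"}
for alias in ("ao_alias", "search_alias"):
    es.switch_alias(alias, "ao_index")

update_mapping_and_reload(es, "ao_index", "ao_swap_index",
                          ("ao_alias", "search_alias"), {"version": 2})
print(es.aliases)                         # {'ao_alias': 'ao_index', 'search_alias': 'ao_index'}
print(es.indices["ao_index"]["mapping"])  # {'version': 2}
```

In this model the aliases always resolve to an index that holds the data, so searches through ao_alias or search_alias keep working while the original index is rebuilt; the trade-off is storing the documents twice for a while.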

How to test

Part One: test a subset of the commands locally

python cli.py display_index_alias
python cli.py create_index case_index
python cli.py create_index ao_index
python cli.py create_index arch_mur_index
python cli.py delete_index case_index
python cli.py delete_index ao_index
python cli.py delete_index arch_mur_index

python cli.py display_mapping case_index
python cli.py display_mapping ao_index
python cli.py display_mapping arch_mur_index
  • Upload sample data to the local ES
python cli.py load_current_murs 7958
python cli.py load_current_murs 8003

python cli.py load_admin_fines 3726 
python cli.py load_admin_fines 3571
python cli.py load_admin_fines 3314

python cli.py load_adrs 101
python cli.py load_adrs 102
python cli.py load_adrs 103

python cli.py load_advisory_opinions 2022-08

python cli.py load_archived_murs 9
python cli.py load_archived_murs 99
python cli.py load_archived_murs 1

python cli.py load_statutes

export FEC_EREGS_API=https://fec-dev-eregs.app.cloud.gov/regulations/api/
python cli.py load_regulations
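To replay the sample-data loads above in one go, a hypothetical Python driver (not part of the repository) can run each command in order and stop at the first failure. It assumes it is run from the directory that contains cli.py, uses the current interpreter in place of a bare `python`, and sets FEC_EREGS_API only for load_regulations, mirroring the export above.

```python
import os
import subprocess
import sys

EREGS_API = "https://fec-dev-eregs.app.cloud.gov/regulations/api/"

# (cli.py arguments, extra environment variables), copied from the steps above.
SAMPLE_LOADS = [
    (["load_current_murs", "7958"], {}),
    (["load_current_murs", "8003"], {}),
    (["load_admin_fines", "3726"], {}),
    (["load_admin_fines", "3571"], {}),
    (["load_admin_fines", "3314"], {}),
    (["load_adrs", "101"], {}),
    (["load_adrs", "102"], {}),
    (["load_adrs", "103"], {}),
    (["load_advisory_opinions", "2022-08"], {}),
    (["load_archived_murs", "9"], {}),
    (["load_archived_murs", "99"], {}),
    (["load_archived_murs", "1"], {}),
    (["load_statutes"], {}),
    (["load_regulations"], {"FEC_EREGS_API": EREGS_API}),
]


def run_sample_loads(loads=SAMPLE_LOADS, dry_run=False):
    """Run `cli.py <args>` for each entry and return the commands attempted."""
    attempted = []
    for args, extra_env in loads:
        cmd = [sys.executable, "cli.py", *args]
        attempted.append(cmd)
        if dry_run:
            continue  # only collect the command, do not execute it
        # check=True raises CalledProcessError on the first non-zero exit code.
        subprocess.run(cmd, env={**os.environ, **extra_env}, check=True)
    return attempted
```

Calling run_sample_loads(dry_run=True) returns the command list without executing anything, which is a cheap way to review it before loading data.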
  • Test reloading all data without downtime
python cli.py reload_all_data_by_index ao_index 
python cli.py reload_all_data_by_index case_index
python cli.py reload_all_data_by_index arch_mur_index
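Judging from the task logs quoted later in this thread, reloading ao_index loads advisory opinions, statutes, and regulations, and reloading case_index loads MURs, ADRs, and admin fines; archived MURs presumably go to arch_mur_index. The lookup table below is a hypothetical summary of that routing (inferred, not taken from the project's code), useful for picking the reload task after one document type changes.

```python
# Hypothetical routing table inferred from the task logs in this PR;
# the project's real mapping may differ.
INDEX_CONTENTS = {
    "ao_index": ("advisory_opinions", "statutes", "regulations"),
    "case_index": ("murs", "adrs", "admin_fines"),
    "arch_mur_index": ("archived_murs",),
}


def index_for(doc_type):
    """Return the index whose reload covers doc_type."""
    for index, doc_types in INDEX_CONTENTS.items():
        if doc_type in doc_types:
            return index
    raise KeyError(f"unknown document type: {doc_type!r}")


def reload_command(doc_type):
    """Build the reload_all_data_by_index command for the index holding doc_type."""
    return f"python cli.py reload_all_data_by_index {index_for(doc_type)}"


print(reload_command("statutes"))  # python cli.py reload_all_data_by_index ao_index
```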
  • Test initializing legal data locally (about 15 mins of downtime)
python cli.py initialize_legal_data case_index
  • Test update_mapping_and_reload_legal_data. This command works only on ao_index until we upgrade Elasticsearch.
python cli.py update_mapping_and_reload_legal_data ao_index
  • Test the endpoint /legal/docs/<DOC_TYPE>/<no>/
http://127.0.0.1:5000/v1/legal/docs/advisory_opinions/2022-25/
http://127.0.0.1:5000/v1/legal/docs/statutes/9001/
http://127.0.0.1:5000/v1/legal/docs/regulations/1.1/

http://127.0.0.1:5000/v1/legal/docs/adrs/1091/
http://127.0.0.1:5000/v1/legal/docs/admin_fines/4399/
http://127.0.0.1:5000/v1/legal/docs/murs/8070/
(archived MUR)
http://127.0.0.1:5000/v1/legal/docs/murs/4790/
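The endpoint checks above can be scripted with the standard library. This is a hypothetical smoke test, not part of the repository; it assumes the local API from Part One is running and that an existing document answers with HTTP 200. The fetch function is injectable so the URL logic can be checked without a server.

```python
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:5000/v1"

# (doc type, document number) pairs from the endpoint list above.
DOCS = [
    ("advisory_opinions", "2022-25"),
    ("statutes", "9001"),
    ("regulations", "1.1"),
    ("adrs", "1091"),
    ("admin_fines", "4399"),
    ("murs", "8070"),
    ("murs", "4790"),  # archived MUR
]


def doc_url(doc_type, no, base_url=BASE_URL):
    return f"{base_url}/legal/docs/{doc_type}/{no}/"


def fetch_status(url):
    """Return the HTTP status code of a GET request (4xx/5xx included)."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # urlopen raises on error statuses; report the code instead


def smoke_test(docs=DOCS, fetch=fetch_status):
    """Return the URLs that did not answer with HTTP 200."""
    urls = [doc_url(doc_type, no) for doc_type, no in docs]
    return [url for url in urls if fetch(url) != 200]
```

With the API running, print(smoke_test()) should print an empty list when every document resolves.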

Part Two: deploy the test branch to dev and repeat the Part One tests, then test all other commands (tasks). You can skip the repository management commands.

  • Test index tasks
cf run-task api --command "python cli.py display_index_alias" -m 2G --name display_index_alias
cf run-task api --command "python cli.py create_index" -m 2G --name create_case_index
cf run-task api --command "python cli.py create_index ao_index" -m 2G --name create_ao_index
cf run-task api --command "python cli.py create_index arch_mur_index" -m 2G --name create_arch_mur_index
  • Test loading or reloading a single legal document (when the mapping has not changed)
cf run-task api --command "python cli.py load_advisory_opinions 2022-10" -m 2G --name load_advisory_opinions
cf run-task api --command "python cli.py load_current_murs 7212" -m 2G --name load_one_current_mur
cf run-task api --command "python cli.py load_current_murs 8003" -m 2G --name load_one_current_mur
cf run-task api --command "python cli.py load_admin_fines 3726" -m 2G --name load_one_admin_fine
cf run-task api --command "python cli.py load_admin_fines 3571" -m 2G --name load_one_admin_fine
cf run-task api --command "python cli.py load_adrs 101" -m 2G --name load_one_adr
cf run-task api --command "python cli.py load_adrs 102" -m 2G --name load_one_adr
cf run-task api --command "python cli.py load_archived_murs 400" -m 2G --name upload_one_arch_mur
cf run-task api --command "python cli.py load_archived_murs 99" -m 2G --name upload_one_arch_mur
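Since the single-document tasks differ only in their cli.py arguments and task name, a hypothetical helper can generate the command lines. It only builds strings (nothing is executed) and uses the flags shown above; shlex.join quotes the inner command with single quotes, which a POSIX shell treats the same as the double quotes above for these arguments.

```python
import shlex


def cf_run_task(cli_args, memory="2G", name=None, app="api"):
    """Build a `cf run-task` command line that runs `python cli.py ...` on the app."""
    inner = shlex.join(["python", "cli.py", *cli_args])
    cmd = ["cf", "run-task", app, "--command", inner, "-m", memory]
    if name:
        cmd += ["--name", name]
    return shlex.join(cmd)


print(cf_run_task(["load_adrs", "101"], name="load_one_adr"))
# cf run-task api --command 'python cli.py load_adrs 101' -m 2G --name load_one_adr
```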
  • Test loading or reloading all legal data (when the mapping has not changed)
(12 mins):
cf run-task api --command "python cli.py load_advisory_opinions" -m 2G --name load_all_advisory_opinions
(15 mins):
cf run-task api --command "python cli.py load_adrs" -m 2G --name load_all_adrs
(20 mins):
cf run-task api --command "python cli.py load_admin_fines" -m 2G --name load_all_admin_fines
(1 min):
cf run-task api --command "python cli.py load_regulations" -m 2G --name load_all_regulations
(1 min):
cf run-task api --command "python cli.py load_statutes" -m 2G --name load_all_statutes
(20 mins):
cf run-task api --command "python cli.py load_archived_murs" -m 2G --name upload_all_arch_mur
(1h 33mins):
cf run-task api --command "python cli.py load_current_murs" -m 2G --name load_all_current_murs
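Adding up the approximate durations above gives a rough time budget for running every full load back to back. This is plain arithmetic on the quoted figures, assuming the tasks run sequentially; it is not a new measurement.

```python
# Approximate durations (minutes) quoted above for the full-load tasks on dev.
DURATIONS_MIN = {
    "load_advisory_opinions": 12,
    "load_adrs": 15,
    "load_admin_fines": 20,
    "load_regulations": 1,
    "load_statutes": 1,
    "load_archived_murs": 20,
    "load_current_murs": 93,  # 1h 33mins
}

total = sum(DURATIONS_MIN.values())
hours, minutes = divmod(total, 60)
print(f"{total} min = {hours}h {minutes:02d}min")  # 162 min = 2h 42min
```

load_current_murs alone accounts for 93 of those 162 minutes.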
  • Test reloading all legal data by index (no downtime)
cf run-task api --command "python cli.py reload_all_data_by_index case_index" -m 4G --name reload_all_data_case
cf run-task api --command "python cli.py reload_all_data_by_index ao_index" -m 4G --name reload_all_data_ao
cf run-task api --command "python cli.py reload_all_data_by_index arch_mur_index" -m 4G
  • Test repository tasks (optional)
cf run-task api --command "python cli.py configure_snapshot_repository case_repo" -m 2G --name configure_snapshot_repository
cf run-task api --command "python cli.py configure_snapshot_repository ao_repo" -m 2G --name configure_snapshot_repository_ao
cf run-task api --command "python cli.py configure_snapshot_repository arch_mur_repo" -m 2G --name configure_snapshot_repository_arch_mur
  • Test snapshot tasks
cf run-task api --command "python cli.py display_snapshots case_repo" -m 2G --name display_snapshots_case

cf run-task api --command "python cli.py restore_es_snapshot ao_repo ao_snapshot_202303091728 ao_index" -m 2G --name restore_es_snapshot_ao

cf run-task api --command "python cli.py restore_es_snapshot_downtime ao_repo ao_snapshot_202303091728 ao_index" -m 2G --name restore_es_snapshot_ao_dt

cf run-task api --command "python cli.py display_snapshot_detail ao_repo ao_snapshot_202303091728" -m 2G --name display_snapshot_detail_ao
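The restore and detail tasks take a snapshot name such as ao_snapshot_202303091728, which appears to embed a YYYYMMDDHHMM timestamp. Assuming that naming convention holds, a small hypothetical helper can pick the most recent snapshot from the names printed by display_snapshots.

```python
from datetime import datetime

# Assumed from names like ao_snapshot_202303091728 (YYYYMMDDHHMM).
SNAPSHOT_SUFFIX_FORMAT = "%Y%m%d%H%M"


def snapshot_time(name):
    """Parse the trailing timestamp from a snapshot name."""
    return datetime.strptime(name.rsplit("_", 1)[1], SNAPSHOT_SUFFIX_FORMAT)


def latest_snapshot(names):
    """Return the most recent snapshot name by its embedded timestamp."""
    return max(names, key=snapshot_time)


print(snapshot_time("ao_snapshot_202303091728"))  # 2023-03-09 17:28:00
```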
  • Test initializing data (downtime: 20 mins to 2+ hours)
cf run-task api --command "python cli.py initialize_legal_data case_index" -m 4G --name init_case_data
cf run-task api --command "python cli.py initialize_legal_data ao_index" -m 4G --name init_ao_data
cf run-task api --command "python cli.py initialize_legal_data arch_mur_index" -m 4G --name init_arch_mur_data
  • Test update_mapping_and_reload_legal_data. This command works only on ao_index until we upgrade Elasticsearch.
    First, scale the api app to 2 instances: cf scale api -i 2 -m 2G
cf run-task api --command "python cli.py update_mapping_and_reload_legal_data ao_index" -m 4G --name update_mapping_reload_data_ao

codecov-commenter commented Mar 13, 2023

Codecov Report

Merging #5367 (cf6c3be) into develop (c3141ba) will decrease coverage by 0.23%.
The diff coverage is 21.82%.


@@             Coverage Diff             @@
##           develop    #5367      +/-   ##
===========================================
- Coverage    85.79%   85.57%   -0.23%     
===========================================
  Files           81       81              
  Lines         8181     8201      +20     
===========================================
- Hits          7019     7018       -1     
- Misses        1162     1183      +21     
Impacted Files Coverage Δ
webservices/legal_docs/archived_murs.py 18.98% <0.00%> (-0.25%) ⬇️
webservices/legal_docs/regulations.py 18.51% <0.00%> (-0.72%) ⬇️
webservices/legal_docs/statutes.py 17.80% <0.00%> (-0.77%) ⬇️
webservices/tasks/__init__.py 71.42% <ø> (ø)
webservices/tasks/legal_docs.py 0.00% <0.00%> (ø)
webservices/legal_docs/current_cases.py 81.47% <7.69%> (-3.20%) ⬇️
webservices/legal_docs/__init__.py 41.46% <11.11%> (-14.79%) ⬇️
webservices/legal_docs/advisory_opinions.py 89.69% <21.05%> (-1.97%) ⬇️
webservices/legal_docs/es_management.py 27.20% <35.40%> (+4.15%) ⬆️
webservices/resources/legal.py 63.72% <100.00%> (+0.11%) ⬆️


fec-jli force-pushed the feature/create_ao_index_es branch from 64ee917 to e7ac02b March 13, 2023 23:31
fec-jli changed the title from "[WIP]Refactor Elasticsearch management commands" to "Refactor Elasticsearch management commands" Mar 14, 2023
cnlucas (Member) commented Mar 16, 2023

Since we are removing initialize_current_legal_docs from the CLI, should we also remove it from the README, along with initialize_archived_mur_docs?

fec-jli (Contributor, Author) commented Mar 16, 2023

Yes, thank you for the reminder.

fec-jli force-pushed the feature/create_ao_index_es branch 3 times, most recently from 8387413 to 57ab5df March 20, 2023 01:33
fec-jli force-pushed the feature/create_ao_index_es branch from 2429aa4 to d528d8e March 27, 2023 20:00
pkfec (Contributor) commented Mar 27, 2023

Tested advisory opinions:

python cli.py reload_all_data_by_index ao_index, when the indexes do not exist:

  1. The log level should be ERROR, not INFO.
  2. "Cannot" is one word, not "can not".

python cli.py update_mapping_and_reload_legal_data ao_index:

The log messages use variable names (original_alias, original_index, swapping_index) where plain wording belongs; these are not index names. Rephrase the messages in terms of "original index and alias", "swapping index", etc.:

  1. INFO:webservices.legal_docs.es_management: Removing original_alias 'ao_alias' from original_index 'ao_index'...
  2. INFO:webservices.legal_docs.es_management: Switching original_alias 'ao_alias' to swapping_index 'ao_swap_index'...
  3. INFO:webservices.legal_docs.es_management: Switched original_alias 'ao_alias' to swapping_index 'ao_swap_index' successfully.
  4. INFO:webservices.legal_docs.es_management: Removing original_alias 'search_alias' from original_index 'ao_index'...
  5. INFO:webservices.legal_docs.es_management: Switching original_alias 'search_alias' to swapping_index 'ao_swap_index'...
  6. INFO:webservices.legal_docs.es_management: Switched original_alias 'search_alias' to swapping_index 'ao_swap_index' successfully.
  7. Change "Loading" to lower case ("loading") in this log message:
    INFO:webservices.legal_docs.advisory_opinions:Index alias 'ao_alias' exists, start Loading advisory opinions...
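One way to apply the wording suggestions is to phrase each message in terms of the alias and the index it moves between. The sketch below assumes the elasticsearch-py 7.x client, whose indices.update_aliases call sends both actions in one _aliases request; Elasticsearch applies the actions in such a request atomically, so the alias is never left without a target. The function names are hypothetical, and the project's actual alias-switching code may be structured differently.

```python
import logging

logger = logging.getLogger("webservices.legal_docs.es_management")


def alias_switch_body(alias, from_index, to_index):
    """Request body for the Elasticsearch _aliases API (remove + add in one request)."""
    return {
        "actions": [
            {"remove": {"index": from_index, "alias": alias}},
            {"add": {"index": to_index, "alias": alias}},
        ]
    }


def switch_alias(es_client, alias, from_index, to_index):
    """Move alias from from_index to to_index, logging in plain English."""
    logger.info("Switching alias '%s' from index '%s' to index '%s'...",
                alias, from_index, to_index)
    es_client.indices.update_aliases(body=alias_switch_body(alias, from_index, to_index))
    logger.info("Switched alias '%s' to index '%s' successfully.", alias, to_index)
```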

fec-jli force-pushed the feature/create_ao_index_es branch 2 times, most recently from 8cd230a to fef42ea March 29, 2023 15:54

pkfec (Contributor) commented Mar 30, 2023

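Move all log messages inside the condition if es_client.indices.exists(index=<index_name>). Currently, when regulations and current cases are reloaded without a proper index, progress messages such as "Uploading regulations..." and "Loading MUR(s)" are still logged before the error: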


2023-03-29T15:17:38.05-0400 [APP/TASK/reload_all_data_ao/0] OUT ERROR:webservices.legal_docs.advisory_opinions: The index alias 'ao_alias' is not found, cannot load advisory opinions
   2023-03-29T15:17:38.16-0400 [APP/TASK/reload_all_data_ao/0] OUT ERROR:webservices.legal_docs.statutes: The index alias 'ao_alias' is not found, cannot load statutes.
   2023-03-29T15:17:38.16-0400 [APP/TASK/reload_all_data_ao/0] OUT INFO:webservices.legal_docs.regulations:Uploading regulations...
   2023-03-29T15:17:38.33-0400 [APP/TASK/reload_all_data_ao/0] OUT ERROR:webservices.legal_docs.regulations: The index alias 'ao_alias' is not found, cannot load regulations.
   2023-03-29T15:17:38.56-0400 [APP/TASK/reload_all_data_ao/0] OUT Exit status 0
2023-03-29T15:37:03.60-0400 [CELL/0] OUT Cell 666f4c22-43e5-4020-95f1-36eef13541b8 successfully created container for instance e89b40c5-a0f1-4f7a-8a88-48a6b1faf0bc
   2023-03-29T15:37:16.04-0400 [APP/TASK/reload_all_data_case/0] OUT INFO:webservices.legal_docs.current_cases:Loading MUR(s)
   2023-03-29T15:37:16.13-0400 [APP/TASK/reload_all_data_case/0] OUT ERROR:webservices.legal_docs.current_cases: The index alias 'case_alias' is not found, cannot load cases (mur/adr/af)
   2023-03-29T15:37:16.13-0400 [APP/TASK/reload_all_data_case/0] OUT INFO:webservices.legal_docs.current_cases:Loading ADR(s)
   2023-03-29T15:37:16.15-0400 [APP/TASK/reload_all_data_case/0] OUT ERROR:webservices.legal_docs.current_cases: The index alias 'case_alias' is not found, cannot load cases (mur/adr/af)
   2023-03-29T15:37:16.15-0400 [APP/TASK/reload_all_data_case/0] OUT INFO:webservices.legal_docs.current_cases:Loading AF(s)
   2023-03-29T15:37:16.19-0400 [APP/TASK/reload_all_data_case/0] OUT ERROR:webservices.legal_docs.current_cases: The index alias 'case_alias' is not found, cannot load cases (mur/adr/af)
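The requested pattern can be sketched as follows: check the index (or alias) first, log at ERROR and return if it is missing, and emit progress messages only inside the guarded branch. load_regulations here is a hypothetical simplification, and FakeClient is a stub standing in for the elasticsearch-py client's indices.exists(index=...) call so the example runs without a cluster.

```python
import logging

logger = logging.getLogger("webservices.legal_docs.regulations")


def load_regulations(es_client, alias="ao_alias"):
    """Hypothetical loader: progress is logged only after the existence check passes."""
    if not es_client.indices.exists(index=alias):
        # ERROR level and "cannot" as one word, per the review comments above.
        logger.error("The index alias '%s' is not found, cannot load regulations.", alias)
        return False
    logger.info("Uploading regulations...")
    # ... fetch and index the regulation documents here ...
    return True


class FakeIndices:
    """Stub for es_client.indices, backed by a set of existing index/alias names."""
    def __init__(self, existing):
        self.existing = set(existing)

    def exists(self, index):
        return index in self.existing


class FakeClient:
    def __init__(self, existing=()):
        self.indices = FakeIndices(existing)
```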

pkfec (Contributor) left a comment

@fec-jli Thanks a bunch for refactoring the Elasticsearch service code and fixing many underlying bugs in the existing code. It was a tough task, but you made it look easy. All tasks worked well except update_mapping_and_reload_legal_data on case_index and arch_mur_index, due to Elasticsearch v7 index limitations, which we can revisit when we upgrade to Elasticsearch v8. Overall, you did an amazing job @fec-jli

fec-jli force-pushed the feature/create_ao_index_es branch from fef42ea to cf6c3be March 30, 2023 00:47

fec-jli (Contributor, Author) commented Mar 30, 2023

Note:
The update_mapping_and_reload_legal_data command works only on ao_index so far.
Running it on case_index and arch_mur_index fails with reindex error 429, "rejected execution of coordinating operation" (see the log below). We will revisit this issue after upgrading Elasticsearch.

Reindexing all documents from index 'case_swap_index' to index 'case_index'...
   2023-03-28T22:05:05.98-0400 [APP/TASK/update_mapping_reload_data_case/0] OUT WARNING:elasticsearch:POST https://xxxxx.us-gov-west-1.es.amazonaws.com:443/_reindex?wait_for_completion=true [status:429 request:7.402s]
   2023-03-28T22:05:05.98-0400 [APP/TASK/update_mapping_reload_data_case/0] OUT ERROR:webservices.legal_docs.es_management: Reindex exception error = (429, 'es_rejected_execution_exception', {'error': {'root_cause': [{'type': 'es_rejected_execution_exception', 'reason': 'rejected execution of coordinating operation [coordinating_and_primary_bytes=0, replica_bytes=0, all_bytes=0, coordinating_operation_bytes=217401225, max_coordinating_and_primary_bytes=199583334]'}], 'type': 'es_rejected_execution_exception', 'reason': 'rejected execution of coordinating operation [coordinating_and_primary_bytes=0, replica_bytes=0, all_bytes=0, coordinating_operation_bytes=217401225, max_coordinating_and_primary_bytes=199583334]'}, 'status': 429})
   2023-03-28T22:05:36.00-0400 [APP/TASK/update_mapping_reload_data_case/0] OUT INFO:webservices.legal_docs.es_management: Reindexed all documents from index 'case_swap_index' to index 'case_index' successfully.
   2023-03-28T22:05:36.01-0400 [APP/TASK/update_mapping_reload_data_case/0] OUT INFO:webservices.legal_docs.es_management: Moving aliases 'case_alias' and 'search_alias' to point to case_index...
   2023-03-28T22:05:41.08-0400 [APP/TASK/update_mapping_reload_data_case/0] OUT INFO:webservices.legal_docs.es_management: Moved aliases 'case_alias' and 'search_alias' to point to 'case_index' successfully.
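The two byte counts in the 429 message show how far the rejected reindex request exceeded the node's limit. The arithmetic below only restates numbers copied from the error; it does not diagnose the root cause.

```python
# Byte counts copied from the 429 error message above.
coordinating_operation_bytes = 217_401_225
max_coordinating_and_primary_bytes = 199_583_334

overshoot = coordinating_operation_bytes - max_coordinating_and_primary_bytes
ratio = coordinating_operation_bytes / max_coordinating_and_primary_bytes

print(f"over the limit by {overshoot:,} bytes ({ratio:.1%} of the limit)")
# over the limit by 17,817,891 bytes (108.9% of the limit)
```

In other words, the coordinating operation was roughly 18 MB (about 9%) larger than the node accepted.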

cnlucas (Member) left a comment

Incredible work, @fec-jli. Thank you so much! All worked well for me.

cnlucas merged commit 8da9cb1 into develop Mar 30, 2023
pkfec deleted the feature/create_ao_index_es branch May 4, 2023 16:31
Issues linked to this pull request (merging it may close them):
  • MUR document category descriptions are from doc_order table
  • Refactor Elasticsearch management functions

4 participants