Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Research why published MUR was missing from site and resolve #4592

Closed
1 task
lbeaufort opened this issue Aug 31, 2020 · 18 comments
Closed
1 task

Research why published MUR was missing from site and resolve #4592

lbeaufort opened this issue Aug 31, 2020 · 18 comments

Comments

@lbeaufort
Copy link
Member

lbeaufort commented Aug 31, 2020

From contractor:

We published MUR 7594 and are able to see the data in the PG, but are not able to find the case on the website. Can you please look into this and let us know of any issues.

See follow-up instructions in comment below.

Verified: https://www.fec.gov/data/legal/search/enforcement/?case_no=7594

Following is the query we used:

SELECT V.TBL, V.CNT FROM (
SELECT 'DOCUMENT' as TBL, COUNT(1) as CNT FROM FECMUR.DOCUMENT A join FECMUR.CASE B on A.case_id = B.case_id where B.case_type = 'MUR' and B.case_no in ('7594') UNION
SELECT 'CALENDAR' as TBL, COUNT(*) as CNT FROM FECMUR.CALENDAR A join FECMUR.CASE B on A.case_id = B.case_id where B.case_type = 'MUR' and B.case_no in ('7594') UNION
SELECT 'CASE_SUBJECT' as TBL, COUNT(*) as CNT FROM FECMUR.CASE_SUBJECT A join FECMUR.CASE B on A.case_id = B.case_id where B.case_type = 'MUR' and B.case_no in ('7594') UNION
SELECT 'ELECTIONCYCLE' as TBL, COUNT(*) as CNT FROM FECMUR.ELECTIONCYCLE A join FECMUR.CASE B on A.case_id = B.case_id where B.case_type = 'MUR' and B.case_no in ('7594') UNION
SELECT 'VIOLATIONS' as TBL, COUNT(*) as CNT FROM FECMUR.VIOLATIONS A join FECMUR.CASE B on A.case_id = B.case_id where B.case_type = 'MUR' and B.case_no in ('7594') UNION
SELECT 'VOTES' as TBL, COUNT(*) as CNT FROM FECMUR.VOTES A join FECMUR.CASE B on A.case_id = B.case_id where B.case_type = 'MUR' and B.case_no in ('7594') UNION
SELECT 'COMMISSION' as TBL, COUNT(1) as CNT FROM FECMUR.COMMISSION A join FECMUR.CASE B on A.case_id = B.case_id where B.case_type = 'MUR' and B.case_no in ('7594') UNION
SELECT 'RELATEDOBJECTS' as TBL, COUNT(*) as CNT FROM FECMUR.RELATEDOBJECTS UNION
SELECT 'ENTITY' as TBL, COUNT(*) as CNT FROM FECMUR.ENTITY UNION
SELECT 'SETTLEMENT' as TBL, COUNT(*) as CNT FROM FECMUR.SETTLEMENT A join FECMUR.CASE B on A.case_id = B.case_id where B.case_type = 'MUR' and B.case_no in ('7594') UNION
SELECT 'PLAYERS' as TBL, COUNT(*) as CNT FROM FECMUR.PLAYERS A join FECMUR.CASE B on A.case_id = B.case_id where B.case_type = 'MUR' and B.case_no in ('7594') UNION
SELECT 'CASE' as TBL, COUNT(*) as CNT FROM FECMUR.CASE B where B.case_type = 'MUR' and B.case_no in ('7594')) V ORDER BY V.TBL;

Completion criteria:

  • Make sure new MURs will appear on the site
@lbeaufort lbeaufort self-assigned this Aug 31, 2020
@lbeaufort lbeaufort added this to the Sprint 13.3 milestone Aug 31, 2020
@lbeaufort
Copy link
Member Author

lbeaufort commented Aug 31, 2020

Reloaded manually in prod:

cf run-task api "python manage.py load_current_murs -s 7594" -m 4G --name load-MUR-7594

Appears on prod but not stage (I didn't fix it manually there)
https://www.fec.gov/data/legal/matter-under-review/7594/
https://stage.fec.gov/data/legal/matter-under-review/7594/

  • Figure out why this wasn't caught by celery SQL - probably an issue with the pg_date or celery - this was reported at 6pm on Friday 7/28/20

Celery SQL:

RECENTLY_MODIFIED_CASES = """
    SELECT case_no, case_type, pg_date, published_flg
    FROM fecmur.cases_with_parsed_case_serial_numbers_vw
    WHERE pg_date >= NOW() - '8 hour'::INTERVAL
    ORDER BY case_serial
"""

@lbeaufort lbeaufort added the Bug label Aug 31, 2020
@lbeaufort lbeaufort modified the milestones: Sprint 13.3, Sprint 13.4 Aug 31, 2020
@lbeaufort lbeaufort changed the title MUR 7594 missing from site Research why published MUR was missing from site, make sure it won't happen again Aug 31, 2020
@lbeaufort lbeaufort removed their assignment Aug 31, 2020
@jason-upchurch jason-upchurch self-assigned this Sep 1, 2020
@jason-upchurch jason-upchurch changed the title Research why published MUR was missing from site, make sure it won't happen again Research why published MUR was missing from site and resolve Sep 3, 2020
@jason-upchurch
Copy link
Contributor

jason-upchurch commented Sep 3, 2020

Thanks @pkfec for your awesome tutorial on using kibana--I was able to find the following error in stage:

"msg\":\"[2020-08-28 23:55:24,728: ERROR/ForkPoolWorker-1] 
Error uploading document ID 100495928 for MUR %7594: No file image\"

Will continue investigating on the db side and then coordinate appropriately.

cc @patphongs @PaulClark2 to recognize Priya's instruction as invaluable and timesaving.

@jason-upchurch
Copy link
Contributor

from fecmur.cases_with_parsed_case_serial_numbers_vw (stage)

case_id case_no case_serial name case_type pg_date published_flg
60000002999600 7594 7594 ENBRIDGE, INC. MUR 2020-08-28 17:17:20.964009 t

@jason-upchurch
Copy link
Contributor

jason-upchurch commented Sep 3, 2020

refresh_cases found the MUR per logging:
logger.info("%s %s found modified at %s", row["case_type"]

MUR 7594 found modified at 2020-08-28 09:53:59.397021

and load cases:

INFO/ForkPoolWorker-1] Loading MUR: 7594

@jason-upchurch
Copy link
Contributor

jason-upchurch commented Sep 3, 2020

def get_elasticsearch_connection():
    es_conn = env.get_service(name='fec-api-search56')
    if es_conn:
        url = es_conn.get_url(url='uri')
    else:
        url = 'http://localhost:9200'
    es = Elasticsearch(url, timeout=30, max_retries=10, retry_on_timeout=True)
    return es

@jason-upchurch
Copy link
Contributor

es.index('docs_index', get_es_type(case_type), case, id=case['doc_id'])

@jason-upchurch
Copy link
Contributor

jason-upchurch commented Sep 3, 2020

SELECT 
  doc.document_id, 
  mur.case_no, 
  mur.case_type, 
  doc.filename, 
  doc.category, 
  doc.description, 
  doc.ocrtext, 
  doc.fileimage, 
  length(fileimage) 
AS length, doc.doc_order_id, doc.document_date 
FROM fecmur.document doc 
INNER JOIN fecmur.cases_with_parsed_case_serial_numbers_vw mur 
ON mur.case_id = doc.case_id 
WHERE mur.case_no = '7594' AND doc.document_id='100495928' 
ORDER BY doc.doc_order_id, doc.document_date desc, doc.document_id DESC;

==>

document_id case_no case_type filename category description ocrtext fileimage length doc_order_id document_date
100495928 7594 MUR 7594_449.pdf General Counsel Reports, Briefs, Notifications and Responses 3 2019-12-04 00:00:00

@jason-upchurch
Copy link
Contributor

jason-upchurch commented Sep 3, 2020

SELECT 
  doc.document_id, 
  mur.case_no, 
  mur.case_type, 
  doc.filename, 
  doc.category, 
  doc.description, 
  doc.ocrtext, 
  doc.fileimage, 
  length(fileimage) 
AS length, doc.doc_order_id, doc.document_date 
FROM fecmur.document doc 
INNER JOIN fecmur.cases_with_parsed_case_serial_numbers_vw mur 
ON mur.case_id = doc.case_id 
WHERE mur.case_no = '7594' AND doc.fileimage is NULL
ORDER BY doc.doc_order_id, doc.document_date desc, doc.document_id DESC;

found several missing fileimages.

@jason-upchurch
Copy link
Contributor

jason-upchurch commented Sep 9, 2020

verified in prod, stage, and dev that for every MUR in particular db, there is a corresponding publication of that MUR on our particular website env, except for 7594, which is missing from stage and dev website (Laura manually uploaded to prod). imagefiles appear missing from the respective dbs.

I have a request out to the contractor to discuss process. Folks reading this: please do not manually add to stage or dev as I would like to use this case as a test. Prod is no longer affected after the manual upload.

@jason-upchurch
Copy link
Contributor

call set for 11 a.m. today

@jason-upchurch
Copy link
Contributor

asked contractor to reload data in stage only to verify that celery picks the task up. There was redis maintenance today which affected celery.

@jason-upchurch
Copy link
Contributor

Screen Shot 2020-09-15 at 3 23 30 PM

Screen Shot 2020-09-15 at 3 23 38 PM

@fec-jli @pkfec just to keep you updated--this refresh does indeed appear to be happening only once every hour + 5 more minutes from the last refresh! Also considering how QueueOnce will come into play on a task that runs every 5 minutes and might take longer than 5 minutes to complete.

@jason-upchurch
Copy link
Contributor

Screen Shot 2020-09-15 at 6 54 37 PM

changing to ignore_results=True allows a 5-minute interval.
Changing back to QueueOnce to assess behavior

@jason-upchurch
Copy link
Contributor

Screen Shot 2020-09-15 at 9 06 50 PM

changing to QueueOnce results in resumption of hourly interval

@jason-upchurch
Copy link
Contributor

summary of findings:

  • there are 667 rows being asked for in the get_documents function of current_cases
  • The MUR in question is present throughout the refresh logic all the way up to the point of the query:
logger.info("TEST get_documents BEFORE rs {0}" .format(case_id)) # YES MUR case id 60000002999600
  • Then the query is run:
   rs = conn.execute(CASE_DOCUMENTS, case_id)  
  • that query is:
SELECT doc.document_id, 
  mur.case_no, 
  mur.case_type, 
  doc.filename, 
  doc.category, 
  doc.description, 
  doc.ocrtext, 
  doc.fileimage, 
  length(fileimage) 
AS length, doc.doc_order_id, doc.document_date 
FROM fecmur.document doc 
INNER JOIN fecmur.cases_with_parsed_case_serial_numbers_vw mur 
ON mur.case_id = doc.case_id 
WHERE doc.case_id = '60000002999600' 
ORDER BY doc.doc_order_id, doc.document_date desc, doc.document_id DESC;
  • the MUR is missing from the resultset (in sqlalchmey--it's a good local db query)
 logger.info("TEST get_documents AFTER rs {0}" .format(case_id)) # NO MUR case id 60000002999600

@jason-upchurch
Copy link
Contributor

checking logs reveals for the MUR the following was raised:

   2020-09-25T13:20:06.87-0400 [APP/PROC/WEB/0] ERR [2020-09-25 17:20:06,875: ERROR/MainProcess] Process 'ForkPoolWorker-1' pid:75 exited with 'signal 9 (SIGKILL)'
   2020-09-25T13:20:06.88-0400 [APP/PROC/WEB/0] ERR [2020-09-25 17:20:06,887: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).')
   2020-09-25T13:20:06.88-0400 [APP/PROC/WEB/0] ERR Traceback (most recent call last):
   2020-09-25T13:20:06.88-0400 [APP/PROC/WEB/0] ERR   File "/home/vcap/deps/0/python/lib/python3.7/site-packages/billiard/pool.py", line 1267, in mark_as_worker_lost
   2020-09-25T13:20:06.88-0400 [APP/PROC/WEB/0] ERR     human_status(exitcode)),
   2020-09-25T13:20:06.88-0400 [APP/PROC/WEB/0] ERR billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
   2020-09-25T13:25:00.13-0400 [APP/PROC/WEB/0] ERR [2020-09-25 17:25:00,130: INFO/MainProcess] Received task: webservices.tasks.legal_docs.refresh[b7999675-5ee7-4993-89f0-b7575e930368]  

@jason-upchurch
Copy link
Contributor

jason-upchurch commented Sep 25, 2020

The last environment to have the entire MUR successfully loaded was DEV. That completed today with the full inclusion of the disposition text: https://dev.fec.gov/data/legal/matter-under-review/7594/

The issue was that 1G celery-worker memory was not enough and the process errored out.

Opened issue #4638 to have team discussion on technical approach to strategically manage celery-worker memory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants