Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix totals for off-year special candidates on /candidates/totals endpoint #3198

Merged
merged 4 commits into from
Jun 19, 2018

Conversation

lbeaufort
Copy link
Member

@lbeaufort lbeaufort commented Jun 7, 2018

Summary (required)

1) #3196: Fixes incorrect totals for /candidates/totals/ for off-year special election candidates

Change to ofec_candidate_totals_mv:
Add a SELECT DISTINCT:

 ), link AS ( --Off-year special election candidates have > 1 row/cycle
         SELECT DISTINCT cand_id,
            cand_election_yr + cand_election_yr % 2 as rounded_election_yr,
            fec_election_yr,
            cmte_id,
            cmte_dsgn
          FROM ofec_cand_cmte_linkage_mv
          WHERE cmte_dsgn IN ('P', 'A')
cand_id cand_election_yr fec_election_yr cmte_id cmte_tp cmte_dsgn
S0AL00156 2018 2018 C00640623 S P
S0AL00156 2017 2018 C00640623 S P

Performance

Before: 87037 rows, cost=(22821.52..22840.34)
After: 87037 rows, cost=(17454.39..17462.67)

Example: Doug Jones, Raised

Total raised Total Receipts (from candidate page)
$46,956,378.00 $23,478,189.00

Incorrect totals in API call:
https://api.open.fec.gov/v1/candidates/totals/?api_key=DEMO_KEY&per_page=20&page=1&candidate_id=S0AL00156&cycle=2018

2) #3200 : Fixes missing off-year special candidates

Change to ofec_candidate_totals_with_0s_mv: Remove restrictive join on totals (ofec_candidate_totals_mv) that was preventing odd-year special candidates who didn't run again (H8GA06195 is an example) from showing up:
AND cand.candidate_election_year = totals.election_year

Missing candidate in API call (top raising candidate in 2018)
https://api.open.fec.gov/v1/candidates/totals/?api_key=DEMO_KEY&sort_hide_null=true&q=H8GA06195&cycle=2018&sort=-receipts&per_page=30&page=1&office=H

Data tracking: https://docs.google.com/spreadsheets/d/1NUlhWZ6aHvNNSugPk3BWemDQSZn3pimjbqW_4IC_cnw/edit#gid=1784243483

Background

  • Resource is TotalsCandidateView
  • MV is ofec_candidate_totals_with_0s_mv which relies on ofec_candidate_totals_mv
  • ofec_candidate_totals_mv has incorrect totals for Doug Jones (S0AL00156)
  • The problem is with the join on ofec_cand_cmte_linkage_mv - off-year special election candidates will have two entries for 2018 in there.

Impacted areas:

How to test locally

  • Testing migration: Drop and recreate your local cfdm_test db, then invoke create_sample_db

Testing real data:

  • Connect your SQLA_CONN to the dev database.
    (There is an ofec_candidate_totals_mv_tmp and ofec_candidate_totals_with_0s_mv_tmp based off it for data testing.)
  • To use these MV's, change CandidateTotal(db.Model) in webservices/common/models/candidates.py to point to ofec_candidate_totals_with_0s_mv_tmp
  • Test totals for off-year special election candidates. S0AL00156 and H8GA06195 are a start, but extra eyes on other candidates are helpful. Compare totals to https://www.fec.gov/data/candidate// page. Can use "Impacted areas" to see where these appear on the front end https://docs.google.com/spreadsheets/d/1r9QtqGwWrK5ZZVz3tTBB8TpmDzPBgjzOvCmuHrbQ6tg/edit#gid=0

Outstanding items

@lbeaufort lbeaufort changed the title fix ofec_candidate_totals_mv doubling off-year special candidate totals [WIP] Fix totals for off-year special candidates on /candidates/totals endpoint Jun 7, 2018
@codecov-io
Copy link

codecov-io commented Jun 7, 2018

Codecov Report

Merging #3198 into develop will decrease coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           develop   #3198      +/-   ##
==========================================
- Coverage    88.15%   88.1%   -0.05%     
==========================================
  Files           77      77              
  Lines         6145    6129      -16     
==========================================
- Hits          5417    5400      -17     
- Misses         728     729       +1
Impacted Files Coverage Δ
webservices/common/models/candidates.py 100% <ø> (ø) ⬆️
webservices/flow.py 100% <ø> (ø) ⬆️
webservices/resources/candidate_aggregates.py 96.72% <100%> (-0.21%) ⬇️
webservices/resources/committees.py 84.09% <0%> (-1.14%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f95b78b...faa1e69. Read the comment docs.

@lbeaufort lbeaufort force-pushed the feature/fix-special-candidate-totals branch from 98e66f7 to b0b6b2b Compare June 8, 2018 12:29
@lbeaufort lbeaufort changed the title [WIP] Fix totals for off-year special candidates on /candidates/totals endpoint Fix totals for off-year special candidates on /candidates/totals endpoint Jun 8, 2018
@lbeaufort
Copy link
Member Author

I updated the data tracking sheet with the new data using the temporary MV's on dev: https://docs.google.com/spreadsheets/d/1NUlhWZ6aHvNNSugPk3BWemDQSZn3pimjbqW_4IC_cnw/edit#gid=1104244016

@hcaofec
Copy link
Contributor

hcaofec commented Jun 8, 2018

Data looks good. One thing I noticed though is when running this query, each ID returns two rows, the only difference is is_election column. So one candidate has both False and True for is_election column in 2018. I am not sure if that conforms to business rule.

select *
from public.ofec_candidate_totals_mv_tmp
where candidate_id in ('H8MT01182', 'H8GA06286')

@lbeaufort

@lbeaufort lbeaufort changed the title Fix totals for off-year special candidates on /candidates/totals endpoint [HOLD ON MERGE PLEASE] Fix totals for off-year special candidates on /candidates/totals endpoint Jun 8, 2018
@fecjjeng
Copy link
Contributor

fecjjeng commented Jun 8, 2018

select *
from public.ofec_candidate_totals_mv
where candidate_id in ('H8MT01182', 'H8GA06286')
would result in the same phenomenon. ie the same candidate has both False and True for is_election column in 2018 (the amount of money is good with the fix though).
So it seems the distinct solve one problem but not all problem?

@lbeaufort
Copy link
Member Author

@fecjjeng and @hcaofec thanks for taking a look. @jwchumley is going to spend some time investigating the data because Jon Ossoff is missing (see outstanding item in PR). I don't follow the is_election logic but it might be part of the problem.

@lbeaufort
Copy link
Member Author

lbeaufort commented Jun 8, 2018

@hcaofec @fecjjeng - Pat and I looked into is_election field and it really only makes sense for the Senate/Presidential candidates that need 2-year totals in an off-election year. is_election True means "This has the totals for the entire election cycle 4 or 6 years for Prez/Sen) and False means it's just the 2-year total within the larger 4 or 6 year cycle. S2MA00170 2018 is a good example.

@lbeaufort lbeaufort changed the title [HOLD ON MERGE PLEASE] Fix totals for off-year special candidates on /candidates/totals endpoint Fix totals for off-year special candidates on /candidates/totals endpoint Jun 9, 2018
@lbeaufort lbeaufort requested a review from rjayasekera June 11, 2018 16:07
@lbeaufort lbeaufort force-pushed the feature/fix-special-candidate-totals branch from 1bfabcb to 1a3990b Compare June 11, 2018 17:14
@lbeaufort lbeaufort changed the title Fix totals for off-year special candidates on /candidates/totals endpoint [ON HOLD] Fix totals for off-year special candidates on /candidates/totals endpoint Jun 11, 2018
@lbeaufort lbeaufort force-pushed the feature/fix-special-candidate-totals branch from 1a3990b to 1991c37 Compare June 12, 2018 00:01
@lbeaufort lbeaufort changed the title [ON HOLD] Fix totals for off-year special candidates on /candidates/totals endpoint Fix totals for off-year special candidates on /candidates/totals endpoint Jun 12, 2018
- Drop MV: ofec_candidate_history_latest_mv
- Remove MV from refresh management task
- Remove factories for CandidateHistoryLatest
@lbeaufort lbeaufort force-pushed the feature/fix-special-candidate-totals branch from f237541 to faa1e69 Compare June 12, 2018 19:23
@lbeaufort lbeaufort requested review from ccostino and removed request for fecjjeng June 13, 2018 19:19
@lbeaufort lbeaufort changed the title Fix totals for off-year special candidates on /candidates/totals endpoint [On hold] Fix totals for off-year special candidates on /candidates/totals endpoint Jun 15, 2018
@lbeaufort lbeaufort changed the title [On hold] Fix totals for off-year special candidates on /candidates/totals endpoint Fix totals for off-year special candidates on /candidates/totals endpoint Jun 15, 2018
@hcaofec
Copy link
Contributor

hcaofec commented Jun 19, 2018

By following the testing steps written in this PR, here are the results:

  1. Dropped and recreated cfdm_test db locally, then ran ‘invoke create_sample_db’, migration went successfully and the creating script of ofec_candidate_totals_mv_tmp has the latest version
  2. Set local SQLA_CONN to the dev database.
  3. Change CandidateTotal(db.Model)  to point to ofec_candidate_totals_with_0s_mv_tmp
  4. Test /candidate/:  + /candidates/totals/ endpoint, compared result with same endpoint in production
candidate_id counts before change after change
S0AL00156 before: 2; after: 3 "receipts": 46956378, "receipts": 23478189,
S8AL00340 1 "receipts": 12527334 "receipts": 6263667
H8GA06195 not found "receipts": 31381944.8

The data comparison looks good. But the creation sql script can be refactored to simply the logic. ( see comment above )

AND totals.cycle <= election.cand_election_year
AND totals.cycle > election.prev_election_year
GROUP BY link.cand_id, election.cand_election_year, totals.cycle
), election_aggregates AS (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The left join here ( line 66 - 69 ) may not be necessary here because of the distinct logic used in link subquery. And because ofec_candidate_election_mv is used on the right side of the left join, it will not have any effect on the result of cycle_totals subquery.
@vrajmohan @lbeaufort

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hcaofec good catch, thanks for taking a look! I had to make more changes to ofec_candidate_totals_mv for future cycle work. Because this changeset is thoroughly tested, I will look at removing this join in #3219 and re-test if that sounds ok to you.

Copy link
Contributor

@hcaofec hcaofec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested on local by running py.test. No errors.

@hcaofec hcaofec merged commit 056cedd into develop Jun 19, 2018
@hcaofec hcaofec deleted the feature/fix-special-candidate-totals branch June 20, 2018 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants