Verify high-profile 2020 candidate data on April 16 #2630

lbeaufort · 2019-01-29T15:18:40Z

So that users can find correct data on high-profile 2020 candidates, we should do a baseline assessment of high-profile data to make sure everything is working as expected.

Completion criteria:

Make a spreadsheet of high-profile candidates to test as a baseline. Possible sources: news articles, analytics on frequently visited candidate pages, and numbers on who is currently raising money (limit to 5 examples from each office).
Create a baseline assessment for each office to check accuracy and the flow users follow to reach this data. Are the numbers the same in every place we display this data?

lbeaufort · 2019-01-29T18:25:13Z

https://www.fec.gov/data/candidates/president/?election_year=2020&cycle=2020&election_full=true

https://www.fec.gov/data/candidates/senate/?election_year=2020&cycle=2020&election_full=true

https://www.fec.gov/data/candidates/house/

lbeaufort · 2019-02-26T16:20:53Z

Here's the sheet I started before I was away: https://docs.google.com/spreadsheets/d/1mobo7zoaTY7GnE5ecHZRNY7JOtTUCJtZELkV3SGf0QY/edit?ouid=114351900669501389307&usp=sheets_home&ths=true

We ran into enough data issues that I wanted to wait to mark this as complete until I could cross-check all the new changes. I'll take another look.

lbeaufort · 2019-03-08T21:19:36Z

I'm finding some issues (see fecgov/openFEC#3611) and am still refining my script. Here's what I have so far; it's really rough:

import os
import requests

candidate_datatable = "/candidates/totals/?candidate_id={0}&election_year={1}&election_full={2}&api_key={3}&sort=-receipts"
candidate_profile = "/candidate/{0}/totals/?cycle={1}&full_election={2}&api_key={3}"
election_profile = "/elections/?candidate_id={0}&cycle={1}&election_full={2}&api_key={3}&office={4}&sort_nulls_last=true&sort=-total_receipts&per_page=100"
dev = 'https://fec-dev-api.app.cloud.gov'
prod = 'https://api.open.fec.gov'
api_key = os.environ.get("FEC_API_KEY")


def compare_endpoints():

    top_candidates = []
    candidate_url = "https://api.open.fec.gov/v1/candidates/totals/?api_key={}&sort_hide_null=false&sort_nulls_last=true&election_full=true&sort=-receipts&per_page=30&page=1".format(api_key)

    # 2020 Presidential
    top_candidates.extend(requests.get(candidate_url + "&election_year=2020&office=P").json().get('results'))

    # 2020 Senate
    top_candidates.extend(requests.get(candidate_url + "&election_year=2020&office=S").json().get('results'))

    # 2022 Senate
    top_candidates.extend(requests.get(candidate_url + "&election_year=2022&office=S").json().get('results'))

    # 2018 House
    top_candidates.extend(requests.get(candidate_url + "&election_year=2018&office=H").json().get('results'))

    mismatch_list = []

    for candidate in top_candidates:
        candidate_id = candidate.get('candidate_id')
        candidate_name = candidate.get('name')

        datatable_url = (prod + "/v1" + candidate_datatable).format(candidate_id, '2020', 'true', api_key)
        candidate_profile_url = (prod + "/v1" + candidate_profile).format(candidate_id, '2020', 'true', api_key)
        election_profile_url = (prod + "/v1" + election_profile).format(candidate_id, '2020', 'true', api_key, candidate.get('office_full').lower())

        # Add state for House/Senate and District for House
        if candidate.get('office') != "P":
            election_profile_url += "&state={}".format(candidate.get('state'))
        if candidate.get('office') == "H":
            election_profile_url += "&district={}".format(candidate.get('district'))

        print("\nChecking: {}, {}".format(candidate_id, candidate_name))
        print("\ndatatable_url")
        print(datatable_url)
        print("\ncandidate_profile_url")
        print(candidate_profile_url)
        print("\nelection_profile_url")
        print(election_profile_url)

        datatable_result = requests.get(datatable_url).json().get('results')
        candidate_result = requests.get(candidate_profile_url).json().get('results')
        election_result = requests.get(election_profile_url).json().get('results')

        # Find this candidate's row in the election results (None if absent)
        election_candidate_result = None
        for result in election_result or []:
            if result.get('candidate_id') == candidate_id:
                election_candidate_result = [result]
                break

        if not datatable_result or not candidate_result or not election_candidate_result:
            print("ERROR: No results for one endpoint")
            print("Candidate datatable has results: {}".format(bool(datatable_result)))
            print("Candidate profile page has results: {}".format(bool(candidate_result)))
            print("Election profile page has results: {}".format(bool(election_candidate_result)))
            mismatch_list.append((candidate_id, candidate_name))
            # Nothing to compare for this candidate, so move on to the next one
            continue

        # Take top totals result - some have more than one 2-year period
        datatable_result = [datatable_result[0]]

        if not len(datatable_result) == len(candidate_result) == len(election_candidate_result):
            print("!!! ERROR - results lengths don't match!!!")
            print("Candidate datatable: {}\n".format(datatable_result))
            print("Candidate profile page: {}\n".format(candidate_result))
            print("Election profile page: {}\n".format(election_candidate_result))
            mismatch_list.append((candidate_id, candidate_name))
            # Skip this candidate but keep checking the rest
            continue

        # Compare receipts and disbursements

        if not datatable_result[0].get('receipts') == candidate_result[0].get('receipts') == election_candidate_result[0].get('total_receipts'):
            print("!!! ERROR - results don't match!!!")
            print("Candidate datatable: {}".format(datatable_result[0].get('receipts')))
            print("Candidate profile page: {}".format(candidate_result[0].get('receipts')))
            print("Election profile page: {}".format(election_candidate_result[0].get('total_receipts')))
            mismatch_list.append((candidate_id, candidate_name))

        if not datatable_result[0].get('disbursements') == candidate_result[0].get('disbursements') == election_candidate_result[0].get('total_disbursements'):
            print("!!! ERROR - results don't match!!!")
            print("Candidate datatable: {}".format(datatable_result[0].get('disbursements')))
            print("Candidate profile page: {}".format(candidate_result[0].get('disbursements')))
            print("Election profile page: {}".format(election_candidate_result[0].get('total_disbursements')))
            mismatch_list.append((candidate_id, candidate_name))

        print("Mismatch list: {}".format(mismatch_list))


if __name__ == "__main__":
    compare_endpoints()
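
The URLs above are assembled by formatting and concatenating strings. A minimal sketch of an alternative, assuming the same endpoint path and parameter names the script already uses: build each query from keyword arguments and let `urllib.parse.urlencode` assemble it, so each parameter appears exactly once. The helper name `build_url` and the sample candidate record are hypothetical.

```python
from urllib.parse import urlencode

PROD = "https://api.open.fec.gov"


def build_url(path, **params):
    """Return PROD + /v1 + path with params encoded as a query string.

    Parameters whose value is None are dropped, so optional filters such as
    state and district can be passed unconditionally.
    """
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return "{}/v1{}?{}".format(PROD, path, query)


# Made-up candidate record shaped like a /candidates/totals/ result
candidate = {"candidate_id": "H0XX00001", "office": "H", "office_full": "House",
             "state": "XX", "district": "01", "election_year": 2020}

is_house = candidate["office"] == "H"
election_profile_url = build_url(
    "/elections/",
    candidate_id=candidate["candidate_id"],
    cycle=candidate["election_year"],
    office=candidate["office_full"].lower(),
    # State applies to House and Senate, district to House only
    state=candidate["state"] if candidate["office"] != "P" else None,
    district=candidate["district"] if is_house else None,
    election_full="false" if is_house else "true",
    api_key="DEMO_KEY",
)
print(election_profile_url)
```

`requests.get` also accepts a `params` dict directly, which would avoid building the query string by hand at all.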

lbeaufort · 2019-03-15T19:30:15Z

New version. We should run this again after Q1 reports start coming in.

import os
import requests

candidate_datatable = "/candidates/totals/?candidate_id={0}&election_year={1}&election_full={2}&api_key={3}&sort=-receipts"
candidate_profile = "/candidate/{0}/totals/?cycle={1}&api_key={2}"
election_profile = "/elections/?candidate_id={0}&cycle={1}&election_full={2}&api_key={3}&office={4}&sort_nulls_last=true&sort=-total_receipts&per_page=100"
dev = 'https://fec-dev-api.app.cloud.gov'
prod = 'https://api.open.fec.gov'
api_key = os.environ.get("FEC_API_KEY")


def compare_endpoints():

    top_candidates = []
    candidate_url = "https://api.open.fec.gov/v1/candidates/totals/?api_key={0}&sort_hide_null=false&sort_nulls_last=true&is_active_candidate=True&election_full=true&sort=-receipts&page=1".format(api_key)

    # Top 20 Presidential 2020
    top_candidates.extend(requests.get(candidate_url + "&election_year=2020&office=P&per_page=20").json().get('results'))
    # Top 30 Senate 2020 
    top_candidates.extend(requests.get(candidate_url + "&election_year=2020&office=S&per_page=30").json().get('results'))
    # Top 30 Senate 2022 
    top_candidates.extend(requests.get(candidate_url + "&election_year=2022&office=S&per_page=30").json().get('results'))
    
    # Top 30 Senate 2024 - check after Q1 2019 
    # top_candidates.extend(requests.get(candidate_url + "&election_year=2024&office=S").json().get('results'))
    
    # Top 100 House 2018
    top_candidates.extend(requests.get(candidate_url + "&election_year=2018&office=H&per_page=100").json().get('results'))

    mismatch_list = set()

    for candidate in top_candidates:
        candidate_id = candidate.get('candidate_id')
        candidate_name = candidate.get('name')
        candidate_election = candidate.get('election_year')

        datatable_url = (prod + "/v1" + candidate_datatable).format(candidate_id, candidate_election, 'true', api_key)
        candidate_profile_url = (prod + "/v1" + candidate_profile).format(candidate_id, candidate_election, api_key)
        # House totals cover a single two-year cycle; Senate and Presidential
        # totals cover the full election period
        full_election = candidate.get('office') in ("S", "P")
        election_profile_url = (prod + "/v1" + election_profile).format(candidate_id, candidate_election, full_election, api_key, candidate.get('office_full').lower())

        # show full election for S, P
        candidate_profile_url += "&full_election={}".format(full_election)

        # Add state for House/Senate and District for House
        if candidate.get('office') != "P":
            election_profile_url += "&state={}".format(candidate.get('state'))
        if candidate.get('office') == "H":
            election_profile_url += "&district={}".format(candidate.get('district'))

        print("\n*********Checking: {}, {}************\n".format(candidate_id, candidate_name))

        print("\ndatatable_url")
        print(datatable_url)
        print("\ncandidate_profile_url")
        print(candidate_profile_url)
        print("\nelection_profile_url")
        print(election_profile_url)

        datatable_result = requests.get(datatable_url).json().get('results')
        candidate_result = requests.get(candidate_profile_url).json().get('results')
        election_result = requests.get(election_profile_url).json().get('results')

        # Find this candidate's row in the election results (None if absent)
        election_candidate_result = None
        for result in election_result or []:
            if result.get('candidate_id') == candidate_id:
                election_candidate_result = [result]
                break

        # Take top totals result - some have more than one 2-year period
        if datatable_result:
            datatable_result = [datatable_result[0]]

        if datatable_result is None or candidate_result is None or election_candidate_result is None:
            print("\nERROR: No results for one endpoint")
            print("Candidate datatable has results? {}".format(datatable_result is not None))
            print("Candidate profile page has results? {}".format(candidate_result is not None))
            print("Election profile page has results? {}".format(election_candidate_result is not None))
            mismatch_list.add((candidate_id, candidate_name))
        else:
            if not len(datatable_result) == len(candidate_result) == len(election_candidate_result):
                print("\n!!! ERROR - results lengths don't match!!!")
                print("Candidate datatable: {}\n".format(datatable_result))
                print("Candidate profile page: {}\n".format(candidate_result))
                print("Election profile page: {}\n".format(election_candidate_result))
                mismatch_list.add((candidate_id, candidate_name))
                # Skip this candidate but keep checking the rest
                continue
            # Compare receipts, disbursements, and COH
            if not datatable_result[0].get('receipts') == candidate_result[0].get('receipts') == election_candidate_result[0].get('total_receipts'):
                print("\n!!! ERROR - receipts results don't match!!!")
                print("Candidate datatable: {}".format(datatable_result[0].get('receipts')))
                print("Candidate profile page: {}".format(candidate_result[0].get('receipts')))
                print("Election profile page: {}".format(election_candidate_result[0].get('total_receipts')))
                mismatch_list.add((candidate_id, candidate_name))

            if not datatable_result[0].get('disbursements') == candidate_result[0].get('disbursements') == election_candidate_result[0].get('total_disbursements'):
                print("\n!!! ERROR - disbursement results don't match!!!")
                print("Candidate datatable: {}".format(datatable_result[0].get('disbursements')))
                print("Candidate profile page: {}".format(candidate_result[0].get('disbursements')))
                print("Election profile page: {}".format(election_candidate_result[0].get('total_disbursements')))
                mismatch_list.add((candidate_id, candidate_name))

            # Check COH
            if not datatable_result[0].get('cash_on_hand_end_period') == candidate_result[0].get('last_cash_on_hand_end_period') == election_candidate_result[0].get('cash_on_hand_end_period'):
                print("\n!!! ERROR - COH results don't match!!!")
                print("Candidate datatable: {}".format(datatable_result[0].get('cash_on_hand_end_period')))
                print("Candidate profile page: {}".format(candidate_result[0].get('last_cash_on_hand_end_period')))
                print("Election profile page: {}".format(election_candidate_result[0].get('cash_on_hand_end_period')))
                mismatch_list.add((candidate_id, candidate_name))

        print("\nMismatch list: {}".format(mismatch_list))


if __name__ == "__main__":
    compare_endpoints()
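
The three comparisons above repeat the same pattern with a different key name per endpoint. A sketch of one way to fold them into a single table-driven check: the `FIELDS` mapping mirrors the keys the script already reads, while `find_mismatches` and the sample rows are hypothetical and the dollar amounts are made up.

```python
# (label, datatable key, candidate profile key, election profile key)
FIELDS = [
    ("receipts", "receipts", "receipts", "total_receipts"),
    ("disbursements", "disbursements", "disbursements", "total_disbursements"),
    ("cash on hand", "cash_on_hand_end_period",
     "last_cash_on_hand_end_period", "cash_on_hand_end_period"),
]


def find_mismatches(datatable_row, profile_row, election_row):
    """Return {label: (datatable, profile, election)} for fields that differ."""
    mismatches = {}
    for label, dt_key, cp_key, el_key in FIELDS:
        values = (datatable_row.get(dt_key), profile_row.get(cp_key),
                  election_row.get(el_key))
        # All three sources agree only if they collapse to a single value
        if len(set(values)) != 1:
            mismatches[label] = values
    return mismatches


# Made-up rows standing in for one result from each endpoint
datatable_row = {"receipts": 100.0, "disbursements": 40.0,
                 "cash_on_hand_end_period": 60.0}
profile_row = {"receipts": 100.0, "disbursements": 40.0,
               "last_cash_on_hand_end_period": 55.0}
election_row = {"total_receipts": 100.0, "total_disbursements": 40.0,
                "cash_on_hand_end_period": 60.0}

print(find_mismatches(datatable_row, profile_row, election_row))
```

Adding another field to the check then means adding one row to `FIELDS` rather than another copy of the comparison block.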

lbeaufort · 2019-04-18T18:48:10Z

Found the following issues:
fecgov/openFEC#3611
fecgov/openFEC#3700
fecgov/openFEC#3709
fecgov/openFEC#3701

lbeaufort · 2019-04-18T18:48:44Z

Newest, improved version of script, now with redacted API keys and markdown charts:

import os
import requests

candidate_datatable = "/candidates/totals/?candidate_id={0}&election_year={1}&election_full={2}&api_key={3}&sort=-receipts"
candidate_profile = "/candidate/{0}/totals/?cycle={1}&api_key={2}"
election_profile = "/elections/?candidate_id={0}&cycle={1}&election_full={2}&api_key={3}&office={4}&sort_nulls_last=true&sort=-total_receipts&per_page=100"
dev = 'https://fec-dev-api.app.cloud.gov'
prod = 'https://api.open.fec.gov'
api_key = os.environ.get("FEC_API_KEY")


def compare_endpoints():

    top_candidates = []
    candidate_url = "https://api.open.fec.gov/v1/candidates/totals/?api_key={0}&sort_hide_null=false&sort_nulls_last=true&is_active_candidate=True&election_full=true&sort=-receipts&page=1".format(api_key)

    # Top 20 Presidential 2020
    top_candidates.extend(requests.get(candidate_url + "&election_year=2020&office=P&per_page=20").json().get('results'))
    # Top 30 Senate 2020
    top_candidates.extend(requests.get(candidate_url + "&election_year=2020&office=S&per_page=30").json().get('results'))
    # Top 30 Senate 2022
    top_candidates.extend(requests.get(candidate_url + "&election_year=2022&office=S&per_page=30").json().get('results'))

    # Top 30 Senate 2024 - check after Q1 2019
    top_candidates.extend(requests.get(candidate_url + "&election_year=2024&office=S&per_page=30").json().get('results'))

    # Top 100 House 2020
    top_candidates.extend(requests.get(candidate_url + "&election_year=2020&office=H&per_page=100").json().get('results'))

    mismatch_list = set()

    for candidate in top_candidates:
        candidate_id = candidate.get('candidate_id')
        candidate_name = candidate.get('name')
        candidate_election = candidate.get('election_year')

        datatable_url = (prod + "/v1" + candidate_datatable).format(candidate_id, candidate_election, 'true', api_key)
        candidate_profile_url = (prod + "/v1" + candidate_profile).format(candidate_id, candidate_election, api_key)
        # House totals cover a single two-year cycle; Senate and Presidential
        # totals cover the full election period
        full_election = candidate.get('office') in ("S", "P")
        election_profile_url = (prod + "/v1" + election_profile).format(candidate_id, candidate_election, full_election, api_key, candidate.get('office_full').lower())

        # show full election for S, P
        candidate_profile_url += "&full_election={}".format(full_election)

        # Add state for House/Senate and District for House
        if candidate.get('office') != "P":
            election_profile_url += "&state={}".format(candidate.get('state'))
        if candidate.get('office') == "H":
            election_profile_url += "&district={}".format(candidate.get('district'))

        print("\n*********Checking: {}, {}************\n".format(candidate_id, candidate_name))


        datatable_url_printable = datatable_url.replace(api_key, 'DEMO_KEY')
        candidate_profile_url_printable = candidate_profile_url.replace(api_key, 'DEMO_KEY')
        election_profile_url_printable = election_profile_url.replace(api_key, 'DEMO_KEY')

        print("\nCandidate totals datatable url: {}".format(datatable_url_printable))
        print("\nCandidate profile page totals url: {}".format(candidate_profile_url_printable))
        print("\nElection profile page totals url: {}".format(election_profile_url_printable))

        datatable_result = requests.get(datatable_url).json().get('results')
        candidate_result = requests.get(candidate_profile_url).json().get('results')
        election_result = requests.get(election_profile_url).json().get('results')

        # Find this candidate's row in the election results (None if absent)
        election_candidate_result = None
        for result in election_result or []:
            if result.get('candidate_id') == candidate_id:
                election_candidate_result = [result]
                break

        # Take top totals result - some have more than one 2-year period
        if datatable_result:
            datatable_result = [datatable_result[0]]

        if datatable_result is None or candidate_result is None or election_candidate_result is None:
            print("\nERROR: No results for one endpoint")
            print("Candidate datatable has results? {}".format(datatable_result is not None))
            print("Candidate profile page has results? {}".format(candidate_result is not None))
            print("Election profile page has results? {}".format(election_candidate_result is not None))
            mismatch_list.add((candidate_id, candidate_name))
        else:
            if not len(datatable_result) == len(candidate_result) == len(election_candidate_result):
                print("\n!!! ERROR - results lengths don't match!!!")
                print("Candidate datatable: {}\n".format(len(datatable_result)))
                print("Candidate profile page: {}\n".format(len(candidate_result)))
                print("Election profile page: {}\n".format(len(election_candidate_result)))
                mismatch_list.add((candidate_id, candidate_name))
                # Skip this candidate but keep checking the rest
                continue
            # Compare receipts, disbursements, and COH
            if not datatable_result[0].get('receipts') == candidate_result[0].get('receipts') == election_candidate_result[0].get('total_receipts'):
                print("\n!!! ERROR - receipts results don't match!!!\n")
                print("| Data source | Total receipts |\n|--|--|")
                print("| Candidate datatable \t\t|\t${:,.2f} |".format(datatable_result[0].get('receipts')))
                print("| Candidate profile page \t|\t${:,.2f} |".format(candidate_result[0].get('receipts')))
                print("| Election profile page \t|\t${:,.2f} |".format(election_candidate_result[0].get('total_receipts')))
                mismatch_list.add((candidate_id, candidate_name))

            if not datatable_result[0].get('disbursements') == candidate_result[0].get('disbursements') == election_candidate_result[0].get('total_disbursements'):
                print("\n!!! ERROR - disbursement results don't match!!! \n")
                print("| Data source | Total disbursements |\n|--|--|")
                print("| Candidate datatable \t\t|\t${:,.2f} |".format(datatable_result[0].get('disbursements')))
                print("| Candidate profile page \t|\t${:,.2f} |".format(candidate_result[0].get('disbursements')))
                print("| Election profile page \t|\t${:,.2f} |".format(election_candidate_result[0].get('total_disbursements')))
                mismatch_list.add((candidate_id, candidate_name))

            # Check COH
            if not datatable_result[0].get('cash_on_hand_end_period') == candidate_result[0].get('last_cash_on_hand_end_period') == election_candidate_result[0].get('cash_on_hand_end_period'):
                print("\n!!! ERROR - COH results don't match!!!\n")
                print("| Data source | Cash on hand |\n|--|--|")
                print("| Candidate datatable \t\t|\t${:,.2f} |".format(datatable_result[0].get('cash_on_hand_end_period')))
                print("| Candidate profile page \t|\t${:,.2f} |".format(candidate_result[0].get('last_cash_on_hand_end_period')))
                print("| Election profile page \t|\t${:,.2f} |".format(election_candidate_result[0].get('cash_on_hand_end_period')))
                mismatch_list.add((candidate_id, candidate_name))

        print("\nMismatch list: {}".format(mismatch_list))


if __name__ == "__main__":
    compare_endpoints()
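
One case the formatted tables above do not cover: `"${:,.2f}".format(None)` raises a `TypeError`, so a mismatch caused by a missing value on one endpoint would stop the run before the table prints. A minimal sketch of a None-tolerant formatter and table builder follows; `format_money` and `markdown_table` are hypothetical helper names, and the amounts are made up.

```python
def format_money(value):
    """Format a dollar amount, or 'n/a' when the endpoint returned no value."""
    if value is None:
        return "n/a"
    return "${:,.2f}".format(value)


def markdown_table(column_title, rows):
    """Build a two-column markdown table from (source, value) pairs."""
    lines = ["| Data source | {} |".format(column_title), "|--|--|"]
    for source, value in rows:
        lines.append("| {} | {} |".format(source, format_money(value)))
    return "\n".join(lines)


# Made-up amounts; the election profile value is missing
table = markdown_table("Total receipts", [
    ("Candidate datatable", 1234567.891),
    ("Candidate profile page", 1234567.891),
    ("Election profile page", None),
])
print(table)
```

Returning the table as a string, rather than printing row by row, also makes it easy to paste the output into a follow-up issue.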

lbeaufort added this to the Sprint 8.1 milestone Jan 29, 2019

patphongs assigned lbeaufort Jan 30, 2019

patphongs modified the milestones: Sprint 8.1, Sprint 8.2 Feb 12, 2019

patphongs modified the milestones: Sprint 8.2, Sprint 8.3 Mar 4, 2019

JonellaCulmer modified the milestones: Sprint 8.3, Sprint 8.4 Mar 14, 2019

lbeaufort modified the milestones: Sprint 8.4, Sprint 8.5 Mar 15, 2019

lbeaufort changed the title ~~Verify high-profile 2020 candidate data~~ Verify high-profile 2020 candidate data after Q1 Mar 15, 2019

JonellaCulmer changed the title ~~Verify high-profile 2020 candidate data after Q1~~ Verify high-profile 2020 candidate data on April 16 Mar 22, 2019

JonellaCulmer modified the milestones: Sprint 8.5, Sprint 8.6 Mar 22, 2019

JonellaCulmer added the High priority label Mar 22, 2019

JonellaCulmer mentioned this issue Apr 11, 2019

[High Priority] Turn on bar charts feature flag on homepage #2828

Closed

3 tasks

lbeaufort closed this as completed Apr 18, 2019
