Verify high-profile 2020 candidate data on April 16 #2630

Closed
2 tasks
lbeaufort opened this issue Jan 29, 2019 · 6 comments

@lbeaufort
Member

lbeaufort commented Jan 29, 2019

So that users can find correct data on high-profile 2020 candidates, we should do a baseline assessment of high-profile data to make sure everything is working as expected.

Completion criteria:

  • Make a spreadsheet of high-profile candidates to test as a baseline. Possible sources: news articles, analytics on frequently visited candidate pages, and current fundraising totals (only 5 examples from each office).
  • Create a baseline assessment for each office to check accuracy and the flow by which users reach this data. Are the numbers the same in all the places we display this data?
@lbeaufort lbeaufort added this to the Sprint 8.1 milestone Jan 29, 2019
@lbeaufort
Member Author

Here's the sheet I started before I was away: https://docs.google.com/spreadsheets/d/1mobo7zoaTY7GnE5ecHZRNY7JOtTUCJtZELkV3SGf0QY/edit?ouid=114351900669501389307&usp=sheets_home&ths=true

We ran into enough data issues that I wanted to wait to mark this as complete until I could cross-check all the new changes. I'll take another look.

@patphongs patphongs modified the milestones: Sprint 8.2, Sprint 8.3 Mar 4, 2019
@lbeaufort
Member Author

lbeaufort commented Mar 8, 2019

I'm finding some issues (see fecgov/openFEC#3611) and I'm still refining my script. Here's what I have so far; it's really rough:

import os
import requests

candidate_datatable = "/candidates/totals/?candidate_id={0}&election_year={1}&election_full={2}&api_key={3}&sort=-receipts"
candidate_profile = "/candidate/{0}/totals/?cycle={1}&full_election={2}&api_key={3}"
election_profile = "/elections/?candidate_id={0}&cycle={1}&election_full={2}&api_key={3}&office={4}&sort_nulls_last=true&sort=-total_receipts&per_page=100"
dev = 'https://fec-dev-api.app.cloud.gov'
prod = 'https://api.open.fec.gov'
api_key = os.environ.get("FEC_API_KEY")


def compare_endpoints():

    top_candidates = []
    candidate_url = "https://api.open.fec.gov/v1/candidates/totals/?api_key={}&sort_hide_null=false&sort_nulls_last=true&election_full=true&sort=-receipts&per_page=30&page=1".format(api_key)

    # 2020 Presidential
    top_candidates.extend(requests.get(candidate_url + "&election_year=2020&office=P").json().get('results'))

    # 2020 Senate
    top_candidates.extend(requests.get(candidate_url + "&election_year=2020&office=S").json().get('results'))

    # 2022 Senate
    top_candidates.extend(requests.get(candidate_url + "&election_year=2022&office=S").json().get('results'))

    # 2018 House
    top_candidates.extend(requests.get(candidate_url + "&election_year=2018&office=H").json().get('results'))

    mismatch_list = []

    for candidate in top_candidates:
        candidate_id = candidate.get('candidate_id')
        candidate_name = candidate.get('name')

        datatable_url = (prod + "/v1" + candidate_datatable).format(candidate_id, '2020', 'true', api_key)
        candidate_profile_url = (prod + "/v1" + candidate_profile).format(candidate_id, '2020', 'true', api_key)
        election_profile_url = (prod + "/v1" + election_profile).format(candidate_id, '2020', 'true', api_key, candidate.get('office_full').lower())

        # Add state for House/Senate and District for House
        # (strings have no .append(); extend the URL with +=)
        if candidate.get('office') != "P":
            election_profile_url += "&state={}".format(candidate.get('state'))
        if candidate.get('office') == "H":
            election_profile_url += "&district={}".format(candidate.get('district'))

        print("\nChecking: {}, {}".format(candidate_id, candidate_name))
        print("\ndatatable_url")
        print(datatable_url)
        print("\ncandidate_profile_url")
        print(candidate_profile_url)
        print("\nelection_profile_url")
        print(election_profile_url)

        datatable_result = requests.get(datatable_url).json().get('results')
        candidate_result = requests.get(candidate_profile_url).json().get('results')
        election_result = requests.get(election_profile_url).json().get('results')

        # Find this candidate's row in the election results (None if absent)
        election_candidate_result = None
        for result in election_result or []:
            if result.get('candidate_id') == candidate_id:
                election_candidate_result = [result]
                break

        if datatable_result is None or candidate_result is None or election_candidate_result is None:
            print("ERROR: No results for one endpoint")
            print("Candidate datatable has results? {}".format(datatable_result is not None))
            print("Candidate profile page has results? {}".format(candidate_result is not None))
            print("Election profile page has results? {}".format(election_candidate_result is not None))
            mismatch_list.append((candidate_id, candidate_name))
            # Nothing to compare for this candidate
            continue

        # Take top totals result - some have more than one 2-year period
        datatable_result = datatable_result[:1]

        if not len(datatable_result) == len(candidate_result) == len(election_candidate_result):
            print("!!! ERROR - results lengths don't match!!!")
            print("Candidate datatable: {}\n".format(datatable_result))
            print("Candidate profile page: {}\n".format(candidate_result))
            print("Election profile page: {}\n".format(election_candidate_result))
            mismatch_list.append((candidate_id, candidate_name))
            # Skip the field comparisons, but keep checking the other candidates
            continue

        # Compare receipts and disbursements

        if not datatable_result[0].get('receipts') == candidate_result[0].get('receipts') == election_candidate_result[0].get('total_receipts'):
            print("!!! ERROR - results don't match!!!")
            print("Candidate datatable: {}".format(datatable_result[0].get('receipts')))
            print("Candidate profile page: {}".format(candidate_result[0].get('receipts')))
            print("Election profile page: {}".format(election_candidate_result[0].get('total_receipts')))
            mismatch_list.append((candidate_id, candidate_name))

        if not datatable_result[0].get('disbursements') == candidate_result[0].get('disbursements') == election_candidate_result[0].get('total_disbursements'):
            print("!!! ERROR - results don't match!!!")
            print("Candidate datatable: {}".format(datatable_result[0].get('disbursements')))
            print("Candidate profile page: {}".format(candidate_result[0].get('disbursements')))
            print("Election profile page: {}".format(election_candidate_result[0].get('total_disbursements')))
            mismatch_list.append((candidate_id, candidate_name))

        print("Mismatch list: {}".format(mismatch_list))


if __name__ == "__main__":
    compare_endpoints()
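
A possible way to avoid the string-concatenation step when adding `state` and `district` is to collect the query parameters in a dict and encode them once. This is only a sketch: `build_election_profile_url` is a hypothetical helper name, the parameter names are copied from the `election_profile` template above, and the sample candidate is made up rather than real FEC data.

```python
from urllib.parse import urlencode

PROD = 'https://api.open.fec.gov/v1'


def build_election_profile_url(candidate, cycle, api_key, base=PROD):
    """Build the /elections/ URL for one candidate (hypothetical helper)."""
    params = {
        'candidate_id': candidate['candidate_id'],
        'cycle': cycle,
        'election_full': 'true',
        'api_key': api_key,
        'office': candidate['office_full'].lower(),
        'sort_nulls_last': 'true',
        'sort': '-total_receipts',
        'per_page': 100,
    }
    # State applies to House and Senate; district applies to House only
    if candidate['office'] != 'P':
        params['state'] = candidate['state']
    if candidate['office'] == 'H':
        params['district'] = candidate['district']
    return '{}/elections/?{}'.format(base, urlencode(params))


# Made-up candidate for illustration only
sample_candidate = {
    'candidate_id': 'H0XX00000',
    'office': 'H',
    'office_full': 'House',
    'state': 'CA',
    'district': '12',
}
print(build_election_profile_url(sample_candidate, 2020, 'DEMO_KEY'))
```

Because each parameter is a dict key, a value can only appear once in the final URL, which makes it easier to see exactly what is being requested.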


@JonellaCulmer JonellaCulmer modified the milestones: Sprint 8.3, Sprint 8.4 Mar 14, 2019
@lbeaufort
Member Author

New version. We should run this again after Q1s start coming in.

import os
import requests

candidate_datatable = "/candidates/totals/?candidate_id={0}&election_year={1}&election_full={2}&api_key={3}&sort=-receipts"
candidate_profile = "/candidate/{0}/totals/?cycle={1}&api_key={2}"
election_profile = "/elections/?candidate_id={0}&cycle={1}&election_full={2}&api_key={3}&office={4}&sort_nulls_last=true&sort=-total_receipts&per_page=100"
dev = 'https://fec-dev-api.app.cloud.gov'
prod = 'https://api.open.fec.gov'
api_key = os.environ.get("FEC_API_KEY")


def compare_endpoints():

    top_candidates = []
    candidate_url = "https://api.open.fec.gov/v1/candidates/totals/?api_key={0}&sort_hide_null=false&sort_nulls_last=true&is_active_candidate=True&election_full=true&sort=-receipts&page=1".format(api_key)

    # Top 20 Presidential 2020
    top_candidates.extend(requests.get(candidate_url + "&election_year=2020&office=P&per_page=20").json().get('results'))
    # Top 30 Senate 2020 
    top_candidates.extend(requests.get(candidate_url + "&election_year=2020&office=S&per_page=30").json().get('results'))
    # Top 30 Senate 2022 
    top_candidates.extend(requests.get(candidate_url + "&election_year=2022&office=S&per_page=30").json().get('results'))
    
    # Top 30 Senate 2024 - check after Q1 2019 
    # top_candidates.extend(requests.get(candidate_url + "&election_year=2024&office=S").json().get('results'))
    
    # Top 100 House 2018
    top_candidates.extend(requests.get(candidate_url + "&election_year=2018&office=H&per_page=100").json().get('results'))

    mismatch_list = set([])

    for candidate in top_candidates:
        candidate_id = candidate.get('candidate_id')
        candidate_name = candidate.get('name')
        candidate_election = candidate.get('election_year')

        datatable_url = (prod + "/v1" + candidate_datatable).format(candidate_id, candidate_election, 'true', api_key)
        candidate_profile_url = (prod + "/v1" + candidate_profile).format(candidate_id, candidate_election, api_key)
        election_profile_url = (prod + "/v1" + election_profile).format(candidate_id, candidate_election, 'true', api_key, candidate.get('office_full').lower())

        # show full election for S, P 
        if candidate.get('office') in ("S", "P"):
            candidate_profile_url += "&full_election=True"
        else:
            candidate_profile_url += "&full_election=False"

        # Add state for House/Senate and District for House
        if candidate.get('office') != "P":
            election_profile_url += "&state={}&election_full={}".format(candidate.get('state'), 'True')
        if candidate.get('office') == "H":
            election_profile_url += "&district={}&election_full={}".format(candidate.get('district'), 'False')

        print("\n*********Checking: {}, {}************\n".format(candidate_id, candidate_name))

        print("\ndatatable_url")
        print(datatable_url)
        print("\ncandidate_profile_url")
        print(candidate_profile_url)
        print("\nelection_profile_url")
        print(election_profile_url)

        datatable_result = requests.get(datatable_url).json().get('results')
        candidate_result = requests.get(candidate_profile_url).json().get('results')
        election_result = requests.get(election_profile_url).json().get('results')

        # Find this candidate's row in the election results (None if absent)
        election_candidate_result = None
        for result in election_result or []:
            if result.get('candidate_id') == candidate_id:
                election_candidate_result = [result]
                break

        try:
            # Take top totals result - some have more than one 2-year period
            datatable_result = [datatable_result[0]]
        except (TypeError, IndexError):
            # No results (None) or an empty list: leave as-is for the checks below
            pass

        if datatable_result is None or candidate_result is None or election_candidate_result is None:
            print("\nERROR: No results for one endpoint")
            print("Candidate datatable has results? {}".format(datatable_result is not None))
            print("Candidate profile page has results? {}".format(candidate_result is not None))
            print("Election profile page has results? {}".format(election_candidate_result is not None))
            mismatch_list.add((candidate_id, candidate_name))
        else:
            if not len(datatable_result) == len(candidate_result) == len(election_candidate_result):
                print("\n!!! ERROR - results lengths don't match!!!")
                print("Candidate datatable: {}\n".format(datatable_result))
                print("Candidate profile page: {}\n".format(candidate_result))
                print("Election profile page: {}\n".format(election_candidate_result))
                mismatch_list.add((candidate_id, candidate_name))
                # Skip the field comparisons, but keep checking the other candidates
                continue
            # Compare receipts, disbursements, and COH
            if not datatable_result[0].get('receipts') == candidate_result[0].get('receipts') == election_candidate_result[0].get('total_receipts'):
                print("\n!!! ERROR - results don't match!!!")
                print("Candidate datatable: {}".format(datatable_result[0].get('receipts')))
                print("Candidate profile page: {}".format(candidate_result[0].get('receipts')))
                print("Election profile page: {}".format(election_candidate_result[0].get('total_receipts')))
                mismatch_list.add((candidate_id, candidate_name))

            if not datatable_result[0].get('disbursements') == candidate_result[0].get('disbursements') == election_candidate_result[0].get('total_disbursements'):
                print("\n!!! ERROR - disbursement results don't match!!!")
                print("Candidate datatable: {}".format(datatable_result[0].get('disbursements')))
                print("Candidate profile page: {}".format(candidate_result[0].get('disbursements')))
                print("Election profile page: {}".format(election_candidate_result[0].get('total_disbursements')))
                mismatch_list.add((candidate_id, candidate_name))

            # Check COH
            if not datatable_result[0].get('cash_on_hand_end_period') == candidate_result[0].get('last_cash_on_hand_end_period') == election_candidate_result[0].get('cash_on_hand_end_period'):
                print("\n!!! ERROR - COH results don't match!!!")
                print("Candidate datatable: {}".format(datatable_result[0].get('cash_on_hand_end_period')))
                print("Candidate profile page: {}".format(candidate_result[0].get('last_cash_on_hand_end_period')))
                print("Election profile page: {}".format(election_candidate_result[0].get('cash_on_hand_end_period')))
                mismatch_list.add((candidate_id, candidate_name))

        print("\nMismatch list: {}".format(mismatch_list))


if __name__ == "__main__":
    compare_endpoints()
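
The three near-identical comparison blocks could also be driven by a small table of field names. The sketch below assumes the field names used in the script above (`receipts` / `total_receipts`, `disbursements` / `total_disbursements`, `cash_on_hand_end_period` / `last_cash_on_hand_end_period`); `FIELDS` and `find_mismatches` are hypothetical names, and the sample rows are made up.

```python
# (label, datatable key, candidate profile key, election profile key)
FIELDS = [
    ('receipts', 'receipts', 'receipts', 'total_receipts'),
    ('disbursements', 'disbursements', 'disbursements', 'total_disbursements'),
    ('cash on hand', 'cash_on_hand_end_period',
     'last_cash_on_hand_end_period', 'cash_on_hand_end_period'),
]


def find_mismatches(datatable_row, profile_row, election_row):
    """Return {label: (datatable, profile, election)} for fields that differ.

    Uses exact equality, like the script above; a missing key counts as None.
    """
    mismatches = {}
    for label, dt_key, cp_key, el_key in FIELDS:
        values = (
            datatable_row.get(dt_key),
            profile_row.get(cp_key),
            election_row.get(el_key),
        )
        # More than one distinct value means the sources disagree
        if len(set(values)) > 1:
            mismatches[label] = values
    return mismatches


# Made-up rows for illustration only
datatable_row = {'receipts': 100.0, 'disbursements': 50.0, 'cash_on_hand_end_period': 10.0}
profile_row = {'receipts': 100.0, 'disbursements': 50.0, 'last_cash_on_hand_end_period': 12.0}
election_row = {'total_receipts': 100.0, 'total_disbursements': 50.0, 'cash_on_hand_end_period': 10.0}
print(find_mismatches(datatable_row, profile_row, election_row))
```

Adding another field to check would then be a one-line change to `FIELDS` rather than a new copy of the comparison block.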

@lbeaufort lbeaufort modified the milestones: Sprint 8.4, Sprint 8.5 Mar 15, 2019
@lbeaufort lbeaufort changed the title Verify high-profile 2020 candidate data Verify high-profile 2020 candidate data after Q1 Mar 15, 2019
@JonellaCulmer JonellaCulmer changed the title Verify high-profile 2020 candidate data after Q1 Verify high-profile 2020 candidate data on April 16 Mar 22, 2019
@JonellaCulmer JonellaCulmer modified the milestones: Sprint 8.5, Sprint 8.6 Mar 22, 2019
@lbeaufort
Member Author

lbeaufort commented Apr 18, 2019

Newest, improved version of the script, now with redacted API keys and markdown tables:

import os
import requests

candidate_datatable = "/candidates/totals/?candidate_id={0}&election_year={1}&election_full={2}&api_key={3}&sort=-receipts"
candidate_profile = "/candidate/{0}/totals/?cycle={1}&api_key={2}"
election_profile = "/elections/?candidate_id={0}&cycle={1}&election_full={2}&api_key={3}&office={4}&sort_nulls_last=true&sort=-total_receipts&per_page=100"
dev = 'https://fec-dev-api.app.cloud.gov'
prod = 'https://api.open.fec.gov'
api_key = os.environ.get("FEC_API_KEY")


def compare_endpoints():

    top_candidates = []
    candidate_url = "https://api.open.fec.gov/v1/candidates/totals/?api_key={0}&sort_hide_null=false&sort_nulls_last=true&is_active_candidate=True&election_full=true&sort=-receipts&page=1".format(api_key)

    # Top 20 Presidential 2020
    top_candidates.extend(requests.get(candidate_url + "&election_year=2020&office=P&per_page=20").json().get('results'))
    # Top 30 Senate 2020
    top_candidates.extend(requests.get(candidate_url + "&election_year=2020&office=S&per_page=30").json().get('results'))
    # Top 30 Senate 2022
    top_candidates.extend(requests.get(candidate_url + "&election_year=2022&office=S&per_page=30").json().get('results'))

    # Top 30 Senate 2024
    top_candidates.extend(requests.get(candidate_url + "&election_year=2024&office=S&per_page=30").json().get('results'))

    # Top 100 House 2020
    top_candidates.extend(requests.get(candidate_url + "&election_year=2020&office=H&per_page=100").json().get('results'))

    mismatch_list = set([])

    for candidate in top_candidates:
        candidate_id = candidate.get('candidate_id')
        candidate_name = candidate.get('name')
        candidate_election = candidate.get('election_year')

        datatable_url = (prod + "/v1" + candidate_datatable).format(candidate_id, candidate_election, 'true', api_key)
        candidate_profile_url = (prod + "/v1" + candidate_profile).format(candidate_id, candidate_election, api_key)
        election_profile_url = (prod + "/v1" + election_profile).format(candidate_id, candidate_election, 'true', api_key, candidate.get('office_full').lower())

        # show full election for S, P
        if candidate.get('office') in ("S", "P"):
            candidate_profile_url += "&full_election=True"
        else:
            candidate_profile_url += "&full_election=False"

        # Add state for House/Senate and District for House
        if candidate.get('office') != "P":
            election_profile_url += "&state={}&election_full={}".format(candidate.get('state'), 'True')
        if candidate.get('office') == "H":
            election_profile_url += "&district={}&election_full={}".format(candidate.get('district'), 'False')

        print("\n*********Checking: {}, {}************\n".format(candidate_id, candidate_name))


        datatable_url_printable = datatable_url.replace(api_key, 'DEMO_KEY')
        candidate_profile_url_printable = candidate_profile_url.replace(api_key, 'DEMO_KEY')
        election_profile_url_printable = election_profile_url.replace(api_key, 'DEMO_KEY')

        print("\nCandidate totals datatable url: {}".format(datatable_url_printable))
        print("\nCandidate profile page totals url: {}".format(candidate_profile_url_printable))
        print("\nElection profile page totals url: {}".format(election_profile_url_printable))

        datatable_result = requests.get(datatable_url).json().get('results')
        candidate_result = requests.get(candidate_profile_url).json().get('results')
        election_result = requests.get(election_profile_url).json().get('results')

        # Find this candidate's row in the election results (None if absent)
        election_candidate_result = None
        for result in election_result or []:
            if result.get('candidate_id') == candidate_id:
                election_candidate_result = [result]
                break

        try:
            # Take top totals result - some have more than one 2-year period
            datatable_result = [datatable_result[0]]
        except (TypeError, IndexError):
            # No results (None) or an empty list: leave as-is for the checks below
            pass

        if datatable_result is None or candidate_result is None or election_candidate_result is None:
            print("\nERROR: No results for one endpoint")
            print("Candidate datatable has results? {}".format(datatable_result is not None))
            print("Candidate profile page has results? {}".format(candidate_result is not None))
            print("Election profile page has results? {}".format(election_candidate_result is not None))
            mismatch_list.add((candidate_id, candidate_name))
        else:
            if not len(datatable_result) == len(candidate_result) == len(election_candidate_result):
                print("\n!!! ERROR - results lengths don't match!!!")
                print("Candidate datatable: {}\n".format(len(datatable_result)))
                print("Candidate profile page: {}\n".format(len(candidate_result)))
                print("Election profile page: {}\n".format(len(election_candidate_result)))
                mismatch_list.add((candidate_id, candidate_name))
                # Skip the field comparisons, but keep checking the other candidates
                continue
            # Compare receipts, disbursements, and COH
            if not datatable_result[0].get('receipts') == candidate_result[0].get('receipts') == election_candidate_result[0].get('total_receipts'):
                print("\n!!! ERROR - receipts results don't match!!!\n")
                print("| Data source | Total receipts |\n|--|--|")
                print("| Candidate datatable \t\t|\t${:,.2f} |".format(datatable_result[0].get('receipts')))
                print("| Candidate profile page \t|\t${:,.2f} |".format(candidate_result[0].get('receipts')))
                print("| Election profile page \t|\t${:,.2f} |".format(election_candidate_result[0].get('total_receipts')))
                mismatch_list.add((candidate_id, candidate_name))

            if not datatable_result[0].get('disbursements') == candidate_result[0].get('disbursements') == election_candidate_result[0].get('total_disbursements'):
                print("\n!!! ERROR - disbursement results don't match!!! \n")
                print("| Data source | Total disbursements |\n|--|--|")
                print("| Candidate datatable \t\t|\t${:,.2f} |".format(datatable_result[0].get('disbursements')))
                print("| Candidate profile page \t|\t${:,.2f} |".format(candidate_result[0].get('disbursements')))
                print("| Election profile page \t|\t${:,.2f} |".format(election_candidate_result[0].get('total_disbursements')))
                mismatch_list.add((candidate_id, candidate_name))

            # Check COH
            if not datatable_result[0].get('cash_on_hand_end_period') == candidate_result[0].get('last_cash_on_hand_end_period') == election_candidate_result[0].get('cash_on_hand_end_period'):
                print("\n!!! ERROR - COH results don't match!!!\n")
                print("| Data source | Cash on hand |\n|--|--|")
                print("| Candidate datatable \t\t|\t${:,.2f} |".format(datatable_result[0].get('cash_on_hand_end_period')))
                print("| Candidate profile page \t|\t${:,.2f} |".format(candidate_result[0].get('last_cash_on_hand_end_period')))
                print("| Election profile page \t|\t${:,.2f} |".format(election_candidate_result[0].get('cash_on_hand_end_period')))
                mismatch_list.add((candidate_id, candidate_name))

        print("\nMismatch list: {}".format(mismatch_list))


if __name__ == "__main__":
    compare_endpoints()
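
One thing to watch with the markdown output: `"${:,.2f}".format(None)` raises a `TypeError`, so a mismatch caused by a missing total would likely crash while printing. A minimal sketch of None-safe helpers follows; `redact_key`, `format_money`, and `markdown_table` are hypothetical names, the `n/a` placeholder is an assumption, and the sample values are made up.

```python
def redact_key(url, api_key, placeholder='DEMO_KEY'):
    """Swap the real API key for a placeholder before printing a URL."""
    if not api_key:
        # Nothing to redact (e.g. FEC_API_KEY is not set)
        return url
    return url.replace(api_key, placeholder)


def format_money(value):
    """Format a dollar amount, tolerating missing values."""
    return 'n/a' if value is None else '${:,.2f}'.format(value)


def markdown_table(column_title, rows):
    """Render (source, value) pairs as a two-column markdown table."""
    lines = ['| Data source | {} |'.format(column_title), '|--|--|']
    for source, value in rows:
        lines.append('| {} | {} |'.format(source, format_money(value)))
    return '\n'.join(lines)


# Made-up values for illustration only
print(redact_key('https://api.open.fec.gov/v1/elections/?api_key=SECRET', 'SECRET'))
print(markdown_table('Total receipts', [
    ('Candidate datatable', 1234.5),
    ('Candidate profile page', 1234.5),
    ('Election profile page', None),
]))
```

Returning the table as a string, instead of printing line by line, also makes it easy to paste the same text into an issue comment.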
