Test /schedule_a/by_state/by_candidate data #3777

Closed
lbeaufort opened this issue May 16, 2019 · 8 comments

@lbeaufort

lbeaufort commented May 16, 2019

  • Change /schedules/schedule_a/by_state/by_candidate/ to support election_full=true.
  • By state: cross-compare totals against other endpoints.
  • Make sure election_full defaults to false.
  • Add tests.

Example: https://api.open.fec.gov/v1/schedules/schedule_a/by_state/by_candidate/?api_key=DEMO_KEY&candidate_id=P80001571&cycle=2020&per_page=99

@fecjjeng

@lbeaufort @PaulClark2 I don't see a /candidates/totals/by_state endpoint. Do you mean the endpoint /schedules/schedule_a/by_state/by_candidate/, or is this going to be a new endpoint?

@lbeaufort

@fecjjeng you are correct. Can you update the issue?

@lbeaufort changed the title from "Change /candidates/totals/by_state to have election_full=true capability" to "Change /schedules/schedule_a/by_state/by_candidate/ to have election_full=true capability" on May 20, 2019
@PaulClark2 PaulClark2 modified the milestones: Sprint 9.1, Sprint 9.2 May 20, 2019
@lbeaufort lbeaufort assigned lbeaufort and unassigned lbeaufort May 31, 2019
@dorothyyeager dorothyyeager assigned hcaofec and lbeaufort and unassigned hcaofec Jun 4, 2019
@lbeaufort

lbeaufort commented Jun 5, 2019

We need to be able to check these individual aggregate endpoints (by_state, by_size, etc.), so we should consider adding a totals endpoint such as /schedule_a/individuals/by_candidate.

That would give totals one level up for:
/schedules/schedule_a/by_size/by_candidate/
/schedules/schedule_a/by_state/by_candidate/
-> The breakdowns, when added up, should equal the totals.

If feasible, we should be able to check this data for committees as well:
committee/C00580100/schedules/schedule_a/by_state/
committee/C00580100/schedules/schedule_a/by_size/
committee/C00580100/schedules/schedule_a/by_employer/
committee/C00580100/schedules/schedule_a/by_occupation/
-> The breakdowns, when added up, should equal the totals.

Individual contributions won't match financial summary line 11a because transfers (line 12) can contain individual contributions.

  • What is our source of truth? Do we just sum them up? Use /schedules/schedule_a/?is_individual=true? Add unitemized totals? We need to be able to cross-compare them.
  • Do we call them /schedules/schedule_a/by_state/by_candidate/totals and /schedules/schedule_a/by_size/by_candidate/totals?
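
The "breakdowns, when added up, should equal the totals" check could be sketched roughly as below. This is only an illustration: the row shape (a dict with a `total` key) follows what the script later in this thread reads, while the function names and the sample rows are made up here and are not real FEC data.

```python
from decimal import Decimal


def sum_breakdown(rows):
    """Sum the `total` field of breakdown rows, treating missing or None as zero."""
    return sum((Decimal(str(row.get("total") or 0)) for row in rows), Decimal("0"))


def breakdown_matches(rows, expected_total):
    """Return True when the rows add up to the expected total, to the cent."""
    cents = Decimal("0.01")
    actual = sum_breakdown(rows).quantize(cents)
    return actual == Decimal(str(expected_total)).quantize(cents)


# Hypothetical rows for illustration only
sample_rows = [{"total": 100.25}, {"total": 50.50}, {"total": None}]
print(breakdown_matches(sample_rows, 150.75))
```

Using Decimal rather than float avoids spurious mismatches from binary rounding when comparing dollar amounts.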

@lbeaufort

lbeaufort commented Jun 5, 2019

Update - it looks like this endpoint already had election_full capability - I'm going to work on adding a totals endpoint and QAing the data.

Examples:

Election full = False
https://api.open.fec.gov/v1/schedules/schedule_a/by_state/by_candidate/?api_key=DEMO_KEY&candidate_id=P80001571&cycle=2020&per_page=99&election_full=false&sort=-total
Florida is the top state with $1,442,308.51

Election full = True
https://api.open.fec.gov/v1/schedules/schedule_a/by_state/by_candidate/?api_key=DEMO_KEY&candidate_id=P80001571&cycle=2020&per_page=99&election_full=true&sort=-total
Texas is the top state with $2,740,405.08
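
For anyone repeating this comparison, here is a minimal sketch of building the two URLs and picking the top row from a response. The query parameters are copied from the example URLs above; the helper names are hypothetical, and the `state` key in the sample rows is an assumption about the response shape (only `total` is confirmed by the script in this thread). The sample rows are invented.

```python
from urllib.parse import urlencode

BASE = "https://api.open.fec.gov/v1/schedules/schedule_a/by_state/by_candidate/"


def build_url(candidate_id, cycle, election_full, api_key="DEMO_KEY"):
    """Build the by_state/by_candidate URL with election_full on or off."""
    params = {
        "api_key": api_key,
        "candidate_id": candidate_id,
        "cycle": cycle,
        "per_page": 99,
        "election_full": "true" if election_full else "false",
        "sort": "-total",
    }
    return BASE + "?" + urlencode(params)


def top_row(results):
    """Return the row with the largest total, or None for an empty list."""
    return max(results, key=lambda row: row.get("total") or 0, default=None)


print(build_url("P80001571", 2020, election_full=False))
print(build_url("P80001571", 2020, election_full=True))

# Hypothetical rows, not real API output
rows = [{"state": "AA", "total": 10.0}, {"state": "BB", "total": 25.5}]
print(top_row(rows))
```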

@lbeaufort changed the title from "Change /schedules/schedule_a/by_state/by_candidate/ to have election_full=true capability" to "Test /schedule_a/by_state/by_candidate data" on Jun 5, 2019
@lbeaufort

lbeaufort commented Jun 10, 2019

Findings:

  1. Most aggregate totals look good
  2. Senate candidates seem to have more than one 2020 full-election row for by_size, but the totals add up.
  3. Mismatch for two House candidates. It looks like these candidates have filed pre-special election reports.

**** Checking: H8NC09123, MCCREADY, DANIEL in prod. Election year is 2020. ****

By size: https://api.open.fec.gov/v1/schedules/schedule_a/by_size/by_candidate/?candidate_id=H8NC09123&cycle=2020&election_full=False&per_page=20&api_key=DEMO_KEY

By state: https://api.open.fec.gov/v1/schedules/schedule_a/by_state/by_candidate/?per_page=150&page=1&candidate_id=H8NC09123&cycle=2020&election_full=False&api_key=DEMO_KEY

Candidate ID totals: https://api.open.fec.gov/v1/candidate/H8NC09123/totals?cycle=2020&full_election=False&api_key=DEMO_KEY
unitemized total: $707,665.35
by_size_total: $2,968,516.75
by_state_total: $2,260,851.40

**** Checking: H0PA12181, KELLER, FRED in prod. Election year is 2020. ****

By size: https://api.open.fec.gov/v1/schedules/schedule_a/by_size/by_candidate/?candidate_id=H0PA12181&cycle=2020&election_full=False&per_page=20&api_key=DEMO_KEY

By state: https://api.open.fec.gov/v1/schedules/schedule_a/by_state/by_candidate/?per_page=150&page=1&candidate_id=H0PA12181&cycle=2020&election_full=False&api_key=DEMO_KEY

Candidate ID totals: https://api.open.fec.gov/v1/candidate/H0PA12181/totals?cycle=2020&full_election=False&api_key=DEMO_KEY
unitemized total: $29,658.69
by_size_total: $487,400.49
by_state_total: $457,741.80
{('H8NC09123', 'MCCREADY, DANIEL'), ('H0PA12181', 'KELLER, FRED')}

Script

import os
import requests
import click

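# Fall back to DEMO_KEY so the URL templates below don't fail when the variable is unset
api_key = os.environ.get("FEC_API_KEY", "DEMO_KEY")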

url_lookup = {
    "local": "http://localhost:5000",
    "dev": "https://fec-dev-api.app.cloud.gov",
    "stage": "https://api-stage.open.fec.gov",
    "prod": "https://api.open.fec.gov",
}
by_size_by_candidate = (
    "/schedules/schedule_a/by_size/by_candidate/?candidate_id={}&cycle={}&election_full={}&per_page=20&api_key=" + api_key
)
by_state_by_candidate = "/schedules/schedule_a/by_state/by_candidate/?per_page=150&page=1&candidate_id={}&cycle={}&election_full={}&api_key=" + api_key
# full_election, not election_full
candidate_id_totals = (
    "/candidate/{}/totals?cycle={}&full_election={}"
    "&api_key=" + api_key
)

# Click can't take lists as args - must be strings
@click.command()
@click.option(
    "--office-types", default="H, S, P", help="Which offices to check. Format as H,S"
)
@click.option("--year", default=2020, help="Start year")
@click.option("--candidate-id", default=None, help="Check one candidate")
def compare_candidate_aggs(office_types, year, candidate_id):

    mismatch_list = set()
    env = 'prod'
    base_url = url_lookup[env]
    endpoints = ["by size", "by state", "candidate ID totals"]

    for candidate_info in get_top_candidates(office_types, year, candidate_id):

        candidate = Candidate(candidate_info)

        by_size_url, by_state_url, id_totals_url = candidate.get_urls(base_url)

        print(f"\n**** Checking: {candidate.id}, {candidate.name} in {env}. " \
              f"Election year is {candidate.election}. ****\n")

        print(f"\nBy size: {get_printable(by_size_url)}")
        print(f"\nBy state: {get_printable(by_state_url)}")
        print(f"\nCandidate ID totals: {get_printable(id_totals_url)}")

        by_size_results = get_results(by_size_url)
        by_state_results = get_results(by_state_url)
        id_totals_results = get_results(id_totals_url)

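        # A null value from the API is treated as 0.00
        unitemized = id_totals_results[0].get("individual_unitemized_contributions") or 0
        # Subtract unitemized contributions so by_size can be compared with by_state
        by_size_total = sum(result.get("total") or 0 for result in by_size_results) - unitemized
        by_state_total = sum(result.get("total") or 0 for result in by_state_results)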

        by_state_total = round(by_state_total, 2)
        by_size_total = round(by_size_total, 2)
        print("unitemized total: ${:,.2f}".format(unitemized))
        print("by_size_total:  ${:,.2f}".format(by_size_total))
        print("by_state_total: ${:,.2f}".format(by_state_total))

        if by_size_total != by_state_total:
            mismatch_list.add((candidate.id, candidate.name))
            print("Mismatch list: {}".format(mismatch_list))

    print("Mismatch list: {}".format(mismatch_list))

def get_top_candidates(office_types, start_year, candidate_id):

    top_candidates = []
    candidate_url = (
        "https://api.open.fec.gov/v1/candidates/totals/?"
        "sort_hide_null=false&sort_nulls_last=true&is_active_candidate=True"
        "&election_full=true&sort=-receipts&page=1&api_key=" + api_key
    )
    if candidate_id:
        candidate_url += f"&candidate_id={candidate_id}"
    if "P" in office_types.upper():
        # Top 20 presidential
        top_candidates.extend(
            get_results(
                candidate_url + f"&election_year={start_year}&office=P&per_page=20"
            )
        )
    if "S" in office_types.upper():
        # Top 30 senate, 3 cycles
        # Start year plus next two elections
        for year in range(start_year, start_year + 5, 2):
            top_candidates.extend(
                get_results(
                    candidate_url + f"&election_year={year}&office=S&per_page=30"
                )
            )
    if "H" in office_types.upper():
        # Top 100 house
        top_candidates.extend(
            get_results(
                candidate_url + f"&election_year={start_year}&office=H&per_page=100"
            )
        )

    return top_candidates


class Results(object):
    """Holds results for one environment, keyed by result type (not yet used above)."""

    def __init__(self, env):
        self.env = env
        self.results = {}

    def set(self, result_type, value):
        # 0.00 is Falsy
        # Do any cleanup here
        self.results[result_type] = value

    def get(self, result_type, value):
        return self.results[result_type].get(value)

class Candidate(object):
    """A candidate row from /candidates/totals/ plus the URLs used to check it."""

    def __init__(self, candidate):
        self.id = candidate.get("candidate_id")
        self.name = candidate.get("name")
        self.election = candidate.get("election_year")
        self.office = candidate.get("office")
        self.office_full = candidate.get("office_full")
        self.state = candidate.get("state")
        self.district = candidate.get("district")
        if self.office.upper() == 'H':
            self.election_full = False
        else:
            self.election_full = True

    def get_urls(self, base_url):

        by_size_url = (
            base_url
            + "/v1"
            + by_size_by_candidate.format(self.id, self.election, self.election_full)
        )
        by_state_url = (
            base_url + "/v1" + by_state_by_candidate.format(self.id, self.election, self.election_full)
        )
        candidate_id_totals_url = (
            base_url
            + "/v1"
            + candidate_id_totals.format(
                self.id, self.election, self.election_full
            )
        )
        return by_size_url, by_state_url, candidate_id_totals_url


def get_printable(url):
    return url.replace(api_key, "DEMO_KEY")


def get_results(url):
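    response = requests.get(url)
    # Fail loudly on HTTP errors (e.g. rate limiting) instead of returning None
    response.raise_for_status()
    return response.json().get("results")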


if __name__ == "__main__":
    compare_candidate_aggs()
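
The script treats --office-types as a plain string and checks it with substring tests, and it walks three Senate election years with range(start_year, start_year + 5, 2). A stricter parse might look like the sketch below; both helper names are hypothetical and not part of the script above.

```python
VALID_OFFICES = {"H", "S", "P"}


def parse_office_types(raw):
    """Split a comma-separated option such as "H, S, P" into a set of office codes."""
    offices = {part.strip().upper() for part in raw.split(",") if part.strip()}
    unknown = offices - VALID_OFFICES
    if unknown:
        raise ValueError(f"Unknown office types: {sorted(unknown)}")
    return offices


def senate_election_years(start_year):
    """Start year plus the next two elections, matching the range used in the script."""
    return list(range(start_year, start_year + 5, 2))


print(sorted(parse_office_types("H, S, P")))
print(senate_election_years(2020))
```

Rejecting unknown codes up front would surface a typo such as "--office-types X" instead of silently checking no candidates.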


@PaulClark2

PA-12 had a special general on 5/21. NC-09 is scheduled to have a special election 9/10.

PA-12

  • Pre-general report due 5/9
  • Post-general report due 6/20

NC-09

  • Pre-primary report due 5/2
  • July quarterly report due 7/15
  • Pre-general report due 8/29
  • Post-general report due 10/10
  • October quarterly report waived

@lbeaufort

lbeaufort commented Jun 11, 2019

@PaulClark2 it looks like the difference is due to the unitemized totals - maybe it's double-counting? I need to investigate further.

**** Checking: H0PA12181, KELLER, FRED in prod. Election year is 2020. ****

By size: https://api.open.fec.gov/v1/schedules/schedule_a/by_size/by_candidate/?candidate_id=H0PA12181&cycle=2020&election_full=False&per_page=20&api_key=DEMO_KEY

By state: https://api.open.fec.gov/v1/schedules/schedule_a/by_state/by_candidate/?per_page=150&page=1&candidate_id=H0PA12181&cycle=2020&election_full=False&api_key=DEMO_KEY

Candidate ID totals: https://api.open.fec.gov/v1/candidate/H0PA12181/totals?cycle=2020&full_election=False&api_key=DEMO_KEY

unitemized total: $29,658.69
raw by_size_total: $517,059.18
adjusted by_size_total: $487,400.49
by_state_total: $457,741.80
...
difference: $-29,658.69

**** Checking: H8NC09123, MCCREADY, DANIEL in prod. Election year is 2020. ****

By size: https://api.open.fec.gov/v1/schedules/schedule_a/by_size/by_candidate/?candidate_id=H8NC09123&cycle=2020&election_full=False&per_page=20&api_key=DEMO_KEY

By state: https://api.open.fec.gov/v1/schedules/schedule_a/by_state/by_candidate/?per_page=150&page=1&candidate_id=H8NC09123&cycle=2020&election_full=False&api_key=DEMO_KEY

Candidate ID totals: https://api.open.fec.gov/v1/candidate/H8NC09123/totals?cycle=2020&full_election=False&api_key=DEMO_KEY

unitemized total: $707,665.35
raw by_size_total: $3,676,182.10
adjusted by_size_total: $2,968,516.75
by_state_total: $2,260,851.40
...
difference: $-707,665.35
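
The arithmetic behind these numbers can be re-checked with the short sketch below, which uses only the figures printed above. In both cases the adjusted by_size total still exceeds the by_state total by exactly the unitemized amount, which is consistent with the double-counting guess but does not confirm it. The `check` helper is hypothetical.

```python
from decimal import Decimal

# Figures copied from the output above (prod, cycle 2020, election_full=False)
reported = {
    "H0PA12181": {
        "unitemized": "29658.69",
        "raw_by_size": "517059.18",
        "by_state": "457741.80",
    },
    "H8NC09123": {
        "unitemized": "707665.35",
        "raw_by_size": "3676182.10",
        "by_state": "2260851.40",
    },
}


def check(figures):
    """Return (adjusted by_size total, by_state minus adjusted by_size)."""
    unitemized = Decimal(figures["unitemized"])
    adjusted = Decimal(figures["raw_by_size"]) - unitemized
    difference = Decimal(figures["by_state"]) - adjusted
    return adjusted, difference


for candidate_id, figures in reported.items():
    adjusted, difference = check(figures)
    print(f"{candidate_id}: adjusted by_size ${adjusted:,.2f}, difference ${difference:,.2f}")
    # The remaining gap equals the unitemized total in both cases
    assert difference == -Decimal(figures["unitemized"])
```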

@lbeaufort

I split the remaining work into a follow-up ticket: #3816
