Test /schedule_a/by_state/by_candidate data #3777

lbeaufort · 2019-05-16T16:55:28Z

Change /schedules/schedule_a/by_state/by_candidate/ to have election_full=true capability
By state: cross-compare totals against other endpoints.
Make sure the default for election_full is false.
Add tests.

Example: https://api.open.fec.gov/v1/schedules/schedule_a/by_state/by_candidate/?api_key=DEMO_KEY&candidate_id=P80001571&cycle=2020&per_page=99
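A request like the example above can also be built programmatically. This is a minimal sketch using only the standard library. The helper name `by_state_by_candidate_url` is hypothetical. The query parameters are the ones that appear in the example URLs in this issue. `election_full` defaults to false here to mirror the "default election_full is false" requirement, not because the API is confirmed to behave that way.

```python
from urllib.parse import urlencode

# Base URL taken from the example requests in this issue
API_BASE = "https://api.open.fec.gov/v1"


def by_state_by_candidate_url(candidate_id, cycle, election_full=False,
                              api_key="DEMO_KEY", per_page=99, sort=None):
    """Build a /schedules/schedule_a/by_state/by_candidate/ URL (hypothetical helper)."""
    params = {
        "api_key": api_key,
        "candidate_id": candidate_id,
        "cycle": cycle,
        "per_page": per_page,
        # The example URLs use lowercase "true"/"false"
        "election_full": str(election_full).lower(),
    }
    if sort:
        params["sort"] = sort
    return f"{API_BASE}/schedules/schedule_a/by_state/by_candidate/?{urlencode(params)}"


print(by_state_by_candidate_url("P80001571", 2020, election_full=True, sort="-total"))
```

Unlike the example URL, the builder always sends `election_full` explicitly, so the default never has to be inferred from the server side.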

fecjjeng · 2019-05-17T02:07:10Z

@lbeaufort @PaulClark2 I don't see a /candidates/totals/by_state endpoint. Do you mean the endpoint /schedules/schedule_a/by_state/by_candidate/? Or is this going to be a new endpoint?

lbeaufort · 2019-05-17T02:12:34Z

@fecjjeng you are correct. Can you update the issue?

lbeaufort · 2019-06-05T19:51:09Z

We need to be able to check these individual-contribution aggregate endpoints (by_state, by_size, etc.), so we should consider adding a totals endpoint such as /schedule_a/individuals/by_candidate.

One-level-up totals for:
/schedules/schedule_a/by_size/by_candidate/
/schedules/schedule_a/by_state/by_candidate/
-> The breakdowns, when added up, should equal the totals
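The "breakdowns should equal the totals" check can be written as a small pure function. This is a sketch under a few assumptions. Each breakdown row carries a `total` field, as the by_size and by_state responses do in the script further down. The reference total comes from whichever source of truth is eventually chosen, which is still an open question below. The function name and the one-cent tolerance are hypothetical choices.

```python
from decimal import Decimal


def breakdowns_match(rows, expected_total, tolerance=Decimal("0.01")):
    """Return True if the "total" values in rows sum to expected_total.

    rows: list of dicts from an API "results" list (hypothetical helper).
    Amounts go through str() into Decimal to avoid float drift; a missing
    or null total counts as 0.
    """
    summed = sum((Decimal(str(row.get("total") or 0)) for row in rows), Decimal("0"))
    return abs(summed - Decimal(str(expected_total))) <= tolerance
```

For example, `breakdowns_match(by_state_rows, reference_total)` could run once per endpoint against the same reference figure, so that a mismatch points at a specific breakdown.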

If feasible, we should also be able to check this data for committees:
committee/C00580100/schedules/schedule_a/by_state/
committee/C00580100/schedules/schedule_a/by_size/
committee/C00580100/schedules/schedule_a/by_employer/
committee/C00580100/schedules/schedule_a/by_occupation/
-> The breakdowns, when added up, should equal the totals
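The four committee breakdowns listed above share one path pattern, so their URLs can be generated together. This is a minimal sketch that assumes the `/v1` prefix used by the other URLs in this issue. The `cycle` and `per_page` query parameters are assumptions and are not confirmed in this thread. The helper name is hypothetical.

```python
from urllib.parse import urlencode

API_BASE = "https://api.open.fec.gov/v1"
# Committee-level breakdowns listed in this issue
BREAKDOWNS = ("by_state", "by_size", "by_employer", "by_occupation")


def committee_breakdown_urls(committee_id, cycle, api_key="DEMO_KEY"):
    """Map each breakdown name to a committee schedule_a URL (hypothetical helper).

    The cycle and per_page parameters are assumptions about the endpoint.
    """
    query = urlencode({"cycle": cycle, "per_page": 100, "api_key": api_key})
    return {
        name: f"{API_BASE}/committee/{committee_id}/schedules/schedule_a/{name}/?{query}"
        for name in BREAKDOWNS
    }


for name, url in committee_breakdown_urls("C00580100", 2020).items():
    print(name, url)
```

The by_employer and by_occupation breakdowns can be long, so summing them fully would likely require following pagination rather than reading a single page.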

Individual contributions won't match financial summary line 11a because transfers (line 12) can contain individual contributions.

What is our source of truth? Just sum them up? /schedules/schedule_a/?is_individual=true? Add unitemized totals? We need to be able to cross-compare them.
Do we call it /schedules/schedule_a/by_state/by_candidate/totals and /schedules/schedule_a/by_size/by_candidate/totals?

lbeaufort · 2019-06-05T21:12:10Z

Update: it looks like this endpoint already has election_full capability. I'm going to work on adding a totals endpoint and QAing the data.

Examples:

Election full = False
https://api.open.fec.gov/v1/schedules/schedule_a/by_state/by_candidate/?api_key=DEMO_KEY&candidate_id=P80001571&cycle=2020&per_page=99&election_full=false&sort=-total
Florida is the top state with $1,442,308.51

Election full = True
https://api.open.fec.gov/v1/schedules/schedule_a/by_state/by_candidate/?api_key=DEMO_KEY&candidate_id=P80001571&cycle=2020&per_page=99&election_full=true&sort=-total
Texas is the top state with $2,740,405.08
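With `sort=-total` the API already returns the top state first. To compare the two `election_full` responses locally, for instance to confirm that the top state changes, a small helper is enough. This sketch assumes each by_state row has `state` and `total` keys. The helper name is hypothetical.

```python
def top_state(results):
    """Return (state, total) for the row with the largest "total", or None.

    Assumes by_state rows with "state" and "total" keys; a null total
    is treated as 0 (hypothetical helper).
    """
    if not results:
        return None
    best = max(results, key=lambda row: row.get("total") or 0)
    return best.get("state"), best.get("total")
```

Calling `top_state` on the `results` list from each of the two requests above would surface the difference between the two-year period and the full election period directly.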

lbeaufort · 2019-06-10T21:31:17Z

Findings:

Most aggregate totals look good.
Senate candidates seem to have more than one 2020 full-election row in by_size, but the totals add up.
There is a mismatch for two House candidates. Both appear to have filed pre-special-election reports.

**** Checking: H8NC09123, MCCREADY, DANIEL in prod. Election year is 2020. ****

By size: https://api.open.fec.gov/v1/schedules/schedule_a/by_size/by_candidate/?candidate_id=H8NC09123&cycle=2020&election_full=False&per_page=20&api_key=DEMO_KEY

By state: https://api.open.fec.gov/v1/schedules/schedule_a/by_state/by_candidate/?per_page=150&page=1&candidate_id=H8NC09123&cycle=2020&election_full=False&api_key=DEMO_KEY

Candidate ID totals: https://api.open.fec.gov/v1/candidate/H8NC09123/totals?cycle=2020&full_election=False&api_key=DEMO_KEY
unitemized total: $707,665.35
by_size_total: $2,968,516.75
by_state_total: $2,260,851.40

**** Checking: H0PA12181, KELLER, FRED in prod. Election year is 2020. ****

By size: https://api.open.fec.gov/v1/schedules/schedule_a/by_size/by_candidate/?candidate_id=H0PA12181&cycle=2020&election_full=False&per_page=20&api_key=DEMO_KEY

By state: https://api.open.fec.gov/v1/schedules/schedule_a/by_state/by_candidate/?per_page=150&page=1&candidate_id=H0PA12181&cycle=2020&election_full=False&api_key=DEMO_KEY

Candidate ID totals: https://api.open.fec.gov/v1/candidate/H0PA12181/totals?cycle=2020&full_election=False&api_key=DEMO_KEY
unitemized total: $29,658.69
by_size_total: $487,400.49
by_state_total: $457,741.80
{('H8NC09123', 'MCCREADY, DANIEL'), ('H0PA12181', 'KELLER, FRED')}
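The two mismatches follow the same pattern. The script's by_size_total has already had the unitemized amount subtracted once, yet it still exceeds by_state_total by exactly the unitemized total. Here is a quick check on the figures from the output above, using `Decimal` to avoid float rounding. `MISMATCHES` and `size_state_gap` are illustrative names.

```python
from decimal import Decimal

# Figures copied from the script output above (prod, cycle=2020, election_full=False);
# "by_size" is the script's by_size_total, already net of unitemized
MISMATCHES = {
    "H8NC09123": {"unitemized": "707665.35", "by_size": "2968516.75", "by_state": "2260851.40"},
    "H0PA12181": {"unitemized": "29658.69", "by_size": "487400.49", "by_state": "457741.80"},
}


def size_state_gap(figures):
    """Return (by_size_total - by_state_total, unitemized) as exact Decimals."""
    gap = Decimal(figures["by_size"]) - Decimal(figures["by_state"])
    return gap, Decimal(figures["unitemized"])


for candidate_id, figures in MISMATCHES.items():
    gap, unitemized = size_state_gap(figures)
    print(candidate_id, gap, gap == unitemized)
```

Both lines print `True`: the gap is 707665.35 for H8NC09123 and 29658.69 for H0PA12181.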

Script

import os

import click
import requests

# Fall back to the public DEMO_KEY so the URL templates below never
# concatenate None (which would raise a TypeError at import time)
api_key = os.environ.get("FEC_API_KEY", "DEMO_KEY")

url_lookup = {
    "local": "http://localhost:5000",
    "dev": "https://fec-dev-api.app.cloud.gov",
    "stage": "https://api-stage.open.fec.gov",
    "prod": "https://api.open.fec.gov",
}
by_size_by_candidate = (
    "/schedules/schedule_a/by_size/by_candidate/?candidate_id={}&cycle={}"
    "&election_full={}&per_page=20&api_key=" + api_key
)
by_state_by_candidate = (
    "/schedules/schedule_a/by_state/by_candidate/?per_page=150&page=1"
    "&candidate_id={}&cycle={}&election_full={}&api_key=" + api_key
)
# Note: this endpoint takes full_election, not election_full
candidate_id_totals = (
    "/candidate/{}/totals?cycle={}&full_election={}"
    "&api_key=" + api_key
)

# Click can't take lists as args, so office types are passed as a string
@click.command()
@click.option(
    "--office-types", default="H, S, P", help="Which offices to check. Format as H,S"
)
@click.option("--year", default=2020, help="Start year")
@click.option("--candidate-id", default=None, help="Check one candidate")
def compare_candidate_aggs(office_types, year, candidate_id):
    """Compare by_size and by_state aggregate totals for top candidates."""

    mismatch_list = set()
    env = "prod"
    base_url = url_lookup[env]

    for candidate_info in get_top_candidates(office_types, year, candidate_id):

        candidate = Candidate(candidate_info)

        by_size_url, by_state_url, id_totals_url = candidate.get_urls(base_url)

        print(
            f"\n**** Checking: {candidate.id}, {candidate.name} in {env}. "
            f"Election year is {candidate.election}. ****\n"
        )

        print(f"\nBy size: {get_printable(by_size_url)}")
        print(f"\nBy state: {get_printable(by_state_url)}")
        print(f"\nCandidate ID totals: {get_printable(id_totals_url)}")

        by_size_results = get_results(by_size_url) or []
        by_state_results = get_results(by_state_url) or []
        id_totals_results = get_results(id_totals_url) or []

        # A missing totals row or a null amount would otherwise raise a TypeError;
        # "or 0" turns None into 0 (a 0.00 value is falsy but becomes 0 either way)
        totals = id_totals_results[0] if id_totals_results else {}
        unitemized = totals.get("individual_unitemized_contributions") or 0

        # Subtract unitemized contributions from the by_size sum before
        # comparing it with the by_state sum
        by_size_total = sum((r.get("total") or 0) for r in by_size_results) - unitemized
        by_state_total = sum((r.get("total") or 0) for r in by_state_results)

        by_state_total = round(by_state_total, 2)
        by_size_total = round(by_size_total, 2)
        print("unitemized total: ${:,.2f}".format(unitemized))
        print("by_size_total:  ${:,.2f}".format(by_size_total))
        print("by_state_total: ${:,.2f}".format(by_state_total))

        if by_size_total != by_state_total:
            mismatch_list.add((candidate.id, candidate.name))
            print("Mismatch list: {}".format(mismatch_list))

    print("Mismatch list: {}".format(mismatch_list))

def get_top_candidates(office_types, start_year, candidate_id):

    top_candidates = []
    candidate_url = (
        "https://api.open.fec.gov/v1/candidates/totals/?"
        "sort_hide_null=false&sort_nulls_last=true&is_active_candidate=True"
        "&election_full=true&sort=-receipts&page=1&api_key=" + api_key
    )
    if candidate_id:
        candidate_url += f"&candidate_id={candidate_id}"
    if "P" in office_types.upper():
        # Top 20 presidential
        top_candidates.extend(
            get_results(
                candidate_url + f"&election_year={start_year}&office=P&per_page=20"
            )
        )
    if "S" in office_types.upper():
        # Top 30 senate, 3 cycles
        # Start year plus next two elections
        for year in range(start_year, start_year + 5, 2):
            top_candidates.extend(
                get_results(
                    candidate_url + f"&election_year={year}&office=S&per_page=30"
                )
            )
    if "H" in office_types.upper():
        # Top 100 house
        top_candidates.extend(
            get_results(
                candidate_url + f"&election_year={start_year}&office=H&per_page=100"
            )
        )

    return top_candidates


class Results(object):
    """Store API results by type (not used by compare_candidate_aggs yet)."""

    def __init__(self, env):
        self.env = env
        self.results = {}

    def set(self, result_type, value):
        # 0.00 is falsy, so check for None explicitly during any cleanup here
        self.results[result_type] = value

    def get(self, result_type, key):
        return self.results.get(result_type, {}).get(key)

class Candidate(object):
    """Wrap one row from /candidates/totals/ and build the endpoint URLs."""

    def __init__(self, candidate):
        self.id = candidate.get("candidate_id")
        self.name = candidate.get("name")
        self.election = candidate.get("election_year")
        self.office = candidate.get("office")
        self.office_full = candidate.get("office_full")
        self.state = candidate.get("state")
        self.district = candidate.get("district")
        # House candidates are checked by two-year period (election_full=False);
        # Senate and presidential candidates by full election period
        self.election_full = (self.office or "").upper() != "H"

    def get_urls(self, base_url):

        by_size_url = (
            base_url
            + "/v1"
            + by_size_by_candidate.format(self.id, self.election, self.election_full)
        )
        by_state_url = (
            base_url + "/v1" + by_state_by_candidate.format(self.id, self.election, self.election_full)
        )
        candidate_id_totals_url = (
            base_url
            + "/v1"
            + candidate_id_totals.format(
                self.id, self.election, self.election_full
            )
        )
        return by_size_url, by_state_url, candidate_id_totals_url


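def get_printable(url):
    # Hide the real API key when printing URLs
    return url.replace(api_key, "DEMO_KEY")


def get_results(url):
    # Fail loudly on HTTP errors and avoid hanging forever on a slow request
    response = requests.get(url, timeout=60)
    response.raise_for_status()
    return response.json().get("results", [])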


if __name__ == "__main__":
    compare_candidate_aggs()
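Toward the "Add tests" item, one option is to pull the comparison step of the script above into a pure function so it can be unit-tested (for example with pytest) without calling the API. This is a sketch, not the script's actual structure. `compare_totals` and the fixture rows are hypothetical, and the function only mirrors the script's arithmetic of subtracting unitemized from the by_size sum.

```python
def compare_totals(by_size_results, by_state_results, unitemized):
    """Return (by_size_total, by_state_total, matches), mirroring the script's check.

    The by_size sum has the unitemized amount subtracted, as in the script;
    null totals and a null unitemized value count as 0.
    """
    by_size_total = round(
        sum((row.get("total") or 0) for row in by_size_results) - (unitemized or 0), 2
    )
    by_state_total = round(sum((row.get("total") or 0) for row in by_state_results), 2)
    return by_size_total, by_state_total, by_size_total == by_state_total


def test_matching_totals():
    by_size = [{"total": 150.0}, {"total": 300.0}]
    by_state = [{"total": 250.0}, {"total": 100.0}]
    assert compare_totals(by_size, by_state, unitemized=100.0) == (350.0, 350.0, True)


def test_mismatch_is_flagged():
    by_size = [{"total": 250.0}]
    by_state = [{"total": 50.0}]
    assert compare_totals(by_size, by_state, unitemized=100.0)[2] is False
```

Keeping the HTTP calls in `compare_candidate_aggs` and the arithmetic in a separate function means the known-bad cases, such as the two House candidates, can be pinned as regression fixtures.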

PaulClark2 · 2019-06-11T12:26:33Z

PA-12 had a special general election on 5/21. NC-09 is scheduled to have a special election on 9/10.

PA-12

Pre-general report due 5/9
Post-general report due 6/20

NC-09

Pre-primary report due 5/2
July quarterly report due 7/15
Pre-general report due 8/29
Post-general report due 10/10
October quarterly report waived

lbeaufort · 2019-06-11T16:30:46Z

@PaulClark2 It looks like the difference is due to the unitemized totals; maybe they're being double-counted? I need to investigate further.

**** Checking: H0PA12181, KELLER, FRED in prod. Election year is 2020. ****

By size: https://api.open.fec.gov/v1/schedules/schedule_a/by_size/by_candidate/?candidate_id=H0PA12181&cycle=2020&election_full=False&per_page=20&api_key=DEMO_KEY

By state: https://api.open.fec.gov/v1/schedules/schedule_a/by_state/by_candidate/?per_page=150&page=1&candidate_id=H0PA12181&cycle=2020&election_full=False&api_key=DEMO_KEY

Candidate ID totals: https://api.open.fec.gov/v1/candidate/H0PA12181/totals?cycle=2020&full_election=False&api_key=DEMO_KEY

unitemized total: $29,658.69
raw by_size_total: $517,059.18
adjusted by_size_total: $487,400.49
by_state_total: $457,741.80
...
difference: $-29,658.69

**** Checking: H8NC09123, MCCREADY, DANIEL in prod. Election year is 2020. ****

By size: https://api.open.fec.gov/v1/schedules/schedule_a/by_size/by_candidate/?candidate_id=H8NC09123&cycle=2020&election_full=False&per_page=20&api_key=DEMO_KEY

By state: https://api.open.fec.gov/v1/schedules/schedule_a/by_state/by_candidate/?per_page=150&page=1&candidate_id=H8NC09123&cycle=2020&election_full=False&api_key=DEMO_KEY

Candidate ID totals: https://api.open.fec.gov/v1/candidate/H8NC09123/totals?cycle=2020&full_election=False&api_key=DEMO_KEY

unitemized total: $707,665.35
raw by_size_total: $3,676,182.10
adjusted by_size_total: $2,968,516.75
by_state_total: $2,260,851.40
...
difference: $-707,665.35
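The raw figures are consistent with double-counting, which is the issue split out into #3816. In both cases the raw by_size sum exceeds the by_state sum by exactly twice the unitemized total. Subtracting the unitemized amount once, as the adjusted figure does, therefore still leaves one extra copy. Here is a check on the numbers above. `RAW` and `excess_in_unitemized_units` are illustrative names.

```python
from decimal import Decimal

# Figures copied from the output above (prod, cycle=2020, election_full=False)
RAW = {
    "H0PA12181": {"unitemized": "29658.69", "raw_by_size": "517059.18", "by_state": "457741.80"},
    "H8NC09123": {"unitemized": "707665.35", "raw_by_size": "3676182.10", "by_state": "2260851.40"},
}


def excess_in_unitemized_units(figures):
    """Return (raw by_size - by_state) / unitemized as an exact Decimal."""
    excess = Decimal(figures["raw_by_size"]) - Decimal(figures["by_state"])
    return excess / Decimal(figures["unitemized"])


for candidate_id, figures in RAW.items():
    print(candidate_id, excess_in_unitemized_units(figures))
```

Both candidates come out at exactly 2: 59,317.38 / 29,658.69 for KELLER and 1,415,330.70 / 707,665.35 for MCCREADY.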

lbeaufort · 2019-06-12T15:11:23Z

I split the remaining work into a follow-up ticket: #3816

lbeaufort added this to the Sprint 9.1 milestone May 16, 2019

lbeaufort added Needs refinement Needs prioritization labels May 16, 2019

AmyKort removed the Needs refinement label May 16, 2019

lbeaufort removed the Needs prioritization label May 20, 2019

lbeaufort changed the title ~~Change /candidates/totals/by_state to have election_full=true capability~~ Change /schedules/schedule_a/by_state/by_candidate/ to have election_full=true capability May 20, 2019

PaulClark2 modified the milestones: Sprint 9.1, Sprint 9.2 May 20, 2019

JonellaCulmer mentioned this issue May 31, 2019

Build Where Contributions Come From Tool fecgov/fec-cms#2929 (Closed, 4 tasks)

lbeaufort assigned lbeaufort and unassigned lbeaufort May 31, 2019

lbeaufort added Work: Back-end Work: Database labels Jun 4, 2019

dorothyyeager assigned hcaofec and lbeaufort and unassigned hcaofec Jun 4, 2019

lbeaufort mentioned this issue Jun 5, 2019

Add separate two-year-period dropdown for raising/spending election profile pages fecgov/fec-cms#2903 (Closed, 2 tasks)

lbeaufort changed the title ~~Change /schedules/schedule_a/by_state/by_candidate/ to have election_full=true capability~~ Test /schedule_a/by_state/by_candidate data Jun 5, 2019

lbeaufort mentioned this issue Jun 12, 2019

Itemized contributions being double-counted for two candidates in "by size" breakdown #3816 (Closed)

lbeaufort closed this as completed Jun 12, 2019

This was referenced Mar 4, 2024

Feature: Where contributions come from fecgov/fec-epics#156 (Closed)

Epic: Visualizations fecgov/fec-epics#148 (Closed)
