Example 1: Disbursements missing from /schedule_b/by_recipient/ #3390

lbeaufort · 2018-09-19T12:40:53Z

Some disbursements are missing from /schedule_b/by_recipient/.

From an API user:

There are a few cases where the vendor names I’m searching with yield no results through the API but yield results through the manual search on the FEC website.

Example 1:
First, considering the vendor DRL Publications

The following request, which tries to list every committee that spent money on DRL Publications, yields no results:
https://api.open.fec.gov/v1/schedules/schedule_b/by_recipient/?api_key=DEMO_KEY&cycle=2016&per_page=20&recipient_name=DRL%20PUBLICATIONS&page=1

However, the same search on /schedule_b/ and the front end returns one result:
https://api.open.fec.gov/v1/schedules/schedule_b/?api_key=DEMO_KEY&sort_hide_null=true&data_type=processed&recipient_name=drl+publications&two_year_transaction_period=2016&min_date=01%2F01%2F2015&max_date=12%2F31%2F2016&sort=-disbursement_date&per_page=30

https://www.fec.gov/data/disbursements/?two_year_transaction_period=2016&data_type=processed&recipient_name=drl+publications&min_date=01%2F01%2F2015&max_date=12%2F31%2F2016
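
To reproduce the discrepancy, the two API requests above can be rebuilt side by side. This is a minimal sketch: the helper names (by_recipient_url, itemized_url, count_results) are made up for illustration, only a subset of the original query parameters is included, and the assumption that the JSON response carries a "results" list should be checked against the API documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://api.open.fec.gov/v1/schedules/schedule_b/"


def by_recipient_url(name, cycle, api_key="DEMO_KEY"):
    """Build the aggregate (by_recipient) request from the report."""
    params = {
        "api_key": api_key,
        "cycle": cycle,
        "per_page": 20,
        "recipient_name": name,
        "page": 1,
    }
    return BASE + "by_recipient/?" + urlencode(params)


def itemized_url(name, period, api_key="DEMO_KEY"):
    """Build the itemized /schedule_b/ request (subset of the original parameters)."""
    params = {
        "api_key": api_key,
        "data_type": "processed",
        "recipient_name": name,
        "two_year_transaction_period": period,
        "per_page": 30,
    }
    return BASE + "?" + urlencode(params)


def count_results(url):
    """Fetch a URL and count rows; assumes the payload has a "results" list."""
    with urlopen(url) as response:
        payload = json.load(response)
    return len(payload.get("results", []))


aggregate = by_recipient_url("DRL PUBLICATIONS", 2016)
itemized = itemized_url("drl publications", 2016)
```

Calling count_results on each URL (not done here, since it needs network access and a valid key) would show whether the aggregate endpoint still returns fewer recipients than the itemized one.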

Example 2 has a different cause: #3429

Make necessary python changes after DB work is done: Change sched_b_aggregate tables to include count and totals for memo entries #3431
Notify user of resolution (original email was sent to [email protected])

lbeaufort · 2018-10-05T14:11:34Z

The logic preventing this from appearing seems to be in disclosure.dsc_sched_b_aggregate_recipient, and I'm working on getting a copy of that logic (it currently lives on the Oracle side).

I'm guessing at least for example 1, we may be excluding memos. Example 2 is on line 29 - I want to see what other filters are on the aggregate table.

lbeaufort · 2018-10-05T18:49:03Z

One possible solution @fecjjeng and I discussed is showing $0 totals for recipients that have only memoed transactions.

Example:
$100 Itemized:
Jean $50
Laura $50
Paul $50 X Memo

$100 By recipient aggregate:
Name, Amount, "Memo/Description"
Jean $50
Laura $50
Paul $0 Memo

logic for disclosure.dsc_sched_b_aggregate_recipient:

select cmte_id,
       rpt_yr + mod(rpt_yr, 2) as cycle,
       name as recipient_nm,
       sum(transaction_amt) as total,
       count(transaction_amt) as count
from disclosure.f_item_receipt_or_exp
where sched_tp_cd = 'SB'
  and cmte_id = r1.cmte_id  -- r1 is defined outside this excerpt
  and (rpt_yr + mod(rpt_yr, 2)) = r1.cycle
  and transaction_amt is not null and transaction_amt <> 0
  and (memo_cd != 'X' or memo_cd is null)  -- rows with memo_cd = 'X' are dropped here
  --TODO: only sum where not memoed
group by cmte_id, (rpt_yr + mod(rpt_yr, 2)), name;
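
The effect of the memo filter can be seen on the Jean/Laura/Paul example from earlier in this comment. The sketch below is an approximation, not the Oracle procedure: it uses an in-memory SQLite table with the same column names, a made-up committee ID ("C1"), and the % operator in place of Oracle's mod().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE f_item_receipt_or_exp (
        cmte_id TEXT, rpt_yr INTEGER, name TEXT,
        transaction_amt REAL, memo_cd TEXT, sched_tp_cd TEXT
    )
    """
)
# Sample rows from the thread's example; "C1" is a placeholder committee ID.
conn.executemany(
    "INSERT INTO f_item_receipt_or_exp VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("C1", 2016, "JEAN", 50, None, "SB"),
        ("C1", 2016, "LAURA", 50, None, "SB"),
        ("C1", 2016, "PAUL", 50, "X", "SB"),  # memo entry
    ],
)

# Same filters as the aggregate logic above, minus the r1 loop conditions.
aggregated = conn.execute(
    """
    SELECT name AS recipient_nm,
           rpt_yr + (rpt_yr % 2) AS cycle,
           SUM(transaction_amt) AS total,
           COUNT(transaction_amt) AS count
    FROM f_item_receipt_or_exp
    WHERE sched_tp_cd = 'SB'
      AND transaction_amt IS NOT NULL AND transaction_amt <> 0
      AND (memo_cd != 'X' OR memo_cd IS NULL)
    GROUP BY name, rpt_yr + (rpt_yr % 2)
    ORDER BY name
    """
).fetchall()

recipients = [row[0] for row in aggregated]
print(recipients)
```

Because the filter runs before the GROUP BY, a recipient with only memo entries (PAUL here) produces no aggregate row at all, which matches the empty by_recipient response for DRL PUBLICATIONS.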

lbeaufort · 2018-10-09T15:06:58Z

After giving it more thought, separate columns for memo_total and memo_count make the most sense here.

SQL:

SELECT
   cmte_id,
   cycle,
   recipient_nm,
   sum(non_memo_amount) AS total,
   sum(non_memo_count) AS count,
   sum(memo_amount) AS memo_total,
   sum(memo_count) AS memo_count
FROM (
   SELECT
      cmte_id,
      rpt_yr + mod(rpt_yr, 2) AS cycle,
      name AS recipient_nm,
      CASE WHEN memo_cd = 'X' OR memo_cd IS NOT NULL THEN transaction_amt ELSE 0 END AS memo_amount,
      CASE WHEN memo_cd = 'X' OR memo_cd IS NOT NULL THEN 1 ELSE 0 END AS memo_count,
      CASE WHEN memo_cd != 'X' OR memo_cd IS NULL THEN transaction_amt ELSE 0 END AS non_memo_amount,
      CASE WHEN memo_cd != 'X' OR memo_cd IS NULL THEN 1 ELSE 0 END AS non_memo_count
   FROM disclosure.f_item_receipt_or_exp
   WHERE sched_tp_cd = 'SB'
      --AND (rpt_yr + mod(rpt_yr, 2)) = r1.cycle
      AND transaction_amt IS NOT NULL AND transaction_amt <> 0
      AND cmte_id = 'C00575795'
      AND name = 'DRL PUBLICATIONS'
)
GROUP BY cmte_id, cycle, recipient_nm;

Output looks like:

cycle	cmte_id	recipient_nm	total	count	memo_total	memo_count
2016	C00575795	DRL PUBLICATIONS	0	0	1138.55	1

lbeaufort · 2018-10-09T16:21:10Z

For the second example, we're using filter_multi, which requires an exact match. A Python change to use filter_fulltext allows partial text searches. I also added test coverage for this behavior.
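
The difference between the two matching styles can be illustrated in plain Python. This is only a sketch of the behavior described above: exact_match and partial_match are hypothetical helpers, not the filter_multi or filter_fulltext implementations, and every recipient name other than DRL PUBLICATIONS is invented for the example.

```python
def exact_match(names, query):
    """Keep only names equal to the query, ignoring case."""
    wanted = query.strip().upper()
    return [name for name in names if name.upper() == wanted]


def partial_match(names, query):
    """Keep names in which every query token starts some word of the name."""
    tokens = query.upper().split()
    matches = []
    for name in names:
        words = name.upper().split()
        if all(any(word.startswith(token) for word in words) for token in tokens):
            matches.append(name)
    return matches


names = ["DRL PUBLICATIONS", "DRL PUBLICATIONS INC", "ACME PRINTING"]

exact = exact_match(names, "drl publications")
partial = partial_match(names, "drl pub")
print(exact)
print(partial)
```

With an exact match, a query has to reproduce the stored name in full, so a variant such as a trailing "INC" is missed. The token-prefix match picks up both DRL variants from a shorter query.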

fecjjeng · 2018-10-10T02:05:33Z

The original logic includes only the non-memo entries (i.e. memo_cd != 'X' or memo_cd is null).

The proposed business logic separates the entries into non-memo and memo items, with two additional columns, memo_total and memo_count.
Non-memo:
memo_cd != 'X' or memo_cd is null

Memo:
memo_cd = 'X' or memo_cd is not null
I suggest changing the second condition (for memo entries) to memo_cd = 'X' and omitting the memo_cd is not null part. Otherwise, a row whose memo_cd is not NULL but != 'X' will be counted in both places (I know this is extremely rare, but I have seen several rows like that).

@lbeaufort @PaulClark2 @jwchumley This change will involve updates to the Oracle procedures, the Java script, the PostgreSQL table structure, and the API, plus a data reload, so we would like to be sure before we proceed.

1. We want to separate the entries into memo/non-memo and add extra columns using the business logic described above (with my minor modification).

If #1 holds true,
2. All the Schedule B aggregates (sched_b_aggregate_recipient, sched_b_aggregate_purpose, sched_b_aggregate_recipient_id) currently include only the non-memo entries (i.e. memo_cd != 'X' or memo_cd is null). Do we want to follow the same pattern for all three of them?

If #2 holds true,
3. Should we implement all changes for these three tables in this ticket?
Or separate them into 3 tickets, each handling one table?
Or use a separate ticket for the DB work and the original ticket for the API work?
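
The overlap between the two conditions discussed in this comment can be checked on a tiny data set. The sketch below assumes an in-memory SQLite table named items and a made-up memo code 'Y' standing in for "not NULL but not 'X'"; the amounts are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, transaction_amt REAL, memo_cd TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        ("A", 10.0, None),  # ordinary entry
        ("A", 20.0, "X"),   # memo entry
        ("A", 40.0, "Y"),   # hypothetical non-null, non-'X' code
    ],
)


def split_totals(memo_condition):
    """Return (non_memo_total, memo_total) using the given memo condition."""
    sql = f"""
        SELECT
            SUM(CASE WHEN memo_cd != 'X' OR memo_cd IS NULL
                     THEN transaction_amt ELSE 0 END),
            SUM(CASE WHEN {memo_condition}
                     THEN transaction_amt ELSE 0 END)
        FROM items
    """
    return conn.execute(sql).fetchone()


overlapping = split_totals("memo_cd = 'X' OR memo_cd IS NOT NULL")
corrected = split_totals("memo_cd = 'X'")
print(overlapping, corrected)
```

With the original memo condition, the 'Y' row lands in both totals, so the two columns add up to more than the 70 actually itemized. With memo_cd = 'X' alone, each row is counted exactly once.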

lbeaufort · 2018-10-10T17:26:25Z

Thanks so much for taking a look, @fecjjeng! I agree with your SQL edit. It also makes sense to me to keep all three Schedule B aggregates consistent. I have bandwidth this sprint to write SQL for the other two tables, get testing examples, and make the necessary API and SQL migration changes.

fecjjeng · 2018-10-10T18:56:14Z

@lbeaufort

I can make the SQL change, don't worry about it. I was just confirming that this is the business logic we want to follow.
Yes, some testing examples would be helpful when you have time. Thank you.
I will open a ticket for all the DB-side work (Oracle, Java, PostgreSQL) for the three Schedule B aggregate tables.
Since the two problems mentioned in this ticket are different in nature, do you think we should split Example 1: Disbursements missing from /schedule_b/by_recipient/ #3390 into two tickets? One would deal with the full-text search, which has no data dependency and for which you have already implemented the API change; the other would cover the problem that relies on this memo data. That way, the problem that does not need to wait for the database change can go in this sprint. Thoughts?

lbeaufort · 2018-10-10T19:14:01Z

@fecjjeng I agree that this issue should be split into examples 1 and 2 because they have entirely different causes.

I created example 2 here: #3429 and linked it to my PR: #3423

fecjjeng · 2018-10-17T21:12:25Z

@lbeaufort
Updates on issue #3431:

Three "new" tables have been created in the DEV database:
disclosure.dsc_sched_b_aggregate_recipient_new
disclosure.dsc_sched_b_aggregate_recipient_id_new
disclosure.dsc_sched_b_aggregate_purpose_new

These tables have been loaded with data from the past three days. They can be used as a template for the API work for issue #3390. Once the API tasks have been worked out, a complete reload (which will take some time) will be executed to provide full sets of data.

These tables rename the following two columns:
total -> non_memo_total
count -> non_memo_count

And add two new columns:
memo_total
memo_count
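
Any code that reads rows by the old column names has to account for the rename. As a rough sketch, assuming rows arrive as dictionaries keyed by column name (the function name normalize_aggregate_row is hypothetical), a reader could accept both layouts while the old and new tables coexist:

```python
def normalize_aggregate_row(row):
    """Return a row keyed by the new column names, whichever layout it came from.

    Old-layout rows have no memo figures, so those fields are set to None
    rather than 0 to avoid implying a known zero.
    """
    return {
        "non_memo_total": row["non_memo_total"] if "non_memo_total" in row else row.get("total"),
        "non_memo_count": row["non_memo_count"] if "non_memo_count" in row else row.get("count"),
        "memo_total": row.get("memo_total"),
        "memo_count": row.get("memo_count"),
    }


# Old layout: total/count only (values are illustrative).
old_row = normalize_aggregate_row({"total": 100.0, "count": 2})

# New layout, using the DRL PUBLICATIONS figures from the earlier query output.
new_row = normalize_aggregate_row(
    {"non_memo_total": 0, "non_memo_count": 0, "memo_total": 1138.55, "memo_count": 1}
)
print(old_row)
print(new_row)
```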

fecjjeng · 2018-10-23T17:51:34Z

We plan to run dual daily updates on both the new and current sched_b_agg_xxxxx tables after the initial load, until the API in all environments starts using the new tables. Another issue will then be used to drop the "old" tables.

hcaofec · 2018-11-02T13:50:17Z

The API change is complete but can't be merged until the CMS work is done. Because the variable names returned by the endpoint have changed, the CMS needs to account for this change. Otherwise, the page will break.

http://127.0.0.1:8000/data/committee/C00580100/?tab=spending

JonellaCulmer · 2018-11-09T18:35:23Z

Issue has been merged, so closing.

lbeaufort added the Bug label Sep 19, 2018

lbeaufort added this to the Sprint 7.3 milestone Sep 27, 2018

dorothyyeager assigned lbeaufort Oct 2, 2018

lbeaufort mentioned this issue Oct 9, 2018

Use full text filter for aggregate resources #3423

Merged

dorothyyeager assigned fecjjeng Oct 9, 2018

lbeaufort mentioned this issue Oct 10, 2018

Example 2: disbursements missing from /schedule_b/by_recipient/ #3429

Closed

1 task

lbeaufort changed the title ~~Disbursements missing from /schedule_b/by_recipient/~~ Example 1: Disbursements missing from /schedule_b/by_recipient/ Oct 10, 2018

fecjjeng mentioned this issue Oct 11, 2018

Change sched_b_aggregate tables to include count and totals for memo entries #3431

Closed

5 tasks

lbeaufort modified the milestones: Sprint 7.3, Sprint 7.4 Oct 11, 2018

lbeaufort unassigned fecjjeng Oct 11, 2018

lbeaufort assigned hcaofec and unassigned lbeaufort Oct 24, 2018

This was referenced Oct 25, 2018

Drop old dsc_sched_b_aggregate_xxxxx tables after complete switch to new tables with update business logic #3459

Closed

add memo agg to sched b agg tables #3460

Merged

JonellaCulmer modified the milestones: Sprint 7.4, Sprint 7.5 Oct 30, 2018

hcaofec mentioned this issue Nov 2, 2018

Add 'memo_total' to disbursement aggregate page? fecgov/fec-cms#2490

Open

fec-jli changed the title ~~Example 1: Disbursements missing from /schedule_b/by_recipient/~~ [Don't Merge] Example 1: Disbursements missing from /schedule_b/by_recipient/ Nov 2, 2018

hcaofec changed the title ~~[Don't Merge] Example 1: Disbursements missing from /schedule_b/by_recipient/~~ Example 1: Disbursements missing from /schedule_b/by_recipient/ Nov 6, 2018

hcaofec mentioned this issue Nov 7, 2018

Switch to new schedule B aggrgeate tables with memo totals #3481

Merged

JonellaCulmer closed this as completed Nov 9, 2018

fecjjeng mentioned this issue Nov 21, 2018

drop original dsc_sched_b_aggregate tables #3502

Merged
