Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Link electronic filings to their most recent amendment; if they are amended. #2073

Closed
jontours opened this issue Dec 1, 2016 · 12 comments
Closed
Assignees
Milestone

Comments

@jontours
Copy link
Contributor

jontours commented Dec 1, 2016

Just doing some exploratory work. Since the filings are essentially a tree (although a tree with only one branch), a recursive function seemed like a good fit, thankfully and awesomely Postgres has good support this, the query is as follows [updated]:

with recursive oldest_filing as (
  (
    SELECT cmte_id, rpt_yr, rpt_tp, amndt_ind, receipt_dt, file_num, prev_file_num, mst_rct_file_num, array[file_num]::numeric[] as amendment_chain, 1 as depth, file_num as last
    FROM disclosure.nml_form_3p
    WHERE file_num = prev_file_num AND file_num = mst_rct_file_num

  )
  union
  select f3p.cmte_id, f3p.rpt_yr, f3p.rpt_tp, f3p.amndt_ind, f3p.receipt_dt, f3p.file_num, f3p.prev_file_num, f3p.mst_rct_file_num, (oldest.amendment_chain || f3p.file_num)::numeric(7,0)[], oldest.depth + 1, oldest.amendment_chain[1]
  from oldest_filing oldest, disclosure.nml_form_3p f3p
  where f3p.prev_file_num = oldest.file_num and f3p.rpt_tp = oldest.rpt_tp and f3p.file_num <> f3p.prev_file_num
),
--this joins the right sight to left having the effect that the max depth row will be null,
--the where statement then filters down to those rows.
--  Ref: http://stackoverflow.com/questions/7745609/sql-select-only-rows-with-max-value-on-a-column
 most_recent_filing as (
    select a.*
    from oldest_filing a
      left outer join oldest_filing b
        on a.cmte_id = b.cmte_id and a.last = b.last and a.depth < b.depth
    where b.cmte_id is null
)
SELECT old_f.cmte_id,
  old_f.rpt_yr,
  old_f.rpt_tp,
  old_f.amndt_ind,
  old_f.receipt_dt,
  old_f.file_num,
  old_f.prev_file_num,
  mrf.file_num as mst_rct_file_num,
  old_f.amendment_chain
from oldest_filing old_f inner join most_recent_filing mrf on old_f.cmte_id = mrf.cmte_id and old_f.last = mrf.last;

This creates a table with path and depth for now. Path being the amendment chain from any row to its original type N filing. There is obviously a lot of work to do to integrate this but take a look at the query and run it in dev to verify it is working properly (I verified a couple).

Edit: this links any amendment to all the filings it amended. Some more work to be done to figure out how to build the chain to the most recent filing.

@LindsayYoung
Copy link
Contributor

Off the top of my head-
For electronic filings, going forward, we might want to do a do a join for most recent filing from the efiling table, we would have do back fill that with logic to replicate that. Sounds like we would also always use logic for paper.

@jontours
Copy link
Contributor Author

jontours commented Dec 1, 2016

Ah ok, that should be easy to add.

@jontours
Copy link
Contributor Author

jontours commented Dec 1, 2016

an example output of the query above ^^

  cmte_id  | rpt_yr | rpt_tp | amndt_ind |     receipt_dt      | file_num | prev_file_num | mst_rct_file_num |                             amendment_chain                             
-----------+--------+--------+-----------+---------------------+----------+---------------+------------------+-------------------------------------------------------------------------
 C00505800 |   2012 | 12G    | A         | 2014-04-21 00:00:00 |   921958 |        889821 |           932685 | {825219,863557,863623,889821,921958}
 C00431171 |   2012 | M10    | A         | 2014-08-11 00:00:00 |   944828 |        933473 |           944828 | {822044,855515,868064,871016,896745,933473,944828}
 C00431445 |   2007 | YE     | A         | 2008-08-30 00:00:00 |   360400 |        360363 |           360400 | {320718,338948,360363,360400}
 C00430975 |   2008 | YE     | A         | 2011-01-31 00:00:00 |   711878 |        711863 |           711878 | {396961,416499,418999,711863,711878}
 C00495622 |   2011 | Q3     | A         | 2013-02-25 00:00:00 |   858302 |        780355 |           992711 | {748593,767394,780355,858302}
 C00550111 |   2015 | YE     | A         | 2015-02-09 00:00:00 | -8832865 |      -8832865 |         -8832865 | {-8832865}
 C00505800 |   2012 | M5     | A         | 2014-04-21 00:00:00 |   921952 |        886326 |           932660 | {785708,860783,863554,886326,921952}
 C00505800 |   2012 | M9     | A         | 2014-04-21 00:00:00 |   921956 |        889017 |           932681 | {811430,860817,863556,863620,889017,921956}
 C00214999 |   2010 | Q2     | A         | 2010-07-22 00:00:00 |  -348935 |       -348935 |          -348935 | {-348935}
 C00431171 |   2012 | YE     | A         | 2014-06-27 00:00:00 |   933467 |        896738 |           934756 | {853797,859559,871131,896738,933467}
 C00431445 |   2007 | Q2     | A         | 2008-01-31 00:00:00 |   319100 |        305961 |           359395 | {295237,298186,305961,319100}
 C00431445 |   2007 | Q3     | A         | 2008-08-22 00:00:00 |   359397 |        325774 |           359397 | {307659,320482,325774,359397}
 C00430975 |   2007 | YE     | A         | 2008-06-20 00:00:00 |   345955 |        330180 |           711416 | {319734,329562,330178,330180,345955}
 C00430975 |   2008 | M3     | A         | 2011-01-31 00:00:00 |   711564 |        513061 |           711564 | {327988,330189,342310,346024,416484,416485,513061,711564}
 C00495622 |   2011 | YE     | A         | 2015-02-09 00:00:00 |   992712 |        858303 |           992712 | {761487,776337,858303,992712}
 C00431288 |   2009 | M3     | A         | 2009-11-30 00:00:00 |   441001 |        440968 |           441001 | {407336,440955,440956,440968,441001}
 C00431171 |   2012 | M8     | A         | 2013-12-04 00:00:00 |   896742 |        871092 |           944286 | {806252,821472,867925,871092,896742}
 C00430470 |   2007 | Q1     | A         | 2008-02-26 00:00:00 |   324756 |        315367 |           876670 | {284727,295364,299264,305650,309558,315367,324756}
 C00446104 |   2008 | M9     | A         | 2013-07-10 00:00:00 |   876564 |        405777 |           876564 | {363578,373044,381488,393585,405777,876564}
 C00432203 |   2012 | YE     | A         | 2013-02-13 00:00:00 | -4124823 |      -4124823 |         -4124823 | {-4124823}
 C00430470 |   2007 | Q1     | A         | 2013-07-10 00:00:00 |   876670 |        413814 |           876670 | {284727,295364,299264,305650,309558,315367,324756,329365,413814,876670}
 C00431916 |   2008 | M9     | A         | 2009-01-12 00:00:00 |   394903 |        394881 |           733772 | {363042,370590,381489,385124,394881,394903}
 C00431916 |   2007 | YE     | A         | 2008-09-19 00:00:00 |   363295 |        363291 |           733762 | {319383,320170,328008,329918,363142,363291,363295}
 C00505800 |   2012 | M7     | A         | 2014-06-20 00:00:00 |   932675 |        921954 |           932675 | {800348,863555,863617,886512,921954,932675}
 C00430470 |   2007 | Q1     | A         | 2007-10-26 00:00:00 |   309558 |        305650 |           876670 | {284727,295364,299264,305650,309558}
 C00431171 |   2012 | 12G    | A         | 2014-06-27 00:00:00 |   933475 |        896758 |           933475 | {827761,855531,868083,871022,896758,933475}
 C00431916 |   2008 | M9     | A         | 2011-07-08 00:00:00 |   733772 |        394903 |           733772 | {363042,370590,381489,385124,394881,394903,733772}
 C00431171 |   2012 | 30G    | A         | 2013-12-05 00:00:00 |   896785 |        891510 |           933479 | {840641,842943,859221,868141,871034,891510,896785}
 C00505800 |   2012 | M9     | A         | 2013-09-20 00:00:00 |   889017 |        863620 |           932681 | {811430,860817,863556,863620,889017}
 C00430512 |   2007 | Q2     | A         | 2008-05-05 00:00:00 |   337352 |        327103 |           337352 | {294449,299390,305118,319541,327103,337352}
 C00496497 |   2013 | Q1     | A         | 2014-04-10 00:00:00 |   915919 |        890372 |           915919 | {866711,878055,890372,915919}
 C00505800 |   2012 | M10    | A         | 2014-06-20 00:00:00 |   932682 |        921957 |           932682 | {823142,863622,889020,921957,932682}
 C00277525 |   2011 | YE     | A         | 2012-03-13 00:00:00 | -1094824 |      -1094824 |         -1094824 | {-1094824}
 C00446104 |   2008 | M9     | A         | 2009-02-27 00:00:00 |   405777 |        393585 |           876564 | {363578,373044,381488,393585,405777}
 C00430470 |   2007 | Q2     | A         | 2013-07-10 00:00:00 |   876700 |        324747 |           876700 | {295333,299409,305669,309559,315371,324747,876700}
 C00431445 |   2008 | 30G    | A         | 2009-03-03 00:00:00 |   406271 |        405707 |           754317 | {390969,397540,405707,406271}
 C00430470 |   2007 | YE     | A         | 2013-07-11 00:00:00 |   876794 |        390939 |           876794 | {318075,324403,329371,343845,390939,876794}
 C00575795 |   2016 | M3     | A         | 2016-08-11 00:00:00 |  1093618 |       1081059 |          1093618 | {1056807,1066653,1081059,1093618}
 C00431619 |   2007 | Q2     | A         | 2008-06-22 00:00:00 |   346132 |        322882 |           415318 | {295357,301187,306559,322882,346132}
 C00431916 |   2008 | M4     | A         | 2008-10-16 00:00:00 |   370480 |        363322 |           733767 | {334873,337707,363251,363322,370480}
 C00506055 |   2013 | Q1     | A         | 2013-05-02 00:00:00 | -4678830 |      -4678830 |         -4678830 | {-4678830}
 C00430470 |   2008 | M7     | A         | 2013-07-11 00:00:00 |   876937 |        364350 |           876937 | {353430,358618,364150,364350,876937}

@PaulClark2
Copy link
Contributor

This looks good. I like that we have the entire amendment chain.

We shouldn't show negative report IDs (file numbers). The negative file numbers are just place holders. There isn't a .fec file to link to for these.

@noahmanger noahmanger added this to the Sprint 2 milestone Dec 6, 2016
@noahmanger
Copy link

noahmanger commented Dec 6, 2016

Remaining questions:

  • What does the link go to?
  • Figure out the amendments for paper

@jontours
Copy link
Contributor Author

jontours commented Dec 8, 2016

@PaulClark2 @LindsayYoung Do you know of any interesting test cases for a paper filer? I've written out some logic and it would be great to test it for validity. thanks!

@PaulClark2
Copy link
Contributor

Try the DSCC and NRSC. They file monthly and often file amendments.

DSCC - C00042366
NRSC - C00027466

@jontours
Copy link
Contributor Author

jontours commented Dec 9, 2016

Thanks @PaulClark2, the logic for paper has proven to be a bit challenging. But here is a snippet of where I'm at right now with query for paper filers:

 cmte_id  | rpt_yr | rpt_tp | amndt_ind |     receipt_dt      | file_num | prev_file_num | mst_rct_file_num |       amendment_chain        |                             date_chain                              | depth |   last   
-----------+--------+--------+-----------+---------------------+----------+---------------+------------------+------------------------------+---------------------------------------------------------------------+-------+----------
 C00042366 |   2012 | 30G    | A         | 2013-02-05 00:00:00 | -3942832 |      -3548823 |         -3942832 | {-2941831,-3548823,-3942832} | {"2012-12-06 00:00:00","2013-01-18 00:00:00","2013-02-05 00:00:00"} |     3 | -2941831
 C00042366 |   2012 | 30G    | A         | 2013-02-05 00:00:00 | -3942832 |      -3548823 |         -3942832 | {-2941824,-3548823,-3942832} | {"2012-12-06 00:00:00","2013-01-18 00:00:00","2013-02-05 00:00:00"} |     3 | -2941824
 C00042366 |   2012 | 30G    | A         | 2013-02-05 00:00:00 | -3942832 |      -3548823 |         -3942832 | {-2941825,-3548823,-3942832} | {"2012-12-06 00:00:00","2013-01-18 00:00:00","2013-02-05 00:00:00"} |     3 | -2941825
 C00042366 |   2012 | 30G    | A         | 2013-02-05 00:00:00 | -3942832 |      -3548823 |         -3942832 | {-2941832,-3548823,-3942832} | {"2012-12-06 00:00:00","2013-01-18 00:00:00","2013-02-05 00:00:00"} |     3 | -2941832
 C00042366 |   2014 | M10    | A         | 2015-07-24 00:00:00 | -9295839 |      -9122834 |         -9295839 | {-8441871,-9122834,-9295839} | {"2014-10-20 00:00:00","2015-04-23 00:00:00","2015-07-24 00:00:00"} |     3 | -8441871
 C00042366 |   2014 | M10    | A         | 2015-07-24 00:00:00 | -9295839 |      -9122834 |         -9295839 | {-8441872,-9122834,-9295839} | {"2014-10-20 00:00:00","2015-04-23 00:00:00","2015-07-24 00:00:00"} |     3 | -8441872

Note that DSCC does indeed seem to be an interesting test case, something in my logic is off because it appears that multiple amendments are amending different original filings. Obviously this is not true, in my logic I say something amends an original N amendment indicator if the amendments matches rpt_yr, rpt_tp, cmte_id, and it's receipt date is greater than the original filing. So that's why some of these chains are intersecting.

So the question is what other type column can indicate which original filing those duplicate amendments should point to? I can post the logic if needed.

@jontours
Copy link
Contributor Author

jontours commented Dec 9, 2016

@LindsayYoung @PaulClark2 Well, after looking at some of the image pdfs for those original 30G filings it seems as though those different rows for 30G, rpt yr 2012 are really the same filing (I think). But the page count is of monster size so are they spanned across multiple filings? Seems to be a bit complicated and in that case would it be more correct to point the amendments to the filings first in the series? Sorry to lob so many questions.

@LindsayYoung
Copy link
Contributor

@jontours are you matching report type too? That could be amended but would generally be correct and if we could do a manual override in rare cases that things are really messed up, even better.

If getting the amendment chain it too challenging, making sure the outdated forms can point to the most recent form is the top priority. Maybe getting the full chain could then be a separate task.

@PaulClark2 is there a way we could get the time on the receipt date field? That would be handy when more than one amendment comes in during one day.

@jontours
Copy link
Contributor Author

jontours commented Dec 9, 2016

Getting the most recent amendment is probably the easiest task, so yeah that would definitely be an option to move forward.

@noahmanger
Copy link

Resolved by https://github.com/18F/openFEC/pull/2093. Nice work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants