[presidential map] decide on synchronization between FECP and Postgres #4172

jason-upchurch · 2020-01-31T16:16:11Z

Summary

Currently, the data needed to support the presidential map is loaded into PostgreSQL manually. We would like to identify a way to routinely load/synchronize that data from FECP --> PostgreSQL.

We need a better idea of how difficult this task is, and whether we have the appropriate data for this work to be reflected correctly in the endpoint.

Completion criteria

process identified
process implemented and ready for execution

Technical considerations

What components are involved in a possible pipeline to achieve this automatically?

fecjjeng · 2020-02-27T19:55:15Z

The data needed to support the presidential map is "specially processed raw data"; that is, it has not been through the regular coding process. The data expert (Paul) needs to review it and give a "go" signal before it can be published. The final set of tables existed only in the FECP database. The previous ticket added these tables to our cloud database, and the initial load of these tables has been done. Now, at each filing deadline, after the final set of required FECP tables has been refreshed, their counterparts in our cloud PostgreSQL database will be refreshed.

fecjjeng · 2020-02-27T20:07:00Z

This final set of tables can be divided into two categories: summary data and detail schedule data. Because these tables differ in nature, the refresh process differs as well. (The process to refresh 2016 data will be the same as described below for 2020 data, if the data changes.)

The following summary tables will be completely refreshed. The amount of data to be refreshed is small, and a total refresh is clean and simple.
"pres_ca_cm_sched_a_join_20d"
"pres_ca_cm_sched_link_sum_20d"
"pres_ca_cm_sched_state_20d"
"pres_f3p_totals_ca_cm_link_20d"
"pres_nml_ca_cm_link_20d"
"pres_nml_f3p_totals_20d"
"pres_nml_form_3p_20d"

The following detail tables have much more data. An incremental refresh process (updating only the changed data) will be used.
"pres_nml_sched_a_20d"
"pres_nml_sched_b_20d"

fecjjeng · 2020-02-27T20:09:05Z

For the complete refresh, a Python program will read from the source FECP database and reload the data into the PostgreSQL counterparts in the cloud databases.
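A minimal sketch of what such a full refresh could look like, not the actual `pres_loading20.py`. It assumes both databases are reachable through DB-API connections and that source and target tables share the same column order. The function name `full_refresh` and the `batch_size` parameter are hypothetical. The `?` placeholders match `sqlite3`, which stands in here so the sketch runs anywhere; a PostgreSQL driver such as psycopg2 would use `%s` instead.

```python
import re
import sqlite3  # stand-in driver for the sketch; real code would use FECP/PostgreSQL drivers


def full_refresh(src, dst, table, batch_size=1000):
    """Replace every row of `table` in `dst` with the rows currently in `src`.

    Hypothetical helper: `src` and `dst` are DB-API connections whose
    `table` has the same columns in the same order.
    """
    # The table name is interpolated into SQL, so accept plain identifiers only.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unsafe table name: {table!r}")

    src_cur = src.cursor()
    src_cur.execute(f"SELECT * FROM {table}")
    placeholders = ", ".join(["?"] * len(src_cur.description))

    dst_cur = dst.cursor()
    copied = 0
    try:
        # Delete and reload inside one transaction, so readers never
        # see a half-empty table if the load fails midway.
        dst_cur.execute(f"DELETE FROM {table}")
        while True:
            rows = src_cur.fetchmany(batch_size)
            if not rows:
                break
            dst_cur.executemany(
                f"INSERT INTO {table} VALUES ({placeholders})", rows
            )
            copied += len(rows)
        dst.commit()
    except Exception:
        dst.rollback()
        raise
    return copied
```

Looping this function over the seven summary tables listed above would cover the total-refresh category. Keeping the delete and the reload in a single transaction is a design choice of the sketch, not something the thread specifies.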

fecjjeng · 2020-02-27T20:16:15Z

For the two detail schedule tables, the record of changes is captured in an audit table in the intermediate database. A separate Python program uses this audit table as the "driver" to grab the changed rows in FECP and insert them into (or delete them from) their PostgreSQL counterparts in the cloud database, in multiple sessions in case the number of changed rows is large.
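The audit-driven approach could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual `PresNmlSchedA20d.py`. The audit table name `pres_audit`, its columns (`audit_id`, `table_name`, `row_key`, `action`), the action code `'D'` for deletes, and the key column `sub_id` are all hypothetical. Changed keys are processed in chunks; in a real run, each chunk could be handed to a separate session. `sqlite3` again stands in for the real drivers.

```python
import re
import sqlite3  # stand-in driver for the sketch

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def incremental_refresh(audit, src, dst, table, key="sub_id", chunk_size=500):
    """Apply audited changes for `table` from `src` to `dst`.

    Hypothetical helper. Returns (rows re-inserted, keys removed).
    """
    for name in (table, key):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")

    # Collapse the audit log so only the latest action per key remains.
    last_action = {}
    for row_key, action in audit.execute(
        "SELECT row_key, action FROM pres_audit "
        "WHERE table_name = ? ORDER BY audit_id",
        (table,),
    ):
        last_action[row_key] = action

    keys = list(last_action)
    upserted = removed = 0
    try:
        for start in range(0, len(keys), chunk_size):
            chunk = keys[start:start + chunk_size]
            marks = ", ".join(["?"] * len(chunk))
            # Remove every changed key first; inserts and updates are re-added below.
            dst.execute(f"DELETE FROM {table} WHERE {key} IN ({marks})", chunk)

            live = [k for k in chunk if last_action[k] != "D"]
            removed += len(chunk) - len(live)
            if not live:
                continue
            live_marks = ", ".join(["?"] * len(live))
            fresh = src.execute(
                f"SELECT * FROM {table} WHERE {key} IN ({live_marks})", live
            ).fetchall()
            if fresh:
                cols = ", ".join(["?"] * len(fresh[0]))
                dst.executemany(f"INSERT INTO {table} VALUES ({cols})", fresh)
                upserted += len(fresh)
        dst.commit()
    except Exception:
        dst.rollback()
        raise
    return upserted, removed
```

Treating an update as delete-then-insert keeps the sketch to the two operations the comment mentions (insert and delete); whether the real program handles updates this way is not stated in the thread.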

fecjjeng · 2020-02-27T20:25:45Z

These are the sample results:

fecjjeng · 2020-02-27T21:13:59Z

Materialized views in the intermediate database:
DISCLOSURE.PRES_NML_SCHED_A_16
DISCLOSURE.PRES_NML_SCHED_A_20
DISCLOSURE.PRES_NML_SCHED_B_16
DISCLOSURE.PRES_NML_SCHED_B_20

Package in intermediate database:
DISCLOSURE.DC4_PRES_REFRESH

Python programs and shell scripts used:
load_pres_data20.ksh
pres_loading20.py
PresNmlSchedA20d.py
PresNmlSchedB20d.py

load_pres_data16.ksh
pres_loading16.py
PresNmlSchedA16.py
PresNmlSchedB16.py

jason-upchurch added Work: Back-end Needs refinement labels Jan 31, 2020

jason-upchurch added this to the Sprint 11.5 milestone Jan 31, 2020

lbeaufort added Work: Database and removed Work: Back-end labels Jan 31, 2020

lbeaufort changed the title ~~[presidential map] decide on synchronization between FECP and Postgre~~ [presidential map] decide on synchronization between FECP and Postgres Feb 3, 2020

JonellaCulmer removed the Needs refinement label Feb 13, 2020

JonellaCulmer assigned fecjjeng Feb 18, 2020

pkfec modified the milestones: Sprint 11.5, Sprint 11.6 Mar 3, 2020

pkfec closed this as completed Mar 3, 2020

patphongs mentioned this issue Mar 4, 2024

Epic: Presidential map fecgov/fec-epics#174

Closed

42 tasks
