Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[presidential map] decide on synchronization between FECP and Postgres #4172

Closed
2 of 3 tasks
jason-upchurch opened this issue Jan 31, 2020 · 6 comments
Closed
2 of 3 tasks
Assignees
Milestone

Comments

@jason-upchurch
Copy link
Contributor

jason-upchurch commented Jan 31, 2020

Summary

currently data needed to support presidential map is manually loaded into PostgreSQL. We would like to identify a way that data will be routinely loaded/synchronized from FECP --> PostgreSQL

We need to get a better idea about how difficult this task is and if we have the appropriate data for this work to reflect correctly in the endpoint.

Completion criteria

  • process identified
  • process implemented and ready for execution

Technical considerations

  • what components are involved in a possible pipeline to achieve this automatically?
@jason-upchurch jason-upchurch added this to the Sprint 11.5 milestone Jan 31, 2020
@lbeaufort lbeaufort changed the title [presidential map] decide on synchronization between FECP and Postgre [presidential map] decide on synchronization between FECP and Postgres Feb 3, 2020
@fecjjeng
Copy link
Contributor

Data needed to support presidential map is "specially processed raw data". That is, it had not been through the regular coding process. Data expert (Paul) need to review and give a "go" signal before it can be published. The final set of tables were in FECP database only. The previous ticket had added these tables to our cloud database, and initial load of these tables had been done. Now at each filing deadline, after the final set of required FECP tables had been refreshed, their counterpart in our cloud PostgreSQL database will be refresh.

@fecjjeng
Copy link
Contributor

This final set of tables can be divided into two categories: summary data and detail schedule data. Due to the different nature of these tables, the refresh process will be different. (the process to refresh 2016 data will be the same described below for 2020 data, IF data changes)

The following summary tables will be completely refreshed. The amount of data need to be refreshed is small and a total refresh is clean and simple.
"pres_ca_cm_sched_a_join_20d"
"pres_ca_cm_sched_link_sum_20d"
"pres_ca_cm_sched_state_20d"
"pres_f3p_totals_ca_cm_link_20d"
"pres_nml_ca_cm_link_20d"
"pres_nml_f3p_totals_20d"
"pres_nml_form_3p_20d"

The following details tables has much more data. An incremental refresh process (only update the changed data) will be used.
"pres_nml_sched_a_20d"
"pres_nml_sched_b_20d"

@fecjjeng
Copy link
Contributor

For the complete total refresh, a python program will read from the source FECP database and re-load data into their postgreSQL counterpart in the cloud databases.

@fecjjeng
Copy link
Contributor

For the two detail schedule tables, the record of changes are captured in an audit table in the intermediate database. A separate Python program, which use this audit table as the "driver", to grab the changed rows in FECP and insert into ((or delete from) their postgreSQL counterpart in the cloud database, in multi sessions in case the amount of changed rows are large.

@fecjjeng
Copy link
Contributor

fecjjeng commented Feb 27, 2020

These are the sample result:
Screen Shot 2020-02-27 at 3 23 46 PM

Screen Shot 2020-02-27 at 3 28 56 PM

Screen Shot 2020-02-27 at 3 24 08 PM

@fecjjeng
Copy link
Contributor

Materialized view in intermediate database:
DISCLOSURE.PRES_NML_SCHED_A_16
DISCLOSURE.PRES_NML_SCHED_A_20
DISCLOSURE.PRES_NML_SCHED_B_16
DISCLOSURE.PRES_NML_SCHED_B_20

Package in intermediate database:
DISCLOSURE.DC4_PRES_REFRESH

Python program and shell scripts used
load_pres_data20.ksh
pres_loading20.py
PresNmlSchedA20d.py
PresNmlSchedB20d.py

load_pres_data16.ksh
pres_loading16.py
PresNmlSchedA16.py
PresNmlSchedB16.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants