Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lt. Gov Elections - ETL for VISTA replacement #359

Open
3 tasks done
nathankota opened this issue Mar 26, 2024 · 15 comments
Open
3 tasks done

Lt. Gov Elections - ETL for VISTA replacement #359

nathankota opened this issue Mar 26, 2024 · 15 comments
Assignees
Labels
status: waiting Typically waiting on the business owner

Comments

@nathankota
Copy link
Contributor

nathankota commented Mar 26, 2024

Benefit

The Lt. Gov's Election Office is working with a company (BPro Inc) to build a new system (TotalVote) to replace VISTA. The system requires some data sets (addresses, boundaries of precincts, cities, zip, counties) in particular schemas and formats. Lt Gov's Office requests UGRC's assistance in writing scripts to ETL data from our data services to the required schema and format for the TotalVote system.

Acceptance Criteria

  • Review TotalVote documentation, meet and work as a team to determine best solution
  • Build scripts/process to ETL UGRC data to TotalVote schema
  • Scripts are stored in the UGRC github repository and peer-reviewed
  • Explore: Enable replication of the ETL process by TotalVote or Lt Gov Elections for future ETL needs?

Notes

No response

Risks

  • Inability to fulfill the schema requirements of the TotalVote system with existing UGRC data
  • Inability to simplify the use of the scripts to hand over to vendor and/or Lt. Gov Elections
  • Lack of ownership by the vendor and/or Lt Gov Elections to run the script themselves in the future
  • Client's scope and expectation changes

Issue Reference

refs #

@nathankota
Copy link
Contributor Author

nathankota commented Mar 26, 2024

Sprint 1

  • Review documentation
  • Meet and discuss a plan

@nathankota
Copy link
Contributor Author

nathankota commented Apr 11, 2024

Sprint 1 Notes

The documentation review and internal meeting resulted in some questions. Some meeting notes here. Sent the following message to Lt. Gov. Elections Office. They will set up a meeting this week with UGRC, Elections, and the vendor to answer these questions.

Our current understanding is that we are expected to build scripts that extract data from statewide spatial data layers and create files in the proper format and schema to be loaded into the TotalAddress/TotalVote system. We understand that there are three primary datasets needed:

  • Spatial data: TotalAddress Foundational Map Layers (precincts/ballot areas, city, zip, counties)
  • Tabular data: TotalAddress Data Validation Lists: csv files of counties, cities, zip, precincts/ballot areas, and "settings"
  • Spatial data: TotalVote Address data import

Some of our initial questions, requests, and assumptions are:

  • Request: May we obtain system architecture and data flow diagrams to better understand how, when, and where the data elements we are expected to provide fit into the system?
  • Request: Please provide a specific data schema/model that shows the required field names and data types for the addresses. There is a table in the documentation that shows the "components", but those are not the same as expected shapefile field names and data types (e.g. int, long, text, and so forth)
  • Question: What is the relationship between the TotalAddress address data, and the TotalVote voter registration or residence data? As we mentioned in the first meeting, the "VISTA" addresses/residences are independent of the statewide Utah Address Point data that UGRC manages. The TotalAddress documentation suggests that the imported address data will provide the spatial reference for the tabular voter residence data but, again, the two data sets are currently independent of each other. We want to better understand the expected relationship between the two data sets within the system before we provide Utah address points.
  • Assumption: Once the data are extracted and files are created, someone from the Elections Office team will load the data into the TotalAddress/TotalVote system.
  • Assumption: The Elections Office will define the values and create the file for the Data Validation Settings List. It looks like UGRC can provide the State Coordinate Lat and Lon values, if necessary, but the remaining data values appear to be settings that should be defined by the Elections Office, according to their needs and preferences for the system.
  • Question: We intend to build scripts and deliver them to the Elections Office and vendor, such that they can be run at any time that a data update is needed. Who from the Elections Office, or the vendor, will be running the scripts and importing/uploading data as updates are needed?

@nathankota
Copy link
Contributor Author

nathankota commented Apr 16, 2024

Sprint 2

  • Meet with Lt. Gov Elections and vendor about the questions/comments above

Sprint 2 NOTES

Vendor answered our questions/concerned and we are clear about the specs/docs they provided for the data they need for the system.

@nathankota
Copy link
Contributor Author

nathankota commented Apr 22, 2024

Sprint 3

Build scripts according to specs.

@ZachBeck
Copy link
Member

ZachBeck commented Jun 4, 2024

Sprint 5 Notes

June 4

  • Address point csv has been submitted to be reviewed, waiting for feedback.

June 5

  • Script loaded into github and reviewed by Jake

June 6

  • Met with Elections Office and TotalVote team about a small adjustment to the precints (aka vista ballot areas) ETL script. Need to output the VistaID (which includes both the PrecinctID and the SubPrecinctID) rather than the PrecinctID

@agrc agrc deleted a comment from ZachBeck Jun 6, 2024
@steveoh
Copy link
Member

steveoh commented Jun 10, 2024

  • Script loaded into github

Do you have a link?

@ZachBeck
Copy link
Member

@ZachBeck
Copy link
Member

ZachBeck commented Jul 1, 2024

Precincts json with updated precinct ids sent for review.

@ZachBeck
Copy link
Member

ZachBeck commented Jul 2, 2024

waiting to hear back about data sent for review

@ZachBeck ZachBeck closed this as completed Jul 2, 2024
@ZachBeck ZachBeck reopened this Jul 2, 2024
@nathankota nathankota added carry forward This task was carried forward into the next quarter status: waiting Typically waiting on the business owner labels Jul 2, 2024
@nathankota
Copy link
Contributor Author

Carried-forward to FY25 Q1

@nathankota nathankota removed carry forward This task was carried forward into the next quarter status: waiting Typically waiting on the business owner labels Jul 17, 2024
@ZachBeck
Copy link
Member

###FY25 Sprint 2 NOTES
Emailed response to Total Address feedback:

Hi Ali,
I'm slowly working through the list of errors. Here is what I have so far from your previously attached corrections doc.

Invalid Zip Codes:
There are some sliver areas where an address is being assigned an incorrect zip that I'll be able to fix in the coming weeks. However, there are large pockets of addresses that I think are being flagged as errors incorrectly. All the flagged addresses that I checked with the USPS in the Hooper, Ballard, and Woodland areas have the correct zip. 5865 W 5500 S, Hooper, 84315, for example was flagged. Let me know what is causing the error and I can get it fixed.

Invalid Street Names:
These should be fixed now and an updated dataset can be found at:
https://drive.google.com/file/d/18aEvLa1PYySaOBMJaKGn0-yNZSVRoY4K/view?usp=sharing

Original address does not equal the final:
I'll have this fixed in the coming weeks. I didn't realize this field needed the address with the city and zip included (1595 E HIGHLAND TRAILS RD ENOCH 84721) where I was leaving that off (1595 E HIGHLAND TRAILS RD).

I'll keep you posted on my progress. In the future, how often would you like to have address updates?

@steveoh
Copy link
Member

steveoh commented Jul 31, 2024

Requirement: The scripts must be reviewed by another team member and live in the UGRC GitHub repository

ZachBeck/total-address-etl

Nathan's request was to load the code into the agrc org's github. In the repo settings you can request to transfer it.

@ZachBeck
Copy link
Member

Jake has already reviewed them.

@nathankota
Copy link
Contributor Author

https://docs.github.com/en/repositories/creating-and-managing-repositories/transferring-a-repository

To transfer a repository that you own to an organization, you must have permission to create a repository in the target organization.

@ZachBeck do you have permissions to create a repo in the agrc org and, if not, @steveoh can he be granted permission temporarily to make the transfer?

@ZachBeck
Copy link
Member

It looks like I have permissions.

@ZachBeck ZachBeck added the status: waiting Typically waiting on the business owner label Aug 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: waiting Typically waiting on the business owner
Projects
Status: Done
Development

No branches or pull requests

5 participants