
Document how to create initial data for tables #36

Closed
3 of 4 tasks
Tracked by #35
fyliu opened this issue Sep 1, 2022 · 10 comments
Labels
complexity: missing dependency Issue has dependencies documentation Improvements or additions to documentation feature: docs: team guide Same as PD team documentation? research Issue involving doing research role: technical writing s: PD team stakeholder: People Depot Team size: 2pt Can be done in 7-12 hours

Comments


fyliu commented Sep 1, 2022

Overview

Since some of our database tables need to be pre-populated with data, we need to decide on a way to do it and document it.

Action Items

Resources/Instructions

Whatever solution we choose needs to hook into django's migration system, or run as a script, so the data can be auto-inserted on a clean build.

  1. Data-creation code directly inside the migration file, or used as a standalone script.
    1. From spreadsheet
      1. Use App Script to generate json file
      2. Use a script to generate the python code for adding the data
    2. From existing table data
      1. Use dumpscript to generate the python code for adding the data
  2. Load SQL script from django migration file
    Place the sql script in <app>/sql/<app>.sql
    1. From spreadsheet
      1. Use App Script to generate json file
      2. Use a script to generate the SQL commands from json
    2. From existing table data
      1. Use DBeaver to generate insert statements
  3. Load fixture from django migration file
    Place the fixture file in <app>/fixtures/<app>.json
    1. From spreadsheet
      1. Use App Script to generate json file
      2. Use a script to rewrite json in django migration format
    2. From existing table data
      1. Use dumpdata to generate json from data already in the db
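As a sketch of the spreadsheet path for option 1 (all model and field names here are hypothetical, not the project's actual schema), a small script could turn the App Script JSON export into Python data-creation code:

```python
import json

# Hypothetical example of rows exported from the spreadsheet via App Script.
SPREADSHEET_JSON = '[{"name": "Hack for LA"}, {"name": "Code for America"}]'

def generate_insert_code(json_text, model_name):
    """Emit one ORM insert statement per row of the exported JSON."""
    rows = json.loads(json_text)
    lines = [f"{model_name}.objects.get_or_create(**{row!r})" for row in rows]
    return "\n".join(lines)

# Prints lines like: Organization.objects.get_or_create(**{'name': 'Hack for LA'})
print(generate_insert_code(SPREADSHEET_JSON, "Organization"))
```

The generated statements could then be pasted into a migration's RunPython function or kept as a standalone script.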
@fyliu fyliu added documentation Improvements or additions to documentation research Issue involving doing research labels Sep 1, 2022

fyliu commented Sep 1, 2022

I've looked into options 1 and 3ii. Both are viable, and I think the fixture option is the better one since we can also use it for testing. That said, the django_db decorator automatically applies all the migrations, so any of the options would work for tests that use it. We don't have any non-django_db tests, but we might eventually, so the fixture option is still preferable if other factors are equal.

In a call with @ExperimentsInHonesty, I learned about the possibility of generating json from G sheets using an app script. I later tried option 3i, and it also works well. Its advantage is that it may be a more direct path from the initial spreadsheet to the final migration file. The options that dump data from existing tables have the extra step of first inserting the data into the system through the admin site, which is a potential source of user error.

Comparing the work involved between starting from the spreadsheet and from existing table data:

  • both

    • same migrations file setup
  • from sheet

    • more direct, one less step for potential copy/paste error
      • there's a single copy/paste and save as json file
    • need extra script to reshape json from spreadsheet to the proper format
  • from table data

    • need to add data into the db first, from admin site, which is relatively user-friendly
    • the amount of this manual work depends on how many rows need to be inserted. Most tables are very simple. SDG-related table rows can be a chore.
    • need a script or manual work to remove automatic fields like uuid

Now that it's laid out, starting from sheets data has more opportunities for automation, so it feels more future-proof.

The next step will be to evaluate if the option 3i route I've done will be adequate for our needs.

  • Problem: omitting created_at and updated_at values causes an error because those fields are non-nullable. Making them nullable gets around this, but then those rows have no timestamps.
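For the option 3i route, the spreadsheet JSON has to be reshaped into Django's fixture structure, a list of objects with model, pk, and fields keys. A minimal sketch of that reshaping step (the app label and field names are made up for illustration):

```python
import json

def to_fixture(rows, model_label, start_pk=1):
    """Reshape plain row dicts into Django's fixture structure."""
    return [
        {"model": model_label, "pk": pk, "fields": row}
        for pk, row in enumerate(rows, start=start_pk)
    ]

rows = json.loads('[{"name": "Engineering"}, {"name": "Design"}]')
fixture = to_fixture(rows, "core.practicearea")
print(json.dumps(fixture, indent=2))
```

The resulting file would go in <app>/fixtures/<app>.json for loaddata to pick up; note that the timestamp problem described above still applies to fixtures built this way.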


fyliu commented Sep 1, 2022

Another possibility

This one imports the data from a CSV file. The code is custom to each model but it can likely be made more generic.
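A rough sketch of that CSV approach (column and model names are hypothetical): the parsing half is generic, and only the final ORM call is model-specific.

```python
import csv
import io

CSV_TEXT = """name,description
Citizen Engagement,Civic participation projects
Workforce Development,Job readiness projects
"""

def read_rows(csv_text):
    """Parse CSV into a list of field dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))

rows = read_rows(CSV_TEXT)
# In a Django migration or script, each row could then be inserted with
# something like (hypothetical model):
#     ProgramArea.objects.get_or_create(**row)
print(rows)
```

Keeping the parsing separate from the insert call is what would let the code become generic across models.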


fyliu commented Sep 2, 2022

To answer the null issue from 2 comments ago: loading from a fixture deliberately does not trigger the auto-timestamp behavior defined in the model. Fixture loading is designed for dumping and reloading data that already existed in the database at some point, not for loading initial data.
See this django ticket for details about why they won't change this behavior.

So we now have to choose:

  1. Generate the fixtures from the spreadsheet and allow the timestamp fields to be null. Not having timestamps on the pre-populated data may not be a real disadvantage.
  2. Manually insert the data into django and use dumpdata as it's intended to be used. This preserves the timestamps from the db.
  3. Generate python code from the spreadsheet so the data goes through the normal insertion path and gets its timestamps.

@fyliu fyliu mentioned this issue Sep 27, 2022

fyliu commented Sep 27, 2022

This is documented here and linked from the wiki.

The strategy is to export the JSON from the spreadsheet, convert it into a python script that can insert the data, then create a migration file that will call the script when it's processed.

The advantages of using a python script as opposed to importing the data in JSON form using loaddata:

  • it will generate timestamps, whereas loaddata needs the JSON file to contain timestamps for each record
  • loaddata requires the data to be in the database and dumped using dumpdata
  • manually (via script) creating the JSON data dump is possible, but it won't have the necessary timestamps
  • Making timestamps optional will get around the requirement, but that's not good practice
  • manually (via script) adding timestamps is possible, but it would look odd to have timestamps from well before the website launched. The JSON file would also need to be saved to revision control, which is also awkward.
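The pipeline above might look roughly like the following sketch (the app, model, and data are hypothetical, not taken from the project's actual documentation). The insert logic lives in a plain function that takes the model class, so a migration can pass in the historical model:

```python
# Hypothetical initial data, as converted from the spreadsheet's JSON export.
PROGRAM_AREAS = ["Citizen Engagement", "Environment", "Justice"]

def run(program_area_model):
    """Insert initial rows; created_at/updated_at are set by the model."""
    for name in PROGRAM_AREAS:
        program_area_model.objects.get_or_create(name=name)

# A migration could wire this up along these lines (sketch, not verified
# against the project's code):
#
#     from django.db import migrations
#
#     def forward(apps, schema_editor):
#         run(apps.get_model("core", "ProgramArea"))
#
#     class Migration(migrations.Migration):
#         dependencies = [("core", "0001_initial")]
#         operations = [migrations.RunPython(forward, migrations.RunPython.noop)]
```

Because the rows go through the ORM's normal create path, the model's auto timestamp fields are populated, which is the advantage over loaddata described above.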


fyliu commented Oct 3, 2022

I finished this and then realized there's a better way to do it. If this project is supposed to be generic, the initial data shouldn't be baked into migrations. It should be left as individual scripts that we run on initial database setup, and that other organizations can customize for their needs.


fyliu commented Mar 30, 2023

I have a draft PR #141 for this. It's a draft because it depends on #140 being merged.


fyliu commented Apr 6, 2023

I modified the PR to not depend on sphinx so we can move this forward.


fyliu commented Apr 10, 2023

There's no model created yet that needs initial data, which makes it harder to review the database-insertion part of #141. I will go implement #24.


fyliu commented Apr 12, 2023

Looks like #24 has the same structure as #35 and we don't have a direction for working on #35 yet. So I can't work on #24.

The other thing I could do is pull the SOC_Major table issue out of the ice box, but I would rather not, since it isn't going to be prioritized until far in the future.

Blocker: I need to implement a table with initial data, and there's no good one to do. This is blocking on #35.


fyliu commented Jan 17, 2024

This issue was implemented by PR #226

Projects
Status: ✅Done
2 participants