-
-
Notifications
You must be signed in to change notification settings - Fork 729
Data initialization bootstrapping, seeding, and importing data
This page is for collecting thoughts, questions, possibilities, and links for implementing data initialization. It is part of the Developer Guide.
THIS PAGE IS A WORK IN PROGRESS.
Edit and add to it to record issues, questions, and decisions made. Be Bold!
Data Initialization covers the functionality for:
- loading the initial data required to get the system to run (the boostrapping data),
- importing and creating seed data,
- loading different sets of sample data
- db agnostic
- be able to read in different data sets so that:
- different countries can load data applicable to them (currently the system loads Australia-specific data)
- different demos can be run (showing different scenarios, setups; for different audiences)
- different use-cases can be represented
- for development and testing
- initially, will be done via command-line or rake task; will assume the user has enough knowledge to do that (later versions might have a UI; see below)
The core complexity is handling referential integrity of the database tables (ensuring that each entry that should refer to another does so with a valid id (e.g. a state always refers to a country, and the country it refers to exists in the country table)
Different gems handle it differently: (TBD - list those that do referential integrity checks, those that don't)
Advantages of ignoring referential integrity during data import:
- much quicker to import; much simpler to develop Disadvantages of ignoring referential integrity during import:
- if invalid data is entered, it may not be apparent immediately -- it may cause some failure of the system that may not obviously be traced to invalid data
Bootstrapping: The system has to load a minimal set of data just so it can be started. This minimal amount of data is the boostrapping data.
Because some data is dependent on others, some data definitely has to be read in/exist before others. ex: The country has to exist before the states can be loaded; the states info has to exist before the suburbs can be loaded.
- what are the dependencies ? What is the ordering?
- how much do we have to worry about it?
- may depend on the the gems & classes we use ?
- if we let ActiveRecord do validating, then the dependency checking is handled for us during validation
- how much do we have to worry about it?
Where should the data be kept? What directory should be the standard/convention?
- some gems and systems use "db/seed" [TBD need reference [AE/wsd])
- some use lib/ xxx (TBD - need reference [AE/wsd])
- we just need to make a decision to start with
- perhaps the directory is passed in as an option, but having a convention for where it 'should' live will keep the system neat and clean(er)
Should the file names be standardized? Yes, at least initially. [PM & AE/wsd 2 Apr 14]
- Rails/Fixtures (ActiveRecord::....) assumes that the file has the same name as the table it will be imported into.
- We will adopt that standard.
- potentially, a future version could allow for different file names that could be specified (e.g. given as an option)
- We will adopt that standard.
Where are the lines drawn between "initial bootstrap data," "seed data," and "sample data" ?
- no matter the category, it all should be able to be read in
- perhaps the initial bootstrap data is a config file, and everything else is YML? (or maybe the config file points to YML needed?)
- Spree will dictate some of this
- "initial bootstrap data" = the absolute minimum amount of information required to start the system
- the default country and/or the country to use ?
- are the differences between "seed data" and "sample data" conventions that we define, or is there some technical or domain-driven reason?
-
existing spree_country table includes the 3 letter ISO-3166-1 (alpha-3) abbrv. for the country (see http://en.wikipedia.org/wiki/ISO_3166-1_alpha-3)
-
Ashley will write up some notes about what she's learned so far about the different gems and classes that could be used, including the approaches and assumptions they use, and other issues she's come across so far.
- Seed-Fu: no validation
-
Will lib/tasks/dev.rake be completely replaced by this?
-
work with different file formats in addition to YML (CSV, JSON, etc.).
- CSV = allows users to work with the data in spreadsheets -- tools that non-tech people might be comfortable with
- JSON = more secure than YML (can't inject code as can be done with YML))
- should be developed now so that this could be possible later
-
would be good to be able to generate empty CSV files based on the current db schema -- so that users have some kind of structure they can fill in. (a sort of simple template for each table/file needed)
- dump data from current db into YAML/CSV. Good not only to keep historical snapshots during dev/testing/demos, etc. -- also so that the data can be used as a starting place for creating a different data set.
-
output to a directory that is given as an option
- how is this different from rake db:schema:dump ? (is certainly related...)
- should be db agnostic
- how is this different from rake db:schema:dump ? (is certainly related...)
Development environment setup
- Pipeline development process
- Bug severity
- Feature template (epic)
- Internationalisation (i18n)
- Dependency updates
Development
- Developer Guidelines
- The process of review, test, merge and deploy
- Making a great commit
- Making a great pull request
- Code Conventions
- Database migrations
- Testing and Rspec Tips
- Automated Testing Gotchas
- Rubocop
- Angular and OFN
- Feature toggles
- Stimulus and Turbo
Testing
- Testing process
- OFN Testing Documentation (Handbooks)
- Continuous Integration
- Parallelized test suite with knapsack
- Karma
Releasing
Specific features
Data and APIs
Instance-specific configuration
External services
Design