Extract and load.
- In GitHub, click "Use this template", then "Create a new repository".
- Enter a repository name, such as syncsurveycto-xyz, where xyz is the name of the SurveyCTO server to be synced, and click "Create repository".
- Clone the new repo to a directory on your machine of choice.
- Rename syncsurveycto-rsurveycto.Rproj to match the new repo's name.
- In the first line of README.md, change the two instances of "agency-fund/syncsurveycto-rsurveycto" to match the new repo's organization and name.
- In the repo's main directory, create a folder called secrets.
- Start an R session in the new repo's main directory. You should see a message that a project was loaded by renv.
- In R, run the following commands to upgrade the repo to the latest version of renv, restore the local project library, update the packages, and record the state of the library.
renv::upgrade()
renv::restore()
renv::update()
renv::snapshot()
- In the secrets folder, create a file called scto_auth.txt containing the server name on the first line, username on the second, and password on the third. This username must have permission to download data and "allow server API access" must be enabled.
- If you haven't already, create a Google Cloud project named something like xyz-raw, where xyz is the name of your organization.
- Select the project and enable the BigQuery API.
- Create a service account named syncsurveycto-user.
- Create a JSON key for the service account, and download the JSON file to the secrets folder.
- For the project, give the service account (whose email address will be something like syncsurveycto-user@xyz-raw.iam.gserviceaccount.com) the role BigQuery User.
- Create two BigQuery datasets named surveycto and surveycto_dev in the desired region.
- For each of the two datasets, click Share, then Manage Permissions, then Add Principal, then add the information below (changing xyz-raw as appropriate), and click Save.
- Add principals: syncsurveycto-user@xyz-raw.iam.gserviceaccount.com
- Assign roles: BigQuery Data Editor
- Update the params/warehouse.yaml file, changing xyz-raw as appropriate:
- auth_file: Name of the JSON file for the service account
- project (for prod and dev environments): xyz-raw
- Seriously consider not using Postgres as a data warehouse, and instead using BigQuery.
- That is all for now.
- In the new GitHub repo, click Settings, then "Secrets and variables", then Actions.
- Click "New repository secret", enter the information below, then click "Add secret".
- Name: SCTO_AUTH
- Secret: Content of scto_auth.txt, with no trailing line break.
- Once again, click "New repository secret", enter the information below, then click "Add secret".
- Name: WH_AUTH
- Secret: Content of the auth_file specified in params/warehouse.yaml.
- In the git repo on your local machine, make a git commit and push the changes to GitHub.
- In the GitHub repo, go to Actions and ensure that a GitHub Actions run has started and that it completes without error.
- Once the run completes, click on sync_surveycto to see the job details, and ensure that the final line of the "Run script" step says "No ids to sync."
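The scto_auth.txt step above can be sketched in the terminal as follows. This is only an illustration: the three values are placeholders, not real credentials, and the checks at the end simply confirm that the file has three lines and no trailing line break, so that the same content can be pasted into the SCTO_AUTH secret.

```shell
# Placeholder values (assumptions): substitute your own server name, username, and password.
scto_server='xyz'
scto_username='user@example.com'
scto_password='change-me'

mkdir -p secrets

# printf emits exactly what the format string says, so the final %s
# leaves the file without a trailing line break.
printf '%s\n%s\n%s' "$scto_server" "$scto_username" "$scto_password" > secrets/scto_auth.txt

# Restrict the file to the current user, since it contains a password.
chmod 600 secrets/scto_auth.txt

# Sanity checks: count the newline characters in the whole file and in its last byte.
newline_count=$(wc -l < secrets/scto_auth.txt | tr -d ' ')
last_byte_newlines=$(tail -c 1 secrets/scto_auth.txt | wc -l | tr -d ' ')
echo "newline characters: $newline_count, trailing line breaks: $last_byte_newlines"
```

Three lines without a trailing line break contain two newline characters, so the first count should be 2 and the second 0.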
Sync mode | Airbyte equivalent | Supported for forms | Supported for datasets |
---|---|---|---|
overwrite | Full Refresh Overwrite | ✓ | ✓* |
append | Full Refresh Append | ✓ | ✓ |
incremental | Incremental Append | ✓ | - |
deduped | Incremental Append Deduped | ✓* | - |
* Recommended
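As a quick way to read the table, the combinations it marks as supported can be encoded in a small shell check. The function name is_supported is hypothetical and is not part of syncsurveycto; it only mirrors the table above.

```shell
# Hypothetical helper: succeeds if the sync mode is supported for the
# given kind of stream ("form" or "dataset"), according to the table.
is_supported() {
  kind=$1
  mode=$2
  case "$kind:$mode" in
    form:overwrite|form:append|form:incremental|form:deduped) return 0 ;;
    dataset:overwrite|dataset:append) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage with one supported and one unsupported combination.
if is_supported form deduped; then
  echo "form + deduped: supported"
fi
if ! is_supported dataset incremental; then
  echo "dataset + incremental: not supported"
fi
```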
- On your local machine, ensure you are not on the main branch.
- Update params/surveycto.yaml by adding lines to the streams list of the dev section, as indicated below. Take care with indentation.
The default value for review_status for forms is approved. To specify multiple values, use an underscore-separated string, e.g., approved_pending.
- id: best_id_ever
  sync_mode: chosen_sync_mode
  # review_status: approved # optional for forms, not allowed for datasets
- In the terminal, run
Rscript code/main.R
and ensure that the syncs succeed.
- Go to the part of your warehouse specified in the dev environment of the params/warehouse.yaml file and ensure that the columns, number of rows, and content of the new table(s) look(s) right.
- Carefully copy the new lines from the dev section and paste them into the streams list of the prod section. Remember, this is Sparta.
- Make a git commit and push the changes to GitHub.
- On GitHub, create a sensibly named pull request and add someone as a reviewer.
- Ensure that the syncs initiated by the pull request and run on GitHub Actions succeed.
- Ensure that the table(s) in dev still look(s) right.
- Wait for the reviewer to approve and merge the pull request.
- Once the pull request is merged, ensure that the syncs run on GitHub Actions succeed.
- Ensure that the table(s) in prod look(s) right.
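The first step of this procedure, making sure you are not on the main branch, can be checked from the terminal. The sketch below is an illustration only: the helper name not_on_main and the branch name add-stream-best-id-ever are made up, and the demonstration runs in a throwaway repository so that nothing real is touched.

```shell
# Hypothetical helper: succeeds only if the repo at path $1 is on a branch other than main.
not_on_main() {
  branch=$(git -C "$1" symbolic-ref --short HEAD 2>/dev/null) || return 1
  [ "$branch" != "main" ]
}

# Throwaway repository for the demonstration (an assumption, not your real repo).
demo_repo=$(mktemp -d)
git -C "$demo_repo" init -q
git -C "$demo_repo" symbolic-ref HEAD refs/heads/main

# On main: the check fails, so create a branch before editing anything.
not_on_main "$demo_repo" || echo "on main: create a branch first"
git -C "$demo_repo" checkout -q -b add-stream-best-id-ever

# On the new branch: the check passes.
not_on_main "$demo_repo" && echo "ok to edit params/surveycto.yaml"
```

In a real clone, the same check would be called with the path to the repo, for example not_on_main . from the repo's main directory.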
- On your local machine, ensure you are not on the main branch.
- Update params/surveycto.yaml by deleting or commenting out the relevant lines from the streams lists of the dev and the prod sections.
- In the terminal, run
Rscript code/main.R
and ensure that the syncs succeed.
- Make a git commit and push the changes to GitHub.
- On GitHub, create a sensibly named pull request and add someone as a reviewer.
- Ensure that the syncs initiated by the pull request and run on GitHub Actions succeed.
- Wait for the reviewer to approve and merge the pull request.
- Once the pull request is merged, ensure that the syncs run on GitHub Actions succeed.
- The table(s) for the form or dataset will remain in BigQuery, but will not be updated.
- On your local machine, ensure you are not on the main branch.
- In R, run
renv::update()
to update all packages, or
renv::update('agency-fund/syncsurveycto')
to update syncsurveycto.
- In R, run
renv::snapshot()
- If the renv.lock file has changed, continue with the steps below. If not, stop here.
- In the terminal, run
Rscript code/main.R
and ensure that the syncs succeed and that the tables in dev still look right.
- Make a git commit and push the changes to GitHub.
- On GitHub, create a sensibly named pull request and add someone as a reviewer.
- Ensure that the syncs initiated by the pull request and run on GitHub Actions succeed.
- Ensure that the tables in dev still look right.
- Wait for the reviewer to approve and merge the pull request.
- Once the pull request is merged, ensure that the syncs run on GitHub Actions succeed.
- Ensure that the tables in prod still look right.
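Whether renv.lock has changed after renv::snapshot() can be checked with git rather than by eye. The sketch below is one possible approach, not something the repo provides: the helper name lock_changed is hypothetical, and the lockfile contents in the throwaway repository are stand-in text, not a real renv lockfile.

```shell
# Hypothetical helper: succeeds if renv.lock in the repo at path $1
# differs from the version git has recorded.
lock_changed() {
  ! git -C "$1" diff --quiet -- renv.lock
}

# Throwaway repository with a stand-in lockfile (placeholder content).
demo_repo=$(mktemp -d)
git -C "$demo_repo" init -q
printf 'placeholder lockfile, version 1\n' > "$demo_repo/renv.lock"
git -C "$demo_repo" add renv.lock

# Nothing has been modified since the file was recorded.
lock_changed "$demo_repo" || echo "renv.lock unchanged: stop here"

# Simulate a snapshot that rewrites the lockfile.
printf 'placeholder lockfile, version 2\n' > "$demo_repo/renv.lock"
lock_changed "$demo_repo" && echo "renv.lock changed: continue with the steps"
```

In a real clone, running git diff --quiet -- renv.lock from the repo's main directory and inspecting the exit status gives the same answer.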
- On your local machine, ensure you are not on the main branch.
- In the .github/workflows/sync_surveycto.yaml file, edit the cron item. See details here.
- Make a git commit and push the changes to GitHub.
- On GitHub, create a sensibly named pull request and add someone as a reviewer.
- Ensure that the syncs initiated by the pull request and run on GitHub Actions succeed.
- Wait for the reviewer to approve and merge the pull request.
- Once the pull request is merged, ensure that the syncs run on GitHub Actions succeed and that they run on the intended schedule.
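Schedules in GitHub Actions use five-field cron syntax (minute, hour, day of month, month, day of week) and are interpreted in UTC. Before committing a new value for the cron item, a quick field count can catch a malformed expression. The helper cron_field_count and the example expression are illustrative assumptions, not values taken from the workflow file.

```shell
# Hypothetical helper: print the number of whitespace-separated fields in a cron expression.
# awk does the splitting, so the * characters are never glob-expanded by the shell.
cron_field_count() {
  printf '%s\n' "$1" | awk '{ print NF }'
}

# Example expression (an assumption): every day at 06:00 UTC.
new_cron='0 6 * * *'

if [ "$(cron_field_count "$new_cron")" -eq 5 ]; then
  echo "looks like a five-field cron expression: $new_cron"
else
  echo "expected five fields in: $new_cron"
fi
```

This checks only the number of fields, not whether each field is a valid value.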