This repository has been archived by the owner on Sep 10, 2020. It is now read-only.

Populate raw schema in development VM #89

Merged
redshiftzero merged 5 commits into master from populate-raw-schema-in-vm
Nov 30, 2016


redshiftzero
Contributor

This PR adds population of the raw schema to the Ansible playbook that provisions the development VM. This makes the feature generation and machine learning code easier to run in the development VM, i.e. without having to run the crawler (which can take a while) or connect to the production database (A Bad Idea). The data used to populate the raw schema in the VM is derived from our real data (with some anonymization). I also added the notebook in which I constructed this dataset, for future reference and modification.

By request, I have also combined the data used to populate the individual tables into a single file, roles/crawler/files/raw-data/test_data.csv, so people can play with it without needing to worry about joins.

This PR also bumps the Tor Browser version, since the download link in our Ansible play pointed to an old release and was returning 404s.

@coveralls

Coverage remained the same at 72.727% when pulling 5a73245 on populate-raw-schema-in-vm into b183c0c on master.

always_run: true
changed_when: false

- debug: var=raw_schema_population_result.results
Contributor

Do you still want this debug line in here? FYI, recent versions of Ansible support the verbosity parameter on debug tasks, so the message is only displayed when the playbook runs at that verbosity level or higher, e.g. only with -vvv if verbosity: 3. http://docs.ansible.com/debug_module.html
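
A minimal sketch of the reviewer's suggestion follows. It assumes an Ansible release that supports the `verbosity` option on `debug` (added in Ansible 2.1) and reuses the `raw_schema_population_result` variable registered in the diff above. The task can stay in the play, but it prints nothing unless the playbook is run with -vvv or higher.

```yaml
# Sketch: keep the debug output, but show it only at verbosity >= 3 (-vvv).
# Assumes Ansible >= 2.1, where the debug module accepts `verbosity`.
- debug:
    var: raw_schema_population_result.results
    verbosity: 3
```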

Contributor Author

Hmm, okay, thanks for pointing that out. It's not necessary, so I'll remove it real quick.

Contributor Author

Snipped that line out

redshiftzero force-pushed the populate-raw-schema-in-vm branch from 5a73245 to d8c2c2d on November 30, 2016 23:28
@coveralls

Changes Unknown when pulling d8c2c2d on populate-raw-schema-in-vm into ** on master**.

redshiftzero merged commit 7538b1b into master on Nov 30, 2016
psivesely deleted the populate-raw-schema-in-vm branch on February 7, 2017 00:27
Labels
None yet
Projects
None yet

3 participants