Presidential employment stimulus

This repo provides a webpage that is embedded in stateofthenation.gov.za. This webpage is published at pres-employment.openup.org.za. An embedded preview is available at sona-shell.netlify.app.

Development

Data processing is done using Python, website UX design in Webflow, and website dynamics using jQuery and D3.js.

Structure of the spreadsheet file

NOTE: Read this if you are updating the spreadsheet used as input for the website.

The basic structure of the spreadsheet is as follows:

  1. Targets - a sheet listing all programmes and their target number of beneficiaries. This stays the same for a phase, as targets are set once.

  2. Trends - a sheet listing programme outcomes. As the spreadsheet is updated, columns are added to this sheet.

  3. Provincial (beneficiaries) - the by-province breakdown of programmes (each province gets a column)

  4. Demographic data - all non-province breakdowns: gender, youth, etc.

  5. Implementation status - the implementation status of each programme

  6. Department Descriptions - the descriptions and blurbs ("lead" and "paragraph") for each department
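Before running the update code it can help to confirm that the workbook contains all six sheets. A minimal sketch, assuming the tabs are named exactly as in the list above (the real tab names may differ, so treat `EXPECTED_SHEETS` as hypothetical) and that you read the names with pandas' `ExcelFile(...).sheet_names`:

```python
from typing import Iterable, List

# Hypothetical tab names taken from the list above; check them against the real workbook.
EXPECTED_SHEETS = [
    "Targets",
    "Trends",
    "Provincial (beneficiaries)",
    "Demographic data",
    "Implementation status",
    "Department Descriptions",
]


def missing_sheets(sheet_names: Iterable[str]) -> List[str]:
    """Return the expected sheets that are absent from sheet_names, in list order."""
    present = set(sheet_names)
    return [name for name in EXPECTED_SHEETS if name not in present]


# Usage (needs pandas plus an Excel engine such as openpyxl; the file name is hypothetical):
#   import pandas as pd
#   print(missing_sheets(pd.ExcelFile("phase2.xlsx").sheet_names))
```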

General rules for the spreadsheet:

  1. Keep it rectangular: the code expects a grid of rows and columns, so there must not be any merged cells, etc.

  2. Pay attention to naming: the programme names need to be exactly the same throughout the spreadsheet.

  3. Whitespace matters: "Educational Assistants" is different from "Educational  Assistants" (two spaces) and "Educational Assistants " (trailing space).

  4. Each change needs a new version: to make it clear which version is which, give the file a new name every time you change the spreadsheet, and store it in the appropriate place on Google Drive.
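Rules 2 and 3 are easy to break by accident. A minimal stdlib sketch for catching such mistakes, assuming you have already collected the programme names from each sheet as lists of strings (for example from a pandas column). The helper names are hypothetical and not part of this repository:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def normalise(name: str) -> str:
    """Collapse runs of whitespace (including non-breaking spaces) and trim the ends."""
    return " ".join(name.split())


def whitespace_problems(names: Iterable[str]) -> List[str]:
    """Names with leading/trailing spaces, repeated spaces or unusual whitespace."""
    return sorted({n for n in names if normalise(n) != n})


def near_duplicates(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group distinct spellings that become identical once whitespace is normalised."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for name in names:
        groups[normalise(name)].add(name)
    return {key: sorted(v) for key, v in groups.items() if len(v) > 1}


def unmatched(reference: Iterable[str], other: Iterable[str]) -> List[str]:
    """Names used in another sheet that do not appear in the reference sheet (e.g. Targets)."""
    return sorted(set(other) - set(reference))
```

Because `str.split()` with no separator also splits on non-breaking spaces, `near_duplicates` catches names pasted in from other documents as well as plain double or trailing spaces.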

Updating data

NOTE: Read this if you are running the data update code.

Data is processed by the Python script in python-src/update_all_data.py. The previously used Jupyter Notebook is deprecated. The Python script has these parameters:

usage: update_all_data.py [-h] [--phase1_excel PHASE1_EXCEL] [--phase2_excel PHASE2_EXCEL] [--output_dir OUTPUT_DIR] [--output_filename OUTPUT_FILENAME]

options:
  -h, --help            show this help message and exit
  --phase1_excel PHASE1_EXCEL
  --phase2_excel PHASE2_EXCEL
  --output_dir OUTPUT_DIR
  --output_filename OUTPUT_FILENAME

The default output file is data/all_data.json.
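The options above can be mirrored with a standard `argparse` parser, which also shows how the output path is assembled from `--output_dir` and `--output_filename`. This is a sketch reconstructed from the usage text, not the script's actual source: the defaults `data` and `all_data.json` are assumptions inferred from the stated default output file, and the Excel options are left without defaults.

```python
import argparse
import os
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the usage text; the defaults below are assumptions, not copied from update_all_data.py.
    parser = argparse.ArgumentParser(prog="update_all_data.py")
    parser.add_argument("--phase1_excel", help="path to the phase 1 spreadsheet")
    parser.add_argument("--phase2_excel", help="path to the phase 2 spreadsheet")
    parser.add_argument("--output_dir", default="data")
    parser.add_argument("--output_filename", default="all_data.json")
    return parser


def output_path(argv: Optional[List[str]] = None) -> str:
    """Where the JSON would be written for the given command-line arguments."""
    args = build_parser().parse_args(argv)
    return os.path.join(args.output_dir, args.output_filename)
```

Under these assumptions, running the script with only `--phase1_excel` and `--phase2_excel` would write to data/all_data.json relative to the working directory.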

Commits made to the data-updates branch are visible at https://data-updates--presidency-employment-stimulus.netlify.app/ and the staging branch updates to https://staging--presidency-employment-stimulus.netlify.app/.
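Both preview addresses follow Netlify's branch-deploy naming pattern, `<branch>--<site>.netlify.app`. A tiny sketch for predicting the preview URL of another branch, assuming branch deploys are enabled for it in Netlify and that the branch name contains only characters valid in a hostname (Netlify may rewrite other characters, which this sketch does not handle):

```python
def branch_deploy_url(branch: str, site: str = "presidency-employment-stimulus") -> str:
    """Build a Netlify branch-deploy URL of the form https://<branch>--<site>.netlify.app/."""
    return f"https://{branch}--{site}.netlify.app/"
```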

Adding months

The list of valid months, and the corresponding columns in the Trends sheet, is in python-src/presidential_employment/__init__.py (lines 14-117). The months listed there must match the month columns in the Trends sheet, no more and no less. So that the web interface can look up the new months, data/lookups.json should also be updated.
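The one-column-per-month rule can be checked before running the update. A minimal sketch, assuming the Trends header is available as a list (e.g. `list(df.columns)` after reading the sheet with pandas) and that a fixed number of descriptive columns, such as the programme name, precede the month columns. `check_month_columns` and `leading_columns` are hypothetical names:

```python
from typing import Sequence


def check_month_columns(months: Sequence[str], trends_header: Sequence[str], leading_columns: int) -> None:
    """Raise ValueError unless the Trends sheet has exactly one column per listed month.

    leading_columns is the number of non-month columns before the months; it is an
    assumption about the sheet layout, not something read from the repository.
    """
    month_columns = len(trends_header) - leading_columns
    if month_columns != len(months):
        raise ValueError(
            f"{len(months)} months listed but the Trends sheet has {month_columns} month columns"
        )
```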

Update the end date of phases in src/index.html. These are in the elements with class="feature-value__phase-label" and class="phase-legend__text" (lines 269 and 202).
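Line numbers drift as the page changes, so searching for the class names can be quicker. A small stdlib sketch (the helper name is hypothetical) that reports which lines of a file contain each class; it is a plain substring match, so it will also match longer class names that contain these ones:

```python
from typing import Dict, Iterable, List

PHASE_DATE_CLASSES = ["feature-value__phase-label", "phase-legend__text"]


def find_marker_lines(text: str, markers: Iterable[str]) -> Dict[str, List[int]]:
    """Map each marker to the 1-based numbers of the lines in text that contain it."""
    lines = text.splitlines()
    return {
        marker: [i for i, line in enumerate(lines, start=1) if marker in line]
        for marker in markers
    }


# Usage:
#   with open("src/index.html", encoding="utf-8") as f:
#       print(find_marker_lines(f.read(), PHASE_DATE_CLASSES))
```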

If the "number of direct participants" needs to be changed, this is in src/js/viz-phased.js (line 58).

Phases, Sections, Metrics, Dimensions: how the data breakdown works

The data from the spreadsheet is read into an Overview and a list of Departments. Within each of these there are Phases, which in turn contain Sections. Sections are essentially the top-level page breakdowns: for example, the Programme Achievements in the Overview is one Section, and the Programme Targets for a Department is another. Within each Department there is a Section for each type of opportunity: jobs created, livelihoods supported and jobs retained. Each Section has multiple Metrics, which are typically programmes such as the DBE's Education Assistants programme. Each Metric has overall values and targets and zero or more Dimensions. The Dimensions are the breakdowns by time and by various demographics. Each Dimension has an associated visualisation type and a set of values and targets.
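A simplified sketch of the Department-side hierarchy using plain `dataclasses`. The class names follow this README, but the field names (`key`, `viz`, `phase_num` and so on) are assumptions for illustration, Beneficiary and ImplementationDetail are omitted, and `dataclasses.asdict` plus `json.dumps` stand in for the patched dataclasses_json used by the real code:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class MetricValue:
    key: str                            # e.g. a month, a province or a gender
    value: Optional[float] = None
    value_target: Optional[float] = None


@dataclass
class Dimension:
    name: str
    viz: str                            # visualisation type, e.g. "line" or "bar"
    values: List[MetricValue] = field(default_factory=list)


@dataclass
class Metric:
    name: str                           # a programme
    value: Optional[float] = None
    value_target: Optional[float] = None
    dimensions: List[Dimension] = field(default_factory=list)


@dataclass
class Section:
    name: str                           # e.g. jobs created, livelihoods supported, jobs retained
    metrics: List[Metric] = field(default_factory=list)


@dataclass
class Phase:
    phase_num: int
    sections: List[Section] = field(default_factory=list)


@dataclass
class Department:
    name: str
    phases: List[Phase] = field(default_factory=list)


def iter_dimensions(dept: Department) -> Iterator[Tuple[int, str, str, Dimension]]:
    """Walk Department -> Phase -> Section -> Metric -> Dimension."""
    for phase in dept.phases:
        for section in phase.sections:
            for metric in section.metrics:
                for dim in metric.dimensions:
                    yield phase.phase_num, section.name, metric.name, dim


def to_json(dept: Department) -> str:
    """Serialise the nested dataclasses to JSON (stand-in for dataclasses_json)."""
    return json.dumps(asdict(dept))
```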

The JSON is generated by the patched version of the dataclasses_json module (the patch is in this PR) and the classes are defined in the python-src/presidential_employment.py file in this repository.

Dimensions are parsed into their Python class representation by compute_all_data_departments (in the above-mentioned Python file). Dimensions that are represented by their own sheet, e.g. the Provincial breakdown, are built by the make_dim function; the remaining breakdowns, e.g. gender, are built by code in compute_all_data_departments that looks for the relevant columns in the Demographics sheet. Demographic information is aggregated for use in the Overview by the compute_breakdowns function.
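The Overview aggregation can be pictured as summing each programme's breakdown category by category. The sketch below illustrates that idea only and is not compute_breakdowns itself: `aggregate_breakdowns` is a hypothetical name, the input is assumed to be one dict of category to value per programme, and the real code may treat missing breakdowns differently:

```python
from collections import Counter
from typing import Dict, Iterable


def aggregate_breakdowns(breakdowns: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Sum per-programme breakdowns (e.g. {"Female": ..., "Male": ...}) into one overall breakdown."""
    total: Counter = Counter()
    for breakdown in breakdowns:
        total.update(breakdown)   # Counter.update adds values for matching keys
    return dict(total)
```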

Import Webflow export

To update the website with a Webflow export, save the Webflow export to /webflow-export.zip, then run:

npm run webflow-import

Deployment

Commits to main are deployed to presidency-employment-stimulus.netlify.app by Netlify. The site pres-employment.openup.org.za points at this site.

Data structure and dependencies

Dependencies:

python>=3.9
dataclasses-json>=0.5.7
pandas>=1.4.1
numpy>=1.21.5
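A rough environment check against these minimums, using only the standard library (`importlib.metadata.version`). The version comparison below only looks at the leading numeric parts, so pre-releases such as `1.4.1rc1` count as `1.4.1`; `packaging.version.Version` is more accurate if that package is installed. The helper names are hypothetical:

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Tuple

MINIMUMS: Dict[str, str] = {
    "dataclasses-json": "0.5.7",
    "pandas": "1.4.1",
    "numpy": "1.21.5",
}


def numeric_prefix(ver: str) -> Tuple[int, ...]:
    """'1.21.5rc1' -> (1, 21, 5); parsing stops at the first non-numeric part."""
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numeric prefixes, padding with zeros so '2.0' equals '2.0.0'."""
    a, b = numeric_prefix(installed), numeric_prefix(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_environment() -> List[str]:
    """Return human-readable problems; an empty list means the minimums are met."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append(f"Python {sys.version.split()[0]} is older than 3.9")
    for package, minimum in MINIMUMS.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{package} {installed} is older than {minimum}")
    return problems
```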

The data structures in use are:

Everything -> Overview
           -> List[Department]
           
Overview -> List[PhaseDates] # this describes the start and end dates of the phases
         -> List[Section]    # the sections are for different types of beneficiary or other top-level divisions e.g. totals vs breakdowns

# when used in Overview
Section -> List[PhasedMetrics] # Metrics are both the top level summary (budget, total beneficiaries) and the different breakdowns
                               # a "PhasedMetric" has a list of total values and target values
PhasedMetric -> List[Dimension] # Dimensions hold the data displayed as line charts, bar charts, etc. i.e. breakdowns of a Metric
Dimension -> List[MultiMetricValue] # MultiMetricValues are used for dimensions that need values that map phase_num -> value

# when used in Department - in this case the phases are split apart at the top level as Phases, not via PhasedMetrics
Department -> Phase
Phase -> List[Section]
         List[Beneficiary]
         List[ImplementationDetail] # when the implementation status is stored on the Department level
Section -> List[Metric]
Metric -> List[Dimension]
          ImplementationDetail # when we have implementation status for a programme
Dimension -> List[MetricValue] # where the MetricValue stores value and value_target e.g. by time, by gender, etc
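The main difference between the two layouts is that the Department side keeps one set of MetricValues per Phase, while the Overview's MultiMetricValue maps phase_num to value. A hypothetical sketch of that reshaping; the field names and the `merge_phases` helper are invented for illustration, and the real classes live in the repository's Python source:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MultiMetricValue:
    key: str                                                          # e.g. a province or a gender
    values: Dict[int, Optional[float]] = field(default_factory=dict)  # phase_num -> value


def merge_phases(per_phase: Dict[int, Dict[str, float]]) -> List[MultiMetricValue]:
    """Turn {phase_num: {key: value}} into one MultiMetricValue per key.

    Keys keep their first-seen order; a key missing from a phase maps to None.
    """
    keys: List[str] = []
    for breakdown in per_phase.values():
        for key in breakdown:
            if key not in keys:
                keys.append(key)
    return [
        MultiMetricValue(key, {phase: per_phase[phase].get(key) for phase in sorted(per_phase)})
        for key in keys
    ]
```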