Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review CI system #2867

Closed
jonmaddock opened this issue Aug 21, 2023 · 2 comments
Closed

Review CI system #2867

jonmaddock opened this issue Aug 21, 2023 · 2 comments
Assignees

Comments

@jonmaddock
Copy link
Contributor

Now on github, we should review the CI system and think about any future tasks. This relates to #2862 .

@timothy-nunn
Copy link
Contributor

timothy-nunn commented Aug 30, 2023

PROCESS CI Plan

Introduction

The PROCESS CI system contains several core components that work together to ensure comprehensive testing and tracking of the PROCESS systems code.

  • Building a Docker container to run tests
  • Code quality checks
  • Building PROCESS artefacts
  • Testing PROCESS (unit, integration, regression)
  • Tracking PROCESS MFiles
  • Building documentation

PROCESS regression testing consists of two jobs: a no-tolerance job, and a 5% tolerance job. The no-tolerance job will fail if any of the checked quantities in the MFile are not exactly identical. The 5% job will only fail if any of the checked quantities change by more than 5%. Any new model or model change is expected to change the optima which we converge to, and thus a lot of quantities can change as the result of a change. Generally, a reviewer will look at the change in iteration variables' final values to validate these differences.

History

On GitLab, PROCESS had a less-than-ideal solution to changing test artefacts: each main pipeline would make a commit back to main that overwrite the MFile test assets with the results from main. Branches that were made off of main then benchmarked their changes against the MFile on main at the time the branch was created. This workflow was not implemented into GitHub actions due to the hacky nature.

Aim

To provide a framework that maintains accountability for a branches' changes. That is, the regression test differences should only reflect the changes made on a feature branch.

The proposal

flowchart TB

    subgraph "Feature 1"
        commit_f1_ba(("Commit BA"))
        commit_f1_bb(("Commit BB"))

        commit_f1_ba --- commit_f1_bb
    end

    subgraph main
        commit_a((("Commit A")))
        commit_b((("Commit B")))
        commit_c((("Commit C")))
        commit_d((("Commit D")))

        commit_a --- commit_b
        commit_b --- commit_c
        commit_c --- commit_d
    end

    commit_b --- commit_f1_ba

    subgraph "Feature 2"
        commit_f2_ca(("Commit CA"))
        commit_f2_cb(("Commit CB"))

        commit_f2_ca --- commit_f2_cb
    end

    commit_c --- commit_f2_ca

    subgraph legend
        legend_circle((" ")) ~~~ |A commit with no tracking data| legend_dcircle
        legend_dcircle(((" "))) ~~~ |A commit with tracking data| null:::hidden

        classDef hidden display: none;
    end
Loading

The above graph models a typical Git setup of the PROCESS repository. Commits A-D are features which have been merged into the main branch. Two modelers have created feature branches that come off of main at different commits. Feature 1 needs to be tested against the changes it makes to the code with respect to the state of the code at Commit B. Feature 2 needs to be tested against the changes it makes to the code with respect to the state of the code at Commit C.

The proposal is that each commit (merge commit) onto main will be tracked. Tracking means we will keep the data (MFile) from each main pipeline in some external data repository. If we want to run the regression tests on our feature branch, for example Commit CB:

  1. We recurse back through the Git history until we find the latest commit in our history that has tracking data, Commit C.
  2. We download the tracking data for Commit C and compare our result of running PROCESS against the downloaded data.

Benefits:

  • Data is managed separately from the core code.
  • Eliminates a CI system commiting to itself.
  • Maintain accountability for each branch's changes.
  • Will enable integration of 'the tracker' and testing.

Drawbacks and considerations:

  • Testing will require internet.
  • Won't work if main's history is ever changed.
  • MFile's could be cached locally to avoid constantly downloading data (which could be slow).
  • What happens if some of the regression tests do not converge on main?

Actions

Immediate:

  • Begin tracking MFiles in some form to ensure we have a history if required.
  • Document our proposed changes.

Short term:

  • Create a v3.0 release of PROCESS.
  • Implement the above proposal likely using GitHub as the data repository.

Medium term:

  • Use a proper data repository to store tracked data.
  • Integrate the tracker and testing to use one source of data.

@timothy-nunn
Copy link
Contributor

New issues created to complete the short and medium term features

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants