
Convert tests to use tracked MFiles #2915

Closed
timothy-nunn opened this issue Sep 7, 2023 · 2 comments · Fixed by #3089

This issue implements the workflow for testing described in #2867. #2870 introduced the tracking of MFiles for each merge commit onto main.

This issue implements the recursive search for regression test assets and downloads them for use in testing.

A consideration when implementing this feature, which may be deferred to another issue, is the caching of test assets.

timothy-nunn commented Feb 14, 2024

Copy of the plan:

PROCESS CI Plan

Introduction

The PROCESS CI system contains several core components that work together to provide comprehensive testing and tracking of the PROCESS systems code:

  • Building a Docker container to run tests
  • Code quality checks
  • Building PROCESS artefacts
  • Testing PROCESS (unit, integration, regression)
  • Tracking PROCESS MFiles
  • Building documentation

PROCESS regression testing consists of two jobs: a no-tolerance job and a 5% tolerance job. The no-tolerance job fails if any of the checked quantities in the MFile differ at all from the reference values; the 5% job fails only if a checked quantity changes by more than 5%. Any new model or model change is expected to shift the optimum we converge to, so many quantities can change as the result of a single change. Generally, a reviewer will look at the changes in the iteration variables' final values to validate these differences.
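
As a rough illustration (not the actual test code), both jobs can be thought of as the same comparison run with different tolerances. Here `reference` and `candidate` are hypothetical dictionaries of the checked MFile quantities:

```python
# Hedged sketch: flag checked quantities whose relative change exceeds a tolerance.
# tolerance=0.0 models the no-tolerance job; tolerance=0.05 models the 5% job.
def regression_failures(
    reference: dict[str, float], candidate: dict[str, float], tolerance: float
) -> list[str]:
    """Return the names of quantities that fail the comparison."""
    failures = []
    for name, ref_value in reference.items():
        if name not in candidate:
            failures.append(name)  # quantity disappeared from the new MFile
            continue
        new_value = candidate[name]
        if ref_value == 0.0:
            changed = new_value != 0.0
        else:
            changed = abs(new_value - ref_value) / abs(ref_value) > tolerance
        if changed:
            failures.append(name)
    return failures
```

With `tolerance=0.0` any numerical difference fails, matching the "exactly identical" behaviour; with `tolerance=0.05` only changes above 5% fail.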

History

On GitLab, PROCESS had a less-than-ideal solution to changing test artefacts: each main pipeline would make a commit back to main that overwrote the MFile test assets with the results from main. Branches made off of main then benchmarked their changes against the MFiles on main at the time the branch was created. This workflow was not carried over to GitHub Actions because of its hacky nature.

Aim

To provide a framework that maintains accountability for a branch's changes. That is, the regression test differences should only reflect the changes made on the feature branch.

The proposal

flowchart TB

    subgraph "Feature 1"
        commit_f1_ba(("Commit BA"))
        commit_f1_bb(("Commit BB"))

        commit_f1_ba --- commit_f1_bb
    end

    subgraph main
        commit_a((("Commit A")))
        commit_b((("Commit B")))
        commit_c((("Commit C")))
        commit_d((("Commit D")))

        commit_a --- commit_b
        commit_b --- commit_c
        commit_c --- commit_d
    end

    commit_b --- commit_f1_ba

    subgraph "Feature 2"
        commit_f2_ca(("Commit CA"))
        commit_f2_cb(("Commit CB"))

        commit_f2_ca --- commit_f2_cb
    end

    commit_c --- commit_f2_ca

    subgraph legend
        legend_circle((" ")) ~~~ |A commit with no tracking data| legend_dcircle
        legend_dcircle(((" "))) ~~~ |A commit with tracking data| null:::hidden
        classDef hidden display: none;
    end

The above graph models a typical Git setup of the PROCESS repository. Commits A-D are features which have been merged into the main branch. Two modelers have created feature branches that come off of main at different commits. Feature 1 needs to be tested against the changes it makes to the code with respect to the state of the code at Commit B. Feature 2 needs to be tested against the changes it makes to the code with respect to the state of the code at Commit C.

The proposal is that each commit (merge commit) onto main will be tracked. Tracking means we keep the data (MFile) from each main pipeline in some external data repository. If we want to run the regression tests on our feature branch, for example at Commit CB (a sketch of this lookup follows the list below):

  1. We recurse back through the Git history until we find the latest commit in our history that has tracking data, Commit C.
  2. We download the tracking data for Commit C and compare our result of running PROCESS against the downloaded data.
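
A minimal sketch of that lookup, assuming a hypothetical URL scheme for the external data repository keyed by commit SHA (the real repository and layout are still to be decided):

```python
# Hedged sketch: walk back through the Git history from HEAD and return the
# tracking data of the first ancestor that has an MFile in the data repository.
import subprocess
import urllib.error
import urllib.request

# Hypothetical URL scheme for the external data repository.
ASSET_URL = "https://example.com/process-tracking/{sha}/MFILE.DAT"


def ancestors(ref: str = "HEAD") -> list[str]:
    """Commit SHAs reachable from `ref`, most recent first."""
    result = subprocess.run(
        ["git", "rev-list", ref], capture_output=True, text=True, check=True
    )
    return result.stdout.split()


def fetch_reference_mfile(ref: str = "HEAD") -> tuple[str, bytes]:
    """Return (sha, mfile_contents) for the nearest tracked ancestor of `ref`."""
    for sha in ancestors(ref):
        try:
            with urllib.request.urlopen(ASSET_URL.format(sha=sha)) as response:
                return sha, response.read()  # found tracking data
        except urllib.error.HTTPError:
            continue  # this commit was never tracked; keep walking back
    raise RuntimeError("no tracked ancestor found in the Git history")
```

Run from Commit CB, `fetch_reference_mfile()` would skip CB and CA (never tracked) and return the data for Commit C.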

Benefits:

  • Data is managed separately from the core code.
  • Eliminates the CI system committing to itself.
  • Maintains accountability for each branch's changes.
  • Enables integration of 'the tracker' and testing.

Drawbacks and considerations:

  • Testing will require an internet connection.
  • Won't work if main's history is ever rewritten.
  • MFiles could be cached locally to avoid repeatedly downloading data (which could be slow); see the caching sketch after this list.
  • What happens if some of the regression tests do not converge on main?
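
On the caching point above, a minimal sketch of a local cache keyed by commit SHA (the cache location and the `download` callable are hypothetical):

```python
# Hedged sketch: keep downloaded reference MFiles on disk so repeated test runs
# against the same tracked commit do not hit the network again.
from pathlib import Path
from typing import Callable

CACHE_DIR = Path.home() / ".cache" / "process-regression-mfiles"  # hypothetical


def cached_mfile(sha: str, download: Callable[[str], bytes]) -> bytes:
    """Return the reference MFile for `sha`, downloading only on a cache miss."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    path = CACHE_DIR / f"{sha}_MFILE.DAT"
    if not path.exists():
        path.write_bytes(download(sha))  # e.g. the fetch sketched earlier
    return path.read_bytes()
```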

Actions

Immediate:

  • Begin tracking MFiles in some form to ensure we have a history if required.
  • Document our proposed changes.

Short term:

  • Create a v3.0 release of PROCESS.
  • Implement the above proposal, likely using GitHub as the data repository.

Medium term:

  • Use a proper data repository to store tracked data.
  • Integrate the tracker and testing to use one source of data.

timothy-nunn commented Feb 21, 2024

A running list of questions I have asked myself, and answered:

What about a non-linear history while a branch is being developed?
The non-linear parts of the branch will not have been on main, so they won't have any test assets. The most recent tracked commit on main will always be reached first by the recursive algorithm (in practice it will actually be iterative, but it is conceptually recursive).
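
A small, hedged illustration of that claim (assuming the remote is named `origin`): only the ancestors a branch shares with main can carry tracking data, so however tangled the branch-only part of the history is, the walk lands on the newest of those shared commits.

```python
# Hedged illustration: commits unique to a feature branch are never tracked, so the
# walk sketched earlier can only stop on an ancestor that is also part of main.
import subprocess


def rev_list(ref: str) -> list[str]:
    """Commit SHAs reachable from `ref`, most recent first."""
    out = subprocess.run(
        ["git", "rev-list", ref], capture_output=True, text=True, check=True
    )
    return out.stdout.split()


on_main = set(rev_list("origin/main"))
# The first ancestor of HEAD that is also on main is the newest commit that could
# carry tracking data, regardless of the shape of the branch-only history.
candidate = next(sha for sha in rev_list("HEAD") if sha in on_main)
print(f"Newest ancestor that could carry tracking data: {candidate}")
```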

What about people wanting to run tests who haven't cloned our repository?
Who will do that? Running tests should be done on a development install of PROCESS, which will almost always have required git to install, so it's a safe bet that everyone wanting to run the tests will:

  1. Have git installed.
  2. Be inside of the PROCESS git repository when running the tests.

What about people with no internet?
They can't test PROCESS, but most PROCESS developers should have internet access, and anyone without it for a sustained period probably has bigger issues than whether or not PROCESS works.
