From a5cfc2d68a76964aa3dbf48a0db78f4ff4d300b6 Mon Sep 17 00:00:00 2001
From: amyheather
Date: Tue, 10 Dec 2024 14:10:56 +0000
Subject: [PATCH] docs(reflections): reorganised to match order and categories in article

---
 pages/reflections.qmd | 766 ++++++++++++++++++++++--------------------
 1 file changed, 409 insertions(+), 357 deletions(-)

diff --git a/pages/reflections.qmd b/pages/reflections.qmd
index 7432e01..390c41f 100644
--- a/pages/reflections.qmd
+++ b/pages/reflections.qmd
@@ -4,7 +4,12 @@ echo: False
 bibliography: ../references.bib
 ---
 
-This page describes the facilitators for each reproduction (and, conversely, the barriers to the reproduction, in their absence). With each section, I have created a table which evaluates whether the facilitators were fully met (✅), partially met (🟡), not met (❌) or not applicable (N/A) for each study.
+This page describes the facilitators and barriers encountered in each reproduction, presented as a series of recommendations. These are grouped into two themes:
+
+* Recommendations to support reproduction
+* Recommendations to support troubleshooting and reuse
+
+With each section, I have created a table which evaluates whether the facilitators were fully met (✅), partially met (🟡), not met (❌) or not applicable (N/A) for each study.
 
 Links to each study:
 
@@ -56,11 +61,47 @@ criteria_wide = pd.DataFrame(criteria_dict, index=col).T
 eval_chart(criteria_wide).show(config={'displayModeBar': False})
 ```
 
-## Environment
+## Recommendations to support reproduction
+
+### Set-up
+
+::: {.callout-note icon=false collapse=true}
+
+## Share code with an open licence
+
+| @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 |
+| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
+| ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ |
+
+The absence of a licence prevents legal reuse of code for reproduction.
+
+:::
+
+::: {.callout-note icon=false collapse=true}
+
+## Link publication to a specific version of the code
+
+| @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 |
+| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
+| | | | | | ❌ | | |
+
+@johnson_cost_2021: Not met. There were commits to their GitHub repository after the publication date, and it wasn't clear which version aligned with the publication. However, the most recent commits add clear README instructions to the repository. We decided to use the latest version of the repository, but it would have been beneficial to have releases/versions/a change log relating the commit history to the publication and any subsequent changes.
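+
+One lightweight option (a hypothetical sketch, not something any of the studies did) is to write the commit hash used for a set of results alongside those results, so readers can match the publication to an exact version of the code:
+
+```python
+# Hypothetical sketch: record the commit used to generate the results, so the
+# publication can point readers to that exact version of the repository.
+import subprocess
+from pathlib import Path
+
+def save_code_version(output_dir="results"):
+    """Write the current git commit hash alongside the model outputs."""
+    commit = subprocess.run(
+        ["git", "rev-parse", "HEAD"],
+        capture_output=True, text=True, check=True,
+    ).stdout.strip()
+    Path(output_dir).mkdir(exist_ok=True)
+    (Path(output_dir) / "code_version.txt").write_text(commit + "\n")
+    return commit
+```
+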
+ +::: ::: {.callout-note icon=false collapse=true} -## List required packages +## List required packages and versions + +### Packages and versions + +| @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | +| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | +| ❌ | ❌ | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | + +### Packages | @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | @@ -92,27 +133,7 @@ eval_chart(criteria_wide).show(config={'displayModeBar': False}) * If there are local dependencies (e.g. other GitHub repositories), make sure to (a) mention and link to these repositories, so it is clear they are also required, and (b) include licence/s in those repositories also, so they can be used. * This was a common issue. -::: - -::: {.callout-note icon=false collapse=true} - -## Be aware of potential system dependencies - -There can also be system dependencies, which will vary between systems, and may not be obvious if researchers already have these installed. We identified these when setting up the docker environments (which act like "fresh installs"): - -* @shoaib_simulation_2021, @lim_staff_2020, @anagnostou_facs-charm_2022 - no dependencies -* @huang_optimizing_2019, @kim_modelling_2021, @johnson_cost_2021 and @wood_value_2021 - libcurl4-openssl-dev, libssl-dev, libxml2-dev, libglpk-dev, libicu-dev - as well as `tk` for Johnson et al. 2021 -* @hernandez_optimal_2015 - wget, build-essential, libssl-dev, libffi-dev, libbz2-dev, libreadline-dev, libsqlite3-dev, zlib1g-dev, libncurses5-dev, libgdbm-de, libnss3-dev, tk-dev, liblzma-dev, libsqlite3-dev, lzma, ca-certificates, curl, git - -Although it would be unreasonable for authors to be aware of and list all system dependencies, given they may not be aware of them, this does show the benefit of creating something like docker in identifying them and making note of them within the docker files. - -This issue was specific to (a) R studies, and (b) the study with an unsupported version of Python that required building it from source in the docker file. - -::: - -::: {.callout-note icon=false collapse=true} - -## Provide versions +### Versions | @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | @@ -149,481 +170,518 @@ This issue was specific to (a) R studies, and (b) the study with an unsupported ::: -## Structure of code and scripts +### Running the model -::: {.callout-warning icon=false collapse=true} +::: {.callout-tip icon=false collapse=true} -## Model is provided in a "runnable" format +## Provide code for all scenarios and/or sensitivity analyses | @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | -| ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | +| ❌ | ❌ | ❌ | ❌ | N/A | 🟡 | ❌ | ✅ | -@shoaib_simulation_2021: Fully met. Provided as a single `.py` file which ran model with function `main()`. +@shoaib_simulation_2021: Not met. 
There were several instances where it took quite a while to understand how and where to modify the code in order to run scenarios (e.g. no arrivals, transferring admin work, reducing doctor intervention in deliveries). -@huang_optimizing_2019: Not met. The model code was provided within the code for a web application, but the paper was not focused on this application, and instead on specific model scenarios. I had to extract the model code and transform it into a format that was "runnable" as an R script/notebook. +@huang_optimizing_2019: Not met. Set up a notebook to programmatically run the model scenarios. It took alot of work to modify and write code that could run the scenarios, and I often made mistakes in my interpretation for the implementation of scenarios, which could be avoided if code for those scenarios was provided. -@lim_staff_2020: Fully met. Provided as a single `.py` file which ran the model with a for loop. +@lim_staff_2020: Not met. Several parameters or scenarios were not incorporated in the code, and had to be added (e.g. with conditional logic to skip or change code run, removing hard-coding, adding parameters to existing). -@kim_modelling_2021: Fully met. Has seperate `.R` scripts for each scenario which ran the model by calling functions from elsewhere in repository. +@kim_modelling_2021: Not met. Took alot of work to change model from for loop to function, to set all parameters as inputs (some were hard coded), and add conditional logic of scenarios when required. -@anagnostou_facs-charm_2022: Fully met. Can run model from command line. +@anagnostou_facs-charm_2022: Not applicable. No scenarios. -@johnson_cost_2021: Fully met. Model provided as a package (which is an R interface for the C++ model). +@johnson_cost_2021: Partially met. Has all base case scenarios, but not sensitivity analysis. -@hernandez_optimal_2015: Fully met. The model (python code) can be run from `main.py`. +@hernandez_optimal_2015: Not met. Took a while to figure out how to implement scenarios. -@wood_value_2021: Fully met. Provided as a single `.R` file which ran the model with a for loop. +@wood_value_2021: Fully met. **Reflections:** -* If you are presenting the results of a model, then provide the code for that model in a "runnable" format. -* This was an **uncommon** issue. +* Common issue +* Time consuming and tricky to resolve +* Tom: This is a headline. Also, links to importance of reproducible analytical pipelines (RAP) for simulation. ::: -::: {.callout-warning icon=false collapse=true} +::: {.callout-tip icon=false collapse=true} -## Model is designed to be run programmatically (i.e. can run model with different parameters without needing to change the model code) +## Ensure model parameters are correct | @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | -| ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | +| 🟡 | ❌ | 🟡 | ✅ | ✅ | ✅ | ❌ | ✅ | -@shoaib_simulation_2021: Not met. The model is set up as classes and run using a function. However, it is not designed to allow any variation in inputs. Everything uses default inputs, and it designed in such a way that - if you wish to vary model parameters - you need to directly change these in the script itself. +@shoaib_simulation_2021: Partially met. Script is set with parameters for base configuration 1, with the exception of number of replications. 
-@huang_optimizing_2019: Fully met. Model was set up as a function, with many of the required parameters already set as "changeable" inputs to that function. +@huang_optimizing_2019: Not met. The baseline model in the script did not match the baseline model (or any scenario) in the paper, so had to modify parameters. -@lim_staff_2020: Fully met. The model is created from a series of functions and run with a for loop that iterates through different parameters. As such, the model is able to be run programmatically (within that for loop, which varied e.g. staff per shift and so on and re-ran the model). +@lim_staff_2020: Partially met. The included parameters were corrected, but the baseline scenario included varying staff strength to 2, and the provided code only varied 4 and 6. I had to add some code that enabled it to run with staff strength 2 (as there were an error that occured if you tried to set that). -@kim_modelling_2021: Fully met. Each scenario is an R script which states different parameters and then calls functions to run model. +@kim_modelling_2021: Fully met. -@anagnostou_facs-charm_2022: Fully met. Change inputs in input `.csv` files. +@anagnostou_facs-charm_2022: Fully met. -@johnson_cost_2021: Fully met. Creates a list of `input` which are then used by a `run()` function. +@johnson_cost_2021: Fully met. Base case parameters all correct. -@hernandez_optimal_2015: Fully met. Model created from classes, which accept some inputs and can run the model. +@hernandez_optimal_2015: Not met. As agreed with the author, this is likely the primary reason for the discrepancy in these results - they are very close, and we see similar patterns, but not reproduced. Unfortunately, several parameters were wrong, and although we changed those we spotted, we anticipate there could be others we hadn't spotted that might explain the remaining discrepancies. -@wood_value_2021: Fully met. Changes inputs to run all scenarios from a single `.R` file. +@wood_value_2021: Fully met. **Reflections:** -* Design model so that you can re-run it with different parameters without needing to make changes to the model code itself. - * This allows you to run multiple versions of the model with the same script. - * It also reduces the likelihood of missing errors (e.g. if miss changing an input parameter somewhere, or input the wrong parameters and don't realise). -* This was an uncommon issue. -* Note, this just refers to the basic set-up, with items below like hard coding parameters also being very important in this context. +* At least provide a script that can run the baseline model as in the paper (even if not providing the scenarios) +* This can introduce difficulties - when some parameters are wrong, you rely on the paper to check which parameters are correct or not, but if the paper doesn't mention every single parameter (which is reasonably likely, as this includes those not varied by scenarios), then you aren't able to be sure that the model you are running is correct. +* This can make a really big difference, and be likely cause of managing to reproduce everything v.s. nothing, if it impacts all aspects of the results. +* Tom: I think this comes back to minimum verification as well. I think the "at least for one scenario" idea of yours is excellent. 
:::

-::: {.callout-warning icon=false collapse=true}
+::: {.callout-note icon=false collapse=true}

-## Don't hard code parameters that you will want to change for scenario analyses
+## Control randomness

| @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
-| ❌ | 🟡 | ❌ | ✅ | N/A | ✅ | 🟡 | ✅ |
+| ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |

-@shoaib_simulation_2021: Not met. Although some parameters sat "outside" of the model within the `main()` function (and hence were more "changeable", even if not "changeable" inputs to that function, but changed directly in script). However, many of the other parameters were hard-coded within the model itself. It took time to spot where these were and correctly adjust them to be modifiable inputs.
+@shoaib_simulation_2021: Not met. The lack of seeds wasn't actually a barrier to the reproduction, though, due to the replication number. I later added seeds so my results could be reproduced, and found that the ease of setting seeds with salabim was a great facilitator to the work. I only had to change one or two lines of code to then get consistent results between runs (unlike other simulation software like SimPy, where you have to consider the use of seeds by different sampling functions). Moreover, by default, salabim would have set a seed (although overridden by the original authors to enable them to run replications).

-@huang_optimizing_2019: Partially met. Pretty much all of the parameters that we wanted to change were not hard coded and were instead inputs to the model function `simulate_nav()`. However, I did need to add an `exclusive_use` scenario which conditionally changed `ir_resources`, but that is the only exception. I also add `ed_triage` as a changeable input but didn't end up needing that to reproduce any results (was just part of troubleshooting). I also
+@huang_optimizing_2019: Not met. It would have been beneficial to include seeds, as there was a fair amount of variability; with seeds, I could be sure that my results do not differ from the original simply due to randomness.

-@lim_staff_2020: Not met. Some parameters were not hard coded within the model, but lots of them were not.
+@lim_staff_2020: Not met. The results obtained looked very similar to the original article, with minimal differences that I felt to be within the expected variation from the model stochasticity. However, if seeds had been present, we would have been able to say with certainty. I did not feel I needed to add seeds during the reproduction to get the same results.

-@kim_modelling_2021: Fully met. All model parameters could be varied from "outside" the model code itself, as they were provided as changeable inputs to the model.
+@kim_modelling_2021: Fully met. Included a seed, although I don't get identical results as I had to reduce the number of people in the simulation.

-@anagnostou_facs-charm_2022: N/A as no scenarios.
+@anagnostou_facs-charm_2022: Fully met. The authors included a random seed so the results I got were identical to the original (so no need for any subjectivity in deciding whether it's similar enough, as I could perfectly reproduce).

-@johnson_cost_2021: Fully met. All model parameters could be varied from "outside" the model code itself, as they were provided as changeable inputs to the model.
+@johnson_cost_2021: Fully met. At the start of the script, the authors call `set.seed(333)`.
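+
+For the Python-based studies, a similar effect can be achieved with very little code. A minimal sketch (assuming a model function that takes a generator as an input - not the approach of any particular study):
+
+```python
+# Minimal sketch: one base seed, with a separate stream per replication, gives
+# repeatable results while keeping the replications independent.
+import numpy as np
+
+def run_replications(model_func, n_reps=10, base_seed=42):
+    """Run model_func once per replication, each with its own seeded generator."""
+    results = []
+    for rep in range(n_reps):
+        rng = np.random.default_rng([base_seed, rep])
+        results.append(model_func(rng))
+    return results
+```
+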
-@hernandez_optimal_2015: Partially met. Did not hard code runs, population, generations, and percent pre-screened. However, did hard code other parameters like bi-objective v.s tri-objective model and bounding. Also, it was pretty tricky to change percent pre-screened, as it assumed you provided a `.txt` file for each %. +@hernandez_optimal_2015: Fully met. This ensured consistent results between runs of the script, which was really helpful. -@wood_value_2021: Fully met. All model parameters for the scenarios/sensitivity analysis could be varied from "outside" the model code itself. +@wood_value_2021: Fully met. Sets seed based on replication number. **Reflections:** -* It can be quite difficult to change parameters that are hard coded into the model. Ideally, all the parameters that a user might want to change should be easily changeable and not hard coded. -* This is a relatively common issue. -* There is overlap between this and whether the code for scenarios is provided (as typically, the code for scenario is conditionally changing parameter values, although this can be facilitated by not hard coding the parameters, so you call need to change the values from "outside" the model code, rather than making changes to the model functions themselves). Hence, have included as two seperate reflections. -* Important to note that we evaluate this in the context of reproduction - and have not checked for hard-coded parameters outside the specified scenario analyses, but that someone may wish to alter if reusing the model for a different analysis/context/purpose. +* Depending on your model and the outputs/type of output you are looking at, the lack of seeds can have varying impacts on the appearance of your results, and can make the subjective judgement of whether results are consistent harder (if discrepancies could be attributed to not having consistent seeds or not). +* It can be really quite simple to include seeds. +* Over half of the studies did include seed control in their code. +* Tom: There seems little argument against doing this. worth noting that commerical software does this for you and possibly explains why authors didn't do this themselves if that was their background (lack of knowledge?). +* Tom: Note simpy is independent of any sampling mechanism. We could just use python's random module and set a single seed if needed (although you lose CRN) and we can setup our models so that we only need to set a single seed. +* Tom: A key part of STARS 2.0 for reproducibility ::: -::: {.callout-warning icon=false collapse=true} +### Outputs -## Use relative file paths +::: {.callout-tip icon=false collapse=true} + +## Include code to calculate all required model outputs | @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | -| ✅ | N/A | N/A | ✅ | ✅ | N/A | ✅ | ✅ | +| ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ | ✅ | -@shoaib_simulation_2021: Fully met. Just provides file path, so file is saved into current/run directory. +@shoaib_simulation_2021: Not met. Had to add some outputs and calculations (e.g. proportion of childbirth cases referred, standard deviation) -@huang_optimizing_2019: Not applicable. All inputs defined within script. Outputs were not saved to file/s. +@huang_optimizing_2019: Not met. 
It has a complicated output (standardised density of patient in queue) that I was never certain whether I had correctly calculated. Although it outputs the columns required to calculate it, due to its complexity, I feel this was not met, as it feels like a whole new output in its own right (and not just something simple like a mean).

-@lim_staff_2020: Not applicable. All inputs defined within script. Outputs were not saved to file/s.
+@lim_staff_2020: Not met. The model script provided was only set up to provide results from days 7, 14 and 21. The figures require daily results, so I needed to modify the code to output that.

-@kim_modelling_2021: Fully met. Uses relative file paths for sourcing model and input parameters (gets current directory, then navigates from there).
+@kim_modelling_2021: Not met. Had to write code to find aorta sizes of people with AAA-related deaths.

-@anagnostou_facs-charm_2022: Fully met. Uses relative imports of local code files.
+@anagnostou_facs-charm_2022: Fully met. Although worth noting this only had one scenario/version of the model and one output to reproduce.

-@johnson_cost_2021: Not applicable. All inputs defined within script. Outputs are not specifically saved to a file (just that the .md and image files were automatically saved when the .Rmd file was knit). EpicR is package import.
+@johnson_cost_2021: Not met. It has an output that is in "per 1000" and, although it outputs the columns required to calculate this, I found it very tricky to work out which columns to use and how to transform them to get this output, and so feel this is not met (as it feels like a separate output, not something simple like a mean, and as it felt so tricky to work out).

-@hernandez_optimal_2015: Fully met. Creates folder in current working directory based on date/time to store results.
+@hernandez_optimal_2015: Fully met.

-@wood_value_2021: Fully met. Although I then changed things a bit as reorganised repository and prefer not to work with `setwd()`, these were set up in such a way that it would be really easy to correct file path, just by setting working directory at start of script.
+@wood_value_2021: Fully met.

**Reflections:**

-* This was not an issue for any studies - but included to note this was a "facilitator", as would have needed to amend if they weren't (and Tom noted that this is a common problem that he runs into elsewhere).
+* Calculate and provide all the outputs required.
+* Appreciate this can be a bit "ambiguous" (e.g. if it's just plotting a mean or simple calculation, then didn't consider that here) (however, combined with other criteria, we do want them to provide code to calculate outputs, so we would want them to provide that anyway).
+* Tom: This is a headline. I suspect we can find supporting citations elsewhere from other fields. It's a reporting guideline thing too, but in natural language things can get very ambiguous still! Would be good to make that point as well I think.

:::

::: {.callout-warning icon=false collapse=true}

-## Avoid large amounts of code duplication
+## Include code to generate the tables, figures and other reported results
+
+### Summary

| @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
-| ❌ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ |
+| ❌ | ❌ | 🟡 | ❌ | ❌ | 🟡 | 🟡 | ✅ |

-@shoaib_simulation_2021: Not met. 
The model often contained very similar blocks of code before or after warm-up. +### Provide code to process results into tables -@huang_optimizing_2019: Fully met. +| @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | +| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | +| ❌ | N/A | 🟡 | ❌ | N/A | ❌ | ❌ | ✅ | -@lim_staff_2020: Fully met. +@shoaib_simulation_2021: Not met. -@kim_modelling_2021: Not met. There was alot of duplication when running each scenario (e.g. repeated calls to `Eventsandcosts`, and repeatedly defining the same parameters). This meant, if changing a parameter that you want to be consistent between all the scripts (e.g. number of persons), you had to change each of the scripts one by one. +@huang_optimizing_2019: Not applicable. No tables in scope. -@anagnostou_facs-charm_2022: Fully met. +@lim_staff_2020: Partially met. It outputs the results in a similar structure to the paper (like a section of a table). However, it doesn't have the full code to produce a table outright, for any of the tables, so additional processing still required. -@johnson_cost_2021: Not met. There was alot of duplication when running each scenario. This meant, when amending these for the sensitivity analysis, I would need to change the same parameter 12 times within the script, and for changes to all, changing it 12 times in 14 duplicate scripts. Hence, it was simpler to write an R script to do this than change it directly, but for base case, I had to make sure I carefully changed everything in both files. +@kim_modelling_2021: Not met. Had to write code to generate tables, which included correctly implementing calculation of excess e.g. deaths, scaling to population size, and identify which columns provide the operation outcomes. -@hernandez_optimal_2015: Fully met. +@anagnostou_facs-charm_2022: Not applicable. No tables in scope. -@wood_value_2021: Fully met. +@johnson_cost_2021: Not met. Had to write code to generate tables, which took me a while as I got confused over thinks like which tables / columns / scenarios to use. -**Reflections:** Large amounts of code duplication are non-ideal as they can: +@hernandez_optimal_2015: Not met. -* Make code less readable -* Make it trickier to change universal parameters -* Increase the likelihood of introducing mistakes -* Make it trickier to set up scenarios/sensitivity analyses +@wood_value_2021: Fully met. -::: +**Reflections:** -::: {.callout-warning icon=false collapse=true} +* It can take a bit of time to do this processing, and it can be tricky/confusing to do correctly, so very handy for it to be provided. +* Common issue. -## Include sufficient comments in the code +### Provide code to process results into figures | @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | -| ❌ | ❌ | ✅ | 🟡 | 🟡 | ❌ | 🟡 | ❌ | - -@shoaib_simulation_2021 and @huang_optimizing_2019: Not met. Would have benefitted from more comments, as it took some time to ensure I have correctly understood code, particularly if they used lots of abbreviations. +| ❌ | ❌ | ❌ | ❌ | ❌ | 🟡 | 🟡 | ✅ | -@lim_staff_2020: Fully met. There were lots of comments in the code (including doc-string-style comments at the start of functions) that aided understanding of how it worked. 
+@shoaib_simulation_2021: Not met. -@kim_modelling_2021: Partially met. Didn't have any particular issues in working out the code. There are sufficient comments in the scenario scripts and at the start of the model scripts, although within the model scripts, there were sometimes quite dense sections of code that would likely benefit from some additional comments. +@huang_optimizing_2019: Not met. Had to write code from scratch. For one of the figures, it would have been handy if informed that plot was produced by a simmer function (as didn’t initially realise this). It also took a bit of time for me to work out how to transform the figure axes as this was not mentioned in the paper (and no code was provided for these). It was also unclear and a bit tricky to work out how to standardise the density in the figures (since it is only described in the text and no formula/calculations are provided there or in the code). -@anagnostou_facs-charm_2022: Partially met. Didn't have to delve into the code much, so can't speak from experience as to whether the comments were sufficient. From looking through the model code, several scripts have lots of comments and docstrings for each function, but some do not. +@lim_staff_2020, @kim_modelling_2021 and @anagnostou_facs-charm_2022: Not met. However, the simplicity and repetition of the figures was handy. -@johnson_cost_2021: Not met. Very few comments in the `Case_Detection_Results...Rmd` files, which were the code files provided. +@johnson_cost_2021: Partially met. For Figure 3, most of the required code for the figure was provided, which was super helpful. However, this wasn't complete, and for all others figures, I had to start from scratch writing the code. -@hernandez_optimal_2015: Partially met. There are some comments and doc-strings, but not comprehensively. +@hernandez_optimal_2015: Partially met. Provides a few example `ggplot`s, but these are not all the plots, nor exactly matching article, nor including any of the pre-processing required before the plots, and so could only serve as a starting point (though that was still really handy). -@wood_value_2021: Not met. Very few comments, so for the small bit of the code that I did delve into, took a bit of working out what different variables referred to. +@wood_value_2021: Fully met. Figures match article, with one minor exception that I had to add smoothing to the lines on one of the figures. **Reflections:** -* With increasing code complexity, the inclusion of sufficient comments becomes increasingly important, as it can otherwise be quite time consuming to figure out how to fix and change sections of code -* Define abbreviations used within the code -* Good to have consistent comments and docstrings throughout (i.e. on all scripts, on not just some of them) -* Common issue -* Tom: I guess this one isn't strictly necessary for reproducibility. The main issue was that the studies required a fair bit of manual work to get them to reproduce the results from teh mixed issues you listed above. This is sort of a "failsafe option" for reproducibility or perhaps more relevant for reuse/adaptation. - -::: - -## Run time and memory usage - -::: {.callout-important icon=false collapse=true} +* It can take a bit of time to do this processing, particularly if the figure involves any transformations (and less so if the figure is simple), so very handy for it to be provided. +* Also, handy if the full code can be provided for all figures (although partial code is more helpful than none at all). +* Common issue. 
-## Quicker models are easier to work with +### Provide code to calculate in-text results -I have not evaluated like as a criteria, as a long run time is not inherently a bad thing. However, I definitely found that the run time of models had a big impact on how easy it was to reproduce results as longer run times meant it was tricky (or even impossible) to run in the first place, or tricky to re-run. +By "in-text results", I am referred to results that are mentioned in the text but not included in/cannot be deduced from any of the tables or figures. -The studies where I made adjustments were: +| @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | +| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | +| ❌ | ❌ | N/A | ❌ | N/A | N/A | N/A | N/A | -* @shoaib_simulation_2021: Add parallel processing and ran fewer replications -* @huang_optimizing_2019: No changes made. -* @lim_staff_2020: Add parallel processing -* @kim_modelling_2021: Reduced number of people in simulation, and switched from serial to the provided parallel option. -* @anagnostou_facs-charm_2022: Model was super quick which made it really easy to run and re-run each time -* @johnson_cost_2021: Experimented with using a fewer number of agents for troubleshooting (although ultimately had to run with full number to reproduce results), and ran the scripts in parallel by opening seperate terminals simultaneously. Note: Long run time also meant it took a longer time to do this reproduction - although we excluded computation time in our timings, it just meant e.g. when I made a mistake in coding of scenario analysis and had to re-run, I had to wait another day or two for that to finish before I could resume. -* @hernandez_optimal_2015: Add parallel processing, did not run one of the scenarios (it was very long, and hadn't managed to reproduce other parts of same figure regardless), and experimented with reducing parameters for evolutionary algorithm (but, in the end, ran with full parameters, though lower were helpful while working through and troubleshooting). -* @wood_value_2021: No changes made, but unlike other reproduction, didn't try to run at smaller amounts - just set it to run as-is over the weekend. +@shoaib_simulation_2021, @huang_optimizing_2019, @kim_modelling_2021: Not met. -In one of the studies, there was a minor error which needed fixing, which we anticipated to likely be present due to long run times meaning the model wasn't all run in sequence at the end. +@lim_staff_2020, @anagnostou_facs-charm_2022, @johnson_cost_2021, @hernandez_optimal_2015, @wood_value_2021: Not applicable (no in-text results). **Reflections:** -* Reduce model run time if possible as it makes it easier to work with, and facilitates doing full re-runs of all scenarios (which can be important with code changes, etc). - * Relatedly, it is good practice to re-run all scripts before finishing up, as then you can spot any errors like the one mentioned for @kim_modelling_2021 -* Common issue (to varying degrees - i.e. taking 20 minutes, up to taking several hours or even day/s). -* Tom: Long run times are inevitable for some models, but this does suggest that some extra work to build confidence the model is working is expected is beneficial, like one or a small set of verification scenarios that are quick to run. 
+* Provide code to calculate in-text results +* Universal issue, for those with in-text results not otherwise captured in tables and figures ::: -::: {.callout-important icon=false collapse=true} +## Recommendations to support troubleshooting and reuse + +### Design + +::: {.callout-warning icon=false collapse=true} -## For slow models, state the expected run time +## Separate model code from applications | @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | -| ✅ | ❌ | ❌ | ❌ | N/A | ✅ | 🟡 | 🟡 | +| ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | -@shoaib_simulation_2021: Fully met. Run time stated in paper (but not repository). +@shoaib_simulation_2021: Fully met. Provided as a single `.py` file which ran model with function `main()`. -@huang_optimizing_2019: Not met. +@huang_optimizing_2019: Not met. The model code was provided within the code for a web application, but the paper was not focused on this application, and instead on specific model scenarios. I had to extract the model code and transform it into a format that was "runnable" as an R script/notebook. -@lim_staff_2020: Not met. +@lim_staff_2020: Fully met. Provided as a single `.py` file which ran the model with a for loop. -@kim_modelling_2021: Not met. A prior paper describing the model development mentions the run time, but not the current paper or repository, so this is easily missed. +@kim_modelling_2021: Fully met. Has seperate `.R` scripts for each scenario which ran the model by calling functions from elsewhere in repository. -@anagnostou_facs-charm_2022: Not applicable. Very quick! Seconds! So not particularly relevant - although, you could argue, potentially still important if there were some error that made it look like the model were running continuously (e.g. stuck in a loop) - although this is relatively unlikely. +@anagnostou_facs-charm_2022: Fully met. Can run model from command line. -@johnson_cost_2021: Fully met. In the README, they state the the run time with 100 million agents is 16 hours, which was very handy to know, as I then just got stuck in running with fewer agents while troubleshooting. +@johnson_cost_2021: Fully met. Model provided as a package (which is an R interface for the C++ model). -@hernandez_optimal_2015: Partially met. Some of the run times are mentioned in the paper, but not all, although this did help indicate that we would anticipate other s scenarios to similarly take hours to run. +@hernandez_optimal_2015: Fully met. The model (python code) can be run from `main.py`. -@wood_value_2021: Partially met. In the paper, they state that it takes less than five minutes for each scenario, but this feels like half the picture, given the total run time was 48 hours. +@wood_value_2021: Fully met. Provided as a single `.R` file which ran the model with a for loop. **Reflections:** -* For long models with no statement, it can take a while to realise that it's not an error in the code or anything, but actually just a long run time! And hard to know how long to expect, and whether it is without the capacities of your machine and so on. -* Ideally include statement of run time in repository as well as paper. -* Ideally include run time of all components of analysis (e.g. all scenarios). -* Common issue. 
-* Tom: This supports the inclusion of section 5.4 in the STRESS-DES guidelines - * Response: But think it is also important that this is in the repository itself, and not just the paper. +* If you are presenting the results of a model, then provide the code for that model in a "runnable" format. +* This was an **uncommon** issue. ::: -::: {.callout-important icon=false collapse=true} +::: {.callout-warning icon=false collapse=true} -## For computationally expensive models, state memory usage and provide alternatives for lower spec machines +## Avoid hard-coded parameters -As I felt this was relatively subjective, as depending on what I felt to be "computationally expensive", as I didn't record the memory usage of all models, it felt unfair to do this as a checklist, and so have just informally noted below: +Don't hard code parameters that you will want to change for scenario analyses -* @shoaib_simulation_2021, @huang_optimizing_2019, and @lim_staff_2020: Not applicable. Didn't find it to be too computationally expensive for my machine. -* @kim_modelling_2021: Unable to run on my machine (`serial` took too long to run (would have to leave laptop on for many many hours which isn't feasible), and `parallel` was too computationally expensive and crashed the machine (with the original number of people)). This is not mentioned in the repository or paper, but only referred to in a prior publication. Would've been handy if it included suggestions like reducing number of people and so on (which is what I had to do to feasibly run it). -* @anagnostou_facs-charm_2022: Not applicable. Runs in seconds. -* @johnson_cost_2021: It becomes more computationally expensive if try to run lots at once in simultaneous terminals. Didn't try running one on local machine with full parameter due to long run time making it infeasible, but knowing my system specs, it should have been able to if did. -* @hernandez_optimal_2015: This had long run times but I don't know if it was computationally expensive or not - I just know that I didn't run into any issues (but I didn't record memory usage, so its possible a lower-specced machine might). -* @wood_value_2021: Not applicable. As stated in their prior paper, the model is constrained by processing time, not computer memory. +| @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | +| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | +| ❌ | 🟡 | ❌ | ✅ | N/A | ✅ | 🟡 | ✅ | + +@shoaib_simulation_2021: Not met. Although some parameters sat "outside" of the model within the `main()` function (and hence were more "changeable", even if not "changeable" inputs to that function, but changed directly in script). However, many of the other parameters were hard-coded within the model itself. It took time to spot where these were and correctly adjust them to be modifiable inputs. + +@huang_optimizing_2019: Partially met. Pretty much all of the parameters that we wanted to change were not hard coded and were instead inputs to the model function `simulate_nav()`. However, I did need to add an `exclusive_use` scenario which conditionally changed `ir_resources`, but that is the only exception. I also add `ed_triage` as a changeable input but didn't end up needing that to reproduce any results (was just part of troubleshooting). I also + +@lim_staff_2020: Not met. Some parameters were not hard coded within the model, but lots of them were not. 
+ +@kim_modelling_2021: Fully met. All model parameters could be varied from "outside" the model code itself, as they were provided as changeable inputs to the model. + +@anagnostou_facs-charm_2022: N/A as no scenarios. + +@johnson_cost_2021: Fully met. All model parameters could be varied from "outside" the model code itself, as they were provided as changeable inputs to the model. + +@hernandez_optimal_2015: Partially met. Did not hard code runs, population, generations, and percent pre-screened. However, did hard code other parameters like bi-objective v.s tri-objective model and bounding. Also, it was pretty tricky to change percent pre-screened, as it assumed you provided a `.txt` file for each %. + +@wood_value_2021: Fully met. All model parameters for the scenarios/sensitivity analysis could be varied from "outside" the model code itself. **Reflections:** -* Some models are so computationally expensive that it may be simply impossible to run it a feasible length of time without a high powered machine. -* Handy to mention memory requirements so someone with lower spec machine can ensure they would be able to run it. -* If a model is computationally expensive, it would be good to provide suggested alternatives that allow it to be run on lower spec machines -* Not a common problem - only relevant to computationally expensive models -* Tom: Agree it makes sense to report this, and is captured in reporting guidelines like STRESS-DES. +* It can be quite difficult to change parameters that are hard coded into the model. Ideally, all the parameters that a user might want to change should be easily changeable and not hard coded. +* This is a relatively common issue. +* There is overlap between this and whether the code for scenarios is provided (as typically, the code for scenario is conditionally changing parameter values, although this can be facilitated by not hard coding the parameters, so you call need to change the values from "outside" the model code, rather than making changes to the model functions themselves). Hence, have included as two seperate reflections. +* Important to note that we evaluate this in the context of reproduction - and have not checked for hard-coded parameters outside the specified scenario analyses, but that someone may wish to alter if reusing the model for a different analysis/context/purpose. ::: -## Parameters, scenarios and outputs - -::: {.callout-tip icon=false collapse=true} +::: {.callout-warning icon=false collapse=true} -## Provide code for all scenarios +## Minimise code duplication | @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | -| ❌ | ❌ | ❌ | ❌ | N/A | 🟡 | ❌ | ✅ | +| ❌ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | -@shoaib_simulation_2021: Not met. There were several instances where it took quite a while to understand how and where to modify the code in order to run scenarios (e.g. no arrivals, transferring admin work, reducing doctor intervention in deliveries). +@shoaib_simulation_2021: Not met. The model often contained very similar blocks of code before or after warm-up. -@huang_optimizing_2019: Not met. Set up a notebook to programmatically run the model scenarios. 
It took alot of work to modify and write code that could run the scenarios, and I often made mistakes in my interpretation for the implementation of scenarios, which could be avoided if code for those scenarios was provided. +@huang_optimizing_2019: Fully met. -@lim_staff_2020: Not met. Several parameters or scenarios were not incorporated in the code, and had to be added (e.g. with conditional logic to skip or change code run, removing hard-coding, adding parameters to existing). +@lim_staff_2020: Fully met. -@kim_modelling_2021: Not met. Took alot of work to change model from for loop to function, to set all parameters as inputs (some were hard coded), and add conditional logic of scenarios when required. +@kim_modelling_2021: Not met. There was alot of duplication when running each scenario (e.g. repeated calls to `Eventsandcosts`, and repeatedly defining the same parameters). This meant, if changing a parameter that you want to be consistent between all the scripts (e.g. number of persons), you had to change each of the scripts one by one. -@anagnostou_facs-charm_2022: Not applicable. No scenarios. +@anagnostou_facs-charm_2022: Fully met. -@johnson_cost_2021: Partially met. Has all base case scenarios, but not sensitivity analysis. +@johnson_cost_2021: Not met. There was alot of duplication when running each scenario. This meant, when amending these for the sensitivity analysis, I would need to change the same parameter 12 times within the script, and for changes to all, changing it 12 times in 14 duplicate scripts. Hence, it was simpler to write an R script to do this than change it directly, but for base case, I had to make sure I carefully changed everything in both files. -@hernandez_optimal_2015: Not met. Took a while to figure out how to implement scenarios. +@hernandez_optimal_2015: Fully met. @wood_value_2021: Fully met. -**Reflections:** +**Reflections:** Large amounts of code duplication are non-ideal as they can: -* Common issue -* Time consuming and tricky to resolve -* Tom: This is a headline. Also, links to importance of reproducible analytical pipelines (RAP) for simulation. +* Make code less readable +* Make it trickier to change universal parameters +* Increase the likelihood of introducing mistakes +* Make it trickier to set up scenarios/sensitivity analyses ::: -::: {.callout-tip icon=false collapse=true} +### Clarity -## All the required outputs are calculated/provided +::: {.callout-warning icon=false collapse=true} + +## Comment sufficiently | @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | -| ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ | ✅ | - -@shoaib_simulation_2021: Not met. Had to add some outputs and calculations (e.g. proportion of childbirth cases referred, standard deviation) +| ❌ | ❌ | ✅ | 🟡 | 🟡 | ❌ | 🟡 | ❌ | -@huang_optimizing_2019: Not met. It has a complicated output (standardised density of patient in queue) that I was never certain on whether I correctly calculated. Although it outputs the columns required to calculate it, due its complexity, I feel this was not met, as it feels like a whole new output in its own right (and not just something simple like a mean). +@shoaib_simulation_2021 and @huang_optimizing_2019: Not met. Would have benefitted from more comments, as it took some time to ensure I have correctly understood code, particularly if they used lots of abbreviations. 
-@lim_staff_2020: Not met. The model script provided was only set up to provide results from days 7, 14 and 21. The figures require daily results, so I needed to modify the code to output that. +@lim_staff_2020: Fully met. There were lots of comments in the code (including doc-string-style comments at the start of functions) that aided understanding of how it worked. -@kim_modelling_2021: Not met. Had to write code to find aorta sizes of people with AAA-related deaths. +@kim_modelling_2021: Partially met. Didn't have any particular issues in working out the code. There are sufficient comments in the scenario scripts and at the start of the model scripts, although within the model scripts, there were sometimes quite dense sections of code that would likely benefit from some additional comments. -@anagnostou_facs-charm_2022: Fully met. Although worth noting this only had one scenario/version of model and one output to reproduce. +@anagnostou_facs-charm_2022: Partially met. Didn't have to delve into the code much, so can't speak from experience as to whether the comments were sufficient. From looking through the model code, several scripts have lots of comments and docstrings for each function, but some do not. -@johnson_cost_2021: Note met. It has an output that is in "per 1000" and, although it outputs the columns required to calculate this, I found it very tricky to work out which columns to use and how to transform them to get this output, and so feel this is not met (as feels like a seperate output, and not something simple like a mean, and as it felt so tricky to work out). +@johnson_cost_2021: Not met. Very few comments in the `Case_Detection_Results...Rmd` files, which were the code files provided. -@hernandez_optimal_2015: Fully met. +@hernandez_optimal_2015: Partially met. There are some comments and doc-strings, but not comprehensively. -@wood_value_2021: Fully met. +@wood_value_2021: Not met. Very few comments, so for the small bit of the code that I did delve into, took a bit of working out what different variables referred to. **Reflections:** -* Calculate and provide all the outputs required -* Appreicate this can be a bit "ambiguous" (e.g. if its just plotting a mean or simple calculation, then didn't consider that here) (however, combined with other criteria, we do want them to provide code to calculate outputs, so we would want them to provide that anyway) -* Tom: This is a headline. I suspect we can find supporting citations elsewhere from other fields. Its a reporting guideline thing too, but in natural language things can get very ambiguous still! Would be good to make that point as well I think. +* With increasing code complexity, the inclusion of sufficient comments becomes increasingly important, as it can otherwise be quite time consuming to figure out how to fix and change sections of code +* Define abbreviations used within the code +* Good to have consistent comments and docstrings throughout (i.e. on all scripts, on not just some of them) +* Common issue +* Tom: I guess this one isn't strictly necessary for reproducibility. The main issue was that the studies required a fair bit of manual work to get them to reproduce the results from teh mixed issues you listed above. This is sort of a "failsafe option" for reproducibility or perhaps more relevant for reuse/adaptation. 
::: -::: {.callout-tip icon=false collapse=true} +::: {.callout-caution icon=false collapse=true} -## Include correct parameters in the script (even if just for one scenario) +## Ensure clarity and consistency in the model results tables | @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | -| 🟡 | ❌ | 🟡 | ✅ | ✅ | ✅ | ❌ | ✅ | - -@shoaib_simulation_2021: Partially met. Script is set with parameters for base configuration 1, with the exception of number of replications. - -@huang_optimizing_2019: Not met. The baseline model in the script did not match the baseline model (or any scenario) in the paper, so had to modify parameters. +| ❌ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | -@lim_staff_2020: Partially met. The included parameters were corrected, but the baseline scenario included varying staff strength to 2, and the provided code only varied 4 and 6. I had to add some code that enabled it to run with staff strength 2 (as there were an error that occured if you tried to set that). +@shoaib_simulation_2021: Not met. There were two alternative results spreadsheets with some duplicate metrics but sometimes differing results between them, which made it a bit confusing to work out what to use. -@kim_modelling_2021: Fully met. +@huang_optimizing_2019, @lim_staff_2020, and @anagnostou_facs-charm_2022: Fully met. Didn't experience issues interpreting the contents of the output table/s. -@anagnostou_facs-charm_2022: Fully met. +@kim_modelling_2021: Not met. It took me a little while to work out what surgery columns I needed, and to realise I needed to combine two of them. This required looking at what inputs genreated this, and referring to a input data dictionary. -@johnson_cost_2021: Fully met. Base case parameters all correct. +@johnson_cost_2021: Not met. I had mistakes and confusion figuring out which results tables I needed, which columns to use, and which scenarios to use from the tables. -@hernandez_optimal_2015: Not met. As agreed with the author, this is likely the primary reason for the discrepancy in these results - they are very close, and we see similar patterns, but not reproduced. Unfortunately, several parameters were wrong, and although we changed those we spotted, we anticipate there could be others we hadn't spotted that might explain the remaining discrepancies. +@hernandez_optimal_2015: Fully met. Straightforward with key information provided. -@wood_value_2021: Fully met. +@wood_value_2021: Fully met. I didn't need to work with the output tables, but from looking at them now, they make sense. **Reflections:** -* At least provide a script that can run the baseline model as in the paper (even if not providing the scenarios) -* This can introduce difficulties - when some parameters are wrong, you rely on the paper to check which parameters are correct or not, but if the paper doesn't mention every single parameter (which is reasonably likely, as this includes those not varied by scenarios), then you aren't able to be sure that the model you are running is correct. -* This can make a really big difference, and be likely cause of managing to reproduce everything v.s. nothing, if it impacts all aspects of the results. -* Tom: I think this comes back to minimum verification as well. I think the "at least for one scenario" idea of yours is excellent. 
+* Don't provide alternative results for the same metrics +* Make it clear what each colum/category in the results table means, if it might not be immediately clear. +* Make differences between seperate results tables clear. +* Tom: In a RAP for simulation world we have a env + model + script that gets you to the exact results table you see in the paper and this isn't a problem (although more time consuiming to setup). ::: -::: {.callout-tip icon=false collapse=true} +::: {.callout-important icon=false collapse=true} -## Provide all the required parameters +## Include run instructions | @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | -| ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | +| ❌ | ❌ | ❌ | 🟡 | ✅ | ✅ | ❌ | ❌ | -@shoaib_simulation_2021: Some parameters that could not be calculated were not provided - ie. what consultation boundaries to use when mean length of doctor consultation was 2.5 minutes +@shoaib_simulation_2021: Not met. No instructions, although is just a single script that you run. -@huang_optimizing_2019: Not met. In this case, patient arrivals and resource numbers were listed in the paper, and there were several discprepancies between this and the provided code. However, for many of the model parameters like length of appointment, these were not mentioned in the paper, and so it was not possible to confirm whether or not those were correct. Hence, marked as not met, as the presence of discrepenancies for several other parameters puts these into doubt. +@huang_optimizing_2019: Not met. Not provided in runnable form but, regardless, no instructions for running it as it is provided (as a web application - i.e. no info on how to get that running). -@lim_staff_2020: Not met. For Figure 5, had to guess the value for `staff_per_shift`. +@lim_staff_2020: Not met. No instructions, although is just a single script that you run. -@kim_modelling_2021: Fully met. +@kim_modelling_2021: Partially met. README tells you which folder has the scripts you need, although nothing further. Although all you need to do is run them. -@anagnostou_facs-charm_2022: Fully met. +@anagnostou_facs-charm_2022: Fully met. Clear README with instructions on how to run the model was really helpful. -@johnson_cost_2021: Fully met. Could determine appropriate parameters for sensitivity analysis from figures in article. +@johnson_cost_2021: Fully met. README has mini description of model and clear instructions on how to install and run the model. -@hernandez_optimal_2015: Not met. The results have a large impact by the bounding set, but this was not mentioned in the paper or repository, and required me looking at the numbers in results and GitHub commit history to estimate the appropriate bounds to use. +@hernandez_optimal_2015: Not met. -@wood_value_2021: Fully met. +@wood_value_2021: Not met. No instructions, although it was fairly self explanatory (single script `master.R` to run, then processing scripts named after items in article e.g. `fig7.R`). **Reflections:** -* Provide all required parameters -* Tom: Evidence to support STRESS-DES 3.3 +* Even if as simple as running a script, include instructions on how to do so +* In simpler projects (e.g. single script), this can be less of a problem. +* Common issue +* Tom: Evidence for STARS essential component of minimum documentation. 
 :::
 
-::: {.callout-tip icon=false collapse=true}
+::: {.callout-important icon=false collapse=true}
 
-## If not provided in the script, then *clearly* present all parameters in the paper
+## State run times and machine specifications
 
 | @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 |
 | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
-| ❌ | ❌ | 🟡 | N/A | N/A | ✅ | 🟡 | N/A |
+| 🟡 | ❌ | 🟡 | ❌ | ❌ | 🟡 | 🟡 | 🟡 |
 
-@shoaib_simulation_2021: Not met. Although there was a scenario table, this did not include all the parameters I would need to change. It was more challenging to identify parameters that were only described in the body of the article. There were also some discrepancies in parameters between the main text of the article, and the tables and figures. Some scenarios were quite ambiguous/unclear from their description in the text, and I initially misunderstood the required parameters for the scenarios.
+### State run time
 
-@huang_optimizing_2019: Not met. As described above, paper didn't adequately describe all parameters.
+| @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 |
+| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
+| ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | 🟡 | 🟡 |
 
-@lim_staff_2020: Partially met. Nearly all parameters are in the paper table, and others are described in the article. However, didn't provide information for the `staff_per_shift` for Figure 5.
+@shoaib_simulation_2021: Fully met. Run time stated in paper (but not repository).
 
-@kim_modelling_2021 and @anagnostou_facs-charm_2022: Not applicable. All provided.
+@huang_optimizing_2019: Not met.
 
-@johnson_cost_2021: Fully met. All parameters clearly in the two figures presenting the sensitivity analysis, and didn't have to look elsewhere beyond that.
+@lim_staff_2020: Not met.
 
-@hernandez_optimal_2015: Most parameters are relatively easily identified from the text or figure legends (though would be easier if provided in a table or similar). Parameter for bounding was not provided in paper.
+@kim_modelling_2021: Not met. A prior paper describing the model development mentions the run time, but not the current paper or repository, so this is easily missed.
 
-@wood_value_2021: Not applicable. All provided.
+@anagnostou_facs-charm_2022: Not met! Although it only took seconds, you could argue that stating this is still important in case some error made it look like the model was running continuously (e.g. stuck in a loop) - and because it helps someone identify whether they are able to run it on their machine.
 
-**Reflections:**
+@johnson_cost_2021: Fully met. In the README, they state that the run time with 100 million agents is 16 hours, which was very handy to know, as I then just got stuck in running with fewer agents while troubleshooting.
 
-* Provide parameters in a table (including for each scenario), as it can be difficult/ambiguous to interpret them from the text, and hard to spot them too.
- * Tom: The ambiguity of the natural language for scenarios was an important finding.
-* Be sure to mention every parameter that gets changed (e.g. for @lim_staff_2020, as there wasn't a default `staff_per_shift` across all scenarios, but not stated for the scenario, had to guess it).
-* Tom: Evidence to support STRESS-DES 3.3
+@hernandez_optimal_2015: Partially met.
Some of the run times are mentioned in the paper, but not all, although this did help indicate that we would anticipate the other scenarios to similarly take hours to run.

@wood_value_2021: Partially met. In the paper, they state that it takes less than five minutes for each scenario, but this feels like half the picture, given the total run time was 48 hours.

**Reflections:**

* For long models with no statement, it can take a while to realise that it's not an error in the code or anything, but actually just a long run time! It is also hard to know how long to expect, and whether it is within the capacity of your machine and so on.
* Ideally include statement of run time in repository as well as paper.
* Ideally include run time of all components of analysis (e.g. all scenarios).
* Common issue.
* Tom: This supports the inclusion of section 5.4 in the STRESS-DES guidelines
 * Response: But think it is also important that this is in the repository itself, and not just the paper.

### State machine specification

| @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |

Regarding the one that did describe it:

* @lim_staff_2020: Described in article ("desktop computer (Intel Core i5 3.5 GHz, 8 GB RAM)")

### Computationally expensive models

Regarding whether models were computationally expensive...

* @shoaib_simulation_2021, @huang_optimizing_2019, @lim_staff_2020, and @anagnostou_facs-charm_2022: No issues
* @kim_modelling_2021: Unable to run on my machine (`serial` took too long to run (would have to leave the laptop on for many, many hours, which isn't feasible), and `parallel` was too computationally expensive and crashed the machine (with the original number of people)). This is not mentioned in the repository or paper, but only referred to in a prior publication. Would've been handy if it included suggestions like reducing the number of people and so on (which is what I had to do to feasibly run it).
* @johnson_cost_2021: It becomes more computationally expensive if you try to run lots at once in simultaneous terminals. Didn't try running one on my local machine with full parameters due to the long run time making it infeasible, but knowing my system specs, it should have been able to if I had.
* @hernandez_optimal_2015: This had long run times but I don't know if it was computationally expensive or not - I just know that I didn't run into any issues (but I didn't record memory usage, so it's possible a lower-specced machine might).
* @wood_value_2021: Not applicable. As stated in their prior paper, the model is constrained by processing time, not computer memory.
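
As a rough illustration of how little is needed to capture this information (a minimal sketch, not taken from any of the studies - `run_all_scenarios` is a hypothetical placeholder), the wall-clock run time and a basic machine specification can be recorded at the end of a full run and copied into the README or paper:

```python
# Minimal sketch: record run time and basic machine specification for reporting.
import os
import platform
import time


def run_all_scenarios() -> None:
    """Placeholder for the full set of model runs."""
    time.sleep(1)


start = time.time()
run_all_scenarios()
elapsed = time.time() - start

print(f"Run time: {elapsed / 60:.1f} minutes")
print(f"Machine: {platform.system()} {platform.release()}, "
      f"{os.cpu_count()} CPU cores, {platform.machine()}")
```
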
**Reflections:**

* Some models are so computationally expensive that it may be simply impossible to run them in a feasible length of time without a high-powered machine.
* Handy to mention memory requirements so someone with a lower-spec machine can check that they would be able to run it.
* If a model is computationally expensive, it would be good to provide suggested alternatives that allow it to be run on lower-spec machines.
* Not a common problem - only relevant to computationally expensive models
* Tom: Agree it makes sense to report this, and is captured in reporting guidelines like STRESS-DES.

:::

### Functionality

::: {.callout-important icon=false collapse=true}

## Optimise model run time

The run time of models had a big impact on how easy it was to reproduce results, as longer run times meant it was tricky (or even impossible) to run in the first place, or tricky to re-run. The studies where I made adjustments were:

* @shoaib_simulation_2021: Added parallel processing and ran fewer replications
* @huang_optimizing_2019: No changes made.
* @lim_staff_2020: Added parallel processing
* @kim_modelling_2021: Reduced number of people in simulation, and switched from serial to the provided parallel option.
* @anagnostou_facs-charm_2022: Model was super quick which made it really easy to run and re-run each time
* @johnson_cost_2021: Experimented with using a smaller number of agents for troubleshooting (although ultimately had to run with the full number to reproduce results), and ran the scripts in parallel by opening separate terminals simultaneously. Note: Long run time also meant it took a longer time to do this reproduction - although we excluded computation time in our timings, it just meant e.g. when I made a mistake in coding of the scenario analysis and had to re-run, I had to wait another day or two for that to finish before I could resume.
* @hernandez_optimal_2015: Added parallel processing, did not run one of the scenarios (it was very long, and we hadn't managed to reproduce other parts of the same figure regardless), and experimented with reducing parameters for the evolutionary algorithm (but, in the end, ran with full parameters, though lower ones were helpful while working through and troubleshooting).
* @wood_value_2021: No changes made, but unlike the other reproductions, didn't try to run at smaller amounts - just set it to run as-is over the weekend.

In one of the studies, there was a minor error which needed fixing, which we anticipated was likely present because the long run times meant the model wasn't all re-run in sequence at the end.

**Reflections:**

* Reduce model run time if possible, as it makes the model easier to work with and facilitates doing full re-runs of all scenarios (which can be important with code changes, etc).
 * Relatedly, it is good practice to re-run all scripts before finishing up, as then you can spot any errors like the one mentioned for @kim_modelling_2021
* Common issue (to varying degrees - i.e. taking 20 minutes, up to taking several hours or even day/s).
+* Tom: Long run times are inevitable for some models, but this does suggest that some extra work to build confidence the model is working is expected is beneficial, like one or a small set of verification scenarios that are quick to run. ::: -## Output format - ::: {.callout-caution icon=false collapse=true} ## Saves output to a file @@ -654,36 +712,21 @@ As I felt this was relatively subjective, as depending on what I felt to be "com ::: {.callout-caution icon=false collapse=true} -## Understandable output tables +## Avoid excessive output files | @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | -| ❌ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | - -@shoaib_simulation_2021: Not met. There were two alternative results spreadsheets with some duplicate metrics but sometimes differing results between them, which made it a bit confusing to work out what to use. - -@huang_optimizing_2019, @lim_staff_2020, and @anagnostou_facs-charm_2022: Fully met. Didn't experience issues interpreting the contents of the output table/s. - -@kim_modelling_2021: Not met. It took me a little while to work out what surgery columns I needed, and to realise I needed to combine two of them. This required looking at what inputs genreated this, and referring to a input data dictionary. - -@johnson_cost_2021: Not met. I had mistakes and confusion figuring out which results tables I needed, which columns to use, and which scenarios to use from the tables. - -@hernandez_optimal_2015: Fully met. Straightforward with key information provided. +| N/A | N/A | N/A | N/A | N/A | N/A | ❌ | N/A | -@wood_value_2021: Fully met. I didn't need to work with the output tables, but from looking at them now, they make sense. +For @hernandez_optimal_2015 the default behaviour of the script was to output lots of files from each round (so you could easily have 90, 100, 200+ files), which were then not used in analysis (as it just depended on an aggregate results file). Although these individual files might be useful during quality control, as a default behaviour of the script, it could easily make the repository quite busy/littered with files. -**Reflections:** - -* Don't provide alternative results for the same metrics -* Make it clear what each colum/category in the results table means, if it might not be immediately clear. -* Make differences between seperate results tables clear. -* Tom: In a RAP for simulation world we have a env + model + script that gets you to the exact results table you see in the paper and this isn't a problem (although more time consuiming to setup). +* Tom reflected that this is more of a general housekeeping issue. He agrees and says its ok to do this, but perhaps they need a run mode that does not produce these verification files ::: ::: {.callout-caution icon=false collapse=true} -## Avoid large file sizes if possible +## Address large file sizes I have not evaluated like as a criteria, as a large file size is not inherently a bad thing, and might be difficult to avoid. However, when files are very large, this can make things trickier, such as with requiring compression and use of GitHub Large File Storage (LFS) for tracking, which has limits on the free tier. 
@@ -707,164 +750,179 @@ Regarding file sizes in each study: ::: -## Seeds +## Other recommendations ::: {.callout-note icon=false collapse=true} -## Use seeds to control stochasticity +## Be aware of potential system dependencies + +There can also be system dependencies, which will vary between systems, and may not be obvious if researchers already have these installed. We identified these when setting up the docker environments (which act like "fresh installs"): + +* @shoaib_simulation_2021, @lim_staff_2020, @anagnostou_facs-charm_2022 - no dependencies +* @huang_optimizing_2019, @kim_modelling_2021, @johnson_cost_2021 and @wood_value_2021 - libcurl4-openssl-dev, libssl-dev, libxml2-dev, libglpk-dev, libicu-dev - as well as `tk` for Johnson et al. 2021 +* @hernandez_optimal_2015 - wget, build-essential, libssl-dev, libffi-dev, libbz2-dev, libreadline-dev, libsqlite3-dev, zlib1g-dev, libncurses5-dev, libgdbm-de, libnss3-dev, tk-dev, liblzma-dev, libsqlite3-dev, lzma, ca-certificates, curl, git + +Although it would be unreasonable for authors to be aware of and list all system dependencies, given they may not be aware of them, this does show the benefit of creating something like docker in identifying them and making note of them within the docker files. + +This issue was specific to (a) R studies, and (b) the study with an unsupported version of Python that required building it from source in the docker file. + +::: + +::: {.callout-warning icon=false collapse=true} + +## Model is designed to be run programmatically (i.e. can run model with different parameters without needing to change the model code) | @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | -| ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | +| ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | -@shoaib_simulation_2021: Not met. The lack of seeds wasn't actually a barrier to the reproduction though due to the replication number. I later add seeds so my results could be reproduced, and found that the ease of setting seeds with salabim was a greater facilitator to the work. I only had to change one or two lines of code to then get consistent results between runs (unlike other simulation software like SimPy where you have to consider the use of seeds by different sampling functions). Moreover, by default, salabim would have set a seed (although overridden by original authors to enable them to run replications). +@shoaib_simulation_2021: Not met. The model is set up as classes and run using a function. However, it is not designed to allow any variation in inputs. Everything uses default inputs, and it designed in such a way that - if you wish to vary model parameters - you need to directly change these in the script itself. -@huang_optimizing_2019: Not met. It would have been beneficial to include seeds, as there was a fair amount of variability, so with seeds I could then I could be sure that my results do not differ from the original simply due to randomness. +@huang_optimizing_2019: Fully met. Model was set up as a function, with many of the required parameters already set as "changeable" inputs to that function. -@lim_staff_2020: Not met. The results obtained looked very similar to the original article, with minimal differences that I felt to be within the expected variation from the model stochasticity. 
However, if seeds had been present, we would have been able to say with certainty. I did not feel I needed to add seeds during the reproduction to get the same results. +@lim_staff_2020: Fully met. The model is created from a series of functions and run with a for loop that iterates through different parameters. As such, the model is able to be run programmatically (within that for loop, which varied e.g. staff per shift and so on and re-ran the model). -@kim_modelling_2021: Fully met. Included a seed, although I don't get identical results as I had to reduce number of people in simulation. +@kim_modelling_2021: Fully met. Each scenario is an R script which states different parameters and then calls functions to run model. -@anagnostou_facs-charm_2022: Fully met. The authors included a random seed so the results I got were identical to the original (so no need for any subjectivity in deciding whether its similar enough, as I could perfectly reproduce). +@anagnostou_facs-charm_2022: Fully met. Change inputs in input `.csv` files. -@johnson_cost_2021: Fully met. At start of script, authors `set.seed(333)`. +@johnson_cost_2021: Fully met. Creates a list of `input` which are then used by a `run()` function. -@hernandez_optimal_2015: Fully met. This ensured consistent results between runs of the script, which was really helpful. +@hernandez_optimal_2015: Fully met. Model created from classes, which accept some inputs and can run the model. -@wood_value_2021: Fully met. Sets seed based on replication number. +@wood_value_2021: Fully met. Changes inputs to run all scenarios from a single `.R` file. **Reflections:** -* Depending on your model and the outputs/type of output you are looking at, the lack of seeds can have varying impacts on the appearance of your results, and can make the subjective judgement of whether results are consistent harder (if discrepancies could be attributed to not having consistent seeds or not). -* It can be really quite simple to include seeds. -* Over half of the studies did include seed control in their code. -* Tom: There seems little argument against doing this. worth noting that commerical software does this for you and possibly explains why authors didn't do this themselves if that was their background (lack of knowledge?). -* Tom: Note simpy is independent of any sampling mechanism. We could just use python's random module and set a single seed if needed (although you lose CRN) and we can setup our models so that we only need to set a single seed. -* Tom: A key part of STARS 2.0 for reproducibility +* Design model so that you can re-run it with different parameters without needing to make changes to the model code itself. + * This allows you to run multiple versions of the model with the same script. + * It also reduces the likelihood of missing errors (e.g. if miss changing an input parameter somewhere, or input the wrong parameters and don't realise). +* This was an uncommon issue. +* Note, this just refers to the basic set-up, with items below like hard coding parameters also being very important in this context. ::: -## Code to produce article results - -This is a common problem across all item types, and a key part of STARS 2.0 for reproducibility. 
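
Relating to the programmatic set-up recommended above, a sketch of what this can look like (names such as `Parameters` and `run_model` are hypothetical, and this is not based on any one of the studies): keeping every input in a single parameters object means scenarios can be run without editing the model code itself.

```python
# Illustrative sketch: all model inputs in one object, so scenarios are run by
# passing different parameter values rather than editing the model script.
from dataclasses import dataclass, replace


@dataclass
class Parameters:
    """All model inputs in one place, with the base case as defaults."""
    arrival_rate: float = 5.0
    n_doctors: int = 2
    replications: int = 100


def run_model(params: Parameters) -> dict:
    """Placeholder for running the model with the supplied parameters."""
    return {"n_doctors": params.n_doctors, "mean_wait": 0.0}


# Base case and a scenario, run from the same script with no edits to the model code.
base_results = run_model(Parameters())
scenario_results = run_model(replace(Parameters(), n_doctors=3))
```

Scenario scripts then only ever touch the parameter values, which also makes it harder to miss a changed input.
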
- ::: {.callout-warning icon=false collapse=true} -## Provide code to process results into tables +## Use relative file paths | @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | -| ❌ | N/A | 🟡 | ❌ | N/A | ❌ | ❌ | ✅ | +| ✅ | N/A | N/A | ✅ | ✅ | N/A | ✅ | ✅ | -@shoaib_simulation_2021: Not met. +@shoaib_simulation_2021: Fully met. Just provides file path, so file is saved into current/run directory. -@huang_optimizing_2019: Not applicable. No tables in scope. +@huang_optimizing_2019: Not applicable. All inputs defined within script. Outputs were not saved to file/s. -@lim_staff_2020: Partially met. It outputs the results in a similar structure to the paper (like a section of a table). However, it doesn't have the full code to produce a table outright, for any of the tables, so additional processing still required. +@lim_staff_2020: Not applicable. All inputs defined within script. Outputs were not saved to file/s. -@kim_modelling_2021: Not met. Had to write code to generate tables, which included correctly implementing calculation of excess e.g. deaths, scaling to population size, and identify which columns provide the operation outcomes. +@kim_modelling_2021: Fully met. Uses relative file paths for sourcing model and input parameters (gets current directory, then navigates from there). -@anagnostou_facs-charm_2022: Not applicable. No tables in scope. +@anagnostou_facs-charm_2022: Fully met. Uses relative imports of local code files. -@johnson_cost_2021: Not met. Had to write code to generate tables, which took me a while as I got confused over thinks like which tables / columns / scenarios to use. +@johnson_cost_2021: Not applicable. All inputs defined within script. Outputs are not specifically saved to a file (just that the .md and image files were automatically saved when the .Rmd file was knit). EpicR is package import. -@hernandez_optimal_2015: Not met. +@hernandez_optimal_2015: Fully met. Creates folder in current working directory based on date/time to store results. -@wood_value_2021: Fully met. +@wood_value_2021: Fully met. Although I then changed things a bit as reorganised repository and prefer not to work with `setwd()`, these were set up in such a way that it would be really easy to correct file path, just by setting working directory at start of script. **Reflections:** -* It can take a bit of time to do this processing, and it can be tricky/confusing to do correctly, so very handy for it to be provided. -* Common issue. +* This was not an issue for any studies - but included to note this was a "facilitator", as would have needed to amend if they weren't (and Tom noted that this is a common problem that he runs into elsewhere). ::: -::: {.callout-warning icon=false collapse=true} +::: {.callout-tip icon=false collapse=true} -## Provide code to process results into figures +## Provide all the required parameters | @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | -| ❌ | ❌ | ❌ | ❌ | ❌ | 🟡 | 🟡 | ✅ | +| ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | -@shoaib_simulation_2021: Not met. +@shoaib_simulation_2021: Some parameters that could not be calculated were not provided - ie. 
what consultation boundaries to use when mean length of doctor consultation was 2.5 minutes
 
-@huang_optimizing_2019: Not met. Had to write code from scratch. For one of the figures, it would have been handy if informed that plot was produced by a simmer function (as didn’t initially realise this). It also took a bit of time for me to work out how to transform the figure axes as this was not mentioned in the paper (and no code was provided for these). It was also unclear and a bit tricky to work out how to standardise the density in the figures (since it is only described in the text and no formula/calculations are provided there or in the code).
+@huang_optimizing_2019: Not met. In this case, patient arrivals and resource numbers were listed in the paper, and there were several discrepancies between this and the provided code. However, for many of the model parameters, like length of appointment, these were not mentioned in the paper, and so it was not possible to confirm whether or not those were correct. Hence, marked as not met, as the presence of discrepancies for several other parameters puts these into doubt.
 
-@lim_staff_2020, @kim_modelling_2021 and @anagnostou_facs-charm_2022: Not met. However, the simplicity and repetition of the figures was handy.
+@lim_staff_2020: Not met. For Figure 5, had to guess the value for `staff_per_shift`.
 
-@johnson_cost_2021: Partially met. For Figure 3, most of the required code for the figure was provided, which was super helpful. However, this wasn't complete, and for all others figures, I had to start from scratch writing the code.
+@kim_modelling_2021: Fully met.
 
-@hernandez_optimal_2015: Partially met. Provides a few example `ggplot`s, but these are not all the plots, nor exactly matching article, nor including any of the pre-processing required before the plots, and so could only serve as a starting point (though that was still really handy).
+@anagnostou_facs-charm_2022: Fully met.
 
-@wood_value_2021: Fully met. Figures match article, with one minor exception that I had to add smoothing to the lines on one of the figures.
+@johnson_cost_2021: Fully met. Could determine appropriate parameters for sensitivity analysis from figures in article.
+
+@hernandez_optimal_2015: Not met. The results are heavily influenced by the bounds that are set, but this was not mentioned in the paper or repository, and required me to look at the numbers in results and the GitHub commit history to estimate the appropriate bounds to use.
+
+@wood_value_2021: Fully met.
 
 **Reflections:**
 
-* It can take a bit of time to do this processing, particularly if the figure involves any transformations (and less so if the figure is simple), so very handy for it to be provided.
-* Also, handy if the full code can be provided for all figures (although partial code is more helpful than none at all).
-* Common issue.
+* Provide all required parameters
+* Tom: Evidence to support STRESS-DES 3.3
 
 :::
 
-::: {.callout-warning icon=false collapse=true}
-
-## Provide code to calculate in-text results
+::: {.callout-tip icon=false collapse=true}
 
-By "in-text results", I am referred to results that are mentioned in the text but not included in/cannot be deduced from any of the tables or figures.
+## If not provided in the script, then *clearly* present all parameters in the paper | @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | -| ❌ | ❌ | N/A | ❌ | N/A | N/A | N/A | N/A | +| ❌ | ❌ | 🟡 | N/A | N/A | ✅ | 🟡 | N/A | -@shoaib_simulation_2021, @huang_optimizing_2019, @kim_modelling_2021: Not met. +@shoaib_simulation_2021: Not met. Although there was a scenario table, this did not include all the parameters I would need to change. It was more challenging to identify parameters that were only described in the body of the article. There were also some discrepancies in parameters between the main text of the article, and the tables and figures. Some scenarios were quite ambiguous/unclear from their description in the text, and I initially misunderstood the required parameters for the scenarios. -@lim_staff_2020, @anagnostou_facs-charm_2022, @johnson_cost_2021, @hernandez_optimal_2015, @wood_value_2021: Not applicable (no in-text results). +@huang_optimizing_2019: Not met. As described above, paper didn't adequately describe all parameters. + +@lim_staff_2020: Partially met. Nearly all parameters are in the paper table, and others are described in the article. However, didn't provide information for the `staff_per_shift` for Figure 5. + +@kim_modelling_2021 and @anagnostou_facs-charm_2022: Not applicable. All provided. + +@johnson_cost_2021: Fully met. All parameters clearly in the two figures presenting the sensitivity analysis, and didn't have to look elsewhere beyond that. + +@hernandez_optimal_2015: Most parameters are relatively easily identified from the text or figure legends (though would be easier if provided in a table or similar). Parameter for bounding was not provided in paper. + +@wood_value_2021: Not applicable. All provided. **Reflections:** -* Provide code to calculate in-text results -* Universal issue, for those with in-text results not otherwise captured in tables and figures +* Provide parameters in a table (including for each scenario), as it can be difficult/ambiguous to interpret them from the text, and hard to spot them too. + * Tom: The ambiguity of the natural language for scenarios was an important finding. +* Be sure to mention every parameter that gets changed (e.g. for @lim_staff_2020, as there wasn't a default `staff_per_shift` across all scenarios, but not stated for the scenario, had to guess it). +* Tom: Evidence to support STRESS-DES 3.3 ::: -## Documentation - -::: {.callout-important icon=false collapse=true} +::: {.callout-tip icon=false collapse=true} -## Include instructions on how to run the model/script +## If will need to process parameters, provide required calculations | @shoaib_simulation_2021 | @huang_optimizing_2019 | @lim_staff_2020 | @kim_modelling_2021 | @anagnostou_facs-charm_2022 | @johnson_cost_2021 | @hernandez_optimal_2015 | @wood_value_2021 | | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | -| ❌ | ❌ | ❌ | 🟡 | ✅ | ✅ | ❌ | ❌ | - -@shoaib_simulation_2021: Not met. No instructions, although is just a single script that you run. +| ❌ | ✅ | N/A | N/A | N/A | N/A | N/A | N/A | -@huang_optimizing_2019: Not met. Not provided in runnable form but, regardless, no instructions for running it as it is provided (as a web application - i.e. no info on how to get that running). +@shoaib_simulation_2021: Not met. It was unclear how to estimate inter-arrival time. 
-@lim_staff_2020: Not met. No instructions, although is just a single script that you run. +@huang_optimizing_2019: Fully met. The calculations for inter-arrival times were provided in the code, and the inputs to the code were the number of arrivals, as reported in the paper, and so making it easy to compare those parameters and check if numbers were correct or not. -@kim_modelling_2021: Partially met. README tells you which folder has the scripts you need, although nothing further. Although all you need to do is run them. +@lim_staff_2020: Not applicable. The parameter not provided is not one that you would calculate. -@anagnostou_facs-charm_2022: Fully met. Clear README with instructions on how to run the model was really helpful. +@kim_modelling_2021 and @anagnostou_facs-charm_2022: Not applicable. All provided. -@johnson_cost_2021: Fully met. README has mini description of model and clear instructions on how to install and run the model. +@johnson_cost_2021: Not applicable. No processing of parameters required. -@hernandez_optimal_2015: Not met. +@hernandez_optimal_2015: Not applicable. -@wood_value_2021: Not met. No instructions, although it was fairly self explanatory (single script `master.R` to run, then processing scripts named after items in article e.g. `fig7.R`). +@wood_value_2021: Not applicable. All provided. **Reflections:** -* Even if as simple as running a script, include instructions on how to do so -* In simpler projects (e.g. single script), this can be less of a problem. -* Common issue -* Tom: Evidence for STARS essential component of minimum documentation. +* If you are going to be mentioning the "pre-processed" values at all, then its important to include the calculation (ideally in the code, as that is the clearest demonstration of exactly what you did) +* Tom: This is a very good point for RAP. ::: -## Other - **Grid lines**. Include tick marks/grid lines on figures, so it is easier to read across and judge whether a result is above or below a certain Y value. **Data dictionaries**. @anagnostou_facs-charm_2022: Included data dictionary for input parameters. Although I didn’t need this, this would have been great if I needed to change the input parameters at all. @@ -883,12 +941,6 @@ Tom: This is interesting - and you wonder if it would still be possible (given t **Original results files**. @hernandez_optimal_2015: Included some original results files, which was invaluable in identifying some of the parameters in the code that needed to be fixed. -**Excessive number of files**. For @hernandez_optimal_2015 the default behaviour of the script was to output lots of files from each round (so you could easily have 90, 100, 200+ files), which were then not used in analysis (as it just depended on an aggregate results file). Although these individual files might be useful during quality control, as a default behaviour of the script, it could easily make the repository quite busy/littered with files. - -* Tom reflected that this is more of a general housekeeping issue. He agrees and says its ok to do this, but perhaps they need a run mode that does not produce these verification files - **Classes**. @hernandez_optimal_2015: Structured code into classes, which was nice to work with/amend. -**Version history and releases**. @johnson_cost_2021 had commits to their GitHub repository after the publication date. It wasn't clear which version aligned with the publication. However, the most recent commits add clear README instructions to the repository. 
We decided to use the latest version of the repository, but it would have beneficial to have releases/versions/a change log that would help to outline the commit history in relation to the publication and any subsequent changes. - ## References \ No newline at end of file