
15 modelling relevant testing #214

Open · wants to merge 4 commits into main

Conversation

Haashim-ONS:

Updated with Susan's feedback and removed extra detail on Python code.

@ellie-o linked an issue on Dec 17, 2024 that may be closed by this pull request.
@zarbspace (Contributor) left a comment:


This is a great start. My main comments are around the use of technical jargon and making sure the user is supported with extra resources.


• Operational Acceptance Testing (OAT): Ensures the system is operationally ready, including backup, recovery, and maintenance. This involves testing the model's performance under different operational conditions to ensure it can handle various scenarios.


Contributor:

Using technical terminology like accuracy, precision, recall, F1 scores and Area Under Curve (I don't know what the last two refer to, for example) means we need to link those terms to explanatory material, as these topics are outside the scope of the Duck Book. I would prefer it if the paragraph led with the problem to be solved and then perhaps referred to examples or other resources, if this makes sense. I would suggest:

"Evaluating model performance using metrics is essential. You should choose metrics that align with the specific goals of the project and provide meaningful insights into the performance of the model in this context. For a general discussion of metrics that measure model performance see...."

The short summaries of approaches (e.g. Stress Testing) are good - they stick to the general and explain what the approach is useful for.

Where you mention terms that assume a level of technical familiarity on the part of the reader, you should link them to wider reading. For example, the discussion of cross validation goes straight into complicated-sounding k-fold methods. Better to start with a general summary of what these techniques are for (your last sentence in 535 starts to do this) and then follow up with detail and links to more information. Watch out for jargon - you talk about over-fitting, for example. How many readers know what that is, and if they don't, what should they do? A link to a discussion of what it is could help here, if you really need to include it.
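
For illustration only (not part of the proposed text), a minimal sketch of the metrics and k-fold cross-validation being discussed could look like the following, assuming scikit-learn; the model and synthetic data are purely illustrative:

```python
# Minimal sketch of performance metrics and k-fold cross-validation,
# assuming scikit-learn; the model and data below are illustrative only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: the data are split into 5 parts, the model is trained
# on 4 parts and scored on the held-out part, rotating until every part has been
# used for scoring. This gives a more honest view than a single train/test split.
results = cross_validate(
    model, X, y, cv=5,
    scoring=["accuracy", "precision", "recall", "f1", "roc_auc"],
)

for metric, scores in results.items():
    if metric.startswith("test_"):
        print(f"{metric}: mean = {scores.mean():.3f}")
```

Each `test_` entry holds one score per fold, so reporting the mean across folds is less sensitive to how the data happen to be split.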


### Sensitivity Analysis

Sensitivity analysis tests how sensitive the model's outputs are to changes in input data or parameters. This analysis helps understand the model's behaviour and identify potential weaknesses. Sensitivity analysis involves systematically varying the input data or model parameters and measuring the impact on the model's outputs. This helps in identifying critical factors that influence the model's performance and making necessary adjustments.
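
For illustration only, a one-at-a-time sensitivity check in the spirit of the paragraph above might look like this minimal sketch; scikit-learn, the linear model, and the 10% perturbation are all illustrative assumptions, not part of the proposed guidance:

```python
# Minimal sketch of a one-at-a-time sensitivity check, assuming scikit-learn;
# the model, data, and perturbation size are illustrative only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=4, noise=0.1, random_state=0)
model = LinearRegression().fit(X, y)
baseline = model.predict(X)

# Perturb each input feature by 10% in turn and record the impact on the outputs.
for feature in range(X.shape[1]):
    X_perturbed = X.copy()
    X_perturbed[:, feature] *= 1.10
    change = np.mean(np.abs(model.predict(X_perturbed) - baseline))
    print(f"Feature {feature}: mean absolute change in output = {change:.3f}")
```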
Contributor:

This is nice and clear.


### Model Interpretability

Implementing methods to make the model's outputs interpretable is essential for building trust with stakeholders. Techniques like SHAP (SHapley Additive exPlanations) values or LIME (Local Interpretable Model-agnostic Explanations) can help explain the model's decisions. These methods provide insights into how different features contribute to the model's outputs, making it easier for analysts and stakeholders to understand and trust the model's outputs.
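
For illustration only, a minimal SHAP sketch could look like the following, assuming the `shap` and scikit-learn packages are installed; the tree-based model and synthetic data are illustrative, not part of the proposed text:

```python
# Minimal sketch of SHAP values for a fitted tree-based model, assuming the
# shap package; the model and data below are illustrative only.
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)     # explainer suited to tree-based models
shap_values = explainer.shap_values(X)    # one contribution per feature per row
shap.summary_plot(shap_values, X)         # ranks features by their effect on the outputs
```

The summary plot shows which features contribute most to the model's outputs across the dataset, which is the kind of insight the paragraph above describes.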
Contributor:

Again, I have no idea what these SHAP and LIME techniques are. Can we link to them? Also, we probably need to explain what we mean by "interpretable". The last sentence is nice and clear - that's the level to aim for.



### Model Optimisation
Contributor:

Again, start with the general. I don't know what grid search and hyperparameter tuning are. You could rework this para to make it clearer:

Optimisation is used to adjust the model's parameters to achieve the best overall performance. Continuous optimisation ensures that the model remains effective and efficient over time as inputs change. There are lots of techniques available to optimise performance. Most are designed to help find the best parameters for the model to enhance its accuracy and efficiency.

Examples of optimisation techniques in the context of machine learning include grid search and hyperparameter tuning. [Grid search involves systematically searching through a predefined set of hyperparameters, while hyperparameter tuning adjusts the model's parameters to achieve the best possible performance.] NOTE: these terms are very jargon heavy; most people won't know what they mean.
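
For illustration only, a minimal grid search sketch could look like the following, assuming scikit-learn; the model, parameter grid, and data are illustrative, not part of the proposed text:

```python
# Minimal sketch of a grid search over hyperparameters, assuming scikit-learn;
# the model, parameter grid, and data below are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Every combination of these values is tried and scored with 5-fold cross-validation.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # hyperparameter combination with the best cross-validated score
print(search.best_score_)   # the corresponding mean accuracy across the folds
```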

The Use fixtures section seems to apply more widely than just to modelling. I think this para should be moved to the wider discussion of testing. Again, beware of introducing jargon terms (fixtures, parameterised tests) that people may not understand without resources to support them.
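
For illustration only, a minimal sketch of a pytest fixture and a parameterised test is shown below, assuming pytest; the `add` function is a hypothetical stand-in for real analysis code:

```python
# Minimal sketch of a pytest fixture and a parameterised test, assuming pytest;
# the add() function is a hypothetical stand-in for real analysis code.
import pytest


def add(a, b):
    return a + b


@pytest.fixture
def numbers():
    # Shared test data, provided to any test that asks for it by name.
    return (2, 3)


def test_add_with_fixture(numbers):
    a, b = numbers
    assert add(a, b) == 5


@pytest.mark.parametrize("a, b, expected", [(0, 0, 0), (1, 2, 3), (-1, 1, 0)])
def test_add_parametrised(a, b, expected):
    # The same test body runs once per row of inputs.
    assert add(a, b) == expected
```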


Successfully merging this pull request may close these issues: Add subsection on modelling-relevant testing.