
15 modelling relevant testing #214

Open · wants to merge 4 commits into main

Conversation

Haashim-ONS:

Updated with Susan's feedback and removed extra detail on Python code.

@ellie-o linked an issue on Dec 17, 2024 that may be closed by this pull request.
@zarbspace (Contributor) left a comment:


This is a great start. My main comments are around the use of technical jargon and making sure the user is supported with extra resources.


• Operational Acceptance Testing (OAT): Ensures the system is operationally ready, including backup, recovery, and maintenance. This involves testing the model's performance under different operational conditions to ensure it can handle various scenarios.


Contributor:

Using technical terminology like accuracy, precision, recall, F1 scores and Area Under Curve (I don't know what the last two refer to, for example) means we need to link those terms to explanatory material, as these topics are outside the scope of the Duck Book. I would prefer it if the paragraph led with the problem to be solved and then perhaps referred to examples or other resources, if this makes sense. I would suggest:

"Evaluating model performance using metrics is essential. You should choose metrics that align with the specific goals of the project and provide meaningful insights into the performance of the model in this context. For a general discussion of metrics that measure model performance see...."

The short summaries of approaches (e.g. Stress Testing) are good - they stick to the general and explain what the approach is useful for.

Where you mention terms that assume a level of technical familiarity on the part of the reader, you should link them to wider reading. For example, the discussion of cross validation goes straight into complicated-sounding k-fold methods. Better to start with a general summary of what these techniques are for (your last sentence in 535 starts to do this) and then follow up with detail and links to more information. Watch out for jargon - you talk about over-fitting, for example. How many readers know what that is, and if they don't, what should they do? A link to a discussion of what it is could help here, if you really need to include it.
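
For illustration only (not part of the proposed text), a minimal sketch of the metrics and k-fold cross-validation being discussed could look like the following, assuming scikit-learn; the model and synthetic data are purely illustrative:

```python
# Minimal sketch of performance metrics and k-fold cross-validation,
# assuming scikit-learn; the model and data below are illustrative only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: the data are split into 5 parts, the model is trained
# on 4 parts and scored on the held-out part, rotating until every part has been
# used for scoring. This gives a more honest view than a single train/test split.
results = cross_validate(
    model, X, y, cv=5,
    scoring=["accuracy", "precision", "recall", "f1", "roc_auc"],
)

for metric, scores in results.items():
    if metric.startswith("test_"):
        print(f"{metric}: mean = {scores.mean():.3f}")
```

Each `test_` entry holds one score per fold, so reporting the mean across folds is less sensitive to how the data happen to be split.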


### Sensitivity Analysis

Sensitivity analysis tests how sensitive the model's outputs are to changes in input data or parameters. This analysis helps understand the model's behaviour and identify potential weaknesses. Sensitivity analysis involves systematically varying the input data or model parameters and measuring the impact on the model's outputs. This helps in identifying critical factors that influence the model's performance and making necessary adjustments.
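
For illustration only, a one-at-a-time sensitivity check in the spirit of the paragraph above might look like this minimal sketch; scikit-learn, the linear model, and the 10% perturbation are all illustrative assumptions, not part of the proposed guidance:

```python
# Minimal sketch of a one-at-a-time sensitivity check, assuming scikit-learn;
# the model, data, and perturbation size are illustrative only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=4, noise=0.1, random_state=0)
model = LinearRegression().fit(X, y)
baseline = model.predict(X)

# Perturb each input feature by 10% in turn and record the impact on the outputs.
for feature in range(X.shape[1]):
    X_perturbed = X.copy()
    X_perturbed[:, feature] *= 1.10
    change = np.mean(np.abs(model.predict(X_perturbed) - baseline))
    print(f"Feature {feature}: mean absolute change in output = {change:.3f}")
```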
Contributor:

This is nice and clear.


### Model Interpretability

Implementing methods to make the model's outputs interpretable is essential for building trust with stakeholders. Techniques like SHAP (SHapley Additive exPlanations) values or LIME (Local Interpretable Model-agnostic Explanations) can help explain the model's decisions. These methods provide insights into how different features contribute to the model's outputs, making it easier for analysts and stakeholders to understand and trust the model's outputs.
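
For illustration only, a minimal SHAP sketch could look like the following, assuming the `shap` and scikit-learn packages are installed; the tree-based model and synthetic data are illustrative, not part of the proposed text:

```python
# Minimal sketch of SHAP values for a fitted tree-based model, assuming the
# shap package; the model and data below are illustrative only.
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)     # explainer suited to tree-based models
shap_values = explainer.shap_values(X)    # one contribution per feature per row
shap.summary_plot(shap_values, X)         # ranks features by their effect on the outputs
```

The summary plot shows which features contribute most to the model's outputs across the dataset, which is the kind of insight the paragraph above describes.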
Contributor:

Again, I have no idea what these SHAP and LIME techniques are. Can we link to them? Also, we probably need to explain what we mean by "interpretable". The last sentence is nice and clear - that's the level to aim for.



### Model Optimisation
Contributor:

Again, start with the general. I don't know what grid search and hyperparameter tuning are. You could rework this para to make it clearer:

Optimisation is used to adjust the model's parameters to achieve the best overall performance. Continuous optimisation ensures that the model remains effective and efficient over time as inputs change. There are lots of techniques available to optimise performance. Most are designed to help find the best parameters for the model to enhance its accuracy and efficiency.

Examples of optimisation techniques in the context of machine learning include grid search and hyperparameter tuning. [Grid search involves systematically searching through a predefined set of hyperparameters, while hyperparameter tuning adjusts the model's parameters to achieve the best possible performance.] NOTE: these terms are very jargon heavy; most people won't know what they mean.
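
For illustration only, a minimal grid search sketch could look like the following, assuming scikit-learn; the model, parameter grid, and data are illustrative, not part of the proposed text:

```python
# Minimal sketch of a grid search over hyperparameters, assuming scikit-learn;
# the model, parameter grid, and data below are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Every combination of these values is tried and scored with 5-fold cross-validation.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # hyperparameter combination with the best cross-validated score
print(search.best_score_)   # the corresponding mean accuracy across the folds
```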

The Use fixtures section seems to apply more widely than just to modelling. I think this para should be moved to the wider discussion of testing. Again, beware of introducing jargon terms (fixtures, parameterised tests) that people may not understand without resources to support them.
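
For illustration only, a minimal sketch of a pytest fixture and a parameterised test is shown below, assuming pytest; the `add` function is a hypothetical stand-in for real analysis code:

```python
# Minimal sketch of a pytest fixture and a parameterised test, assuming pytest;
# the add() function is a hypothetical stand-in for real analysis code.
import pytest


def add(a, b):
    return a + b


@pytest.fixture
def numbers():
    # Shared test data, provided to any test that asks for it by name.
    return (2, 3)


def test_add_with_fixture(numbers):
    a, b = numbers
    assert add(a, b) == 5


@pytest.mark.parametrize("a, b, expected", [(0, 0, 0), (1, 2, 3), (-1, 1, 0)])
def test_add_parametrised(a, b, expected):
    # The same test body runs once per row of inputs.
    assert add(a, b) == expected
```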


Successfully merging this pull request may close these issues: Add subsection on modelling-relevant testing.