---
title: "Coding Standards"
---
Our coding standards can be found [here](https://defra.sharepoint.com/:p:/r/teams/Team1478/_layouts/15/Doc.aspx?sourcedoc=%7B98C3D254-86E8-44FD-BB2E-206B37F6F802%7D&file=Coding%20Standards.pptx&action=edit&mobileredirect=true). The main points are:
1. We call functions explicitly with the `package::function()` syntax rather than loading the whole package with `library()`.
<table>
<tr>
<th>   </th>
<th> Do </th>
<th>   </th>
<th> Don't </th>
</tr>
<tr>
<td>
</td>
<td>
```{r, eval=FALSE}
dplyr::select(...)
```
</td>
<td>
</td>
<td>
```{r, eval=FALSE}
library(dplyr)
select(...)
```
</td>
</tr>
</table>
2. Variable names must make sense in plain English e.g. `localAuthorityPopulation` not `lapop`.
3. We use camelCase e.g. `localAuthorityPopulation` not `local_authority_population`.
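As a minimal sketch of points 1 to 3 working together, the example below uses `stats::median()` from the `stats` package (which ships with R, so nothing needs installing) in place of the `dplyr` call shown in the table. The data frame, authority names and figures are made up for illustration.

```{r, eval=FALSE}
# Hypothetical data: authority names and population figures are illustrative only.
localAuthorityPopulation <- data.frame(
  localAuthority = c("Authority A", "Authority B", "Authority C"),
  population = c(120000, 85000, 240000)
)

# Point 1: call the function through its package rather than via library()
medianPopulation <- stats::median(localAuthorityPopulation$population)

# Points 2 and 3: plain-English, camelCase names for every object
aboveMedianAuthorities <- localAuthorityPopulation[
  localAuthorityPopulation$population > medianPopulation,
]
```

A reader can tell where `median()` comes from and what each object holds without scrolling back to the top of the script.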
# Naming Conventions
1. GitHub repository names should be all lower case, with underscores (`_`) instead of spaces.
2. R Scripts should be...
3. Excel files or CSVs should be...
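The repository naming rule in point 1 could be checked automatically. The sketch below is one possible reading of that rule: the function name `isValidRepositoryName` and the example repository names are hypothetical, and allowing digits while rejecting hyphens is an assumption, since the rule above only mentions lower case and underscores.

```{r, eval=FALSE}
# Returns TRUE when a repository name is lower case with words separated by
# single underscores. Allowing digits and rejecting hyphens are assumptions.
isValidRepositoryName <- function(repositoryName) {
  grepl("^[a-z0-9]+(_[a-z0-9]+)*$", repositoryName)
}

# Hypothetical repository names, for illustration only
isValidRepositoryName("waste_flow_model")
isValidRepositoryName("Waste Flow Model")
```

The first call passes the check and the second fails it, because of the capital letters and spaces.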
# ONS Guidance - UNFINISHED
The [Duck Book](https://best-practice-and-impact.github.io/qa-of-code-guidance/intro.html) should inform how we work.
Important points:
- Good practice
- **Our pipelines should be well documented, and that documentation should be embedded in our code. This makes collaboration easier, as well as allowing for easier auditing and assurance. When documentation exists alongside the code, it is more likely to be kept up to date than if it is spread out across different systems. Having the documentation as part of the code means we have a single source of the truth.**
- Not following good practices creates 'technical debt', which slows down further development of the analysis.
- Our analysis needs to be:
    - Reproducible: "the same analysis steps performed on the same dataset consistently produces the same result"
    - Auditable: "analysis and supporting evidence are available for scrutiny"
    - Assured: "analysis has passed through a systematic process that established it as fit for purpose"
- We should automate the aspects of quality assurance that are challenging for humans. This frees up time and makes it easier to keep a record of what QA has been performed. The purpose of this automation is not to get rid of analysts, but to allow analysts to focus on the QA that computers cannot easily do. For example, analysts are much better at ensuring that the analysis is fit for purpose.
- We will have several levels of master and sub-scripts, with one top-level master script that calls the waste flow and cost/tonne model.
- In Agile, working software is considered the primary measure of success. This means that we don’t spend months creating detailed plans and documenting specifications. Instead, we prioritise early and continuous delivery of working software.
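Two of the points above, documentation embedded in the code and automated quality assurance, can be sketched in a single function. The function `calculateCostPerTonne`, its arguments and the figures are hypothetical and are not taken from the actual model. The `#'` comments follow the roxygen2 convention, and the checks use base R's `stopifnot()`.

```{r, eval=FALSE}
#' Calculate cost per tonne (hypothetical example function)
#'
#' @param totalCost Numeric vector of total costs.
#' @param tonnage Numeric vector of tonnages, the same length as totalCost.
#' @return Numeric vector of cost per tonne.
calculateCostPerTonne <- function(totalCost, tonnage) {
  # Automated QA: stop early on inputs that are easy to miss by eye
  stopifnot(
    is.numeric(totalCost),
    is.numeric(tonnage),
    length(totalCost) == length(tonnage),
    all(tonnage > 0)
  )
  totalCost / tonnage
}

# A check that can be re-run automatically whenever the code changes
stopifnot(calculateCostPerTonne(totalCost = 100, tonnage = 4) == 25)
```

Because the documentation sits directly above the function, it is updated in the same commit as the code. The checks leave a record of what has been verified, so analysts can concentrate on whether the analysis is fit for purpose.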
https://style.tidyverse.org/index.html
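The master and sub-script layout mentioned in the points above might look something like the sketch below. The helper `runSubScripts` and the file paths are hypothetical placeholders. They do not reflect the real repository layout or any agreed script naming convention.

```{r, eval=FALSE}
# Hypothetical sub-script paths, listed in the order they should run
subScripts <- c("R/wasteFlowModel.R", "R/costPerTonneModel.R")

# Runs each sub-script in turn, stopping with a clear error if any are missing
runSubScripts <- function(scriptPaths) {
  missingScripts <- scriptPaths[!file.exists(scriptPaths)]
  if (length(missingScripts) > 0) {
    stop("Missing sub-scripts: ", paste(missingScripts, collapse = ", "))
  }
  for (scriptPath in scriptPaths) {
    message("Running ", scriptPath)
    # Each sub-script runs in its own environment to avoid name clashes
    source(scriptPath, local = new.env())
  }
  invisible(scriptPaths)
}
```

The top-level master script would then call `runSubScripts(subScripts)`, so the whole pipeline runs from one entry point.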
# Jake's Plan
- Week 1 - get on delivery footing
    - Tidy AWS
    - Consolidate GitHub repos
    - Get everyone familiar with guidance and new way of working
- Week 2 to 4
    - Get the model to MVP state in terms of numbers out
    - End of October deadline
- Week 5 to 7
    - Finesse
    - Input and output exploratory reports for Tom B
    - Add unit testing etc.
- Week 8
    - Wiggle room
    - End of November deadline