added Get Started and Overview docs for review #348

Draft: wants to merge 4 commits into base: main
Changes from 2 commits
9 changes: 9 additions & 0 deletions source/documentation/Get started/contribute.md
@@ -0,0 +1,9 @@
# Contribute

We welcome contributions to our documentation from the Analytical Platform user community. If you see anything on this website that is inaccurate, or have a new piece of content in mind, we invite you to collaborate with us. Your input helps us ensure we are meeting our users' needs.

After you complete the [Quickstart guide](quickstart.md), you can contribute to the Analytical Platform guidance. Before making any changes, contact support in the **#analytical-platform-support** Slack channel on the **Justice Digital workspace** to discuss the content you are proposing. We will assign someone to review your content before publishing it.

>**Note:** Make sure your changes are on a branch. **Do not** edit the main branch.
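
One way to follow the note above is to create a branch before you edit anything. The sketch below is illustrative only: it runs in a throwaway repository so it is safe to try, and the branch name `docs/update-quickstart` is a placeholder, not a required naming convention. In your own clone of the guidance repository you would only need the `git checkout -b` and `git branch --show-current` commands.

```shell
# Create a throwaway repository so this sketch is safe to run anywhere.
# In practice, run the commands below from inside your clone of the guidance repository.
scratch_repo="$(mktemp -d)"
cd "$scratch_repo"
git init --quiet

# Create a new branch and switch to it (the branch name is a placeholder).
git checkout -b docs/update-quickstart

# Confirm that you are working on the new branch, not on main.
current_branch="$(git branch --show-current)"
echo "You are on branch: $current_branch"
```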

When writing content, consider the knowledge and requirements of who you are writing for. Following the [GDS style guide](https://www.gov.uk/guidance/style-guide/a-to-z-of-gov-uk-style) helps ensure you are meeting our writing standards and users' needs, and reduces the number of edits you will need to make before we publish your content.
210 changes: 210 additions & 0 deletions source/documentation/Get started/quickstart.md
@@ -0,0 +1,210 @@
# Quickstart guide

This guide provides the instructions to set up the main accounts and services you need to use the Analytical Platform (AP). Once you complete it, you can:

- access the Analytical Platform Control Panel
- explore data on the Analytical Platform
- begin developing your code in either JupyterLab or RStudio
- contribute to the Analytical Platform User Guidance

## Before you begin

To use this guide, you need the following:

- a Ministry of Justice-issued Office 365 account
- a Ministry of Justice-issued laptop you can install apps on
- a mobile device you can install apps on
- access to the **Justice Digital workspace** on Slack

Complete this guide in order, following each step closely. If you encounter issues, [raise a ticket on GitHub issues](https://github.com/ministryofjustice/data-platform-support/issues/new/choose) or email **[email protected]**. A member of the Analytical Platform team will contact you.

## 1. Read Terms of use

To use the Analytical Platform, you must follow certain guidelines. Read the following documents and follow them whenever you use the platform:

- [Acceptable use policy](aup.md): covers the way you should use the Analytical Platform and its associated tools and services
- [Data and Analytical Services Directorate's (DASD) coding standards](https://moj-analytical-services.github.io/our-coding-standards/): principles outlining how you should write and review code
- [MoJ Analytical IT Tools Strategy](https://moj-analytical-services.github.io/moj-analytical-it-tools-strategy/): describes recommended ways of working on the Analytical Platform

## 2. Create Slack account

We use Slack to communicate status updates, such as scheduled maintenance for the Analytical Platform. You can also use it to communicate with our support team and the Analytical Platform user community.

We recommend joining two workspaces: [Justice Digital](https://mojdt.slack.com/) and [ASD](https://asdslack.slack.com/). To join, sign in to your work email, then navigate to each workspace and request access. Workspace moderators only consider requests from the following email domains:

- @justice.gsi.gov.uk
- @digital.justice.gov.uk
- @cjs.gsi.gov.uk
- @noms.gsi.gov.uk
- @legalaid.gsi.gov.uk
- @justice.gov.uk
- @judiciary.uk

> **Note**: You can only access the Justice Digital and ASD workspaces using DOM1 and MoJ Digital and Technology MacBooks. If you use Quantum, you will need to access Slack using a mobile device instead.

### Join Slack channels

Join the following Slack channels in the **ASD workspace**:

- **#ask-data-engineering**: for discussing data engineering and making technical queries to the Data Engineering team regarding Airflow
- **#data_science**: for discussing data science tools and techniques with the Ministry of Justice's Data Science community
- **#git**: for discussing Git tooling with the wider Ministry of Justice community
- **#python**: for discussing Python programming with the wider Ministry of Justice community
- **#r** and **#intro_r**: for discussing R programming with the wider Ministry of Justice community; #intro_r is aimed at new users

Additionally, join the following channels in the **Justice Digital workspace**:

- [**#analytical-platform-support**](https://mojdt.slack.com/archives/C4PF7QAJZ): for tracking [support queries raised by users of the Analytical Platform on GitHub Issues](https://github.com/ministryofjustice/data-platform-support/issues)
- [**#ask-operations-engineering**](https://mojdt.slack.com/archives/C01BUKJSZD4): for requesting support with GitHub; you can use this channel to request access to the Analytical Platform later in this guide

## 3. Create GitHub account

Using your work email address (ending in either **justice.gov.uk** or **digital.justice.gov.uk**), [sign up for a GitHub account](https://github.com/join). See the [GitHub documentation](https://docs.github.com/en/get-started/signing-up-for-github/signing-up-for-a-new-github-account) for instructions. Ensure that:

- your account uses the free plan subscription
- you choose a username that does not contain upper-case characters (for good practice)
- you set your Git username for **every** repository on your device; see the [GitHub documentation](https://docs.github.com/en/get-started/getting-started-with-git/setting-your-username-in-git) for instructions
Contributor

I think this would be unclear for people unfamiliar with git

Suggested change
- you set your Git username for **every** repository on your device; see the [GitHub documentation](https://docs.github.com/en/get-started/getting-started-with-git/setting-your-username-in-git) for instructions
- you set your username as the 'global' default on your device (so all of your git commands are run as you); see the [GitHub documentation](https://docs.github.com/en/get-started/getting-started-with-git/setting-your-username-in-git) for instructions

- you follow the [Ministry of Justice's best practice guidelines](https://security-guidance.service.justice.gov.uk/passwords/#passwords) when setting your password
- you [configure two-factor authentication (2FA)](https://docs.github.com/en/authentication/securing-your-account-with-two-factor-authentication-2fa/configuring-two-factor-authentication) on your mobile device; we recommend using either Google Authenticator or Microsoft Authenticator for 2FA
  - for more information on 2FA within the Ministry of Justice, see the [MoJ's Security Guidance](https://security-guidance.service.justice.gov.uk/multi-factor-authentication-mfa-guide/#multi-factor-authentication-mfa-guide)
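
As a rough sketch of the Git username step in the list above, the commands below set a name as the global default for your device and then read it back. `Your Name` is a placeholder. The `--global` option applies the setting to every repository on the device unless an individual repository overrides it.

```shell
# Set the name Git attaches to your commits, as the global default for this device.
# "Your Name" is a placeholder: replace it with your own name.
git config --global user.name "Your Name"

# Read the value back to confirm it was saved.
configured_name="$(git config --global user.name)"
echo "Git will record commits as: $configured_name"
```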

## 4. Access the Analytical Platform

Once you have your GitHub account, there are **two more steps** to complete before you can access the Analytical Platform:

- joining the MoJ Analytical Services GitHub organisation
- signing in to the Analytical Platform's Control Panel

### Join MoJ Analytical Services

After configuring your GitHub account you can request access to the Analytical Platform.

Navigate to the [MoJ Analytical Services organisation](https://github.com/moj-analytical-services) and request to join it. The Operations Engineering team will review the request. If they approve your request, you will receive an email with a link to accept the invite.

If you do not receive a response within 24 hours, request access either in the [**#ask-operations-engineering**](https://mojdt.slack.com/archives/C01BUKJSZD4) Slack channel or by emailing **[email protected]**, providing your GitHub username in your message.

### Sign in to the Control Panel

The main entry point to the Analytical Platform is the [Control Panel](https://controlpanel.services.analytical-platform.service.justice.gov.uk/). From there, you configure core tools such as JupyterLab and RStudio.

When you access the Control Panel for the first time, a prompt will appear requiring you to configure 2FA using your mobile device. Note that while you use your GitHub account to access the Control Panel, **this 2FA is separate from the one you use to log in to GitHub**. You may need to disable browser extensions such as Dark Mode during the 2FA setup process.

After you log in to the Control Panel for the first time, you can begin requesting access to data on the platform.

Contributor

Looks good up until this point, but after that why are we directing people to download and install JupyterLab and RStudio tools on their local machines? They're available on the AP and ready to use.

Contributor Author

Will fix

## 5. Download and install JupyterLab

>**Note**: Only follow this step if you want to use the Analytical Platform as a Python-based user. If you want to use R, proceed to step 6 and configure RStudio instead.

JupyterLab is a Python package, which means you install it using pip, the package installer for Python. You may already have Python or pip installed; run the following commands to check:

```
$ python --version
Python 3.N.N
$ python -m pip --version
pip X.Y.Z from ... (python 3.N.N)
```

### Download Python

If you do not have Python installed on your device, download the latest version from the [Python website](https://www.python.org/downloads/).

### Install pip

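Once Python is installed, make sure pip is available. The following command installs pip if it is missing and, with `--upgrade`, ensures it is at least as recent as the version bundled with your Python installation: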

```
$ python -m ensurepip --upgrade
```

Verify your Python and pip installations:

```
$ python -m pip --version
pip X.Y.Z from ... (python 3.N.N)
```

### Install JupyterLab

To build and deploy applications on the Analytical Platform using Python, you need to set up JupyterLab, the Python-based IDE (Integrated Development Environment) the Analytical Platform uses.

Install JupyterLab with pip:

```
pip install jupyterlab
```

To launch JupyterLab run:

```
jupyter-lab
```

### Install Jupyter Notebook

Jupyter Notebook is a server-client application that allows you to edit and run notebooks, which are documents containing live code, equations, visualisations and text. To install the classic notebook, which is the Jupyter Notebook web interface, run:

```
pip install notebook
```

To launch Jupyter Notebook run:

```
jupyter notebook
```

### Create and add JupyterLab SSH key

To access GitHub repositories from JupyterLab, you need an SSH key that connects the two. Do not try to use an existing SSH key; each tool you use requires a unique key.

To create an SSH key in JupyterLab:

1. Open JupyterLab from the Analytical Platform Control Panel
2. Select the **+** icon in the file browser to open a new **Launcher** tab
3. Navigate to the **Other** section and select **Terminal**
4. Run the following command in your terminal, replacing **[email protected]** with the email address you used to sign up for GitHub:

```
$ ssh-keygen -t rsa -b 4096 -C "[email protected]"
```

5. When prompted to choose a location to save the key in, press Enter to accept the default
6. When prompted to set a passphrase, press Enter to continue without one
7. To view the SSH key, run:

```
$ cat /home/jovyan/.ssh/id_rsa.pub
```

8. Copy the SSH key to your clipboard

You then need to add the SSH key to GitHub; see the [GitHub documentation](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account) for instructions.
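
Before pasting a key into GitHub, it can help to check what was generated. The sketch below is an illustration, not part of the official steps. It creates a throwaway key in a temporary directory (so it does not touch `~/.ssh`), uses the `-N ""` and `-f` options of `ssh-keygen` to skip the interactive prompts, and then prints the key's size and fingerprint. The comment `you@example.com` is a placeholder.

```shell
# Generate a throwaway 4096-bit RSA key in a temporary directory.
# -N "" sets an empty passphrase and -f sets the output file, so no prompts appear.
key_dir="$(mktemp -d)"
key_file="$key_dir/id_rsa"
ssh-keygen -q -t rsa -b 4096 -C "you@example.com" -N "" -f "$key_file"

# Print the key length, fingerprint, comment and type of the public key.
key_summary="$(ssh-keygen -l -f "$key_file.pub")"
echo "$key_summary"
```

To inspect the key you created in the steps above instead, point `ssh-keygen -l -f` at `/home/jovyan/.ssh/id_rsa.pub`.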

## 6. Download and install R and RStudio

>**Note**: Only follow this step if you want to use the Analytical Platform as an R-based user. If you want to use Python and configured JupyterLab in the previous step, proceed to step 7.

### Download R

Download the relevant version of R for your device from the [CRAN website](https://cloud.r-project.org/). Run the installer to set up R on your device.

To build and deploy applications on the Analytical Platform using R, you need to set up RStudio, the R-based IDE (Integrated Development Environment) the Analytical Platform uses.

### Download RStudio Server

The Analytical Platform uses RStudio Server rather than RStudio Desktop; accessing RStudio from a browser removes the need to store RStudio data locally on your device. Download the free version of RStudio Server from the [Posit website](https://posit.co/downloads/).

### Create and add RStudio SSH key

To access GitHub repositories from RStudio, you need an SSH key that connects the two. Do not try to use an existing SSH key; each tool you use requires a unique key.

To create an SSH key in RStudio:

1. Open RStudio from the Analytical Platform Control Panel
2. Navigate to **Tools > Global Options**
3. In the **Options** window, select **Git/SVN** in the navigation menu
4. Select **Create RSA key** and then **Create**
5. When the **Information** window appears, select **Close**
6. Select **View public key** and copy the SSH key to your clipboard

You then need to add the SSH key to GitHub; see the [GitHub documentation](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account) for instructions.

Now that you have completed this guide, you are ready to begin using the Analytical Platform. See [training](training.md) for examples of different tasks you can perform.
32 changes: 32 additions & 0 deletions source/documentation/Get started/training.md
@@ -0,0 +1,32 @@
# Training

This page provides links to training resources you can use to familiarise yourself with the Analytical Platform and the tools comprising it. Note that unless otherwise stated, the Analytical Platform team are not responsible for these resources.

## Analytical Platform

- [MoJ Analytical IT Tools Strategy](https://moj-analytical-services.github.io/moj-analytical-it-tools-strategy/): describes best practice for using the Analytical Platform tools

## Git and GitHub

- [Git from the inside out](https://maryrosecook.com/blog/post/git-from-the-inside-out): in-depth essay that describes the Git workflow with commands included
- [GitHub Quickstart](https://docs.github.com/en/get-started/quickstart/hello-world): quickstart guide including Hello World exercise
- [GitHub Skills](https://github.com/skills): interactive GitHub courses for beginners and experts
- [GitHub Training Manual](https://githubtraining.github.io/training-manual/#/01_getting_ready_for_class): tutorials on how to use basic and advanced GitHub features
- [Happy Git and GitHub for the useR](https://happygitwithr.com/): unofficial instructions on how to use Git with RStudio
- [Learn Git Branching](https://learngitbranching.js.org/): interactive tutorial on how to use branches in Git
- [git - the simple guide](http://rogerdudler.github.io/git-guide/): covers basic Git workflows
- [Understanding Git and GitHub](https://github.com/moj-analytical-services/Coffee-and-Coding/tree/master/2020-10-08%20Understanding%20Git%20and%20GitHub): Coffee and Coding session materials (October 2020)

## Python and JupyterLab

- [intro-to-python](https://github.com/moj-analytical-services/intro-to-python): self-paced introduction to JupyterLab and Python
- [python-training-iterables](https://github.com/moj-analytical-services/python-training-iterables): self-paced Python training module "Advanced Iterables", which covers lists, dictionaries, tuples and sets; how to convert, analyse and manipulate them; how to use comprehensions to create them; and how to create custom sequences with generator functions

## R and RStudio

- [RStudio documentation](https://docs.posit.co/ide/user/ide/get-started/): guide on how to perform basic data visualisation using RStudio
- [Hands-On Programming with R](https://rstudio-education.github.io/hopr/index.html): lessons on how to program in R, with applied examples


## SQL
- [sql_training](https://github.com/moj-analytical-services/sql_training): training session for using SQL (Amazon Athena) with the Analytical Platform; for details of when the next session is taking place, [raise an Analytical Platform Support Request](https://github.com/ministryofjustice/data-platform-support/issues/new/choose)
61 changes: 61 additions & 0 deletions source/documentation/Overview/index.md
@@ -0,0 +1,61 @@
# Overview

This page introduces the Analytical Platform (AP) and the benefits of using it.

The Analytical Platform is a data analysis platform made up of tools, packages and datasets for creating applications that utilise data within the Ministry of Justice (MoJ). The AP provides development environments in both Python (JupyterLab) and R (RStudio), allowing you multiple ways to query, analyse and model data.

## Intended users

Primarily intended for data analysts in the Data and Analytical Services Directorate, the Analytical Platform also hosts users from:

- Criminal Injuries Compensation Authority (CICA)
- HM Courts & Tribunals Service (HMCTS)
- HM Prison and Probation Service (HMPPS)
- Legal Aid Agency (LAA)
- Office of the Public Guardian (OPG)

We can also host other MoJ organisations. [Contact us][contact] to discuss your options.

### Knowledge requirements

The Analytical Platform incorporates a variety of technical tools and concepts. Our community provides basic training materials for some of these, but we recommend that, as a minimum, you have working knowledge of the following before using the platform:

- Amazon Athena and S3: to create, manipulate and query data
- GitHub and GitHub Actions: to manage your application code
- Python or R: to develop applications on the Analytical Platform
- SQL: to query and transform data

## Benefits

In addition to Python and R compatibility, benefits of using the Analytical Platform include:

- **modern data tools and services**:
  - the ability to freely install packages from CRAN and PyPI to perform advanced analytical techniques, such as text mining, predictive analytics and data visualisation
  - compatibility with current cloud data services, such as Amazon Athena, Glue and Redshift, offering scalability and a managed service at commodity pay-as-you-go prices
  - for more information, see the [Analytical Platform's list of tools and packages][tools]
- **centralised data**:
  - our Data Engineering team converts raw data from operational systems into structures and excerpts
  - we hold data files in Amazon S3 for ease of use, so you can load them into your code or run SQL queries directly using Amazon Athena
  - users can also upload data to the AP from other sources and share them with granular access controls, subject to normal data protection processes; for more information, see [Data Protection Impact Assessment][DPIA]
- **reproducible analysis**: the AP provides tools to develop reproducible analytical pipelines (RAPs) that automate time-consuming and repetitive tasks, allowing you to focus on interpreting the results. These tools include:
  - snapshots of datasets, which are taken and versioned when the datasets are imported into the AP
  - standardised system libraries in GitHub
  - a standardised virtual machine that can run RStudio or Jupyter, or code running in an explicitly defined Dockerfile
- **secure environments**: we host the Analytical Platform in a cloud-based ecosystem that is easy to access remotely from all MoJ IT systems. Designed for data at security classifications OFFICIAL and OFFICIAL-SENSITIVE, we follow NCSC Cloud Security Principles, implementing features such as:
  - two-factor authentication
  - data encryption at rest and in transit
  - granular access control
  - extensive tracking of user behaviour, user privilege requests/changes and data flows
  - multiple isolation levels between users and system components
  - resilience and high availability to provide optimal performance and uptime

> **Note**: The Analytical Platform does not currently provide the following:

- production apps at scale
- management information
- real-time data; however, the Airflow tool can schedule data processing as frequently as every few minutes
- pure data archival: Amazon S3, which the AP uses for data storage, does not offer index or search facilities
  - we can set up a custom bucket policy to archive data to S3-IA or Glacier but recommend exploring SaaS alternatives, such as SharePoint or Google Drive

[contact]: mailto:[email protected]
[tools]: tools-list.md
[DPIA]: dpia.md
[RAPs]: reproducible-analytical-pipelines.md