diff --git a/source/documentation/Get started/contribute.md b/source/documentation/Get started/contribute.md new file mode 100644 index 00000000..e979356e --- /dev/null +++ b/source/documentation/Get started/contribute.md @@ -0,0 +1,9 @@ +# Contribute + +We welcome contributions to our documentation from the Analytical Platform user community. If you see anything on this website that is inaccurate, or have a new piece of content in mind, we invite you to collaborate with us. Your input helps us ensure we are meeting our users' needs. + +After you complete the [Quickstart guide](quickstart.md), you can contribute to the Analytical Platform guidance. Before making any changes, contact support in the **#analytical-platform-support** Slack channel in the **Justice Digital workspace** to discuss the content you are proposing. We will assign someone to review your content before publishing it. + +>**Note:** Make sure your changes are on a branch. **Do not** edit the main branch. + +When writing content, consider the knowledge and requirements of the people you are writing for. Following the [GDS style guide](https://www.gov.uk/guidance/style-guide/a-to-z-of-gov-uk-style) helps ensure you are meeting our writing standards and users' needs, and reduces the number of edits you will need to make before we publish your content. \ No newline at end of file diff --git a/source/documentation/Get started/index.md b/source/documentation/Get started/index.md new file mode 100644 index 00000000..e69de29b diff --git a/source/documentation/Get started/quickstart.md b/source/documentation/Get started/quickstart.md new file mode 100644 index 00000000..ce5c2172 --- /dev/null +++ b/source/documentation/Get started/quickstart.md @@ -0,0 +1,210 @@ +# Quickstart guide + +This guide provides the instructions to set up the main accounts and services you need to use the Analytical Platform (AP). 
Once you complete it, you can: + +- access the Analytical Platform Control Panel +- explore data on the Analytical Platform +- begin developing your code in either JupyterLab or RStudio +- contribute to the Analytical Platform User Guidance + +## Before you begin + +To use this guide, you need the following: + +- a Ministry of Justice-issued Office 365 account +- a Ministry of Justice-issued laptop you can install apps on +- a mobile device you can install apps on +- access to the **Justice Digital workspace** on Slack + +Complete this guide in order, following each step closely. If you encounter issues, [raise a ticket on GitHub issues](https://github.com/ministryofjustice/data-platform-support/issues/new/choose) or email **analytical_platform@digital.justice.gov.uk**. A member of the Analytical Platform team will contact you. + +## 1. Read Terms of use + +For Analytical Platform best practice, you are required to follow certain guidelines. Read the following, ensuring you follow them when using the platform: + +- [Acceptable use policy](aup.md): covers the way you should use the Analytical Platform and its associated tools and services +- [Data and Analytical Services Directorate's (DASD) coding standards](https://moj-analytical-services.github.io/our-coding-standards/): principles outlining how you should write and review code +- [MoJ Analytical IT Tools Strategy](https://moj-analytical-services.github.io/moj-analytical-it-tools-strategy/): describes recommended ways of working on the Analytical Platform + +## 2. Create Slack account + +We use Slack to communicate status updates, such as scheduled maintenance for the Analytical Platform. You can also use it to communicate with our support team and the Analytical Platform user community. + +There are two workspaces we recommend joining: [Justice Digital](https://mojdt.slack.com/) and [ASD](https://asdslack.slack.com/). To join, while signed in to your work email, navigate to each workspace and request to join. 
Note that workspace moderators only consider users with email addresses ending in the following: + +- @justice.gsi.gov.uk +- @digital.justice.gov.uk +- @cjs.gsi.gov.uk +- @noms.gsi.gov.uk +- @legalaid.gsi.gov.uk +- @justice.gov.uk +- @judiciary.uk + +> **Note**: You can only access the Justice Digital and ASD workspaces using DOM1 and MoJ Digital and Technology MacBooks. If you use Quantum, you will need to access Slack using a mobile device instead. + +### Join Slack channels + +Join the following Slack channels in the **ASD workspace**: +- **#ask-data-engineering**: for discussing data engineering and making technical queries to the Data Engineering team regarding Airflow +- **#data_science**: for discussing data science tools and techniques with the Ministry of Justice's Data Science community +- **#git**: for discussing Git tooling with the wider Ministry of Justice community +- **#python**: for discussing Python programming with the wider Ministry of Justice community +- **#r** and **#intro_r**: for discussing R programming with the wider Ministry of Justice community; #intro_r is aimed at new users + +Additionally, in the **Justice Digital workspace**, join the following: +- [**#analytical-platform-support**](https://mojdt.slack.com/archives/C4PF7QAJZ): for tracking [support queries raised by users of the Analytical Platform on GitHub Issues](https://github.com/ministryofjustice/data-platform-support/issues) +- [**#ask-operations-engineering**](https://mojdt.slack.com/archives/C01BUKJSZD4): for requesting support with GitHub; you can use this channel to request access to the Analytical Platform later in this guide + +## 3. Create GitHub account + +Using your work email address (ending in either **justice.gov.uk** or **digital.justice.gov.uk**), [sign up for a GitHub account](https://github.com/join). See the [GitHub documentation](https://docs.github.com/en/get-started/signing-up-for-github/signing-up-for-a-new-github-account) for instructions. 
Ensure that: + +- your account uses the free plan subscription +- you choose a username that does not contain upper-case characters (for good practice) +- you set your Git username for **every** repository on your device; see the [GitHub documentation](https://docs.github.com/en/get-started/getting-started-with-git/setting-your-username-in-git) for instructions +- you follow the [Ministry of Justice's best practice guidelines](https://security-guidance.service.justice.gov.uk/passwords/#passwords) when setting your password +- you [configure two-factor authentication (2FA)](https://docs.github.com/en/authentication/securing-your-account-with-two-factor-authentication-2fa/configuring-two-factor-authentication) on your mobile device; we recommend using either Google Authenticator or Microsoft Authenticator for 2FA + - For more information on 2FA within the Ministry of Justice, see the [MoJ's Security Guidance](https://security-guidance.service.justice.gov.uk/multi-factor-authentication-mfa-guide/#multi-factor-authentication-mfa-guide) + +## 4. Access the Analytical Platform + +Once you have your GitHub account, there are **two more steps** to complete before you can access the Analytical Platform: + +- joining the MoJ Analytical Services GitHub organisation +- signing in to the Analytical Platform's Control Panel + +### Join MoJ Analytical Services + +After configuring your GitHub account you can request access to the Analytical Platform. + +Navigate to the [MoJ Analytical Services organisation](https://github.com/moj-analytical-services) and request to join it. The Operations Engineering team will review the request. If they approve your request, you will receive an email with a link to accept the invite. 
+ +If you do not receive a response within 24 hours, either request access in the [**#ask-operations-engineering**](https://mojdt.slack.com/archives/C01BUKJSZD4) Slack channel or email **operations-engineering@digital.justice.gov.uk**, providing your GitHub username in your message. + +### Sign in to the Control Panel + +The main entry point to the Analytical Platform is the [Control Panel](https://controlpanel.services.analytical-platform.service.justice.gov.uk/). From there, you configure core tools such as JupyterLab and RStudio. + +When you access the Control Panel for the first time, a prompt will appear requiring you to configure 2FA using your mobile device. Note that while you use your GitHub account to access the Control Panel, **this 2FA is separate from the one you use to log in to GitHub**. You may need to disable browser extensions such as Dark Mode during the 2FA setup process. + +After you log in to the Control Panel for the first time, you can begin requesting access to data on the platform. + +## 5. Download and install JupyterLab + +>**Note**: Only follow this step if you want to use the Analytical Platform as a Python-based user. If you want to use R, proceed to step 6 and configure RStudio instead. + +JupyterLab is a Python package, which means you install it using pip, the package installer for Python. You may already have Python or pip installed; run the following commands to check: + +``` +$ python --version +Python 3.N.N +$ python -m pip --version +pip X.Y.Z from ... (python 3.N.N) +``` + +### Download Python + +If you do not have Python installed on your device, download the latest version from the [Python website](https://www.python.org/downloads/). + +### Install pip + +Once you have Python installed, install pip and upgrade it to the latest version using the following command: + +``` +$ python -m ensurepip --upgrade +``` + +Verify your Python and pip installations: + +``` +$ python -m pip --version +pip X.Y.Z from ... 
(python 3.N.N) +``` + +### Install JupyterLab + +To build and deploy applications on the Analytical Platform using Python, you need to set up JupyterLab, the Python-based IDE (Integrated Development Environment) the Analytical Platform uses. + +Install JupyterLab with pip: + +``` +pip install jupyterlab +``` + +To launch JupyterLab run: + +``` +jupyter-lab +``` + +### Install Jupyter Notebook + +Jupyter Notebook is a server-client application that allows you to edit and run notebooks, which are documents containing live code, equations, visualisations and text. To install the classic notebook, which is the Jupyter Notebook web interface, run: + +``` +pip install notebook +``` + +To launch Jupyter Notebook run: + +``` +jupyter notebook +``` + +### Create and add JupyterLab SSH key + +To access GitHub repositories from JupyterLab, you need an SSH key to connect the two. Do not try to use an existing SSH key; each tool you use requires a unique key. + +To create an SSH key in JupyterLab: + +1. Open JupyterLab from the Analytical Platform Control Panel +2. Select the **+** icon in the file browser to open a new **Launcher** tab +3. Navigate to the **Other** section and select **Terminal** +4. Run the following command in your terminal, replacing **your_email@example.com** with the email address you used to sign up for GitHub: + +``` +$ ssh-keygen -t rsa -b 4096 -C "your_email@example.com" +``` + +5. The response will ask you to choose a directory to save the key in; press Enter to accept the default location +6. The response will also ask you to set a passphrase; press Enter to not set a passphrase +7. To view the SSH key, run: + +``` +$ cat /home/jovyan/.ssh/id_rsa.pub +``` + +8. Copy the SSH key to your clipboard + +You then need to add the SSH key to GitHub; see the [GitHub documentation](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account) for instructions. + +## 6. 
Download and install R and RStudio + +>**Note**: Only follow this step if you want to use the Analytical Platform as an R-based user. If you want to use Python and configured JupyterLab in the previous step, proceed to step 7. + +### Download R + +Download the relevant version of R for your device from the [CRAN website](https://cloud.r-project.org/). Run the installer to set up R on your device. + +To build and deploy applications on the Analytical Platform using R, you need to set up RStudio, the R-based IDE (Integrated Development Environment) the Analytical Platform uses. + +### Download RStudio Server + +The Analytical Platform uses RStudio Server rather than RStudio desktop; accessing RStudio from a browser removes the need for you to store RStudio data on your device locally. Download the free version of RStudio Server from the [Posit website](https://posit.co/downloads/?_gl=1*31dazx*_ga*MTU4ODQ1Njg4Mi4xNjg3MzU2OTEz*_ga_2C0WZ1JHG0*MTY4NzM1NjkxMi4xLjEuMTY4NzM1Njk3MS4wLjAuMA..). + +### Create and add RStudio SSH key + +To access GitHub repositories from RStudio, you need an SSH key to connect the two. Do not try to use an existing SSH key; each tool you use requires a unique key. + +To create an SSH key in RStudio: + +1. Open RStudio from the Analytical Platform Control Panel +2. Navigate to **Tools>Global Options** +3. In the **Options** window, select **Git/SVN** in the navigation menu +4. Select **Create RSA key** and then **Create** +5. When the **Information** window appears, select **Close** +6. Select **View public key** and copy the SSH key to your clipboard + +You then need to add the SSH key to GitHub; see the [GitHub documentation](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account) for instructions. + +Now that you have completed this guide, you are ready to begin using the Analytical Platform. 
See the [Training Resources](training.md) for examples of using coding tools on the Analytical Platform. \ No newline at end of file diff --git a/source/documentation/Get started/training.md b/source/documentation/Get started/training.md new file mode 100644 index 00000000..2cc10dbf --- /dev/null +++ b/source/documentation/Get started/training.md @@ -0,0 +1,32 @@ +# Training Resources + +This page provides links to training resources you can use to familiarise yourself with the Analytical Platform and the tools comprising it. Note that unless otherwise stated, the Analytical Platform team are not responsible for these resources. + +## Analytical Platform + +- [MoJ Analytical IT Tools Strategy](https://moj-analytical-services.github.io/moj-analytical-it-tools-strategy/): describes best practice for using the Analytical Platform tools + +## Git and GitHub + +- [Git from the inside out](https://maryrosecook.com/blog/post/git-from-the-inside-out): in-depth essay that describes the Git workflow with commands included +- [GitHub Quickstart](https://docs.github.com/en/get-started/quickstart/hello-world): quickstart guide including Hello World exercise +- [GitHub Skills](https://github.com/skills): interactive GitHub courses for beginners and experts +- [GitHub Training Manual](https://githubtraining.github.io/training-manual/#/01_getting_ready_for_class): tutorials on how to use basic and advanced GitHub features +- [Happy Git and GitHub for the useR](https://happygitwithr.com/): unofficial instructions on how to use Git with RStudio +- [Learn Git Branching](https://learngitbranching.js.org/): interactive tutorial on how to use branches in Git +- [git - the simple guide](http://rogerdudler.github.io/git-guide/): covers basic Git workflows +- [Understanding Git and GitHub](https://github.com/moj-analytical-services/Coffee-and-Coding/tree/master/2020-10-08%20Understanding%20Git%20and%20GitHub): + +## Python and JupyterLab + +- 
[intro-to-python](https://github.com/moj-analytical-services/intro-to-python): self-paced introduction to JupyterLab and Python +- [python-training-iterables](https://github.com/moj-analytical-services/python-training-iterables): self-paced Python training module "Advanced Iterables", which covers lists, dictionaries, tuples and sets, how to convert, analyse and manipulate them, how to use comprehensions to create them, and how to create custom sequences with generator functions + +## R and RStudio + +- [RStudio documentation](https://docs.posit.co/ide/user/ide/get-started/): guide on how to perform basic data visualisation using RStudio +- [Hands-On Programming with R](https://rstudio-education.github.io/hopr/index.html): lessons on how to program in R, with applied examples + + +## SQL +- [sql_training](https://github.com/moj-analytical-services/sql_training): training session for using SQL (Amazon Athena) with the Analytical Platform; for details of when the next session is taking place, [raise an Analytical Platform Support Request](https://github.com/ministryofjustice/data-platform-support/issues/new/choose) \ No newline at end of file diff --git a/source/documentation/Overview/index.md b/source/documentation/Overview/index.md new file mode 100644 index 00000000..e3b3b00b --- /dev/null +++ b/source/documentation/Overview/index.md @@ -0,0 +1,61 @@ +# Overview + +This page provides an introduction to the Analytical Platform (AP), and the benefits of using it. + +The Analytical Platform is a data analysis platform made up of tools, packages and datasets for creating applications that utilise data within the Ministry of Justice (MoJ). The AP provides development environments in both Python (JupyterLab) and R (RStudio), allowing you multiple ways to query, analyse and model data. 
+ +## Intended users + +Primarily intended for data analysts in the Data and Analytical Services Directorate, the Analytical Platform also hosts users from: +- Criminal Injuries Compensation Authority (CICA) +- HM Courts & Tribunals Service (HMCTS) +- HM Prison and Probation Service (HMPPS) +- Legal Aid Agency (LAA) +- Office of the Public Guardian (OPG) + +We can also host other MoJ organisations. [Contact us][contact] to discuss your options. + +### Knowledge requirements + +The Analytical Platform incorporates a variety of technical tools and concepts. While our community provides basic training materials on how to use some of these, we recommend that, as a minimum, you have working knowledge of the following before using the platform: + +- Amazon Athena and S3: to create, manipulate and query data +- GitHub and GitHub Actions: to manage your application code +- Python or R: to develop applications on the Analytical Platform +- SQL: to query and transform data + +## Benefits + +In addition to Python and R compatibility, benefits of using the Analytical Platform include: + +- **modern data tools and services**: + - the ability to freely install packages from CRAN and PyPI to perform advanced analytical techniques, such as text mining, predictive analytics and data visualisation + - compatibility with current cloud data services, such as Amazon Athena, Glue and Redshift, offering scalability and a managed service at commodity pay-as-you-go prices + - for more information, see the [Analytical Platform's list of tools and packages][tools] +- **centralised data**: + - our Data Engineering team converts raw data from operational systems into structures and excerpts + - we hold data files in Amazon S3 for ease of use, to load into your code or run SQL queries directly using Amazon Athena + - users can also upload data to the AP from other sources and share them with granular access controls, subject to normal data protection processes; for more information, see [Data Protection Impact 
Assessment][DPIA] +- **reproducible analysis**: the AP provides tools to develop reproducible analytical pipelines (RAPs) to automate time-consuming and repetitive tasks, allowing you to focus on interpreting the results with the following elements: + - when datasets are imported into the AP, snapshots of them are taken and versioned + - standardised system libraries in GitHub + - a standardised virtual machine that can run RStudio or Jupyter, or code running in an explicitly defined Dockerfile +- **secure environments**: we host the Analytical Platform in a cloud-based ecosystem that is easy to access remotely from all MoJ IT systems. The platform is designed for data at security classifications OFFICIAL and OFFICIAL-SENSITIVE. We follow NCSC Cloud Security Principles, implementing features such as: + - two-factor authentication + - data encryption at rest and in transit + - granular access control + - extensive tracking of user behaviour, user privilege requests/changes and data flows + - multiple isolation levels between users and system components + - resilience and high availability to provide optimal performance and uptime + +> **Note**: The Analytical Platform does not currently provide the following: +- production apps at scale +- management information +- real-time data; however, the Airflow tool can schedule data processing as frequently as every few minutes +- pure data archival: Amazon S3, which the AP uses for data storage, does not offer index or search facilities + - we can set up a custom bucket policy to archive data to S3-IA or Glacier but recommend exploring SaaS alternatives, such as SharePoint or Google Drive + +[contact]: mailto:analytical_platform@digital.justice.gov.uk +[tools]: tools-list.md +[DPIA]: dpia.md +[RAPs]: reproducible-analytical-pipelines.md diff --git a/source/documentation/Overview/tools-list.md b/source/documentation/Overview/tools-list.md new file mode 100644 index 00000000..18e6b915 --- /dev/null +++ 
b/source/documentation/Overview/tools-list.md @@ -0,0 +1,75 @@ +# Tools and services + +The Analytical Platform (AP) provides a range of tools, services and packages. This page describes the core tools and services that comprise the platform, as well as additional packages you can use to perform data analysis. + +Note that the AP team only provides support for third-party tools and services for features directly involving the Analytical Platform, such as bespoke configurations. For any other support with third-party tools and services, see the vendor's documentation; we have provided links where possible. + +## Core tools and services + +### [Airflow](airflow) +A tool for scheduling and monitoring workflows. + +### [Control panel](control-panel.html) +Main entry point to the Analytical Platform. Allows you to configure tools and view their status. + +### [Create a Derived Table](create-a-derived-table) +A tool for creating persistent derived tables in Athena. + +### [RStudio](rstudio) +Development environment for writing R code and R Shiny apps. For more information, see the [RStudio documentation](https://docs.posit.co/ide/user/). + +### [JupyterLab](jupyterlab) +Development environment for writing Python code. For more information, see the [JupyterLab documentation](https://jupyterlab.readthedocs.io/en/latest/). + +### [Data Discovery](../data/curated-databases/data-documentation) +Allows you to browse the databases that are available on the Analytical Platform. + +### [Data Uploader](data-uploader) +Web application for uploading data (.csv, .json, .jsonl) to the Analytical Platform in a standardised way. + +### [Data Extractor](https://github.com/ministryofjustice/data-engineering-data-extractor) + +Extracts data from applications, services or microservices to the Analytical Platform in a standardised way. + +### [GitHub](https://github.com/) + +Online hosting platform for Git. 
Git is a distributed version control system that allows you to track changes in files, while GitHub hosts the Analytical Platform's code. + +### [Register my data](https://github.com/ministryofjustice/register-my-data) + +Moves data from microservices into the Analytical Platform's [curated databases](../data/curated-databases) in a standardised way. + +## Python packages + +The Data Engineering team maintain Python packages that help with data manipulation. The following are the packages we consider the most useful for doing so: + +### [athena_tools](https://github.com/moj-analytical-services/athena_tools) +Provides a simple way to create small persisting ad hoc databases. Currently in Alpha. + +### [dataengineeringutils3](https://github.com/moj-analytical-services/dataengineeringutils3) +Collection of useful utilities for interacting with AWS. + +### [mojap-arrow-pd-parser](https://github.com/moj-analytical-services/mojap-arrow-pd-parser) +Ensures type conformance when reading with arrow or pandas. + +### [mojap-aws-tools-demo](https://github.com/moj-analytical-services/mojap-aws-tools-demo) +Contains helpful guides on how to use the Python packages listed in this section. You can also ask for help with these in the **#ask-data-engineering** Slack channel on the **Justice Digital workspace**. + +### [mojap-metadata](https://github.com/moj-analytical-services/mojap-metadata) +Defined metadata that interacts with other packages (including arrow-pd-parser) to ensure type conformance, as well as schema converters. + +### [pydbtools](https://github.com/moj-analytical-services/pydbtools) +Queries MoJAP Athena databases with features such as temp table creation. + +## R packages + +The following native R packages remove the need for using Python in R projects. + +### [dbtools](https://github.com/moj-analytical-services/dbtools) +Allows you to access databases from the Analytical Platform. The Data Engineering team maintains this package. 
+ +### [Rdbtools](https://github.com/moj-analytical-services/Rdbtools) +Allows you to access Athena databases from the Analytical Platform. The Analytical Platform community maintain this package. + +### [Rs3tools](https://github.com/moj-analytical-services/Rs3tools) +Allows you to access AWS S3 from the Analytical Platform, which is mainly compatible with the legacy package [s3tools](https://github.com/moj-analytical-services/s3tools). The Analytical Platform community maintain this package. \ No newline at end of file