Table of Contents
- What is being created?
- Modifications to the CFN templates
- Before your first deployment
- Making Changes to the infrastructure
- How to run the Amazon SageMaker Notebook code for detecting fake news
- Generic CDK readme
The AWS Cloud Development Kit (CDK) stacks in this repository modify and deploy the AWS CloudFormation templates for the Neptune ML and export functionality. The original CloudFormation templates can be found in the following locations:
- The core stack
- The Amazon Neptune ML stack
- The nested Amazon SageMaker Notebook stack inside of the Neptune ML stack
For more information on this template and the Neptune export service, see https://docs.aws.amazon.com/neptune/latest/userguide/export-service.html
There are a few modifications that happen to the CloudFormation stacks:
The setup.sh script performs the following tasks for you:
- It downloads the CloudFormation templates from the stacks locally. This is done so that you can interact with the template objects in CDK.
- It adds the export names to the CloudFormation templates so that you can reference them elsewhere and between the two stacks. This is especially important since the NeptuneMlCoreStack uses the outputs from the NeptuneBase stack as parameters of the template.
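As a rough illustration of the export-name step, the sketch below adds an Export entry to every output of a template held as plain JSON. It is not the code from setup.sh: the addExportNames helper, the "<prefix>-<OutputKey>" naming scheme, and the sample output are all assumptions made for the example.

```typescript
// Minimal shape of the parts of a CloudFormation template this sketch touches.
interface TemplateOutput {
  Value: unknown;
  Description?: string;
  Export?: { Name: string };
}

interface Template {
  Outputs?: Record<string, TemplateOutput>;
}

// Returns a copy of the template in which every output has an export name.
// Outputs that already define an Export are left unchanged.
// The "<prefix>-<OutputKey>" naming scheme is an assumption for illustration.
function addExportNames(template: Template, prefix: string): Template {
  const source = template.Outputs ?? {};
  const outputs: Record<string, TemplateOutput> = {};
  for (const key of Object.keys(source)) {
    const output = source[key];
    outputs[key] = { ...output, Export: output.Export ?? { Name: `${prefix}-${key}` } };
  }
  return { ...template, Outputs: outputs };
}

// Hypothetical output, standing in for one from the base stack template.
const baseTemplate: Template = {
  Outputs: {
    DBClusterEndpoint: { Value: { "Fn::GetAtt": ["NeptuneDBCluster", "Endpoint"] } },
  },
};

const exported = addExportNames(baseTemplate, "NeptuneBase");
console.log(JSON.stringify(exported.Outputs, null, 2));
```

Once an output carries an export name, another stack can import the value by that name, which is what lets the second stack consume the outputs of the first as parameters.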
The updates to the base stack are all related to the desired query timeout. If these changes are not made, the %%neptune_ml export command fails because of a timeout. So, we increase the timeout of both the Neptune DB cluster parameter group and the Neptune instance parameter group. To do this, we:
- First, update the NeptuneQueryTimeout parameter in the CloudFormation template. This updates the value for the NeptuneDBParameterGroup.
- Then, update the query timeout for the NeptuneDBClusterParameterGroup, as it does not reference the template parameter.
- Finally, associate the NeptuneDBClusterParameterGroup with the Neptune DB cluster, as the cluster does not reference the NeptuneDBClusterParameterGroup in the original CloudFormation template.
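The three steps above can be sketched on a template held as plain JSON. The repository applies them through CDK, so this is only an illustration of the resulting template changes. The applyQueryTimeout helper, the NeptuneDBCluster logical ID, the sample template, and the timeout value are assumptions; neptune_query_timeout and DBClusterParameterGroupName are used here as the relevant Neptune property names, which is worth checking against the CloudFormation reference.

```typescript
interface ResourceJson {
  Type: string;
  Properties?: Record<string, any>;
}

interface NeptuneTemplate {
  Parameters: Record<string, { Type: string; Default?: string | number }>;
  Resources: Record<string, ResourceJson>;
}

// Applies the three timeout-related changes in place.
function applyQueryTimeout(template: NeptuneTemplate, timeoutMs: number): void {
  const param = template.Parameters["NeptuneQueryTimeout"];
  const clusterGroup = template.Resources["NeptuneDBClusterParameterGroup"];
  const cluster = template.Resources["NeptuneDBCluster"]; // assumed logical ID
  if (!param || !clusterGroup || !cluster) {
    throw new Error("template does not contain the expected logical IDs");
  }

  // 1. The template parameter feeds NeptuneDBParameterGroup.
  param.Default = timeoutMs;

  // 2. The cluster parameter group does not reference the template
  //    parameter, so its timeout is set directly.
  const groupProps = clusterGroup.Properties ?? {};
  groupProps.Parameters = { ...groupProps.Parameters, neptune_query_timeout: timeoutMs };
  clusterGroup.Properties = groupProps;

  // 3. Point the cluster at the cluster parameter group.
  cluster.Properties = {
    ...cluster.Properties,
    DBClusterParameterGroupName: { Ref: "NeptuneDBClusterParameterGroup" },
  };
}

// Hypothetical, heavily trimmed stand-in for the base stack template.
const template: NeptuneTemplate = {
  Parameters: { NeptuneQueryTimeout: { Type: "Number", Default: 20000 } },
  Resources: {
    NeptuneDBClusterParameterGroup: {
      Type: "AWS::Neptune::DBClusterParameterGroup",
      Properties: { Family: "neptune1", Parameters: { neptune_enable_audit_log: 0 } },
    },
    NeptuneDBCluster: { Type: "AWS::Neptune::DBCluster", Properties: {} },
  },
};

applyQueryTimeout(template, 1800000); // placeholder value in milliseconds
console.log(JSON.stringify(template, null, 2));
```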
We also modify the DB instance type by setting the parameter for the CloudFormation template.
For the Neptune ML core stack, the following changes are made:
- Update the stack parameters.
- Load the nested CloudFormation stack so that we can make the update below.
- Update the permissions for the AWS Identity and Access Management (IAM) role associated with the SageMaker notebook.
First, we get the imports from the base stack that are needed for the parameters of the Neptune ML core stack CDK template. We pass those values in as parameters to the CloudFormation template. We update the notebook instance type as well.
Note that the AWS IAM role associated with the SageMaker notebook is defined in a nested stack. We can load the nested stack by specifying it as an input on the CfnInclude object we create (called parentTemplate).
Then, we can create a CfnInclude object for the child stack by referencing the name of the AWS::CloudFormation::Stack resource in the parent CloudFormation template. In this case it's NeptuneSagemakerNotebook. Check the neptune_ml_core_stack.json file to see how this is created in CloudFormation if desired.
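If you need to find the name of a nested stack in some other parent template, one option is to scan the template for resources of type AWS::CloudFormation::Stack. The sketch below does this on plain JSON; the nestedStackIds helper and the trimmed sample template are assumptions for illustration and are not part of the repository.

```typescript
interface StackResource {
  Type: string;
  Properties?: Record<string, unknown>;
}

interface ParentTemplate {
  Resources: Record<string, StackResource>;
}

// Returns the logical IDs of all nested stacks declared in a parent template.
function nestedStackIds(template: ParentTemplate): string[] {
  return Object.keys(template.Resources).filter(
    (id) => template.Resources[id].Type === "AWS::CloudFormation::Stack"
  );
}

// Hypothetical, trimmed stand-in for neptune_ml_core_stack.json.
const parent: ParentTemplate = {
  Resources: {
    NeptuneSagemakerNotebook: {
      Type: "AWS::CloudFormation::Stack",
      Properties: { TemplateURL: "<nested template URL>" },
    },
    SomeOtherResource: { Type: "AWS::IAM::Role" },
  },
};

const nestedIds = nestedStackIds(parent);
console.log(nestedIds);
```

Each ID returned this way is a name that can be used when loading the nested stack and when retrieving the child template from the parent.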
The AWS IAM role used for the SageMaker notebook is missing two actions that are required for the detect-fake-news notebooks:
- s3:createBucket, which is used to create a bucket for the default session for the notebook
- SageMaker:ListTrainingJobsForHyperParameterTuningJob, which is required to describe the hyperparameter optimization (HPO) job
We get the IAM role from the child template, and then add both actions to the role policy.
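The policy change itself amounts to appending two actions to a statement. The sketch below shows that on a policy statement held as plain JSON, assuming a hypothetical addActions helper and a made-up starting statement; in the repository the change is applied to the role obtained from the child template. IAM action names are case-insensitive, so duplicates are detected without regard to case.

```typescript
interface PolicyStatement {
  Effect: string;
  Action: string | string[];
  Resource: unknown;
}

// Returns a copy of the statement with the given actions appended,
// skipping any action that is already present (ignoring case).
function addActions(statement: PolicyStatement, actions: string[]): PolicyStatement {
  const existing = Array.isArray(statement.Action) ? statement.Action : [statement.Action];
  const merged = existing.slice();
  for (const action of actions) {
    const present = merged.some((a) => a.toLowerCase() === action.toLowerCase());
    if (!present) {
      merged.push(action);
    }
  }
  return { ...statement, Action: merged };
}

// Hypothetical statement, standing in for one from the notebook role policy.
const original: PolicyStatement = {
  Effect: "Allow",
  Action: ["s3:GetObject"],
  Resource: "*",
};

const updated = addActions(original, [
  "s3:createBucket",
  "SageMaker:ListTrainingJobsForHyperParameterTuningJob",
]);
console.log(JSON.stringify(updated, null, 2));
```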
You must run the following commands before your first deployment:
cd /path/to/neptune_ml/folder/
chmod +x ./setup.sh && ./setup.sh
See the Modifications to the CFN templates section for examples of how to update various resources in the CDK. Note that examples are given for parent stacks as well as nested stacks.
The code for the notebooks is located here and contains a few notebooks that you'll want to run in the SageMaker notebook you've created as part of the CDK deployment.
- In the AWS console, navigate to the Amazon SageMaker Notebook Instance that was created using these instructions.
- On the notebook instances page, click Open JupyterLab for the notebook created by the CDK (name: aws-neptune-notebook-for-neptunedbcluster-).
- In JupyterLab, open a terminal.
- Enter the following command:
cd SageMaker && git clone https://github.com/aws-samples/amazon-neptune-ml-fake-news-detection.git
- On the left hand side, you should see a folder appear called amazon-neptune-ml-fake-news-detection. Note it may take a moment for it to appear.
- Open that folder and run the notebooks in the following order:
- create-graph-dataset
- load-graph-dataset
- detect-fake-news-neptune-ml
- inductive-inference
The cdk.json file tells the CDK Toolkit how to execute your app.
- npm install: installs the required packages
- npm run build: compile TypeScript to JS
- npm run watch: watch for changes and compile
- npm run test: perform the Jest unit tests
- cdk deploy: deploy this stack to your default AWS account/region
- cdk diff: compare deployed stack with current state
- cdk synth: emits the synthesized CloudFormation template