Skip to content

Latest commit

 

History

History
288 lines (226 loc) · 14.2 KB

application-overview.md

File metadata and controls

288 lines (226 loc) · 14.2 KB

Application Overview

This section aims to provide a high-level view of what components make up a self-hosted ConstructHub instance, and troubleshooting guidance where appropriate.

High-Level Architecture

The frontend of ConstructHub instances is a single-page web-app (developed in the cdklabs/construct-hub-webapp repository), served from an S3 bucket by a CloudFront Web Distribution.

The CloudFront Web Distribution also serves objects from a second S3 bucket that is used by the backend application to store indexed package data that is presented by the fronted.

The backend of ConstructHub instances is an event-driven, serverless application that performes the following tasks:

  1. A package source implementation notifes the ConstructHub instance of new packages by sending messages to an ingestion SQS queue.
  2. The ingestion function is triggered by the ingestion SQS queue, and verifies the new package version is compliant, before starting up the backend workflow.
  3. The backend workflow is a StepFunctions State Machine that orchestrates the necessary steps to fully index a package into a ConstructHub instance.
  4. The prune function enforces the configured deny-list and ensures previously indexed packages that are now part of the deny-list are removed from storage within an hour.
  5. The discovery canary, if configured, is a Lambda function the will periodically validate the hub is able to discover new packages within a predefined SLA.

1. Package Sources

ConstructHub provides two package source implementations: NpmJs and CodeArtifact.

  • The NpmJs source interfaces with the npmjs.com CouchDB replica (which is at replicate.npmjs.com/registry) by following it's _changes stream in search of relevant packages. When such a package is identified, a stager function is invoked, which stages the package tarball into an S3 bucket then notifies the ConstructHub ingestion SQS queue. The CouchDB follower is scheduled to run every 5 minutes, and stores the current CouchDB sequence ID in a specific object in the S3 bucket used for staging package tarballs.

    Back-filling is automatic for the NpmJs source. Upon initial deployment, it will start scanning the CouchDB _changes stream. Should there be a need to re-run a backfill of this source, the transaction marker object in S3 can be deleted to roll back to that initial transaction. The marker object is linked from the backend dashboard.

    • A high-severity alarm triggers if the NpmJs Follower is not running at the scheduled cadence, or if it encounters failures for more than 15 minutes.

    • A high-severity alarm triggers if the NpmJs Stager dead-letter queue is not empty.

    • Troubleshooting the NpmJs Follower can be done by inspecting its log traces in CloudWatch Logs, or by looking at service maps in the X-Ray console.

    • The NpmJs Follower produces a set of metrics that are automatically inserted in the Backend Dashboard, including the following:

      • NpmJsChangeAge shows how far behind the public npmjs.com registry the current CouchDB sequence ID is. In steady state (once the initial backfill has completed), this metric should always be below 5 minutes.
      • PackageVersionAge is the amount of time elapsed between the publication of a package version in the public npmjs.com registry, and when that was signalled to the ingestion SQS queue. In steady state, this metric should always be below 5 minutes.
      • UnprocessableEntity is the count of events received from the CouchDB instance that could not be processed. This metric is not emitted if no event was found unprocessable. The CloudWatch Logs for the NpmJs Follower will contain additional information about those events.
  • The CodeArtifact source leverages EventBridge events emitted by any CodeArtifact repository when packages it contains are modified (created, updated, deleted). It considers only events pertaining to npm packages published the specific CodeArtifact Repository that it is configured with. A Lambda Function verifies the package version from the event is eligible for ConstructHub (i.e: it is a jsii package, using an allowed license, etc...) before staging it in an S3 bucket, then notifying the ingestion SQS Queue.

    No backfill provision is currently implemented for the CodeArtifact source. If a ConstructHub instance is started off from a pre-existing CodeArtifact repository, the operator should manually inject all relevant packages from said repository into the ingestion queue.

    🚧 A managed back-fill procedure will be provided in the future.

    • A high-severity alarm triggers if the CodeArtifact Forwarder function encounters failures.

    • Troubleshooting the CodeArtifact Forwarder can be done by inspecting its log traces in CloudWatch Logs, or by looking at service maps in the X-Ray console.

  • Third party package-sources can also be used. Please refer to these sources' documentation for monitoring & troubleshooting guidance.

2. Ingestion

The ingestion process is implemented by a Lambda Function triggered directly from the ingestion SQS queue. It performs the following steps:

  1. Download the tarball from the S3 location indicated in the ingestion payload
  2. Validate the input payload using the integrity checksum
  3. Validate that it is eligible for indexing:
    • It contains a .jsii assembly document that is valid
    • It is released under an allowed license
    • Essential .jsii assembly corresponds to the package.json document
      • The package name must be identical
      • The package version must be identical
      • The license must be identical
    • The package version is not listed in the configured deny list
  4. Attempt to identify a LICENSE file bundled in the package
  5. Uploads the tarball to the package data S3 bucket
  6. Creates the manifest.json object in the package data S3 bucket, containing:
    • The contents of the LICENSE file (if one was found)
    • The publication timestamp for the package version
  7. Uploads the .jsii assembly to the package data S3 bucket as assembly.json
  8. Triggers the Backend Workflow for the package version

A high-severity alarm triggers if the ingestion function encounters failures, or if the ingestion SQS queue has messages older than 10 minutes approximately.

If the ingestion function fails for a particular queue message more than 5 times, that message will be moved into a dead-letter queue. A high-severity alarm triggers when the dead-letter queue is not empty.

Troubleshooting the ingestion function can be done by inspecting its log traces in CloudWatch Logs, or by looking at service maps in the X-Ray console. The function also produces several CloudWatch metrics that are visible in the Backend Dashboard, including:

  • InvalidTarball is the count of package versions that were rejected due to having an invalid tarball (missing the package.json file or .jsii assembly).
  • InvalidAssembly is the count of package versions that were rejected due to containing an invalid .jsii assembly (in most cases, these are old packages that were built using a pre-1.0 release of jsii that is no longer supported)
  • MismatchedIdentityRejections is the count of package versions that were rejected due to differences between data in the .jsii assembly and package.json files.
  • IneligibleLicense is the count of package versions that were rejected due to using a license that is not in the configured license allow-list.
  • FoundLicenseFile is the count of ingested package versions for which a LICENSE file could be identified.

3. Backend Workflow

The Backend Workflow is a StepFunctions State Machine that performs the following tasks:

  1. Execute the documentation rendering process for each supported language (TypeScript, Python, ...)
  2. If any documentation could be rendered, adds the package version to the catalog.json object (which is a no-op if the package version is not the latest known release of it's major line)

When any step of the State Machine fails, a message is sent to a dead-letter queue. That message includes information about the failure that happened (in case multiple failures happened, only one cause will be represented), and information about the State Machine execution (which can be used to review the full execution log in the AWS Console, or using the StepFunctions API).

Executions that successfully sent a message to the dead-letter queue will show as "success". Conversely, "failed" executions may not have a corresponding message in the dead-letter queue and must be troubleshooted starting from the failed execution instead.

A high-severity alarm trigegrs if the State Machine dead-letter queue is not empty, or if any execution fails.

Troubleshooting can be done by reviewing State Machine execution events in the StepFunctions console (or using the StepFunctions API), reviewing the log traces of each step in CloudWAtch Logs, or by looking service maps in the X-Ray console.

Messages from the dead-letter queue can be fed back to the State Machine by using the "Redrive DLQ" Lambda Function, that is linked from the Backend Dashboard.

4. Deny List Processes

Each ConstructHub instance can be configured with an optional set of deny-list rules, to prevent packgaes from being indexed in that instance. If a package was already indexed at the time it is added to the deny-list, all indexed assets for it will be deleted by a prune Lambda Function.

A high-severity alarm triggers if the prune function does not run at the configured cadence, or if it encounters failures.

Troubleshooting can be done by inspecting the log traces it produces in CloudWatch Logs, or by looking at service maps in the X-Ray console.

The prune function emits a Rules CloudWatch Metric that indicates how many deny-list rules it is currently enforcing. This could match the amount of rules that were configured on the ConstructHub instance.

5. Discovery Canary

The discovery canary is an optional canary that can be configured as part of the hub's deployment NpmJs package source. Its job is to continuously validate that the hub is able to discover and process packages in a timely manner.

The canary monitors the availability of new versions of a designated package in the ConstructHub instance (by default, this is construct-hub-probe), and emits metrics that help understand how much time elapses between the package publication to npmjs.com (construct-hub-probe gets a new version approximately every 3 hours), and when those packages are available to browse in ConstructHub.

A high-severity alarm triggers if the canary function is either malfunctioning or detects discovery SLA breaches. Troubleshooting these alarms is described in the operator runbook.

6. Feed Generation

Construct hub generates RSS/ATOM feed when the package catalog gets updated. The feed generator looks at the latest 100 packages and generates the feed. If the construct hub is configured to generate release notes, then the generated feed will contain the change log for the packages where it can be generated.

6. Release Notes Fetcher

Construct hub can be configured to generate release notes for the packages that are added to the catalog. The release notes are generated from Github when the release information is available. The generation of release notes looks for the following places in Github

  • Get the release notes from individual release from Github
  • Get the list of all the releases from Github and then match the release number
  • Get the changelog.md file and match the release number to generate the release notes

Github APIs are rate-limited and for the construct hub to generate the release notes, it has to be configured with Github Personal Access token. The release notes generation process uses a step function to ensure that the rate limits are respected and will back up the request when the API service limits are hit

Monitoring & Alarming

Each ConstructHub instance comes with a set of CloudWatch dashboards that can be used to monitor the current state of the instance. The name of the backend dashboard can be configured using the backendDashboardName property of the ConstructHub construct:

import { App, Stack } from '@aws-cdk/core';
import { ConstructHub } from 'construct-hub';

// The usual... you might have used `cdk init app` instead!
const app = new App();
const stack = new Stack(app, 'StackName', { /* ... */ });

// Now to business!
new ConstructHub(stack, 'ConstructHub', {
  backendDashboardName: 'ConstructHub-Backend'
});

This dashboard provides an overview of the most important process of the ConstructHub instance, and can provide insight into the cause of many problem.

In addition to this, several alarms are automatically created by the ConstructHub construct, that aim to inform operators about any problem. By default no actions are configured on these alarms, but the alarmActions property can be used to specify IAlarmAction instances to be bound to each alarm:

import { SnsAction } from '@aws-cdk/aws-cloudwatch-actions';
import { Topic } from '@aws-cdk/aws-sns';
import { App, Stack } from '@aws-cdk/core';
import { ConstructHub } from 'construct-hub';

// The usual... you might have used `cdk init app` instead!
const app = new App();
const stack = new Stack(app, 'StackName', { /* ... */ });

// Now to business!
const emergencyTopic = new Topic(stack, 'Emergencies', { /* ... */ });
const informationTopic = new Topic(stack, 'Information', { /* ... */ });

new ConstructHub(stack, 'ConstructHub', {
  alarmActions: {
    // This action triggers when immediate attention is needed!
    highSeverityAction: new SnsAction(emergencyTopic),
    // This action triggers with less urgent alarms.
    normalSeverityAction: new SnsAction(informationTopic),
  },
});