docs(remote-ingestion): update description and deployment instructions
darnaut committed May 23, 2024
1 parent e79bdc0 commit 1f1d781
Showing 4 changed files with 46 additions and 31 deletions.
2 changes: 1 addition & 1 deletion docs-website/sidebars.js
@@ -243,7 +243,7 @@ module.exports = {
"Operator Guide": [
{
type: "doc",
id: "docs/managed-datahub/operator-guide/setting-up-remote-ingestion-executor-on-aws",
id: "docs/managed-datahub/operator-guide/setting-up-remote-ingestion-executor",
className: "saasOnly",
},
{
2 changes: 1 addition & 1 deletion docs/managed-datahub/managed-datahub-overview.md
@@ -109,8 +109,8 @@ Fill out
## Additional Integrations
- [Slack Integration](docs/managed-datahub/saas-slack-setup.md)
- [Remote Ingestion Executor](docs/managed-datahub/operator-guide/setting-up-remote-ingestion-executor.md)
- [AWS Privatelink](docs/managed-datahub/integrations/aws-privatelink.md)
- [AWS Ingestion Executor](docs/managed-datahub/operator-guide/setting-up-remote-ingestion-executor-on-aws.md)
- [AWS Eventbridge](docs/managed-datahub/operator-guide/setting-up-events-api-on-aws-eventbridge.md)
## Additional SSO/Login Support
@@ -6,43 +6,45 @@ description: >-
---
import FeatureAvailability from '@site/src/components/FeatureAvailability';

# Setting up Remote Executor on AWS
# Setting up the Remote Ingestion Executor
<FeatureAvailability saasOnly />

## Overview

> [!NOTE]
> Acryl Remote Executor can now be used to both run ingestions from and monitor ingestion sources that are not publicly accessible via the internet.
:::note

Acryl DataHub comes packaged with an Acryl-managed executor, which is hosted inside of Acryl's environment on your behalf. However, there are certain scenarios in which an Acryl-hosted executor is not sufficient to cover all of an organization's ingestion sources.
The Remote Executor can now also be used to monitor ingestion sources.

For example, if an ingestion source is not publicly accessible via the internet, e.g. hosted privately within a specific AWS account, then the Acryl executor will be unable to extract metadata from it.
:::

Acryl DataHub comes packaged with an Acryl-managed executor, which is hosted inside of Acryl's environment on your behalf. However, there are certain scenarios in which an Acryl-hosted executor is not ideal or sufficient to cover all of an organization's ingestion sources. For example, if an ingestion source is hosted behind a firewall or in an environment with strict access policies, then the Acryl executor might be unable to connect to it to extract metadata.

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/saas/image-(12).png"/>
</p>
To accommodate these cases, Acryl supports configuring a Remote Ingestion Executor that can be deployed inside your environment, whether on-prem or in the cloud. This setup allows you to continue leveraging the Acryl DataHub console to create, schedule, and run both ingestion and assertion monitors, all while retaining network and credential isolation.

## Deploying a Remote Executor

To accommodate these cases, Acryl supports configuring a remote executor which can be deployed inside of your AWS account. This setup allows you to continue leveraging the Acryl DataHub console to create, schedule, and run both ingestion and assertion monitors, all while retaining network and credential isolation.
:::note

The Remote Ingestion Executor is only available for Managed DataHub. Setting up a new executor requires coordination with your Acryl representative.

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/saas/image-(6).png"/>
</p>
:::

The Remote Ingestion Executor can be deployed on several different platforms, including [Amazon ECS](#deploying-on-amazon-ecs), [Kubernetes](#deploying-on-kubernetes) (GKE, EKS, or self-hosted), and others. It can also be deployed using several different methods, including [CloudFormation](https://raw.githubusercontent.com/acryldata/datahub-cloudformation/master/remote-executor/datahub-executor.ecs.template.yaml) or [Terraform](https://github.com/acryldata/datahub-terraform-modules/tree/main/remote-ingestion-executor) templates for ECS, or a [Helm chart](https://github.com/acryldata/datahub-executor-helm) for Kubernetes. Please reach out to your Acryl representative for other options.

## Deploying a Remote Executor

> [!NOTE]
> Customers migrating from the legacy DataHub Executor: migration to the new executor requires a configuration change on the Acryl side. Please contact your Acryl representative for detailed guidance.
>
> Steps you will need to perform on your end when instructed by your Acryl representative:
> 1. Temporarily stop your legacy DataHub Remote Executor instance (e.g. `aws ecs update-service --desired-count 0 --cluster "cluster-name" --service "service-name"`)
> 2. Deploy the new DataHub Executor using the steps below.
> 3. Trigger an ingestion to make sure the new executor is working as expected.
> 4. Tear down the legacy executor ECS deployment.
### Deploying on Amazon ECS

:::note

Customers migrating from the legacy DataHub Executor: migration to the new executor requires a configuration change on the Acryl side. Please contact your Acryl representative for detailed guidance.

Steps you will need to perform on your end when instructed by your Acryl representative:
1. Temporarily stop your legacy DataHub Remote Executor instance (e.g. `aws ecs update-service --desired-count 0 --cluster "cluster-name" --service "service-name"`)
2. Deploy the new DataHub Executor using the steps below.
3. Trigger an ingestion to make sure the new executor is working as expected.
4. Tear down the legacy executor ECS deployment.

:::
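
The ECS side of steps 1 and 3 can be wrapped in two small shell functions. This is a minimal sketch, not an official tool: it assumes the legacy executor runs as an ECS service in the account your AWS CLI is configured for, `stop_legacy_executor` and `restore_legacy_executor` are hypothetical helper names, the cluster and service names are the same placeholders as in step 1, and the restore function (a rollback aid in case step 3 fails, not one of the listed steps) assumes the service originally ran one task unless you pass another count.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helpers for the ECS side of the migration.
# Assumption: the legacy executor runs as an ECS service in the account the
# AWS CLI is configured for.

# Step 1: scale the legacy service to zero tasks, then block until ECS
# reports the service as stable again.
stop_legacy_executor() {
  local cluster="$1" service="$2"
  aws ecs update-service --cluster "$cluster" --service "$service" \
    --desired-count 0 >/dev/null
  aws ecs wait services-stable --cluster "$cluster" --services "$service"
}

# Rollback aid if step 3 fails: bring the legacy service back to the given
# task count (defaults to 1, an assumption about the original deployment).
restore_legacy_executor() {
  local cluster="$1" service="$2" count="${3:-1}"
  aws ecs update-service --cluster "$cluster" --service "$service" \
    --desired-count "$count" >/dev/null
  aws ecs wait services-stable --cluster "$cluster" --services "$service"
}
```

After sourcing the file, `stop_legacy_executor "cluster-name" "service-name"` performs step 1; waiting for `services-stable` means the command returns only once ECS has drained the old tasks.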

1. **Provide AWS Account ID**: Provide the Acryl team with the ID of the AWS account in which the remote executor will be hosted. This will be used to grant access to the private Acryl ECR registry. The account ID can be provided to your Acryl representative via email or [One Time Secret](https://onetimesecret.com/).
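
   If you are unsure which account ID to provide, the AWS CLI can report it. A minimal sketch, assuming the CLI is configured with credentials for the account that will host the executor; `aws_account_id` is a hypothetical helper name, and the format check relies only on AWS account IDs being 12 digits:

   ```shell
   #!/usr/bin/env bash
   set -euo pipefail

   # Print the ID of the AWS account that the current CLI credentials belong to.
   aws_account_id() {
     local id
     id="$(aws sts get-caller-identity --query Account --output text)"
     # AWS account IDs are 12 digits; anything else suggests a misconfigured CLI.
     if [[ ! "$id" =~ ^[0-9]{12}$ ]]; then
       echo "unexpected account ID: ${id}" >&2
       return 1
     fi
     printf '%s\n' "$id"
   }
   ```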

@@ -93,7 +95,7 @@ To accommodate these cases, Acryl supports configuring a remote executor which c
1. Create a new Ingestion Source by clicking '**Create new Source**' in the '**Ingestion**' tab of the DataHub console. Configure your Ingestion Recipe as though you were running it from inside your environment.
2. When working with "secret" fields (passwords, keys, etc.), you can refer to any "self-managed" secrets by name: `${SECRET_NAME}`


<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/saas/Screen-Shot-2023-01-19-at-4.16.52-PM.png"/>
</p>
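
If your self-managed secrets live in AWS Secrets Manager (the FAQ on secret rotation refers to Secrets Manager), a new secret can be created from the CLI. This is a sketch under assumptions: `create_recipe_secret` is a hypothetical helper, the idea that the name you reference as `${SECRET_NAME}` in the recipe matches the Secrets Manager name is an assumption to confirm with your Acryl representative, and the secret still has to be wired into the executor container when it is deployed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: create a Secrets Manager secret whose value is read
# from stdin (so it is not typed into shell history) and print its ARN.
# Assumption: the executor deployment is later updated to expose this secret.
create_recipe_secret() {
  local name="$1" value
  value="$(cat)"
  if [[ -z "$value" ]]; then
    echo "refusing to create an empty secret: ${name}" >&2
    return 1
  fi
  aws secretsmanager create-secret --name "$name" --secret-string "$value" \
    --query ARN --output text
}
```

For example, `printf '%s' "$PASSWORD" | create_recipe_secret SNOWFLAKE_PASSWORD`, where `SNOWFLAKE_PASSWORD` is a hypothetical secret name.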
@@ -109,9 +111,9 @@ To accommodate these cases, Acryl supports configuring a remote executor which c
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/saas/Screen-Shot-2022-03-07-at-10.23.31-AM.png"/>
</p>

## Updating a Remote Executor
#### Updating a deployment
To update the executor, i.e. to deploy a new container version, you'll need to update the CloudFormation stack by re-deploying the CloudFormation template with a new set of parameters.
### Steps - AWS Console
##### Steps - AWS Console
1. Navigate to CloudFormation in the AWS Console
2. Select the stack dedicated to the remote executor
3. Click **Update**
@@ -128,6 +130,22 @@ In order to update the executor, ie. to deploy a new container version, you'll n
9. Click **Next**
10. Confirm your parameter changes, and update. This should perform the necessary upgrades.
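
The console steps above can also be done from the AWS CLI. A minimal sketch, assuming the stack was created from the executor CloudFormation template: the hypothetical `update_executor_stack` helper re-deploys the template the stack already uses with one changed parameter. The stack name and parameter key are placeholders (the template's real parameter names are not listed here; `aws cloudformation describe-stacks` shows them), and passing `CAPABILITY_NAMED_IAM` is an assumption in case the template creates IAM resources.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Re-deploy the executor stack with the template it already uses, changing one
# parameter and keeping every other parameter at its current value.
# Assumption: stack name, parameter key and value are placeholders you supply.
update_executor_stack() {
  local stack="$1" key="$2" value="$3"
  local params=("ParameterKey=${key},ParameterValue=${value}")
  local k

  # Parameters left out of update-stack are not carried over automatically,
  # so explicitly keep the previous value for every other existing parameter.
  for k in $(aws cloudformation describe-stacks --stack-name "$stack" \
      --query 'Stacks[0].Parameters[].ParameterKey' --output text); do
    [[ "$k" == "$key" ]] || params+=("ParameterKey=${k},UsePreviousValue=true")
  done

  # CAPABILITY_NAMED_IAM acknowledges IAM resources, in case the template creates any.
  aws cloudformation update-stack --stack-name "$stack" \
    --use-previous-template --parameters "${params[@]}" \
    --capabilities CAPABILITY_NAMED_IAM
  aws cloudformation wait stack-update-complete --stack-name "$stack"
}
```

For example, `update_executor_stack my-executor-stack ImageTag v0.0.5`, where all three arguments are hypothetical values.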

### Deploying on Kubernetes

The Helm chart [datahub-executor-worker](https://github.com/acryldata/datahub-executor-helm/tree/main/charts/datahub-executor-worker) can be used to deploy on a Kubernetes cluster. These instructions also apply for deploying to Amazon Elastic Kubernetes Service (EKS) or Google Kubernetes Engine (GKE).

1. Download the [latest release](https://github.com/acryldata/datahub-executor-helm/releases) of the chart
2. Unpack the release archive
```
tar zxvf v0.0.4.tar.gz --strip-components=2
```
3. Edit the `datahub-executor-worker/values.yaml` file and change any required details. The value of `worker_id` should be set to the executor ID. The executor ID is used for routing an ingestion source task to the appropriate remote executor.
4. The remote executor image is hosted on a private registry. Contact your Acryl representative to set up access to pull the image. For access within AWS, you will need to provide the IAM principal that will be allowed to pull from the ECR repository. For Google Cloud, you will need to provide the cluster's IAM service account.
5. Install the chart
```
helm install --namespace acryl-remote-executor acryl ./datahub-executor-worker
```
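
Steps 1, 2 and 5 can be scripted roughly as follows. This is a sketch under stated assumptions: the download URL follows GitHub's tag-archive pattern for the chart repository, `v0.0.4` (from the `tar` command above) stands in for whatever the latest tag is, the archive layout is the one implied by step 3 (`datahub-executor-worker/` after stripping two path components), `fetch_executor_chart` and `install_executor_chart` are hypothetical helper names, and `--create-namespace` (Helm 3.2+) is added so the namespace need not exist beforehand. Editing `values.yaml` (step 3) and arranging image access (step 4) remain manual.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: the chart release is fetched as GitHub's source archive for a tag.
CHART_REPO="https://github.com/acryldata/datahub-executor-helm"

# Steps 1-2: download a tagged release and unpack the chart into ./datahub-executor-worker.
fetch_executor_chart() {
  local tag="$1"
  curl -fsSL -o "${tag}.tar.gz" "${CHART_REPO}/archive/refs/tags/${tag}.tar.gz"
  # Strip the leading "<repo>-<version>/charts/" path components.
  tar zxf "${tag}.tar.gz" --strip-components=2
}

# Step 5: validate and install the chart once values.yaml has been edited (step 3)
# and image pull access is in place (step 4).
install_executor_chart() {
  local namespace="${1:-acryl-remote-executor}"
  helm lint ./datahub-executor-worker
  helm install --namespace "$namespace" --create-namespace acryl ./datahub-executor-worker
  # List the pods so you can confirm the worker starts once the image is pulled.
  kubectl get pods --namespace "$namespace"
}
```

A typical run is `fetch_executor_chart v0.0.4`, then editing `datahub-executor-worker/values.yaml`, then `install_executor_chart`.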

## FAQ

### If I need to change (or add) a secret that is stored in AWS Secrets Manager, e.g. for rotation, will the new secret automatically get picked up by Acryl's executor?
@@ -136,7 +154,7 @@ Unfortunately, no. Secrets are wired into the executor container at deployment t

### I want to deploy multiple Acryl Executors. Is this currently possible?

This is possible, but currently requires a configuration change on the Acryl side. Please contact your Acryl representative for more information.
Yes. Please contact your Acryl representative for details.

### I've run the CloudFormation Template. How can I tell that the container was successfully deployed?

@@ -147,6 +165,3 @@ Starting datahub executor worker
```
This indicates that the remote executor has established a successful connection to your DataHub instance and is ready to execute ingestion & monitors.
If you DO NOT see this log line, but instead see something else, please contact your Acryl representative for support.

## Release Notes
This is where release notes for the Acryl Remote Executor Container will live.
2 changes: 1 addition & 1 deletion docs/ui-ingestion.md
@@ -338,7 +338,7 @@ for the `datahub-actions` container and running `docker logs <container-id>`.
There are valid cases for ingesting metadata without the UI-based ingestion scheduler. For example,

- You have written a custom ingestion Source
- Your data sources are not reachable on the network where DataHub is deployed. Managed DataHub users can use a [remote executor](managed-datahub/operator-guide/setting-up-remote-ingestion-executor-on-aws.md) for remote UI-based ingestion.
- Your data sources are not reachable on the network where DataHub is deployed. Managed DataHub users can use a [remote executor](managed-datahub/operator-guide/setting-up-remote-ingestion-executor.md) for remote UI-based ingestion.
- Your ingestion source requires context from a local filesystem (e.g. input files)
- You want to distribute metadata ingestion among multiple producers / environments
