RC: RDI in the cloud #570

Draft · wants to merge 51 commits into base: main · Changes from 33 commits

Commits (51):
- c4d39e6 begin index and setup drafts (kaitlynmichael, Aug 19, 2024)
- d586971 overview draft (kaitlynmichael, Aug 22, 2024)
- b5beb6a edit desc (kaitlynmichael, Aug 22, 2024)
- 10d5de8 fix relref (kaitlynmichael, Aug 22, 2024)
- c9e5161 get started, doc link (kaitlynmichael, Aug 22, 2024)
- 42fcff9 prepare source database with link (kaitlynmichael, Aug 22, 2024)
- 17270d8 setup connectivity (kaitlynmichael, Aug 22, 2024)
- 9e0de8e connectivity steps (kaitlynmichael, Aug 22, 2024)
- efc8b0b pane > box (kaitlynmichael, Aug 22, 2024)
- 20c3050 Limitations draft (kaitlynmichael, Aug 23, 2024)
- c3b7bca Apply suggestions from code review (cmilesb, Aug 30, 2024)
- 85c6d08 Merge branch 'main' into DOC-4137 (cmilesb, Sep 3, 2024)
- e938762 Edits and DB credentials changes (cmilesb, Sep 3, 2024)
- a5977a2 Add setup step and more edits (cmilesb, Sep 6, 2024)
- 289f0e4 Move last step to provision (cmilesb, Sep 11, 2024)
- 1d6fce4 Merge branch 'main' into DOC-4137 (cmilesb, Sep 13, 2024)
- 272e518 Add some of Yaron's suggestions to intro (cmilesb, Sep 13, 2024)
- 9d12553 Apply suggestions from code review (cmilesb, Sep 19, 2024)
- 1551e1d Merge branch 'main' into DOC-4137 (cmilesb, Sep 19, 2024)
- a32204e Add define steps (cmilesb, Sep 20, 2024)
- 07bc249 Merge branch 'main' into DOC-4137 (cmilesb, Sep 20, 2024)
- bc41625 Merge branch 'main' into DOC-4137 (cmilesb, Sep 20, 2024)
- 71f431c Fix relrefs (cmilesb, Sep 20, 2024)
- e8ea310 stash commit (cmilesb, Sep 23, 2024)
- 860944a Add security information (cmilesb, Sep 24, 2024)
- 639fc50 Merge branch 'main' into DOC-4137 (cmilesb, Sep 27, 2024)
- 697d665 Merge branch 'main' into DOC-4137 (cmilesb, Oct 7, 2024)
- fa87f86 Add View/Edit (cmilesb, Oct 7, 2024)
- 49d1f54 Incorporate Yaron's feedback, part 1 (cmilesb, Oct 11, 2024)
- 4abac23 Merge branch 'main' into DOC-4137 (cmilesb, Oct 11, 2024)
- 4d69274 stash commit (cmilesb, Oct 15, 2024)
- 129d6af More suggestions from Yaron (cmilesb, Oct 21, 2024)
- 99229b1 Merge branch 'main' into DOC-4137 (cmilesb, Oct 21, 2024)
- c3e501e Apply suggestions from code review (cmilesb, Oct 22, 2024)
- 01907d8 Merge branch 'main' into DOC-4137 (cmilesb, Oct 24, 2024)
- c9469e0 Combine define and provision and remove errors section (cmilesb, Oct 25, 2024)
- d56188a Merge branch 'main' into DOC-4137 (cmilesb, Nov 5, 2024)
- f817848 Add secret permissions and keys (cmilesb, Nov 5, 2024)
- 8f274f5 Update content/operate/rc/databases/rdi/define.md (cmilesb, Nov 5, 2024)
- 2d2820d Fix note (cmilesb, Nov 5, 2024)
- 46603a6 replace account ID in resource permissions (cmilesb, Nov 5, 2024)
- 89e0c4f DOC-4548 Setup and define screenshots (cmilesb, Nov 12, 2024)
- 962c1e9 Merge branch 'main' into DOC-4137 (cmilesb, Nov 12, 2024)
- 2b110b4 stash commit (cmilesb, Nov 12, 2024)
- a51743b Merge branch 'main' into DOC-4137 (cmilesb, Nov 12, 2024)
- 35c8dc5 Fix screenshot widths (cmilesb, Nov 12, 2024)
- 52412cc add edit pipeline images (cmilesb, Nov 12, 2024)
- 7ef44a2 Apply suggestions from code review (cmilesb, Nov 25, 2024)
- d0bf623 Merge branch 'main' into DOC-4137 (cmilesb, Nov 25, 2024)
- 07b45a0 Merge branch 'main' into DOC-4137 (cmilesb, Nov 27, 2024)
- ff2c061 Added metrics (cmilesb, Nov 27, 2024)
`content/operate/rc/databases/rdi/_index.md` (69 additions)
---
Title: Data Integration
alwaysopen: false
categories:
- docs
- operate
- rc
description: Use Redis Data Integration with Redis Cloud.
hideListLinks: true
weight: 99
---

Redis Cloud now supports [Redis Data Integration (RDI)]({{<relref "integrate/redis-data-integration">}}), a fast and simple way to bring your data into Redis from other types of primary databases.

A relational database usually handles queries much more slowly than a Redis database. If your application uses a relational database and makes many more reads than writes (which is the typical case), then you can improve performance by using Redis as a cache to handle the read queries quickly. Redis Cloud uses [ingest]({{<relref "/integrate/redis-data-integration/">}}) to help you offload all read queries from the application database to Redis automatically.

Using a data pipeline lets you have a cache that is always ready for queries. RDI data pipelines ensure that any changes made to your primary database are captured in your Redis cache within a few seconds, preventing cache misses and stale data in the cache.

RDI helps Redis customers sync Redis Cloud with live data from their primary databases to:
- Meet the required speed and scale of read queries and provide an excellent and predictable user experience.
- Save resources and time when building pipelines and coding data transformations.
- Reduce the total cost of ownership by saving money on expensive database read replicas.

Using RDI with Redis Cloud simplifies managing your data integration pipeline. No need to worry about hardware or underlying infrastructure, as Redis Cloud manages that for you. Creating the data flow from source to target is much easier, and there are validations in place to reduce errors.

## Data pipeline architecture
> **Review comment (Collaborator):** typically in cloud service, you don't explain about the architecture too much but more about the functionality. I think in this case what matters to the user:
>
> - The fact that it can start with "backfill" process taking a baseline snapshot of the desired data
> - then move automatically to track the changes for this dataset
> - guaranteed to deliver at least once each record unless the RDI database crashed (and then can go back to baseline)
> - can transform relational database rows to Hash or JSON in Redis
>
> **Reply (Contributor):** I'm pretty sure I cover the "backfill" process by using the term "ingest" and the "tracking changes" by talking about "change streaming" - are these meaningfully different terms?
>
> **Reply (Contributor):** Added Hash/JSON in 129d6af.

An RDI data pipeline sits between your source database and your target Redis database. Initially, the pipeline reads all of the data and imports it into the target database during the *initial cache loading* phase. After this initial sync is complete, the data pipeline enters the *change streaming* phase, where changes are captured as they happen. Changes in the source database are added to the target within a few seconds of capture. The data pipeline translates relational database rows to Redis hashes or JSON documents.

For more information about how RDI works, see [RDI Architecture]({{<relref "/integrate/redis-data-integration/architecture">}}).
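As a sketch of this row-to-key translation, the following shows how a single relational row might appear as a Redis hash or as a JSON document (the table name, columns, and key format are illustrative assumptions, not the exact keys RDI generates):

```python
import json

# Hypothetical row from an "employees" table in the source database.
row = {"id": 42, "first_name": "Ada", "last_name": "Lovelace", "dept": "R&D"}

# As a Redis hash, each column becomes a hash field (values stored as strings).
hash_key = "employees:42"  # illustrative key format only
hash_fields = {col: str(val) for col, val in row.items()}

# As a JSON document, the whole row is stored as one JSON value.
json_key = "employees:42"
json_value = json.dumps(row)

print(hash_key, "->", hash_fields)
print(json_key, "->", json_value)
```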

### Pipeline security

Data pipelines are set up to ensure a high level of data security. Source database credentials and TLS secrets are stored in AWS Secrets Manager and shared using the AWS Secrets Manager CSI driver. See [Share source database credentials]({{<relref "/operate/rc/databases/rdi/setup#share-source-database-credentials">}}) to learn how to share your source database credentials and TLS certificates with Redis Cloud.

Connections to the source database use Java Database Connectivity (JDBC) through [AWS PrivateLink](https://aws.amazon.com/privatelink/), ensuring that the data pipeline is only exposed to the specific database endpoint. See [Set up connectivity]({{<relref "/operate/rc/databases/rdi/setup#set-up-connectivity">}}) to learn how to connect your PrivateLink to the Redis Cloud VPC.

RDI encrypts all network connections with TLS. The pipeline will process data from the source database in-memory and write it to the target database using a TLS connection. There are no external connections to your data pipeline except from Redis Cloud management services.

## Prerequisites

Before you can create a data pipeline, you must have:

- A [Redis Cloud Pro database]({{< relref "/operate/rc/databases/create-database/create-pro-database-new" >}}) hosted on Amazon Web Services (AWS)
- One supported source database, also hosted on AWS and connected to [AWS PrivateLink](https://aws.amazon.com/privatelink/):
- MySQL
- Oracle
- SQL Server
- PostgreSQL
- MariaDB
- Aurora

{{< note >}}
Please be aware of the following limitations:

- The target database must be a Redis Cloud Pro database hosted on Amazon Web Services (AWS). Redis Cloud Essentials databases and databases hosted on Google Cloud do not support Data Integration.
- Source databases must also be hosted on AWS.
- One source database can only be synced to one target database.
{{< /note >}}

## Get started

To create a new data pipeline, you need to:

1. [Prepare your source database]({{<relref "/operate/rc/databases/rdi/setup">}}) and any associated credentials.
> **Review comment (Collaborator):** except for preparing we need to create the private link and the secret
>
> **Reply (Contributor):** Yes, that page also includes PrivateLink and creating the credentials secret.

2. [Provision data pipeline infrastructure]({{<relref "/operate/rc/databases/rdi/provision">}}) and troubleshoot errors.
3. [Define the data pipeline]({{<relref "/operate/rc/databases/rdi/define">}}) by selecting which tables to sync.

Once your data pipeline is defined, you can [view and edit]({{<relref "/operate/rc/databases/rdi/view-edit">}}) it.
`content/operate/rc/databases/rdi/define.md` (37 additions)
---
Title: Define data pipeline
alwaysopen: false
categories:
- docs
- operate
- rc
description: Define your data pipeline by selecting which tables to sync.
hideListLinks: true
weight: 3
---

After you have [provisioned your data pipeline]({{<relref "/operate/rc/databases/rdi/provision">}}), you need to define it. You will select the database schemas and columns that you want to import and synchronize with your primary database.

## Configure a new pipeline

1. In the [Redis Cloud console](https://cloud.redis.io/), go to your target database and select the **Data Pipeline** tab. If your pipeline is already provisioned, select **Complete setup** to go to the **Pipeline definition** section.
1. For the **Configure a new pipeline** option, select the Redis data type to write keys to the target. You can choose **Hash** or **JSON**.

Select **Continue**.
1. Select the schemas and tables you want to migrate to the target database from the **Source data selection** list. You can select any number of columns from a table.

If any tables are missing a unique constraint, the **Missing unique constraint** list will appear. Select the columns that define a unique constraint for those tables from the list.

   - Select **Add schema** to add additional database schemas.
   - Select **Delete** to delete a schema. You must have at least one schema to continue.
   - After you've selected the schemas and tables you want to sync, select **Continue**.
> **Review comment (Contributor):** Is this intended to be a sublist (bulleted)? I definitely think it should be because the formatting currently looks a bit odd without it.
>
> **Reply (Contributor):** I think this will be a lot easier to parse once I add screenshots.


1. Review the tables you selected in the **Summary**. If everything looks correct, select **Start ingest** to start ingesting data from your source database.

At this point, the data pipeline will ingest data from the source database to your target Redis database. This process will take time, especially if you have a lot of records in your source database.

After this initial sync is complete, the data pipeline enters the *change streaming* phase, where changes are captured as they happen. Changes in the source database are added to the target within a few seconds of capture.

You can view the status of your data pipeline in the **Data pipeline** tab of your database. See [View and edit data pipeline]({{<relref "/operate/rc/databases/rdi/view-edit">}}) to learn more.
`content/operate/rc/databases/rdi/provision.md` (38 additions)
---
Title: Provision data pipeline
alwaysopen: false
categories:
- docs
- operate
- rc
description: Provision and troubleshoot your data pipeline infrastructure.
hideListLinks: true
weight: 2
---

After you have [prepared your source database]({{<relref "/operate/rc/databases/rdi/setup">}}) and connection information, you can set up your new pipeline.

1. In the [Redis Cloud console](https://cloud.redis.io/), go to your target database and select the **Data Pipeline** tab.
1. Select **Start pipeline setup**.
1. Enter a **Pipeline name**. This name will be used as the prefix for all keys generated by this pipeline in the target database.
1. Enter the **Deployment CIDR** for your pipeline, or use the one generated for you. This CIDR should not conflict with your apps or other databases.
1. In the **Connectivity** section, enter the **PrivateLink service name** of the [PrivateLink connected to your source database]({{< relref "/operate/rc/databases/rdi/setup#set-up-connectivity" >}}).
1. Enter your database details. This depends on your database type, and includes:
- **Port**: The database's port
- **Database**: Your database's name, or the root database *(PostgreSQL and Oracle only)*; or a comma-separated list of one or more databases you want to connect to *(SQL Server only)*
- **Database Server ID**: Unique ID for the replication client. Leave as default if you don't use replication *(MySQL and MariaDB only)*
- **PDB**: Name of the Oracle pluggable database *(Oracle only)*
1. Enter the ARN of your [database credentials secret]({{< relref "/operate/rc/databases/rdi/setup#share-source-database-credentials" >}}) in the **Source database secrets ARN** field.
1. Select **Start pipeline setup**.
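The deployment CIDR from the steps above must not conflict with your apps or other databases; before starting setup, you can sanity-check it locally with Python's `ipaddress` module (the CIDR values below are hypothetical):

```python
import ipaddress

# Hypothetical values: the deployment CIDR you plan to enter, plus CIDRs
# already used by your application VPC and any peered networks.
deployment_cidr = ipaddress.ip_network("10.100.0.0/24")
existing_cidrs = [
    ipaddress.ip_network("10.0.0.0/16"),     # application VPC
    ipaddress.ip_network("192.168.4.0/22"),  # peered network
]

# overlaps() reports whether two networks share any addresses.
conflicts = [str(net) for net in existing_cidrs if net.overlaps(deployment_cidr)]
if conflicts:
    print(f"{deployment_cidr} conflicts with: {', '.join(conflicts)}")
else:
    print(f"{deployment_cidr} does not conflict with the existing networks")
```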

At this point, Redis Cloud will provision the pipeline infrastructure that will allow you to define your data pipeline.

Pipelines are provisioned in the background. You aren't allowed to make changes to your data pipeline or to your database during provisioning. This process will take a long time, so you can close the window and come back later.

See [Pipeline provisioning errors](#errors) to view a list of errors that can occur at this point.

When your pipeline is provisioned, select **Complete setup**. You will then [define your data pipeline]({{<relref "/operate/rc/databases/rdi/define">}}).

## Pipeline provisioning errors {#errors}

Add errors here.
> **Review comment (Contributor):** I'm guessing you haven't been provided with the list of errors yet :-)
>
> **Reply (Contributor):** Basically, yes. I need to go through the mockups again.

`content/operate/rc/databases/rdi/setup.md` (66 additions)
---
Title: Prepare source database
alwaysopen: false
categories:
- docs
- operate
- rc
description: Prepare your source database and database credentials for Data integration.
hideListLinks: true
weight: 1
---

## Create new data pipeline

1. In the [Redis Cloud console](https://cloud.redis.io/), go to your target database and select the **Data Pipeline** tab.
1. Select **Create data pipeline**.
1. Select your source database type. The following database types are supported:
   - MySQL
   - Oracle
   - SQL Server
   - PostgreSQL

> **Review comment (Contributor):** You also mention MariaDB and Aurora in the intro page. AFAIK, MariaDB is basically the same as MySQL, so I generally write MySQL/MariaDB together. Is Aurora specific to cloud RDI, with it being an Amazon product? Also, if we do support Aurora for this then do we need some extra instructions for it in the Prepare Source Databases section?
>
> **Reply (Contributor):** @yaronp68 - Can you elaborate more on the Aurora inclusion? Is it the same as mySQL/mariaDB, or do we need to add more information as Andy said?
1. If you know the size of your source database, enter it into the **Source dataset size** field.

## Prepare source database

Before using the pipeline, you must first prepare your source database to use the Debezium connector for change data capture (CDC).

See [Prepare source databases]({{<relref "/integrate/redis-data-integration/data-pipelines/prepare-dbs/">}}) to find steps for your database type.

See the [RDI architecture overview]({{< relref "/integrate/redis-data-integration/architecture#overview" >}}) for more information about CDC.

## Share source database credentials
> **Review comment (Collaborator):** credentials and TLS certificates
>
> **Reply (Contributor):** That's covered in this section. I thought "credentials" in the header would include certificates without making the header too long.

You need to share your source database credentials and certificates in an Amazon secret with Redis Cloud so that the pipeline can connect to your database.

In the [AWS Management Console](https://console.aws.amazon.com/), use the **Services** menu to locate and select **Security, Identity, and Compliance** > **Secrets Manager**. [Create a secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html) of type **Other type of secret** with the following key/value fields:

- `username`: Database username
- `password`: Database password
- `server_certificate`: Server certificate in PEM format *(TLS only)*
- `client_certificate`: [X.509 client certificate](https://en.wikipedia.org/wiki/X.509) or chain in PEM format *(mTLS only)*
- `client_certificate_key`: Key for the client certificate or chain in PEM format *(mTLS only)*
- `client_certificate_passphrase`: Passphrase or password for the client certificate or chain in PEM format *(mTLS only)*

{{<note>}}
If your source database has TLS or mTLS enabled, we recommend that you enter the `server_certificate`, `client_certificate`, and `client_certificate_key` into the secret editor using the **Key/Value** input method instead of the **JSON** input method. Pasting directly into the JSON editor may cause an error.
{{</note>}}

After you store this secret, you can view and copy the [Amazon Resource Name (ARN)](https://docs.aws.amazon.com/secretsmanager/latest/userguide/reference_iam-permissions.html#iam-resources) of your secret on the secret details page.
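As a sketch, the secret value is a flat JSON object containing the fields above. This stdlib-only example builds and sanity-checks such a payload before you enter the values in Secrets Manager (all values are placeholders; the TLS/mTLS fields are only needed when your source database uses them):

```python
import json

# Placeholder credentials; never hard-code real secrets in source files.
secret = {
    "username": "rdi_user",
    "password": "example-password",
    # "server_certificate": "-----BEGIN CERTIFICATE-----\n...",      # TLS only
    # "client_certificate": "-----BEGIN CERTIFICATE-----\n...",      # mTLS only
    # "client_certificate_key": "-----BEGIN PRIVATE KEY-----\n...",  # mTLS only
}

# username and password are always required.
missing = {"username", "password"} - secret.keys()
assert not missing, f"missing required fields: {missing}"

print(json.dumps(secret, indent=2))
```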

## Set up connectivity

To expose your source database to Redis, you need to add Redis Cloud as an Allowed Principal on the [AWS PrivateLink VPC permissions](https://docs.aws.amazon.com/vpc/latest/privatelink/configure-endpoint-service.html#add-remove-permissions) for the PrivateLink connected to your source database.

1. Copy the Amazon Resource Name (ARN) provided in the **Setup connectivity** section.
1. Open the [Amazon VPC console](https://console.aws.amazon.com/vpc/) and select **Endpoint services**.
1. Navigate to the **Allow principals** tab.
1. Add the Redis Cloud ARN and choose **Allow principals**.
1. Copy your PrivateLink service name for later.

For more details on AWS PrivateLink, see [Share your services through AWS PrivateLink](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-share-your-services.html).
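The allowed principal you add is an IAM ARN; a quick local format check before pasting it into the console can catch copy errors (the account ID below is a placeholder, not a real Redis Cloud account):

```python
import re

# Placeholder; use the actual ARN shown in the Setup connectivity section.
redis_cloud_arn = "arn:aws:iam::123456789012:root"

# IAM principal ARNs look like arn:aws:iam::<12-digit account ID>:<principal>.
ARN_PATTERN = re.compile(r"^arn:aws:iam::\d{12}:(root|user/.+|role/.+)$")

if ARN_PATTERN.match(redis_cloud_arn):
    print(f"{redis_cloud_arn} looks like a valid IAM principal ARN")
else:
    print(f"{redis_cloud_arn} does not match the expected ARN format")
```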


## Next steps

After you have set up your source database and prepared connectivity and credentials, select **Start pipeline setup** to [provision data pipeline infrastructure]({{<relref "/operate/rc/databases/rdi/provision">}}).
`content/operate/rc/databases/rdi/view-edit.md` (81 additions)
---
Title: View and edit data pipeline
alwaysopen: false
categories:
- docs
- operate
- rc
description: Edit and observe your data pipeline.
hideListLinks: true
weight: 4
---

You can use the **Data pipeline** tab in your database to view and edit your data pipeline.

The **Data pipeline** tab gives an overview of your data pipeline and lets you view your data stream metrics.

## Edit data pipeline

To change the data you want to ingest from the data pipeline:

1. From the **Data pipeline** tab, select **Edit**.

1. For the **Configure a new pipeline** option, select the Redis data type to write keys to the target. You can choose **Hash** or **JSON**.

Select **Continue**.

1. Select the schema and tables you want to migrate to the target database from the **Source data selection** list. You can select any number of columns from a table.

If any tables are missing a unique constraint, the **Missing unique constraint** list will appear. Select the columns that define a unique constraint for those tables from the list.

   - Select **Add schema** to add additional database schemas.
   - Select **Delete** to delete a schema. You must have at least one schema to continue.
   - After you've selected the schemas and tables you want to sync, select **Continue**.
> **Review comment (Contributor):** Sublist again, maybe? (suggested converting these lines to a bulleted sublist)


1. Review the tables you selected in the **Summary** and select how you want to update the data pipeline:

- **Apply to new data changes only**: The data pipeline will only synchronize new updates to the schema and tables selected. The data pipeline will not ingest any data from new schemas or tables that are selected.
- **Reset pipeline (re-process all data)**: The data pipeline will re-ingest all of the selected data.
- **Flush cached data and reset pipeline**: The data pipeline will flush the target Redis database, and then re-ingest all of the selected data from the source database.

1. Select **Apply changes**.

At this point, the data pipeline will apply the changes. If you selected **Reset pipeline** or **Flush cached data and reset pipeline**, the data pipeline will ingest data from the source database to the target database. After this initial sync is complete, the data pipeline enters the *change streaming* phase, where changes are captured as they happen.

If you selected **Apply to new data changes only**, the data pipeline will enter the *change streaming* phase without ingesting the data.

## Reset data pipeline

Resetting the data pipeline creates a new baseline snapshot from the current state of your source database, and re-processes the data from the source database to the target Redis database. You may want to reset the pipeline if the source and target databases were disconnected or you made large changes to the data pipeline.

To reset the data pipeline and restart the ingest process:

1. From the **Data pipeline** tab, select **More actions**, and then **Reset pipeline**.

1. If you want to flush the database, check **Flush target database**.

1. Select **Reset data pipeline**.

At this point, the data pipeline will re-ingest data from the source database to your target Redis database.

## Stop and restart data pipeline

To stop the data pipeline from synchronizing new data:

1. From the **Data pipeline** tab, select **More actions**, and then **Stop pipeline**.

1. Select **Stop data pipeline** to confirm.

Stopping the data pipeline will suspend data processing. To restart the pipeline from the **Data pipeline** tab, select **More actions**, and then **Start pipeline**.

## Delete pipeline

To delete the data pipeline:

1. From the **Data pipeline** tab, select **More actions**, and then **Delete pipeline**.

1. Select **Delete data pipeline** to confirm.

Deleted data pipelines cannot be recovered.