[FEATURE] Add removal policies for the tables created by integrations #158

Open
Jon-AtAWS opened this issue Jun 6, 2024 · 2 comments
Labels
enhancement New feature or request integration integration related content

Comments

@Jon-AtAWS
Member

Is your feature request related to a problem?
No

What solution would you like?
If I use an integration, it creates various tables in the AWS Glue Data Catalog (GDC). When I delete the integration, OpenSearch does not delete those tables. While this is the right behavior in many scenarios, there are many others where I want to delete all associated resources, including the tables.

Can we add a "removal policy" to the integration that lets me specify what OpenSearch should do when the integration is deleted? The AWS Cloud Development Kit (CDK) has a RemovalPolicy that standardizes how the generated template should handle deletes for AWS resources. For example, I can create an S3 bucket with this Python code:

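        # Assumes `import aws_cdk as cdk` and `from aws_cdk import aws_s3 as s3`,
        # and that this runs inside a Stack's __init__ with BUCKET_NAME defined.
        s3_bucket = s3.Bucket(self, 'MyGreatS3Bucket',
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
            bucket_name=BUCKET_NAME,
            enforce_ssl=True,
            versioned=True,
            removal_policy=cdk.RemovalPolicy.DESTROY,
            auto_delete_objects=True,
        )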

cdk.RemovalPolicy.DESTROY causes CDK to generate the following resource in the CloudFormation template:

  MyGreatBucket:
    Type: AWS::S3::Bucket
    Properties:
      ...
    UpdateReplacePolicy: Delete
    DeletionPolicy: Delete
    ...

Other choices are RemovalPolicy.SNAPSHOT and RemovalPolicy.RETAIN. This seems like a good way for me to specify what I want to happen.
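
To make the request concrete, here is a minimal sketch of what an integration-level policy modeled on CDK's three choices might look like. The `RemovalPolicy` enum and `should_drop_tables` below are hypothetical names for illustration, not an existing OpenSearch API. The mapping to CloudFormation `DeletionPolicy` values reflects what CDK emits. What SNAPSHOT would mean for integration tables is not defined in this request, so the sketch treats it as "keep the tables".

```python
from enum import Enum


class RemovalPolicy(Enum):
    """Hypothetical integration-level policy, named after CDK's RemovalPolicy."""
    DESTROY = "destroy"
    RETAIN = "retain"
    SNAPSHOT = "snapshot"


# CloudFormation DeletionPolicy value that CDK emits for each choice.
CFN_DELETION_POLICY = {
    RemovalPolicy.DESTROY: "Delete",
    RemovalPolicy.RETAIN: "Retain",
    RemovalPolicy.SNAPSHOT: "Snapshot",
}


def should_drop_tables(policy: RemovalPolicy) -> bool:
    """Assumption: only DESTROY removes the integration's tables."""
    return policy is RemovalPolicy.DESTROY
```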

@Jon-AtAWS Jon-AtAWS added enhancement New feature or request untriaged labels Jun 6, 2024
@YANG-DB YANG-DB added the integration integration related content label Jun 6, 2024
@Swiddis
Collaborator

Swiddis commented Jun 7, 2024

Thanks for the request!

I'm thinking about how to approach this. We probably need to record the table and materialized view (MV) information in the instance object, along with the steps to delete each object type and a flag for the policy. It might be tricky because we don't necessarily know ahead of time every type of GDC object that we'll need to delete (with DROP TABLE, DROP MATERIALIZED VIEW, or other statements).

Maybe adding some sort of glue_params object with a resources array:

"glue_params": {
    "removal_policy": "delete", // Assuming we don't need anything more granular than a global policy enum

    // Insert in install order, probably need to delete in reverse order
    // (does Spark SQL let you drop everything in parallel?)
    "resources": [
        {
            "name": "s3conn.database.table",
            "type": "table"
        },
        {
            "name": "s3conn.database.index",
            "type": "skipping_index"
        }
    ]
}

Then we just need to add a way to specify this policy in the install process, probably as part of building.
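
As a rough sketch of how the glue_params idea could drive deletion, the function below turns the resources array into drop statements in reverse install order. `plan_teardown` and `DROP_TEMPLATES` are hypothetical names, and the templates cover only the two statements named in this comment. Other types, such as skipping_index, are deliberately left out because their exact drop syntax isn't given here, so they raise an error.

```python
from typing import Dict, List

# Assumed statement template per resource type. Only the statements named
# in this thread are listed; any other type needs its syntax confirmed.
DROP_TEMPLATES = {
    "table": "DROP TABLE {name}",
    "materialized_view": "DROP MATERIALIZED VIEW {name}",
}


def plan_teardown(glue_params: Dict) -> List[str]:
    """Return drop statements in reverse install order, or [] if retaining."""
    if glue_params.get("removal_policy") != "delete":
        return []
    statements = []
    # Resources are stored in install order, so walk them backwards.
    for resource in reversed(glue_params.get("resources", [])):
        template = DROP_TEMPLATES.get(resource["type"])
        if template is None:
            raise ValueError(f"no drop statement known for type {resource['type']!r}")
        statements.append(template.format(name=resource["name"]))
    return statements
```

Failing loudly on an unknown type is one possible choice; skipping it and reporting the leftover resource to the user would be another.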

A related issue is letting integrations drop tables after a failed install when earlier queries succeeded. That rollback would also require type-based dropping, but it is tricky because many integrations use CREATE IF NOT EXISTS queries, so we can't blindly delete everything we touch.
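
One way to handle the CREATE IF NOT EXISTS problem, sketched under assumptions: snapshot which names already exist in the catalog before the install, and on failure roll back only the resources this run actually created. `install_with_rollback`, the `existing` set, and the `create` callback are all hypothetical; how the existing names would be listed is not covered here.

```python
from typing import Callable, Dict, List, Set


def install_with_rollback(
    resources: List[Dict[str, str]],
    existing: Set[str],
    create: Callable[[Dict[str, str]], None],
) -> List[Dict[str, str]]:
    """Run `create` for each resource; on failure return what is safe to drop.

    `existing` holds names found in the catalog before the install started.
    Those were not created by this run, so rollback must leave them alone.
    Returns an empty list when every resource installs successfully.
    """
    created = []
    try:
        for resource in resources:
            create(resource)
            if resource["name"] not in existing:
                created.append(resource)
    except Exception:
        # Drop in reverse install order, skipping pre-existing resources.
        return list(reversed(created))
    return []
```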

@YANG-DB can you triage?

@Swiddis Swiddis removed the untriaged label Jun 7, 2024
@Swiddis
Collaborator

Swiddis commented Jun 7, 2024

can you triage?

"Fine, I'll do it myself."
