Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic cleanup jobs #1052

Merged
merged 29 commits into from
Feb 1, 2023
Merged

Automatic cleanup jobs #1052

merged 29 commits into from
Feb 1, 2023

Conversation

achantavy
Copy link
Contributor

@achantavy achantavy commented Dec 19, 2022

Uses the schema in #1038 to automatically create cartography cleanup jobs.

Refactors AWS EMR sync to use this new method and adds tests.

Before

{
    "statements": [
        {
            "query": "MATCH (:AWSAccount{id: $AWS_ID})-[:RESOURCE]->(n:EMRCluster) WHERE n.lastupdated <> $UPDATE_TAG WITH n LIMIT $LIMIT_SIZE DETACH DELETE (n)",
            "iterative": true,
            "iterationsize": 100
        },
        {
            "query": "MATCH (:AWSAccount{id: $AWS_ID})-[r:RESOURCE]->(:EMRCluster) WHERE r.lastupdated <> $UPDATE_TAG WITH r LIMIT $LIMIT_SIZE DELETE r",
            "iterative": true,
            "iterationsize": 100
        }
    ],
    "name": "cleanup EMRCluster"
}
def cleanup(neo4j_session: neo4j.Session, common_job_parameters: Dict) -> None:
    run_cleanup_job('aws_import_emr_cleanup.json', neo4j_session, common_job_parameters)

Now

def cleanup(neo4j_session: neo4j.Session, common_job_parameters: Dict) -> None:
    cleanup_job = GraphJob.from_node_schema(EMRClusterSchema(), common_job_parameters)
    cleanup_job.run(neo4j_session)

@achantavy achantavy changed the title [WIP] Automatic cleanup jobs Automatic cleanup jobs Dec 29, 2022
@achantavy achantavy marked this pull request as ready for review December 29, 2022 08:42
@@ -86,6 +124,36 @@ def from_json(cls, blob: str, short_name: Optional[str] = None) -> 'GraphJob':
name = data["name"]
return cls(name, statements, short_name)

@classmethod
def from_node_schema(cls, node_schema: CartographyNodeSchema, parameters: Dict[str, Any]) -> 'GraphJob':
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to cleanup_job_from_node_schema

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also a GraphJob.from_json_file(), so I think we should follow the naming convention

@achantavy achantavy force-pushed the autocleanup branch 2 times, most recently from 2595997 to 14fcbae Compare January 3, 2023 23:35
cartography/graph/cleanupbuilder.py Show resolved Hide resolved
cartography/graph/querybuilder.py Outdated Show resolved Hide resolved
cartography/graph/cleanupbuilder.py Outdated Show resolved Hide resolved
cartography/graph/cleanupbuilder.py Outdated Show resolved Hide resolved
cartography/graph/job.py Show resolved Hide resolved
cartography/graph/cleanupbuilder.py Show resolved Hide resolved
cartography/graph/cleanupbuilder.py Outdated Show resolved Hide resolved
cartography/graph/cleanupbuilder.py Outdated Show resolved Hide resolved
cartography/graph/job.py Outdated Show resolved Hide resolved
actual_param_keys: Set[str] = set(parameters.keys())
# Hacky, but LIMIT_SIZE is specified by default in cartography.graph.statement, so we exclude it from validation
actual_param_keys.add('LIMIT_SIZE')
if actual_param_keys != expected_param_keys:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this

Copy link
Collaborator

@ramonpetgrave64 ramonpetgrave64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but I really would like to see _build_cleanup_node_query() and _build_cleanup_rel_query() combined into one function.

_build_cleanup_rel_query(node_schema, rel),
)

if sub_resource_rel:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this conditional, since we would raise an exception first.

if sub_resource_rel:
# Make sure that the sub resource one is last in the list; order matters.
result.append(
_build_cleanup_node_query(node_schema),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be explicit about cleaning up the sub_resource_rel, since we will always require the node to have it. And it would simplify the logic a bit in _build_cleanup_node_query()

cartography/graph/cleanupbuilder.py Outdated Show resolved Hide resolved
tests/integration/cartography/graph/test_cleanupbuilder.py Outdated Show resolved Hide resolved
@achantavy achantavy mentioned this pull request Jan 30, 2023
InterestingAssetToHelloAssetRel(),
)
-->
MATCH (src:InterestingAsset)<-[:RELATIONSHIP_LABEL]-(:SubResource{id: $sub_resource_id})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: sketch out unifying this function with build_cleanup_node_query

Copy link
Collaborator

@ramonpetgrave64 ramonpetgrave64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last thing, I think about the example path

cartography/graph/cleanupbuilder.py Outdated Show resolved Hide resolved
Co-authored-by: Ramon Petgrave <[email protected]>
Copy link
Collaborator

@ramonpetgrave64 ramonpetgrave64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏

@achantavy achantavy merged commit e320232 into master Feb 1, 2023
@achantavy achantavy deleted the autocleanup branch February 1, 2023 19:58
chandanchowdhury pushed a commit to juju4/cartography that referenced this pull request Jun 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants