Automatic cleanup jobs #1052
cartography/graph/job.py
Outdated
@@ -86,6 +124,36 @@ def from_json(cls, blob: str, short_name: Optional[str] = None) -> 'GraphJob':
        name = data["name"]
        return cls(name, statements, short_name)

    @classmethod
    def from_node_schema(cls, node_schema: CartographyNodeSchema, parameters: Dict[str, Any]) -> 'GraphJob':
rename to cleanup_job_from_node_schema
There is also a GraphJob.from_json_file(), so I think we should follow the naming convention.
actual_param_keys: Set[str] = set(parameters.keys())
# Hacky, but LIMIT_SIZE is specified by default in cartography.graph.statement, so we exclude it from validation
actual_param_keys.add('LIMIT_SIZE')
if actual_param_keys != expected_param_keys:
I love this
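For readers following along, the check in this hunk can be sketched as a standalone function. This is only an illustration under assumptions: the function name `validate_param_keys`, the error type, and the way `expected_param_keys` is supplied are hypothetical and not taken from the PR; only the `LIMIT_SIZE` handling and the set comparison come from the diff above.

```python
from typing import Any, Dict, Set


def validate_param_keys(expected_param_keys: Set[str], parameters: Dict[str, Any]) -> None:
    """Hypothetical helper: raise if the supplied parameters do not match the expected keys."""
    actual_param_keys: Set[str] = set(parameters.keys())
    # Per the diff comment, LIMIT_SIZE is supplied by default elsewhere,
    # so it is added here to keep it from failing the comparison.
    actual_param_keys.add('LIMIT_SIZE')
    if actual_param_keys != expected_param_keys:
        # The exception type and message are assumptions made for this sketch.
        raise ValueError(
            f"Expected parameters {sorted(expected_param_keys)} "
            f"but got {sorted(actual_param_keys)}.",
        )


# Example: callers pass everything except LIMIT_SIZE.
validate_param_keys(
    {'UPDATE_TAG', 'AWS_ID', 'LIMIT_SIZE'},
    {'UPDATE_TAG': 1234, 'AWS_ID': '000000000000'},
)
```

Comparing sets with `!=` catches both missing and unexpected keys in one step, which is presumably part of the appeal of the approach.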
Looks good, but I really would like to see _build_cleanup_node_query() and _build_cleanup_rel_query() combined into one function.
cartography/graph/cleanupbuilder.py
Outdated
_build_cleanup_rel_query(node_schema, rel),
)

if sub_resource_rel:
No need for this conditional, since we would raise an exception first.
cartography/graph/cleanupbuilder.py
Outdated
if sub_resource_rel:
    # Make sure that the sub resource one is last in the list; order matters.
    result.append(
        _build_cleanup_node_query(node_schema),
I think we should be explicit about cleaning up the sub_resource_rel, since we will always require the node to have it. And it would simplify the logic a bit in _build_cleanup_node_query().
cartography/graph/cleanupbuilder.py
Outdated
InterestingAssetToHelloAssetRel(),
)
-->
MATCH (src:InterestingAsset)<-[:RELATIONSHIP_LABEL]-(:SubResource{id: $sub_resource_id})
TODO: sketch out unifying this function with build_cleanup_node_query
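One way the unification discussed in this thread could look is sketched below. It is not the code from this PR: the function name `build_cleanup_query`, its plain-string arguments (standing in for the schema objects), and the exact `WHERE`/`LIMIT`/`DELETE` clauses are all assumptions made for illustration. The only piece taken from the diff is the shape of the first `MATCH` line, which scopes the query to the sub resource.

```python
from typing import Optional


def build_cleanup_query(
    node_label: str,
    sub_resource_label: str,
    sub_resource_rel_label: str,
    other_label: Optional[str] = None,
    other_rel_label: Optional[str] = None,
) -> str:
    """Hypothetical unified builder: cleans up stale nodes, or stale rels if a rel is given."""
    # Every query is scoped to the sub resource, mirroring the MATCH line in the diff.
    lines = [
        f"MATCH (src:{node_label})<-[:{sub_resource_rel_label}]-"
        f"(:{sub_resource_label}{{id: $sub_resource_id}})",
    ]
    if other_label and other_rel_label:
        # Relationship cleanup: delete only the stale relationship (assumed clause shape).
        lines.append(f"MATCH (src)-[r:{other_rel_label}]-(:{other_label})")
        lines += ["WHERE r.lastupdated <> $UPDATE_TAG", "WITH r LIMIT $LIMIT_SIZE", "DELETE r"]
    else:
        # Node cleanup: delete the stale node and whatever is attached to it (assumed clause shape).
        lines += ["WHERE src.lastupdated <> $UPDATE_TAG", "WITH src LIMIT $LIMIT_SIZE", "DETACH DELETE src"]
    return "\n".join(lines)


node_query = build_cleanup_query("InterestingAsset", "SubResource", "RELATIONSHIP_LABEL")
rel_query = build_cleanup_query(
    "InterestingAsset", "SubResource", "RELATIONSHIP_LABEL",
    other_label="HelloAsset", other_rel_label="ASSOCIATED_WITH",
)
```

Both variants share the sub resource `MATCH` and differ only in what gets deleted, which is the part a single function can branch on. The `HelloAsset` label and `ASSOCIATED_WITH` relationship label in the usage lines are placeholders.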
Last thing, I think about the example path
Co-authored-by: Ramon Petgrave <[email protected]>
👏
Uses the schema in #1038 to automatically create cartography cleanup jobs.
Refactors AWS EMR sync to use this new method and adds tests.
Before
Now