#349: Generalize graph write queries #1038
) -> None:
query = """
This PR refactors the EMR sync to demonstrate the new functionality. Rather than hand-writing queries, module authors now define schema objects and call build_ingestion_query() and load_graph_data().
Note that the existing EMR integration tests still pass.
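To make the schema-driven approach concrete, here is a minimal sketch of how a query could be generated from a schema object. It is not the PR's implementation: `PropertyRef`, `ClusterProperties`, `ClusterSchema`, `build_query_sketch`, the `$DictList` and `$lastupdated` parameters, and the `firstseen` field are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PropertyRef:
    """Hypothetical: names the key in each input dict that feeds a node property."""
    name: str

    def __str__(self) -> str:
        return f"item.{self.name}"


@dataclass(frozen=True)
class ClusterProperties:
    # Field name = node property; PropertyRef = key in the input dict (assumed keys).
    id: PropertyRef = PropertyRef("ClusterArn")
    arn: PropertyRef = PropertyRef("ClusterArn")
    auto_terminate: PropertyRef = PropertyRef("AutoTerminate")


@dataclass(frozen=True)
class ClusterSchema:
    label: str = "EMRCluster"
    properties: ClusterProperties = ClusterProperties()


def build_query_sketch(schema: ClusterSchema) -> str:
    """Render an UNWIND + MERGE query from the schema's label and properties."""
    props = schema.properties
    # Every property except `id` becomes a SET assignment.
    set_lines = [
        f"i.{f.name} = {getattr(props, f.name)}"
        for f in fields(props)
        if f.name != "id"
    ]
    set_clause = ",\n    ".join(set_lines)
    return (
        "UNWIND $DictList AS item\n"
        f"MERGE (i:{schema.label}{{id: {props.id}}})\n"
        "ON CREATE SET i.firstseen = timestamp()\n"
        "SET\n"
        "    i.lastupdated = $lastupdated,\n"
        f"    {set_clause}"
    )


query = build_query_sketch(ClusterSchema())
print(query)
```

The point of the design is that adding a property means adding one field to the properties class; the query text follows from the schema.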
I'm still not finished reviewing, and I'd like to add a helper function that makes it a bit easier to create a CartographyNodeSchema, like this, using the types library. But this can come later.
create_node_schema(
    label='EMRCluster',
    properties=create_node_properties(
        arn='ClusterArn',
        auto_terminate='AutoTerminate',
        ...
    ),
    ...
)
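One way such a helper might look is sketched below. The comment mentions the types library; this sketch swaps in `dataclasses.make_dataclass` from the standard library instead, and `NodePropertiesBase`, `NodeSchemaSketch`, and both helper bodies are hypothetical stand-ins rather than anything defined in the PR.

```python
from dataclasses import dataclass, field, make_dataclass


@dataclass(frozen=True)
class NodePropertiesBase:
    """Hypothetical stand-in for the project's node-properties base class."""


@dataclass(frozen=True)
class NodeSchemaSketch:
    """Hypothetical stand-in for a node schema: a label plus its properties."""
    label: str
    properties: NodePropertiesBase


def create_node_properties(**mapping: str) -> NodePropertiesBase:
    """Build a frozen properties object with one field per keyword argument."""
    cls = make_dataclass(
        "GeneratedNodeProperties",
        [(name, str, field(default=source)) for name, source in mapping.items()],
        bases=(NodePropertiesBase,),
        frozen=True,
    )
    return cls()


def create_node_schema(label: str, properties: NodePropertiesBase) -> NodeSchemaSketch:
    return NodeSchemaSketch(label=label, properties=properties)


schema = create_node_schema(
    label="EMRCluster",
    properties=create_node_properties(
        arn="ClusterArn",
        auto_terminate="AutoTerminate",
    ),
)
print(schema.label, schema.properties.arn)
```

A trade-off of generating the class at call time is that static type checkers cannot see the generated fields, which may matter given the type-hint discussion in this review.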
@dataclass
class EMRClusterSchema(CartographyNodeSchema):
    label: str = 'EMRCluster'
    properties: EMRClusterNodeProperties = EMRClusterNodeProperties()
Suggested change:
- properties: EMRClusterNodeProperties = EMRClusterNodeProperties()
+ properties: CartographyNodeProperties = EMRClusterNodeProperties()
Shouldn't we typehint with the specific type and not the generic one?
I think it's helpful just as a clue to devs on what the supertype should be. When editing in an IDE, though, the effect is the same.
By doing this we're saying that EMRClusterSchema.properties will accept any CartographyNodeProperties type, which is not correct, so I think we should keep this as-is.
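The difference between the two annotations can be illustrated with a small sketch. The class names below are simplified stand-ins, not the PR's classes, and the remarks about what a static checker such as mypy would report are expectations rather than captured output.

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass(frozen=True)
class NodeProperties:
    """Stand-in for the generic supertype."""


@dataclass(frozen=True)
class EMRProps(NodeProperties):
    arn: str = "ClusterArn"


@dataclass(frozen=True)
class OtherProps(NodeProperties):
    name: str = "Name"


@dataclass(frozen=True)
class SpecificSchema:
    # Specific annotation: only EMRProps is declared as acceptable.
    properties: EMRProps = EMRProps()


@dataclass(frozen=True)
class GenericSchema:
    # Generic annotation: any NodeProperties subclass is declared as acceptable.
    properties: NodeProperties = EMRProps()


# A static checker would be expected to accept this, since OtherProps is a
# NodeProperties, even though it lacks the `arn` field an EMR schema relies on.
loose = GenericSchema(properties=OtherProps())

# The same argument passed to SpecificSchema would be expected to be flagged
# by a static checker; Python itself does not enforce annotations at runtime.
print(get_type_hints(SpecificSchema)["properties"].__name__)
print(get_type_hints(GenericSchema)["properties"].__name__)
print(hasattr(loose.properties, "arn"))
```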
Looks great! The type tricks I mentioned can come later.
Co-authored-by: Ramon Petgrave <[email protected]>
…nto generalizequerywrites
…tiple node labels
Looks good, but that one test case needs to be fixed.
tests/integration/cartography/graph/test_querybuilder_rel_subsets.py
…f#1038)
* Build ingest query
* Linter
* Save cleanup query for another PR
* Implement schema
* bump mypy to 0.981 for python/mypy#13398
* linter
* make load_graph_data interface make more sense
* fix comment
* Docs and some better names
* add a todo
* Doc updates, rename some fields
* Fix pre-commit
* Code commment suggestions Co-authored-by: Ramon Petgrave <[email protected]>
* Stackoverflow comment for clarity)
* Support ingesting only parts of a schema without breaking the others
* Doc comment
* Linter
* Support matching on one or more properties
* Correctly name test
* Change key_refs to TargetNodeMatcher to enforce it as a mandatory field
* Remove use of hacky default_field()
* Support subset of schema relationships for query generation, test multiple node labels
* Docstrings
* Comments in tests
* Better comments
* Test for exception conditions
* Remove irrelevant comment
Co-authored-by: Ramon Petgrave <[email protected]>
Addresses #349, also a good step working toward #1024.
Why not use an existing ORM? Existing ones:
UNWIND + MERGE for speed and batching
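As a rough illustration of the UNWIND + MERGE pattern, the sketch below pairs a Cypher query with a pure-Python batching helper. The query text, the `$DictList` and `$lastupdated` parameter names, and the `batched` and `plan_writes` helpers are assumptions for illustration; no database driver is called.

```python
from typing import Any, Dict, Iterator, List

# One query handles a whole batch: UNWIND expands the list parameter into rows,
# and MERGE creates or matches one node per row.
QUERY = """
UNWIND $DictList AS item
MERGE (n:EMRCluster {id: item.ClusterArn})
SET n.lastupdated = $lastupdated
"""


def batched(items: List[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def plan_writes(
    items: List[Dict[str, Any]],
    lastupdated: int,
    batch_size: int = 10000,
) -> List[Dict[str, Any]]:
    """Build one parameter dict per batch; each would accompany QUERY in one round trip."""
    return [
        {"DictList": batch, "lastupdated": lastupdated}
        for batch in batched(items, batch_size)
    ]


clusters = [{"ClusterArn": f"arn:example:{i}"} for i in range(5)]
params_per_batch = plan_writes(clusters, lastupdated=1, batch_size=2)
print([len(p["DictList"]) for p in params_per_batch])
```

Sending one query per batch rather than one per node keeps the number of round trips proportional to the batch count, which is the speed argument the description alludes to.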