Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EBSVolume => New data model; Allow node attr updates from multiple intel modules #1154

Closed
wants to merge 4 commits into from

Conversation

achantavy
Copy link
Contributor

Refactors EBS Volume details to new data model.

Additionally, fixes a bug that prevented a node from being updated correctly by multiple intel modules.

Expected behavior:
After refactoring the EBS Volume sync to the new model, we can run the EC2 instance sync and the EBS volume sync one after the other and see the deleteontermination field on the EBSVolumes that are attached to EC2 instances.

Actual behavior:
After refactoring, when I run the EC2 instance sync and the EBS volume sync one after the other, the deleteontermination field is not present.

Root cause:
The EC2 instance sync loads in :EC2Instances with :EBSVolumes attached based on the output of describe-ec2-instances. This data includes a field called deleteontermination on the :EBSVolume. deleteontermination is only set during the EC2 instance sync.

Next when the EBS Volume sync runs, it enriches existing :EBSVolumes with additional fields from data retrieved with describe-ebs-volumes but because this API call does not include deleteontermination, we will overwrite the previous values of deleteontermination with None, thus removing the property from all :EBSVolumes.

Fix:
This PR updates the PropertyRef implementation to use COALESCE() so that any existing values already defined for a given node property are preserved. This is quirky so I have also tried to thoroughly update documentation.

(r['n1.id'], r['n2.id']) for r in result

# Assert that the account to volume rels exist
assert check_rels(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check should run before sync_ebs_volumes

from typing import Optional


def build_arn(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

cartography/intel/aws/ec2/volumes.py Outdated Show resolved Hide resolved

for attachment in active_attachments:
vol_with_attachment = raw_vol.copy()
vol_with_attachment['InstanceId'] = attachment['InstanceId']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do this because volumes can be attached to multiple instances?
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volumes-multi.html


volume_id = volume['VolumeId']
raw_vol = ({
'Arn': build_arn('ec2', current_aws_account_id, 'volume', volume_id, region),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for Arn, should we bother being explicit about the rest of the props?
We could do

raw_vol = {
'Arn': build_arn('ec2', current_aws_account_id, 'volume', volume_id, region),,
**volume
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I don't have a strong opinion here. I think being explicit is nice because it lets a reader see what fields are being processed.

cartography/models/core/common.py Outdated Show resolved Hide resolved
# a non-None value. If so, we are done. Else continue and check the next source.
# 2. An existing value on the node itself. That is, we convert self.name to lowercase and check if that is non
# null on the Neo4j node `i` already. If None, then the property is not set (this a Neo4j driver behavior).
# This means we make an ASSUMPTION that the property name set on the node is the lowercase version of self.name.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to enforce that PropertyRef variables on NodeSchemas are all lowercase. And also enforce a standard between the variable names and the PropertyRef inputs.

myprop = PropertyRef('MyProp')

instead of

mycoolprop = PropertyRef('MyProp')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Or we need to find another way to allow different intel modules to update the same node types using the same data schema objects. I don't know, can you think of a better way to solve this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, another way would be to use subclasses. The ec2/volumes module can have a base version of EBSVolumeNodeProperties that does not include deleteontermination, while the ec2/instances module has a subclass of EBSVolumeNodeProperties that does include that extra property.

@dataclass(frozen=True)
class EBSVolumeNodePropertiesFromInstances(EBSVolumeNodeProperties):
    deleteontermination: PropertyRef = PropertyRef('DeleteOnTermination')

@dataclass(frozen=True)
class EBSVolumeSchemaFromInstances(EBSVolumeSchema):
    properties: EBSVolumeNodePropertiesFromInstances = EBSVolumeNodePropertiesFromInstances()

This way only the ec2/instances module can overwrite the deleteontermination property.

One problem with that: What if we wanted the subclass to have fewer props that the the base class? Then we would need something that could make the asdict method preclude the undesired props.

achantavy and others added 2 commits April 7, 2023 17:46
Co-authored-by: Ramon Petgrave <[email protected]>
Co-authored-by: Ramon Petgrave <[email protected]>
@achantavy
Copy link
Contributor Author

Going to close this PR, re-file it as an issue, and try a different approach using separate classes for properties. I think this solution is too quirky and complicated.

@achantavy achantavy closed this Jul 14, 2023
achantavy added a commit that referenced this pull request Jul 14, 2023
…iple intel modules (#1214)

See #1210 for full context.

#1154 tried to solve this problem by updating the querybuilder but this
was too complex and would not generalize well.

This solution is simpler where we use different property classes for
each API response so that we don't overwrite properties on a node set by
another sync job.

This PR can be reviewed commit-by-commit:
- c0d9ac4 shows a repro of the error
with a failing integration test.
- facb63b shows the solution using
multiple classes.

---------

Co-authored-by: Ramon Petgrave <[email protected]>
ramonpetgrave64 added a commit that referenced this pull request Aug 3, 2023
…iple intel modules (#1214)

See #1210 for full context.

#1154 tried to solve this problem by updating the querybuilder but this
was too complex and would not generalize well.

This solution is simpler where we use different property classes for
each API response so that we don't overwrite properties on a node set by
another sync job.

This PR can be reviewed commit-by-commit:
- c0d9ac4 shows a repro of the error
with a failing integration test.
- facb63b shows the solution using
multiple classes.

---------

Co-authored-by: Ramon Petgrave <[email protected]>
chandanchowdhury pushed a commit to juju4/cartography that referenced this pull request Jun 26, 2024
…pdates from multiple intel modules (cartography-cncf#1214)

See cartography-cncf#1210 for full context.

cartography-cncf#1154 tried to solve this problem by updating the querybuilder but this
was too complex and would not generalize well.

This solution is simpler where we use different property classes for
each API response so that we don't overwrite properties on a node set by
another sync job.

This PR can be reviewed commit-by-commit:
- c0d9ac4 shows a repro of the error
with a failing integration test.
- facb63b shows the solution using
multiple classes.

---------

Co-authored-by: Ramon Petgrave <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants