Proposal: Make Collections Top Level Entities
Status: CLOSED
Comment Period Closes: August 20th, 2015
Affects Backwards Compatibility: Yes
Relevant Issues: https://github.com/MAECProject/schemas/issues/96
Bundle Collections are cumbersome to use and perhaps unnecessarily complicated because of their extensive use of nesting (e.g., Collections -> Behavior_Collections -> Behavior_Collection). And, as currently defined, Bundle Collections and top level container elements (e.g., Actions, Behaviors, Objects) serve similar functions - both store or reference a single type of MAEC entity (e.g., only Actions or only Objects).
This proposal is related to the following proposed changes to the schema:
- https://github.com/MAECProject/schemas/wiki/Proposal:-Deprecate-MAEC-Bundle-(as-a-concept-and-output-format)
- https://github.com/MAECProject/schemas/wiki/Proposal:-Make-Relationships-Top-Level-Entities
We propose to deprecate entity-specific collections in favor of defining more general Collections as top-level entities in a MAEC Package. A "maec_entity_type" field would define the type of MAEC entity captured in the Collection. General, top-level Collections would provide flexibility by permitting the capture of collections of any MAEC entities, including Malware Subjects.
In addition, an "association_type" field would optionally define an association between the entities in the Collection. For example, when Malware Subjects are captured in a Collection, the association_type field would enable the Collection to replace the existing Grouping Relationship.
Note that associations captured in a Collection are non-directional (i.e., they are many-to-many), whereas the newly-defined top-level Relationship entity defines a directed relationship between a pair of entities where one entity is considered the source and the other is considered the target.
The following existing schema types would be deprecated:
- maecBundle:CollectionsType
- maecBundle:BehaviorCollectionListType
- maecBundle:BehaviorCollectionType
- maecBundle:ActionCollectionListType
- maecBundle:ActionCollectionType
- maecBundle:ObjectCollectionListType
- maecBundle:ObjectCollectionType
- maecBundle:CandidateIndicatorCollectionListType
- maecBundle:CandidateIndicatorCollectionType
- maecBundle:BaseCollectionType
- maecPackage:GroupingRelationshipType
- maecPackage:GroupingRelationshipListType
A new CollectionType schema type would be defined in the MAEC Package schema with the following fields:
Field | Type | Multiplicity | Description |
---|---|---|---|
@id | xs:QName | 1 | The id field specifies a unique identifier for the Collection. |
@maec_entity_type | maecVocabs:CollectionEntityTypeEnum | 1 | The required maec_entity_type field specifies the type of MAEC entity that is captured in the Collection, via the CollectionEntityTypeEnum. Example types would be 'objects' or 'various'. The default value is 'various'. |
@association_type | maecVocabs:CollectionAssociationTypeEnum | 0-1 | The association_type field specifies the nature of the contents of the Collection, via the CollectionAssociationTypeEnum. |
Name | xs:string | 0-1 | The Name field specifies the name of the Collection. |
Entity_Reference | maecCore:EntityReferenceType | 0-* | The Entity_Reference field references an existing MAEC entity that is captured in the Collection, via its ID. |
Association_Metadata | maecPackage:AssociationMetadataType | 0-1 | The Association_Metadata field captures metadata that may be relevant to specific association types as captured in the association_type field. |
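As a rough illustration, the table above might translate into an XSD definition along the following lines. This is only a sketch: the namespace URIs are placeholders, and the order of the child elements is an assumption taken from the table order. Note that XSD does not permit an attribute to carry both use="required" and a default value, so the sketch expresses maec_entity_type with a default only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: every urn:example:* namespace URI is a placeholder, not a real MAEC namespace. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:maecPackage="urn:example:maec-package"
           xmlns:maecCore="urn:example:maec-core"
           xmlns:maecVocabs="urn:example:maec-vocabs"
           targetNamespace="urn:example:maec-package"
           elementFormDefault="qualified">
  <xs:complexType name="CollectionType">
    <!-- Child element order is assumed; the proposal lists the fields but not their sequence. -->
    <xs:sequence>
      <xs:element name="Name" type="xs:string" minOccurs="0"/>
      <xs:element name="Entity_Reference" type="maecCore:EntityReferenceType"
                  minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="Association_Metadata" type="maecPackage:AssociationMetadataType"
                  minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:QName" use="required"/>
    <!-- A default cannot be combined with use="required" in XSD, so only the default is given. -->
    <xs:attribute name="maec_entity_type" type="maecVocabs:CollectionEntityTypeEnum"
                  default="various"/>
    <xs:attribute name="association_type" type="maecVocabs:CollectionAssociationTypeEnum"
                  use="optional"/>
  </xs:complexType>
</xs:schema>
```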
There may be cases where a Collection must be associated with a particular Malware Subject. To handle this requirement, we propose using a first-class relationship (see example).
A new enumeration of possible MAEC entity types that can be captured as part of a collection, the CollectionEntityTypeEnum, would be created with the following values:
Value | Description |
---|---|
malware subjects | The 'malware subjects' value specifies that the collection contains ONLY MAEC Malware Subjects. |
actions | The 'actions' value specifies that the collection contains ONLY MAEC Malware Actions. |
objects | The 'objects' value specifies that the collection contains ONLY CybOX Objects. |
behaviors | The 'behaviors' value specifies that the collection contains ONLY MAEC Behaviors. |
capabilities | The 'capabilities' value specifies that the collection contains ONLY MAEC Capabilities. |
tools | The 'tools' value specifies that the collection contains ONLY MAEC Tools. |
process trees | The 'process trees' value specifies that the collection contains ONLY MAEC Process Trees. |
various | The 'various' value specifies that the collection contains various types of entities, such as Malware Actions AND CybOX Objects, for example. |
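One possible way to express this enumeration in XSD is sketched below. The values are those listed in the table; the placeholder namespace URI and the choice of xs:string as the base type are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the targetNamespace is a placeholder, not a real MAEC namespace. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:maec-vocabs">
  <xs:simpleType name="CollectionEntityTypeEnum">
    <!-- xs:string as the base type is an assumption. -->
    <xs:restriction base="xs:string">
      <xs:enumeration value="malware subjects"/>
      <xs:enumeration value="actions"/>
      <xs:enumeration value="objects"/>
      <xs:enumeration value="behaviors"/>
      <xs:enumeration value="capabilities"/>
      <xs:enumeration value="tools"/>
      <xs:enumeration value="process trees"/>
      <xs:enumeration value="various"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```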
A new enumeration of possible Collection association types, the CollectionAssociationTypeEnum, would be created with the values below. It includes enumeration values associated with the existing GroupingRelationshipTypeVocab, where some values have been generalized to apply to any entity, not just Malware Subjects. The GroupingRelationshipTypeVocab and its corresponding enumeration would be deprecated.
Value | Description |
---|---|
file system entities | The 'file system entities' value specifies that the Collection contains ONLY file system related entities; for example, this could include MAEC Actions that operate on files and/or CybOX File Objects. |
network entities | The 'network entities' value specifies that the Collection contains ONLY network related entities; for example, this could include MAEC Actions that operate on sockets and/or CybOX Address Objects. |
process entities | The 'process entities' value specifies that the Collection contains ONLY operating system process related entities; for example, this could include MAEC Actions that operate on processes and/or CybOX Process Objects. |
memory entities | The 'memory entities' value specifies that the Collection contains ONLY memory related entities; for example, this could include MAEC Actions that operate on system memory and/or CybOX Memory Objects. |
ipc entities | The 'ipc entities' value specifies that the Collection contains ONLY interprocess-communication related entities; for example, this could include MAEC Actions that operate on mutexes and/or CybOX Mutex Objects. |
device entities | The 'device entities' value specifies that the Collection contains ONLY system device related entities; for example, this could include MAEC Actions that operate on disks and/or CybOX Disk Objects. |
registry entities | The 'registry entities' value specifies that the Collection contains ONLY Windows registry related entities; for example, this could include MAEC Actions that operate on registry keys and/or CybOX Windows Registry Key Objects. |
service entities | The 'service entities' value specifies that the Collection contains ONLY Windows service related entities; for example, this could include MAEC Actions that operate on services and/or CybOX Windows Service Objects. |
potential indicators | The 'potential indicators' value specifies that the Collection contains entities that could serve as potential indicators for a malware instance; for example, this could include specific CybOX File Objects that are created by the malware instance on a host system. |
same malware family | The 'same malware family' value specifies that the Collection contains Malware Subjects that are all part of the same malware family. |
clustered together | The 'clustered together' value specifies that the Collection contains entities that were clustered together by some algorithm or other capability. |
observed together | The 'observed together' value specifies that the Collection contains entities that were abstractly observed together. |
part of intrusion set | The 'part of intrusion set' value specifies that the Collection contains Malware Subjects that were found as part of the same malware intrusion set. |
same malware toolkit | The 'same malware toolkit' value specifies that the Collection contains Malware Subjects that were created using the same malware toolkit, independent of toolkit version. |
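To illustrate how the two enumerations could combine, the sketch below shows a Collection of mixed entity types ('various') whose members are all file system related. The identifiers and the Name are hypothetical, and the syntax assumes the CollectionType fields proposed above.

```xml
<MAEC_Package>
  <Collections>
    <!-- Hypothetical IDs: two Actions that operate on files plus one CybOX File Object. -->
    <Collection id="collection-2" association_type="file system entities" maec_entity_type="various">
      <Name>File system activity</Name>
      <Entity_Reference entity_idref="action-4"/>
      <Entity_Reference entity_idref="action-5"/>
      <Entity_Reference entity_idref="object-1"/>
    </Collection>
  </Collections>
</MAEC_Package>
```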
Finally, a new AssociationMetadataType schema type (based on the existing GroupingRelationshipType) would be defined in the MAEC Package schema with the following fields:
Field | Type | Multiplicity | Description |
---|---|---|---|
Malware_Family_Name | xs:string | 0-1 | The Malware_Family_Name field specifies the name of the malware family referred to by the 'same malware family' association type. |
Malware_Toolkit_Name | xs:string | 0-1 | The Malware_Toolkit_Name field specifies the name of the malware toolkit referred to by the 'same malware toolkit' association type. |
Intrusion_Set_Name | xs:string | 0-1 | The Intrusion_Set_Name field specifies the name of the intrusion set referred to by the 'part of intrusion set' association type. |
Clustering_Metadata | maecPackage:ClusteringMetadataType | 0-1 | The Clustering_Metadata field captures any metadata associated with the 'clustered together' association type. |
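As a sketch of how a Collection might take the place of a Grouping Relationship, the following groups two Malware Subjects under the 'same malware family' association type and names the family in Association_Metadata. The identifiers and the family name are hypothetical, and the placement of Association_Metadata after the references is an assumption.

```xml
<MAEC_Package>
  <Collections>
    <!-- Hypothetical IDs and family name, shown for illustration only. -->
    <Collection id="collection-3" association_type="same malware family" maec_entity_type="malware subjects">
      <Name>Samples from one family</Name>
      <Entity_Reference entity_idref="malware-subject-1"/>
      <Entity_Reference entity_idref="malware-subject-2"/>
      <Association_Metadata>
        <Malware_Family_Name>ExampleFamily</Malware_Family_Name>
      </Association_Metadata>
    </Collection>
  </Collections>
</MAEC_Package>
```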
Because the composition of a cluster (when using the 'clustered together' association type) is inherently defined by a Collection, the ClusteringMetadataType in the MAEC Package would be modified by removing the 'Clustering_Composition' field.
Accordingly, the following types associated with cluster composition would be deprecated:
- ClusterCompositionType
- ClusterEdgeNodePairType
All other fields in the ClusteringMetadataType ('Algorithm_Name', 'Algorithm_Version', 'Algorithm_Parameters', 'Cluster_Size', and 'Cluster_Description') would remain.
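Under this change, a cluster might be captured roughly as sketched below: the Entity_Reference elements define the cluster's members, and Clustering_Metadata carries only the remaining descriptive fields. All identifiers and values are hypothetical, the element order is assumed, and Algorithm_Parameters is omitted because its structure is not described here.

```xml
<MAEC_Package>
  <Collections>
    <!-- Hypothetical IDs and values; the references themselves define the cluster's composition. -->
    <Collection id="collection-4" association_type="clustered together" maec_entity_type="malware subjects">
      <Name>Cluster of similar samples</Name>
      <Entity_Reference entity_idref="malware-subject-1"/>
      <Entity_Reference entity_idref="malware-subject-2"/>
      <Entity_Reference entity_idref="malware-subject-3"/>
      <Association_Metadata>
        <Clustering_Metadata>
          <Algorithm_Name>example-clustering-algorithm</Algorithm_Name>
          <Algorithm_Version>1.0</Algorithm_Version>
          <Cluster_Size>3</Cluster_Size>
          <Cluster_Description>Samples grouped by a hypothetical similarity measure</Cluster_Description>
        </Clustering_Metadata>
      </Association_Metadata>
    </Collection>
  </Collections>
</MAEC_Package>
```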
This example assumes that all related proposals will be implemented.
<MAEC_Package>
  <Collections>
    <Collection id="collection-1" association_type="network entities" maec_entity_type="actions">
      <Name>Test collection of network actions</Name>
      <Entity_Reference entity_idref="action-1"/>
      <Entity_Reference entity_idref="action-2"/>
      <Entity_Reference entity_idref="action-3"/>
    </Collection>
  </Collections>
  <Malware_Subjects>
    <Malware_Subject id="malware-subject-1">
      ...
    </Malware_Subject>
  </Malware_Subjects>
  <Relationships>
    <Relationship id="relationship-1" source_id="collection-1" target_id="malware-subject-1">
      <Type>belongs to</Type>
    </Relationship>
  </Relationships>
</MAEC_Package>
This change will not be backward compatible and is one of several revisions planned for a new major version.
- Should Collections be top-level entities?
- Should Collections be able to capture any set of related entities?
- Is the CollectionType schema type reasonably defined?
- Is the 'maec_entity_type' field useful and necessary?
- Is the 'association_type' field aptly named and useful?
- Do the values in the CollectionAssociationTypeEnum make sense? Are there any values that are missing and should be added?
- What other fields should be added to AssociationMetadataType?
- Is the 'Clustering_Metadata' field of AssociationMetadataType appropriate for capturing clustering information?
- Should Relationships be used to associate Collections with Malware Subjects?
- Are there alternative solutions to making Collections more meaningful and easier to use?