dotnet: support property feature extraction #1168

Merged 61 commits on Sep 9, 2022
Changes from 49 commits
1515ddb
Add property feature extraction support for fields accessed directly …
anushkavirgaonkar Jul 21, 2022
9fae150
Support property feature extraction
anushkavirgaonkar Jul 25, 2022
f34c5ec
Format code
anushkavirgaonkar Jul 25, 2022
387561d
Remove common api and property references
anushkavirgaonkar Jul 27, 2022
3fb0ff7
Format code
anushkavirgaonkar Jul 27, 2022
70ae4a6
Format code
anushkavirgaonkar Jul 27, 2022
dba77c1
Revert "Format code"
anushkavirgaonkar Jul 27, 2022
9c76b04
Format code
anushkavirgaonkar Jul 27, 2022
564bb96
Format code
anushkavirgaonkar Jul 27, 2022
dbdd321
Format code
anushkavirgaonkar Jul 27, 2022
8dc01f8
Update capa/features/extractors/dnfile/helpers.py
anushkavirgaonkar Jul 28, 2022
9d54ced
Update capa/features/extractors/dnfile/insn.py
anushkavirgaonkar Jul 28, 2022
7a2e552
Update capa/features/extractors/dnfile/insn.py
anushkavirgaonkar Jul 28, 2022
6c2d740
Update capa/features/extractors/dnfile/insn.py
anushkavirgaonkar Jul 28, 2022
e25d494
Fix replace method name logic
anushkavirgaonkar Jul 28, 2022
9e624bd
Merge branch 'feature_property' of https://github.com/anushkavirgaonk…
anushkavirgaonkar Jul 28, 2022
bf1713d
Cache properties/fields
anushkavirgaonkar Jul 29, 2022
ccb1aad
Add comments for checks
anushkavirgaonkar Jul 30, 2022
7e24af5
Remove return statement
anushkavirgaonkar Aug 5, 2022
db8f1b2
Update capa/features/extractors/dnfile/insn.py
anushkavirgaonkar Aug 5, 2022
02a2d2c
Update capa/features/extractors/dnfile/insn.py
anushkavirgaonkar Aug 5, 2022
adfd672
Update capa/features/extractors/dnfile/helpers.py
anushkavirgaonkar Aug 5, 2022
d35cd47
Fix get_dotnet_properties return type
anushkavirgaonkar Aug 5, 2022
45a6e1b
Add tests for property features
anushkavirgaonkar Aug 5, 2022
8cc6069
Fix file path
anushkavirgaonkar Aug 5, 2022
20928f1
Add constants for metadata table numbers
anushkavirgaonkar Aug 8, 2022
54d29cd
Add tests covering different methods for referencing properties
anushkavirgaonkar Aug 8, 2022
ffbf038
Add test
anushkavirgaonkar Aug 8, 2022
25e8226
remove test
anushkavirgaonkar Aug 9, 2022
c84aeb1
Update dnfile version
anushkavirgaonkar Aug 9, 2022
9d46098
Add MethodDef property test
anushkavirgaonkar Aug 9, 2022
5b69ac7
Update dncil version
anushkavirgaonkar Aug 9, 2022
ec18483
Merge branch 'master' into feature_property
mike-hunhoff Aug 9, 2022
a4b2bee
Emit read/write property features
anushkavirgaonkar Aug 10, 2022
b58ba2c
Add tests for read/write property features
anushkavirgaonkar Aug 10, 2022
d0eecf8
Merge branch 'feature_property' of https://github.com/anushkavirgaonk…
anushkavirgaonkar Aug 10, 2022
8f035ad
Format code
anushkavirgaonkar Aug 10, 2022
2163d7b
Format code
anushkavirgaonkar Aug 11, 2022
f77fe03
Add enum for property access type
anushkavirgaonkar Aug 11, 2022
c2c9bd1
Fix imports
anushkavirgaonkar Aug 11, 2022
ccf1f56
Fix imports
anushkavirgaonkar Aug 11, 2022
dc03c6e
Fix imports
anushkavirgaonkar Aug 11, 2022
6767ec8
Use one Property feature class for read/write specifiers
anushkavirgaonkar Aug 11, 2022
f8658b5
Format code
anushkavirgaonkar Aug 11, 2022
82a5cb4
Fix logic
anushkavirgaonkar Aug 12, 2022
ac91d55
merge upstream
mike-hunhoff Sep 8, 2022
e4e503c
update setup.py
mike-hunhoff Sep 8, 2022
8454671
implement #1142 and refactor code to accommodate changes
mike-hunhoff Sep 9, 2022
fa6ec72
update CHANGELOG
mike-hunhoff Sep 9, 2022
97fd697
Update capa/features/common.py
mike-hunhoff Sep 9, 2022
dc27b28
Update capa/features/extractors/dnfile/insn.py
mike-hunhoff Sep 9, 2022
e3b1362
Update capa/features/extractors/dnfile/insn.py
mike-hunhoff Sep 9, 2022
1cdb699
Update capa/features/extractors/dnfile/insn.py
mike-hunhoff Sep 9, 2022
f7b2b57
Update capa/features/extractors/dnfile/insn.py
mike-hunhoff Sep 9, 2022
12947a3
Update capa/features/extractors/dnfile/insn.py
mike-hunhoff Sep 9, 2022
39408f6
Update capa/features/extractors/dnfile/insn.py
mike-hunhoff Sep 9, 2022
585d764
PR feedback updates
mike-hunhoff Sep 9, 2022
ce36c08
fix formatting
mike-hunhoff Sep 9, 2022
1960408
fix vverbose rendering and add tests
mike-hunhoff Sep 9, 2022
99cb5ab
subclassing feature access
mike-hunhoff Sep 9, 2022
8fec752
Update common.py
mike-hunhoff Sep 9, 2022
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -4,6 +4,7 @@

### New Features
- verify rule metadata format on load #1160 @mr-tz
- extract property features from .NET PE files #1168 @anushkavirgaonkar

### Breaking Changes

22 changes: 18 additions & 4 deletions capa/features/common.py
@@ -29,6 +29,11 @@
THUNK_CHAIN_DEPTH_DELTA = 5


ACCESS_READ = "read"
ACCESS_WRITE = "write"
VALID_ACCESS = (ACCESS_READ, ACCESS_WRITE)


def bytes_to_str(b: bytes) -> str:
return str(codecs.encode(b, "hex").decode("utf-8"))

@@ -92,23 +97,32 @@ def __nonzero__(self):


class Feature(abc.ABC):
def __init__(self, value: Union[str, int, float, bytes], description=None):
def __init__(
self, value: Union[str, int, float, bytes], access: Optional[str] = None, description: Optional[str] = None
):
"""
Args:
value (any): the value of the feature, such as the number or string.
description (str): a human-readable description that explains the feature value.
"""
super(Feature, self).__init__()
self.name = self.__class__.__name__.lower()

if access is not None:
if access not in VALID_ACCESS:
raise ValueError("access '%s' must be one of %s" % (access, VALID_ACCESS))
self.name = self.__class__.__name__.lower() + "/" + access
else:
self.name = self.__class__.__name__.lower()

self.value = value
self.access = access
self.description = description

def __hash__(self):
return hash((self.name, self.value))
return hash((self.name, self.value, self.access))

def __eq__(self, other):
return self.name == other.name and self.value == other.value
return self.name == other.name and self.access == other.access and self.value == other.value

def __lt__(self, other):
# TODO: this is a huge hack!
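
The effect of the new `access` argument can be illustrated with a stripped-down stand-in for `Feature`. This is a minimal sketch, not capa's actual class: it drops `abc.ABC`, `description`, and the comparison helpers, and the `Property` subclass and the example value `System.IO.FileInfo::Length` are hypothetical names chosen only for the illustration.

```python
from typing import Optional, Union

ACCESS_READ = "read"
ACCESS_WRITE = "write"
VALID_ACCESS = (ACCESS_READ, ACCESS_WRITE)


class Feature:
    """Simplified stand-in for capa's Feature; only the access handling is kept."""

    def __init__(self, value: Union[str, int, float, bytes], access: Optional[str] = None):
        if access is not None:
            if access not in VALID_ACCESS:
                raise ValueError("access '%s' must be one of %s" % (access, VALID_ACCESS))
            # the access specifier becomes part of the feature name, like "property/read"
            self.name = self.__class__.__name__.lower() + "/" + access
        else:
            self.name = self.__class__.__name__.lower()
        self.value = value
        self.access = access

    def __hash__(self):
        return hash((self.name, self.value, self.access))

    def __eq__(self, other):
        return self.name == other.name and self.access == other.access and self.value == other.value


class Property(Feature):
    """Hypothetical subclass used only for this illustration."""


read = Property("System.IO.FileInfo::Length", access=ACCESS_READ)
write = Property("System.IO.FileInfo::Length", access=ACCESS_WRITE)
plain = Property("System.IO.FileInfo::Length")

print(read.name)                  # property/read
print(plain.name)                 # property
print(read == write)              # False
print(len({read, write, plain}))  # 3
```

Because `access` takes part in both `__hash__` and `__eq__`, a read and a write of the same property stay distinct when features are collected into sets or used as dictionary keys.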
139 changes: 95 additions & 44 deletions capa/features/extractors/dnfile/helpers.py
@@ -9,6 +9,7 @@
from __future__ import annotations

import logging
from enum import Enum
from typing import Any, Tuple, Iterator, Optional

import dnfile
@@ -17,6 +18,8 @@
from dncil.clr.token import Token, StringToken, InvalidToken
from dncil.cil.body.reader import CilMethodBodyReaderBase

from capa.features.common import ACCESS_READ, ACCESS_WRITE

logger = logging.getLogger(__name__)

# key indexes to dotnet metadata tables
@@ -41,56 +44,47 @@ def seek(self, offset: int) -> int:
return self.offset


class DnClass(object):
def __init__(self, token: int, namespace: str, classname: str):
self.token: int = token
self.namespace: str = namespace
self.classname: str = classname
class DnType(object):
def __init__(self, token: int, class_: str, access: Optional[str] = None, namespace: str = "", member: str = ""):
self.token = token
self.access = access
self.namespace = namespace
self.class_ = class_
self.member = member

def __hash__(self):
return hash((self.token,))
return hash((self.token, self.access, self.namespace, self.class_, self.member))

def __eq__(self, other):
return self.token == other.token
return (
self.token == other.token
and self.access == other.access
and self.namespace == other.namespace
and self.class_ == other.class_
and self.member == other.member
)

def __str__(self):
return DnClass.format_name(self.namespace, self.classname)
return DnType.format_name(self.class_, namespace=self.namespace, member=self.member)

def __repr__(self):
return str(self)

@staticmethod
def format_name(namespace: str, classname: str):
name: str = classname
if namespace:
# like System.IO.File::OpenRead
name = f"{namespace}.{name}"
return name


class DnMethod(DnClass):
def __init__(self, token: int, namespace: str, classname: str, methodname: str):
super(DnMethod, self).__init__(token, namespace, classname)
self.methodname: str = methodname

def __str__(self):
return DnMethod.format_name(self.namespace, self.classname, self.methodname)

@staticmethod
def format_name(namespace: str, classname: str, methodname: str): # type: ignore
def format_name(class_: str, namespace: str = "", member: str = ""):
# like File::OpenRead
name: str = f"{classname}::{methodname}"
name: str = f"{class_}::{member}" if member else class_
if namespace:
# like System.IO.File::OpenRead
name = f"{namespace}.{name}"
return name
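
The merged `format_name` covers what `DnClass.format_name` and `DnMethod.format_name` used to do separately. The following sketch copies the new logic into a free function (the standalone function is an arrangement made for this example, not part of the diff) to show the names it produces for each combination of arguments.

```python
def format_name(class_: str, namespace: str = "", member: str = "") -> str:
    # like File::OpenRead when a member is given, else just File
    name = f"{class_}::{member}" if member else class_
    if namespace:
        # like System.IO.File::OpenRead
        name = f"{namespace}.{name}"
    return name


print(format_name("File"))                                            # File
print(format_name("File", member="OpenRead"))                         # File::OpenRead
print(format_name("File", namespace="System.IO"))                     # System.IO.File
print(format_name("File", namespace="System.IO", member="OpenRead"))  # System.IO.File::OpenRead
```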


class DnUnmanagedMethod:
def __init__(self, token: int, modulename: str, methodname: str):
def __init__(self, token: int, module: str, method: str):
self.token: int = token
self.modulename: str = modulename
self.methodname: str = methodname
self.module: str = module
self.method: str = method

def __hash__(self):
return hash((self.token,))
@@ -99,14 +93,14 @@ def __eq__(self, other):
return self.token == other.token

def __str__(self):
return DnUnmanagedMethod.format_name(self.modulename, self.methodname)
return DnUnmanagedMethod.format_name(self.module, self.method)

def __repr__(self):
return str(self)

@staticmethod
def format_name(modulename, methodname):
return f"{modulename}.{methodname}"
def format_name(module, method):
return f"{module}.{method}"


def resolve_dotnet_token(pe: dnfile.dnPE, token: Token) -> Any:
@@ -139,7 +133,7 @@ def read_dotnet_method_body(pe: dnfile.dnPE, row: dnfile.mdtable.MethodDefRow) -
try:
return CilMethodBody(DnfileMethodBodyReader(pe, row))
except MethodBodyFormatError as e:
logger.warn("failed to parse managed method body @ 0x%08x (%s)" % (row.Rva, e))
logger.warning("failed to parse managed method body @ 0x%08x (%s)" % (row.Rva, e))
return None


@@ -148,7 +142,7 @@ def read_dotnet_user_string(pe: dnfile.dnPE, token: StringToken) -> Optional[str
try:
user_string: Optional[dnfile.stream.UserString] = pe.net.user_strings.get_us(token.rid)
except UnicodeDecodeError as e:
logger.warn("failed to decode #US stream index 0x%08x (%s)" % (token.rid, e))
logger.warning("failed to decode #US stream index 0x%08x (%s)" % (token.rid, e))
return None

if user_string is None:
@@ -157,7 +151,7 @@ def read_dotnet_user_string(pe: dnfile.dnPE, token: StringToken) -> Optional[str
return user_string.value


def get_dotnet_managed_imports(pe: dnfile.dnPE) -> Iterator[DnMethod]:
def get_dotnet_managed_imports(pe: dnfile.dnPE) -> Iterator[DnType]:
"""get managed imports from MemberRef table

see https://www.ntcore.com/files/dotnetformat.htm
@@ -176,10 +170,10 @@ def get_dotnet_managed_imports(pe: dnfile.dnPE) -> Iterator[DnMethod]:
continue

token: int = calculate_dotnet_token_value(pe.net.mdtables.MemberRef.number, rid + 1)
yield DnMethod(token, row.Class.row.TypeNamespace, row.Class.row.TypeName, row.Name)
yield DnType(token, row.Class.row.TypeName, namespace=row.Class.row.TypeNamespace, member=row.Name)


def get_dotnet_managed_methods(pe: dnfile.dnPE) -> Iterator[DnMethod]:
def get_dotnet_managed_methods(pe: dnfile.dnPE) -> Iterator[DnType]:
"""get managed method names from TypeDef table

see https://www.ntcore.com/files/dotnetformat.htm
@@ -193,7 +187,64 @@ def get_dotnet_managed_methods(pe: dnfile.dnPE) -> Iterator[DnMethod]:
for row in iter_dotnet_table(pe, "TypeDef"):
for index in row.MethodList:
token = calculate_dotnet_token_value(index.table.number, index.row_index)
yield DnMethod(token, row.TypeNamespace, row.TypeName, index.row.Name)
yield DnType(token, row.TypeName, namespace=row.TypeNamespace, member=index.row.Name)


def get_dotnet_fields(pe: dnfile.dnPE) -> Iterator[DnType]:
"""get fields from TypeDef table"""
for row in iter_dotnet_table(pe, "TypeDef"):
for index in row.FieldList:
token = calculate_dotnet_token_value(index.table.number, index.row_index)
yield DnType(token, row.TypeName, namespace=row.TypeNamespace, member=index.row.Name)
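
The body of `calculate_dotnet_token_value` is not shown in this diff. As a rough sketch, assuming it follows the ECMA-335 metadata token layout (table number in the top byte, 1-based row ID in the low 24 bits), the computation would look like the following. The names `make_token`, `split_token`, `TABLE_SHIFT`, and `RID_MASK` are hypothetical and exist only for this example.

```python
from typing import Tuple

# assumption: ECMA-335 token layout, not copied from capa or dncil
TABLE_SHIFT = 24
RID_MASK = 0x00FFFFFF

# metadata table numbers defined by ECMA-335
FIELD_TABLE = 0x04
METHODDEF_TABLE = 0x06
MEMBERREF_TABLE = 0x0A


def make_token(table: int, rid: int) -> int:
    """Pack a table number and a 1-based row ID into a 32-bit metadata token."""
    return ((table & 0xFF) << TABLE_SHIFT) | (rid & RID_MASK)


def split_token(token: int) -> Tuple[int, int]:
    """Recover (table, rid) from a metadata token."""
    return token >> TABLE_SHIFT, token & RID_MASK


print(hex(make_token(METHODDEF_TABLE, 1)))  # 0x6000001
print(hex(make_token(FIELD_TABLE, 0x2A)))   # 0x400002a
print(split_token(0x0A000010))              # (10, 16)
```

Under that assumption, a field and a method with the same row ID still get different tokens, which is what lets the extractor key fields, methods, and property accessors by token.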


def get_dotnet_property_map(
pe: dnfile.dnPE, property_row: dnfile.mdtable.PropertyRow
) -> Optional[dnfile.mdtable.TypeDefRow]:
"""get property map from PropertyMap table

see https://www.ntcore.com/files/dotnetformat.htm

21 - PropertyMap Table
List of Properties owned by a specific class.
Parent (index into the TypeDef table)
PropertyList (index into Property table). It marks the first of a contiguous run of Properties owned by Parent. The run continues to the smaller of:
the last row of the Property table
the next run of Properties, found by inspecting the PropertyList of the next row in this PropertyMap table
"""
for row in iter_dotnet_table(pe, "PropertyMap"):
for index in row.PropertyList:
if index.row.Name == property_row.Name:
return row.Parent.row
return None
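
The "contiguous run" rule quoted in the docstring can be sketched with plain tuples standing in for PropertyMap rows. This is an illustration under assumed inputs, not dnfile's API: `property_owners` and its `(parent_typedef_rid, first_property_rid)` pairs are hypothetical, and row IDs are taken to be 1-based as in ECMA-335.

```python
from typing import Dict, List, Tuple


def property_owners(property_map: List[Tuple[int, int]], property_count: int) -> Dict[int, int]:
    """Map each 1-based Property row ID to the TypeDef row ID that owns it.

    property_map holds (parent_typedef_rid, first_property_rid) pairs in table order.
    """
    owners: Dict[int, int] = {}
    for i, (parent, first) in enumerate(property_map):
        if i + 1 < len(property_map):
            # the run ends where the next PropertyMap row's run begins
            end = property_map[i + 1][1]
        else:
            # the last run extends to the last row of the Property table
            end = property_count + 1
        for rid in range(first, end):
            owners[rid] = parent
    return owners


rows = [(2, 1), (5, 4)]  # TypeDef 2 owns properties 1..3, TypeDef 5 owns 4..6
print(property_owners(rows, 6))  # {1: 2, 2: 2, 3: 2, 4: 5, 5: 5, 6: 5}
```

One observation on the helper in the diff: it matches on `Name`, so if two types each declare a property with the same name, the first PropertyMap row containing that name would be returned. Resolving by row index, as in the sketch, would avoid that ambiguity.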


def get_dotnet_properties(pe: dnfile.dnPE) -> Iterator[DnType]:
"""get property from MethodSemantics table

see https://www.ntcore.com/files/dotnetformat.htm

24 - MethodSemantics Table
Links Events and Properties to specific methods. For example one Event can be associated to more methods. A property uses this table to associate get/set methods.
Semantics (a 2-byte bitmask of type MethodSemanticsAttributes)
Method (index into the MethodDef table)
Association (index into the Event or Property table; more precisely, a HasSemantics coded index)
"""
for row in iter_dotnet_table(pe, "MethodSemantics"):
typedef_row = get_dotnet_property_map(pe, row.Association.row)
if typedef_row is None:
continue

token = calculate_dotnet_token_value(row.Method.table.number, row.Method.row_index)
access_type = ACCESS_WRITE if row.Semantics.msSetter else ACCESS_READ if row.Semantics.msGetter else None

yield DnType(
token,
typedef_row.TypeName,
access=access_type,
namespace=typedef_row.TypeNamespace,
member=row.Association.row.Name,
)
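
The `access_type` expression relies on dnfile exposing the `Semantics` bitmask as boolean flags such as `msSetter` and `msGetter`. The same decision can be sketched directly on the raw 2-byte mask using the MethodSemanticsAttributes values from ECMA-335. The enum and the `access_from_semantics` helper below are written for this example and are not part of capa or dnfile.

```python
from enum import IntFlag
from typing import Optional


class MethodSemanticsAttributes(IntFlag):
    # flag values as defined by ECMA-335
    SETTER = 0x0001
    GETTER = 0x0002
    OTHER = 0x0004
    ADD_ON = 0x0008
    REMOVE_ON = 0x0010
    FIRE = 0x0020


def access_from_semantics(semantics: int) -> Optional[str]:
    """Mirror the diff's ordering: setter wins, then getter, else no access type."""
    if semantics & MethodSemanticsAttributes.SETTER:
        return "write"
    if semantics & MethodSemanticsAttributes.GETTER:
        return "read"
    return None


print(access_from_semantics(0x0002))  # read
print(access_from_semantics(0x0001))  # write
print(access_from_semantics(0x0008))  # None
```

Rows that describe event accessors (`ADD_ON`, `REMOVE_ON`, `FIRE`) yield `None` here, which corresponds to the `access=None` case in the `DnType` constructor.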


def get_dotnet_managed_method_bodies(pe: dnfile.dnPE) -> Iterator[Tuple[int, CilMethodBody]]:
@@ -226,20 +277,20 @@ def get_dotnet_unmanaged_imports(pe: dnfile.dnPE) -> Iterator[DnUnmanagedMethod]
ImportScope (index into the ModuleRef table)
"""
for row in iter_dotnet_table(pe, "ImplMap"):
modulename: str = row.ImportScope.row.Name
methodname: str = row.ImportName
module: str = row.ImportScope.row.Name
method: str = row.ImportName

# ECMA says "Each row of the ImplMap table associates a row in the MethodDef table (MemberForwarded) with the
# name of a routine (ImportName) in some unmanaged DLL (ImportScope)"; so we calculate and map the MemberForwarded
# MethodDef table token to help us later record native import method calls made from CIL
token: int = calculate_dotnet_token_value(row.MemberForwarded.table.number, row.MemberForwarded.row_index)

# like Kernel32.dll
if modulename and "." in modulename:
modulename = modulename.split(".")[0]
if module and "." in module:
module = module.split(".")[0]

# like kernel32.CreateFileA
yield DnUnmanagedMethod(token, modulename, methodname)
yield DnUnmanagedMethod(token, module, method)
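
The module-name trimming and `DnUnmanagedMethod.format_name` together determine the final import name. A small sketch combining the two steps in one function (`format_unmanaged_import` is a hypothetical helper written for this example):

```python
def format_unmanaged_import(module: str, method: str) -> str:
    # like Kernel32.dll: keep only the part before the first dot
    if module and "." in module:
        module = module.split(".")[0]
    return f"{module}.{method}"


print(format_unmanaged_import("Kernel32.dll", "CreateFileA"))  # Kernel32.CreateFileA
print(format_unmanaged_import("user32", "MessageBoxW"))        # user32.MessageBoxW
```

Note that this step only strips the extension; it does not change the case of the module name.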


def calculate_dotnet_token_value(table: int, rid: int) -> int: