DI-based TypeHierarchy #18

fabianbs96 · 2023-05-29T12:45:57Z

See secure-software-engineering#623

CMakeLists.txt

include/phasar/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.h

lib/PhasarLLVM/HelperAnalyses.cpp

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

unittests/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchyTest.cpp

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

fabianbs96 · 2023-06-27T19:53:17Z

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

+
+  // Initialize the transitive closure matrix with all as false
+  llvm::BitVector InitVector(VertexTypes.size(), false);
+


TransitiveClosure.reserve(VertexTypes.size());

unittests/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchyTest.cpp

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

fabianbs96 · 2023-07-03T18:23:13Z

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

+  for (size_t I = 0; I < VTableSize; I++) {
+    IndexToFunctions.push_back(Init);
+  }


I don't think the size of IndexToFunctions is tied to the max virtual index; rather, it is the number of types. The sizes of the inner vectors do depend on the virtual indices, but on a per-type basis.

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

fabianbs96 · 2023-07-03T18:28:44Z

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

+      IndexFunctions.push_back(FunctionToAdd);
+      // concatenate vectors of functions of this index
+      IndexToFunctions[VirtualIndex].insert(
+          IndexToFunctions.at(VirtualIndex).begin(), IndexFunctions.begin(),
+          IndexFunctions.end());


I would rather go with something like:

    if (IndexToFunctions[TypeIndex].size() <= VirtualIndex) {
      IndexToFunctions[TypeIndex].resize(VirtualIndex + 1);
    }
    IndexToFunctions[TypeIndex][VirtualIndex] = FunctionToAdd;
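The layout suggested in the two comments above (outer vector indexed by type, inner vector indexed by virtual index) can be sketched in isolation. This is a minimal sketch under stated assumptions, not the PR's code: `Function` is a stand-in for `llvm::Function`, and `VTableStorage` and `addVirtualFunction` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for llvm::Function so the sketch compiles without LLVM.
struct Function {
  std::string Name;
};

// Outer index: type (vertex) index; inner index: virtual index of the function.
using VTableStorage = std::vector<std::vector<const Function *>>;

// Hypothetical helper: stores FunctionToAdd in the slot
// [TypeIndex][VirtualIndex], growing only that type's inner vector on demand.
void addVirtualFunction(VTableStorage &IndexToFunctions, std::size_t TypeIndex,
                        std::size_t VirtualIndex,
                        const Function *FunctionToAdd) {
  // The outer vector has one entry per type, so TypeIndex must already be valid.
  assert(TypeIndex < IndexToFunctions.size() && "unknown type index");

  auto &VTable = IndexToFunctions[TypeIndex];
  if (VTable.size() <= VirtualIndex) {
    // Newly created slots are value-initialized to nullptr.
    VTable.resize(VirtualIndex + 1);
  }
  VTable[VirtualIndex] = FunctionToAdd;
}
```

With this shape, a type whose largest virtual index is 2 ends up with three slots, while other types keep their own, possibly shorter, vectors.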

fabianbs96 · 2023-07-03T18:33:56Z

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

    for (const auto &Function : VTable.getAllFunctions()) {
-      OS << Function->getName() << "\n";
+      OS << Function->getName() << ", ";
    }


You may want to take a look at llvm::interleaveComma
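`llvm::interleaveComma` (declared in `llvm/ADT/STLExtras.h`) writes the elements of a range separated by `", "` without a trailing separator. To stay compilable without LLVM, the sketch below reimplements that idea on top of the standard library; `printCommaSeparated` and `joinCommaSeparated` are hypothetical names and not part of LLVM or PhASAR.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the idea behind llvm::interleaveComma: print each
// element and emit ", " only between elements, never after the last one.
template <typename Container>
void printCommaSeparated(const Container &Elements, std::ostream &OS) {
  bool First = true;
  for (const auto &Element : Elements) {
    if (!First) {
      OS << ", ";
    }
    OS << Element;
    First = false;
  }
}

// Convenience wrapper that returns the joined text as a string.
template <typename Container>
std::string joinCommaSeparated(const Container &Elements) {
  std::ostringstream OS;
  printCommaSeparated(Elements, OS);
  return OS.str();
}
```

In the reviewed loop, the LLVM overload that takes a per-element callback is probably the closest fit, since each element needs `getName()` applied before printing.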

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

unittests/PhasarLLVM/TypeHierarchy/CMakeLists.txt

fabianbs96 · 2023-07-03T18:39:27Z

unittests/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchyTest.cpp

-}
+  EXPECT_TRUE(DBTH.hasType(DBTH.getType("Base")));
+  EXPECT_TRUE(DBTH.hasType(DBTH.getType("Child")));
+  EXPECT_TRUE(BaseSubTypes.find(DBTH.getType("Child")) != BaseSubTypes.end());


Suggested change

-  EXPECT_TRUE(BaseSubTypes.find(DBTH.getType("Child")) != BaseSubTypes.end());
+  EXPECT_TRUE(BaseSubTypes.count(DBTH.getType("Child")));

unittests/PhasarLLVM/TypeHierarchy/LLVMTypeHierarchyTest.cpp

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

fabianbs96 · 2023-08-25T15:13:49Z

unittests/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchyTest.cpp

+  if (VTableForBase->empty()) {
+    EXPECT_TRUE(false);
+  } else {
+    EXPECT_TRUE(VTableForBase->getFunction(0)->getName() == "_ZN4Base3barEv");
+  }


Suggested change

-  if (VTableForBase->empty()) {
-    EXPECT_TRUE(false);
-  } else {
-    EXPECT_TRUE(VTableForBase->getFunction(0)->getName() == "_ZN4Base3barEv");
-  }
+  ASSERT_FALSE(VTableForBase->empty());
+  EXPECT_EQ(VTableForBase->getFunction(0)->getName(), "_ZN4Base3barEv");

fabianbs96 · 2023-09-01T16:01:09Z

lib/PhasarLLVM/TaintConfig/LLVMTaintConfig.cpp

+                llvm::outs() << "Current arg: " << Arg << "\n";
+                llvm::outs().flush();


Consider using PhASAR's logger here instead.

fabianbs96 · 2023-09-01T16:01:57Z

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

+    if (IndexToFunctions[TypeIndex->getSecond()].size() <= VirtualIndex) {
+      IndexToFunctions[TypeIndex->getSecond()].resize(VirtualIndex);
+    }


This seems to be redundant

fabianbs96 · 2023-09-01T16:02:50Z

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

+    llvm::outs() << "TC.size(): " << TransitiveClosure.size()
+                 << " VT.size(): " << VertexTypes.size();
+    llvm::outs().flush();


We should print such messages to stderr (llvm::errs()) instead.

fabianbs96 · 2023-09-01T16:04:00Z

unittests/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchyTest.cpp

+  EXPECT_TRUE(BaseType);
+  if (BaseType) {
+    EXPECT_TRUE(DBTH.hasType(BaseType));
+    EXPECT_TRUE(DBTH.hasVFTable(BaseType));
+  }


Suggested change

-  EXPECT_TRUE(BaseType);
-  if (BaseType) {
-    EXPECT_TRUE(DBTH.hasType(BaseType));
-    EXPECT_TRUE(DBTH.hasVFTable(BaseType));
-  }
+  ASSERT_NE(nullptr, BaseType);
+  EXPECT_TRUE(DBTH.hasType(BaseType));
+  EXPECT_TRUE(DBTH.hasVFTable(BaseType));

MMory · 2023-09-21T15:04:45Z

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

+               TypeToVertex.end());
+        size_t BaseTypeVertex = TypeToVertex[DerivedType->getBaseType()];
+
+        llvm::outs().flush();


MMory · 2023-09-21T15:05:39Z

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

+        size_t BaseTypeVertex = TypeToVertex[DerivedType->getBaseType()];
+
+        llvm::outs().flush();
+        assert(TransitiveClosure.size() >= BaseTypeVertex);


MMory · 2023-09-21T15:06:28Z

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

+
+        llvm::outs().flush();
+        assert(TransitiveClosure.size() >= BaseTypeVertex);
+        assert(TransitiveClosure.size() >= ActualDerivedType);


In line 75, I want to access TransitiveClosure at the indices BaseTypeVertex and ActualDerivedType.
Line 75: "TransitiveClosure[BaseTypeVertex][ActualDerivedType] = true;"
Shouldn't I check beforehand if these indices are valid, by checking that the TransitiveClosure.size() is bigger than the indices? That's why I put the asserts in line 73 and line 74 there. Is there something I'm missing?

You should, but with `>=` the equality case still accesses out of bounds, since the first element has index 0. The check has to be strict: the size must be greater than the index.

Makes sense, thank you. I interpreted the > as "see above", as in "why?"
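The off-by-one discussed in this thread can be made concrete with a small sketch. It assumes a `std::vector<bool>`-based stand-in for the closure matrix (the reviewed code uses `llvm::BitVector` rows); `ClosureMatrix`, `isValidIndex`, and `markSubType` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the transitive closure matrix; std::vector<bool> rows keep the
// sketch free of LLVM dependencies.
using ClosureMatrix = std::vector<std::vector<bool>>;

// Valid indices of a container with Size elements are 0 .. Size - 1, so the
// check has to be strict. "Size >= Index" still admits Index == Size, which
// is one past the end.
bool isValidIndex(std::size_t Index, std::size_t Size) { return Index < Size; }

// Hypothetical helper mirroring the assignment discussed above.
void markSubType(ClosureMatrix &TransitiveClosure, std::size_t BaseTypeVertex,
                 std::size_t ActualDerivedType) {
  assert(isValidIndex(BaseTypeVertex, TransitiveClosure.size()));
  assert(isValidIndex(ActualDerivedType,
                      TransitiveClosure[BaseTypeVertex].size()));
  TransitiveClosure[BaseTypeVertex][ActualDerivedType] = true;
}
```

For a 2x2 matrix, index 1 passes the check and index 2 is rejected, whereas the original `size() >= index` form would have accepted 2.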

MMory · 2023-09-21T15:10:09Z

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

+  std::vector<std::vector<const llvm::Function *>> IndexToFunctions;
+  IndexToFunctions.resize(VertexTypes.size());


there is a constructor that allows you to initialize a std::vector with a particular size
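A minimal sketch of the constructor the comment refers to, with `Function` standing in for `llvm::Function` and `makeEmptyVTables` as a hypothetical helper name:

```cpp
#include <cstddef>
#include <vector>

// Stand-in for llvm::Function so the sketch compiles without LLVM.
struct Function {};

// Hypothetical helper: builds one (initially empty) vtable per type.
// The size constructor replaces the two-step "declare, then resize" pattern.
// Parentheses are used on purpose: braces would prefer an initializer-list
// constructor where one is viable.
std::vector<std::vector<const Function *>>
makeEmptyVTables(std::size_t NumTypes) {
  std::vector<std::vector<const Function *>> IndexToFunctions(NumTypes);
  return IndexToFunctions;
}
```

Applied to the hunk above, this would presumably collapse the declaration and the `resize` call into a single line that passes `VertexTypes.size()` to the constructor.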

MMory · 2023-09-21T15:14:13Z

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

+  size_t VTableSize = 0;
+  for (const auto &Subprogram : Finder.subprograms()) {
+    if (Subprogram->getVirtualIndex() > VTableSize) {
+      VTableSize = Subprogram->getVirtualIndex();
+    }
+  }
+  // if the biggest virtual index is 2 for example, the vector needs to have a
+  // size of 3 (Indices: 0, 1, 2)
+  VTableSize++;


MMory · 2023-09-21T15:15:18Z

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

+      continue;
+    }
+
+    const auto *const FunctionToAdd =


that's a lot of const ;-)

MMory · 2023-09-21T15:16:14Z

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

+    if (TypeIndex->getSecond() >= IndexToFunctions.size()) {
+      continue;
+    }


that is a weird handling of this case.

I'm unsure what I should do to improve this case. Is an assert better here, or should I use the phasar logger with log level "WARNING"?

I think an assertion should be the better choice. This should never happen and would be a reason to fail hard.

MMory · 2023-09-21T15:25:12Z

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

+void DIBasedTypeHierarchy::printAsDot(llvm::raw_ostream &OS) const {
+  OS << "digraph TypeHierarchy{\n";


this would benefit from labels, so that you see them in the graph visualization.

What kind of labels? Do you mean like a legend that tells you how the graph is structured?

The dot file format allows attributes for nodes and edges, one of which is label. There are others that let you set colors, borders, etc. when you plot it to an image or just view it in xdot.

By doing so you could introduce nodes by their id and then reference by id, for example:

0[label="Base"]
1[label="Child"]
1->0[label="public"]

The label for the edge is just an example. In fact, it might be worth thinking about what would be interesting properties of the edges.

This dump would be helpful for debugging, as you see the mapping between ids and names.

I will implement this, thank you
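One way the labeled output described in this thread could be produced is sketched below. It writes to `std::ostream` instead of `llvm::raw_ostream` to stay self-contained, and `printLabeledDot` and `toLabeledDot` are hypothetical helpers, not the PR's `printAsDot`. Edge labels are left out, since the thread leaves open which edge properties are worth showing.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using EdgeList = std::vector<std::pair<std::size_t, std::size_t>>;

// Hypothetical helper: TypeNames[I] is the name of vertex I; each edge is a
// (derived, base) pair of vertex ids.
void printLabeledDot(std::ostream &OS,
                     const std::vector<std::string> &TypeNames,
                     const EdgeList &Edges) {
  OS << "digraph TypeHierarchy {\n";
  // Introduce every node by its id and attach the type name as its label.
  for (std::size_t I = 0; I < TypeNames.size(); ++I) {
    OS << "  " << I << "[label=\"" << TypeNames[I] << "\"]\n";
  }
  // Reference the nodes by id when emitting the edges.
  for (const auto &Edge : Edges) {
    OS << "  " << Edge.first << "->" << Edge.second << "\n";
  }
  OS << "}\n";
}

// Convenience wrapper returning the DOT text as a string.
std::string toLabeledDot(const std::vector<std::string> &TypeNames,
                         const EdgeList &Edges) {
  std::ostringstream OS;
  printLabeledDot(OS, TypeNames, Edges);
  return OS.str();
}
```

The sketch does not escape quotes inside type names; real type names containing `"` would need escaping before being used as labels.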

MMory · 2023-09-21T15:33:22Z

unittests/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchyTest.cpp

+TEST(DBTHTest, BasicTHReconstruction_2) {
+  LLVMProjectIRDB IRDB({unittest::PathToLLTestFiles +
+                        "type_hierarchies/type_hierarchy_17_cpp_dbg.ll"});


the indices are inconsistent, that's very surprising and confusing. Is there a reason you omit the other cases? What about case 16 (ll index) ?

MMory · 2023-09-21T15:34:23Z

unittests/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchyTest.cpp

+  // Since Child2 hasn't been created, it shouldn't exist and also not be found
+  // via DBTH.getType("Child2")
+  const auto &Child2Type = DBTH.getType("Child2");
+  EXPECT_FALSE(Child2Type);


is that intentional? here you are relying on compiler behavior.

MMory · 2023-09-22T07:32:09Z

lib/PhasarLLVM/TypeHierarchy/DIBasedTypeHierarchy.cpp

+  size_t CurrentRowIndex = 0;
+  for (const auto &Row : TransitiveClosure) {
+    for (const auto &Cell : Row.set_bits()) {
+      if (Row[Cell]) {


actually this is always true. set_bits() lets you iterate over all the positions where 1 is set.

fabianbs96 · 2024-02-25T14:16:34Z

Closed in favor of secure-software-engineering#702

small backup safe

cd2300f

fabianbs96 assigned mxHuber May 29, 2023

mxHuber and others added 17 commits May 30, 2023 12:52

backup save, still needs metadata extraction

9428cb5

refactoring + some basic functions implemented

0d4d535

basic structure of constructHierarchy()

483596e

DIBasedTypeHierarchy structure

5d1a5d3

basic impl of constructor and hasVFTable

0b9c1e8

impl edges of graph, isSubType, getSubType and print

ac6495c

untested version of transitive closure

e630e04

added transitive closure and changed print

688c115

fixed transitive closure + refactoring

f8b48aa

bug fixes + tests

587839f

debugging Debug info extraction

deebbcf

Fixed type extraction, untested transitive hull

a81ab3b

fixed includes + more debug info

36a0fa2

bug fixes and non recursive transitive hull

00dace4

working direct edge detection

d833748

BitVector, cleanup, start of vtable impl

cd5d7a4

vtables and dotgraph

21bf49f

fabianbs96 commented Jun 27, 2023

mxHuber added 4 commits June 28, 2023 16:12

review changes + vtable fix, 50% finished

7451de3

impl review suggestions

71592b4

removed old type_hierarchy unittests

03aadf1

impl .set_bits() loop

57eefb8

fabianbs96 commented Jul 3, 2023

mxHuber added 4 commits July 5, 2023 08:54

fixed vtables

518309b

fixes and code cleanup

24a6cd0

added llvm::interleaveComma

61d18ff

fixed wrong assertion

7ea9969

mxHuber added 4 commits July 14, 2023 16:52

important bugfixes

39756ac

unittests for multiple base classes

6ece219

unittests not finished, backup

8bc4a30

more unittests, all pass

a7dd8ca

fabianbs96 commented Aug 25, 2023

reworked unittests

0e61b48

fabianbs96 commented Sep 1, 2023

mxHuber and others added 6 commits September 4, 2023 07:57

review changes

fa093ef

review changes

b1635c1

myphasartools.cpp revert

c3c9b51

Merge remote-tracking branch 'refs/remotes/origin/development' into development

8669373

current final version

15af897

Bump submodules

29a0c12

MMory reviewed Sep 21, 2023

fabianbs96 marked this pull request as ready for review September 21, 2023 17:13

MMory reviewed Sep 22, 2023

mxHuber and others added 12 commits September 22, 2023 20:26

backup of fixes + unittests

af587b2

more unittests

1f1e323

new unittest

5b56d8c

Pin swift version

06dbc3f

basicRecoTH backup

6cdaaa7

backup of structure

7756202

new unittests, some fail

e6b87cd

unittests fixed, all pass

a8ef03a

Merge remote-tracking branch 'mainline/development' into f-ModernizedLLVMTypeHierarchy

4379bd0

Add LLVM-RTTI-style type-hierarchy layout

5f15a19

Merge branch 'development' into f-ModernizedLLVMTypeHierarchy

e616272

Fix logging macro invocation

9e215dc

fabianbs96 closed this Feb 25, 2024
