Commit 507c89d

Avoid iterating excessive definitions and comment out a lot of debug logging.

Signed-off-by: David Korczynski <[email protected]>
DavidKorczynski committed Mar 29, 2023
1 parent 26a2dc8 commit 507c89d
Showing 6 changed files with 126 additions and 113 deletions.
2 changes: 1 addition & 1 deletion pycg/machinery/callgraph.py
@@ -79,7 +79,7 @@ def add_edge(self, src, dest, lineno=-1, mod="", ext_mod=""):
"ext_mod" : ext_mod
}
)
logger.debug(self.cg_extended[src])
#logger.debug(self.cg_extended[src])

def get(self):
return self.cg
14 changes: 13 additions & 1 deletion pycg/machinery/definitions.py
@@ -18,9 +18,14 @@
# specific language governing permissions and limitations
# under the License.
#
import logging

from pycg.machinery.pointers import NamePointer, LiteralPointer
from pycg import utils

logger = logging.getLogger(__name__)


class DefinitionManager(object):
    def __init__(self):
        self.defs = {}
@@ -142,11 +147,19 @@ def update_pointsto_args(pointsto_args, arg, name):
                    pointsto_arg_def.add(item)
            return changed_something

        logger.info("Def-Iterating %d defs"%(len(self.defs)))
        if len(self.defs) > 9000:
            logger.info("The definition list is too large. This is likely to take forever. Avoid this step")
            return

@DavidKorczynski (Author, Collaborator) commented on Mar 29, 2023:

The main reason for the above change is that some projects have 20K+ definitions, and the six nested loops below then take a significant performance hit. ipykernel is the project that triggered this: google/oss-fuzz#9914

Note that this makes the analysis an approximation, in that convergence will not happen in the usual manner. I don't think this should have a significant impact on the results, though -- at least not based on initial experimentation.
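As a side note, here is a minimal sketch of how the hard-coded 9000 cut-off could be made tunable rather than fixed; the PYCG_MAX_DEFS environment variable and the helper name are illustrative assumptions, not part of this commit:

```python
import logging
import os

logger = logging.getLogger(__name__)

# Hypothetical knob: allows a run to raise or lower the cut-off without editing
# the source. The default of 9000 mirrors the value hard-coded in this commit.
MAX_DEFS = int(os.environ.get("PYCG_MAX_DEFS", "9000"))


def should_skip_fixpoint(defs):
    """Return True when the definition map is too large to iterate to a fixed point."""
    if len(defs) > MAX_DEFS:
        logger.info("Skipping fix-point iteration: %d definitions exceeds the limit of %d",
                    len(defs), MAX_DEFS)
        return True
    return False
```

Returning early, as the guard in this commit does, is what makes the result an approximation: points-to information simply stops propagating before a fixed point is reached.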

@DavidKorczynski (Author, Collaborator) commented on May 8, 2023:

For reference, I don't fully understand why the issue being fixed here happens.

This loop, and in particular self.defs, can differ between two runs of the same program. Sometimes defs will be large, which causes the loop below to take ages (roughly 100 hours, I believe). Setting an upper limit had no visible effect on my local runs other than making the process finish faster. The variability caused several OSS-Fuzz projects to sometimes build successfully and sometimes not.

I currently suspect this flakiness happens because there is some non-determinism somewhere in the process, and I am not at all sure where it is.
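One hypothetical way to narrow down where the non-determinism enters would be to log a stable fingerprint of self.defs at this point and compare it across runs; the sketch below assumes the keys of self.defs are namespace strings and is not part of PyCG:

```python
import hashlib


def defs_fingerprint(defs):
    """Hash the sorted definition namespaces so two runs can be compared cheaply.

    If the digest differs between two runs on the same program, the set of
    definitions itself is non-deterministic, not merely its iteration order.
    """
    digest = hashlib.sha256()
    for ns in sorted(defs):
        digest.update(ns.encode("utf-8"))
        digest.update(b"\0")  # separator so adjacent names cannot blur together
    return digest.hexdigest()


# Usage sketch, e.g. just before the loop in complete_definitions():
#   logger.info("defs fingerprint: %s (%d defs)",
#               defs_fingerprint(self.defs), len(self.defs))
```

If the fingerprint matches across runs while the runtime still varies, the variation is more likely in the pointer contents or iteration order than in which definitions exist.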


        for i in range(len(self.defs)):
            #logger.info("Def-idx-%d"%(i))
            changed_something = False
            for ns, current_def in self.defs.items():
                #logger.info("Def-idx2-%d"%(idx2))
                # the name pointer of the definition we're currently iterating
                current_name_pointer = current_def.get_name_pointer()
                #print("Name point: %s"%(str(current_name_pointer)))
                # iterate the names the current definition points to items
                # for name in current_name_pointer.get():
                for name in current_name_pointer.get().copy():
@@ -172,7 +185,6 @@ def update_pointsto_args(pointsto_args, arg, name):
                            pointsto_name_pointer.add_arg(arg_name, arg)
                            continue
                        changed_something = changed_something or update_pointsto_args(pointsto_args, arg, current_def.get_ns())

            if not changed_something:
                break
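For readers unfamiliar with the pattern, the loop above is a plain fixed-point iteration: keep re-propagating points-to information until a full pass changes nothing, with the outer range() acting as a hard ceiling on the number of passes. A stripped-down, self-contained sketch of the same control flow over toy data (not PyCG's real structures):

```python
def propagate_until_fixed(points_to, max_passes=None):
    """Toy fixed-point propagation over a {name: set_of_names} mapping.

    Each pass copies every target's points-to set into its sources; the loop
    stops as soon as a pass changes nothing, or after max_passes passes.
    """
    if max_passes is None:
        max_passes = len(points_to)
    for _ in range(max_passes):
        changed_something = False
        for name, targets in points_to.items():
            for target in list(targets):  # copy: we add to targets while iterating
                for extra in points_to.get(target, set()):
                    if extra != name and extra not in targets:
                        targets.add(extra)
                        changed_something = True
        if not changed_something:
            break
    return points_to


# Example: a -> b -> c converges to a -> {b, c} after two passes.
print(propagate_until_fixed({"a": {"b"}, "b": {"c"}, "c": set()}))
```

The cap added in this commit short-circuits exactly this kind of loop when the number of definitions would make full convergence impractically slow.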

