Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dead Code Elimination Transformation #979

Merged
merged 42 commits into from
May 3, 2023

Conversation

just-max
Copy link
Contributor

@just-max just-max commented Feb 6, 2023

Dead code elimination transformation as suggested by Simmo in #850. The implementation in its current state is still quite messy, but all the important bits should be there.

The transformation:

  • removes dead code (statements) within functions
  • removes functions that are entirely dead
  • using CIL Rmtmps/RmUnused, tracks references between globals (functions, variables, named types, struct/union types, enums), then removes any that aren't referenced (possibly transitively) by live code

This PR also makes small changes to the transformation subsystem:

  • expand query/ask for access to liveness information
  • streamline output to file

Merging will close #850.

TODO:

  • Clean up code, move all to own module
  • Put transformation behind appropriate option and clean up output to file

src/framework/control.ml Outdated Show resolved Hide resolved
src/framework/control.ml Outdated Show resolved Hide resolved
src/transform/deadCode.ml Outdated Show resolved Hide resolved
src/util/options.schema.json Show resolved Hide resolved
@just-max just-max changed the title Draft: Dead Code Elimination Transformation Dead Code Elimination Transformation Mar 2, 2023
@just-max just-max marked this pull request as ready for review March 2, 2023 10:10
@just-max
Copy link
Contributor Author

just-max commented Mar 8, 2023

As discussed with Michael, a final change would be to rename the Rmtmps CIL module, since it removes much more than temporaries. I would suggest merging this first, then refactoring to the new CIL API in a separate PR.

There should be no more material changes to this pull request.

@just-max just-max added the pr-dependency Depends or builds on another PR, which should be merged before label Mar 8, 2023
Copy link
Member

@michael-schwarz michael-schwarz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As soon as you have convinced yourself that the issue with gotos mentioned above is a non-issue, I think this is ready to merge!

@michael-schwarz
Copy link
Member

@sim642 any comments before we merge?

Copy link
Member

@sim642 sim642 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this handle the fact that there isn't a one-to-one correspondance between our CFG nodes (which we compute liveness for) and CIL statements?
That is, our CFG construction skips over many CIL statements, so we don't compute liveness for many of them. Evene moreso, CfgTools never puts Block statements into nodes (which are handled here), but instead Instr statements (which are left into the default true case here).

It would be good to have (e.g. cram) tests for this that really show all aspects working, because I'm not entirely convinced right now.
CIL transformations working at the level of statements and locations instead of our CFG nodes has been an issue in the past (other transformations have a bunch of hackery regarding this), so it's odd that this transformation can get away with not needing any such special treatment.

src/framework/control.ml Outdated Show resolved Hide resolved
src/transform/transform.ml Outdated Show resolved Hide resolved
src/transform/deadCode.ml Outdated Show resolved Hide resolved
src/transform/deadCode.ml Outdated Show resolved Hide resolved
src/transform/deadCode.ml Outdated Show resolved Hide resolved
@sim642 sim642 removed the pr-dependency Depends or builds on another PR, which should be merged before label Mar 15, 2023
@just-max
Copy link
Contributor Author

just-max commented Mar 25, 2023

How does this handle the fact that there isn't a one-to-one correspondence between our CFG nodes (which we compute liveness for) and CIL statements?

Edit: Looks like I still had some misunderstandings about CFG construction. The current implementation kind of gets lucky; if statements and while loops have a location that corresponds to the condition node in the CFG, and are thus kept if the condition node is live, and single Instr and Return statements do all directly map to nodes in the CFG for which liveness can be checked.

All other CIL statements, which are not in the CFG at all, are always kept. This includes plain Block statements.
This also means though that e.g. Goto statements are not removed, since they don't appear as nodes in the CFG.

To do things properly, it looks like it is necessary to traverse through CIL statements in the same way as CFG construction. Then, liveness of If, Loop and Instr/Return can be checked as before, but liveness of anything else would be decided by liveness of the preceding CFG node.


Original comment:

CIL statements are put in three categories by this transformation:

  • in the (solver) result, dead in all contexts
  • in the result, but live in some contexts
  • not in the result

Only the first category is removed. Block statements fall under the third category, and are not removed.

(By "result" I mean the lh hashtable of solver results by CFG node and context.)

which are left into the default true case here

The case distinction only handles the recursion. A dead Instr is removed above:

and impl_stmt stmt =
if not (f stmt) then false
else
match stmt.skind with


CIL transformations working at the level of statements and locations instead of our CFG nodes has been an issue in the past

Yes, it's not the prettiest. But ultimately the transformation must occur on a syntax level on the original CIL file, short of regenerating the C code from scratch out of the CFG. From my understanding, other transformations need a bit of a hack because they go through the query system. This transformation sort of gets around that with the new Transform.queries must_be_dead parameter, which directly looks up a statement/location in the map of results joined by statement/location. As long as each CFG node corresponds to one CIL statement, this should work.

Edit: it would probably make sense for transformations to work on statements/function declarations (i.e., pass those to ask), which makes things slightly better by not conflating e.g. a statement and a block containing only that statement.

@just-max just-max marked this pull request as draft March 27, 2023 05:15
@just-max
Copy link
Contributor Author

To do things properly, it looks like it is necessary to traverse through CIL statements in the same way as CFG construction. Then, liveness of If, Loop and Instr/Return can be checked as before, but liveness of anything else would be decided by liveness of the preceding CFG node.

This is essentially what I've implemented now. Statements that are skipped during CFG construction are tracked directly. Then, a statement between two nodes is considered dead if the preceding or following node is dead (with appropriate handling for statements on multiple such paths).

@just-max
Copy link
Contributor Author

just-max commented Apr 19, 2023

Integrated review (ff9aa70...just-max:goblint-analyzer:deadcode_removal).

The regression cram tests are currently failing because apparently cram only sees an empty stdout. Probably some part of file redirection trickery in transform.sh not working right.

Of course it comes down to bash not waiting for processes inside process substitution. Fixed by using temp file instead.

Also, it probably never makes sense for cram tests to have backtraces enabled (in CI), since those are very fragile.


Edit: Making this work on Mac without having such a system locally is quite frustrating. The /bin/bash in the image is version 3.2, well on its way to being 20 years old!

@michael-schwarz
Copy link
Member

Edit: Making this work on Mac without having such a system locally is quite frustrating.

I feel you, I have the same issue all the time myself too!

@michael-schwarz michael-schwarz marked this pull request as ready for review May 3, 2023 07:24
@just-max
Copy link
Contributor Author

just-max commented May 3, 2023

As Simmo suggested, I've used a string constant in dead conditions. Everything should be ready now.

@just-max just-max mentioned this pull request May 3, 2023
@michael-schwarz michael-schwarz merged commit fc9d23b into goblint:master May 3, 2023
michael-schwarz added a commit that referenced this pull request May 5, 2023
LageTs pushed a commit to serenita/goblint-analyzer that referenced this pull request May 21, 2023
@sim642 sim642 added this to the v2.2.0 milestone Sep 11, 2023
sim642 added a commit to sim642/opam-repository that referenced this pull request Sep 13, 2023
CHANGES:

* Add `setjmp`/`longjmp` analysis (goblint/analyzer#887, goblint/analyzer#970, goblint/analyzer#1015, goblint/analyzer#1019).
* Refactor race analysis to lazy distribution (goblint/analyzer#1084, goblint/analyzer#1089, goblint/analyzer#1136, goblint/analyzer#1016).
* Add thread-unsafe library function call analysis (goblint/analyzer#723, goblint/analyzer#1082).
* Add mutex type analysis and mutex API analysis (goblint/analyzer#800, goblint/analyzer#839, goblint/analyzer#1073).
* Add interval set domain and string literals domain (goblint/analyzer#901, goblint/analyzer#966, goblint/analyzer#994, goblint/analyzer#1048).
* Add affine equalities analysis (goblint/analyzer#592).
* Add use-after-free analysis (goblint/analyzer#1050, goblint/analyzer#1114).
* Add dead code elimination transformation (goblint/analyzer#850, goblint/analyzer#979).
* Add taint analysis for partial contexts (goblint/analyzer#553, goblint/analyzer#952).
* Add YAML witness validation via unassume (goblint/analyzer#796, goblint/analyzer#977, goblint/analyzer#1044, goblint/analyzer#1045, goblint/analyzer#1124).
* Add incremental analysis rename detection (goblint/analyzer#774, goblint/analyzer#777).
* Fix address sets unsoundness (goblint/analyzer#822, goblint/analyzer#967, goblint/analyzer#564, goblint/analyzer#1032, goblint/analyzer#998, goblint/analyzer#1031).
* Fix thread escape analysis unsoundness (goblint/analyzer#939, goblint/analyzer#984, goblint/analyzer#1074, goblint/analyzer#1078).
* Fix many incremental analysis issues (goblint/analyzer#627, goblint/analyzer#836, goblint/analyzer#835, goblint/analyzer#841, goblint/analyzer#932, goblint/analyzer#678, goblint/analyzer#942, goblint/analyzer#949, goblint/analyzer#950, goblint/analyzer#957, goblint/analyzer#955, goblint/analyzer#954, goblint/analyzer#960, goblint/analyzer#959, goblint/analyzer#1004, goblint/analyzer#558, goblint/analyzer#1010, goblint/analyzer#1091).
* Fix server mode for abstract debugging (goblint/analyzer#983, goblint/analyzer#990, goblint/analyzer#997, goblint/analyzer#1000, goblint/analyzer#1001, goblint/analyzer#1013, goblint/analyzer#1018, goblint/analyzer#1017, goblint/analyzer#1026, goblint/analyzer#1027).
* Add documentation for configuration JSON schema and OCaml API (goblint/analyzer#999, goblint/analyzer#1054, goblint/analyzer#1055, goblint/analyzer#1053).
* Add many library function specifications (goblint/analyzer#962, goblint/analyzer#996, goblint/analyzer#1028, goblint/analyzer#1079, goblint/analyzer#1121, goblint/analyzer#1135, goblint/analyzer#1138).
* Add OCaml 5.0 support (goblint/analyzer#1003, goblint/analyzer#945, goblint/analyzer#1162).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dead code elimination transformation
3 participants