Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect translation of phi nodes in llvmir2hll #658

Closed
PeterMatula opened this issue Sep 20, 2019 · 0 comments
Closed

Incorrect translation of phi nodes in llvmir2hll #658

PeterMatula opened this issue Sep 20, 2019 · 0 comments
Assignees
Milestone

Comments

@PeterMatula
Copy link
Collaborator

After #652 I had to disable IDA plugin tests:

tools.idaplugin.feedback.01-prochazka-17.2.2015.Test_17_2_2015 (unp1b_good.out --select 0x409640)

Ok state before:

        switch (errorCode) {
            case 0: {
                // 0x4096a6
                g4 = 0;
                // break -> 0x4096c4
                break;
            }
            case 2: {
            }
            case 3: {
                // 0x4096aa
                g4 = 6;
                // break -> 0x4096c4
                break;
            }
            case 5: {
                // 0x4096b1
                g4 = 4;
                // break -> 0x4096c4
                break;
            }
            case 8: {
                // 0x4096b8
                g4 = 3;
                // break -> 0x4096c4
                break;
            }
            case 87: {
                // 0x4096bf
                g4 = 2;
                // break -> 0x4096c4
                break;
            }
        }
// ...
int32_t result = g4; // 0x4096d4
return result;

State after register localization:

    switch (errorCode) {
        case 0: {
            // break -> 0x4096c4
            break;
        }
        case 2: {
        }
        case 3: {
            // break -> 0x4096c4
            break;
        }
        case 5: {
            // break -> 0x4096c4
            break;
        }
        case 8: {
            // break -> 0x4096c4
            break;
        }
        case 87: {
            // break -> 0x4096c4
            break;
        }
    }
// ...
return 1;

At first, I thought register localization caused LLVM passes to optimize out the correct code. But then I saw this in the final LLVM IR module:

  switch i32 %v0_40968d, label %dec_label_pc_4096c4 [
    i32 0, label %dec_label_pc_4096a6
    i32 2, label %dec_label_pc_4096aa
    i32 3, label %dec_label_pc_4096aa
    i32 5, label %dec_label_pc_4096b1
    i32 8, label %dec_label_pc_4096b8
    i32 87, label %dec_label_pc_4096bf
  ]

dec_label_pc_4096a6:                              ; preds = %dec_label_pc_409688
  br label %dec_label_pc_4096c4

dec_label_pc_4096aa:                              ; preds = %dec_label_pc_409688, %dec_label_pc_409688
  br label %dec_label_pc_4096c4

dec_label_pc_4096b1:                              ; preds = %dec_label_pc_409688
  br label %dec_label_pc_4096c4

dec_label_pc_4096b8:                              ; preds = %dec_label_pc_409688
  br label %dec_label_pc_4096c4

dec_label_pc_4096bf:                              ; preds = %dec_label_pc_409688
  br label %dec_label_pc_4096c4
; ...
dec_label_pc_4096c4:
   %esi.0 = phi i32 [ 1, %dec_label_pc_409688 ], [ 2, %dec_label_pc_4096bf ], [ 3, %dec_label_pc_4096b8 ], [ 4, %dec_label_pc_4096b1 ], [ 6, %dec_label_pc_4096aa ], [ 0, %dec_label_pc_4096a6 ], [ 0, %dec_label_pc_40967f ]
   ret i32 %esi.0

So, all the needed code is in there, but it looks like llvmir2hll ignores the phi node and takes only the first value in it. Ideally, it should handle this correctly, but ...
It looks like LLVM optimizations went much further than we needed in this case. Basically, the whole switch got optimized into a single phi node. It has no function anymore - other than giving phi node info from which BB control flow came, so it can decide which value to use. I don't see how we could reasonably represent this in C, without basically reverting the optimization and putting the assignments back to switch case bodies. So this is probably the reason llvmir2hll does not handle this.

@s3rvac I will need to discuss this with you, since this is quite an interesting problem without an obvious solution.

@PeterMatula PeterMatula added this to the RetDec v4+ milestone Sep 20, 2019
PeterMatula added a commit to avast/retdec-regression-tests that referenced this issue Sep 20, 2019
@s3rvac s3rvac removed their assignment Oct 1, 2019
@PeterMatula PeterMatula modified the milestones: RetDec v4+, RetDec v4 Dec 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants