
Leak in LLVM code generated by Rust not found: functions with String return parameter #643

Closed
4 of 7 tasks
StamesJames opened this issue Jul 17, 2023 · 7 comments

@StamesJames
Contributor

StamesJames commented Jul 17, 2023

  • I have searched open and closed issues for duplicates
  • I made sure that I am not using an old project version (DO: pull Phasar, update git submodules, rebuild the project and check if the bug is still there)

Bug description

I'm trying to find leaks in LLVM code generated by Rust for the following program:

#[inline(never)]
#[no_mangle]
fn source() -> String {
    "Test".to_string()
}

#[inline(never)]
#[no_mangle]
fn sink(source: &str) -> String {
    source.to_string()
}

#[inline(never)]
#[no_mangle]
fn sanitize(source: &str) -> String {
    source.to_owned()
}

fn main() {
    let unsanitized = source();
    let source = source();
    let sanitized = sanitize(&source);
    let sink_unsanitized = sink(&unsanitized);
    let sink_sanitized = sink(&sanitized);
    println!("{sink_unsanitized}");
    println!("{sink_sanitized}");
}

A simpler example worked ( #642 ) now I changed the functions from returning ints to returning Strings. They get compiled to the following llvm code:

; Function Attrs: noinline nonlazybind uwtable
define dso_local void @source(%"alloc::string::String"* sret(%"alloc::string::String") %0) unnamed_addr #1 {
start:
; call <str as alloc::string::ToString>::to_string
  call void @"_ZN47_$LT$str$u20$as$u20$alloc..string..ToString$GT$9to_string17h488739110bf80537E"(%"alloc::string::String"* sret(%"alloc::string::String") %0, [0 x i8]* align 1 bitcast (<{ [4 x i8] }>* @alloc63 to [0 x i8]*), i64 4)
  br label %bb1

bb1:                                              ; preds = %start
  ret void
}

; Function Attrs: noinline nonlazybind uwtable
define dso_local void @sink(%"alloc::string::String"* sret(%"alloc::string::String") %0, [0 x i8]* align 1 %source.0, i64 %source.1) unnamed_addr #1 {
start:
; call <str as alloc::string::ToString>::to_string
  call void @"_ZN47_$LT$str$u20$as$u20$alloc..string..ToString$GT$9to_string17h488739110bf80537E"(%"alloc::string::String"* sret(%"alloc::string::String") %0, [0 x i8]* align 1 %source.0, i64 %source.1)
  br label %bb1

bb1:                                              ; preds = %start
  ret void
}
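The sret lowering visible in the IR above can be mirrored at the Rust source level. The sketch below is a hypothetical analogue, not what rustc literally emits (the names `source_sret` and `sink_sret` are made up for illustration): the caller owns the storage for the returned String and passes a pointer to it as the first argument, while the `&str` argument travels as a separate data pointer and length. The actual ABI decision is made by rustc and LLVM and may differ across targets and compiler versions.

```rust
use std::mem::MaybeUninit;

// Hypothetical analogue of `fn source() -> String` after sret lowering:
// the caller owns the return slot and passes a pointer to it; the callee
// writes its result through that pointer and returns nothing.
fn source_sret(out: &mut MaybeUninit<String>) {
    out.write("Test".to_string());
}

/// Hypothetical analogue of `fn sink(source: &str) -> String`: the return slot
/// comes first (IR parameter 0), followed by the two halves of the &str
/// (data pointer = parameter 1, length = parameter 2).
///
/// # Safety
/// `ptr` and `len` must come from a valid `&str` that is still alive.
unsafe fn sink_sret(out: &mut MaybeUninit<String>, ptr: *const u8, len: usize) {
    let s = unsafe { std::str::from_utf8_unchecked(std::slice::from_raw_parts(ptr, len)) };
    out.write(s.to_string());
}

fn main() {
    let mut slot = MaybeUninit::<String>::uninit();
    source_sret(&mut slot);
    // SAFETY: source_sret always initializes the slot.
    let unsanitized = unsafe { slot.assume_init() };

    let mut sink_slot = MaybeUninit::<String>::uninit();
    // SAFETY: pointer and length are taken from a live String.
    unsafe { sink_sret(&mut sink_slot, unsanitized.as_ptr(), unsanitized.len()) };
    // SAFETY: sink_sret always initializes the slot.
    let sink_unsanitized = unsafe { sink_slot.assume_init() };
    println!("{sink_unsanitized}");
}
```

Seen this way, parameter 0 of `sink` is the destination of the result, and the tainted data reaches `sink` through parameters 1 and 2.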

I set my analysis-config to:

{
    "name": "taint-03-simple-functions-string",
    "version": 1,
    "functions": [
        {
            "name": "source",
            "params": {
                "source": [0]
            }
        },
        {
            "name": "sink",
            "params": {
                "sink": [1]
            }
        },
        {
            "name": "sanitize",
            "ret": "sanitizer"
        }
    ],
    "variables": []
}

because, in my understanding, the two functions now don't return anything; instead they receive a pointer to which they write the return value.
I invoke my analysis with

phasar-cli \
   -m target/debug/deps/sql_injection_03_simple_requests-0a2c4db10e6afc34.ll \
   -D ifds-taint \
   --analysis-config=analysis-config.json \
   --entry-points _ZN32sql_injection_03_simple_requests4main17h3819e5f83b074069E

Where _ZN32sql_injection_03_simple_requests4main17h3819e5f83b074069E is the mangled name of my main function.

If I set the 0th parameter of the sink function as the sink, phasar reports a leak, but it's not simply the variable obtained from the source function; instead it's a very long list. Here are the first lines:

Leak(s):
IR  : %"core::fmt::Arguments"* %0 | ID: _ZN4core3fmt9Arguments6new_v117hc8a21f4658044cffE.0
IR  : %"alloc::string::String"* %0 | ID: _ZN5alloc6string6String19from_utf8_unchecked17h6553b59f13851d7cE.0
IR  : %"alloc::vec::Vec<u8>"* %bytes | ID: _ZN5alloc6string6String19from_utf8_unchecked17h6553b59f13851d7cE.1
IR  : @alloc55 = private unnamed_addr constant <{ [75 x i8] }> <{ [75 x i8] c"/rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/fmt/mod.rs" }>, align 1, !psr.id !4 | ID: 4
IR  : @alloc59 = private unnamed_addr constant <{ [74 x i8] }> <{ [74 x i8] c"/rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/alloc.rs" }>, align 1, !psr.id !8 | ID: 8
IR  : @alloc60 = private unnamed_addr constant <{ i8*, [16 x i8] }> <{ i8* getelementptr inbounds (<{ [74 x i8] }>, <{ [74 x i8] }>* @alloc59, i32 0, i32 0, i32 0), [16 x i8] c"J\00\00\00\00\00\00\00\AC\00\00\00\1B\00\00\00" }>, align 8, !psr.id !9 | ID: 9
IR  : @alloc61 = private unnamed_addr constant <{ [76 x i8] }> <{ [76 x i8] c"/rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/raw_vec.rs" }>, align 1, !psr.id !10 | ID: 10
IR  : @alloc62 = private unnamed_addr constant <{ i8*, [16 x i8] }> <{ i8* getelementptr inbounds (<{ [76 x i8] }>, <{ [76 x i8] }>* @alloc61, i32 0, i32 0, i32 0), [16 x i8] c"L\00\00\00\00\00\00\00\F7\00\00\00;\00\00\00" }>, align 8, !psr.id !11 | ID: 11
IR  : @alloc22 = private unnamed_addr constant <{ [1 x i8] }> <{ [1 x i8] c"\0A" }>, align 1, !psr.id !13 | ID: 13
IR  : @alloc21 = private unnamed_addr constant <{ i8*, [8 x i8], i8*, [8 x i8] }> <{ i8* bitcast (<{}>* @alloc20 to i8*), [8 x i8] zeroinitializer, i8* getelementptr inbounds (<{ [1 x i8] }>, <{ [1 x i8] }>* @alloc22, i32 0, i32 0, i32 0), [8 x i8] c"\01\00\00\00\00\00\00\00" }>, align 8, !psr.id !14 | ID: 14
IR  : %_2 = call i8* @"_ZN4core3ptr6unique15Unique$LT$T$GT$6as_ptr17h3b210c5ac01b064fE"(i8* %unique), !psr.id !18 | ID: 15
IR  : i8* %unique | ID: _ZN119_$LT$core..ptr..non_null..NonNull$LT$T$GT$$u20$as$u20$core..convert..From$LT$core..ptr..unique..Unique$LT$T$GT$$GT$$GT$4from17hd493d251c602c8e8E.0
IR  : %0 = call i8* @"_ZN4core3ptr8non_null16NonNull$LT$T$GT$13new_unchecked17h6f1d783941022635E"(i8* %_2), !psr.id !20 | ID: 17

But in my understanding, the 0th parameter is not a sink parameter, because it holds the return value. The 1st and 2nd parameters, however, should produce a leak, because they carry values from inside the source String.
I attached all relevant files.
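The index reasoning above can be made explicit with a small model. This is a minimal sketch, assuming only the lowering visible in the IR of this issue (an sret out-pointer prepended as IR parameter 0, each `&str` split into a data pointer and a length). The `ParamKind` enum and `ir_indices` helper are hypothetical and not part of phasar or rustc; real ABI lowering depends on the type, target and compiler version.

```rust
/// Simplified, hypothetical model of how parameters were lowered in the IR above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ParamKind {
    /// A scalar passed as a single IR parameter (e.g. an i32).
    Scalar,
    /// A &str passed as two IR parameters: data pointer and length.
    StrSlice,
}

/// For each source-level parameter, returns the IR parameter indices it occupies.
/// If `sret` is true, the return value goes through an out-pointer at index 0.
fn ir_indices(params: &[ParamKind], sret: bool) -> Vec<Vec<usize>> {
    let mut next = if sret { 1 } else { 0 };
    params
        .iter()
        .map(|kind| {
            let width = match kind {
                ParamKind::Scalar => 1,
                ParamKind::StrSlice => 2,
            };
            let indices: Vec<usize> = (next..next + width).collect();
            next += width;
            indices
        })
        .collect()
}

fn main() {
    // `fn sink(source: &str) -> String` lowers to
    // `@sink(String* sret %0, [0 x i8]* %source.0, i64 %source.1)`.
    let sink = ir_indices(&[ParamKind::StrSlice], true);
    println!("{sink:?}"); // [[1, 2]]

    // An int-taking, int-returning function (as in #642) needs no sret slot.
    let int_fn = ir_indices(&[ParamKind::Scalar], false);
    println!("{int_fn:?}"); // [[0]]
}
```

Under these assumptions, the `source` argument of `sink` occupies IR parameters 1 and 2, which matches the expectation stated above.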

Steps to reproduce

  • compile with cargo using rust-toolchain.toml and .cargo/config.toml
  • run phasar-cli taint analysis with custom entry-point and analysis-config.json

Actual result:

  • phasar doesn't find the leak

Expected result:

  • phasar finds a leak, because some values from the source String variable get passed into the sink function
    • not the whole String gets passed, because Rust's String dereferences into a string slice, which, in my understanding of the generated LLVM code, is not passed as a struct but as its two components (pointer and length) separately
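The deref from String to `&str` described above can be checked in plain Rust: the slice's data pointer is the String's own heap buffer, so no copy is made and the pointer component (`%source.0` in the IR) points at the tainted bytes themselves. A small sketch using only standard-library calls; `str_components` is a hypothetical helper introduced for illustration.

```rust
/// Hypothetical helper: returns the two components a &str argument is split
/// into in the IR above (`%source.0` data pointer, `%source.1` length).
fn str_components(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let unsanitized: String = "Test".to_string();
    // `&unsanitized` coerces from &String to &str, as in `sink(&unsanitized)`.
    let (data_ptr, len) = str_components(&unsanitized);

    // The slice borrows the String's heap buffer; nothing is copied.
    assert_eq!(data_ptr, unsanitized.as_ptr());
    assert_eq!(len, 4);
    println!("data pointer aliases the String buffer: {}", data_ptr == unsanitized.as_ptr());
}
```

This is why tracking taint through the String's buffer, rather than only through the String struct, should be enough to connect `source` to parameters 1 and 2 of `sink`.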

Context (Environment)

Operating System:

  • Linux
  • Windows
  • macOS

Build Type:

  • cmake
  • custom build

Example files

Files:

examplefiles.zip

@MMory
Member

MMory commented Jul 28, 2023

Hi @StamesJames,

it's good that someone is letting phasar analyze some Rust code.

With the files you provided I am unable to compile the sample, as cargo wants a Cargo.toml and, being a Rust noob, I don't know where I would get that from. I think it would be easiest for me if you could provide the full IR file that you are trying to analyze.

Cheers
Martin

@MMory
Member

MMory commented Jul 28, 2023

Correction: I followed your instructions in the other issue and was able to build your example. Will look into it now.

@MMory
Member

MMory commented Jul 28, 2023

Another correction: my rustc/cargo builds IR for LLVM > 14, which phasar cannot analyze. Please provide your IR file :)

@StamesJames
Contributor Author

Hi @MMory

Sorry, I wrote the issue in a bit of a rush. Here is the corrected version:
example_files.zip
The IR is now in the root folder.
Running cargo build inside the root folder should also work now. The correct Rust version is specified in the rust-toolchain.toml file, and the compiler options that emit the IR into the target/debug/deps folder are specified in the .cargo/config.toml file.

@MMory
Member

MMory commented Aug 4, 2023

Hi @StamesJames, in case you didn't notice: we merged a fix that should address your issue.

@MMory
Member

MMory commented Sep 21, 2023

Hi @StamesJames, could you please provide feedback w.r.t. the fix we merged on Jul 31?

@StamesJames
Contributor Author

Hi @MMory,
yes, of course.
I was able to find the leak with the newest version of the development branch, which I built as a Docker image. I will now try to find leaks in more complex examples.

@MMory MMory closed this as completed Oct 27, 2023