Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataflow between a memcpy and a Load is broken in SVFG #1589

Open
TrivikramAT opened this issue Nov 6, 2024 · 1 comment
Open

Dataflow between a memcpy and a Load is broken in SVFG #1589

TrivikramAT opened this issue Nov 6, 2024 · 1 comment

Comments

@TrivikramAT
Copy link

Hello,

I am trying to find if a taint propagates from main() to a certain if-condition. In the following test program, the taint (argc) flows through a memcpy instruction as well as a direct assignment:

int main(int argc, char *argv[])
{
    int a, b;
    a = argc;
    memcpy(&b, &argc, sizeof(int));
    if(a)
        if(b)
            printf("a and b\n");
    return 0;
}

When I generate the SVFG using the command wpa -ander -svfg -dump-vfg -opt-svfg=false memcpy_simple.ll , I get two disconnected sections, one for if(a) and one for if(b) (truncated for clarity):

Dataflow into if(a):
image

Dataflow into if(b):
image

I had expected the SVFG to connect the StoreVFGNode containing the memcpy instruction, to the LoadVFGNode in image 2.
Is this a feature which SVF supports out of the box? Or would I need to write some analysis on top of it myself?

Thanks

PS: I have attached the C file, LLVM IR, and the graph SVG for your reference.
SVF_Issue.zip

@for-just-we
Copy link

for-just-we commented Nov 12, 2024

I think there is such value-flow, the FormalINSVFGNode whose ID is 48 may be what you are looking for

Source Code

#include <string.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int a, b;
    a = argc;
    memcpy(&b, &argc, sizeof(int));
    if(a)
        if(b)
            printf("a and b\n");
    return 0;
}

LLVM:

define dso_local i32 @main(i32 noundef %argc, i8** noundef %argv) #0 {
entry:
  %retval = alloca i32, align 4
  %argc.addr = alloca i32, align 4
  %argv.addr = alloca i8**, align 8
  %a = alloca i32, align 4
  %b = alloca i32, align 4
  store i32 0, i32* %retval, align 4
  store i32 %argc, i32* %argc.addr, align 4
  store i8** %argv, i8*** %argv.addr, align 8
  %0 = load i32, i32* %argc.addr, align 4
  store i32 %0, i32* %a, align 4
  %1 = bitcast i32* %b to i8*
  %2 = bitcast i32* %argc.addr to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %1, i8* align 4 %2, i64 4, i1 false)
  %3 = load i32, i32* %a, align 4
  %tobool = icmp ne i32 %3, 0
  br i1 %tobool, label %if.then, label %if.end3

if.then:                                          ; preds = %entry
  %4 = load i32, i32* %b, align 4
  %tobool1 = icmp ne i32 %4, 0
  br i1 %tobool1, label %if.then2, label %if.end

if.then2:                                         ; preds = %if.then
  %call = call i32 (i8*, ...) @printf(i8* noundef getelementptr inbounds ([9 x i8], [9 x i8]* @.str, i64 0, i64 0))
  br label %if.end

if.end:                                           ; preds = %if.then2, %if.then
  br label %if.end3

if.end3:                                          ; preds = %if.end, %entry
  ret i32 0
}

Memory Layouts:

memory objects type size(bytes) object-type
alloca(retval) i32 4 stack
alloca(argc.addr) i32 4 stack
alloca(argv.addr) i8* 8 stack
alloca(a) i32 4 stack
alloca(b) i32 4 stack

Andersen Analysis Results:

pointers point-to set
%a { alloca(a) }
%b { alloca(b) }
%argc.addr { alloca(argc.addr) }
%argv.addr { alloca(argv.addr) }
%1 { alloca(b) }
%2 { alloca(argc.addr) }

Critical Store And Loads

instructions behavior
store i32 %argc, i32* %argc.addr write value %argc to alloca(argc.addr)
%0 = load i32, i32* %argc.addr load value in alloca(argc.addr) to register %0
store i32 %0, i32* %a store value in alloca(argc.addr) to alloca(a)
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %1, i8* align 4 %2, i64 4, i1 false) copy value in alloca(argc.addr) to alloca(b)

1.value flow of a = argc

In SVFG, solid lines refer to direct value-flow which do not require pointer analysis.
While dashed lines refer to indirect value-flow, which refer to relations from store to load/store instruction.

For a = argc, we need build a value-flow from the last write of alloca(argc.addr) to the write of alloca(a).

  • last write of alloca(argc.addr) is store i32 %argc, i32* %argc.addr

  • write of alloca(a) is store i32 %0, i32* %a

The total value-flow contains:

  • indirect value-flow store i32 %argc, i32* %argc.addr --> %0 = load i32, i32* %argc.addr. Which we need pointer analysis to determine %0 load the last write of alloca(argc.addr)

  • direct value-flow %0 = load i32, i32* %argc.addr --> store i32 %0, i32* %a. Which can directly be retrived in LLVM IR.

The part of value-flow is reflected as:

截屏2024-11-12 12 24 03

2.value flow of memcpy

memcpy(&b, &argc, sizeof(int)); is a little bit complex, the corresponding IR is call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %1, i8* align 4 %2, i64 4, i1 false).

It should build a value-flow from the last write of alloca(argc.addr) to the write of alloca(b).

The handle of memcpy API is defined is SVFIRBuilder::handleExtCall. Where in SVFIRBuilder::addComplexConsForExt, for memcpy(p1, p2), SVF will build a dummyNode dummy and a load dummy = load p2, and a store store dummy, p1.
In this case, p1 is (b + 0), p2 is (argc + 0) (They both have field access, default to 0).

In SVFG, it can be reflected as:

截屏2024-11-12 12 29 33

The LoadStmt 41 and storeStmt 55 is new built relations. Var94 is the DummyValVar, Var93 is %2 and Var92 is %1.
Now the problem is the indirect value flow from last write to alloca(argc.addr) is still missing. I think the FormalINSVFGNode ID: 78 {fun: main}14V_1 = ENCHI(MR_14V_1) pts{96 } what we look for. But link it to store i32 %argc, i32* %argc.addr, align 4 needs dig into the construction of Memory SSA.

If you dump Constraint Graph, you can see:

截屏2024-11-12 14 24 56

Where:

  • 58 is %2 = bitcast i32* %argc.addr to i8*, which equals to &argc in source-code:

  • 93 can be deemed as &argc + 0

  • 92 can bee deemed as &argv + 0

  • 57 is %1 = bitcast i32* %b to i8*, which equals to &b.

3.Value-flow of if(b)

As for if(b), the value-flow regard it is:

截屏2024-11-12 14 31 36

I think the FormalINSVFGNode ID: 77 {fun: main}10V_1 = ENCHI(MR_10V_1) pts{51 } is what you are looking for.
The problem is that SVF did not directly show such value-flow and not easy to use such value-flow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants