simple analysis
So we want to understand the value of a register or memory location within a given basic block of a CFG? No problem. Enter value slicing.
Consider the following C declaration and caller:
int64_t four,
int32_t five,
int64_t six,
int32_t seven,
int64_t eight,
int32_t nine
);
void
app_main(void)
{
int rc = bigcall(101, 102, 103, 104, 105, 106, 107, 108, 109);
printf("bigcall() return %d\n",rc); // 945
}
And here is the disassembly of bigcall:
Dump of assembler code for function bigcall:
0x42005052 <+0>: addi sp,sp,-16
0x42005054 <+2>: sw s0,0(sp)
0x42005056 <+4>: sw ra,4(sp)
0x42005058 <+6>: addi s0,sp,16
0x4200505a <+8>: add a0,a0,a1
0x4200505c <+10>: add a0,a0,a3
0x4200505e <+12>: add a0,a0,a4
0x42005060 <+14>: add a0,a0,a6
0x42005062 <+16>: add a0,a0,a7
0x42005064 <+18>: lw t0,4(s0)
0x42005068 <+22>: add a0,a0,t0
0x4200506a <+24>: lw t0,8(s0)
0x4200506e <+28>: add a0,a0,t0
0x42005070 <+30>: lw t0,16(s0)
0x42005074 <+34>: add a0,a0,t0
0x42005076 <+36>: sw zero,8(sp)
0x42005078 <+38>: lw ra,4(sp)
0x4200507a <+40>: lw s0,0(sp)
0x4200507c <+42>: addi sp,sp,16
0x4200507e <+44>: ret
End of assembler dump.
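This function is a single basic block, so we can use value slicing to figure out the return value (register a0). Start from the last instruction that writes a0 and walk backwards: whenever an instruction defines a register that appears in the current expression, substitute its definition. In the annotations, off(reg) means the 32-bit word loaded from address reg+off, and each annotation is written in terms of register values just before the instruction directly above it. Start at step 1 and work up towards step 12.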
Dump of assembler code for function bigcall:
0x42005052 <+0>: addi sp,sp,-16
0x42005054 <+2>: sw s0,0(sp)
0x42005056 <+4>: sw ra,4(sp)
0x42005058 <+6>: addi s0,sp,16
(step 12, final) a0 = (((((((a0 + a1) + a3) + a4) + a6) + a7) + 4(sp+16)) + 8(sp+16)) + 16(sp+16)
0x4200505a <+8>: add a0,a0,a1
(step 11) a0 = (((((((a0 + a1) + a3) + a4) + a6) + a7) + 4(s0)) + 8(s0)) + 16(s0)
0x4200505c <+10>: add a0,a0,a3
(step 10) a0 = ((((((a0 + a3) + a4) + a6) + a7) + 4(s0)) + 8(s0)) + 16(s0)
0x4200505e <+12>: add a0,a0,a4
(step 9) a0 = (((((a0 + a4) + a6) + a7) + 4(s0)) + 8(s0)) + 16(s0)
0x42005060 <+14>: add a0,a0,a6
(step 8) a0 = ((((a0 + a6) + a7) + 4(s0)) + 8(s0)) + 16(s0)
0x42005062 <+16>: add a0,a0,a7
(step 7) a0 = (((a0 + a7) + 4(s0)) + 8(s0)) + 16(s0)
0x42005064 <+18>: lw t0,4(s0)
(step 6) a0 = ((a0 + 4(s0)) + 8(s0)) + 16(s0)
0x42005068 <+22>: add a0,a0,t0
(step 5) a0 = ((a0+t0)+8(s0)) + 16(s0)
0x4200506a <+24>: lw t0,8(s0)
(step 4) a0 = (a0+8(s0)) + 16(s0)
0x4200506e <+28>: add a0,a0,t0
(step 3) a0 = (a0+t0) + 16(s0)
0x42005070 <+30>: lw t0,16(s0)
(step 2) a0 = a0 + 16(s0)
0x42005074 <+34>: add a0,a0,t0
(start here, step 1) a0 = a0 + t0
0x42005076 <+36>: sw zero,8(sp)
0x42005078 <+38>: lw ra,4(sp)
0x4200507a <+40>: lw s0,0(sp)
0x4200507c <+42>: addi sp,sp,16
0x4200507e <+44>: ret
End of assembler dump.
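So our return value is the calculation a0 = (((((((a0 + a1) + a3) + a4) + a6) + a7) + 4(sp+16)) + 8(sp+16)) + 16(sp+16), where the registers on the right are the values bigcall received at entry and sp is the stack pointer after the prologue's addi sp,sp,-16 (so sp+16 is the stack pointer at entry).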
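The RISC-V calling convention passes integer arguments in registers a0-a7 and switches to the stack once it runs out of registers. This target is 32-bit (ra is saved and restored with 4-byte sw/lw), so every int64_t argument occupies two registers, low half first. That is why a5, the upper half of four, never appears, and presumably why a2 is skipped as the upper half of a 64-bit argument in a1. six gets the last register, a7, for its low half and spills its high half to 0(s0); seven, eight, and nine follow at 4(s0), 8(s0), and 16(s0), with the high half of eight at 12(s0). Here s0 equals sp at function entry, since addi s0,sp,16 undoes the prologue's addi sp,sp,-16. Because bigcall returns an int, only the low 32 bits of each argument matter, so the high halves are never read. Considering bigcall takes 9 parameters and just adds them up, the calculation makes sense: 101 + 102 + ... + 109 = 945, matching the comment in app_main.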
Value slicing is pretty useful in basic taint analysis. Suppose you have a known dangerous function that is exploitable when a parameter meets certain criteria. You can slice that parameter back to its inputs to determine whether it includes user input. This is the source (user input) to sink (dangerous operation) methodology.