simple analysis

09 January 2025
RE

So we want to understand the value of a register or memory location within a given basic block of a CFG? No problem. Enter value slicing.

Consider the following:

int64_t four,
int32_t five,
int64_t six,
int32_t seven,
int64_t eight,
int32_t nine
);

void 
app_main(void)
{
  int rc = bigcall(101, 102, 103, 104, 105, 106, 107, 108, 109);
  
  printf("bigcall() return %d\n",rc); // 945
}

And here is the disassembly of the bigcall function:

Dump of assembler code for function bigcall:
   0x42005052 <+0>:	addi	sp,sp,-16
   0x42005054 <+2>:	sw	s0,0(sp)
   0x42005056 <+4>:	sw	ra,4(sp)
   0x42005058 <+6>:	addi	s0,sp,16
   0x4200505a <+8>:	add	a0,a0,a1
   0x4200505c <+10>:	add	a0,a0,a3
   0x4200505e <+12>:	add	a0,a0,a4
   0x42005060 <+14>:	add	a0,a0,a6
   0x42005062 <+16>:	add	a0,a0,a7
   0x42005064 <+18>:	lw	t0,4(s0)
   0x42005068 <+22>:	add	a0,a0,t0
   0x4200506a <+24>:	lw	t0,8(s0)
   0x4200506e <+28>:	add	a0,a0,t0
   0x42005070 <+30>:	lw	t0,16(s0)
   0x42005074 <+34>:	add	a0,a0,t0
   0x42005076 <+36>:	sw	zero,8(sp)
   0x42005078 <+38>:	lw	ra,4(sp)
   0x4200507a <+40>:	lw	s0,0(sp)
   0x4200507c <+42>:	addi	sp,sp,16
   0x4200507e <+44>:	ret
End of assembler dump.

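This function is one basic block. We can use value slicing to figure out the return value (register a0) by walking backwards from the last instruction that writes a0. In the annotated listing below, each step follows the instruction it accounts for: start at step 1 near the bottom and work up towards step 12.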

Dump of assembler code for function bigcall:
0x42005052 <+0>: addi sp,sp,-16
0x42005054 <+2>: sw s0,0(sp)
0x42005056 <+4>: sw ra,4(sp)
0x42005058 <+6>: addi s0,sp,16

(step 12, final) a0 = (((((((a0 + a1) + a3) + a4) + a6) + a7) + 4(sp+16)) + 8(sp+16)) + 16(sp+16)

0x4200505a <+8>: add a0,a0,a1

(step 11) a0 = (((((((a0 + a1) + a3) + a4) + a6) + a7) + 4(s0)) + 8(s0)) + 16(s0)

0x4200505c <+10>: add a0,a0,a3

(step 10) a0 = ((((((a0 + a3) + a4) + a6) + a7) + 4(s0)) + 8(s0)) + 16(s0)

0x4200505e <+12>: add a0,a0,a4

(step 9) a0 = (((((a0 + a4) + a6) + a7) + 4(s0)) + 8(s0)) + 16(s0)

0x42005060 <+14>: add a0,a0,a6

(step 8) a0 = ((((a0 + a6) + a7) + 4(s0)) + 8(s0)) + 16(s0)

0x42005062 <+16>: add a0,a0,a7

(step 7) a0 = (((a0 + a7) + 4(s0)) + 8(s0)) + 16(s0)

0x42005064 <+18>: lw t0,4(s0)

(step 6) a0 = ((a0 + 4(s0)) + 8(s0)) + 16(s0)

0x42005068 <+22>: add a0,a0,t0

(step 5) a0 = ((a0 + t0) + 8(s0)) + 16(s0)

0x4200506a <+24>: lw t0,8(s0)

(step 4) a0 = (a0 + 8(s0)) + 16(s0)

0x4200506e <+28>: add a0,a0,t0

(step 3) a0 = (a0 + t0) + 16(s0)

0x42005070 <+30>: lw t0,16(s0)

(step 2) a0 = a0 + 16(s0)

0x42005074 <+34>: add a0,a0,t0

(start here, step 1) a0 = a0 + t0

0x42005076 <+36>: sw zero,8(sp)
0x42005078 <+38>: lw ra,4(sp)
0x4200507a <+40>: lw s0,0(sp)
0x4200507c <+42>: addi sp,sp,16
0x4200507e <+44>: ret
End of assembler dump.

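So our return value is the calculation a0 = (((((((a0 + a1) + a3) + a4) + a6) + a7) + 4(sp+16)) + 8(sp+16)) + 16(sp+16). Every register on the right-hand side holds its value on entry to bigcall, and N(sp+16) is the 32-bit word at offset N from sp+16, which is the stack pointer as it was before the prologue moved it.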
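The backward walk above is mechanical enough to sketch in C. This is a minimal illustration rather than a general slicer: `Insn`, `substitute`, `slice_backward`, and `BIGCALL_BODY` are hypothetical names, the instruction table is hand-transcribed from offsets +8 through +34 of the dump, and only `add` and `lw` are modeled. It also assumes nothing in that range stores to the stack words being loaded.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define EXPR_MAX 512

typedef enum { OP_ADD, OP_LW } Op;

/* One decoded instruction: "add rd,rs1,rs2" or "lw rd,imm(rs1)". */
typedef struct {
    Op op;
    const char *rd, *rs1, *rs2; /* rs2 is unused for OP_LW */
    int imm;                    /* imm is unused for OP_ADD */
} Insn;

/* Hand-transcribed body of bigcall, offsets +8 .. +34 (an assumption of
   this sketch, not the output of a real disassembler). */
const Insn BIGCALL_BODY[] = {
    { OP_ADD, "a0", "a0", "a1", 0 },
    { OP_ADD, "a0", "a0", "a3", 0 },
    { OP_ADD, "a0", "a0", "a4", 0 },
    { OP_ADD, "a0", "a0", "a6", 0 },
    { OP_ADD, "a0", "a0", "a7", 0 },
    { OP_LW,  "t0", "s0", NULL, 4 },
    { OP_ADD, "a0", "a0", "t0", 0 },
    { OP_LW,  "t0", "s0", NULL, 8 },
    { OP_ADD, "a0", "a0", "t0", 0 },
    { OP_LW,  "t0", "s0", NULL, 16 },
    { OP_ADD, "a0", "a0", "t0", 0 },
};

/* Copy src into dst, replacing each whole-token occurrence of reg by repl.
   The replacement text is never rescanned, so "a0" -> "(a0 + t0)" is safe. */
void substitute(char *dst, size_t cap, const char *src,
                const char *reg, const char *repl)
{
    size_t rlen = strlen(reg), out = 0, i = 0;
    while (src[i] != '\0') {
        int at_start = (i == 0) || !isalnum((unsigned char)src[i - 1]);
        if (at_start && strncmp(src + i, reg, rlen) == 0 &&
            !isalnum((unsigned char)src[i + rlen])) {
            out += (size_t)snprintf(dst + out, cap - out, "%s", repl);
            i += rlen;
        } else {
            dst[out++] = src[i++];
        }
        assert(out < cap); /* sketch: abort rather than truncate */
    }
    dst[out] = '\0';
}

/* Start with expr == target and walk the block from its last instruction
   to its first, rewriting expr in terms of the values live before each one. */
void slice_backward(const Insn *insns, size_t n, const char *target,
                    char *expr, size_t cap)
{
    char def[64], tmp[EXPR_MAX];
    snprintf(expr, cap, "%s", target);
    for (size_t i = n; i-- > 0; ) {
        const Insn *in = &insns[i];
        if (in->op == OP_ADD)
            snprintf(def, sizeof def, "(%s + %s)", in->rs1, in->rs2);
        else
            snprintf(def, sizeof def, "%d(%s)", in->imm, in->rs1);
        substitute(tmp, sizeof tmp, expr, in->rd, def);
        snprintf(expr, cap, "%s", tmp);
    }
}
```

Calling `slice_backward(BIGCALL_BODY, 11, "a0", buf, sizeof buf)` with an `EXPR_MAX`-sized buffer reproduces the step 11 expression, wrapped in one extra outer pair of parentheses. Instructions whose destination register does not appear in the expression leave it unchanged, which is what makes this a slice rather than a full trace.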

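The RISC-V calling convention passes the first arguments in registers (a0-a7) and switches to storing values on the stack once the registers run out. Considering bigcall takes 9 parameters and just adds them up, this calculation makes sense. a2 and a5 never appear because this is a 32-bit target: each int64_t argument occupies two registers, and only its low word contributes to the int return value.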
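To see why the slice touches exactly a0, a1, a3, a4, a6, a7 and the stack offsets 4, 8, and 16, it helps to replay the argument placement by hand. The sketch below assumes the standard RV32 ILP32 integer calling convention for named arguments, and it assumes the three leading parameters that are cut off in the prototype continue the alternating int32_t/int64_t pattern. `Slot`, `place_args`, and `BIGCALL_ARG_SIZES` are hypothetical names, and only 4-byte and 8-byte integer arguments are handled.

```c
#include <assert.h>
#include <stdio.h>

typedef struct {
    int reg;       /* index N of the aN register holding the low word, or -1 */
    int stack_off; /* byte offset of the low word from the entry sp, or -1 */
} Slot;

/* Assumed argument sizes in bytes for bigcall's nine parameters; the first
   three are a guess based on the visible pattern. */
const int BIGCALL_ARG_SIZES[9] = { 4, 8, 4, 8, 4, 8, 4, 8, 4 };

/* Assign each argument to a register or a stack offset. */
void place_args(const int *sizes, int n, Slot *out)
{
    int next_reg = 0; /* next free register among a0..a7 */
    int next_off = 0; /* next free byte in the outgoing stack area */

    for (int i = 0; i < n; i++) {
        out[i].reg = -1;
        out[i].stack_off = -1;
        if (sizes[i] == 4) {
            if (next_reg < 8) {
                out[i].reg = next_reg++;
            } else {
                out[i].stack_off = next_off;
                next_off += 4;
            }
        } else { /* 8-byte argument */
            if (next_reg <= 6) {
                /* low word in next_reg, high word in next_reg + 1 */
                out[i].reg = next_reg;
                next_reg += 2;
            } else if (next_reg == 7) {
                /* split: low word in a7, high word in the first stack slot */
                out[i].reg = 7;
                next_reg = 8;
                next_off += 4;
            } else {
                /* fully on the stack, aligned to 8 bytes */
                next_off = (next_off + 7) & ~7;
                out[i].stack_off = next_off;
                next_off += 8;
            }
        }
    }
}
```

Under those assumptions the low words land in a0, a1, a3, a4, a6, a7, and at stack offsets 4, 8, and 16, which lines up with the operands in the sliced expression. The word at offset 0 would be the high half of the sixth argument, which the function never reads.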

This is pretty useful in basic taint analysis. Suppose you have some known dangerous function that's exploitable when a parameter meets certain criteria. You can trace that parameter back to determine if it includes user input. This is a kind of source (user input) and sink (dangerous operation) methodology.
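As a rough illustration of the source and sink idea, the check can be as simple as asking whether the sliced expression for the sink's parameter mentions any operand you already know to be attacker-controlled. The sketch below is an assumption-laden toy: `mentions` and `first_tainted_source` are hypothetical helpers, the expression is the plain string form used in this post, and which operands count as tainted is something you would have to establish separately.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if operand (e.g. "a3" or "8(s0)") occurs in expr as a whole token,
   so "a1" does not match inside "a10" and "4(s0)" not inside "14(s0)". */
bool mentions(const char *expr, const char *operand)
{
    size_t len = strlen(operand);
    assert(len > 0);
    for (const char *p = strstr(expr, operand); p != NULL;
         p = strstr(p + 1, operand)) {
        bool left_ok = (p == expr) || !isalnum((unsigned char)p[-1]);
        bool right_ok = !isalnum((unsigned char)p[len]);
        if (left_ok && right_ok)
            return true;
    }
    return false;
}

/* Return the first tainted source that the sink expression depends on,
   or NULL if the slice reaches none of them. */
const char *first_tainted_source(const char *sink_expr,
                                 const char *const *sources, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (mentions(sink_expr, sources[i]))
            return sources[i];
    return NULL;
}
```

For the bigcall slice, treating `8(s0)` as a tainted source would flag the return value, while treating only `a2` or `a5` as tainted would not, since neither appears in the expression.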

Citations

https://asm-docs.microagi.org/riscv/html/riscv-asm.html
