Disassembly with gdb
When debugging C or C++ code, I often find it helpful to view the assembly alongside the source. It helps me understand what the code is really doing, and it may be the easiest way to debug complicated C macros and even C++ templates. I’ll show an example of analyzing disassembled code from gdb on x86-64 Linux.
Sample C Program
We’ll make a simple program that calculates a Hamming distance between two strings of equal length:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>


int main(int argc, char **argv) {
    if (argc != 3) exit(EXIT_FAILURE);

    const char *s1 = argv[1], *s2 = argv[2];
    const size_t len = strlen(s1);

    if (len != strlen(s2)) exit(EXIT_FAILURE);

    size_t distance = 0;
    for (size_t i = 0; i < len; i++) {
        if (s1[i] != s2[i]) {
            distance++;
        }
    }

    printf("%ld\n", distance);

    return EXIT_SUCCESS;
}
We will compile it with:
gcc -g -o main main.c
Nothing fancy: no optimizations, debug symbols included. This is a typical “debug” build. To test it, run it with two strings as arguments, and it will output the Hamming distance between them:
./main abc abd
1
Starting the Debugger and Asm Layout
Let’s run the above program under gdb:
gdb --args main abc abd
GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git
...
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from main..
(gdb)
Before we go any further, I want to switch the assembly syntax from AT&T to Intel. Neither is objectively superior, but I prefer the latter:
(gdb) set disassembly-flavor intel
When debugging, I typically use an interface that displays the source code alongside the gdb command line window. My preference is cgdb, but gdb’s own TUI mode provides a similar user experience, assuming gdb was built with TUI support (this is not always the case).
TUI mode also offers an assembly layout:
(gdb) tui layout asm
The output looks like this:
[Screenshot: gdb TUI in the asm layout]
It does display the disassembly, but it is not integrated with the C source code, which I find less than ideal. Let’s disable TUI mode and go back to vanilla gdb:
(gdb) tui disable
(gdb)
Disassemble Command
Let’s start with the disassemble command (disas for short). We’ll use it with the /s modifier, which makes the debugger print the C source interleaved with the disassembly. That is precisely what we want:
(gdb) disas /s main
Dump of assembler code for function main:
main.c:
6 int main(int argc, char **argv) {
0x0000000000001189 <+0>: endbr64
0x000000000000118d <+4>: push rbp
0x000000000000118e <+5>: mov rbp,rsp
0x0000000000001191 <+8>: sub rsp,0x40
0x0000000000001195 <+12>: mov DWORD PTR [rbp-0x34],edi
0x0000000000001198 <+15>: mov QWORD PTR [rbp-0x40],rsi
7 if (argc != 3) exit(EXIT_FAILURE);
0x000000000000119c <+19>: cmp DWORD PTR [rbp-0x34],0x3
0x00000000000011a0 <+23>: je 0x11ac <main+35>
0x00000000000011a2 <+25>: mov edi,0x1
0x00000000000011a7 <+30>: call 0x1090 <exit@plt>
8
9 const char *s1 = argv[1], *s2 = argv[2];
0x00000000000011ac <+35>: mov rax,QWORD PTR [rbp-0x40]
0x00000000000011b0 <+39>: mov rax,QWORD PTR [rax+0x8]
0x00000000000011b4 <+43>: mov QWORD PTR [rbp-0x18],rax
0x00000000000011b8 <+47>: mov rax,QWORD PTR [rbp-0x40]
0x00000000000011bc <+51>: mov rax,QWORD PTR [rax+0x10]
0x00000000000011c0 <+55>: mov QWORD PTR [rbp-0x10],rax
10 const size_t len = strlen(s1);
0x00000000000011c4 <+59>: mov rax,QWORD PTR [rbp-0x18]
0x00000000000011c8 <+63>: mov rdi,rax
0x00000000000011cb <+66>: call 0x1070 <strlen@plt>
0x00000000000011d0 <+71>: mov QWORD PTR [rbp-0x8],rax
11
12 if (len != strlen(s2)) exit(EXIT_FAILURE);
0x00000000000011d4 <+75>: mov rax,QWORD PTR [rbp-0x10]
0x00000000000011d8 <+79>: mov rdi,rax
0x00000000000011db <+82>: call 0x1070 <strlen@plt>
0x00000000000011e0 <+87>: cmp QWORD PTR [rbp-0x8],rax
0x00000000000011e4 <+91>: je 0x11f0 <main+103>
0x00000000000011e6 <+93>: mov edi,0x1
0x00000000000011eb <+98>: call 0x1090 <exit@plt>
13
14 size_t distance = 0;
0x00000000000011f0 <+103>: mov QWORD PTR [rbp-0x28],0x0
15 for (size_t i = 0; i < len; i++) {
0x00000000000011f8 <+111>: mov QWORD PTR [rbp-0x20],0x0
0x0000000000001200 <+119>: jmp 0x122c <main+163>
16 if (s1[i] != s2[i]) {
0x0000000000001202 <+121>: mov rdx,QWORD PTR [rbp-0x18]
0x0000000000001206 <+125>: mov rax,QWORD PTR [rbp-0x20]
0x000000000000120a <+129>: add rax,rdx
0x000000000000120d <+132>: movzx edx,BYTE PTR [rax]
0x0000000000001210 <+135>: mov rcx,QWORD PTR [rbp-0x10]
0x0000000000001214 <+139>: mov rax,QWORD PTR [rbp-0x20]
0x0000000000001218 <+143>: add rax,rcx
0x000000000000121b <+146>: movzx eax,BYTE PTR [rax]
0x000000000000121e <+149>: cmp dl,al
0x0000000000001220 <+151>: je 0x1227 <main+158>
17 distance++;
0x0000000000001222 <+153>: add QWORD PTR [rbp-0x28],0x1
15 for (size_t i = 0; i < len; i++) {
0x0000000000001227 <+158>: add QWORD PTR [rbp-0x20],0x1
0x000000000000122c <+163>: mov rax,QWORD PTR [rbp-0x20]
0x0000000000001230 <+167>: cmp rax,QWORD PTR [rbp-0x8]
0x0000000000001234 <+171>: jb 0x1202 <main+121>
18 }
19 }
20
21 printf("%ld\n", distance);
0x0000000000001236 <+173>: mov rax,QWORD PTR [rbp-0x28]
0x000000000000123a <+177>: mov rsi,rax
0x000000000000123d <+180>: lea rax,[rip+0xdc0] # 0x2004
0x0000000000001244 <+187>: mov rdi,rax
0x0000000000001247 <+190>: mov eax,0x0
0x000000000000124c <+195>: call 0x1080 <printf@plt>
22
23 return EXIT_SUCCESS;
0x0000000000001251 <+200>: mov eax,0x0
24 }
0x0000000000001256 <+205>: leave
0x0000000000001257 <+206>: ret
End of assembler dump.
We used the form of the disas command that takes a symbol name and displays the entire disassembled function main. If we wish to see, for example, just the instructions that start within the first 20 bytes of the function, we can specify a range:
(gdb) disas /s main, main+20
Dump of assembler code from 0x1189 to 0x119d:
main.c:
6 int main(int argc, char **argv) {
0x0000000000001189 <main+0>: endbr64
0x000000000000118d <main+4>: push rbp
0x000000000000118e <main+5>: mov rbp,rsp
0x0000000000001191 <main+8>: sub rsp,0x40
0x0000000000001195 <main+12>: mov DWORD PTR [rbp-0x34],edi
0x0000000000001198 <main+15>: mov QWORD PTR [rbp-0x40],rsi
7 if (argc != 3) exit(EXIT_FAILURE);
0x000000000000119c <main+19>: cmp DWORD PTR [rbp-0x34],0x3
End of assembler dump.
We could also provide the addresses directly instead of using the symbol name:
(gdb) disas /s 0x0000000000001189, 0x000000000000119d
Dump of assembler code from 0x1189 to 0x119d:
main.c:
6 int main(int argc, char **argv) {
0x0000000000001189 <main+0>: endbr64
0x000000000000118d <main+4>: push rbp
0x000000000000118e <main+5>: mov rbp,rsp
0x0000000000001191 <main+8>: sub rsp,0x40
0x0000000000001195 <main+12>: mov DWORD PTR [rbp-0x34],edi
0x0000000000001198 <main+15>: mov QWORD PTR [rbp-0x40],rsi
7 if (argc != 3) exit(EXIT_FAILURE);
0x000000000000119c <main+19>: cmp DWORD PTR [rbp-0x34],0x3
End of assembler dump.
Single Line Disassembly
Sometimes we do want to look at the disassembly of the entire function; other times we may want to focus on a single line of C code. Let’s navigate to the line we want to inspect:
(gdb) b 16
Breakpoint 1 at 0x1202: file main.c, line 16.
(gdb) run
Starting program: /home/nemtrif/gdbasm/main abc abd
...
Breakpoint 1, main (argc=3, argv=0x7fffffffe188) at main.c:16
16 if (s1[i] != s2[i]) {
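To view the disassembly for a specific line, we first ask for its address range and then pass that range to the disas command. Note that the start and end addresses must be separated by a comma; without it, gdb reports a syntax error: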
(gdb) info line
Line 16 of "main.c" starts at address 0x555555555202 <main+121> and ends at 0x555555555210 <main+135>.
(gdb) disas 0x555555555202 0x555555555210
A syntax error in expression, near `0x555555555210'.
(gdb) disas 0x555555555202, 0x555555555210
Dump of assembler code from 0x555555555202 to 0x555555555210:
=> 0x0000555555555202 <main+121>: mov rdx,QWORD PTR [rbp-0x18]
0x0000555555555206 <main+125>: mov rax,QWORD PTR [rbp-0x20]
0x000055555555520a <main+129>: add rax,rdx
0x000055555555520d <main+132>: movzx edx,BYTE PTR [rax]
End of assembler dump.
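Now that the program is actually running, addresses begin with 0x0000555555555 instead of 0x0000000000001. The executable is position-independent, so the addresses shown before the run were offsets within the file, and the loader has since placed the program at a base address (0x555555554000 here, judging by the difference). The last three hex digits remain unchanged, as does the offset from main.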
Note also that this dump covers only part of the assembly generated for line 16: the range reported by info line ends at main+135, while the complete listing of the function shows that the code for line 16 continues through the je at main+151.
In any case, we can peek at the next instruction to be executed using the gdb examine command (x) with the format /i (for “instruction”):
(gdb) x /i 0x555555555202
=> 0x555555555202 <main+121>: mov rdx,QWORD PTR [rbp-0x18]
Or we can even use the value of the rip (instruction pointer) register:
(gdb) x /i $rip
=> 0x555555555202 <main+121>: mov rdx,QWORD PTR [rbp-0x18]
For that matter, we can easily check the values of all registers:
(gdb) info registers
rax 0x0 0
rbx 0x7fffffffe188 140737488347528
rcx 0x555555557db0 93824992247216
rdx 0x7fffffffe434 140737488348212
rsi 0x7fffffffe188 140737488347528
rdi 0x7fffffffe434 140737488348212
rbp 0x7fffffffe060 0x7fffffffe060
rsp 0x7fffffffe020 0x7fffffffe020
r8 0x0 0
r9 0x7ffff7fca380 140737353917312
r10 0x7fffffffdd80 140737488346496
r11 0x203 515
r12 0x3 3
r13 0x0 0
r14 0x555555557db0 93824992247216
r15 0x7ffff7ffd000 140737354125312
rip 0x555555555202 0x555555555202 <main+121>
eflags 0x293 [ CF AF SF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
fs_base 0x7ffff7da2740 140737351657280
gs_base 0x0 0
Stepping Through Instructions
If we want to step through individual assembly instructions and observe their effect, we can enable the disassemble-next-line setting:
(gdb) set disassemble-next-line on
Then we can try the ni (nexti) command, which executes a single machine instruction, and see what happens:
(gdb) ni
0x0000555555555206 16 if (s1[i] != s2[i]) {
0x0000555555555202 <main+121>: 48 8b 55 e8 mov rdx,QWORD PTR [rbp-0x18]
=> 0x0000555555555206 <main+125>: 48 8b 45 e0 mov rax,QWORD PTR [rbp-0x20]
0x000055555555520a <main+129>: 48 01 d0 add rax,rdx
0x000055555555520d <main+132>: 0f b6 10 movzx edx,BYTE PTR [rax]
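Notice the “=>” sign indicating our current position. The instruction we just executed loaded s1 from the stack into rdx, and we can see that the register changed from the 0x7fffffffe434 shown in the register dump above: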
(gdb) p /x $rdx
$1 = 0x7fffffffe430
After several ni commands, we can reach the actual branch:
(gdb) ni
16 if (s1[i] != s2[i]) {
=> 0x000055555555521e <main+149>: 38 c2 cmp dl,al
0x0000555555555220 <main+151>: 74 05 je 0x555555555227 <main+158>
We can also execute commands at the source level, such as u (until), and still obtain the disassembly:
(gdb) delete 1
(gdb) u 21
main (argc=3, argv=0x7fffffffe188) at main.c:21
21 printf("%ld\n", distance);
=> 0x0000555555555236 <main+173>: 48 8b 45 d8 mov rax,QWORD PTR [rbp-0x28]
0x000055555555523a <main+177>: 48 89 c6 mov rsi,rax
0x000055555555523d <main+180>: 48 8d 05 c0 0d 00 00 lea rax,[rip+0xdc0] # 0x555555556004
0x0000555555555244 <main+187>: 48 89 c7 mov rdi,rax
0x0000555555555247 <main+190>: b8 00 00 00 00 mov eax,0x0
0x000055555555524c <main+195>: e8 2f fe ff ff call 0x555555555080 <printf@plt>
At some point we will want to stop showing disassembly:
(gdb) set disassemble-next-line off
(gdb) n
1
23 return EXIT_SUCCESS;
The official documentation for machine code debugging with gdb can be found in the “Source and Machine Code” section of the GDB manual.