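Writing a simple "Hello World"-style assembly program on macOS with an M1 Pro (ARM). The program does not print anything; it adds two numbers and returns the sum as its exit status.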

CPU instruction set architecture

Use uname -a to check the CPU instruction set architecture. On this machine:

$ uname -a
Darwin Pipe-Macbook 21.6.0 Darwin Kernel Version 21.6.0: Mon Aug 22 20:19:52 PDT 2022; root:xnu-8020.140.49~2/RELEASE_ARM64_T6000 arm64

$ uname -p
arm

Writing the simplest ARM assembly

See apple-m1-assembly-language-hello-world and the meanings of the basic ARM instructions.

Create a file named main.s with the following content:

.global _main
_main:
	mov	x0, #1
	mov	x1, #2
	add	x0, x0, x1
	ret

The difference between the r0, x0, and w0 registers:

The aarch64 registers are named:

r0 through r30 - to refer generally to the registers

x0 through x30 - for 64-bit-wide access (same registers)

w0 through w30 - for 32-bit-wide access (same registers - upper 32 bits are either cleared on load or sign-extended (set to the value of the most significant bit of the loaded value)).
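
To make the x/w relationship concrete, here is a minimal Python sketch. It is a model and not real register access: `read_w` and `write_w` are hypothetical helper names, and `write_w` only models the "cleared" case from the quote above, where a plain write to a w register zeroes the upper 32 bits of the matching x register.

```python
MASK32 = (1 << 32) - 1  # low 32 bits


def read_w(x_value: int) -> int:
    """Return the w view (low 32 bits) of a 64-bit x register value."""
    return x_value & MASK32


def write_w(value: int) -> int:
    """Model a plain write to a w register.

    Returns the resulting 64-bit x value: the low 32 bits are kept
    and the upper 32 bits are zero.
    """
    return value & MASK32


x0 = 0x1122334455667788
print(hex(read_w(x0)))            # 0x55667788
print(hex(write_w(0x1FFFFFFFF)))  # 0xffffffff
```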

Annotated version

.global _main
// Program entry point
_main:
	// Set register x1 to 1
	mov	x1, #1

	// Set register x2 to 2
	mov	x2, #2

	// Add x1 and x2
	// and store the result in x0
	add	x0, x1, x2

	// Return; x0 holds the return value
	ret

Assemble and link the code above, then run it:

# Assemble and link
$ gcc main.s

# Run the program
$ ./a.out

# Check the exit status; it should print 3
$ echo $?
3
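
The reason `echo $?` prints 3 is that the value left in x0 when main returns becomes the process exit status. The following Python sketch mimics the three instructions with a plain dictionary; `run_program` and `exit_status` are hypothetical names used only for illustration, and the 8-bit mask reflects the fact that the shell reports only the low 8 bits of the status.

```python
MASK64 = (1 << 64) - 1


def run_program() -> int:
    """Mimic the annotated main.s and return the final value of x0."""
    regs = {"x0": 0, "x1": 0, "x2": 0}
    regs["x1"] = 1                                    # mov x1, #1
    regs["x2"] = 2                                    # mov x2, #2
    regs["x0"] = (regs["x1"] + regs["x2"]) & MASK64   # add x0, x1, x2
    return regs["x0"]                                 # ret (x0 is the return value)


def exit_status(return_value: int) -> int:
    """The shell's $? only shows the low 8 bits of the return value."""
    return return_value & 0xFF


print(exit_status(run_program()))  # 3
```

Under this model a return value of 259 would also show up as 3, since 259 & 0xFF is 3.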

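Debugging

LLDB debugging tips

See x86 Assembly on MacOS; the details are not repeated here.

Debugging the program

Start debugging with the following commands:

# Start the debugger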
$ lldb a.out
(lldb) target create "a.out"
Current executable set to '/xxx/a.out' (arm64).

# Set a breakpoint on main
(lldb) b main
Breakpoint 1: where = a.out`main, address = 0x0000000100003fa8

# Run the program
(lldb) r
Process 19910 launched: '/xxx/a.out' (arm64)
Process 19910 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100003fa8 a.out`main
a.out`main:
->  0x100003fa8 <+0>:  mov    x1, #0x1
    0x100003fac <+4>:  mov    x2, #0x2
    0x100003fb0 <+8>:  add    x0, x1, x2
    0x100003fb4 <+12>: ret
Target 0: (a.out) stopped.

In the final executable, the assembly above has been assembled into:

0x100003fa8 <+0>:  mov    x1, #0x1
0x100003fac <+4>:  mov    x2, #0x2
0x100003fb0 <+8>:  add    x0, x1, x2
0x100003fb4 <+12>: ret

# Step forward
(lldb) n
Process 19910 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step over
    frame #0: 0x0000000100003fb0 a.out`main + 8
a.out`main:
->  0x100003fb0 <+8>:  add    x0, x1, x2
    0x100003fb4 <+12>: ret
    0x100003fb8:       udf    #0x1
    0x100003fbc:       udf    #0x1c
Target 0: (a.out) stopped.


# After the assignments have executed, inspect the registers
# x1 is 1 and x2 is 2, as expected
(lldb) re r
General Purpose Registers:
        x0 = 0x0000000000000001
        x1 = 0x0000000000000001
        x2 = 0x0000000000000002

# Step forward to execute the addition
(lldb) n
Process 19910 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step over
    frame #0: 0x0000000100003fb4 a.out`main + 12
a.out`main:
->  0x100003fb4 <+12>: ret
    0x100003fb8:       udf    #0x1
    0x100003fbc:       udf    #0x1c
    0x100003fc0:       udf    #0x0
Target 0: (a.out) stopped.

# After the addition, x0 is 3
(lldb) re r
General Purpose Registers:
        x0 = 0x0000000000000003
        x1 = 0x0000000000000001
        x2 = 0x0000000000000002

# Run the program to completion
# The final exit status is 3, as expected
(lldb) thread continue
Resuming thread 0x1817033 in process 19910
Process 19910 resuming
Process 19910 exited with status = 3 (0x00000003)

Instruction encoding

Confirm the entry address of main:

$ nm a.out
0000000100000000 T __mh_execute_header
0000000100003fa8 T _main

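Dump the machine code of a.out starting at file offset 0x3fa8 (the address of _main minus the 0x100000000 base shown for __mh_execute_header):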

$ xxd -s 0x3fa8 -l 20 -c 4 a.out
00003fa8: 2100 80d2  !...
00003fac: 4200 80d2  B...
00003fb0: 2000 028b   ...
00003fb4: c003 5fd6  .._.
00003fb8: 0100 0000  ....
...
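
The offset passed to `xxd -s` can be derived from the two addresses printed by nm. The sketch below assumes that the image base is the address of __mh_execute_header, that the text segment starts at file offset 0, and that a.out is a single-architecture file; `file_offset` is a hypothetical helper name.

```python
def file_offset(symbol_addr: int, image_base: int) -> int:
    """Translate a symbol address into an offset inside the file.

    Assumes the image is mapped so that file offset 0 corresponds
    to image_base.
    """
    return symbol_addr - image_base


MAIN_ADDR = 0x0000000100003FA8   # _main, from nm
IMAGE_BASE = 0x0000000100000000  # __mh_execute_header, from nm

print(hex(file_offset(MAIN_ADDR, IMAGE_BASE)))  # 0x3fa8
```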

The resulting machine code, with the bytes in the order they appear in the file, is:

|---------- disassembly ---------|      |- hex -|   |----- 32-bit word in binary -----|
0x100003fa8 <+0>:  mov    x1, #0x1      2100 80d2   00100001 00000000 10000000 11010010
0x100003fac <+4>:  mov    x2, #0x2      4200 80d2   01000010 00000000 10000000 11010010
0x100003fb0 <+8>:  add    x0, x1, x2    2000 028b   00100000 00000000 00000010 10001011
0x100003fb4 <+12>: ret                  c003 5fd6   11000000 00000011 01011111 11010110

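The bytes in the dump are in little-endian order (it took me a long time to realize this): each 32-bit instruction is stored least significant byte first. Reversing the order of the four bytes of each instruction (the bytes, not the bits inside a byte) gives the instruction word in the form the ARM reference uses:

|---------- disassembly ---------|      |- hex -|   |----- 32-bit word in binary -----|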
0x100003fa8 <+0>:  mov    x1, #0x1      d280 0021   11010010 10000000 00000000 00100001
0x100003fac <+4>:  mov    x2, #0x2      d280 0042   11010010 10000000 00000000 01000010
0x100003fb0 <+8>:  add    x0, x1, x2    8b02 0020   10001011 00000010 00000000 00100000
0x100003fb4 <+12>: ret                  d65f 03c0   11010110 01011111 00000011 11000000
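
The byte reversal is just a little-endian read of each 4-byte group. As a small Python sketch, using the 16 bytes from the xxd output above as a literal (`words_le` is a hypothetical helper name):

```python
# The four instructions exactly as xxd printed them (file byte order).
dump = bytes.fromhex("210080d2" "420080d2" "2000028b" "c0035fd6")


def words_le(data: bytes) -> list[int]:
    """Split data into 4-byte groups and read each as a little-endian word."""
    return [
        int.from_bytes(data[i:i + 4], "little")
        for i in range(0, len(data), 4)
    ]


for word in words_le(dump):
    # Hex and binary views of the instruction word.
    print(f"{word:08x}  {word:032b}")
# d2800021  11010010100000000000000000100001
# d2800042  11010010100000000000000001000010
# 8b020020  10001011000000100000000000100000
# d65f03c0  11010110010111110000001111000000
```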

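Then compare against the reference entries for MOV, ADD, and RET.

MOV

From the encoding, the specific form is MOV (wide immediate). "Wide immediate" refers to the 16-bit immediate field imm16, which hw can optionally shift left by a multiple of 16 bits; the 64-bit register width is selected separately by sf = 1.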

Bits    31   30-29   28-23     22-21   20-5               4-0
Field   sf   opc     (fixed)   hw      imm16              Rd
Value   1    10      100101    00      0000000000000001   00001

Here sf = 1 selects the 64-bit variant, imm16 is the 16-bit immediate (the value 1), and Rd = 00001 is register 1.

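So the full 32-bit word is 1/1010010 1/00/00000 00000000 001/00001, which is d280 0021 in hexadecimal: the instruction mov x1, #0x1, which stores the value 1 in register 1.

Likewise, 1/1010010 1/00/00000 00000000 010/00010 is d280 0042 in hexadecimal: the instruction mov x2, #0x2, which stores the value 2 in register 2.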
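
The field extraction for MOV can be written as a few shifts and masks. This is a minimal sketch that only handles the one encoding discussed here (opc = 10, which is the MOVZ form behind MOV (wide immediate)); `decode_mov_wide` is a hypothetical name.

```python
def decode_mov_wide(word: int) -> dict:
    """Decode the sf/opc/hw/imm16/Rd fields of a MOV (wide immediate) word."""
    opc = (word >> 29) & 0b11        # bits 30-29
    fixed = (word >> 23) & 0b111111  # bits 28-23
    if opc != 0b10 or fixed != 0b100101:
        raise ValueError(f"not a MOV (wide immediate) word: {word:#010x}")
    sf = (word >> 31) & 1            # bit 31: 1 = 64-bit register
    hw = (word >> 21) & 0b11         # bits 22-21: shift amount / 16
    imm16 = (word >> 5) & 0xFFFF     # bits 20-5
    rd = word & 0b11111              # bits 4-0
    return {"sf": sf, "hw": hw, "imm16": imm16, "rd": rd,
            "value": imm16 << (16 * hw)}


print(decode_mov_wide(0xD2800021))
# {'sf': 1, 'hw': 0, 'imm16': 1, 'rd': 1, 'value': 1}
print(decode_mov_wide(0xD2800042))
# {'sf': 1, 'hw': 0, 'imm16': 2, 'rd': 2, 'value': 2}
```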

ADD

With MOV understood, ADD is easy to read as well.

The ADD instruction means:

add r0,r1,r2 // load r0 with r1+r2

Bits    31   30   29   28-24     23-22   21   20-16   15-10    9-5     4-0
Field   sf   op   S    (fixed)   shift   0    Rm      imm6     Rn      Rd
Value   1    0    0    01011     00      0    00010   000000   00001   00000

Here sf = 1 selects the 64-bit variant, Rm = 00010 is register 2, imm6 = 000000 is a shift amount of 0, Rn = 00001 is register 1, and Rd = 00000 is register 0.

The binary word 1/0001011 00/0/00010 000000/00 001/00000 is 8b02 0020 in hexadecimal: the instruction add x0, x1, x2, which adds register 1 and register 2 and stores the result in register 0.
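
The same shift-and-mask approach reads the ADD fields. The sketch below only accepts the layout shown in the table above (op = 0, S = 0, fixed bits 01011, bit 21 = 0); `decode_add_shifted` is a hypothetical name.

```python
def decode_add_shifted(word: int) -> dict:
    """Decode the fields of an ADD (shifted register) instruction word."""
    op = (word >> 30) & 1            # bit 30: 0 = add
    s = (word >> 29) & 1             # bit 29: 0 = do not set flags
    fixed = (word >> 24) & 0b11111   # bits 28-24
    bit21 = (word >> 21) & 1         # bit 21
    if op != 0 or s != 0 or fixed != 0b01011 or bit21 != 0:
        raise ValueError(f"not an ADD (shifted register) word: {word:#010x}")
    return {
        "sf": (word >> 31) & 1,          # bit 31: 1 = 64-bit registers
        "shift": (word >> 22) & 0b11,    # bits 23-22
        "rm": (word >> 16) & 0b11111,    # bits 20-16
        "imm6": (word >> 10) & 0b111111, # bits 15-10
        "rn": (word >> 5) & 0b11111,     # bits 9-5
        "rd": word & 0b11111,            # bits 4-0
    }


print(decode_add_shifted(0x8B020020))
# {'sf': 1, 'shift': 0, 'rm': 2, 'imm6': 0, 'rn': 1, 'rd': 0}
```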

RET

RET serves as a consolidation exercise.

Bits    31-25     24   23   22-21   20-16   15-12   11   10   9-5     4-0
Field   (fixed)   Z    0    op      (fixed) (fixed) A    M    Rn      Rm
Value   1101011   0    0    10      11111   0000    0    0    11110   00000

The binary word 11010110 01011111 000000/11 110/00000 is d65f 03c0 in hexadecimal, which is the ret instruction. Rn = 11110 is register 30, the link register x30, which is the register ret branches to when no operand is given.
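
As a check on the exercise, RET can be decoded with a single mask: every bit except the Rn field (bits 9-5) is fixed. A minimal sketch, with `decode_ret` as a hypothetical name:

```python
RET_FIXED_MASK = 0xFFFFFC1F  # every bit except Rn (bits 9-5)
RET_FIXED_BITS = 0xD65F0000  # value of the fixed bits for RET


def decode_ret(word: int) -> int:
    """Return the Rn register number of a RET instruction word."""
    if word & RET_FIXED_MASK != RET_FIXED_BITS:
        raise ValueError(f"not a RET word: {word:#010x}")
    return (word >> 5) & 0b11111  # bits 9-5


print(decode_ret(0xD65F03C0))  # 30
```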