Performance Engineering of Software Systems

03 Dec 2023

Course homepage: link

Lec3. Bit Hacks

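Swap two integers without a temporary variable (XOR swap). This works because XOR is its own inverse: (x ^ y) ^ y == x.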

int x = 1; int y = 2;

x = x ^ y;
y = x ^ y;
x = x ^ y;
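A minimal sketch wrapping the three XOR steps in a function, with comments tracing each intermediate value. The name xor_swap is hypothetical. The a == b guard handles a pitfall: if both pointers refer to the same object, the first XOR sets it to 0 and the value is lost.

```c
/* XOR swap: exchanges *a and *b without a temporary variable.
 * xor_swap is a hypothetical helper name used for illustration. */
void xor_swap(int *a, int *b) {
    if (a == b) {
        return; /* same object: x ^ x == 0 would wipe the value */
    }
    *a = *a ^ *b; /* *a now holds x ^ y          */
    *b = *a ^ *b; /* (x ^ y) ^ y == x, so *b = x */
    *a = *a ^ *b; /* (x ^ y) ^ x == y, so *a = y */
}
```

For example, with int x = 1, y = 2, calling xor_swap(&x, &y) leaves x == 2 and y == 1.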

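Minimum of two integers without a branch. This removes the branch, but it is not necessarily faster than the code the compiler generates at -O3.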

int x = 1; int y = 2;

int min_value = y ^ ((x ^ y) & -(x < y)); // (x < y) is implicitly 0 or 1, so -(x < y) is 0 or all ones
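The same expression as a small function, a sketch that names the mask explicitly so both cases are easy to trace. The function name branchless_min is hypothetical.

```c
/* Branchless minimum. -(x < y) is 0 when x >= y and -1 (all bits set)
 * when x < y, so the mask either keeps x ^ y or clears it. */
int branchless_min(int x, int y) {
    int mask = -(x < y);         /* 0 or all ones */
    return y ^ ((x ^ y) & mask); /* x < y: y ^ x ^ y == x; otherwise y */
}
```

No arithmetic overflow can occur here, since x ^ y only flips bits; the only arithmetic is negating 0 or 1.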

Modular addition: given \(0 \le x < n\) and \(0 \le y < n\), compute \((x + y) \bmod n\) without a division.

z = x + y;                     // z < 2n, so at most one subtraction of n is needed
result = z - (n & -(z >= n));  // subtracts n only when z >= n
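A minimal runnable sketch of the same trick on unsigned 32-bit values. The name add_mod and the choice of uint32_t are assumptions; the precondition n <= 2^31 is added so that x + y cannot overflow.

```c
#include <stdint.h>

/* Modular addition without % or a branch, assuming 0 <= x < n, 0 <= y < n,
 * and n <= 0x80000000 so that x + y fits in 32 bits.
 * add_mod is a hypothetical helper name. */
uint32_t add_mod(uint32_t x, uint32_t y, uint32_t n) {
    uint32_t z = x + y;                    /* z < 2n */
    uint32_t mask = -(uint32_t)(z >= n);   /* all ones if z >= n, else 0 */
    return z - (n & mask);                 /* subtract n at most once */
}
```

For example, add_mod(3, 4, 5) computes z = 7 >= 5 and returns 2.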

Round up to the nearest power of 2 (valid for \(1 \le n \le 2^{63}\))

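uint64_t n;    // input, assumed to satisfy 1 <= n <= 2^63
--n;           // handles the case where n is already a power of 2
n |= n >> 1;   // the shifts smear the highest 1 bit into every lower position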
n |= n >> 2;
n |= n >> 4;
n |= n >> 8;
n |= n >> 16;
n |= n >> 32;
++n;
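The same smear can also be written as a loop over the shift amounts 1, 2, 4, ..., 32. This is a sketch assuming 1 <= n <= 2^63; next_pow2 is a hypothetical name. For n == 0 the decrement wraps to all ones and the function returns 0.

```c
#include <stdint.h>

/* Rounds n up to a power of 2, assuming 1 <= n <= 2^63 (n == 0 wraps to 0).
 * next_pow2 is a hypothetical helper name. */
uint64_t next_pow2(uint64_t n) {
    --n;                                   /* exact powers of 2 map to themselves */
    for (unsigned s = 1; s < 64; s <<= 1) {
        n |= n >> s;                       /* copy the top 1 bit into lower bits */
    }
    return n + 1;                          /* all ones below the top bit, plus 1 */
}
```

For example, 65 becomes 64 after the decrement (binary 1000000), which smears to 127, and adding 1 gives 128.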

Isolate the lowest set bit (the least significant 1)

r = x & (-x);
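Why this works, as a short sketch: in two's complement, -x == ~x + 1. The bits below the lowest 1 are 0 in both x and -x, that bit is 1 in both, and every bit above it is flipped, so x & -x keeps exactly one bit. The name lowest_set_bit is hypothetical, and an unsigned type is assumed so that the negation is well defined for every input.

```c
#include <stdint.h>

/* Isolates the lowest set bit; returns 0 when x == 0.
 * lowest_set_bit is a hypothetical helper name. */
uint64_t lowest_set_bit(uint64_t x) {
    return x & -x; /* unsigned negation is modular: -x == ~x + 1 */
}
```

For example, 12 is binary 1100, so the result is binary 100, i.e. 4.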

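Count the 1 bits (population count). A hand-written loop is probably slower than the builtin __builtin_popcount, which can compile to a single POPCNT instruction when the target supports it.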

int r;  // declared outside the loop so the count is still available afterwards
for (r = 0; x != 0; ++r) {
	x &= x - 1; // clear the lowest 1 bit of x
}
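A sketch of the loop as a function next to the compiler builtin, so the two can be checked against each other. It assumes GCC or Clang (for __builtin_popcountll); the function names are hypothetical. The loop runs once per set bit rather than once per bit position.

```c
#include <stdint.h>

/* Counts 1 bits by repeatedly clearing the lowest set bit. */
int popcount_loop(uint64_t x) {
    int r = 0;
    while (x != 0) {
        x &= x - 1; /* clear the lowest 1 bit */
        ++r;
    }
    return r;
}

/* GCC/Clang builtin; may compile to a single POPCNT instruction when the
 * target supports it (e.g. with -mpopcnt or -march=native on x86-64). */
int popcount_builtin(uint64_t x) {
    return __builtin_popcountll(x);
}
```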

x86-64 general-purpose registers have several names, each referring to a different range of bits:

%rax: all 8 bytes; %eax: low 4 bytes; %ax: low 2 bytes; %al: lowest byte; %ah: second-lowest byte (bits 8-15)
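A minimal sketch modeling which bits each name reads, using casts and shifts on an ordinary uint64_t rather than real registers. The view_* helper names are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Models the bits each register name reads from a 64-bit %rax value. */
uint64_t view_rax(uint64_t rax) { return rax; }                 /* bits 0-63 */
uint32_t view_eax(uint64_t rax) { return (uint32_t)rax; }       /* bits 0-31 */
uint16_t view_ax(uint64_t rax)  { return (uint16_t)rax; }       /* bits 0-15 */
uint8_t  view_al(uint64_t rax)  { return (uint8_t)rax; }        /* bits 0-7  */
uint8_t  view_ah(uint64_t rax)  { return (uint8_t)(rax >> 8); } /* bits 8-15 */
```

For 0x1122334455667788, %eax reads 0x55667788, %ax reads 0x7788, %al reads 0x88, and %ah reads 0x77. One asymmetry this read-only model does not capture: on x86-64, writing a 32-bit name such as %eax zeroes bits 32-63 of %rax, whereas writing %ax, %al, or %ah leaves the other bits unchanged.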

This course uses AT&T syntax, as do clang, objdump, and perf. In the form op A, B the result is stored in B (the destination is the last operand).
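A small sketch of the operand order using GCC/Clang extended inline assembly, which is written in AT&T syntax by default. It assumes an x86-64 target and falls back to plain C elsewhere so that it still compiles; att_sub is a hypothetical name.

```c
#include <stdint.h>

/* "subq %1, %0" computes %0 = %0 - %1, i.e. dst = dst - src,
 * so this returns b - a. att_sub is a hypothetical helper name. */
int64_t att_sub(int64_t a, int64_t b) {
#if defined(__x86_64__)
    __asm__("subq %1, %0" : "+r"(b) : "r"(a) : "cc"); /* b is read and written */
    return b;
#else
    return b - a; /* portable fallback for non-x86-64 targets */
#endif
}
```

For example, att_sub(3, 10) returns 7. Intel syntax, used in Intel's manuals, reverses the order and puts the destination first.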

SSE and AVX instructions support single- and double-precision scalar floating-point arithmetic.
x87 instructions support single-, double-, and extended-precision scalar floating-point arithmetic.

Compilers generally prefer SSE instructions over x87 because they are simpler to compile for and to optimize.

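SSE instructions use two-letter suffixes to encode the data type: ss (one single-precision value), sd (one double-precision value), ps (a vector of single-precision values), pd (a vector of double-precision values).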

Modern processors generally include vector hardware that supports SIMD (single instruction, multiple data) execution.

Modern SSE instruction sets support vector operations on integer, single-precision, and double-precision floating-point values.
AVX instructions support vector operations on single-precision and double-precision floating-point values.
AVX2 instructions add integer-vector operations to the AVX instruction set.
AVX-512 (AVX3) instructions increase the register length to 512 bits and provide new vector operations.

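SSE instructions use 128-bit XMM registers and take at most two operands. AVX instructions can use 256-bit YMM registers and take up to three operands, so the destination can differ from both sources.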

                 SSE       AVX/AVX2
Floating-point   addpd     vaddpd
Integer          paddq     vpaddq

The leading v marks an AVX/AVX2 instruction; the leading p marks an integer vector instruction.
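A sketch of the two SSE rows of the table written with SSE2 intrinsics from <emmintrin.h>. With SSE2 enabled these intrinsics typically compile to addpd and paddq (or to vaddpd / vpaddq when AVX is enabled), though the exact instructions are the compiler's choice. The function names are hypothetical, and a scalar fallback is used when __SSE2__ is not defined.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* out[i] = a[i] + b[i] for two doubles, as one 128-bit vector add. */
void add2_f64(const double *a, const double *b, double *out) {
#if defined(__SSE2__)
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_pd(va, vb));                  /* addpd */
#else
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
#endif
}

/* out[i] = a[i] + b[i] for two 64-bit integers, as one 128-bit vector add. */
void add2_i64(const int64_t *a, const int64_t *b, int64_t *out) {
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi64(va, vb)); /* paddq */
#else
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
#endif
}
```

The unaligned load/store variants are used so that the caller does not need 16-byte-aligned arrays.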

Courses usually teach a 5-stage pipeline; real Intel processors can have 14 to 19 pipeline stages.