Counting Leading Zeros in a Byte
Counting bits is simple in concept, but it presents some interesting nuances.
Consider this problem: how many leading zeros does an unsigned byte value contain?
The most obvious approach is to test each bit from most significant to least significant until you find a set bit. In C++:
int clz(unsigned char x) {
int n = 0;
for (int i = 7; i >= 0; --i) {
if ((x & (1 << i)) == 0) ++n;
else break;
}
return n;
}
The x86-64 assembly generated by GCC 15.1 -O2 uses the BT — Bit Test instruction, so there is no need for shifting:
movzx edi, dil ; Zero-extend DIL (x) into EDI
xor eax, eax ; Clear EAX; this will accumulate the zero count
.L3:
mov edx, 7 ; Start with bit index 7 (MSB of an 8-bit value)
sub edx, eax ; Adjust index downward by the counted zeros
bt edi, edx ; Test bit [EDX] of EDI: CF = value of that bit
jc .L1 ; If CF=1 we’ve found the first ‘1’ - exit
add eax, 1 ; Otherwise, increment zero count
cmp eax, 8 ; Have we tried all 8 bits?
jne .L3 ; If not, loop back and test the next lower bit
.L1:
ret ; Return; EAX holds the count of leading zero bits
Of course, there are ways to make this function faster1 . But let’s see what kind of support we can find in C and C++ standard libraries nowadays.
With C programming language, the C23 standard introduced bit utilities2 declared in header stdbit.h
. The interesting one is stdc_leading_zeros_uc which takes an unsigned char
as the argument and returns the number of leading zero bits. Unfortunately, it takes time to implement standards, and at this point I don’t have access to any C compiler that supports the standard bit utilities. In the meantime, it is possible to use compiler intrinsics which are compiler-specific.
On another hand, C++ introduced the header file <bit> in the C++20 revision of the standard which is well supported by major compilers. Among other things, the header declares the function std::countl_zero which we can use to count leading zero bits in a byte3.
Here is our function rewritten to use std::countl_zero:
#include <bit>
int clz(unsigned char x) {
return std::countl_zero(x);
}
The generated assembly ( x86-64 g++ 15.1 -O2 -std=c++20) is obviously pretty different:
mov eax, 8 ; assume 8 leading zeros
and edi, 255 ; mask to low 8 bits
je .L1 ; if zero, return 8
bsr edi, edi ; find index of highest set bit
xor edi, 31 ; compute (31 - index)
lea eax, [rdi-24] ; final count = 7 - index
.L1:
ret
The bulk of the work is done by the BSR — Bit Scan Reverse instruction, which stores the index of the most significant set bit to the destination operand. Note how we had to separately check for a zero byte first - the result of BSR is undefined if no bits are set.
Can we do even better? The LZCNT — Count the Number of Leading Zero Bits was introduced as a part of Bit manipulation instructions sets (BMI1) extension for x86-x64. Obviously, not all processors support BMI1 extensions, but there are ways to nudge the compiler to use them. For GCC, we can use the -mlzcnt option.
Here is the generated assembly for the same C++ code (x86-64 g++ 15.1 -O2 -std=c++20 -mlzcnt):
movzx edi, dil ; zero-extend input
xor eax, eax ; clear count
mov edx, 8 ; fallback for zero input
lzcnt eax, edi ; raw count in EAX
sub eax, 24 ; adjust for 8-bit domain
test edi, edi ; set ZF if input was zero
cmove eax, edx ; if zero, return 8
ret
Here, lzcnt is used instead of bsr. There is some additional work because we are looking for 8-bit bytes whereas lzcnt works with words, but other than that it is straightforward.
Of course, once we use the -mlzcnt option, the code is not going to work on CPUs that do not support BMI1 extensions4. Would it be worth checking the CPU capabilities at runtime and applying lzcnt if possible but falling back to bsr if not? Microsoft VC++ developers seem to think so5. Here is the assembly for the same code generated with MSVC v19 x64 /O2 /std:c++20
cmp DWORD PTR __isa_available, 5
movzx eax, cl
jl fallback ; if no BMI1, use fallback
; LZCNT path
lzcnt cx, ax
movzx eax, cx
sub eax, 8
ret
fallback:
bsr ecx, eax
jne nonzero
mov eax, 8 ; input was zero
ret
nonzero:
mov eax, 7
sub eax, ecx ; 7 – MSB index
ret
As we can see, Microsoft’s implementation of the C++ Standard Library checks the value of __isa_available variable6 to check for support of BMI1 extensions. The branch is easily optimized by modern branch predictors, as the value remains the same over the lifetime of a process.
Some approaches are listed in the article: Fast, Deterministic, and Portable Count Leading Zeros (CLZ) Be sure to check out the comments.
The official proposal can be found here: N3022 Modern Bit Utilities
For a more detailed description of the function, I suggest the article: Understanding std::countl_zero in C++20
There is some confusion coming from the fact that lzcnt
is encoded as rep bsr
therefore if lzcnt
is not present bsr
will be executed instead. It will not give the same result, though, and it is misleading to expect the machine code - level graceful fallback to bsr
if lzcnt
is missing.
A very interesting discussion on the topic: <bit>: Is the __isa_available check for lzcnt worth the cost?
__isa_available is defined in the C runtime library but is not officially documented.