Counting Leading Zeros in a Byte

Counting bits is simple in concept, but it presents some interesting nuances.

Aug 10, 2025

Consider this problem: how many leading zeros does an unsigned byte value contain?

The most obvious approach is to test each bit from most significant to least significant until you find a set bit. In C++:

int clz(unsigned char x) {
    int n = 0;
    for (int i = 7; i >= 0; --i) {
        if ((x & (1 << i)) == 0) ++n;
        else break;
    }
    return n;
}

The x86-64 assembly generated by GCC 15.1 -O2 uses the BT — Bit Test instruction, so there is no need for shifting:

        movzx   edi, dil     ; Zero-extend DIL (x) into EDI
        xor     eax, eax     ; Clear EAX; this will accumulate the zero count
.L3:
        mov     edx, 7       ; Start with bit index 7 (MSB of an 8-bit value)
        sub     edx, eax     ; Adjust index downward by the counted zeros
        bt      edi, edx     ; Test bit [EDX] of EDI: CF = value of that bit
        jc      .L1          ; If CF=1 we’ve found the first ‘1’ - exit
        add     eax, 1       ; Otherwise, increment zero count
        cmp     eax, 8       ; Have we tried all 8 bits?
        jne     .L3          ; If not, loop back and test the next lower bit
.L1:
        ret                  ; Return; EAX holds the count of leading zero bits

Of course, there are ways to make this function faster1 . But let’s see what kind of support we can find in C and C++ standard libraries nowadays.

With C programming language, the C23 standard introduced bit utilities2 declared in header stdbit.h. The interesting one is stdc_leading_zeros_uc which takes an unsigned char as the argument and returns the number of leading zero bits. Unfortunately, it takes time to implement standards, and at this point I don’t have access to any C compiler that supports the standard bit utilities. In the meantime, it is possible to use compiler intrinsics which are compiler-specific.

On another hand, C++ introduced the header file <bit> in the C++20 revision of the standard which is well supported by major compilers. Among other things, the header declares the function std::countl_zero which we can use to count leading zero bits in a byte3.

Here is our function rewritten to use std::countl_zero:

#include <bit>

int clz(unsigned char x) {
    return std::countl_zero(x);
}

The generated assembly ( x86-64 g++ 15.1 -O2 -std=c++20) is obviously pretty different:

    mov     eax, 8                ; assume 8 leading zeros
    and     edi, 255              ; mask to low 8 bits
    je      .L1                   ; if zero, return 8
    bsr     edi, edi              ; find index of highest set bit
    xor     edi, 31               ; compute (31 - index)
    lea     eax, [rdi-24]         ; final count = 7 - index
.L1:
    ret

The bulk of the work is done by the BSR — Bit Scan Reverse instruction, which stores the index of the most significant set bit to the destination operand. Note how we had to separately check for a zero byte first - the result of BSR is undefined if no bits are set.

Can we do even better? The LZCNT — Count the Number of Leading Zero Bits was introduced as a part of Bit manipulation instructions sets (BMI1) extension for x86-x64. Obviously, not all processors support BMI1 extensions, but there are ways to nudge the compiler to use them. For GCC, we can use the -mlzcnt option.

Here is the generated assembly for the same C++ code (x86-64 g++ 15.1 -O2 -std=c++20 -mlzcnt):

    movzx   edi, dil              ; zero-extend input
    xor     eax, eax              ; clear count
    mov     edx, 8                ; fallback for zero input
    lzcnt   eax, edi              ; raw count in EAX
    sub     eax, 24               ; adjust for 8-bit domain
    test    edi, edi              ; set ZF if input was zero
    cmove   eax, edx              ; if zero, return 8
    ret

Here, lzcnt is used instead of bsr. There is some additional work because we are looking for 8-bit bytes whereas lzcnt works with words, but other than that it is straightforward.

Of course, once we use the -mlzcnt option, the code is not going to work on CPUs that do not support BMI1 extensions4. Would it be worth checking the CPU capabilities at runtime and applying lzcnt if possible but falling back to bsr if not? Microsoft VC++ developers seem to think so5. Here is the assembly for the same code generated with MSVC v19 x64 /O2 /std:c++20

    cmp     DWORD PTR __isa_available, 5
    movzx   eax, cl
    jl      fallback              ; if no BMI1, use fallback

    ; LZCNT path
    lzcnt   cx, ax
    movzx   eax, cx
    sub     eax, 8
    ret

fallback:
    bsr     ecx, eax
    jne     nonzero
    mov     eax, 8               ; input was zero
    ret

nonzero:
    mov     eax, 7
    sub     eax, ecx             ; 7 – MSB index
    ret

As we can see, Microsoft’s implementation of the C++ Standard Library checks the value of __isa_available variable6 to check for support of BMI1 extensions. The branch is easily optimized by modern branch predictors, as the value remains the same over the lifetime of a process.

Some approaches are listed in the article: Fast, Deterministic, and Portable Count Leading Zeros (CLZ) Be sure to check out the comments.

The official proposal can be found here: N3022 Modern Bit Utilities

For a more detailed description of the function, I suggest the article: Understanding std::countl_zero in C++20

There is some confusion coming from the fact that lzcnt is encoded as rep bsr therefore if lzcnt is not present bsr will be executed instead. It will not give the same result, though, and it is misleading to expect the machine code - level graceful fallback to bsr if lzcnt is missing.

A very interesting discussion on the topic: <bit>: Is the __isa_available check for lzcnt worth the cost?

__isa_available is defined in the C runtime library but is not officially documented.

Programming at the right level

Counting Leading Zeros in a Byte

Counting bits is simple in concept, but it presents some interesting nuances.