popcount, i.e. “population count”, is a CPU instruction that counts the number of 1-bits in a machine word. With the -mpopcnt option, compilers like GCC and Clang will generate this instruction when needed. For example, std::bitset::count (or std::popcount since C++20). See this example. There’s even a more powerful instruction extension called AVX512VPOPCNTDQ, which process 512bits simultaneously. You can enable it via -mavx512vpopcntdq.
The question I want to share today is what if our platform doesn’t support these instructions? Only software-level algorithms can solve it.
The key point here is n &= n - 1; Whenever executing this operation, the rightmost 1-bit and 0-bits after are inverted, so, the number of it is the count of 1-bits. See an example with n = 13 (1101),
For a 32-bit integer, grouped by 2-bit and sum the 1-bits in each group separately, and for 4-bit, 8-bit… finally the 32-bit as a whole. It’s a divide-and-conquer strategy. The summing problem for 32bits is divided into 2 summing problems for 16bits, then 4 problems for 8 bits, and so on. There are only $log_2(32) = 5$ steps.