You're right about truncating the constants, though I also believe you'll need to adjust the number of constants based on the bit size. From the Wikipedia article you linked on Hamming Weight:

> For processors lacking those features, the best solutions known are based on adding counts in a tree pattern.

The height of that tree depends on the number of bits at the bottom.

If you were to look at the first algorithm listed:

    //types and constants used in the functions below
    //uint64_t is an unsigned 64-bit integer variable type (defined in C99 version of C language)
    const uint64_t m1  = 0x5555555555555555; //binary: 0101...
    const uint64_t m2  = 0x3333333333333333; //binary: 00110011..
    const uint64_t m4  = 0x0f0f0f0f0f0f0f0f; //binary:  4 zeros,  4 ones ...
    const uint64_t m8  = 0x00ff00ff00ff00ff; //binary:  8 zeros,  8 ones ...
    const uint64_t m16 = 0x0000ffff0000ffff; //binary: 16 zeros, 16 ones ...
    const uint64_t m32 = 0x00000000ffffffff; //binary: 32 zeros, 32 ones
    const uint64_t h01 = 0x0101010101010101; //the sum of 256 to the power of 0,1,2,3...

    //This is a naive implementation, shown for comparison,
    //and to help in understanding the better functions.
    //This algorithm uses 24 arithmetic operations (shift, add, and).
    int popcount64a(uint64_t x)
    {
        x = (x & m1 ) + ((x >>  1) & m1 ); //put count of each  2 bits into those  2 bits
        x = (x & m2 ) + ((x >>  2) & m2 ); //put count of each  4 bits into those  4 bits
        x = (x & m4 ) + ((x >>  4) & m4 ); //put count of each  8 bits into those  8 bits
        x = (x & m8 ) + ((x >>  8) & m8 ); //put count of each 16 bits into those 16 bits
        x = (x & m16) + ((x >> 16) & m16); //put count of each 32 bits into those 32 bits
        x = (x & m32) + ((x >> 32) & m32); //put count of each 64 bits into those 64 bits
        return x;
    }

The constant `m32` would be useless as a bitmask on 32-bit operations, as would the `x >> 32` operation (shifting a 32-bit value by 32 is undefined behavior in C on typical platforms). Accounting for the halving/truncating of the constants, and removal of dead operations, the 32-bit equivalent should be:

    #include <stdint.h> //for uint32_t

    const uint32_t m1  = 0x55555555; //binary: 0101...
    const uint32_t m2  = 0x33333333; //binary: 00110011..
    const uint32_t m4  = 0x0f0f0f0f; //binary:  4 zeros,  4 ones ...
    const uint32_t m8  = 0x00ff00ff; //binary:  8 zeros,  8 ones ...
    const uint32_t m16 = 0x0000ffff; //binary: 16 zeros, 16 ones

    int popcount32a(uint32_t x)
    {
        x = (x & m1 ) + ((x >>  1) & m1 ); //put count of each  2 bits into those  2 bits
        x = (x & m2 ) + ((x >>  2) & m2 ); //put count of each  4 bits into those  4 bits
        x = (x & m4 ) + ((x >>  4) & m4 ); //put count of each  8 bits into those  8 bits
        x = (x & m8 ) + ((x >>  8) & m8 ); //put count of each 16 bits into those 16 bits
        x = (x & m16) + ((x >> 16) & m16); //put count of each 32 bits into those 32 bits
        return (int)x; //the count is at most 32, so it fits in an int
    }

That is, one fewer tree step for half the number of bits, or one extra tree step if you double the number of bits. An n-bit word needs log2(n) steps: 5 for 32 bits, 6 for 64 bits.
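To make the "one step per doubling" relationship concrete, here is a rough sketch that runs the same tree for any power-of-two width. The names `tree_steps` and `popcount_tree` are made up for illustration and are not from the Wikipedia article. It relies on the observation that each mask equals the all-ones value divided by (2^s + 1), which reproduces `m1`, `m2`, `m4`, and so on for the chosen width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of tree steps for a word of `bits` bits
   (bits is assumed to be a power of two between 1 and 64). */
unsigned tree_steps(unsigned bits)
{
    unsigned n = 0;
    for (unsigned s = 1; s < bits; s <<= 1)
        n++; /* one step per doubling of the field width */
    return n;
}

/* Hypothetical helper: tree popcount over the low `bits` bits of x.
   The mask for shift s is all-ones / (2^s + 1), e.g. for 32 bits:
   0xffffffff/3 = 0x55555555, 0xffffffff/5 = 0x33333333, ... */
int popcount_tree(uint64_t x, unsigned bits)
{
    assert(bits >= 1 && bits <= 64 && (bits & (bits - 1)) == 0);

    /* Avoid shifting by 64, which would be undefined behavior. */
    uint64_t all = (bits == 64) ? ~UINT64_C(0)
                                : ((UINT64_C(1) << bits) - 1);
    x &= all; /* ignore anything above the chosen width */

    for (unsigned s = 1; s < bits; s <<= 1) {
        uint64_t m = all / ((UINT64_C(1) << s) + 1);
        x = (x & m) + ((x >> s) & m); /* add neighbouring s-bit counts */
    }
    return (int)x;
}
```

With this sketch, `tree_steps(16)`, `tree_steps(32)` and `tree_steps(64)` come out as 4, 5 and 6. The fixed-width versions above will be faster in practice since their masks are compile-time constants; the loop is only meant to show where the number of constants comes from.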

Well, I think I got wildly off topic on this thread.