Some years ago, a friend asked me how to (efficiently) get the position of some bit with value 1 inside an (unsigned) integer. For example, taking 32 bit integers and taking the least significant bit (LSB) as the bit 0, these are valid answers:

0xa9e7da24 --> 5
0x1d56b8b0 --> 28
0x9459ffbb --> 0
0x9f0c2a38 --> 24

If we reduce the problem to a more specific one, finding the *lowest* bit with value 1 inside an integer, it’s easy to get a solution just by doing a bitwise AND against a moving mask:

size_t bitscan(uint32_t a)
{
size_t i;
uint32_t mask = 1;
for (i = 0; i < 32 && !(a & mask); i++)
mask <<= 1;
return i;
}

but it’s relatively slow. Can we do it faster?

The answer is clearly yes if we are using x86 assembly, as this instruction set has a specific instruction for this purpose: Bit Scan Forward (BSF). But, even in C, we can do better by a three step process:

- First we isolate the least significant bit
*that is set* by using: *a* & (-*a*).
- Then we take the residue of dividing this power of two by 37.
- Finally, we look up the associated bit position, using the residue obtained in the previous step as an index.

Why the first step works? We can start with the zero: doing a bitwise AND against -0 gives us just zero. As the zero doesn’t have any 1 bits to start with, we can say we’ve “isolated” the least significant one, though in a somewhat vacuous sense.

To check the nonzero integers, let’s define the bit expansion of an unsigned integer *a*:

Then the result of doing –*a* (remember we are dealing with unsigned integers) will be:

If the number is nonzero, there must a least significant nonzero bit. Let’s call its position *M*. Writing *a* & (-*a*) now:

(definition of *M*)

(geometric series sum)

(AND bit by bit)

we get the isolated *M*^{th} bit, as desired.

We can look for the smallest divisor giving different residues for 2^{0}, 2^{1}, …, 2^{31} via a small piece of code:

def get_modulo(n):
m = 1
while True:
if len(set(2**i % m for i in range(n))) == n:
return m
m += 1
if __name__ == '__main__':
print(get_modulo(32))

Now we just have to put the bit numbers in their associated positions inside the lookup table. We can calculate the table using the following Python code:

def get_lut(max_exp, modulo):
lut = [-1] * modulo
for i in range(max_exp + 1):
assert lut[2 ** i % modulo] == -1
lut[2 ** i % modulo] = i
return lut
def print_table(table):
print('{%s}' % ', '.join('%d' % t for t in table))
if __name__ == '__main__':
print_table(get_lut(32, 37))

Then, putting these two pieces together, we get the following C function:

int lut_bit_pos(uint32_t a)
{
const int bit_pos_lut[37] =
{
-1, 0, 1, 26, 2, 23, 27, 32, 3, 16,
24, 30, 28, 11, -1, 13, 4, 7, 17,
-1, 25, 22, 31, 15, 29, 10, 12, 6,
-1, 21, 14, 9, 5, 20, 8, 19, 18
};
return bit_pos_lut[(a & (-a)) % 37];
}

As we are working with 32 bit integers, we can do a full test… though not using Ideone (it exceeds the runtime limits).

##### Edited 5/30/11 – Bit sets

The original problem was to handle a set of task slots, each slot having two possible states: occupied or empty. This set required an efficient support of the following operations:

- Checking if a slot is occupied.
- Occupying/freeing a slot.
*Getting a free slot if one is available.*

As this problem doesn’t ask us for the slot number, we can use an opaque type and internally codify the slot as its associated power of two:

typedef uint32_t set_t;
typedef int32_t set_aux_t;
typedef uint32_t slot_t;
#define NUM_SLOTS 32
#define EMPTY_SET ((set_t)-1)
#define FULL_SET ((set_t)0)
#define IS_VALID_SLOT(slot) ((slot)&&!((slot)&((slot)-1)))
#define SLOT_FROM_INDEX(slot_idx) (1<<(slot_idx))
#define GET_EMPTY_SLOT(set) ((set)&(-(set_aux_t)(set)))
#define SET_SLOT(set, slot) ((void)((set)&=~(slot)))
#define CLEAR_SLOT(set, slot) ((void)((set)|=(slot)))
#define IS_SLOT_SET(set, slot) (!((set)&(slot)))

We can check the implementation using Ideone.