Some years ago, a friend asked me how to (efficiently) get the position of some bit with value 1 inside an (unsigned) integer. For example, taking 32 bit integers and taking the least significant bit (LSB) as the bit 0, these are valid answers:

0xa9e7da24 --> 5 0x1d56b8b0 --> 28 0x9459ffbb --> 0 0x9f0c2a38 --> 24

If we reduce the problem to a more specific one, finding the *lowest* bit with value 1 inside an integer, it’s easy to get a solution just by doing a bitwise AND against a moving mask:

size_t bitscan(uint32_t a) { size_t i; uint32_t mask = 1; for (i = 0; i < 32 && !(a & mask); i++) mask <<= 1; return i; }

but it’s relatively slow. Can we do it faster?

The answer is clearly yes if we are using x86 assembly, as this instruction set has a specific instruction for this purpose: Bit Scan Forward (BSF). But, even in C, we can do better by a three step process:

- First we isolate the least significant bit
*that is set*by using:*a*& (-*a*). - Then we take the residue of dividing this power of two by 37.
- Finally, we look up the associated bit position, using the residue obtained in the previous step as an index.

Why the first step works? We can start with the zero: doing a bitwise AND against -0 gives us just zero. As the zero doesn’t have any 1 bits to start with, we can say we’ve “isolated” the least significant one, though in a somewhat vacuous sense.

To check the nonzero integers, let’s define the bit expansion of an unsigned integer *a*:

Then the result of doing –*a* (remember we are dealing with unsigned integers) will be:

If the number is nonzero, there must a least significant nonzero bit. Let’s call its position *M*. Writing *a* & (-*a*) now:

(definition of *M*)

(AND bit by bit)

we get the isolated *M*^{th} bit, as desired.

def get_modulo(n): m = 1 while True: if len(set(2**i % m for i in range(n))) == n: return m m += 1 if __name__ == '__main__': print(get_modulo(32))

Now we just have to put the bit numbers in their associated positions inside the lookup table. We can calculate the table using the following Python code:

def get_lut(max_exp, modulo): lut = [-1] * modulo for i in range(max_exp + 1): assert lut[2 ** i % modulo] == -1 lut[2 ** i % modulo] = i return lut def print_table(table): print('{%s}' % ', '.join('%d' % t for t in table)) if __name__ == '__main__': print_table(get_lut(32, 37))

Then, putting these two pieces together, we get the following C function:

int lut_bit_pos(uint32_t a) { const int bit_pos_lut[37] = { -1, 0, 1, 26, 2, 23, 27, 32, 3, 16, 24, 30, 28, 11, -1, 13, 4, 7, 17, -1, 25, 22, 31, 15, 29, 10, 12, 6, -1, 21, 14, 9, 5, 20, 8, 19, 18 }; return bit_pos_lut[(a & (-a)) % 37]; }

As we are working with 32 bit integers, we can do a full test… though not using Ideone (it exceeds the runtime limits).

##### Edited 5/30/11 – Bit sets

The original problem was to handle a set of task slots, each slot having two possible states: occupied or empty. This set required an efficient support of the following operations:

- Checking if a slot is occupied.
- Occupying/freeing a slot.
*Getting a free slot if one is available.*

As this problem doesn’t ask us for the slot number, we can use an opaque type and internally codify the slot as its associated power of two:

typedef uint32_t set_t; typedef int32_t set_aux_t; typedef uint32_t slot_t; #define NUM_SLOTS 32 #define EMPTY_SET ((set_t)-1) #define FULL_SET ((set_t)0) #define IS_VALID_SLOT(slot) ((slot)&&!((slot)&((slot)-1))) #define SLOT_FROM_INDEX(slot_idx) (1<<(slot_idx)) #define GET_EMPTY_SLOT(set) ((set)&(-(set_aux_t)(set))) #define SET_SLOT(set, slot) ((void)((set)&=~(slot))) #define CLEAR_SLOT(set, slot) ((void)((set)|=(slot))) #define IS_SLOT_SET(set, slot) (!((set)&(slot)))

We can check the implementation using Ideone.