When building a parser, it’s often useful to be able to quickly detect occurrences of some character in a string. The fastest methods on x86 processors use SSE/AVX instructions, but I wanted to start by building a portable version that uses only bitwise operations.

### Problem definition

Given a constant byte value `d`, we want to create a function `get_matches()` that takes an unsigned integer `b` as a parameter and returns another of the same size. Each byte of the return value should be `0x80` if the byte at the same position in `b` is equal to `d`, or `0x00` otherwise.

For example, for `d` equal to `0x20` and `b` being a 64-bit integer, we should have:

    get_matches(0x1312202000200212) -> 0x0000808000800000
    get_matches(0x0001020304050607) -> 0x0000000000000000
    get_matches(0x0010203040506070) -> 0x0000800000000000

### Simple implementation

It’s often useful to start by writing a simple implementation of the desired function, to test it and clarify the problem:

    // Reference implementation: checks the bytes of b one at a time.
    template <uint8_t d, typename int_type>
    int_type get_matches_naive(int_type b)
    {
        int_type ret = 0;
        for (std::size_t i = 0; i < sizeof(int_type); i++)
            if (((b >> i * CHAR_BIT) & 0xff) == d)      // byte i equals d
                ret |= (int_type)0x80 << i * CHAR_BIT;  // mark it with 0x80
        return ret;
    }

But this implementation is of course not very efficient. How can we use bitwise instructions to our advantage?

### Bitwise implementation

We can start by computing an integer that has zero bytes exactly where the bytes of `b` are equal to `d`. That’s easy to do by XORing `b` with a mask containing repeated instances of `d`. Applying it to the previous examples:

    0x1312202000200212 ^ 0x2020202020202020 -> 0x3332000020002232
    0x0001020304050607 ^ 0x2020202020202020 -> 0x2021222324252627
    0x0010203040506070 ^ 0x2020202020202020 -> 0x2030001060704050
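The XOR step can be sketched on its own as follows. This is only an illustration: the helper names `broadcast`, `matches_as_zero_bytes` and `check_xor_examples` are made up for this sketch, and it is fixed to 64-bit words for brevity.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the post): repeat the byte d in every byte
// of a 64-bit word, e.g. 0x20 -> 0x2020202020202020.
constexpr uint64_t broadcast(uint8_t d)
{
    // ~0 / 0xff == 0x0101010101010101, so multiplying by d copies d into each byte.
    return d * (~UINT64_C(0) / 0xff);
}

// XOR against the broadcast mask: bytes of b equal to d become 0x00.
constexpr uint64_t matches_as_zero_bytes(uint64_t b, uint8_t d)
{
    return b ^ broadcast(d);
}

// Re-checks the three examples listed above.
void check_xor_examples()
{
    assert(broadcast(0x20) == 0x2020202020202020);
    assert(matches_as_zero_bytes(0x1312202000200212, 0x20) == 0x3332000020002232);
    assert(matches_as_zero_bytes(0x0001020304050607, 0x20) == 0x2021222324252627);
    assert(matches_as_zero_bytes(0x0010203040506070, 0x20) == 0x2030001060704050);
}
```

Calling `check_xor_examples()` from a `main()` runs the checks. The unsigned constant matters here: multiplying by a signed `0x0101010101010101` could overflow for large values of `d`.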

After that, we need to translate the `0x00` bytes to `0x80` and all other values to `0x00`. We can subtract the value from a mask made of repeated `0x80` bytes and then do a bitwise AND with the same mask to isolate the high bit of each byte:

    (0x8080808080808080 - 0x3332000020002232) & 0x8080808080808080 -> 0x0000808000800000
    (0x8080808080808080 - 0x2021222324252627) & 0x8080808080808080 -> 0x0000000000000000
    (0x8080808080808080 - 0x2030001060704050) & 0x8080808080808080 -> 0x0000800000000000

It would seem we have solved our problem… but that’s not true. The solution works for all these examples only because the subtracted bytes are smaller than `0x80`. If we try an example with higher-valued bytes,

    (0x8080808080808080 - 0x20300010607040aa) & 0x8080808080808080 -> 0x0000800000000080

we get false positives, and also borrows that can affect the subtractions in neighboring bytes. The problem of the higher-valued bytes can be solved by clearing the high bit of every input byte with an AND against a `0x7f` mask, but that leaves us unable to distinguish between `0x80` and `0x00` bytes.
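Both failure modes can be reproduced with a small sketch. The function names `subtract_only`, `subtract_masked` and `check_failure_modes` are hypothetical and exist only for this illustration; the arguments are values that have already been XORed with the mask, as in the example above.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t mask_0x7f = 0x7f7f7f7f7f7f7f7f;
constexpr uint64_t mask_0x80 = 0x8080808080808080;

// First attempt: subtract directly and keep only the high bit of each byte.
constexpr uint64_t subtract_only(uint64_t x)
{
    return (mask_0x80 - x) & mask_0x80;
}

// Second attempt: clear the high bit of every byte before subtracting,
// so no byte subtraction can borrow from its neighbor.
constexpr uint64_t subtract_masked(uint64_t x)
{
    return (mask_0x80 - (x & mask_0x7f)) & mask_0x80;
}

void check_failure_modes()
{
    // The 0xaa in the lowest byte is reported as a match (false positive).
    assert(subtract_only(0x20300010607040aa) == 0x0000800000000080);
    // After masking, only the real 0x00 byte is reported...
    assert(subtract_masked(0x20300010607040aa) == 0x0000800000000000);
    // ...but a 0x80 byte is now indistinguishable from a 0x00 byte.
    assert(subtract_masked(0x0000000000000080) == subtract_masked(0x0000000000000000));
}
```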

It would seem we are just moving the problem from one place to another without solving it. But, in fact, we can easily discard the `0x80` bytes by doing an AND with the bitwise NOT of the XOR result, which used `0x00` to indicate the matches: a `0x80` byte has its high bit set, so the corresponding high bit of the negated value is clear.

The resulting function would be something like the following:

    template <uint8_t d, typename int_type>
    int_type get_matches_bitwise(int_type b)
    {
        // based on
        // https://graphics.stanford.edu/~seander/bithacks.html#HasLessInWord
        constexpr int_type mask_0x01 = (int_type)~0 / (int_type)0xff; // 0x01 in every byte
        constexpr int_type mask_0x7f = (int_type)0x7f * mask_0x01;
        constexpr int_type mask_0x80 = (int_type)0x80 * mask_0x01;
        constexpr int_type mask_d = d * mask_0x01;
        // bytes equal to d become 0x00
        int_type matches_as_0x00 = mask_d ^ b;
        // high bit set where the low 7 bits are zero AND the high bit is zero
        int_type matches_as_0x80 = (mask_0x80 - (matches_as_0x00 & mask_0x7f)) &
                                   ~matches_as_0x00 & mask_0x80;
        return matches_as_0x80;
    }
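As a quick sanity check, the same steps can be restated for 64-bit words with `d` passed as a runtime argument and run against the examples from the problem definition. The names `get_matches_64` and `check_examples` are assumptions made for this sketch, not part of the function above.

```cpp
#include <cassert>
#include <cstdint>

// 64-bit restatement of the steps above, for illustration only.
inline uint64_t get_matches_64(uint64_t b, uint8_t d)
{
    const uint64_t mask_0x01 = UINT64_C(0x0101010101010101);
    const uint64_t mask_0x7f = 0x7f * mask_0x01;
    const uint64_t mask_0x80 = 0x80 * mask_0x01;
    const uint64_t x = b ^ (d * mask_0x01);                 // matches become 0x00
    const uint64_t low_zero = mask_0x80 - (x & mask_0x7f);  // high bit set if low 7 bits are 0
    return low_zero & ~x & mask_0x80;                       // ...and the high bit of x is 0
}

void check_examples()
{
    // The three examples from the problem definition (d = 0x20).
    assert(get_matches_64(0x1312202000200212, 0x20) == 0x0000808000800000);
    assert(get_matches_64(0x0001020304050607, 0x20) == 0x0000000000000000);
    assert(get_matches_64(0x0010203040506070, 0x20) == 0x0000800000000000);
    // Low byte 0x8a XORs to 0xaa, which fooled the subtraction-only version.
    assert(get_matches_64(0x001020304050608a, 0x20) == 0x0000800000000000);
    // Low byte 0xa0 XORs to 0x80, which fooled the 0x7f-masked version.
    assert(get_matches_64(0x00000000000000a0, 0x20) == 0x0000000000000000);
}
```

The last two assertions cover exactly the inputs that broke the two earlier attempts.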

### Testing the bitwise implementation

We have seen that complex bitwise expressions can have some hidden mistakes. How can we make sure this expression doesn’t have one?

For 16-bit integers it’s quick to verify the function by simply trying every possible input value, though it’s a bit tricky to also cover every value of the template parameter `d`:

    #include <cassert>
    #include <climits>
    #include <cstddef>
    #include <cstdint>

    // Reference implementation: checks the bytes of b one at a time.
    template <uint8_t d, typename int_type>
    int_type get_matches_naive(int_type b)
    {
        int_type ret = 0;
        for (std::size_t i = 0; i < sizeof(int_type); i++)
            if (((b >> i * CHAR_BIT) & 0xff) == d)
                ret |= (int_type)0x80 << i * CHAR_BIT;
        return ret;
    }

    template <uint8_t d, typename int_type>
    int_type get_matches_bitwise(int_type b)
    {
        // based on
        // https://graphics.stanford.edu/~seander/bithacks.html#HasLessInWord
        constexpr int_type mask_0x01 = (int_type)~0 / (int_type)0xff;
        constexpr int_type mask_0x7f = (int_type)0x7f * mask_0x01;
        constexpr int_type mask_0x80 = (int_type)0x80 * mask_0x01;
        constexpr int_type mask_d = d * mask_0x01;
        int_type matches_as_0x00 = mask_d ^ b;
        int_type matches_as_0x80 = (mask_0x80 - (matches_as_0x00 & mask_0x7f)) &
                                   ~matches_as_0x00 & mask_0x80;
        return matches_as_0x80;
    }

    // Compares both implementations for every possible value of b.
    // Note: assert() is compiled out when NDEBUG is defined.
    template <uint8_t d, typename int_type>
    void test()
    {
        int_type b = 0;
        do
        {
            assert((get_matches_naive<d, int_type>(b) ==
                    get_matches_bitwise<d, int_type>(b)));
        } while (--b != 0); // b wraps around from 0 to the maximum value
    }

    // Runs test() for d, d - 1, ..., 0 through compile-time recursion, since
    // d is a template parameter and cannot be iterated over in a normal loop.
    template <uint8_t d, typename int_type>
    struct aux
    {
        static void do_test()
        {
            test<d, int_type>();
            aux<d - 1, int_type>::do_test();
        }
    };

    // Base case that ends the recursion.
    template <typename int_type>
    struct aux<0, int_type>
    {
        static void do_test()
        {
            test<0, int_type>();
        }
    };

    int main()
    {
        aux<0xff, uint16_t>::do_test();
        // TAKES TOO LONG FOR IDEONE
        //aux<0xff, uint32_t>::do_test();
        return 0;
    }

Testing 32-bit integers takes a bit more time, but it’s still feasible to go over all of them in about an hour. With 64-bit values, however, it would take *many* years to go over all the possible inputs.
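To put a rough number on "many years", here is a back-of-the-envelope sketch. It assumes that the 32-bit run takes exactly one hour and that the cost per input stays the same for 64-bit words; both are simplifications, and the helper name `years_for_64_bit` is made up for this estimate.

```cpp
#include <cassert>
#include <cmath>

// There are 2^64 / 2^32 = 2^32 times as many 64-bit inputs as 32-bit ones,
// so the exhaustive test is scaled by that factor and converted to years.
inline double years_for_64_bit(double hours_for_32_bit)
{
    const double scale = std::ldexp(1.0, 32);     // 2^32
    const double hours_per_year = 24.0 * 365.25;  // 8766 hours
    return hours_for_32_bit * scale / hours_per_year;
}

void check_estimate()
{
    // One hour for 32 bits scales to roughly 490,000 years for 64 bits.
    const double years = years_for_64_bit(1.0);
    assert(years > 489000.0 && years < 491000.0);
}
```

Under those assumptions the exhaustive 64-bit test would need on the order of half a million years.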

In the next post we will see how we can use Z3 to do that verification in less than a second.