Compact string tables in C

Introduction

The simplest way to store a string table in C is to follow this “pattern”:

/* in a header file */
typedef enum
{
  str_first,
  str_second,
  /* add more string ids here */
} str_id_t;

/* in a source file */
static const char str_table[][STR_MAX_LEN + 1] = {
  "This is the first message.",
  "This is the second message.",
  /* add more strings here */
};

This method is undeniably simple and makes string access fast, since getting the string pointer requires only a multiplication and an addition. But the disadvantages are substantial:

  • the ids for each string are not positioned near the strings, making synchronization errors more likely;
  • the same storage space is used for all strings, leading to massive memory waste in most cases.
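To put a number on the second point, here is a minimal sketch that compares the bytes reserved by the fixed-width table with the bytes the strings really need. The value 63 for STR_MAX_LEN is an assumption made for illustration (the post does not give one), and table_bytes and used_bytes are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STR_MAX_LEN 63 /* assumed limit, not taken from the post */

static const char str_table[][STR_MAX_LEN + 1] = {
  "This is the first message.",
  "This is the second message.",
};

#define STR_COUNT (sizeof(str_table) / sizeof(str_table[0]))

/* Bytes reserved by the table, whatever the strings contain. */
static size_t table_bytes(void)
{
  return sizeof(str_table);
}

/* Bytes the strings actually need, terminators included. */
static size_t used_bytes(void)
{
  size_t i, total = 0;
  for (i = 0; i < STR_COUNT; i++) {
    /* A literal of exactly STR_MAX_LEN + 1 characters would still
       compile in C but leave its row without a terminator. */
    const char *end = memchr(str_table[i], '\0', sizeof(str_table[i]));
    assert(end != NULL);
    total += (size_t)(end - str_table[i]) + 1;
  }
  return total;
}
```

With these two strings and the assumed limit, the table reserves 128 bytes while the strings need 55, and the gap grows with every short string added.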

The rest of this post explores an alternative method that solves these problems at the expense of requiring an initialization step and making string accesses a bit slower.

The method

See the update at the end of this post.

The main idea is to exploit the fact that a macro can be undefined and redefined to make a single macro expression take different values. In that way, an expression such as

MSG( str_first, "This is the first message." )

can be evaluated first as defining an enumeration value, and later as defining a string constant. This allows us to avoid the first problem, making the synchronization between the string ids and their associated string constants much easier.
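The redefinition trick can be tried in a single file before splitting it across a header and a source file. The sketch below is only an illustration: STR_LIST is a hypothetical list macro standing in for the repeated inclusion of a header, all_strings is an assumed name, and the static_assert needs a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list macro: applies X to every (id, string) pair. */
#define STR_LIST(X) \
  X(str_first,  "This is the first message.") \
  X(str_second, "This is the second message.")

/* First expansion: each entry becomes an enumerator. */
#define MSG(id, str) id,
typedef enum { STR_LIST(MSG) str_id_last } str_id_t;
#undef MSG

/* Second expansion: each entry becomes a string literal; adjacent
   literals are concatenated by the compiler into one array. */
#define MSG(id, str) str
static const char all_strings[] = STR_LIST(MSG);
#undef MSG

/* Compile-time check (C11) that both entries produced an id. */
static_assert(str_id_last == 2, "expected two string ids");
```

Because the id and the text sit on the same line of the list, adding or removing a message cannot leave the enumeration and the strings out of step.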

To solve the second problem we can concatenate all the strings and make the accesses via an offset table. We can fill the sizes of each element using the sizeof operator over the string constants, but computing the offsets will require a runtime initialization step to do the necessary additions.

Putting together these two ideas we get:

Header file – str_table.h
#include <stdlib.h>

#if !defined(STR_TABLE_NORMAL) && !defined(STR_TABLE_STR_CONSTS) &&\
    !defined(STR_TABLE_STR_OFFSETS)
#define STR_TABLE_NORMAL
#endif

#if defined(STR_TABLE_NORMAL)

/* defines ids */
#define MSG(id,str ) id,
typedef enum
{

#elif defined(STR_TABLE_STR_CONSTS)

/* defines string constants */
#define MSG(id, str) str
static const char _table[] =

#elif defined(STR_TABLE_STR_OFFSETS)

/* defines string offsets */
#define MSG(id, str) sizeof(str) - 1,
static size_t _offsets[] = {
  0,

#endif

MSG(str_first, "This is the first message.")
MSG(str_second, "This is the second message.")

#if defined(STR_TABLE_NORMAL)

  str_id_last
} str_id_t;

void str_table_init(void);
void str_table_get(char* buffer, size_t buffer_size, str_id_t str_id);

#elif defined(STR_TABLE_STR_CONSTS)

;

#elif defined(STR_TABLE_STR_OFFSETS)

};

#endif

#undef MSG

Source file – str_table.c
#define STR_TABLE_NORMAL
#include "str_table.h"
#undef STR_TABLE_NORMAL

#define STR_TABLE_STR_CONSTS
#include "str_table.h"
#undef STR_TABLE_STR_CONSTS

#define STR_TABLE_STR_OFFSETS
#include "str_table.h"
#undef STR_TABLE_STR_OFFSETS

#include <string.h>

void str_table_init(void)
{
  size_t i;
  for (i = 1; i < sizeof(_offsets) / sizeof(_offsets[0]); i++)
    _offsets[i] += _offsets[i-1];
}

void str_table_get(char* buffer, size_t buffer_size, str_id_t str_id)
{
  if (_offsets[str_id+1] - _offsets[str_id] >= buffer_size)
    return;
  memcpy(buffer, &_table[_offsets[str_id]],
         _offsets[str_id+1] - _offsets[str_id]);
  buffer[_offsets[str_id+1] - _offsets[str_id]] = '\0';
}

Testing

We cannot run a multi-file C program online (AFAIK), but we can simulate the inclusions manually and run the result at Codepad, where it gives this output:

This is the first message.
This is the second message.
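For readers who want to repeat the experiment, here is one possible single-file version. It is a sketch, not the code that was run at Codepad: the hypothetical STR_LIST macro replaces the three inclusions of str_table.h, the arrays are named table and offsets to stay clear of reserved leading-underscore names, and str_table_get returns a status code instead of void so that a too-small buffer can be detected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the MSG lines of str_table.h. */
#define STR_LIST(X) \
  X(str_first,  "This is the first message.") \
  X(str_second, "This is the second message.")

/* Pass 1: string ids. */
#define MSG(id, str) id,
typedef enum { STR_LIST(MSG) str_id_last } str_id_t;
#undef MSG

/* Pass 2: all strings concatenated into one array. */
#define MSG(id, str) str
static const char table[] = STR_LIST(MSG);
#undef MSG

/* Pass 3: a leading 0 followed by each length (no terminator). */
#define MSG(id, str) sizeof(str) - 1,
static size_t offsets[] = { 0, STR_LIST(MSG) };
#undef MSG

/* Turns the lengths into running offsets; call exactly once. */
void str_table_init(void)
{
  size_t i;
  for (i = 1; i < sizeof(offsets) / sizeof(offsets[0]); i++)
    offsets[i] += offsets[i - 1];
}

/* Copies string str_id into buffer and terminates it.
   Returns 0 on success, -1 if the buffer is too small. */
int str_table_get(char *buffer, size_t buffer_size, str_id_t str_id)
{
  size_t len;
  assert((size_t)str_id < (size_t)str_id_last);
  len = offsets[str_id + 1] - offsets[str_id];
  if (len >= buffer_size)
    return -1;
  memcpy(buffer, &table[offsets[str_id]], len);
  buffer[len] = '\0';
  return 0;
}
```

Calling str_table_init() once, then str_table_get() for str_first and str_second with a 64-byte buffer and printing each result with puts, should reproduce the two lines shown above.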

Edit 2/14 21:30

Reading this comment by Arseny Kapoulkine I realized that my previous solution is badly overengineered and wasteful. 😀 In fact, for most purposes, we can avoid using enums at all and just use this well-known solution (though it’s not really a string table…):

Header file – str_table.h
#ifdef STR_TABLE_C
#define MSG( id, str ) char id[sizeof(str)] = str;
#else
#define MSG( id, str ) extern char id[sizeof(str)];
#endif

MSG(str_first, "This is the first message.")
MSG(str_second, "This is the second message.")

Source file – str_table.c
#define STR_TABLE_C
#include "str_table.h"

But if we want to be able to iterate through the string table, Arseny’s functional solution is a good choice (though it’s hard to follow for people like me who don’t know functional programming very well :-D).
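As a rough illustration of what an iterable table can look like, here is a sketch that keeps the ids but stores one pointer per string instead of an offset table. It is not Arseny’s code (the linked Codepad paste is not reproduced here); STR_LIST, str_ptrs, str_table_text and str_table_longest are all names assumed for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list macro: applies X to every (id, string) pair. */
#define STR_LIST(X) \
  X(str_first,  "This is the first message.") \
  X(str_second, "This is the second message.")

#define MSG(id, str) id,
typedef enum { STR_LIST(MSG) str_id_last } str_id_t;
#undef MSG

/* One pointer per string, in the same order as the ids. */
#define MSG(id, str) str,
static const char *const str_ptrs[] = { STR_LIST(MSG) };
#undef MSG

/* Returns the text for id, or NULL when id is out of range. */
const char *str_table_text(int id)
{
  if (id < 0 || id >= str_id_last)
    return NULL;
  return str_ptrs[id];
}

/* Walks the whole table and returns the longest string's length. */
size_t str_table_longest(void)
{
  int id;
  size_t longest = 0;
  /* Sanity check: ids and pointers come from the same list. */
  assert(sizeof(str_ptrs) / sizeof(str_ptrs[0]) == (size_t)str_id_last);
  for (id = 0; id < str_id_last; id++) {
    size_t len = strlen(str_ptrs[id]);
    if (len > longest)
      longest = len;
  }
  return longest;
}
```

This variant needs no initialization step and no copy into a caller buffer, at the cost of one pointer of storage per string.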

2 thoughts on “Compact string tables in C”

  1. Your solution is over engineered (no need to lump all strings in contiguous memory chunk and to build offset table) and restrictive (one header for each enum). Here’s how it should look:

    http://codepad.org/IkYzrbiu