GPGPU with WebGL: solving Laplace’s equation

This is the first post in what will hopefully be a series of posts exploring how to use WebGL to do GPGPU (General-purpose computing on graphics processing units). In this installment we will solve a partial differential equation using WebGL, the Laplace’s equation more specifically.

Discretizing the Laplace’s equation

The Laplace’s equation, \nabla^2 \phi = 0, is one of the most ubiquitous partial differential equations in physics. It appears in lot of areas, including electrostatics, heat conduction and fluid flow.

To get a numerical solution of a differential equation, the first step is to replace the continuous domain by a lattice and the differential operators with their discrete versions. In our case, we just have to replace the Laplacian by its discrete version:

\displaystyle \nabla^2 \phi(x) = 0 \rightarrow \frac{1}{h^2}\left(\phi_{i-1\,j} + \phi_{i+1\,j} + \phi_{i\,j-1} + \phi_{i\,j+1} - 4\phi_{i\,j}\right) = 0,

where h is the grid size.

If we apply this equation at all internal points of the lattice (the external points must retain fixed values if we use Dirichlet boundary conditions) we get a big system of linear equations whose solution will give a numerical approximation to a solution of the Laplace’s equation. Of the various methods to solve big linear systems, the Jacobi relaxation method seems the best fit to shaders, because it applies the same expression at every lattice point and doesn’t have dependencies between computations. Applying this method to our linear system, we get the following expression for the iteration:

\displaystyle \phi_{i\,j}^{(k+1)} = \frac{1}{4}\left(\phi_{i-1\,j}^{(k)} + \phi_{i+1\,j}^{(k)} + \phi_{i\,j-1}^{(k)} + \phi_{i\,j+1}^{(k)}\right),

where k is a step index.

Solving the discretized problem using WebGL shaders

If we use a texture to represent the domain and a fragment shader to do the Jacobi relaxation steps, the shader will follow this general pseudocode:

  1. Check if this fragment is a boundary point. If it’s one, return the previous value of this point.
  2. Get the four nearest neighbors’ values.
  3. Return the average of their values.

To flesh out this pseudocode, we need to define a specific representation for the discretized domain. Taking into account that the currently available WebGL versions don’t support floating point textures, we can use 32 bits RGBA fragments and do the following mapping:

R: Higher byte of \phi.
G: Lower byte of \phi.
B: Unused.
A: 1 if it’s a boundary value, 0 otherwise.

Most of the code is straightforward, but doing the multiprecision arithmetic is tricky, as the quantities we are working with behave as floating point numbers in the shaders but are stored as integers. More specifically, the color numbers in the normal range, [0.0, 1.0], are multiplied by 255 and rounded to the nearest byte value when stored at the target texture.

My first idea was to start by reconstructing the floating point numbers for each input value, do the required operations with the floating numbers and convert the floating point numbers to color components that can be reliably stored (without losing precision). This gives us the following pseudocode for the iteration shader:

// wc is the color to the "west", ec is the color to the "east", ...
float w_val = wc.r + wc.g / 255.0;
float e_val = ec.r + ec.g / 255.0;
// ...
float val = (w_val + e_val + n_val + s_val) / 4.0;
float hi = val - mod(val, 1.0 / 255.0);
float lo = (val - hi) * 255.0;
fragmentColor = vec4(hi, lo, 0.0, 0.0);

The reason why we multiply by 255 in place of 256 is that we need val_lo to keep track of the part of val that will be lost when we store it as a color component. As each byte value of a discrete color component will be associated with a range of size 1/255 in its continuous counterpart, we need to use the “low byte” to store the position of the continuous component within that range.

Simplifying the code to avoid redundant operations, we get:

float val = (wc.r + ec.r + nc.r + sc.r) / 4.0 +
	(wc.g + ec.g + nc.g + sc.g) / (4.0 * 255.0);
float hi = val - mod(val, 1.0 / 255.0);
float lo = (val - hi) * 255.0;
fragmentColor = vec4(hi, lo, 0.0, 0.0);

The result of running the full code, implemented in GLSL, is:

Solving the Laplace's equation using a 32x32 grid. Click the picture to see the live solving process (if your browser supports WebGL).

As can be seen, it has quite low resolution but converges fast. But if we just crank up the number of points, the convergence gets slower:

Incompletely converged solution in a 512x512 grid. Click the picture to see a live version.

How can we reconcile these approaches?


The basic idea behind multigrid methods is to apply the relaxation method on a hierarchy of increasingly finer discretizations of the problem, using in each step the coarse solution obtained in the previous grid as the “starting guess”. In this mode, the long wavelength parts of the solution (those that converge slowly in the finer grids) are obtained in the first coarse iterations, and the last iterations just add the finer parts of the solution (those that converge relatively easily in the finer grids).

The implementation is quite straightforward, giving us fast convergence and high resolution at the same time:

Multigrid solution using grids from 8x8 to 512x512. Click the picture to see the live version.


It’s quite viable to use WebGL to do at least basic GPGPU tasks, though it is, in a certain sense, a step backward in time, as there is no CUDA, floating point textures or any feature that helps when working with non-graphic problems: you are on your own. But with the growing presence of WebGL support in modern browsers, it’s an interesting way of partially accessing the enormous computational power present in modern video cards from any JS application, without requiring the installation of a native application.

In the next posts we will explore other kinds of problem-solving where WebGL can provide a great performance boost.


Compact string tables in C


The simplest way to store a string table in C is to follow this “pattern”:

/* in a header file */
typedef enum
  /* add more string ids here */
} str_id_t;

/* in a source file */
static const char str_table[STR_MAX_LEN + 1][] = {
  "This is the first message.",
  "This is the second message.",
  /* add more strings here */

This method is unarguably very simple and makes accessing strings very time efficient, as getting the string pointer only requires a multiplication and an addition. But the disadvantages are substantial:

  • the ids for each string are not positioned near the strings, making synchronization errors more likely;
  • the same storage space is used for all strings, leading to massive memory waste in most cases.

The rest of this post will be dedicated to explore an alternative method that solves these problems at the expense of requiring an initialization step and making string accesses a bit slower.

The method

See this update.

The main idea is to exploit the fact that a macro can be undefined and redefined to make a single macro expression take different values. In that way, an expression such as

MSG( str_id_first, "This is the first message." )

can be evaluated first as defining an enumeration value, and later as defining a string constant. This allows us to avoid the first problem, making the synchronization between the string ids an their associated string constants much easier.

To solve the second problem we can concatenate all the strings and make the accesses via an offset table. We can fill the sizes of each element using the sizeof operator over the string constants, but computing the offsets will require a runtime initialization step to do the necessary additions.

Putting together these two ideas we get:

Header file – str_table.h
#include <stdlib.h>

#if !defined(STR_TABLE_NORMAL) && !defined(STR_TABLE_STR_CONSTS) &&\

#if defined(STR_TABLE_NORMAL)

/* defines ids */
#define MSG(id,str ) id,
typedef enum

#elif defined(STR_TABLE_STR_CONSTS)

/* defines string constants */
#define MSG(id, str) str
static const char _table[] =

#elif defined(STR_TABLE_STR_OFFSETS)

/* defines string offsets */
#define MSG(id, str) sizeof(str) - 1,
static size_t _offsets[] = {


MSG(str_first, "This is the first message.")
MSG(str_second, "This is the second message.")

#if defined(STR_TABLE_NORMAL)

} str_id_t;

void str_table_init(void);
void str_table_get(char* buffer, size_t buffer_size, str_id_t str_id);

#elif defined(STR_TABLE_STR_CONSTS)


#elif defined(STR_TABLE_STR_OFFSETS)



#undef MSG
Source file – str_table.c
#include "str_table.h"

#include "str_table.h"

#include "str_table.h"

#include <string.h>

void str_table_init(void)
  size_t i;
  for (i = 1; i < sizeof(_offsets) / sizeof(_offsets[0]); i++)
    _offsets[i] += _offsets[i-1];

void str_table_get(char* buffer, size_t buffer_size, str_id_t str_id)
  if (_offsets[str_id+1] - _offsets[str_id] >= buffer_size)
  memcpy(buffer, &_table[_offsets[str_id]],
         _offsets[str_id+1] - _offsets[str_id]);
  buffer[_offsets[str_id+1] - _offsets[str_id]] = '\0';

We cannot run a multiple file C program online (AFAIK), but we can simulate the inclusions manually and run it at Codepad, where it gives this output:

This is the first message.
This is the second message.

Edit 2/14 21:30

Reading this comment by Arseny Kapoulkine I realized that my previous solution is, in fact, badly overengineered and wasteful. 😀 In fact, for most purposes, we can avoid using enums at all and just use this well known solution (though it’s not really a string table…):

Header file – str_table.h
#ifdef STR_TABLE_C
#define MSG( id, str ) extern char id[sizeof(str)];
#define MSG( id, str ) char id[sizeof(str)] = str;

MSG(str_first, "This is the first message.")
MSG(str_second, "This is the second message.")
Source file – str_table.c
#define STR_TABLE_C
#include "str_table.h"

But if we want to be able to iterate through the string table, Arseny’s functional solution is a good solution (though it’s hard to follow for people like me that doesn’t know functional programming very well :-D).