# Quines [remix]

As the quines in the previous post were criticized as boring and ordinary :-P, I did a remix:

#include <stdio.h>
#define s char s[]
s="#if 0\nimport json;r=json.dumps\nprint'#include <stdio.h>\\n#define s char s[]\\ns=%s;\\n%s'%(r(s),s)\n\"\"\" \"\n#elif 1\n#undef s\nint main(void)\n{\n  char *t = s;\n  printf(\"#include <stdio.h>\\n#define s char s[]\\ns=\\\"\");\n  while (*t)\n  {\n    if (*t == '\\n')\n      printf(\"\\\\n\");\n    else if (*t == '\"')\n      printf(\"\\\\\\\"\");\n    else if (*t == '\\\\')\n      printf(\"\\\\\\\\\");\n    else\n      printf(\"%c\", *t);\n    t++;\n  }\n  printf(\"\\\";\\n%s\\n\", s);\n  return 0;\n}\n#elif 0\n\" \"\"\"\n#endif";
#if 0
import json;r=json.dumps
print'#include <stdio.h>\n#define s char s[]\ns=%s;\n%s'%(r(s),s)
""" "
#elif 1
#undef s
int main(void)
{
char *t = s;
printf("#include <stdio.h>\n#define s char s[]\ns=\"");
while (*t)
{
if (*t == '\n')
printf("\\n");
else if (*t == '"')
printf("\\\"");
else if (*t == '\\')
printf("\\\\");
else
printf("%c", *t);
t++;
}
printf("\";\n%s\n", s);
return 0;
}
#elif 0
" """
#endif


You can check that it works on the web or by downloading the file and testing it:

$python polyquine.c | diff polyquine.c -$ gcc -ansi -pedantic -Wall polyquine.c -o polyquine && ./polyquine | diff polyquine.c -


I’m aware that there are some impressive examples out there, but I haven’t analyzed them to avoid spoiling the fun.

# Quines

#### Introduction

One interesting programming exercise is to write a quine, a program that outputs its own source code (excluding empty programs!). If we naively try to write it just by using the print statement we will get into an infinite regression:

print 'print \' print \\\'print \\\\\\\'...


The basic problem is that the instructions required to write other instructions lead to an infinite recursion. So how can we avoid this problem?

#### The trick

One general way to do it is the one used by Von Neumann to make his abstract self-replicating machines and by nature itself. The trick is to handle the same instructions in two different ways: as instructions to be followed and as data to be copied.

#### Writing a C quine

As we will need to access the same data in two different ways, we must define a variable to avoid duplicating it:

char s[] = "<data will go here>";


It’s easy to write the code that prints everything before the data,

int main(void)
{
printf("char s[] = \"");


but now we need to print the data as represented inside the string. Fortunately, of the characters we use, only the newline, the quote and the backlash require escape sequences:

  char *t = s;
while (*t)
{
if (*t == '\n')
printf("\\n");
else if (*t == '"')
printf("\\\"");
else if (*t == '\\')
printf("\\\\");
else
printf("%c", *t);
t++;
}


Finally, we print the data as a normal string and exit main():

  printf("%s", s);
return 0;
}


This program is finished with exception of filling in the data in variable s. As this is quite tedious to do by hand, we will do it using a program:

char s[] = "\";\nint main(void)\n{\n  printf(\"char s[] = \\\"\");\n  char *t = s;\n  while (*t)\n  {\n    if (*t == '\\n')\n      printf(\"\\\\n\");\n    else if (*t == '\"')\n      printf(\"\\\\\\\"\");\n    else if (*t == '\\\\')\n      printf(\"\\\\\\\\\");\n    else\n      printf(\"%c\", *t);\n    t++;\n  }\n  printf(\"%s\", s);\n  return 0;\n}\n";


Putting it all together, we get a program that prints its own source:

char s[] = "\";\nint main(void)\n{\n  printf(\"char s[] = \\\"\");\n  char *t = s;\n  while (*t)\n  {\n    if (*t == '\\n')\n      printf(\"\\\\n\");\n    else if (*t == '\"')\n      printf(\"\\\\\\\"\");\n    else if (*t == '\\\\')\n      printf(\"\\\\\\\\\");\n    else\n      printf(\"%c\", *t);\n    t++;\n  }\n  printf(\"%s\", s);\n  return 0;\n}\n";
int main(void)
{
printf("char s[] = \"");
char *t = s;
while (*t)
{
if (*t == '\n')
printf("\\n");
else if (*t == '"')
printf("\\\"");
else if (*t == '\\')
printf("\\\\");
else
printf("%c", *t);
t++;
}
printf("%s", s);
return 0;
}


But, though it works in GCC, it’s not a strict C99 file:

cc1: warnings being treated as errors
prog.c: In function ‘main’:
prog.c:4: error: implicit declaration of function ‘printf’
prog.c:4: error: incompatible implicit declaration of built-in function ‘printf’


This can be fixed by including the stdio.h header:

#include <stdio.h>
char s[] = "\";\nint main(void)\n{\n  printf(\"#include <stdio.h>\\nchar s[] = \\\"\");\n  char *t = s;\n  while (*t)\n  {\n    if (*t == '\\n')\n      printf(\"\\\\n\");\n    else if (*t == '\"')\n      printf(\"\\\\\\\"\");\n    else if (*t == '\\\\')\n      printf(\"\\\\\\\\\");\n    else\n      printf(\"%c\", *t);\n    t++;\n  }\n  printf(\"%s\", s);\n  return 0;\n}\n";
int main(void)
{
printf("#include <stdio.h>\nchar s[] = \"");
char *t = s;
while (*t)
{
if (*t == '\n')
printf("\\n");
else if (*t == '"')
printf("\\\"");
else if (*t == '\\')
printf("\\\\");
else
printf("%c", *t);
t++;
}
printf("%s", s);
return 0;
}


#### Doing a Python quine

Doing a Python quine is much easier, because the built-in function repr() gives a string representation of an object, allowing us to skip most of the code of the C quine. Using it we can get this natural three line quine:

s = "print 's = %s' % repr(s)\nprint s"
print 's = %s' % repr(s)
print s


If we want to do code golfing, we can:

• Remove unnecesary spaces and line breaks.
• Printing the representation of the data and the data using only one print statement.

This give us a much shorter quine:

s="print's=%s;%s'%(repr(s),s)";print's=%s;%s'%(repr(s),s)


though still longer than the record one.

# Top k – Solution

[Answer to the problem in Top k.]

This was the problem:

Given a list of n numbers, get the top k ones in O(n) time. (Partial credit for o(n log n) solutions.)

##### n log k solution

The easiest way to solve this problem in o(n log n) time is to scan the n elements, keeping track of the greatest k elements seen up to the current element. Using a heap to store the top k elements, we get the following algorithm:

#define SWAP(type, a, b) {type swap_temp = a; a = b; b = swap_temp;}

static void minheap_pushdown(int *h, size_t hs, size_t i)
{
size_t j = 0;
if (2 * i + 2 < hs)
j = (h[2 * i + 1] < h[2 * i + 2]) ? 2 * i + 1 : 2 * i + 2;
else if (2 * i + 1 < hs)
j = 2 * i + 1;
if (j != 0 && h[j] < h[i])
{
SWAP(int, h[i], h[j]);
minheap_pushdown(h, hs, j);
}
}

static void minheap_raise(int *h, size_t i)
{
if (i == 0)
return;
if (h[i] < h[(i - 1) / 2])
{
SWAP(int, h[i], h[(i - 1) / 2]);
minheap_raise(h, (i - 1) / 2);
}
}

void top_k_nlogk_cp(int *top_k, size_t k, const int *l, size_t n)
{
size_t i;

for (i = 0; i < k; i++)
{
top_k[i] = l[i];
minheap_raise(top_k, i);
}

for (i = k; i < n; i++)
{
if (l[i] > top_k[0])
{
top_k[0] = l[i];
minheap_pushdown(top_k, k, 0);
}
}
}


As minheap_pushdown() and minheap_raise() have O(log k) cost, the total cost of this solution will be O(n log k). This is better than O(n log n) if k grows slower than any power of n (if k is constant or O(log n), for example).

The previous solution is enough to get partial credit, but we can do better 😀

##### Linear time solution

Quicksort is based on a linear time partition operation. This operation starts with an array and an element of this array, called pivot, and ends with a partially sorted array, having the pivot in his sorted position, smaller elements before it and bigger ones after it. Then doing a single partition operation would suffice… if we could guarantee that the chosen pivot is the (nk+1)-th element of the sorted array 😀

We cannot do that, but we can do something that is asymptotically equivalent: doing a quicksort-style recursion, though only in the partition that contains the desired pivot. The result is this (converting the tail recursion to iteration):

int *top_k_n_ip(int *l, size_t n, size_t k)
{
size_t lo, hi, pos, i, j;
int pivot;

lo = 0;
hi = n;
pos = n - k;

while (hi - lo > 1)
{
i = lo + 1;
j = hi - 1;
pivot = l[lo];

while (1)
{
while (i < hi && l[i] <= pivot)
i++;
while (j > lo && l[j] > pivot)
j--;
if (i > j)
break;
SWAP(int, l[i], l[j]);
}

SWAP(int, l[lo], l[j]);

if (j < pos)
lo = j + 1;
else if (j > pos)
hi = j;
else
break;
}

return l + n - k;
}


Now we need to check that this gives us a linear-time algorithm. The execution time of the body of the while (1) loop is bounded from above by C times hilo, where C is a constant. If we assume that the chosen pivot divides the l [ lo .. hi ] array section in two roughly equal parts (this assumption can be made rigorous by choosing the pivot carefully), we get

$T(n) \leq A + (B + C \cdot n) + (B + C \cdot n/2) + \hdots + (B + C \cdot 1)$

where A, B and C are constants.

Summing all the terms containing B and taking out all the C factors we get:

$T(n) \leq A + \lceil\log_2 n\rceil B + C (1 + \hdots + n/2 + n)$

$T(n) \leq A + \lceil\log_2 n\rceil B + 3 C n$

$T(n) = O(n)$

As we know that any correct algorithm must employ $\Omega(n)$ time (it needs to inspect all elements), we also know that this algorithm has $\Theta(n)$ time complexity. Now let’s check that against reality 😀

##### Benchmarks

We will benchmark the following algorithm implementations:

Execution times for all the algorithms when k is fixed at 30.

As k has a fixed value in the previous plot, the O(n log k) algorithm is effectively an O(n) algorithm. In fact, it performs better than the truly linear algorithms, probably due to cache effects.

Another thing that can be seen in these plots is the difficulty of seeing the difference between n and n log n in a log-log plot, but that is something to be expected.

Execution times for all the algorithms when k is 30% of n.

Here the qualitative behavior of the O(n log k) solution is the expected one. The C implementation of the O(n) algorithm looks surprisingly good, with a performance competitive to the C++ implementation included as part of the standard library.

# Top k

Given a list of n numbers, get the top k ones in O(n) time.

This problem is too easy to solve in O(n log n) time

def top_k_n_log_n(l, k):
sl = sorted(l)
return sl[-k:]


but o(n log n) solutions can be presented for partial credit. 😛

# Searching for bits in C

Some years ago, a friend asked me how to (efficiently) get the position of some bit with value 1 inside an (unsigned) integer. For example, taking 32 bit integers and taking the least significant bit (LSB) as the bit 0, these are valid answers:

0xa9e7da24 --> 5
0x1d56b8b0 --> 28
0x9459ffbb --> 0
0x9f0c2a38 --> 24


If we reduce the problem to a more specific one, finding the lowest bit with value 1 inside an integer, it’s easy to get a solution just by doing a bitwise AND against a moving mask:

size_t bitscan(uint32_t a)
{
size_t i;
for (i = 0; i < 32 && !(a & mask); i++)
return i;
}


but it’s relatively slow. Can we do it faster?

The answer is clearly yes if we are using x86 assembly, as this instruction set has a specific instruction for this purpose: Bit Scan Forward (BSF). But, even in C, we can do better by a three step process:

• First we isolate the least significant bit that is set by using: a & (-a).
• Then we take the residue of dividing this power of two by 37.
• Finally, we look up the associated bit position, using the residue obtained in the previous step as an index.

Why the first step works? We can start with the zero: doing a bitwise AND against -0 gives us just zero. As the zero doesn’t have any 1 bits to start with, we can say we’ve “isolated” the least significant one, though in a somewhat vacuous sense.

To check the nonzero integers, let’s define the bit expansion of an unsigned integer a:

$\displaystyle a = \sum_{i=0}^N a_i\,2^i$

Then the result of doing –a (remember we are dealing with unsigned integers) will be:

$\displaystyle -a = \sum_{i=0}^N (1-a_i)\,2^i + 1$

If the number is nonzero, there must a least significant nonzero bit. Let’s call its position M. Writing a & (-a) now:

$\displaystyle a\,\&\, (-a) = \left(\sum_{i=0}^N a_i\,2^i\right)\&\left(\sum_{i=0}^N (1-a_i)\,2^i + 1\right)$

$\displaystyle = \left(\sum_{i=M+1}^N a_i\,2^i + 1\cdot2^M + \sum_{i=0}^{M-1}0\cdot2^i\right)\&$

$\displaystyle \left(\sum_{i=M+1}^N (1-a_i)\,2^i + (1-1)\cdot2^M + \sum_{i=0}^{M-1}(1-0)\cdot2^i + 1\right)$ (definition of M)

$\displaystyle = \left(\sum_{i=M+1}^N a_i\,2^i + 2^M\right)\&\left(\sum_{i=M+1}^N (1-a_i)\,2^i + \sum_{i=0}^{M-1}2^i + 1\right)$

$\displaystyle = \left(\sum_{i=M+1}^N a_i\,2^i + 2^M\right)\&\left(\sum_{i=M+1}^N (1-a_i)\,2^i + 2^M\right)$ (geometric series sum)

$\displaystyle = \sum_{i=M+1}^N a_i(1-a_i)\,2^i + 2^M$ (AND bit by bit)

$\displaystyle = \sum_{i=M+1}^N 0\cdot 2^i + 2^M = 2^M$

we get the isolated Mth bit, as desired.

def get_modulo(n):
m = 1
while True:
if len(set(2**i % m for i in range(n))) == n:
return m
m += 1

if __name__ == '__main__':
print(get_modulo(32))


Now we just have to put the bit numbers in their associated positions inside the lookup table. We can calculate the table using the following Python code:

def get_lut(max_exp, modulo):
lut = [-1] * modulo
for i in range(max_exp + 1):
assert lut[2 ** i % modulo] == -1
lut[2 ** i % modulo] = i
return lut

def print_table(table):
print('{%s}' % ', '.join('%d' % t for t in table))

if __name__ == '__main__':
print_table(get_lut(32, 37))


Then, putting these two pieces together, we get the following C function:

int lut_bit_pos(uint32_t a)
{
const int bit_pos_lut[37] =
{
-1, 0, 1, 26, 2, 23, 27, 32, 3, 16,
24, 30, 28, 11, -1, 13, 4, 7, 17,
-1, 25, 22, 31, 15, 29, 10, 12, 6,
-1, 21, 14, 9, 5, 20, 8, 19, 18
};
return bit_pos_lut[(a & (-a)) % 37];
}


As we are working with 32 bit integers, we can do a full test… though not using Ideone (it exceeds the runtime limits).

##### Edited 5/30/11 – Bit sets

The original problem was to handle a set of task slots, each slot having two possible states: occupied or empty. This set required an efficient support of the following operations:

• Checking if a slot is occupied.
• Occupying/freeing a slot.
• Getting a free slot if one is available.

As this problem doesn’t ask us for the slot number, we can use an opaque type and internally codify the slot as its associated power of two:

typedef uint32_t set_t;
typedef int32_t set_aux_t;
typedef uint32_t slot_t;

#define NUM_SLOTS 32

#define EMPTY_SET ((set_t)-1)
#define FULL_SET ((set_t)0)

#define IS_VALID_SLOT(slot) ((slot)&&!((slot)&((slot)-1)))
#define SLOT_FROM_INDEX(slot_idx) (1<<(slot_idx))

#define GET_EMPTY_SLOT(set) ((set)&(-(set_aux_t)(set)))

#define SET_SLOT(set, slot) ((void)((set)&=~(slot)))
#define CLEAR_SLOT(set, slot) ((void)((set)|=(slot)))

#define IS_SLOT_SET(set, slot) (!((set)&(slot)))


We can check the implementation using Ideone.

# Intermezzo: permutations

As I haven’t progressed very much with my WebGL apps due to a variety of reasons/excuses (lack of willpower, a crash course on nuclear reactors engineering since the Fukushima nuclear crisis started, etc.), I decided to make a quick post showing a solution to a simple problem. In other words, the main purpose of this post is just to avoid leaving a month without posts 😀

##### The problem

Given an arithmetic expression without parentheses, insert them to get a specified result. For example, if the expression is 2+2/2 and the desired result is 2, there is a single solution: (2+2)/2. On the other hand, if the desired result is 0 there is no solution.

It’s easy to see the answers in simple cases like the last one. But it’s not so easy to see if we can get a result of 18 by adding parentheses to 18-6/2+4-1*2… so let’s write code to do the hard work for us.

##### A solution

The parentheses only change the order in which the different operators apply and the effects of any operation order can be obtained with some parenthesization. So, if we search for a permutation of the order in which the operators are applied that gives us the desired result, we are getting a solution for the original problem.

This new problem is not exactly the same as the original one by two reasons that can be exemplified as follows:

• Given the expression 2+2*2+2, the orders of operation 1-3-2 and 3-1-2 are associated to the same parenthesized expression: (2+2)*(2+2).
• The parenthesizations (2+2)+2, 2+(2+2) and 2+2+2 are equivalent due to the associativity of addition.

But, as this only gives us some redundant solutions that can be easily eliminated with some post-processing, it’s not a problem from the correctness point of view. Being able to remove this redundant solutions without needing to enumerate them would be a great thing performance wise, but it doesn’t seem easy and it’s not required for small expressions (10! < 107).

###### Permutations in lexicographic order

To avoid accidentally skipping some permutations, it’s useful to follow a specific order when enumerating them. In this section we will see how we can go through all the permutations of an array [0, 1, 2, …, n – 1] in lexicographic order.

Given some permutation of the array, that we will call a, the next one will probably share some prefix and will differ in the order of the last elements. As we cannot get a lexicographically greater permutation by permuting a suffix that is in reverse order (starting from the greater elements), we need to permute a suffix that is not in reverse order. Then we will make our first step to search for this suffix:

Find the largest index k such that a[k] < a[k + 1].

The specification of the largest index is equivalent to search for the minimal suffix to permute, something required to get a permutation that is lexicographically immediate. If this suffix cannot be found, we are at the last permutation.

Now we need to decide what to permute. It’s clear that we cannot get a greater permutation just by permuting the suffix a[k+1 .. n-1], as they are in reverse order. So we need to replace a[k] by some element and the obvious choice is to search for the immediately greater element in the suffix or, as the suffix a[k+1 .. n-1] is in reverse order:

Find the largest index l such that a[k] < a[l].

The lexicographically smallest permutation starting with the original a[l] can be obtained by appending all the other elements of the original suffix in ascending order. But, as the suffix is in descending order and l is the largest index such that a[k] < a[l], we can get the same result in an easier way:

Swap a[k] with a[l].
Reverse the sequence from a[k + 1] up to and including the final element a[n].

The final code in JS can be seen here.

###### Evaluating an arithmetic expression with different orders of operations

I started using an array whose members were alternately numbers and operators. The main problem is that, while it’s easy to go from an operator to the adjacent numbers in an array, this is not what we want to do. The desired behavior is to go from an operator to the operands, and they can be the result of a sequence of previous operations.

To solve this problem in a moderately efficient way, I adapted the parent node concept from the disjoint-set data structure. Each operation adds the result to the operator node and puts this node as the parent node of both operands, allowing efficient access to the value of the whole operand from any node that is included in it. The code that does this operations can be seen at the SVN repository of this blog.

##### Testing the solution

The whole solution can be tested at the same SVN repository, but let’s see some specific examples:

Input expression: 3+2*2-1*0.5
Input result 1: 3
Output: (3+2*2-1)*0.5
Input result 2: 6
Output: 3+2*(2-1*0.5)

Input expression: 18-6/2+4-1*2
Input result: 18
Output 1: ((18-6)/2+4-1)*2
Output 2: 18-(6/(2+4)-1)*2

# GPGPU with WebGL: solving Laplace’s equation

This is the first post in what will hopefully be a series of posts exploring how to use WebGL to do GPGPU (General-purpose computing on graphics processing units). In this installment we will solve a partial differential equation using WebGL, the Laplace’s equation more specifically.

##### Discretizing the Laplace’s equation

The Laplace’s equation, $\nabla^2 \phi = 0$, is one of the most ubiquitous partial differential equations in physics. It appears in lot of areas, including electrostatics, heat conduction and fluid flow.

To get a numerical solution of a differential equation, the first step is to replace the continuous domain by a lattice and the differential operators with their discrete versions. In our case, we just have to replace the Laplacian by its discrete version:

$\displaystyle \nabla^2 \phi(x) = 0 \rightarrow \frac{1}{h^2}\left(\phi_{i-1\,j} + \phi_{i+1\,j} + \phi_{i\,j-1} + \phi_{i\,j+1} - 4\phi_{i\,j}\right) = 0$,

where $h$ is the grid size.

If we apply this equation at all internal points of the lattice (the external points must retain fixed values if we use Dirichlet boundary conditions) we get a big system of linear equations whose solution will give a numerical approximation to a solution of the Laplace’s equation. Of the various methods to solve big linear systems, the Jacobi relaxation method seems the best fit to shaders, because it applies the same expression at every lattice point and doesn’t have dependencies between computations. Applying this method to our linear system, we get the following expression for the iteration:

$\displaystyle \phi_{i\,j}^{(k+1)} = \frac{1}{4}\left(\phi_{i-1\,j}^{(k)} + \phi_{i+1\,j}^{(k)} + \phi_{i\,j-1}^{(k)} + \phi_{i\,j+1}^{(k)}\right)$,

where $k$ is a step index.

##### Solving the discretized problem using WebGL shaders

If we use a texture to represent the domain and a fragment shader to do the Jacobi relaxation steps, the shader will follow this general pseudocode:

1. Check if this fragment is a boundary point. If it’s one, return the previous value of this point.
2. Get the four nearest neighbors’ values.
3. Return the average of their values.

To flesh out this pseudocode, we need to define a specific representation for the discretized domain. Taking into account that the currently available WebGL versions don’t support floating point textures, we can use 32 bits RGBA fragments and do the following mapping:

R: Higher byte of $\phi$.
G: Lower byte of $\phi$.
B: Unused.
A: 1 if it’s a boundary value, 0 otherwise.

Most of the code is straightforward, but doing the multiprecision arithmetic is tricky, as the quantities we are working with behave as floating point numbers in the shaders but are stored as integers. More specifically, the color numbers in the normal range, [0.0, 1.0], are multiplied by 255 and rounded to the nearest byte value when stored at the target texture.

My first idea was to start by reconstructing the floating point numbers for each input value, do the required operations with the floating numbers and convert the floating point numbers to color components that can be reliably stored (without losing precision). This gives us the following pseudocode for the iteration shader:

// wc is the color to the "west", ec is the color to the "east", ...
float w_val = wc.r + wc.g / 255.0;
float e_val = ec.r + ec.g / 255.0;
// ...
float val = (w_val + e_val + n_val + s_val) / 4.0;
float hi = val - mod(val, 1.0 / 255.0);
float lo = (val - hi) * 255.0;
fragmentColor = vec4(hi, lo, 0.0, 0.0);


The reason why we multiply by 255 in place of 256 is that we need val_lo to keep track of the part of val that will be lost when we store it as a color component. As each byte value of a discrete color component will be associated with a range of size 1/255 in its continuous counterpart, we need to use the “low byte” to store the position of the continuous component within that range.

Simplifying the code to avoid redundant operations, we get:

float val = (wc.r + ec.r + nc.r + sc.r) / 4.0 +
(wc.g + ec.g + nc.g + sc.g) / (4.0 * 255.0);
float hi = val - mod(val, 1.0 / 255.0);
float lo = (val - hi) * 255.0;
fragmentColor = vec4(hi, lo, 0.0, 0.0);


The result of running the full code, implemented in GLSL, is:

Solving the Laplace's equation using a 32x32 grid. Click the picture to see the live solving process (if your browser supports WebGL).

As can be seen, it has quite low resolution but converges fast. But if we just crank up the number of points, the convergence gets slower:

Incompletely converged solution in a 512x512 grid. Click the picture to see a live version.

How can we reconcile these approaches?

##### Multigrid

The basic idea behind multigrid methods is to apply the relaxation method on a hierarchy of increasingly finer discretizations of the problem, using in each step the coarse solution obtained in the previous grid as the “starting guess”. In this mode, the long wavelength parts of the solution (those that converge slowly in the finer grids) are obtained in the first coarse iterations, and the last iterations just add the finer parts of the solution (those that converge relatively easily in the finer grids).

The implementation is quite straightforward, giving us fast convergence and high resolution at the same time:

Multigrid solution using grids from 8x8 to 512x512. Click the picture to see the live version.

##### Conclusions

It’s quite viable to use WebGL to do at least basic GPGPU tasks, though it is, in a certain sense, a step backward in time, as there is no CUDA, floating point textures or any feature that helps when working with non-graphic problems: you are on your own. But with the growing presence of WebGL support in modern browsers, it’s an interesting way of partially accessing the enormous computational power present in modern video cards from any JS application, without requiring the installation of a native application.

In the next posts we will explore other kinds of problem-solving where WebGL can provide a great performance boost.