Ian Forster-Lewis PrologPF homepage

Between 1995 and 1998 I researched parallel logic and functional programming at the Computer Laboratory of Cambridge University, supervised by William Clocksin. My research extended the prior work led by William on a particular parellelization technique suitable for Prolog programs.

My target environment for parallel computation was a multi-processor machine where the communication time between processing elements was many many multiples of the instruction cycle time (e.g. 1,000,000 times). The best example of a system of this type is the internet. It always surprises me that the bulk of the parallel processing research that is going on assumes some specialised machine with incredible communications fabric between the nodes, while the internet represents the most powerful computing machine ever thought of, if only we could harness the combined processing power.

There is other research and practical work going on to harness the power of the many machines connected to the internet, but this almost exclusively involves problems and programs which can be trivially divided into independent partial pieces of work to be assigned to each processor. Examples include scanning digitized radio signals for a pattern, or trial-and-error attempts to decrypt encrypted text. This isn't really parallel programming - it's simply sending the same program with different data to multiple machines. The research done at Cambridge tackles the complex problem of partitioning and distributing work where the original program was not written for this trivial approach.

The chart below shows the real achieved speedup for a particular program (placing irregular shapes called pentominoes on a rectangular grid). On the horizontal axis, G is the number of processors harnessed (1 to 30 general purpose Unix workstations in the Cambridge Computer Lab) while the vertical axis shows the factor by which the program was speeded up. With 30 processors used, the maximum speedup is times 17. While this might not seem anywhere near optimal, note that the partitioning of the work across the processors was automatic with the original program not being written to specially accomodate parallel processing.

My work built upon prior research guided by William Clocksin, with the previous best achieved result for the same problem illustrated by the lower line on the chart. The incremental contribution of my research was to

  1. Improve the work allocation technique to reduce the communication overhead. The technique is a derivative of the previous technique (Breadth-First Partitioning, BFP), called Splitting with Oracles and Kappa (SOK). SOK allows the work to be assigned more evenly, and more work to be issued in each assignment. If you want to know more you'll have to read the dissertation.
  2. In BFP, a parameter controlling the work allocation has to be chosen manually (i.e. the depth into the execution tree at which the breadth-first calculation is capped). The results shown in the chart for BFP are for a carefully selected optimal depth particular to the specific problem. The SOK algorithm has no such manually selected parameter, so partitioning of the work can be said to be truly automatic.
  3. The modified Abstract Machine needed to support the special instructions used in the earlier techniques was replaced by equivalent and much simplified source-level macros, enabling the research to be performed using any Prolog environment.
  4. A functional high-level programming language similar to typeless ML was implemented in pure Prolog compatible with the parallelization technique used, such that the technique could be extended to functional as well as logic programming.

There are groups using the internet as a parallel computer, e.g. The Search for Extra Terrestrial Intelligence, and some hacker groups. This effort generally calls the target environment a Beowulf computer, and as far as I can tell generally involves trivially partitionable problems, i.e. the distribution of the calculation is hard-coded into the application.

The research done at Cambridge tackles the much harder problem of automatically partitioning a program to run on multiple machines. We got some good results with an interesting technique using things called oracles and a partitioning algorithm called 'splitting with oracles and kappa'. The essential method is to treat the program as a great big bushy tree, and to chop that tree up it up and recursively distribute it for execution.

My dissertation is titled PrologPF, parallel logic and functions on the Delphi machine (1.4MB), the link is to the PDF, and the dissertation is also available as postscript (3.2Mb).

Further background is available (in chronological order):

The dissertation could equally be titled 'How to get a program to run really fast on the internet'. This work was part of the ongoing effort of the Delphi project. Not every program is amenable to this method of speedup (in fact, very few of them are), but the target problem set is a very interesting one. The technique requires the problem have an underlying very large and very bushy execution tree - AI problems often fit this category. The goal driving my research is to turn the internet into a great AI machine - SkyNet here we come.

Oracles

The idea is each path into the execution tree of a Prolog program is represented as a sequence of integers. E.g. for a binary tree if left=0 and right=1, then starting at the root a path into the tree going left-left-right-left-right would be [0,0,1,0,1] (this leads to intermediate node B in the diagram). A path can lead to failure, an intermediate node in the tree, or success. The basic idea of PrologPF execution is to take an initial pass through the tree within a limited depth, collect the intermediate paths (oracles) found, and distribute these out to multiple processors to continue the search from these paths. A computer which interprets these oracles and continues the execution while accumulating further oracles is called a Delphi Machine. The PrologPF research showed that oracle creation, allocation, and execution can be interleaved in some execution strategies such that actual oracles (the numbered paths in the tree) need not be accumulated or communicated.

Here's a simple diagram of the staged search method of the execution tree:

Taking the process one phase at a time, firstly the search proceeds with an automatic fail introduced at a set depth limit:

Then the routes taken to the open search points are recorded. These are the oracles:

Simplistically, these oracles are distributed to the path processors to continue the execution from those points. The same effect can be achieved by the path processors duplicating the initial search phase:

Functions

PrologPF extends standard Prolog to allow the use of functions. The implementation of functions in PrologPF is unrelated to the parallelization technique, and stands alone as an independent piece of work.

The use of functions for deterministic execution is important in PrologPF because the parallelization technique is incompatible with the use of cut (i.e. !).

The syntax used for functions in PrologPF is similar to ML, is particularly neat, supports higher-order functions (i.e. functions themselves as arguments), and was tested with the complete list of programming exercises from the Cambridge University undergraduate ML programming course.

Here is an example of a factorial function declaration in PrologPF:

fun fact(1) = 1;
  fact(N) = N * fact(N - 1).

The alternate declarative equalities for the function definition above matches the style used in ML. PrologPF also has a special implementation of an if function, which evaluates the 'then' part or the 'else' part depending on the condition (i.e. the normal behaviour you would expect). Because PrologPF is capable of executing many parts of a program in parallel, the special 'if' function is important. Otherwise the 'condition', the 'then' and the 'else' parts would all rush off for parallel execution with more complex results. Here is the factorial function written to use the special if funtion:

fun fact(N) = if (N = 1)
      then 1
      else N * fact(N - 1).

The provision of functions in PrologPF is explained in detail in Chapter 5 of the dissertation, with detailed examples given in that chapter and Chapter 6.

SkyNet

SkyNet is the communications system that collects active workstations in the network and allocates the work.