Typed worklist
1 Overview
2 Intended use
3 Theoretical interest
4 The worklist macro

Typed worklist

Georges Dupéron <[email protected]>

 (require typed-worklist) package: typed-worklist

1 Overview

When traversing a graph, one might want to keep track of the already-processed nodes, and the ones which still have to be processed. The same goes when implementing algorithms for which work on an element may queue more elements.

If there is only one type of element, then the worklist function’s type is simple:

( (In Out) ( (Listof In)
               ( In (Values Out (Listof In)))
               (Listof Out)))

This package provides a worklist function (with a lightweight macro wrapper) which can handle n worklists, where each worklist holds tasks of different types. Furthermore, the worker for each element type may enqueue elements into arbitrary worklists.

This macro should be useful when traversing heterogeneous graphs, or when implementing algorithms which use several waiting queues (worklists) holding elements of different types.

2 Intended use

This package is mainly intended to be used by the phc-graph packageThe development of phc-graph is currently stalled, as I’m investigating solutions based on turnstile..

The goal of phc-graph is to implement generalised folds (catamorphisms) over heterogeneous graphs. Heterogeneous graphs can be conceptualised as trees where nodes can have different types (e.g. nodes of type A have children of type B, which have children of type C, which have children of type A, and each node type can hold different metadata), generalised to allow backward arcs (so that the trees become true graphs). This sort of transformation can for example occur when transforming an abstract syntax tree within a pass of a compiler.

A simple technique for traversing and processing such a data structure is to have a worklist, which remembers the already-visited nodes, as well as a queue of nodes that still need to be visited. A polymorphic example-worklist1 function which can be used to handle two sets of processed/to-process nodes from a homogeneous graph would have the following signature:


(example-worklist1 roots proc)  (Listof Out)

  roots : (Listof In)
  proc : ( In (Values Out (Listof In)))
Signature for a hypothetical function which takes an initial list of tasks of type In, and a processing function which consumes one task, returning a result of type Out, and a list of new tasks.

The hypothetical function would return the results obtained after processing each task, making sure that each task is processed only once (i.e. if a task is returned multiple times by processing functions, it is only added once to the queue).

However, when the tasks to handle have heterogeneous types In₁ Inₙ, the type of the worklist function is not simple to express. Using Typed Racket’s variadic polymorphic types, also sometimes mentioned as "polydots" (of the form ( (A B X ...) τ)) along with intersection types, it is possible to encode the type of worklist.

Typed Racket is not (yet?) able to reliably enforce the fact that two variadic polymorphic type variables must have the same length. The trick is to wrap the Inᵢ and Outᵢ types with structs I and O, which serve as type-level tags. Then, a single variadic type variable ranges over the union types (U (I Inᵢ) (O Outᵢ)) .... Finally, intersection types are used to project the I or O parts of the type variable.

The implementation of the worklist function is not trivial, and calling it requires some amount of boilerplate to correctly instantiate its type, wrap the inputs with I, and extract the results out of the O. The worklist macro takes care of this boilerplate.

3 Theoretical interest

It is worth noting that the core of the worklist implementation is a function. The macro does not generate the whole implementation, instead it merely generates a lightweight wrapper and takes care of the instantiation of the function’s polymorphic type. When worklists with a large number of task types are used, this function-based implementation can (hopefully) reduce the typechecking time, compared to a macro-based implementation which would generate a large chunk of code needing to be typechecked. Also, the guarantees on the correctness of the code are better, since the function-based implementation is typechecked once and for all (the macro-generated wrapper could still fail to typecheck, but it is a smaller and simpler piece of code).

Finally, it is a remarkable feat that Typed Racket is able to express the type of such a function (and is also able to typecheck its implementation!), as this problem would require at a first look some flavor of type-level computation, dependent types or similar advanced typing features. Comparatively, the mechanisms used here (variadic polymorphic variables and intersection types) are refreshingly simple.

4 The worklist macro


(worklist roots [procᵢ …ₙ] [Inᵢ Outᵢ] …ₙ)

  roots : (List (Listof Inᵢ) …ₙ)
  procᵢ : ( Inᵢ (List Outᵢ (Listof Inᵢ) …ₙ))
  Inᵢ : Type
  Outᵢ : Type
Executes the corresponding procᵢ on each element of each worklist. The procᵢ takes a value of type Inᵢ and returns an output of type Outᵢ, as well as n lists of new inputs which are added to the worklists.

The worklists are initialised with the given roots.

The whole expression has the following result type:

(List (HashTable Inᵢ Outᵢ) …ₙ)

Within a worklist, duplicate elements are only processed once.