The key features of NestStep are:
The language design of NestStep was developed 1998-2000 by
Christoph W. Kessler,
then at the University of Trier, Germany.
NestStep is well-suited for parallel applications that match the BSP
model, such as most parallel linear algebra computations, iterative
solvers for partial differential equations etc., see for instance
Bisseling's book for a presentation of BSP algorithms in scientific computing.
A superstep
consists of
In order to simplify cost predictions,
the BSP machine model is characterized by only four
parameters: the number p of processors,
the per-processor byte transfer rate g
the latency L (i.e. the minimum time
between subsequent synchronizations), and the processor speed, s.
Additionally, some properties of the program dramatically influence
the run time, in particular the communication behaviour.
By convention, the maximum total number of bytes in all outgoing or incoming messages
for a single processor at the end of a specific superstep is denoted by h.
A communication pattern that bounds the maximum communication indegree
and outdegree of any processor by h is called h-relation.
Hence, a superstep performing balanced local work w and requiring the
system to realize an h relation in the communication phase
has time complexity w+gh+L.
The BSP model, as originally defined, has no support for shared memory.
Also, in BSP there is no support for processor subset synchronization,
i.e. for nesting of supersteps. Thus, programs can only exploit
one-dimensional parallelism or must apply a
flattening-transformation
that converts nested parallelism to flat parallelism.
However, automatic flattening by the compiler
has only been achieved for SIMD parallelism (e.g. NESL).
More links on BSP:
NestStep introduces
static and dynamic nesting of supersteps, and thus directly
supports nested parallelism.
There are several variants of nesting of supersteps, marked by
parameter(s) of the
A first version of the NestStep-C to C front end compiler,
written by Magnus Holm during his final thesis (2010) project, is
now available, along with an extended run-time system for Cell BE
(nested supersteps are not supported yet).
Christoph W. Kessler:
NestStep: Nested Parallelism and Virtual Shared Memory for the BSP Model.
Proc. PDPTA'99 Int. Conf. on Parallel and Distributed Processing Techniques and Applications, Las Vegas, June 28 - July 1, 1999. Vol. II, pp. 613-619. CSREA Press.
Christoph W. Kessler:
NestStep: Nested Parallelism and Virtual Shared Memory for the BSP Model.
The Journal of Supercomputing, vol. 17(3), 2000.
Christoph W. Kessler:
Parallelism and Compilers.
Habilitation thesis, Universit\"at Trier, Germany, Dec. 2000.
Christoph W. Kessler:
Managing irregular remote accesses to distributed shared arrays in a bulk-synchronous parallel programming environment.
Proc. CPC'01 9th Int. Workshop on Compilers for Parallel Computers, Edinburgh (Scotland), pp. 195-204, June 2001.
C.W. Kessler
Managing Distributed Shared Arrays in a Bulk-Synchronous Parallel Environment.
Concurrency and Computation: Practice and Experience, vol. 16:133-153, 2004.
Joar Sohl:
A Scalable Run-Time System for NestStep on Cluster Supercomputers.
Master thesis, LITH-IDA-EX--06/011--SE, Dept. of Computer and Information Science,
Linköpings universitet, March 2006.
Håkan Mattsson and Christoph Kessler:
Towards a Bulk-Synchronous Distributed Shared Memory Programming Environment for Grids.
In: J. Dongarra, K. Madsen and J. Wasniewski, eds.,
Proc. PARA'04 Workshop on State-of-the-art
in Scientific Computing, Lyngby, Denmark, June 2004, Springer LNCS 3732,
pp. 519-526, 2006.
Christoph Kessler, Peter Fritzson, Mattias Eriksson:
NestStepModelica: Mathematical Modeling and Bulk-Synchronous Parallel Simulation.
PARA-06 Workshop on state-of-the-art in scientific and parallel computing,
Umeå, Sweden, June 18-21, 2006. The full version is published at
Christoph Kessler, Peter Fritzson, Mattias Eriksson:
NestStepModelica: Mathematical Modeling and Bulk-Synchronous Parallel Simulation.
Springer LNCS vol. 4699, 2007.
Daniel Johansson, Mattias Eriksson, Christoph Kessler:
Bulk-synchronous parallel computing on the CELL processor.
PARS'07: 21. PARS - Workshop, Hamburg, Germany, May 31-Jun 1, 2007.
GI/ITG-Fachgruppe Parallel-Algorithmen, -Rechnerstrukturen und -Systemsoftware (PARS).
Markus Ålind, Mattias Eriksson, Christoph Kessler:
The results can also be found in Markus Ålinds master thesis:
Introduction
NestStep is a partitioned global address space (PGAS) language
for the bulk-synchronous parallel (BSP) programming model.
NestStep is designed as an
explicitly parallel extension of existing imperative programming
languages such as C or Java.
Several instances of the same NestStep program
run on different processors
and communicate via an interconnection network
and the NestStep run time system, which
emulates a CRCW-BSP computer with a virtually shared memory
on top of a message-passing system.
The NestStep language extensions have been defined for Java
(NestStep-Java, 1998), for
C (NestStep-C, 2000) and for the imperative part of
Modelica (NestStepModelica, 2006).
Implementations of a NestStep run-time system have been done
in Java (1999) and C (2000, 2006).
See the
implementation and download information further below for technical
details of the implementations.
The current version of the NestStep-C run-time system is available for
download.
In contrast, NestStep is not
appropriate for all problems that inherently require sequential consistency
or bilateral mutual exclusion synchronization, because this conflicts
with the superstep consistency of NestStep.
In certain cases, such applications could be transformed into bulk-synchronous form
without introducing prohibitive overhead. An elaboration on this topic can
be found in the
PhD thesis by Arturo Gonzalez-Escribano (2003).
BSP model of parallel computation
The BSP (bulk-synchronous parallel) model,
as introduced 1990 by Valiant
and implemented e.g. by the
Oxford BSPlib library
and the
Paderborn University BSP library
for many parallel architectures,
structures a parallel computation of p processors
into a sequence of supersteps that are separated by
global barrier synchronization points.
(1) a phase of local computation of each processor, where only
local variables (and locally held copies of remote variables)
can be accessed, and
(2) a communication phase that sends some data to the processors that
may need them in the next superstep(s), and then
waits for incoming messages and
processes them.
The separating barrier after a superstep is not necessary
if it is implicitly covered by (2). For instance,
if the communication pattern in (2) is a complete exchange,
a processor can proceed to the next superstep as soon as it has received
a message from every other processor.
Nested parallelism by nesting of supersteps
In NestStep a superstep is marked by a step
keyword
preceding a statement.
Such step
statements denote supersteps that are executed by
entire groups of processors in a bulk-synchronous way.
neststep
statement.
SPMD execution in NestStep
The main()
function in a NestStep-C program,
or resp. the main
method of a class whose name has been passed as parameter to
the NestStep-Java driver program, is executed
by all available processors specified in the host file.
Their number will remain constant throughout program execution
(SPMD-model of parallel program execution).
These started processors form the so-called root group.
Groups
can be dynamically
subdivided during program execution into subgroups, following
the static nesting structure of the supersteps.
All processors within the same group execute the same step
statement.
Example programs 1 (NestStep-Java)
Example programs 2 (NestStep-C)
NestStep implementations
The run-time system of NestStep-Java
was implemented in Java in 1998 but soon abandoned because of
bad performance, in particular,
Java's (then) extremely slow object serialization and deserialization
needed for the communication between different JVMs.
In 2000, the run-time system of a
compiler for a NestStep-C was
implemented (in C) on top of MPI.
In 2005/2006, the NestStep-C run-time system was reimplemented on top of
the Tlib task library by Th. Rauber and G. Rünger, which provides
group descriptors and hierarchical group splitting used in the implementation
of our nested supersteps. On top of Tlib,
the NestStep Runtime System adds supersteps and the emulation of
virtually shared variables and arrays with programmable BSP consistency.
Together with the layers below it,
the NestStep Runtime System provides thus a virtual shared-memory BSP
machine; direct access from the NestStep-C level
to Tlib or MPI functionality is not intended
(albeit not actively prohibited).
NestStep-C Implementations
The NestStep-C implementation consists of the following components:
2 versions are available:
Cluster-NestStep-C using MPI for interprocessor communication, and
Cell-NestStep-C running on the Cell BE, a heterogeneous multicore processor.
Cluster-NestStep-C:
Based on an older version by C. Kessler (2000)
Cell-NestStep-C:
In 2007, another version of the NestStep-C run-time system was
implemented for the CELL processor:
Based on an earlier
version by
Daniel Johansson (2007).
Uses the
Cell-NestStep-C run-time system.
NestStep-Modelica
The NestStepModelica implementation combines an extension
of the Modelica language and compiler to support the NestStep constructs,
with the NestStep-C run-time system.
The language extensions, implemented in MetaModelica,
are limited to the imperative part of the Modelica language, namely
imperative computations of computationally heavy but side-effect free user functions
occuring on the right-hand side of the ordinary differential equation system generated
from a Modelica program. These computations are encapsulated in
Modelica's algorithm
construct.
The extended compiler generates C code that is linked with the NestStep-C run-time system.
Available from C. Kessler on request.
System requirements (tested on a Windows machine):
Java 2 SDK SE 1.6.x (or later), ANTLRv2, GCC.
BlockLib: A Skeleton Library for Cell Broadband Engine.
Proc. Int. Workshop on Multicore Software Engineering (IWMSE-2008) at ICSE-2008, Leipzig, Germany, May 2008. ACM.
Markus Ålind:
Skeleton library for Cell Broadband Engine.
Master thesis, IDA, Linköping Institute of Technology,
LIU-IDA/LITH-EX-A--08/002--SE, March 2008.
Related projects:
NestStep is a so-called PGAS (partitioned global address space)
parallel programming language. Other such languages include
UPC Universal Parallel C (and its predecessors Split-C, PCP and AC),
X11, Titanium, and Co-Array Fortran.
Some links:
Some major differences in NestStep
are: programmable update strategies,
built-in support of parallel reductions and multiprefix, and
the BSP superstep program structure with a simpler memory consistency model.
This page by Christoph W. Kessler (chrke \at ida.liu.se)