Research project assistant position available
I am recruiting 1-2 student assistants ("amanuens") at 20%-50% of full time for a large research
project in distributed, parallel and heterogeneous
computing for the edge/cloud continuum.
The work (C/C++ programming, system administration, help with experiments) could suitably be
combined with an internal master's thesis project in the same topic area.
More information and application link: Utlysning (in Swedish) / Announcement
(Deadline 6 August 2024. Application is only via the application system linked from the announcement, not via email.)
Open PhD student position in a new project on Parallel AI planning algorithms at Linköping University (to be co-supervised).
More information and application link: Utlysning (in Swedish) / Announcement
(Deadline 30 August 2024. Application is only via the application system linked from the announcement, not via email.)
Recent news and upcoming events:
Our research paper "Interactive Performance Visualization and Analysis of Execution Traces for Pattern-based Parallel Programming" has been accepted for the 17th International Symposium on High-Level Parallel Programming and Applications, July 2024.
The third generation of SkePU, a single-source high-level programming framework for heterogeneous parallel systems developed by our group as a long-term open-source effort since 2010, has been identified by EU Innovation Radar as a notable innovation outcome of the EU H2020 EXA2PRO project in April 2022.
Our research paper "Temperature-Aware Energy-Optimal Scheduling of Moldable Streaming Tasks onto 2D-Mesh-Based Many-Core CPUs with DVFS"
will be presented at the 24th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2021)
in conjunction with IPDPS 2021, Portland, Oregon, USA, 21 May 2021.
SkePU-3, our open-source framework for single-source high-level programming of heterogeneous parallel systems and clusters, has been available on GitHub since April 2020.
Connecting Education and Research Communities for an Innovative Resource Aware Society (CERCIRAS)
EU COST action, about resource-aware parallel computing in cyberphysical systems, 2021-2024
SkePU:
Auto-tunable skeleton programming framework for Multicore CPU and multi-GPU systems
A framework for single-source, high-level, structured, portable parallel programming of heterogeneous systems.
Open-source software project since 2010. Current version: SkePU 3.
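As a rough illustration of the skeleton idea (one user function, several interchangeable execution backends), here is a minimal Python sketch. It is not SkePU's C++ API: the class name `MapSkeleton`, its parameters and the two backends are hypothetical and serve only to show the programming style.

```python
from concurrent.futures import ThreadPoolExecutor


class MapSkeleton:
    """Element-wise map with a selectable backend (illustrative, not SkePU)."""

    def __init__(self, user_function, backend="sequential", workers=4):
        if backend not in ("sequential", "threads"):
            raise ValueError(f"unknown backend: {backend}")
        self.user_function = user_function
        self.backend = backend
        self.workers = workers

    def __call__(self, *operands):
        if self.backend == "threads":
            with ThreadPoolExecutor(max_workers=self.workers) as pool:
                # Executor.map preserves input order, like the sequential path.
                return list(pool.map(self.user_function, *operands))
        return list(map(self.user_function, *operands))


if __name__ == "__main__":
    add = MapSkeleton(lambda x, y: x + y, backend="threads")
    print(add([1, 2, 3], [10, 20, 30]))
```

The calling code is identical for both backends, which is the property that lets a skeleton framework retarget the same source to different hardware.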
Mapping and Scheduling Moldable-Parallel Streaming Tasks on Many-Core Processors
Energy optimization at the application level is an important concern in
effectively using today's multi- and many-core CPU architectures.
For task-based streaming applications such as video processing pipelines,
application-level resource (core and core type) allocation,
mapping, scheduling,
and voltage/frequency level selection are important techniques that
together control both time and energy usage of the application.
In this work we consider the combinatorial optimization problem of core allocation, mapping, and discrete
voltage/frequency scaling for programs consisting of
moldable (i.e., parallelizable) streaming tasks (also
known as actors) on multi- and many-core processors,
in order to minimize energy consumption under a throughput constraint.
Our fundamental approach, called Crown Scheduling, introduces an artificial
tree-like hierarchy of allocatable groups of cores, called the crown.
Restricting allocation to these groups dramatically reduces the size of the solution space,
making even optimal solutions feasible for small to medium-sized task graphs,
with only negligible impact on solution quality (i.e., overall energy use).
In more recent work, the approach has been generalized, for example
to cover heterogeneous multi-/many-core architectures and architectures with DVFS islands,
to improve DVFS flexibility by dynamic schedule adaptation,
and to eliminate tasking overheads by optimal task fusion.
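To make the crown structure more concrete, the following minimal sketch enumerates the allocatable core groups. It assumes a balanced binary hierarchy over a power-of-two number of cores. The function name `crown_groups` and the group numbering are hypothetical and chosen for illustration only; the actual Crown Scheduling implementation may differ.

```python
def crown_groups(num_cores):
    """Return the crown as a list of core-index lists, root group first.

    Assumption: num_cores is a power of two and every group is split
    into two equally sized child groups (a balanced binary crown).
    """
    if num_cores < 1 or num_cores & (num_cores - 1):
        raise ValueError("num_cores must be a power of two")
    groups = []
    size = num_cores
    while size >= 1:
        # One tree level: consecutive, disjoint groups of `size` cores.
        for start in range(0, num_cores, size):
            groups.append(list(range(start, start + size)))
        size //= 2
    return groups


if __name__ == "__main__":
    for index, group in enumerate(crown_groups(8), start=1):
        print(f"group {index}: cores {group}")
```

Under these assumptions a machine with p cores has 2p - 1 groups, compared with 2^p - 1 non-empty core subsets if tasks could be allocated to arbitrary sets of cores.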
Language and tool infrastructure for energy-aware application synthesis,
system modeling, performance and energy modeling,
optimization techniques and autotuning
for holistic energy optimization for heterogeneous multicore systems
Skeleton and Pattern Based Programming Environments
BlockLib:
Skeleton programming library for Cell/B.E.
PRT Pattern Recognition Tool:
Generic tool for automated recognition of computational patterns in legacy C programs,
e.g. for pattern-based automatic parallelization.
SeRC-OpCoReS: Optimized Composition and Runtime Support for e-Science, 2011-2018,
Swedish e-Science Research Center (SeRC),
core section on
Parallel and Distributed Algorithms and Tools (2011-2015) and
Parallel Software and Data Engineering (2016-2018).
Integrated Code Generation for Instruction-Level Parallel Architectures
OPTIMIST: Optimization algorithms for integrated code generation
OPTIMIST is a retargetable, highly optimizing code generator
for superscalar, VLIW, clustered VLIW, DSP and embedded processor architectures.
To achieve high code quality,
it simultaneously considers the optimization problems for
instruction selection (including cluster assignment and
resource allocation),
instruction scheduling,
and register allocation.
Partially funded 2001-2007 by CENIIT
and 2004-2005 by SSF RISE.
Integrated Software Pipelining
Optimal code generation for loops, integrating instruction selection,
cluster assignment,
scheduling and register allocation, including optimal spill code generation and scheduling,
for embedded, VLIW and clustered VLIW processors.
Funded 2006-2008 and 2010-2012
by Vetenskapsrådet (VR)
and 2006-2011 by the CUGS
graduate school.
REPLICA
project (contract research).
This VTT project developed a reconfigurable shared memory chip multiprocessor supporting strong memory consistency
(CRCW PRAM on a chip). We developed a
high-level parallel programming language, a compiler backend and system support
for the REPLICA architecture.
DSP Platform for Emerging Telecommunication and Multimedia (ePUMA)
Optimizing DSP streaming applications for memory access cost
on a new reconfigurable chip multiprocessor.
WP3: Classification of memory access patterns in DSP applications;
program analysis for memory access structures, and
automatic selection of most suitable network configuration for
parallel memory access.
Funded 2008-2011 by SSF
On-chip pipelining of memory-intensive computations
on multi-/manycore processors (Cell/B.E. and Intel SCC)
Restructuring memory-intensive, streamable computations such as parallel mergesort
to use on-chip forwarding of intermediate data between Cell SPEs
reduces the overall volume of off-chip memory accesses,
making the application less memory bound and resulting in faster computation.
We develop mapping algorithms that optimize trade-offs between computational load balance,
on-chip buffer requirements and on-chip communication volume in on-chip pipelining.
Applied to mergesort on Cell, this speeds up the dominating global
merge phase of CellSort by up to 70% on QS-20 and up to 143% on PlayStation-3,
see our paper at Euro-Par 2010.
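The forwarding idea can be sketched in a few lines of Python, with lazy generators standing in for merger tasks that pass elements directly to their parent instead of writing intermediate runs to memory. This is only an analogy under that assumption, not the CellSort implementation: the names `merge2` and `merge_tree` are hypothetical, and buffer sizing and the mapping of mergers to SPEs are not modeled.

```python
def merge2(left, right):
    """Lazily merge two sorted iterators, yielding one element at a time."""
    sentinel = object()
    a = next(left, sentinel)
    b = next(right, sentinel)
    while a is not sentinel and b is not sentinel:
        if a <= b:
            yield a
            a = next(left, sentinel)
        else:
            yield b
            b = next(right, sentinel)
    # Drain whichever input still has elements.
    while a is not sentinel:
        yield a
        a = next(left, sentinel)
    while b is not sentinel:
        yield b
        b = next(right, sentinel)


def merge_tree(sorted_runs):
    """Build a binary tree of streaming mergers over the sorted runs."""
    streams = [iter(run) for run in sorted_runs]
    if not streams:
        return iter(())
    while len(streams) > 1:
        next_level = []
        for i in range(0, len(streams) - 1, 2):
            next_level.append(merge2(streams[i], streams[i + 1]))
        if len(streams) % 2:
            # An unpaired stream moves up one level unchanged.
            next_level.append(streams[-1])
        streams = next_level
    return streams[0]


if __name__ == "__main__":
    print(list(merge_tree([[1, 4, 7], [2, 5], [0, 9], [3]])))
```

No intermediate merged run is ever materialized as a full list; each element flows from the leaves to the root on demand.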
Fork:
Fork95 Language Definition and Compiler
for the SB-PRAM,
a scalable, massively parallel shared memory MIMD computer
with uniform memory access time that works synchronously at the instruction level.
The complete project is described in a book.
The compiler and tools developed for the SB-PRAM have been used for research purposes
and in programming labs for
teaching parallel algorithms.
SPARAMAT
A tool for automatic detection of sparse matrix computations and data structures
in application programs by static and dynamic pattern matching techniques,
which can be used for automatic parallelization and aggressive program transformations.
(The successor of the former
PARAMAT
project at Saarbrücken.)
Funded 1997-2000 by Deutsche Forschungsgemeinschaft (DFG)
NestStep
Design and implementation of a MIMD parallel global address space (PGAS) language
based on the BSP (bulk-synchronous parallel) programming model,
supporting shared variables and nested parallelism
on top of message passing architectures.
NestStep provides deadlock-free, deterministic parallel execution with
BSP-compliant synchronicity and memory consistency.
NestStep has been implemented for MPI clusters and for the
heterogeneous multicore processor Cell/B.E.
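The BSP-style consistency described above can be illustrated with a small sequential simulation. It assumes that each process works on its own copy of a shared variable during a superstep and that the copies are combined deterministically at the superstep boundary. The function `run_superstep` and its parameters are hypothetical; this is not NestStep syntax, and no real message passing takes place.

```python
def run_superstep(local_copies, compute, combine):
    """Simulate one BSP superstep over per-process copies of a shared variable.

    local_copies: one value per simulated process.
    compute(rank, value): local computation; sees only the process's own copy.
    combine(values): deterministic reduction applied at the superstep boundary.
    """
    # Computation phase: each process updates only its own copy.
    updated = [compute(rank, value) for rank, value in enumerate(local_copies)]
    # Barrier and combine phase: all copies are made consistent again.
    combined = combine(updated)
    return [combined] * len(updated)


if __name__ == "__main__":
    copies = [0, 0, 0, 0]
    copies = run_superstep(copies, lambda rank, value: value + rank + 1, sum)
    print(copies)
```

Because updates become visible only at the superstep boundary and the combine function is deterministic, every process sees the same value afterwards regardless of execution order.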
Interactive Invasive Parallelization
User-guided composition of parallel software with an incremental
aspect-oriented parallelization approach.
Covers automatic parallelization,
skeleton-based structured parallel programming, and semiautomatic
program restructuring.
Support for automatic roundtrip engineering in aspect weaving.
Part of the RISE project
funded 2002-2005 and 2006-2007
by SSF.