Due to heavy workload and unexpected changes in course responsibilities,
this Multicore Computing course, originally announced for spring 2013,
was moved to early fall 2013. The course starts on Monday, 26 August 2013, at 13:15 in room Donald Knuth.
Recommended for
Graduate (CUGS, CIS, ...) students interested in the areas of
parallel computer architecture, parallel programming, software engineering,
optimization, compiler construction, or algorithms and complexity.
The course is hosted by the
CUGS graduate school
as an Advanced Course.
Undergraduate and master-level students are kindly referred
to the similar course
TDDD56 Multicore and GPU Programming (ht2) instead.
Registration (internal access only).
(If you have no IDA research account,
please contact the CUGS secretary, Anne Moe (annes \at ida.liu.se)
for registration.)
List of participants 2013 (internal access only).
Organization
Lectures (ca. 32h),
programming exercises, optional
theoretical exercises for self-assessment, programming lab assignment,
student presentations.
The course will be held as a block course with two intensive weeks
in Linköping.
The course was last given in 2011, and before that in 2009.
Goals
The course emphasizes fundamental aspects of shared-memory
and accelerator-based parallel programming, such as
shared memory parallel architecture concepts, programming models,
performance models,
parallel algorithmic paradigms, parallelization techniques and
strategies, scheduling algorithms, optimization, composition of
parallel programs, and
concepts of modern parallel programming languages and systems.
Practical exercises help to apply the theoretical concepts
of the course to solve concrete problems in a real-world
multicore system.
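As a flavor of the kind of shared-memory programming practiced in the exercises, here is a minimal sketch in C using POSIX threads. It is illustrative only, not an actual course exercise; the thread count, array length, and function names are chosen purely for the example.

    /* Minimal shared-memory sketch (illustrative only, not a course exercise):
       N worker threads compute partial sums of an array in parallel. */
    #include <pthread.h>
    #include <stdio.h>

    #define N   4           /* number of worker threads (example value) */
    #define LEN 1000000     /* array length (example value) */

    static double data[LEN];

    struct task {
        int begin, end;     /* index range assigned to one thread */
        double partial;     /* partial sum written by that thread */
    };

    static void *worker(void *arg)
    {
        struct task *t = arg;
        double s = 0.0;
        for (int i = t->begin; i < t->end; i++)
            s += data[i];
        t->partial = s;     /* each thread writes only its own slot: no data race */
        return NULL;
    }

    int main(void)
    {
        pthread_t thr[N];
        struct task tasks[N];
        int chunk = LEN / N;

        for (int i = 0; i < LEN; i++)
            data[i] = 1.0;

        for (int i = 0; i < N; i++) {
            tasks[i].begin = i * chunk;
            tasks[i].end   = (i == N - 1) ? LEN : (i + 1) * chunk;
            pthread_create(&thr[i], NULL, worker, &tasks[i]);
        }

        double sum = 0.0;
        for (int i = 0; i < N; i++) {
            pthread_join(thr[i], NULL);     /* wait for thread i, then combine */
            sum += tasks[i].partial;
        }
        printf("sum = %f\n", sum);
        return 0;
    }

Compile with, e.g., gcc -O2 -pthread; the lab assignments use the same basic toolchain (C on Linux).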
Prerequisites
Data structures and algorithms (e.g. TDDB57) are absolutely required;
knowledge of complexity theory and compiler construction is useful.
Some basic knowledge of computer architecture is assumed.
A basic course in concurrent programming
(e.g. TDDB68) is recommended.
Programming in C and some familiarity with Linux (or a similar OS)
are necessary for the practical exercises.
Contents/Schedule
Lecture block in 2 intensive weeks (weeks 35, 36) in Linköping:
Week 35: Monday 26/8 13-16; Tuesday and Wednesday 9-12 and 13:15-16; Thursday 29/8 9-12.
Week 36: Monday 2/9 13-16; Tuesday and Wednesday 9-12 and 13:15-16; Thursday 5/9 9-12.
Labs start in weeks 35 and 36 and continue in weeks 37 and 38.
(still under construction, but the dates are fixed)
The mapping of topics to time slots as given below is approximate; if necessary, we might dynamically shift contents back and forth as appropriate.
Lectures are given in the room
Donald Knuth.
Lab sessions are given in the room
Konrad Zuse (IDA Multicore Lab).
Day | Time | Room | Lecturer | Topic
26 aug 2013 | 13:15-13:30 | D. Knuth | C. Kessler | Organization and overview
26 aug | 13:30-16:00 | D. Knuth | C. Kessler | The Multicore Challenge: Introduction to multicore architecture concepts and trends. SIMD computing.
27 aug | 09:15-12:00 | D. Knuth | C. Kessler | Parallel programming with threads and tasks. Shared memory architecture concepts and performance issues.
27 aug | 13:30-15:00 | D. Knuth | C. Kessler | Review of parallel computing theory, Part I: PRAM model. Parallel time, cost, speedup. Amdahl's Law, Gustafson's Law, Brent's Theorem. Scalability. NC. Basic data-parallel algorithmic building blocks: parallel prefix sums, list ranking. Delay model. BSP model. LogP model. Algorithmic design pattern: parallel divide-and-conquer. (A short worked Amdahl's Law example follows after this table.)
27 aug | 15:30-17:00 | D. Knuth / K. Zuse | N. Melot | Lab introduction and CPU lab kick-off
28 aug | 09:15-12:00 | D. Knuth | C. Kessler | Parallel computing theory (cont.). Parallel sorting algorithms.
28 aug | 13:15-16:00 | D. Knuth | C. Kessler | Nonblocking synchronization
29 aug | 08:15-10:00 | D. Knuth | I. Ragnemalm | GPU architecture, CUDA
29 aug | 10:15-12:00 | K. Zuse (Multicore Lab) | N. Melot | CPU lab session
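As a quick illustration of Amdahl's Law from the theory lecture above (the numbers below are chosen purely for the example and are not taken from the course material): if a fraction s of a program's execution time is inherently sequential, the speedup on p cores is bounded by S(p) <= 1 / (s + (1-s)/p). With s = 0.1 (10% sequential) and p = 8 cores, for instance, S(8) <= 1 / (0.1 + 0.9/8) ≈ 4.7, and no number of cores can push the speedup beyond 1/s = 10.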
Day | Time | Room | Lecturer | Topic
2 sep | 13:15-15:00 | D. Knuth | N. Melot | Lesson (theory exercise solving session). Please prepare the theory exercises before the lesson.
2 sep | 15:15-16:00 | D. Knuth | C. Kessler | Algorithmic multithreading with Cilk
3 sep | 09:15-10:30 | D. Knuth | C. Kessler | Generic parallel programming with skeletons: concepts and generative realization for GPUs with SkePU. Mid-term evaluation.
3 sep | 10:45-12:00 | D. Knuth | W. Löwe | Bridging models and machines: BSP, LogP. Slides (2011)
3 sep | 13:15-14:30 | D. Knuth | W. Löwe | Simulations between models. Slides (2011)
3 sep | 14:45-17:00 | D. Knuth | W. Löwe | Static clustering and scheduling of task graphs. Scheduling malleable tasks. Slides (2011)
4 sep | 09:15-10:00 | D. Knuth | C. Kessler | Algorithmic design pattern: on-chip pipelining for Cell
4 sep | 10:15-12:00 | D. Knuth / K. Zuse | U. Dastgeer | OpenCL; GPU lesson and lab kick-off: GPU programming with CUDA and OpenCL (continue on your own)
4 sep | 13:15-16:00 | D. Knuth | C. Kessler | Parallelization of sequential programs. Run-time parallelization.
5 sep | 09:15-10:00 | D. Knuth | C. Kessler | Guidelines for paper presentations. Distribution of papers for student presentations. Paper assignment.
5 sep | 10:15-11:45 | D. Knuth | C. Kessler | Autotuning overview. Optimized composition of performance-aware parallel components. Performance portability for heterogeneous multicore systems: the PEPPHER approach.
5 sep | 11:45-12:00 | D. Knuth | C. Kessler | Wrap-up and evaluation
Day | Time | Room | Lecturer | Topic
9 sep | 10:00-12:00 | K. Zuse | N. Melot | CPU lab session
13 sep | 10:00-12:00 | K. Zuse | U. Dastgeer | GPU lab session
17 sep | 10:00-12:00 | K. Zuse | U. Dastgeer | GPU lab session
18 sep | 13:15-17:00 | D. Knuth | | Student presentation session
19 sep | 10:00-12:00 | K. Zuse | U. Dastgeer | GPU lab session
20 sep | 13:00-16:00 | D. Knuth | | Written exam
In 2013 we offer a single lab suite covering both CPU and GPU labs (3hp); both parts must be completed.
Supervision and examination, CPU labs: Nicolas Melot
Supervision and examination, GPU labs: Usman Dastgeer
Prerequisites for the labs: C programming, Linux, and a basic course in concurrent programming and operating systems (e.g. TDDB68).
Exercises
Some theory exercises are collected here.
Literature
See also the Literature list for TDDD56.
Background reading on the theory of parallel computation and on parallel algorithms:
Further literature references may be added later.
Teachers
Examination
TEN1: Written exam, 3hp.
UPG2:
Paper
presentation, opposition, and written summary
(1-2 pages, deadline 4/10/2013) of the presented paper, 1.5hp.
Credit
7.5hp if all examination moments are fulfilled.
Partial credits can be given, too, but the written exam must be passed.
If you have already taken TDDD56 Multicore and GPU Programming, which overlaps with most of this course, you only need to complete the presentation moment (UPG2) and can receive 1.5hp for the course.
Comments
There is a partial overlap in contents with the earlier course
FDA125 Advanced Parallel Programming (last given in 2007),
corresponding to 1p = 1.5hp in the written exam.
It overlaps to a major extent (ca. 5hp) with the master-level course
TDDD56 Multicore and GPU Programming
and to a minor extent (1.5hp) with the master-level course
TDDC78 Programming of Parallel Computers.
In contrast to FDA125 and TDDC78, this course focuses on shared-memory and multicore systems, including heterogeneous multicore and GPU architectures, and adds more material on software engineering techniques for parallel systems, such as design patterns for synchronization and optimized composition of parallel components. We also put more emphasis on parallelization techniques that are especially useful for multicore systems, such as SIMD programming, speculative parallelization, transactional programming, and lock-free synchronization. Scheduling for shared-memory systems is covered in more detail, in particular scheduling algorithms for malleable task graphs.