Author
Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska, Henry M. Levy
Abstract
'Threads' are the vehicle for concurrency in many approaches to parallel
programming. Threads separate the notion of a sequential execution stream
from the other aspects of traditional UNIX-like processes such as address
spaces and I/O descriptors. The goal is to make the expression and control
of parallelism sufficiently cheap that the "natural parallel decomposition"
of an application can be exploited by the programmer or compiler with acceptable
overhead, even in the case where this decomposition is relatively fine-grained.
Threads can be supported either by the operating system kernel or by user-level
library code in the application address space. Neither approach has been
fully satisfactory. The performance of kernel-level threads, although at
least an order of magnitude better than that of traditional processes, has
been at least an order of magnitude 'worse' than that of user-level threads.
User-level threads, on the other hand, have suffered from lack of system
integration, leading to poor performance or even incorrect behavior in the
presence of "real world" factors such as multiprogramming, I/O, and page
faults. Thus the parallel programmer has been faced with a difficult dilemma:
employ kernel-level threads, which "work right" but perform poorly, or employ
user-level threads, which typically perform well but sometimes are seriously
deficient.
This paper addresses this dilemma. First, we argue that the performance of
kernel-level threads is inherently worse than that of user-level threads, rather than
this being an artifact of existing implementations; we thus argue that managing
parallelism at the user level is essential to high-performance parallel computing.
Next, we argue that the lack of system integration exhibited by user-level threads
is a consequence of the lack of kernel support for user-level threads in contemporary
multiprocessor operating systems; we thus argue that kernel-level threads or
processes, as currently conceived, are the 'wrong abstraction' on which to
support user-level management of parallelism. Finally, we describe the design,
implementation, and performance of a new kernel interface and user-level thread
package that together provide the same functionality as kernel-level threads
without compromising the performance and flexibility advantages of managing
parallelism at the user level within each application's address space.