Why Threads?
- The primary motivation for using threads is to realize potential program performance gains.
- When compared to the cost of creating and managing a process, a thread can be created with much less OS overhead. 
- Managing threads requires fewer system resources than managing processes. 
- Threaded programming models offer significant advantages over message-passing programming models, along with some disadvantages.
- Software Portability:
- Threaded applications can be developed on serial machines and run on parallel machines without any changes. 
- This ability to migrate programs between diverse architectural platforms is a very significant advantage of threaded APIs. 
- Latency Hiding:
- One of the major overheads in programs (both serial and parallel) is the latency of memory access, I/O, and communication.
- By allowing multiple threads to execute on the same processor, threaded APIs enable this latency to be hidden. 
- In effect, while one thread is waiting for a communication operation, other threads can utilize the CPU, thus masking associated overhead.
- Scheduling and Load Balancing:
- While writing shared address space parallel programs, a programmer must express concurrency in a way that minimizes overheads of remote interaction and idling. 
- While in many structured applications the task of allocating equal work to processors is easily accomplished, in unstructured and dynamic applications (such as game playing and discrete optimization) this task is more difficult.
- Threaded APIs allow the programmer to specify a large number of concurrent tasks, and they support system-level dynamic mapping of tasks to processors with a view to minimizing idling overheads.
 
- Ease of Programming, Widespread Use:
- Because of the advantages mentioned above, threaded programs are significantly easier to write (!) than corresponding programs using message-passing APIs.
- With widespread acceptance of the POSIX thread API, development tools for POSIX threads are more widely available and stable. 
- Threaded applications offer potential performance gains and practical advantages over non-threaded applications in several other ways:
- Overlapping CPU work with I/O: For example, a program may have sections where it is performing a long I/O operation. While one thread is waiting for an I/O system call to complete, CPU intensive work can be performed by other threads.
- Priority/real-time scheduling: tasks which are more important can be scheduled to supersede or interrupt lower priority tasks.
- Asynchronous event handling: tasks which service events of indeterminate frequency and duration can be interleaved. For example, a web server can both transfer data from previous requests and manage the arrival of new requests. 
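The overlap of CPU work with I/O described above can be sketched with two Pthreads calls, pthread_create and pthread_join. This is a minimal sketch, not a definitive implementation: the function names slow_io, sum_of_squares and overlap_io_and_compute are hypothetical, and the "I/O" is only simulated with sleep() so that the example stays self-contained.

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical stand-in for a long blocking I/O call: the thread just
   sleeps, as it would while waiting inside read() or recv(). */
static void *slow_io(void *arg)
{
    int *bytes_read = arg;
    sleep(1);                 /* simulated wait for the device */
    *bytes_read = 4096;       /* pretend that a 4 KiB block arrived */
    return NULL;
}

/* CPU-bound work performed by the calling thread in the meantime. */
static long sum_of_squares(long n)
{
    long total = 0;
    for (long i = 1; i <= n; i++)
        total += i * i;
    return total;
}

/* Starts the I/O thread, computes while that thread waits, then joins it.
   Returns 0 on success and fills in both results. */
int overlap_io_and_compute(long n, long *compute_result, int *bytes_read)
{
    pthread_t io_thread;

    *bytes_read = 0;
    if (pthread_create(&io_thread, NULL, slow_io, bytes_read) != 0)
        return -1;
    *compute_result = sum_of_squares(n);   /* runs while slow_io sleeps */
    if (pthread_join(io_thread, NULL) != 0)
        return -1;
    return 0;
}
```

A caller would invoke overlap_io_and_compute once and read both results afterwards. The join is what makes it safe to read bytes_read: until the I/O thread has finished, only that thread touches the variable.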
- A number of vendors provide vendor-specific thread APIs. 
- Standardization efforts have resulted in two very different implementations of threads.
- Microsoft has its own implementation for threads, which is not related to the UNIX POSIX standard or OpenMP. 
- POSIX Threads.  Library based; requires parallel coding.
- The IEEE standard 1003.1c-1995 (later revision: 1003.1, 2004) specifies the POSIX threads API.
- C Language only. Very explicit parallelism; requires significant programmer attention to detail. 
- Commonly referred to as Pthreads. 
- POSIX has emerged as the standard threads API, supported by most vendors. 
- The concepts themselves are largely independent of the API and can be used for programming with other thread APIs (NT threads, Solaris threads, Java threads, etc.) as well. 
 
- OpenMP. Compiler directive based; can use serial code.
- Jointly defined by a group of major computer hardware and software vendors. 
- The OpenMP Fortran API was released October 28, 1997. 
- The OpenMP C/C++ API was released in late 1998.
- Portable / multi-platform, including Unix and Windows NT platforms
- Can be very easy and simple to use - provides for "incremental parallelism".
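As an illustration of what "can use serial code" and incremental parallelism mean for OpenMP, the sketch below takes an ordinary serial loop and adds a single directive. The function name sum_squares is hypothetical. When compiled with OpenMP support enabled (for example, GCC's -fopenmp option), the iterations are divided among threads; without it, the directive is ignored and the same source runs serially.

```c
/* Sums the squares 0^2 + 1^2 + ... + (n-1)^2.
   The loop body is plain serial C; the directive is the only addition. */
long long sum_squares(int n)
{
    long long sum = 0;

    /* Each thread accumulates a private partial sum; the reduction
       clause combines the partial sums when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += (long long)i * i;

    return sum;
}
```

Because the arithmetic is on integers, the result does not depend on how many threads run the loop or on the order in which the partial sums are combined.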
 
- MPI on-node communications versus threads on-node data transfer:
- MPI libraries usually implement on-node task communication via shared memory, which involves at least one memory copy operation (process to process).
- For Pthreads there is no intermediate memory copy required because threads share the same address space within a single process. 
- There is no data transfer as such.
- Access becomes a matter of cache-to-CPU or, in the worst case, memory-to-CPU bandwidth.
- These speeds are much higher than those of a process-to-process copy.
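The absence of a copy step can be seen in a small Pthreads sketch. The names data, fill_slice and fill_shared_array are hypothetical, and the array size and thread count are arbitrary choices for illustration. Each thread writes its results directly into one shared array, and the creating thread reads them after the join without receiving anything.

```c
#include <pthread.h>

#define NTHREADS 4
#define N 1000

static double data[N];        /* one array, visible to every thread */

struct slice { int begin, end; };

/* Each thread writes its results straight into the shared array. */
static void *fill_slice(void *arg)
{
    const struct slice *s = arg;
    for (int i = s->begin; i < s->end; i++)
        data[i] = 2.0 * i;
    return NULL;
}

/* Fills data[] using NTHREADS threads; returns 0 on success.
   Sketch only: threads already started are not cleaned up on failure. */
int fill_shared_array(void)
{
    pthread_t tid[NTHREADS];
    struct slice slices[NTHREADS];
    int chunk = N / NTHREADS;

    for (int t = 0; t < NTHREADS; t++) {
        slices[t].begin = t * chunk;
        slices[t].end = (t == NTHREADS - 1) ? N : (t + 1) * chunk;
        if (pthread_create(&tid[t], NULL, fill_slice, &slices[t]) != 0)
            return -1;
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);

    /* No receive step: the results are already in data[]. */
    return 0;
}
```

The slices do not overlap, so no locking is needed here. A message-passing version of the same program would have to send each slice back to the collecting process.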
- Programs having the following characteristics may be well suited for threads:
- Have work that can be executed, or data that can be operated on, by multiple tasks simultaneously.
- Block for potentially long I/O waits.
- Use many CPU cycles in some places but not others.
- Must respond to asynchronous events.
- Have some work that is more important than other work (priority interrupts).
Common models for thread programming:
- Manager/worker: a single thread, the manager, assigns work to other threads, the workers. Typically, the manager handles all input and distributes work to the other tasks. At least two forms of the manager/worker model are common:
- static worker pool,
- dynamic worker pool.
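A static worker pool might look roughly like the following sketch. The names run_manager, worker, tasks and results are hypothetical, and the pool size and task count are arbitrary. The manager prepares all input, starts a fixed number of workers, and collects the results; each worker claims the next unprocessed task under a mutex.

```c
#include <pthread.h>

#define NWORKERS 3
#define NTASKS 12

static int tasks[NTASKS];     /* inputs prepared by the manager */
static int results[NTASKS];   /* outputs written by the workers */
static int next_task;         /* index of the next unclaimed task */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Worker: repeatedly claim the next task until none are left. */
static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&queue_lock);
        int i = (next_task < NTASKS) ? next_task++ : -1;
        pthread_mutex_unlock(&queue_lock);
        if (i < 0)
            break;
        results[i] = tasks[i] * tasks[i];   /* the "work" */
    }
    return NULL;
}

/* Manager: prepare the input, start a fixed pool, wait, sum the results.
   Returns the sum, or -1 if a worker could not be created. */
int run_manager(void)
{
    pthread_t pool[NWORKERS];
    int total = 0;

    next_task = 0;
    for (int i = 0; i < NTASKS; i++)
        tasks[i] = i + 1;
    for (int w = 0; w < NWORKERS; w++)
        if (pthread_create(&pool[w], NULL, worker, NULL) != 0)
            return -1;
    for (int w = 0; w < NWORKERS; w++)
        pthread_join(pool[w], NULL);
    for (int i = 0; i < NTASKS; i++)
        total += results[i];
    return total;
}
```

Only the task counter is protected by the mutex, because each results[i] is written by exactly one worker. A dynamic pool would differ mainly in creating workers as tasks arrive instead of all at once.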
 
- Pipeline: a task is broken into a series of suboperations, each of which is handled in series, but concurrently, by a different thread. An automobile assembly line best describes this model.
- Peer: similar to the manager/worker model, but after the main thread creates other threads, it participates in the work. 
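The pipeline model described above can be sketched with three stages connected by one-slot channels. Everything here is an assumption made for illustration: the channel type, the stage functions generate and square, the DONE sentinel, and the choice to let the calling thread act as the final stage. Error handling is omitted to keep the sketch short.

```c
#include <pthread.h>

#define NITEMS 5
#define DONE (-1)             /* sentinel marking the end of the stream */

/* One-slot channel connecting two neighbouring stages. */
struct channel {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    int value;
    int full;
};

static void channel_init(struct channel *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->changed, NULL);
    c->full = 0;
}

static void channel_destroy(struct channel *c)
{
    pthread_cond_destroy(&c->changed);
    pthread_mutex_destroy(&c->lock);
}

static void channel_put(struct channel *c, int v)
{
    pthread_mutex_lock(&c->lock);
    while (c->full)                       /* wait until the slot is free */
        pthread_cond_wait(&c->changed, &c->lock);
    c->value = v;
    c->full = 1;
    pthread_cond_broadcast(&c->changed);
    pthread_mutex_unlock(&c->lock);
}

static int channel_get(struct channel *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->full)                      /* wait until an item arrives */
        pthread_cond_wait(&c->changed, &c->lock);
    int v = c->value;
    c->full = 0;
    pthread_cond_broadcast(&c->changed);
    pthread_mutex_unlock(&c->lock);
    return v;
}

struct stage { struct channel *in, *out; };

/* Stage 1: produce the items 1..NITEMS, then the sentinel. */
static void *generate(void *arg)
{
    struct channel *out = arg;
    for (int i = 1; i <= NITEMS; i++)
        channel_put(out, i);
    channel_put(out, DONE);
    return NULL;
}

/* Stage 2: square each item and pass it on. */
static void *square(void *arg)
{
    struct stage *s = arg;
    int v;
    while ((v = channel_get(s->in)) != DONE)
        channel_put(s->out, v * v);
    channel_put(s->out, DONE);
    return NULL;
}

/* Stage 3 runs in the calling thread: sum whatever stage 2 delivers. */
int run_pipeline(void)
{
    struct channel raw, squared;
    struct stage s2 = { &raw, &squared };
    pthread_t t1, t2;
    int v, total = 0;

    channel_init(&raw);
    channel_init(&squared);
    pthread_create(&t1, NULL, generate, &raw);
    pthread_create(&t2, NULL, square, &s2);
    while ((v = channel_get(&squared)) != DONE)
        total += v;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    channel_destroy(&raw);
    channel_destroy(&squared);
    return total;
}
```

While stage 2 squares one item, stage 1 can already be producing the next, which is the assembly-line behaviour the model describes. A one-slot channel keeps the sketch short; a real pipeline would more likely use a bounded queue so that a slow stage does not stall its neighbours as quickly.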
Cem Ozdogan
2011-09-28