Shared Memory Architectures and Their Parallel Programming Models

CSC367 Week 5

Posted by Dino on October 15, 2021

Parallel Programming Models

  • A programming model is made up of the languages and libraries that create an abstract view of the machine. The programming model enables us to identify:
  • Control
    • How is parallelism created?
    • What orderings exist between operations?
  • Data:
    • what data is private vs. shared?
    • How is logically shared data accessed or communicated?

Programming Model: Shared Memory

  • A program is a collection of threads of control, which can be created mid-execution
  • Each thread has a set of private variables, e.g., local stack variables
  • Also a set of shared variables, e.g., static variables
    • Threads communicate implicitly by writing and reading shared variables (see the sketch below)
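A minimal sketch of the private/shared distinction in C with PThreads (the two-thread setup, variable names, and slot-per-thread layout are illustrative assumptions, not from the notes): each thread's stack variables are private, while the file-scope array is shared and acts as the communication channel.

```c
#include <pthread.h>
#include <stdio.h>

/* Shared: file-scope variables are visible to every thread. */
static int results[2];

static void *worker(void *arg) {
    int id = *(int *)arg;   /* private: each thread gets its own copy on its stack */
    int local = 0;          /* private: local stack variable */

    for (int i = 0; i < 1000; i++)
        local++;

    /* Implicit communication: write the private result into shared memory.
       Each thread writes its own slot, so there is no race here. */
    results[id] = local;
    return NULL;
}

int main(void) {
    pthread_t threads[2];
    int ids[2] = {0, 1};

    for (int i = 0; i < 2; i++)
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(threads[i], NULL);

    printf("results: %d %d\n", results[0], results[1]);
    return 0;
}
```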

Overview of POSIX Threads

  • POSIX: Portable Operating System Interface
    • Interface to Operating System utilities
  • PThreads: The POSIX threading interface
    • System calls to create and synchronize threads
    • Should be relatively uniform across UNIX-like OS platforms
  • PThreads contain support for:
    • Creating parallelism
    • Synchronizing
    • No explicit support for communication, because shared memory is implicit: a pointer to shared data is simply passed to a thread (see the sketch below)
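A hedged sketch of creating and synchronizing threads with pthread_create and pthread_join, where the only "communication" is the pointer to shared data handed to each thread; the struct task type, square_range function, and two-way split are illustrative choices, not part of the PThreads API.

```c
#include <pthread.h>
#include <stdio.h>

#define N 8

/* Argument block: a pointer to this shared data is handed to each thread. */
struct task {
    int *data;      /* shared array */
    int  start;     /* this thread's range */
    int  end;
};

static void *square_range(void *arg) {
    struct task *t = (struct task *)arg;   /* shared data reached via the pointer */
    for (int i = t->start; i < t->end; i++)
        t->data[i] = t->data[i] * t->data[i];
    return NULL;
}

int main(void) {
    int data[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    struct task tasks[2] = {
        { data, 0, N / 2 },
        { data, N / 2, N }
    };
    pthread_t threads[2];

    /* Create the workers, passing each a pointer to its task over the shared array. */
    for (int i = 0; i < 2; i++)
        pthread_create(&threads[i], NULL, square_range, &tasks[i]);

    /* Wait for both workers to finish before reading the results. */
    for (int i = 0; i < 2; i++)
        pthread_join(threads[i], NULL);

    for (int i = 0; i < N; i++)
        printf("%d ", data[i]);
    printf("\n");
    return 0;
}
```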

Synchronization

  • Threads interact in a multiprogrammed system
    • To share resources
    • To coordinate their execution
  • Arbitrary interleaving of thread executions can have unexpected consequences
    • We need a way to restrict the possible interleaving of executions
    • Scheduling is invisible to the application => we cannot know when we lose control of the CPU and another thread/process runs
  • Synchronization is the mechanism that gives us this control

Race conditions and synchronization

  • What happens when 2 or more concurrent threads manipulate a shared resource without any synchronization?
    • The outcome depends on the order in which accesses take place
    • This is called a race condition
  • We need to ensure that only one thread at a time can manipulate the shared resource
    • So that we can reason about correct program behaviour
    • We need synchronization
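A classic way to see a race condition (this unsynchronized-counter example is an added illustration, not from the notes): two threads each increment a shared counter, and because counter++ is a non-atomic read-modify-write, the final value depends on how the accesses interleave.

```c
#include <pthread.h>
#include <stdio.h>

#define ITERS 1000000

static long counter = 0;   /* shared, unprotected */

static void *increment(void *arg) {
    (void)arg;
    for (long i = 0; i < ITERS; i++)
        counter++;          /* read-modify-write: not atomic */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* Expected 2 * ITERS, but lost updates usually make the result smaller,
       and it varies from run to run. */
    printf("counter = %ld (expected %d)\n", counter, 2 * ITERS);
    return 0;
}
```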

Mutual Exclusion

  • Given:
    • A set of n threads
    • A set of resources shared between threads
    • A segment of code which accesses the shared resources, called the critical section
  • We want to ensure that:
    • Only one thread at a time can execute in the critical section
    • All other threads are forced to wait on entry
    • When a thread leaves the CS, another can enter
  • Mutex locks
    • Typically associated with a resource, to ensure only one access at a time to that resource
    • Ensure mutual exclusion to a critical section
    • With mutexes, a thread goes to sleep when it sees the lock is busy (as in the sketch below)
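The same counter, now protected by a pthread_mutex_t so that only one thread at a time is inside the critical section; locking on every iteration is deliberately naive here, which sets up the contention discussion below.

```c
#include <pthread.h>
#include <stdio.h>

#define ITERS 1000000

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment(void *arg) {
    (void)arg;
    for (long i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&counter_lock);     /* enter the critical section */
        counter++;
        pthread_mutex_unlock(&counter_lock);   /* leave; another thread may enter */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);        /* now reliably 2 * ITERS */
    return 0;
}
```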

Overheads of locking

  • When many threads access the same lock => lock contention
  • If lots of threads are contending for the same lock, performance suffers
  • Solution: Compute as much as possible locally, and use synchronization sparingly (see the sketch below)
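One common shape of that solution, continuing the counter sketch above: each thread accumulates into a private local variable and takes the lock only once, so the critical section is tiny and rarely contended.

```c
#include <pthread.h>
#include <stdio.h>

#define ITERS 1000000

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment(void *arg) {
    (void)arg;
    long local = 0;                      /* compute locally, no lock needed */
    for (long i = 0; i < ITERS; i++)
        local++;

    pthread_mutex_lock(&counter_lock);   /* synchronize once per thread */
    counter += local;
    pthread_mutex_unlock(&counter_lock);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);
    return 0;
}
```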

Barriers

  • Threads that reach the barrier stop until all threads have reached it as well
    • If execution has stages, a barrier ensures that data needed from a previous stage is ready (as in the sketch below)
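A sketch of a staged computation using a POSIX barrier (pthread_barrier_t is part of POSIX but not available on every platform, e.g. macOS; the stage layout and neighbour read are made-up illustrations): no thread starts stage 2 until every thread has finished stage 1.

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static pthread_barrier_t stage_barrier;
static int stage1_result[NTHREADS];   /* produced in stage 1, read in stage 2 */

static void *worker(void *arg) {
    int id = *(int *)arg;

    /* Stage 1: each thread produces its piece of data. */
    stage1_result[id] = id * 10;

    /* Barrier: nobody starts stage 2 until every thread finished stage 1. */
    pthread_barrier_wait(&stage_barrier);

    /* Stage 2: safe to read data produced by the other threads. */
    int left = stage1_result[(id + NTHREADS - 1) % NTHREADS];
    printf("thread %d sees its left neighbour's value %d\n", id, left);
    return NULL;
}

int main(void) {
    pthread_t threads[NTHREADS];
    int ids[NTHREADS];

    pthread_barrier_init(&stage_barrier, NULL, NTHREADS);
    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    pthread_barrier_destroy(&stage_barrier);
    return 0;
}
```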

OpenMP

  • OpenMP = Open specification for Multi-Processing
  • Motivation: capture common usage and simplify programming
  • OpenMP will:
    • Allow a programmer to separate a program into serial regions and parallel regions, rather than reason about P concurrently-executing threads
    • Hide stack management
    • Provide synchronization constructs
  • OpenMP will not:
    • Parallelize automatically
    • Guarantee speedup
    • Provide freedom from data races
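A minimal OpenMP parallel region, as a sketch: it must be compiled with an OpenMP flag such as -fopenmp on GCC/Clang, the printed order is nondeterministic, and the thread count depends on the runtime (e.g. OMP_NUM_THREADS).

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    printf("serial region: only the initial thread runs\n");

    /* Parallel region: this block is executed by every thread in the team. */
    #pragma omp parallel
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }   /* implicit join: execution continues with only the initial thread */

    printf("serial region again\n");
    return 0;
}
```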

Execution model: Fork-and-Join

  • Start with a single (master) thread
  • Fork: Create a group of worker threads
    • Everything in a parallel block is executed by every thread
  • Join: All threads synchronize at the end of the parallel block
    • Execution continues with only the initial thread
  • Threads can do the exact same work, share the same tasks (work sharing), or perform distinct tasks in parallel
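A small sketch of fork-and-join with work sharing: #pragma omp parallel for forks a team of threads, splits the loop iterations among them, and joins at the end; the reduction clause gives each thread a private partial sum that is combined at the join (the array contents here are arbitrary).

```c
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++)
        a[i] = 1.0;

    /* Fork: a team of threads is created; the 'for' work-sharing construct
       divides the iterations among them. Each thread keeps a private partial
       sum, combined into the shared 'sum' at the join, so there is no race. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    /* Join: back to the single initial thread. */
    printf("sum = %f\n", sum);
    return 0;
}
```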