OpenMP
This was an intermediate tutorial, which I joined in the latter half. There are a lot
of resources available for OpenMP, including the two tutorials at SC2012.
Below are a few of my notes. By no means is this a complete overview of OpenMP.
Tutorial Notes
OMP Barriers
Barriers are synchronization points where all OMP threads pause until every thread
has reached the barrier. Barriers come in two flavors, explicit and implicit.
Implicit (implied) barriers exist at the end of loop constructs and parallel regions.
(Otherwise flow would unexpectedly continue.)
Of course there is an override for this: the 'nowait' clause. (Add at your own
risk.) It only makes sense if the subsequent processing has no data dependencies
on the loop. Note that 'nowait' goes on a worksharing construct such as 'omp for'
inside a parallel region; it is not accepted on the combined 'parallel for', and the
barrier at the end of a parallel region cannot be removed.
#pragma omp parallel
{
    #pragma omp for nowait
    for (i=0; i<N; ++i)
    {
        a[i] = 0.0;
    }
    // some threads might continue early
    // but we can do some more work, then issue an explicit barrier:
    #pragma omp barrier
}
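A fuller sketch of the nowait-plus-barrier pattern, written as a complete function. The function name `fill_and_sum` and the fill values are made up for illustration and are not from the tutorial. The two fill loops are independent, so dropping their barriers is safe; the explicit barrier is there because the last loop reads both arrays.

```c
#include <stddef.h>

/* Hypothetical helper: fill a[] and b[] in parallel, then return the
   sum of a[i] + b[i] over all elements. */
double fill_and_sum(double *a, double *b, size_t n)
{
    double sum = 0.0;
    #pragma omp parallel
    {
        /* nowait: no barrier at the end of this loop, threads move on. */
        #pragma omp for nowait
        for (long i = 0; i < (long)n; ++i)
            a[i] = 1.0;

        /* Independent of a[], so starting it early is safe. */
        #pragma omp for nowait
        for (long i = 0; i < (long)n; ++i)
            b[i] = 2.0;

        /* The next loop reads both arrays: wait for every thread here. */
        #pragma omp barrier

        #pragma omp for reduction(+:sum)
        for (long i = 0; i < (long)n; ++i)
            sum += a[i] + b[i];
    }
    return sum;
}
```

Compiled with OpenMP enabled (for example `gcc -fopenmp`), the loops are shared among threads; without it the pragmas are ignored and the function runs serially with the same result.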
A comparison of synchronization primitives between gcc and the Intel compiler showed
a huge cost when using gcc. I have checked, and the data presented was for an older
version of gcc.
omp single:
- the first thread to hit the construct executes the block
- the end of the block is an implied barrier
- 'nowait' overrides that barrier; mixing single with nowait is tricky
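The notes on omp single can be illustrated with a small sketch. The function `scale_all` and its arguments are hypothetical, chosen only to show why the implied barrier matters: one thread sets a shared value, and every thread must see it before the loop starts.

```c
#include <stddef.h>

/* Hypothetical helper: multiply every element of v[] by factor. */
void scale_all(double *v, size_t n, double factor)
{
    double scale = 0.0;   /* declared outside the region, so shared */
    #pragma omp parallel
    {
        #pragma omp single
        {
            scale = factor;   /* executed by exactly one thread */
        }   /* implied barrier: the other threads wait here */

        #pragma omp for
        for (long i = 0; i < (long)n; ++i)
            v[i] *= scale;
    }
}
```

Adding nowait to the single construct here would let other threads enter the loop while `scale` might still be 0.0, which is the kind of trap the "tricky" remark points at.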
BOF Notes
OpenMP Future
- 15th birthday of openmp
- 1997 "new std to govern PC's with multiple chips"
- OpenMP mission -- "... to fulfill a need without changing the language"
- OpenACC is spinoff from 4 members, with plans for possible re-integration into openmp at later date.
- OpenMP Calendar
- OpenMP Tech report 1
- Release candidate 1 for openmp-4.0
Tech presentation from lang chair
- openmp 3.1 July 2011
- openmp 4.0 nearing completion
- rc1 comment draft
- rc2 feb 2013
- early June spec release
- Feedback from non-members is welcome
- OpenMP 5.0 comment draft @SC14
- OpenMP-4:
- Comment ticket process
- Includes:
- SIMD directives
- extended support for affinity control
- additional support for Fortran 2003
- support for user-defined reductions
- Proposed OMP 4 SIMD directives:
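- #pragma omp simd [clause] (execute the following loop in SIMD chunks)
- SIMD clauses:
- safelen(length) -- limits how many iterations may run concurrently in one SIMD chunk
- linear -- lists variables that change linearly with the loop iteration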
- aligned <-- C++11 has this already
- private, lastprivate, reduction, collapse
- firstprivate? (couldn't find a use case)
- What happens if loop contains func calls?
- #pragma omp declare simd -- decorates function
- both parallelize and vectorize simultaneously
- #pragma omp parallel for simd
- Proposed OMP 5 directives/features
- OMP_PLACES to specify threads, cores, sockets
- proc_bind(master | close | spread)
- ignored if OMP_PROC_BIND is false
- omp_get_proc_bind()
- Fortran 2003 support
- taskgroups
- cancel tasks
- new data environment
- OpenMP accelerator model
- target directives
- target
- target data
- target update
- target mirror
- target linkable
- new runtime funcs
- Synchronization capabilities are supported in OpenMP, but GPUs have weak memory
sync models
Examples:
#pragma omp target device(acc0) map(B,C)
#pragma omp parallel for reduction(+:sum)
for (...)
#pragma omp target declare
void func(...) -- compile func for device
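The SIMD directives listed above can be sketched in the same way. This uses the syntax as it appears in the released OpenMP 4.0 specification, which may differ in detail from the draft shown at the BOF; the function names (`axpy_elem`, `axpy_simd`, `dot_parallel_simd`) are made up for illustration.

```c
#include <stddef.h>

/* declare simd: ask the compiler to also build a vector version of this
   function so it can be called from inside a SIMD loop. */
#pragma omp declare simd
static double axpy_elem(double alpha, double x, double y)
{
    return alpha * x + y;
}

/* Vectorize only: y[i] = alpha * x[i] + y[i].
   Assumes x and y do not overlap. */
void axpy_simd(double alpha, const double *x, double *y, size_t n)
{
    #pragma omp simd
    for (long i = 0; i < (long)n; ++i)
        y[i] = axpy_elem(alpha, x[i], y[i]);
}

/* Parallelize and vectorize together: threads split the iterations and
   each thread runs its share in SIMD chunks. */
double dot_parallel_simd(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    #pragma omp parallel for simd reduction(+:sum)
    for (long i = 0; i < (long)n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

Whether the compiler actually emits vector instructions depends on the target and optimization flags; the directives only state that doing so is safe.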
Other/Cilk BOF:
I attended a BOF titled "OpenMP -- Is this the best we can do?"
It turned out to be a Cilk BOF.
A panel preached their views on Cilk vs. OpenMP, all seeming to prefer Cilk.
The reaction from the attendees was not so unanimous. The panel's points boiled down
to:
- Cilk is simple and easy to learn, with only two keywords: cilk_spawn and cilk_sync.
- There is no confusion about the data environment.
- OpenMP has "too many knobs".
- They claimed it is easy to get 80% of full performance with Cilk, but admitted
the next 20% required constructs not found in Cilk.
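For comparison, the fork-join pattern that cilk_spawn and cilk_sync express can be sketched with OpenMP tasks. This was not shown at the BOF; it is my own illustration, with hypothetical function names and an arbitrary serial cutoff of 20. The Cilk keywords appear only in comments, since most current compilers no longer accept them.

```c
/* Recursive Fibonacci with OpenMP tasks.
   "omp task" plays the role of cilk_spawn, "omp taskwait" of cilk_sync. */
static long fib_task(int n)
{
    long a, b;
    if (n < 2)
        return n;
    if (n < 20)   /* small subproblem: recurse serially to limit task overhead */
        return fib_task(n - 1) + fib_task(n - 2);

    /* Cilk: a = cilk_spawn fib(n - 1); */
    #pragma omp task shared(a)
    a = fib_task(n - 1);

    b = fib_task(n - 2);

    /* Cilk: cilk_sync; */
    #pragma omp taskwait
    return a + b;
}

long fib(int n)
{
    long result = 0;
    /* Start a team of threads, let one of them create the root task. */
    #pragma omp parallel
    #pragma omp single
    result = fib_task(n);
    return result;
}
```

The shared(a) clause is needed because locals referenced in a task are firstprivate by default, which is a small instance of the data-environment bookkeeping the panel argued Cilk avoids.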
Criticisms of the panel boiled down to:
- OpenMP is well established.
- OpenMP supports C/C++ and Fortran.
- OpenMP constructs supplement existing codes. Cilk forces rewriting of existing
codes.
Like many 'silver-bullets', each method for expressing/exploiting parallelism has
its quirks.
A discussion comparing Cilk vs. Intel's TBB ensued. The panel mentioned these points:
- compiler (Cilk) vs. library (TBB) implementation
- Cilk work stealing vs. dedicated threads
--
JoeBrandt - 2012-11-25