This was an intermediate tutorial which I joined in the latter half. There are a lot
of resources available for OpenMP, including the two tutorials at SC2012.
Below are a few of my notes; by no means are they a complete overview of OpenMP.
Barriers are synchronization points, where all OMP threads pause until every thread
reaches the barrier. Barriers come in two flavors, explicit and implicit.
Implicit (implied) barriers exist at the end of loop constructs and parallel regions.
(Otherwise execution would unexpectedly continue.)
Of course there is an override for this: the 'nowait' clause. (Add at your own
risk.) It only makes sense if the subsequent processing has no data dependencies
on the loop. Note that 'nowait' removes the barrier of a worksharing construct
inside a parallel region; the implied barrier at the end of the parallel region
itself cannot be removed:

#pragma omp parallel
{
    #pragma omp for nowait
    for (i=0; i<N; ++i)
        a[i] = 0.0;
    // some threads might continue past the loop early
    // but we can do some more work, then issue an explicit barrier:
    #pragma omp barrier
}
A comparison of synchronization primitives between gcc and the Intel compiler showed
huge costs when using gcc. I have checked, and the data presented was for an older
version of gcc.
The 'single' construct:
- first thread to hit it executes the block
- implied barrier at the end of the block
- mixing 'single' with 'nowait' -- tricky
- 15th birthday of OpenMP
- 1997: "new std to govern PC's with multiple chips"
- OpenMP mission -- "... to fulfill a need without changing the language"
- OpenACC is a spinoff from 4 members, with plans for possible re-integration into OpenMP at a later date.
- OpenMP Calendar
- OpenMP Tech Report 1
- Release candidate 1 for OpenMP 4.0
Tech presentation from the language committee chair:
- OpenMP 3.1: July 2011
- OpenMP 4.0 nearing completion
- rc1 comment draft
- rc2: Feb 2013
- spec release in early June
- Feedback from non-members is welcome
- OpenMP 5.0 comment draft @SC14
- Comment ticket process
- SIMD directives
- extended support for affinity control
- additional support for Fortran 2003
- support for user-defined reductions
- Proposed OMP 4 SIMD directives:
- #pragma omp simd [clause] (execute the following loop in SIMD chunks)
- SIMD clauses:
- safelen(length) -- limits how far apart concurrently executed iterations may be (the maximum safe vector length)
- linear(list) -- lists variables with a linear relationship to the iteration number
- aligned(list) -- asserts alignment of the listed variables <-- C++11 has this already
- private, lastprivate, reduction, collapse
- firstprivate? (couldn't find a use case)
- What happens if loop contains func calls?
- #pragma omp declare simd -- decorates function
- both parallelize and vectorize simultaneously
- #pragma omp parallel for simd
- Proposed OMP 5 directives/features
- OMP_PLACES to specify threads, cores, sockets
- proc_bind(master | close | spread)
- ignored if OMP_PROC_BIND is false
- Fortran 2003 support
- cancel tasks
- new data environment
- OpenMP accelerator model
- target directives
- target data
- target update
- target mirror
- target linkable
- new runtime funcs
- Synchronization capabilities are supported in OpenMP, but GPUs have weak memory models
#pragma omp target device(acc0) map(B,C)
#pragma omp parallel for reduction(+:sum)

#pragma omp declare target
void func(...);   // compile func for the device
#pragma omp end declare target
I attended a BOF titled "OpenMP -- Is this the best we can do?"
It turned out to be a Cilk advocacy session.
A panel preached their views on Cilk vs. OpenMP, all seeming to prefer Cilk.
The reaction from the attendees was not so unanimous. The panel's points boiled down to:
- It is simple and easy to learn, only two keywords: cilk_spawn and cilk_sync.
- There is no confusion about the data environment.
- OpenMP has "too many knobs".
- They claimed it is easy to get 80% of full performance with Cilk, but admitted
  the remaining 20% required constructs not found in Cilk.
Criticisms of the panel boiled down to:
- OpenMP is well established.
- OpenMP supports C/C++ and Fortran.
- OpenMP constructs supplement existing codes; Cilk forces re-writing of existing code.
Like many 'silver bullets', each method for expressing/exploiting parallelism has its trade-offs.
A discussion comparing Cilk vs. Intel's TBB ensued. The panel mentioned these points:
- compiler (Cilk) vs. library (TBB) implementation
- Cilk work stealing vs. dedicated threads