OpenMP

This was an intermediate tutorial, which I joined in the latter half. There are a lot of resources available for OpenMP, including the two tutorials at SC2012. Below are a few of my notes. By no means is this a complete overview of OpenMP.

Tutorial Notes

OMP Barriers

Barriers are synchronization points where every OMP thread pauses until all threads have reached the barrier. Barriers come in two flavors, explicit and implicit. Implicit (implied) barriers exist at the end of loop constructs and parallel regions. (Otherwise flow would unexpectedly continue.) Of course there is an override for this: the 'nowait' clause. (Add at your own risk.) It only makes sense if the subsequent processing has no data dependencies on the loop. Note that 'nowait' applies to worksharing constructs such as loops; the barrier at the end of a parallel region cannot be removed.

// nowait is not valid on the combined 'parallel for', so split the constructs
#pragma omp parallel
{
    #pragma omp for nowait
    for (i=0; i<N; ++i)
    {
        a[i] = 0.0;
    }
    // some threads might continue early
    // but we can do some more work, then issue an explicit barrier:
    #pragma omp barrier
}

A comparison of synchronization primitives between gcc and the Intel compiler showed a huge cost when using gcc. I have since checked, and the data presented was for an older version of gcc.

'nowait' override

omp single: the first thread to reach the construct executes the block; there is an implied barrier at the end of the block.

Mixing single with nowait is tricky: the other threads no longer wait for the single block to finish.
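To make the single/nowait interaction concrete, here is a minimal sketch in C. The function name fill_with_single and its arguments are hypothetical, chosen only for illustration. The point it tries to show is that nowait on single is safe only when the code that follows does not depend on what the single block wrote.

```c
#define N 8

/* Hypothetical helper: one thread sets *flag while all threads fill a[]. */
void fill_with_single(double *a, int *flag)
{
    #pragma omp parallel
    {
        /* Exactly one thread (whichever arrives first) runs this block.
           nowait drops the implied barrier, so the others move straight on. */
        #pragma omp single nowait
        {
            *flag = 1;
        }

        /* Safe to start without waiting: the loop never reads *flag. */
        #pragma omp for
        for (int i = 0; i < N; ++i)
        {
            a[i] = 2.0 * i;
        }
        /* The loop's implied barrier is still in place, so after it every
           thread may rely on both a[] and *flag. */
    }
}
```

If the loop body did read *flag, the nowait would have to go, or an explicit barrier would be needed between the two constructs.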

BOF Notes

OpenMP Future
  • 15th birthday of OpenMP
  • 1997: "new std to govern PC's with multiple chips"
  • OpenMP mission -- "... to fulfill a need without changing the language"
  • OpenACC is a spinoff from four members, with plans for possible reintegration into OpenMP at a later date.
  • OpenMP Calendar
  • OpenMP Tech Report 1
  • Release candidate 1 for OpenMP 4.0

Technical presentation from the language committee chair
  • OpenMP 3.1: July 2011
  • OpenMP 4.0 nearing completion
    • RC1 comment draft
    • RC2: February 2013
    • spec release: early June
  • Feedback from non-members is welcome
  • OpenMP 5.0 comment draft @SC14
  • OpenMP-4:
    • Comment ticket process
    • Includes:
      • SIMD directives
      • extended support for affinity control
      • additional support for Fortran 2003
      • support for user-defined reductions
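
As an illustration of the user-defined reductions item above, the following is a small sketch using the declare reduction syntax as I understand it from the published 4.0 spec. The type range_t and the names range_merge, range_op and find_range are hypothetical; the example finds the minimum and maximum of an array in one pass.

```c
#include <float.h>

/* Hypothetical type: smallest and largest value seen so far. */
typedef struct { double lo, hi; } range_t;

/* Combine two partial ranges into one. */
static range_t range_merge(range_t a, range_t b)
{
    range_t r;
    r.lo = (a.lo < b.lo) ? a.lo : b.lo;
    r.hi = (a.hi > b.hi) ? a.hi : b.hi;
    return r;
}

/* Teach OpenMP how to combine and initialize range_t partial results. */
#pragma omp declare reduction(range_op : range_t : \
        omp_out = range_merge(omp_out, omp_in)) \
        initializer(omp_priv = { DBL_MAX, -DBL_MAX })

range_t find_range(const double *x, int n)
{
    range_t r = { DBL_MAX, -DBL_MAX };
    #pragma omp parallel for reduction(range_op : r)
    for (int i = 0; i < n; ++i)
    {
        range_t one = { x[i], x[i] };
        r = range_merge(r, one);
    }
    return r;
}
```

Each thread works on a private copy of r that starts from the initializer value, and the copies are merged with range_merge at the end of the loop.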

  • Proposed OMP 4 SIMD directives:
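    • #pragma omp simd [clause ] (execute the following loop in SIMD chunks)
    • SIMD clauses:
      • safelen(length) -- limits how many iterations may be executed together in one SIMD chunk
      • linear -- lists variables that change linearly with the loop iteration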
      • aligned <-- C++11 has this already
      • private, lastprivate, reduction, collapse
      • firstprivate? (couldn't find a use case)
  • What happens if loop contains func calls?
    • #pragma omp declare simd -- decorates function
    • to both parallelize and vectorize simultaneously: #pragma omp parallel for simd
  • Proposed OMP 5 directives/features
    • OMP_PLACES to specify threads, cores, sockets
    • proc_bind(master | close | spread)
    • ignored if OMP_PROC_BIND is false
    • omp_get_proc_bind()
    • Fortran 2003 support
    • taskgroups
    • cancel tasks
    • new data environment
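
Going back to the SIMD directives listed above, here is a minimal sketch of how the three forms (simd, declare simd, and parallel for simd) might be used together. The function names axpy_one, saxpy and dot are hypothetical. Whether the compiler actually vectorizes anything depends on the compiler and its flags.

```c
/* declare simd asks the compiler to also generate a vector variant of the
   function, so it can be called from inside a SIMD loop. */
#pragma omp declare simd
static float axpy_one(float a, float x, float y)
{
    return a * x + y;
}

/* Split the iterations across threads, and vectorize each thread's share. */
void saxpy(int n, float a, const float *x, float *y)
{
    #pragma omp parallel for simd
    for (int i = 0; i < n; ++i)
        y[i] = axpy_one(a, x[i], y[i]);
}

/* Single-threaded SIMD loop with a reduction clause. */
float dot(int n, const float *x, const float *y)
{
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```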


  • OpenMP accelerator model
    • target directives
      • target
      • target data
      • target update
      • target mirror
      • target linkable
    • new runtime funcs
      • omp_get_device_num()
    • Synchronization capabilities are supported in OpenMP, but GPUs have weak memory sync models

Examples:
#pragma omp target device(acc0) map(B,C)
#pragma omp parallel for reduction(+:sum)
for (...)

#pragma omp target declare
void func(...) -- compile func for device
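
The snippets above follow the proposal as presented at the BOF. For comparison, here is a small sketch using the spelling that, as far as I can tell, ended up in the published 4.0 spec: declare target / end declare target around device functions, and explicit map types on the target construct. The names square and sum_of_squares are hypothetical. Without an accelerator (or without OpenMP enabled) the region simply runs on the host.

```c
/* Functions between these two pragmas are also compiled for the device. */
#pragma omp declare target
static double square(double v)
{
    return v * v;
}
#pragma omp end declare target

double sum_of_squares(const double *b, int n)
{
    double sum = 0.0;
    /* Copy b[0..n-1] to the device; copy sum there and back again. */
    #pragma omp target map(to: b[0:n]) map(tofrom: sum)
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += square(b[i]);
    return sum;
}
```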


Other/Cilk BOF:

I attended a BOF titled "OpenMP -- Is this the best we can do?" It turned out to be a Cilk BOF. A panel presented their views on Cilk vs. OpenMP, and all seemed to prefer Cilk.

The reaction from the attendees was not so unanimous. The panel's points boiled down to:
  1. Cilk is simple and easy to learn, with only two keywords: cilk_spawn and cilk_sync.
  2. There is no confusion about the data environment.
  3. OpenMP has "too many knobs".
  4. They claimed it is easy to get 80% of full performance with Cilk, but admitted the next 20% required constructs not found in Cilk.
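
For readers who know OpenMP but not Cilk, the spawn/sync pattern has a rough OpenMP counterpart in task and taskwait. The sketch below is my own illustration, not something shown at the BOF; fib and fib_parallel are hypothetical names, and the comments show where the Cilk keywords would go.

```c
/* Recursive Fibonacci, the usual spawn/sync demonstration. */
long fib(int n)
{
    long x, y;
    if (n < 2)
        return n;

    /* Cilk: x = cilk_spawn fib(n - 1); */
    #pragma omp task shared(x)
    x = fib(n - 1);

    y = fib(n - 2);

    /* Cilk: cilk_sync; */
    #pragma omp taskwait
    return x + y;
}

long fib_parallel(int n)
{
    long result = 0;
    /* Start a team of threads, but let only one of them seed the task tree. */
    #pragma omp parallel
    #pragma omp single
    result = fib(n);
    return result;
}
```

The extra parallel/single wrapper and the shared(x) clause are small examples of the data-environment "knobs" the panel was referring to.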

Criticisms of the panel boiled down to:
  1. OpenMP is well established.
  2. OpenMP supports C/C++ and Fortran.
  3. OpenMP constructs supplement existing codes. Cilk forces rewriting of existing codes.

Like many 'silver bullets', each method for expressing/exploiting parallelism has its quirks.

A discussion comparing Cilk vs. Intel's TBB ensued. The panel mentioned these points:
  1. compiler (Cilk) vs. library (TBB) implementation
  2. Cilk work stealing vs. dedicated threads

-- JoeBrandt - 2012-11-25
Topic revision: r1 - 2012-11-26, JoeBrandt