Introduction:

Plenty of good documentation and examples are available online, including the Wikipedia entry on OpenMP.
Partitioning

Always try to partition the work on the object that is written to. For example, if the results go into a Matrix, make each child thread deal with its own set of rows exclusively.
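
As a minimal sketch of this idea, assuming a plain row-major std::vector in place of a casa Matrix (the helper name fillByRows and the fill value are hypothetical, chosen only for illustration), each loop iteration owns one output row, so no two threads ever write to the same element:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: fills a row-major nrow x ncol matrix so that
// element (r, c) holds r * ncol + c. The parallel loop runs over rows,
// so every thread writes only to the rows it was handed.
std::vector<double> fillByRows(int nrow, int ncol) {
    std::vector<double> out(static_cast<std::size_t>(nrow) * ncol, 0.0);
    double* outstor = out.data();
#pragma omp parallel for default(none) firstprivate(outstor, nrow, ncol)
    for (int r = 0; r < nrow; ++r) {
        // Start of the row owned by this iteration.
        double* row = outstor + static_cast<std::size_t>(r) * ncol;
        for (int c = 0; c < ncol; ++c) {
            row[c] = static_cast<double>(r) * ncol + c;
        }
    }
    return out;
}
```

Partitioning on the input instead (for example, letting several threads accumulate into the same output row) would need locking or a reduction, which this layout avoids.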
Private variables and objects of class.

OpenMP tends not to handle private/protected class members very well, so you are better off making a local copy, or accessing the data through the C-style pointer storage if it is a casa::Array object.
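
A minimal sketch of the local-copy approach, assuming a hypothetical class Scaler with members scale_p and data_p (none of these names come from CASA). Some compilers and older OpenMP versions may reject class data members in data-sharing clauses, so the members are copied into plain local variables before the parallel region:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical class, for illustration only.
class Scaler {
public:
    Scaler(double scale, std::size_t n) : scale_p(scale), data_p(n, 1.0) {}

    // Multiplies every element of data_p by scale_p in parallel.
    void apply() {
        // Local copies of the members: plain names that can be listed
        // in the data-sharing clauses below.
        double scale = scale_p;
        double* datastor = data_p.data();
        long n = static_cast<long>(data_p.size());
#pragma omp parallel for default(none) firstprivate(scale, datastor, n)
        for (long i = 0; i < n; ++i) {
            datastor[i] *= scale;
        }
    }

    const std::vector<double>& data() const { return data_p; }

private:
    double scale_p;
    std::vector<double> data_p;
};
```

Inside the loop only the local names are used, so the parallel region never touches the object's members directly.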
Dealing with casa::Arrays

Matrix::row and Cube::xyPlane are not thread safe. If you can, use a Block<Vector > instead of a Matrix and pass an element of the Block to each thread, rather than a Matrix::row or Matrix::column.

Alternatively, get a contiguous C-style pointer to the Array's data with Array::getStorage and use C-style indexing over the different axes:
Matrix<Complex> phasor(nx, ny);
//You have already defined nx, ny, startRow and endRow 
Bool delphase; 

Complex * phasorstor=phasor.getStorage(delphase);
//interpVisFreq_p is a private member Vector<Double> of length nx
Bool del; 

const Double * visfreqstor=interpVisFreq_p.getStorage(del);
Int irow;
#pragma omp parallel default(none) private(irow) firstprivate(visfreqstor,phasorstor) shared(startRow, endRow, nx)
  {
#pragma omp for
  for (irow=startRow; irow<=endRow;irow++){
    doSomeMath(visfreqstor, nx, phasorstor, irow);
  }  
}//end pragma parallel
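//Now make sure that the contiguous array is put back into the object in case getStorage did a copy
 phasor.putStorage(phasorstor, delphase);
//Release the read-only storage as well, in case getStorage did a copy
 interpVisFreq_p.freeStorage(visfreqstor, del);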

The method doSomeMath then looks like the following:

void MyClass::doSomeMath(const Double*& freq, const Int& nvchan, Complex*& phasor, const Int& row){
    
    Int rowoff=row*nvchan;
    Double phase;
    Vector<Double> pos(2);
    for (Int f=0; f<nvchan; ++f){
      phase=-Double(2.0)*C::pi*some_compute_intensive_stuff*(freq[f])/C::c;
      // Note c-style array element indexing of Matrix
      phasor[rowoff+f]=Complex(cos(phase), sin(phase));
    
    }

}
 

In the above case OpenMP hands an irow (or a batch of them, depending on the chunk size given in the schedule clause, if you choose to set one) to each thread, which then populates that row of the matrix.
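
For trying out the chunk size without casacore at hand, here is a minimal standalone sketch of the same pattern. It assumes std::vector and std::complex in place of casa::Array and Complex; the names fillRow and computePhasors and the parameter pathLength are hypothetical stand-ins (pathLength takes the place of the compute-intensive factor), and the chunk size of 2 is an arbitrary choice:

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for doSomeMath: fills one row of a contiguous buffer.
// pathLength is an assumed placeholder for the expensive per-row quantity.
void fillRow(const double* freq, int nchan, std::complex<float>* phasor,
             int row, double pathLength) {
    const double pi = std::acos(-1.0);
    const double c = 299792458.0;  // speed of light in m/s
    const std::size_t rowoff = static_cast<std::size_t>(row) * nchan;
    for (int f = 0; f < nchan; ++f) {
        const double phase = -2.0 * pi * pathLength * freq[f] / c;
        // C-style indexing into the contiguous storage
        phasor[rowoff + f] = std::complex<float>(
            static_cast<float>(std::cos(phase)),
            static_cast<float>(std::sin(phase)));
    }
}

// Computes an nrow x nchan block of phasors, one row per loop iteration.
std::vector<std::complex<float>> computePhasors(const std::vector<double>& freq,
                                                int nrow, double pathLength) {
    int nchan = static_cast<int>(freq.size());
    std::vector<std::complex<float>> phasor(
        static_cast<std::size_t>(nrow) * nchan);
    const double* freqstor = freq.data();
    std::complex<float>* phasorstor = phasor.data();
    // schedule(dynamic, 2): an idle thread picks up the next 2 rows at a time.
#pragma omp parallel for default(none) schedule(dynamic, 2) \
    firstprivate(freqstor, phasorstor, nchan, nrow, pathLength)
    for (int irow = 0; irow < nrow; ++irow) {
        fillRow(freqstor, nchan, phasorstor, irow, pathLength);
    }
    return phasor;
}
```

A larger chunk reduces scheduling overhead, while a smaller one balances the load better when rows take uneven time; which wins depends on the workload, so it is worth measuring.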

Two things to note about the advantage of hiding the child thread's work in a function: 1) it allows the compiler to optimize the inner loops nicely (in particular, gcc sometimes seems not to respect the -funroll-loops optimization in the presence of an OpenMP directive); 2) it is easier to read.

-- KumarGolap - 2012-02-14
Topic revision: r2 - 2012-02-14, KumarGolap