CASA Statistics Framework Meeting

Tuesday August 18th, 2014, DSOC 317 @ 8AM MT

  • Polycom Video: 192.33.117.12##8108
  • Voice: 434-817-6523

Attendees:

  • Socorro: Rob, Tak, Kumar, Juergen, George, Sanjay, Urvashi, Jim.
  • Charlottesville: Dave.
  • Garching: Justo, Sandra.

  • Apologies: Lindsey & Jeff (who are attending a pipeline meeting). Susan working on 4.2.2 blocker.

Discussion

  • Review of background leading to this proposal.
    • Discussion of changes to imstat.
    • Discussion of bugs in visstat with large data sets.

  • Review of the merits:

  • Review of concerns with a common framework:
    • Many types of data input can be foreseen, including images and visbuffers. The concern is that morphing the data into the format required by a common, more generic framework is inefficient and high effort, leading to a preference for a more specific implementation within a given task.
      • Can address this by ensuring that all current use cases are incorporated in the requirements capture. New framework should be sufficiently flexible to not require undue data manipulation for ingestion by the statistics framework.
    • Existing implementations in casacore are not thread safe, which limits their use. Imager has had to implement something outside of casacore to address this.
      • New framework should be thread safe.
    • Difficult to disentangle statistics from fitting. Can think of many statistics as estimates of goodness of fit to a constant (mean or median). May want to consider other estimates of a central tendency, expressed in polynomial form.
      • Should assess use cases and decide if this is within scope. At a minimum, may want to leave hooks for future framework expansion. Can treat this as a 2nd level feature in an iterative/incremental development plan.
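
One possible way to approach the thread-safety concern above is to give each worker its own partial result and combine the partials afterwards, so that no mutable state is shared between threads. The sketch below is illustrative only: `Partial`, `summarize`, `merge` and `parallel_stats` are hypothetical names, not casacore or CASA APIs, and Python threads are used purely to show the structure rather than to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import NamedTuple


class Partial(NamedTuple):
    """Per-chunk summary (hypothetical type, not part of casacore)."""
    count: int
    mean: float
    m2: float  # sum of squared deviations from the mean


def summarize(chunk) -> Partial:
    """Summarize one chunk using local variables only (no shared state)."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in chunk:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return Partial(count, mean, m2)


def merge(a: Partial, b: Partial) -> Partial:
    """Combine two partial summaries with the pairwise update formula."""
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    count = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / count
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / count
    return Partial(count, mean, m2)


def parallel_stats(chunks, max_workers=4) -> Partial:
    """Summarize chunks on a thread pool, then merge on the calling thread."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(summarize, chunks))
    total = Partial(0, 0.0, 0.0)
    for partial in partials:
        total = merge(total, partial)
    return total
```

The population variance of the combined data is `total.m2 / total.count` when `total.count` is non-zero. Only the final merge touches more than one partial result, and it runs on a single thread.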

  • Review of Scope:
    • New framework, with some code reuse, designed to meet the requirements captured.
      • More likely to meet all our needs but higher effort.
    • Alternative: A more limited approach with a common location for statistics utilities, expanded on an ad hoc basis.
      • Lower effort, but less optimized and maintainable.
    • Will continue with requirements capture before making a decision.

  • Start of requirements capture.
    • Reviewed requirements draft page.
    • May want to be able to pick axes within an array or have greater flexibility in data selection / input.
      • ACTION: George to propose revised language on data selection / input.
    • May want to explicitly call out a requirement for incremental feeding / data input. Provision for an accumulator.
      • ACTION: Jim to propose language for a requirement.
    • When considering the new noise estimating algorithms, the inclusion of polynomial fitting is relevant. Can either include this feature or restrict the framework to algorithms that do not require polynomial fitting. (TBD)
      • ACTION: Dave to evaluate the feasibility of addressing Peter Teuben's request for a probability that a distribution is Gaussian, as a potential use case for polynomial fitting within the framework.
      • ACTION: Urvashi to provide other use cases for fitting, and/or note deficiencies in current code implementation. (Statwt & uv continuum subtraction were given as examples.)
    • Include variance within calculated values.
      • ACTION: Rob to revise list.
    • Include a provision for passing flags and weights as an option along with the data set. Treat flags as a weight of 0.
      • ACTION: Rob to revise and incorporate.
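
As an illustration of how the accumulator, variance, and flags / weights items above could fit together, here is a minimal sketch. It assumes a weighted form of the standard incremental mean / variance update. The class name `StatsAccumulator` and its methods are hypothetical and were not proposed in the meeting.

```python
import math
from dataclasses import dataclass


@dataclass
class StatsAccumulator:
    """Running weighted mean and variance (hypothetical, not a CASA class)."""
    sum_weights: float = 0.0  # total weight of the samples accepted so far
    mean: float = 0.0         # running weighted mean
    m2: float = 0.0           # running weighted sum of squared deviations
    npts: int = 0             # number of samples with non-zero weight

    def add_chunk(self, data, weights=None, flags=None):
        """Feed one chunk of data; weights and flags are optional inputs."""
        for i, x in enumerate(data):
            w = 1.0 if weights is None else weights[i]
            if flags is not None and flags[i]:
                w = 0.0  # a flagged sample is treated as weight 0
            if w <= 0.0:
                continue  # zero-weight samples do not contribute
            self.npts += 1
            self.sum_weights += w
            delta = x - self.mean
            self.mean += (w / self.sum_weights) * delta
            self.m2 += w * delta * (x - self.mean)

    @property
    def variance(self):
        """Weighted population variance (divides by the total weight)."""
        if self.sum_weights == 0.0:
            return math.nan
        return self.m2 / self.sum_weights
```

A caller would create one `StatsAccumulator`, call `add_chunk` once per block of data as it becomes available, and read `mean` and `variance` at the end, so the full data set never has to be held in memory at once.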

  • Implementation thoughts and considerations:
    • Should research available algorithms for new ways of calculating expensive statistics such as the median.
    • May be able to calculate some statistics in an incremental way to eliminate the resource constraints with passing around large data sets.
      • Lattices framework in casacore provides some of this functionality already.
    • Consider leveraging the STL algorithms module.
    • Consider leveraging the "R" package, and evaluate its API.
    • Location within the code base to be considered at a future meeting.
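
To make the median item above concrete, the sketch below finds the median by selection rather than by a full sort, which is similar in spirit to what `std::nth_element` offers in the C++ STL. It is a minimal quickselect under stated assumptions: the input contains no NaN values, and the helper names `select_kth` and `median` are hypothetical rather than part of any existing CASA code.

```python
import random


def select_kth(values, k, rng=None):
    """Return the k-th smallest element (0-based) without a full sort.

    Assumes the data contain no NaN values.
    """
    if not 0 <= k < len(values):
        raise IndexError("k out of range")
    rng = rng or random.Random(0)  # fixed seed keeps runs reproducible
    work = list(values)            # copy so the caller's data is untouched
    while True:
        pivot = work[rng.randrange(len(work))]
        lower = [v for v in work if v < pivot]
        equal_count = sum(1 for v in work if v == pivot)
        if k < len(lower):
            work = lower           # answer lies among the smaller values
        elif k < len(lower) + equal_count:
            return pivot           # answer is the pivot itself
        else:
            k -= len(lower) + equal_count
            work = [v for v in work if v > pivot]


def median(values):
    """Median via selection; averages the two middle values for even n."""
    n = len(values)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2:
        return select_kth(values, mid)
    return 0.5 * (select_kth(values, mid - 1) + select_kth(values, mid))
```

Selection of this kind runs in linear time on average, compared with O(n log n) for sorting, but it still needs the whole data set in memory, so it does not by itself address the incremental-calculation point.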

  • Any Other Business (AOB):
    • N/A

Items Deferred for a Future Meeting

  • Scope
    • New framework, built to spec (with some code reuse) vs. limited approach with a common location for statistics utilities, expanded on an ad hoc basis.
  • Requirements:
    • Inclusion of polynomial fitting, and other parametrized versions of a description of central tendency.
  • Implementation issues:
    • Location within the code base.
    • Consider leveraging the STL algorithms module.
    • Consider leveraging the "R" package, and evaluate its API.
    • Should research available algorithms for new ways of calculating expensive statistics such as the median.

Action Item / Issue List

| Item # | Date Opened | Description | Leads | Status | Status Notes |
| 01 | 8/19/14 | Revise / expand language on data selection / input. | George | Open | |
| 02 | 8/19/14 | Add a requirement for incremental feeding / data input. Provision of an accumulator. | Jim | Open | |
| 03 | 8/19/14 | Evaluate use cases for polynomial fitting within the statistics framework. | Dave | Open | |
| 04 | 8/19/14 | Provide other use cases for fitting, and/or note deficiencies in current code. | Urvashi | Complete | |
| 05 | 8/19/14 | Add variance to calculated statistics. | Rob | Complete | |
| 06 | 8/19/14 | Add flags and weights as optional inputs to requirements. | Rob | Complete | |

-- RobSelina - 2014-08-18
Topic revision: r3 - 2014-08-19, RobSelina