ICRAR/NRAO Collaboration (April 13-14 2015)

NGAS Meeting Notes (April 15-16 2015)


Agenda

Time   Location  Topic                                   Lead/Presenter
08:30  317       NGAS: Current Status and Future Plans   Wicenec
09:45  152       NGAS at the NRAO                        Benson/Robnett/Hatz
10:30  152       Replication Errors
12:00            Working Lunch
13:15  152       Operational Topics
14:15  152       Bigger Issue Topics
16:00  152       NGAS and the New NRAO Archive           Hiriart
16:30  152       Plans for Thursday

The first topic will be a presentation in AOC 317 that is open to developers and IT staff; afterwards we will meet in AOC 152 to discuss the remaining topics.

List of Topics

NGAS: Current Status and Future Plans

  • Overview of NGAS
    • History of NGAS & What it Does
    • Future plans for NGAS
      • ICRAR is separating the plugins from the core distribution and will provide a separate plugin repository
    • Outline of NGAS architecture
      • Standard Python package structure, built with buildout
    • Comparing ALMA's branch of NGAS to MWA's NGAS
      • The core is similar but ALMA's is a bit out of date; the plugins differ, and ALMA/ESO has a few that ICRAR does not use.

  • MWA's NGAS Setup
    • Overview of MWA's NGAS Setup
    • What is the size and scope of MWA's current system?
    • How does MWA manage software, hardware, and day-to-day operational tracking?
    • What does the data look like (lots of small files, or fewer large files)?
    • What do they do with very large files (0.5 TB)?

NGAS at the NRAO

  • Overview of Our NGAS Setup: NM to VA and SCO to VA
    • the SCO replication is mirror-based, not subscription-based
  • Are we doing anything we shouldn't? What could we be doing better?
    • Don't treat it like a black box
    • Join the NGAS user group/committee
    • Limit development to plugins if at all possible; don't touch the core
  • guidelines for what should be in a plugin
    • file type operations, not business logic

Replication Errors

  • Why do we have failures?
    • replication is poorly implemented in our version; it has been re-implemented in ICRAR's branch
    • possible database problems with the old Oracle installation
  • How to detect/debug failures? Monitoring, alerts? Scripts we can use?
    • Nagios alerts: check that the ports on each client are listening, and check that the node responds to a simple command
    • send all alerts and errors to a mailing list; investigate changing the logging settings
  • How to recover from failures?
    • clear out the subscription list and re-subscribe with an earlier start date
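The "is the port listening, does it answer a simple command" check discussed above could be wrapped as a small Nagios-style check script. This is a minimal sketch under assumptions: the default port 7777 and the status path /STATUS are placeholders to confirm against our node configuration and the status command Andreas pointed us to. The exit codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
from __future__ import annotations

import http.client
import socket
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# ASSUMPTIONS: default NGAS port and the path of the "simple status command".
# Confirm both against the node configuration before relying on this check.
DEFAULT_PORT = 7777
STATUS_PATH = "/STATUS"


def port_is_listening(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def status_responds(host: str, port: int, path: str = STATUS_PATH,
                    timeout: float = 10.0) -> bool:
    """Return True if an HTTP GET of `path` returns a 2xx status."""
    # Ignore proxy settings from the environment: talk to the node directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"http://{host}:{port}{path}", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # URLError includes HTTPError, raised for 4xx/5xx replies;
        # OSError covers timeouts and refused/reset connections.
        return False


def check_node(host: str, port: int = DEFAULT_PORT,
               path: str = STATUS_PATH) -> tuple[int, str]:
    """Run both checks and return a (Nagios exit code, message) pair."""
    if not port_is_listening(host, port):
        return CRITICAL, f"CRITICAL - {host}:{port} is not listening"
    if not status_responds(host, port, path):
        return WARNING, f"WARNING - {host}:{port} is listening but {path} failed"
    return OK, f"OK - {host}:{port} answered {path}"


def main(argv: list[str]) -> int:
    """argv is [HOST] or [HOST, PORT]; prints one status line for Nagios."""
    try:
        host = argv[0]
        port = int(argv[1]) if len(argv) > 1 else DEFAULT_PORT
    except (IndexError, ValueError):
        print("UNKNOWN - usage: HOST [PORT]")
        return UNKNOWN
    code, message = check_node(host, port)
    print(message)
    return code
```

Run as a Nagios command with `sys.exit(main(sys.argv[1:]))`. Splitting the result into CRITICAL (nothing listening) and WARNING (listening but not answering) lets the alert e-mail to the mailing list say which of the two situations occurred.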

Operational Topics

  • What are the pitfalls of running NGAS in production?
  • What are the best practices for operations?
    • Start with a good deployment, designed so that it can be replaced piece by piece
  • Operational tools: dashboard, monitoring?
    • nothing yet; some tools are being developed at ICRAR
  • How "hands on" is the support, how is the labor distributed?

Bigger Issue Topics

  • What are the advantages (if any) of running one version for both ALMA and VLA+?
    • Maybe we standardize on the ALMA/ESO core with ICRAR plugins, if they are compatible, or we try to persuade ALMA/ESO to look at the ICRAR version.
  • Are we running the right versions? What are the advantages of using a newer NGAS version with a different RDBMS, and how well tested is it? PostgreSQL?
    • ICRAR's NGAS has been running on PostgreSQL; there is no reason we can't switch
    • switching basically involves creating the NGAS database structures on the new database, reconfiguring the nodes to use it, and restarting them.
      • we need to test this soon
  • How do we track offline data sets?
    • Data can be marked as 'offline', but there are no other capabilities for tracking it.
  • Who 'owns' it?
    • the plan is to open it up on GitHub
  • Who else is developing it besides ICRAR?
    • Basically nobody
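The first step of the PostgreSQL switch noted above (creating the database structures) could be scripted with the standard psql client. A minimal sketch, assuming the two DDL files attached to this page are applied in order, roles before tables (an assumption: the role file should come first because the tables may refer to it). The host, database, and user names are hypothetical, and reconfiguring and restarting the nodes is not shown.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# The DDL files attached to this page, in the order they are applied.
DDL_FILES = ["NgasCreateRole-PostgreSQL.sql", "NgasCreateTables-PostgreSQL.sql"]


def psql_commands(host: str, dbname: str, user: str,
                  ddl_dir: str = ".") -> list[list[str]]:
    """Build one psql invocation per DDL file.

    ON_ERROR_STOP=1 makes psql stop and exit non-zero at the first
    failing statement instead of running the rest of the script.
    """
    return [
        ["psql", "-h", host, "-d", dbname, "-U", user,
         "-v", "ON_ERROR_STOP=1", "-f", str(Path(ddl_dir) / name)]
        for name in DDL_FILES
    ]


def apply_ddl(host: str, dbname: str, user: str, ddl_dir: str = ".") -> None:
    """Run the DDL files in order; raises CalledProcessError on failure."""
    for cmd in psql_commands(host, dbname, user, ddl_dir):
        subprocess.run(cmd, check=True)
```

For example, `apply_ddl("new-db-host", "ngas", "postgres", "/path/to/ddl")` with hypothetical names. Creating roles normally needs a superuser account, which is why the example connects as `postgres`.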

NGAS and the New NRAO Archive

  • What is the ideal front end for the new archive, and should we build the new archive differently?
    • Look into replacing the datafetcher task with an NGAS plugin
  • Other examples of front end interfaces?

Plans for Thursday

Action Items

Short Term: 1 to 2 Weeks
Status | Description | Assigned To
OPEN | find NRAO representatives for the NGAS User Group and join it | Stephan
DONE | create an 'ngas-alerts' mailing list | Stephan
OPEN | configure the alert system on the clients and servers to report to the ngas-alerts mailing list | John
DONE | send us the DDL for Postgres | Andreas
DONE | point us at a simple status command we can execute remotely to test whether a node is functional | Andreas
OPEN | clean out the subscription list; unsubscribe one of the CV clients and subscribe it again with an 'epoch' start date | John
OPEN | sanity-check the node configuration files; look at logging and alerts | Andreas, John, Stephan
OPEN | schedule a follow-up meeting on the NRAO side in a week or two to see where we are | Stephan
OPEN | Share existing documentation for EC2 processing, storage installation, and mappings | Andreas
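For the "unsubscribe, then re-subscribe with an 'epoch' start date" item, a minimal sketch of the two requests, issued in order. Everything NGAS-specific here is an assumption to verify against the NGAS User's Manual for our version: the UNSUBSCRIBE and SUBSCRIBE command names, their url and start_date parameters, and the ISO 8601 timestamp format. The server and subscriber URLs are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlencode

# ASSUMPTION: NGAS accepts ISO 8601 timestamps for start_date; "epoch"
# means asking for everything since 1970 to be delivered again.
EPOCH_START = "1970-01-01T00:00:00.000"


def resubscribe_requests(server: str, subscriber_url: str,
                         start_date: str = EPOCH_START) -> list[str]:
    """Return the request URLs, in order, that reset one subscription.

    server:         base URL of the data-providing NGAS node (hypothetical).
    subscriber_url: URL the server delivers files to (hypothetical).
    """
    base = server.rstrip("/")
    # urlencode percent-escapes the subscriber URL so it survives as a
    # single query-string value.
    unsubscribe = f"{base}/UNSUBSCRIBE?" + urlencode({"url": subscriber_url})
    subscribe = f"{base}/SUBSCRIBE?" + urlencode(
        {"url": subscriber_url, "start_date": start_date})
    return [unsubscribe, subscribe]
```

Each URL would then be sent as an HTTP GET, checking that the UNSUBSCRIBE reply reports success before sending the SUBSCRIBE, so that the client is never subscribed twice.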
Medium Term: 1 Month
Status | Description | Assigned To
OPEN | if the earlier subscription task worked, do the same for all the clients | John
OPEN | see what we can do to adopt the newer subscription service in our older systems | Andreas
OPEN | implement Nagios alerts for all the clients, reporting to the mailing list | KScott
OPEN | investigate replacing datafetcher with requesthandler | Rafael
OPEN | Propose which NGAS version (MWA vs. ALMA) to use for the NRAO Archive | Stephan
OPEN | Turn on the new subscription service, either directly or by backporting the plugin, depending on the NGAS version used | Stephan
OPEN | Share profiling notes and strategies | James & Andreas
Long Term: 2+ Months
Status | Description | Assigned To
OPEN | oversee the move of the current NGAS database to a newer server or RDBMS | Stephan
OPEN | investigate having ALMA use ICRAR NGAS; work with Andreas on selling points | Morgan
OPEN | Investigate turning on streaming CRC checks and health checks against the CRC | Stephan
OPEN | Investigate integrating JPEG2000 into CARTA | Andreas
OPEN | Investigate whether GLEAM can be ported to use the standard CASA/pipeline parallel infrastructure | Andreas
OPEN | Consider whether writing a joint ngCasaCore / Data Models paper would be a worthwhile exercise | Jeff & Andreas
OPEN | Investigate ICRAR survey management in time for the VLASS PDR (August) | Stephan
OPEN | Work together to standardize, as much as possible, an Amazon EC2 image and user (PI) documentation by June 30 | James & Andreas
OPEN | Consider a calibration pipeline and NGAS transfer infrastructure between NRAO and ICRAR for CHILES | James & Andreas
OPEN | discuss change control/version control/DevOps versus software development | Stephan, Morgan & David
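For the streaming CRC item, the idea is to update the checksum chunk by chunk as data flows through, so a file is read only once and never held in memory whole. A minimal sketch using CRC-32 from Python 3's zlib module; whether a given NGAS checksum plugin uses this exact CRC variant is an assumption to check, and the function names are hypothetical.

```python
from __future__ import annotations

import io
import zlib


def stream_crc32(src: io.BufferedIOBase, dst: io.BufferedIOBase | None = None,
                 chunk_size: int = 1 << 20) -> int:
    """Return the CRC-32 of everything read from src.

    If dst is given, each chunk is also written to it, so the checksum
    is computed during the copy instead of in a second pass.
    """
    crc = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        if dst is not None:
            dst.write(chunk)
        # Running CRC: pass the previous value back in for the next chunk.
        crc = zlib.crc32(chunk, crc)
    return crc


def crc_matches(src: io.BufferedIOBase, expected: int) -> bool:
    """Health check: recompute a stored file's CRC and compare it."""
    return stream_crc32(src) == expected
```

A periodic health check would then open each archived file with `open(path, "rb")` and call `crc_matches` against the checksum recorded when the file was archived.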

Topic attachments
Attachment | Size | Date | Who | Comment
2015-04-13-ICRAR-Agenda.pdf | 59 KB | 2015-04-17 18:17 | BrianGlendenning | ICRAR Agenda
2015-04-14-ICRAR-meeting.pdf | 2 MB | 2015-04-17 18:21 | BrianGlendenning | Glendenning presentation on current & future NRAO situation
CASA-Parallelization.pdf | 222 KB | 2015-05-01 18:00 | JamesRobnett | Robnett presentation on CASA benchmarking
NRAO_Computing_Plans.pdf | 52 KB | 2015-05-01 18:01 | JamesRobnett | Robnett presentation on near-term computing infrastructure plans
2015-04-ICRAR.pdf | 1 MB | 2015-04-21 12:58 | JeffKern | CASA Presentation
AATPPIAndNGAS.pdf | 315 KB | 2015-04-15 16:19 | StephanWitz | Rafael's AAT and NGAS presentation
NAASC_NGAS_2015-04.pdf | 44 KB | 2015-04-14 14:54 | StephanWitz | NGAS at the NAASC
NRAO2015_NGAS.pdf | 14 MB | 2015-04-15 11:23 | StephanWitz | Andreas' NGAS presentation
NgasCreateRole-PostgreSQL.sql | 633 bytes | 2015-04-16 10:54 | StephanWitz | DDL for role creation for Postgres
NgasCreateTables-PostgreSQL.sql | 6 KB | 2015-04-16 10:54 | StephanWitz | DDL for table creation for Postgres
VLT-MAN-ESO-19400-2739-V3.pdf | 3 MB | 2015-04-14 22:03 | StephanWitz | NGAS User's Manual, circa 2004
Topic revision: r13 - 2015-05-01, JamesRobnett