The Future of AstroComputing

  • Dec. 16 & 17, San Diego Supercomputer Center

Background

The goal of this conference is to clarify the big issues for the next ~5 years in astrophysical computation and data, and to bring together leaders in the field, at the main funding agencies, and in industry with key computational astrophysicists, especially from the University of California and other West Coast institutions including Stanford, Caltech, and the University of Washington.

December 16

  • Opening Remarks (Joel Primack): hipacc.ucsc.edu
    • New Center for HPC AstroComputing at UC.
    • Annual Conferences (this is the second one this year). Two more next year!
    • Summer Schools - Next year hosted by UC Berkeley/LBNL/NERSC
    • Two opportunities per year for funding for focus groups
    • Proposals: NSF Physics Frontiers, NSF Tier 3; looking for more funding opportunities

  • Ted Kisner (LBNL) * Astrophysical Data Processing on Heterogeneous Many-core Systems - Observational Data Analysis!
    • Data processing calculations happen in pipelines. Calculations fall into categories: image operations, timestream operations, spherical geometry.
    • Why many-core systems? (See Kathy Yelick's talk.) Power is a finite resource (flops/watt); traditional CPUs are less useful.
    • Many-core systems today are constrained by data movement across a multi-level memory hierarchy.
    • Goal: scale the relevant calculations to new systems. Libraries aren't cross-platform (CUDA, OpenCL, Intel); many are proprietary and lock you into a platform.
    • One path forward: middle-layer tools. Use OpenCL for cross-platform support; build a collection of kernels for common data processing operations; tune based on detected hardware properties plus a simple parameter-space search; provide a high-level interface to access the libraries. He is working on this now.
    • Gave an example: pixelization of detector pointing. Currently it uses the HEALPix software library; instead, re-implement it in OpenCL. He demonstrated cross-platform kernels with an order-of-magnitude performance increase using a home-grown OpenCL kernel running on GPUs (see the sketch after this talk's notes).
    • Interested in collaboration. I asked about real-time vs. off-line data processing; he showed interest.
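A minimal CPU-side sketch of the pixelization step described above, using the standard HEALPix Python binding (healpy) as a stand-in for the OpenCL kernels demonstrated in the talk; the NSIDE, sample count, and pointing arrays are illustrative, not from the talk.

```python
# Minimal sketch of pixelizing detector pointing with the standard HEALPix
# Python binding (healpy); the talk's point was to replace this hot loop with
# tuned, cross-platform OpenCL kernels.  NSIDE, sample count, and the pointing
# arrays here are illustrative, not from the talk.
import numpy as np
import healpy as hp

nside = 512                         # map resolution (npix = 12 * nside**2)
nsamples = 1_000_000                # timestream length (illustrative)

# Fake detector pointing: colatitude theta in [0, pi], longitude phi in [0, 2*pi)
theta = np.random.uniform(0.0, np.pi, nsamples)
phi = np.random.uniform(0.0, 2.0 * np.pi, nsamples)

# Pointing -> HEALPix pixel index: the operation to offload to a GPU kernel
pixels = hp.ang2pix(nside, theta, phi)

# Bin a timestream into a map (simple unweighted co-add)
tod = np.random.normal(size=nsamples)
sky = np.zeros(hp.nside2npix(nside))
hits = np.zeros_like(sky)
np.add.at(sky, pixels, tod)
np.add.at(hits, pixels, 1.0)
sky[hits > 0] /= hits[hits > 0]
```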

  • Robert Fisher (University of Massachusetts Dartmouth) Barriers to Computing at Scale : Hardware, Algorithms, and Modeling
    • Turbulent flow modeling
    • IBM Bias

  • Michael Schneider (LLNL) Fast generation of ensembles of cosmological N-body simulations via mode resampling
    • Takahashi et al. (2009) used N-body simulations to estimate the matter power spectrum covariance for observations. Not successful: neither PT (perturbation theory) nor HM (the halo model) is accurate enough for parameter estimation.
    • He tried a different approach that was more successful. Interesting, but it seems awfully specific to the science.

  • Alex Szalay (JHU) * How Large Simulations and Databases can Play Nicely with One Another
    • Data access is hitting a wall. Transferring all the data is not feasible; we need better metadata. Databases can help.
    • Scientific data analysis today is mostly done offline, similar to NRAO. Scientists are hitting the "data wall"; universities are hitting the "power wall".
    • Managing continuing growth: software is becoming a new kind of instrument.
    • Cosmological simulation data is growing from 30 TB to 500 TB.
    • Analysis and databases: much of the statistical analysis traditionally performed on files can be better performed in a database.
    • Provided motivations for a relational database: think about the data structures; it speeds up the path to the science answer.
    • Provided a case study: building a tree structure into the database design for galaxy interactions.
    • Visualizing large simulations: remote visualization. It is easier to send an HD 3D video stream than the data. Visualization is becoming I/O limited.
    • Real-time interaction with terabytes: the Aquarius simulations, real-time and interactive on a single GeForce 9800, via hierarchical merging of particles over an octree.
    • Need smarter, more integrated databases.
    • This study used MS SQL Server in collaboration with Microsoft, but there is no reason it would not work with something else, e.g., PostgreSQL (see the sketch after this talk's notes).
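A minimal sketch of the "do the statistics in the database" idea, using SQLite from Python as a stand-in for the SQL Server setup mentioned above; the halos table, its columns, and the synthetic catalog are hypothetical.

```python
# Sketch of pushing a simple statistic into a relational database instead of
# looping over files.  SQLite stands in for the MS SQL Server setup described
# in the talk; the "halos" table, its columns, and the data are hypothetical.
import random
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE halos (
        snapshot INTEGER,
        halo_id  INTEGER,
        mass     REAL,          -- solar masses (illustrative)
        x REAL, y REAL, z REAL  -- comoving position (illustrative)
    );
    CREATE INDEX idx_halos_snap_mass ON halos (snapshot, mass);
""")

# Stand-in for bulk-loading simulation catalogs
rng = random.Random(0)
rows = [(snap, i, 10 ** rng.uniform(9, 15),
         rng.random(), rng.random(), rng.random())
        for snap in (10, 20) for i in range(10_000)]
con.executemany("INSERT INTO halos VALUES (?, ?, ?, ?, ?, ?)", rows)

# A statistic computed inside the database: filtering, grouping, and counting
# happen next to the data, not in client-side loops over files.
query = """
    SELECT snapshot, COUNT(*) AS n_halos, AVG(mass) AS mean_mass
    FROM halos
    WHERE mass > ?
    GROUP BY snapshot
    ORDER BY snapshot
"""
for snapshot, n_halos, mean_mass in con.execute(query, (1.0e12,)):
    print(snapshot, n_halos, f"{mean_mass:.3e}")
```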

  • Istvan Szapudi (IfA, Honolulu) Algorithms for Higher Order Statistics
    • Computational resources grow exponentially and the data grow at the same rate, so the algorithms must scale (n log n) to keep up (see the sketch after this talk's notes).
    • Recent challenges: disks are growing in size, but not as fast in speed.
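To illustrate the n log n point: the FFT turns a naively O(N^2) two-point statistic over N grid points into an O(N log N) one, and the same gridded/FFT machinery underlies fast estimators for the higher-order statistics discussed. A minimal sketch with a synthetic density field (grid size and field are illustrative):

```python
# O(N log N) two-point statistic via FFT on a gridded density field, versus
# the naive O(N^2) pair count.  Grid size and the random field are
# illustrative stand-ins.
import numpy as np

n = 128                                    # grid cells per side (N = n**3 points)
delta = np.random.normal(size=(n, n, n))   # stand-in for a density contrast field

# Forward FFT: O(N log N)
delta_k = np.fft.rfftn(delta)
power = np.abs(delta_k) ** 2 / delta.size

# Spherically average |delta_k|^2 in bins of |k| to estimate P(k)
kx, ky = np.fft.fftfreq(n), np.fft.fftfreq(n)
kz = np.fft.rfftfreq(n)
kmag = np.sqrt(kx[:, None, None] ** 2 + ky[None, :, None] ** 2 + kz[None, None, :] ** 2)
bins = np.linspace(0.0, kmag.max(), 32)
which = np.digitize(kmag.ravel(), bins)
pk = np.bincount(which, weights=power.ravel()) / np.maximum(np.bincount(which), 1)
print(pk[1:6])   # first few band powers of the (white-noise) test field
```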

  • Tony Tyson (UCD)* LSST: Petascale Opportunities and Challenges
    • LSST produces 13TB per night. 200 PB total
    • LSST science drivers.
    • Data will be transferred daily from Chile!
    • Data processing is embarrassingly parallel and considered uninteresting.
    • Classification of events is a challenge.
    • Generating LSST data now for data management pipeline development work. This data is available for people to "play with".
    • Lots of new combinations of analytics to be done with the data.
    • Discovering the unexpected: how do you do that when the data is too large to look at? Only look at a representative part; use an assisted machine-learning process; look for things we know, then find the outliers. It is a data dimension reduction problem.
    • Explore the data by having people look at the metadata space with machines doing the flagging; machines check data quality by looking for outliers (see the sketch after this talk's notes).
    • Use databases for indexing.
    • The SDSC director offered the use of Gordon (the new SDSC machine) for algorithm research for LSST data analysis.
    • DOE has developed sophisticated data exploration algorithms. LSST is trying to form collaborations with them.
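A minimal sketch of the "machines flag, people inspect" workflow described above: score each object's metadata/feature vector for anomalousness and surface only the outliers for human follow-up. The feature matrix is synthetic, and IsolationForest is just one possible detector, not necessarily what LSST will use.

```python
# "Machines flag, people inspect": score each object's metadata/feature vector
# for anomalousness and surface only the outliers.  The feature matrix is
# synthetic and IsolationForest is one possible detector, not LSST's choice.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
typical = rng.normal(0.0, 1.0, size=(10_000, 8))   # "ordinary" objects
weird = rng.normal(6.0, 1.0, size=(20, 8))         # a few genuine outliers
features = np.vstack([typical, weird])

detector = IsolationForest(contamination=0.005, random_state=0).fit(features)
flags = detector.predict(features)                 # -1 = flagged as an outlier

candidates = np.flatnonzero(flags == -1)
print(f"{candidates.size} objects flagged for human inspection")
```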

  • Matthew Turk (UCSD) Open Source Astrophysics and Analysis
    • Reproducibility and collaboration.
    • Use open source methodologies in science.
    • Case study on Enzo (astrophysics simulation code), which uses Mercurial (Hg).
    • Showed off yt (yt.enzotools.org) for visualization (Python, Cython, C); a usage sketch follows.
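A minimal yt usage sketch in the spirit of the demo. The dataset path is a placeholder, and the calls follow current yt conventions (yt-project.org), which differ somewhat from the 2010-era API.

```python
# Minimal yt usage in the spirit of the demo (yt.enzotools.org at the time,
# now yt-project.org).  The dataset path is a placeholder and the field
# naming follows current yt conventions, not the 2010-era API.
import yt

ds = yt.load("DD0046/DD0046")                   # an Enzo output (placeholder path)
slc = yt.SlicePlot(ds, "z", ("gas", "density"))
slc.annotate_grids()                            # overlay the AMR grid structure
slc.save("density_slice.png")

# Derived quantities over the whole domain
ad = ds.all_data()
print(ad.quantities.extrema(("gas", "density")))
```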

  • Julian Borrill (LBNL & UCB)* Cosmic Microwave Background Data Analysis at The Petascale And Beyond
    • Evolution of CMB data sets: getting larger like everything else.
    • Simulation and map-making: generate Monte Carlo realizations of the data. The challenge is achieving scaling of both algorithms and implementations.
    • Optimization: CMB data sets imply strong scaling; I/O and communication costs grow while the data does not.
    • Described the MADmap algorithm.
    • On-the-fly focal-plane pointing for map-making.
    • Communication bottleneck: processes produce sub-maps which must be combined across all processes. MPI_Allreduce does not scale well; they optimized by rolling their own reducer (see the sketch after this talk's notes).
    • Post-Planck, they're more interested in weak scaling
    • There is no solution, only a process...
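A minimal mpi4py sketch of the sub-map combination step: each process bins its share of the timestream into a local map, then the maps are summed across all processes. The plain Allreduce shown here is the step the talk said does not scale and was replaced with a hand-rolled reduction; map size and contents are illustrative.

```python
# Sub-map combination across processes, as in the bottleneck described above:
# each rank bins its share of the timestream into a local map, then the local
# maps are summed so every rank holds the full map.  This plain Allreduce is
# the step said not to scale; map size and contents are illustrative.
# Run with e.g.: mpirun -n 4 python submap_reduce.py  (hypothetical filename)
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
npix = 12 * 512 ** 2                      # HEALPix NSIDE=512 map (illustrative)

local_map = np.random.normal(size=npix)   # stand-in for this rank's binned sub-map
full_map = np.empty_like(local_map)

# Naive combination: sum the sub-maps across all ranks
comm.Allreduce(local_map, full_map, op=MPI.SUM)

if comm.Get_rank() == 0:
    print("combined", npix, "pixels across", comm.Get_size(), "ranks")
```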

  • Peter Nugent (LBNL)* Wide Field Surveys and Real-Time Analysis*
    • Searching for transients.
    • Using Palomar Oschin Schmidt telescope. Analysis at NERSC
    • Wide variety of science on many telescopes.
    • Pipeline: 50 GB/night (128 MB every 90 s), shipped via network transfer to NERSC for "real-time analysis".
    • Machine-learning classification (see the sketch after this talk's notes).
    • Survey has some EVLA data.
    • Exploring the Aster database, which allows one to embed machine-learning algorithms.
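A minimal sketch of the machine-learning classification step (a "real vs. bogus" style classifier on features extracted from difference images). The features, labels, and choice of random forest are synthetic stand-ins; the survey's actual feature set and classifier are not described in these notes.

```python
# Toy "real vs. bogus" classifier for transient candidates: features, labels,
# and the random-forest choice are synthetic stand-ins, not the survey's
# actual pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 5000
features = rng.normal(size=(n, 10))        # e.g. shape, flux ratio, PSF metrics
labels = (features[:, 0] + 0.5 * features[:, 3] > 0.7).astype(int)  # toy "real" label

X_train, X_test, y_train, y_test = train_test_split(features, labels, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```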

  • Kathy Yelick (NERSC & LBNL) Keynote: How HPC architecture and software are evolving toward exascale
    • Exascale is about energy-efficient computing. Usual scaling predicts 200 MW for a 1-exaflop machine by 2018; the goal is 20 MW.
    • Challenges to exascale: algorithms need to change.
    • Many tiny cores are more power-efficient than fewer large cores.
    • Communication-avoiding algorithms: consider sparse iterative methods. Can we lower data movement costs? (See the sketch after this list.)
    • DOE is establishing collaborative computing environments that bring together hardware engineers, algorithm developers, software developers, and scientists to rewrite flagship codes and then focus on other codes.
    • A paradigm shift similar to the move away from vector programming.
    • Seeing more interest in Partitioned Global Address Space (PGAS) languages, such as Chapel.
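To make the data-movement point concrete, here is a back-of-the-envelope model (not from the talk) of the classic communication-avoiding result for dense matrix multiply: blocking the computation so that tiles fit in fast memory cuts the words moved between slow and fast memory by roughly a factor of sqrt(M/3)/2. The same principle, reorganizing work to reduce movement through the memory hierarchy and the network, is what the sparse iterative (e.g., s-step Krylov) work applies; the numbers below are illustrative.

```python
# Back-of-the-envelope illustration (not from the talk) of the
# communication-avoiding idea, using dense matrix multiply as the simplest
# model: blocking so that three b x b tiles fit in fast memory reduces the
# words moved between slow and fast memory from ~n^3 to ~2*n^3/b.
n = 16_384              # matrix dimension (illustrative)
M = 32 * 1024 // 8      # fast-memory capacity in 8-byte words (e.g. a 32 KB cache)

naive_words = n ** 3                          # ~one slow-memory access per multiply-add
b = int((M // 3) ** 0.5)                      # block size: 3 tiles of b*b words must fit
blocked_words = 2 * n ** 3 // b + 2 * n ** 2  # stream A and B tiles, touch C once

print(f"naive   : {naive_words:.3e} words moved")
print(f"blocked : {blocked_words:.3e} words moved "
      f"(block size {b}, ~{naive_words / blocked_words:.0f}x less traffic)")
```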

December 17

  • Risa Wechsler (Stanford University) Simulation Challenges for Next Generation Galaxy Surveys
    • Data is harder than flops.

  • More simulation talks....

  • Robert Harkness (UCSD/SDSC) Petascale Cosmology using ENZO
    • Robert is Kraken's biggest user, running codes on 93,750 cores in dedicated mode.
    • The talk consisted mostly of simulations, but one interesting point: he said his weapon of choice going forward on Blue Waters (NCSA's expected 20-petaflop machine) will be PGAS via Unified Parallel C (UPC).

  • Michael Norman (SDSC) The Future of Enzo*
    • SDSC resources: ESnet, NSF TeraGrid, CENIC, Internet2, and StarTap networks. Hosts dozens of "data resources".
    • New project called Data Oasis: a site-wide file system growing from 2 PB (2011) to 6 PB (2013).
    • SDSC has a Data-Intensive Strategy.
    • Systems: Trestles (100 TFlop, 20 TB RAM, 150 TB Disk, compute cluster on TG), Dash (5 TFlop, 4 TB of shared memory, prototype to Gordon), Gordon
    • Gordon: First Data-Intensive HPC system in production fall 2011. (250 TFlops, 64 TB Ram, 256 TB SSD, 4 PB Disk) Virtual shared memory supernodes using the SSDs.
    • The new director of SDSC, Michael Norman, is an astrophysicist; naturally he wants SDSC to support his science more.
    • Cello - The Next Gen. of Enzo will use an object-oriented design.

  • Trilinos - a collection of C++ libraries for scientific and engineering computations on large-scale problems.
-- MikeMcCarty - 2010-12-16