Chaosmint.com
Detailed Notes from Virginia Tech Supercomputer Presentation: Part 1
Sept 5, 2003
This month, Virginia Tech announced that it would build one of the top supercomputers in the world from approximately 1,100 of Apple's new dual 2.0GHz Power Mac G5s. As expected, this drew the interest of the Apple and tech communities.
The following are extensive notes from a September 4th, 2003 information session on the project, held at the Donaldson Brown Hotel and Conference Center auditorium.
The notes were taken by Myuuchan, a second-year engineer at Virginia Tech, and submitted by Cless. Thanks for the detailed notes!
Bracketed comments are the note-taker's own.
"Terascale Computing Facility"
Opening remarks:
An advisory committee has been appointed to govern use of the facility. [Yay.]
Slide One
Computational Science and Engineering Institute[?]
Goals:
- to build a world class facility
- to provide a high-performance network to tie in with computational grids
- connect supercomputers, visualizations, and data storage
Slide Two
Goals and Scope
- support research in computational science and engineering
- dual usage: production & experimental (apps)
- create beneficial collaboration [scribble scribble, I need to write things I can read]
Slide Three
TCF
- based on 64-bit architecture
- employs a high-bandwidth, low-latency communications fabric
- operational for production [apps?] in Fall 2003; fully operational by the end of the year
Slide Four
Choosing the Right Architecture
- cost vs. performance (purely)
- total cost of $5.2 million includes the system itself, memory, storage, and communication fabrics
- one of the cheapest systems of its kind
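To put the $5.2 million figure in perspective, here is a rough back-of-the-envelope sketch of the all-in cost per node and per processor. It assumes exactly 1,100 dual-processor nodes (the announcement says "approximately"), and the names are illustrative only. Since the total includes memory, storage, and the communication fabric, the result is not the price of a single G5.

```python
# Rough cost-per-unit arithmetic from the figures in the notes.
# Assumption: exactly 1,100 nodes; the announcement only says "approximately".
TOTAL_COST_USD = 5_200_000
NODE_COUNT = 1_100
CPUS_PER_NODE = 2  # each node is a dual-processor G5


def cost_per_unit(total_usd: float, units: int) -> float:
    """Spread the total system cost evenly over a number of units."""
    return total_usd / units


per_node = cost_per_unit(TOTAL_COST_USD, NODE_COUNT)
per_cpu = cost_per_unit(TOTAL_COST_USD, NODE_COUNT * CPUS_PER_NODE)

print(f"All-in cost per node: ${per_node:,.0f}")
print(f"All-in cost per CPU:  ${per_cpu:,.0f}")
```

That works out to roughly $4,700 per node, fabric and storage included.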
Slide Five
Architectural Options [or something like that]
- Dell - too expensive [one of the reasons for the project being so "hush hush" was that Dell was exploring pricing options during bidding]
- Sun (SPARC) - required too many processors, also too expensive
- IBM/AMD (Opteron) - required twice the number of processors and was twice the price in the desired configuration; had no chassis available
- HP (Itanium) - ditto
- Apple (IBM PPC970) - system available with chassis for lowest price
Slide Six
Nodes
- dual PPC970 2GHz
- Each node has:
  - 4 GB RAM
  - 160 GB Serial ATA
- 176 TB total secondary storage
- 4 head nodes
- 1 management node
- most powerful "homebuilt" supercomputer in the world
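The per-node figures on this slide can be multiplied out as a sanity check. The sketch below assumes exactly 1,100 compute nodes and decimal units (1 TB = 1,000 GB). The peak-performance line rests on an assumption that is not in the notes: that each PPC970 can retire 4 double-precision floating-point operations per cycle (two FPUs, each doing a fused multiply-add).

```python
# Aggregate resources implied by the per-node figures on the slide.
# Assumptions (not from the presentation): exactly 1,100 compute nodes,
# decimal units, and 4 double-precision flops per cycle per PPC970.
NODE_COUNT = 1_100
RAM_GB_PER_NODE = 4
DISK_GB_PER_NODE = 160
CPUS_PER_NODE = 2
CLOCK_GHZ = 2.0
FLOPS_PER_CYCLE = 4  # assumed: two FPUs, one fused multiply-add each

total_ram_tb = NODE_COUNT * RAM_GB_PER_NODE / 1000
total_disk_tb = NODE_COUNT * DISK_GB_PER_NODE / 1000
peak_tflops = NODE_COUNT * CPUS_PER_NODE * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000

print(f"Total RAM:        {total_ram_tb:.1f} TB")
print(f"Total disk:       {total_disk_tb:.0f} TB")
print(f"Theoretical peak: {peak_tflops:.1f} TFLOPS (under the stated assumption)")
```

The disk total comes out to 176 TB, which matches the slide and suggests the 1,100 figure is the actual node count. The peak number is a theoretical ceiling only; measured benchmark results would be lower.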
Slide Seven
Reliability
- commodity clusters have issues due to the large number of units
- VT developed a transparent fault tolerance system called "Deja Vu"
- collaborated with PSC
- can recover from just about every failure, e.g. someone hits the wrong switch, OS crashes, things fail in general, power loss, etc.
- This system has been ported to the G5 and will be deployed in the TCF
Slide Eight
Primary C [omputer??] Architecture
- working with Mellanox for InfiniBand solutions
- the system is [obviously] based on InfiniBand technology
- full switch network, 20 Gbps, full duplex
- 24 96-port switches in a "fat-tree" topology
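As a rough plausibility check on the switch count, the sketch below budgets the available switch ports. It assumes 1,100 compute nodes each take one switch port and that every switch-to-switch link consumes two ports (one on each end). How the ports were actually divided between nodes and uplinks was not covered in the presentation, so this only gives an upper bound.

```python
# Port budget for 24 switches of 96 ports each.
# Assumptions (not from the presentation): 1,100 endpoints, one port each;
# every inter-switch link uses one port on each of two switches.
SWITCH_COUNT = 24
PORTS_PER_SWITCH = 96


def max_inter_switch_links(endpoints: int) -> int:
    """Upper bound on switch-to-switch links after attaching all endpoints."""
    total_ports = SWITCH_COUNT * PORTS_PER_SWITCH
    spare_ports = total_ports - endpoints
    if spare_ports < 0:
        raise ValueError("not enough switch ports for the endpoints")
    return spare_ports // 2


total_ports = SWITCH_COUNT * PORTS_PER_SWITCH
links = max_inter_switch_links(1_100)
print(f"Total switch ports:          {total_ports}")
print(f"Max inter-switch links left: {links}")
```

With these numbers, a little under half of all ports are taken by nodes, leaving the rest for the links between switch tiers that a fat tree depends on.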
Slide Nine
Secondary Com[????munications? ...puter?]
- Gigabit fast Ethernet management backplane
Slide Ten
National Lambda Rail (nationwide optical network)
- all networking equipment [at least for this locale] is Cisco
- the following organizations are involved with NLR:
  - CENIC
  - Cisco
  - Duke
  - Florida Consortium
  - Georgia Tech
  - Internet2
  - MATP
  - PNWGP [pacific northwest group]
  - Texas [I imagine a university, not the whole state. "Yes, Texas backs National Lambda Rail, yessir."]
- additional player: PSC [yeah I don't know either]
- VT leads Washington DC point of presence
- DC node goes active in the first half of 2004
Slide Eleven [maybe]
Software
- Mac OS X
- Why not Linux? Not enough support.
- Mellanox does InfiniBand drivers and HCA
- MPI (parallel communications libraries)
- Argonne National Laboratory to get MPI-2 for the system
- C, C++ compilers - IBM xlc and gcc 3.3
- Fortran 95/90/77 Compilers - IBM xlf and NAGWare
Slide Twelve [I should give up soon]
Sustainability Model
Organizations that could make use of the facility or have already expressed interest:
- Federal organizations
  - NSF, Cyberinfrastructure Program
  - NIH, DOE, DARPA, DoD, AFOSR[??], ONR [?????]
- Industry (the system can attract industrial interest)
- External Research Partners
  - National Labs, Supercomputer Centers, NASA, NIA
Slide Thirteen [I never learn]
Access
- internal access not based solely on research funding contributed
- priorities might be established based on contribution at a later time
- provide easy access for investigators [I missed the end of this line]
- external access determined on a cost recovery basis
Slide Fourteen
Future
- Computational Science and Engineering is a long-term project
- Current facility will be followed with a second in 2006
Slide Fifteen
Timeline
- Oct. 1st - preliminary operations
- Oct. 1st - Mid Nov. - performance optimization and benchmarking
- Mid Nov. - available for initial apps ("hero-users" [heh, i.e. the poor suckers who test out the initial config])
- available to any user with operating MPI coverage [huh?]
- Jan 2004 - fully operational
...Continued: Detailed Notes from Virginia Tech Supercomputer Presentation: Part 2