An IBM patent filing sheds light on the architecture of the upcoming BlueGene/Q “Sequoia” system, as well as a potential successor that could become the first 100 PFlop/s supercomputer: the system would have almost 8.4 million compute cores and consume almost 16 MW.
IBM is well on its way to achieving the next milestone in supercomputing: BlueGene/Q is estimated to hit a peak performance of 20 PFlop/s when it goes into operation as the “Sequoia” supercomputer at the Lawrence Livermore National Laboratory in 2012. However, the architecture is now described in a patent that lifts the compute performance to 107 PFlop/s. This would be about 12 times the compute horsepower posted by K computer, a Japanese system that claimed the top spot in the Top500 ranking back in June with a peak performance of 8.8 PFlop/s. Five years ago, the industry-leading system was BlueGene/L, which stood at just 280.6 TFlop/s. If IBM’s calculations are correct, this new BlueGene/Q-based system could be 381 times faster than BlueGene/L.
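The two ratios quoted above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch that uses the peak figures as reported in this article; the constant and function names are made up for illustration.

```python
# Peak performance figures as quoted in the article, in TFlop/s.
PATENT_SYSTEM_TFLOPS = 107_000.0  # 107 PFlop/s (system described in the patent)
K_COMPUTER_TFLOPS = 8_800.0       # 8.8 PFlop/s
BLUEGENE_L_TFLOPS = 280.6         # 280.6 TFlop/s


def speedup(new_tflops: float, old_tflops: float) -> float:
    """Return how many times faster the new system is than the old one."""
    return new_tflops / old_tflops


vs_k_computer = speedup(PATENT_SYSTEM_TFLOPS, K_COMPUTER_TFLOPS)
vs_bluegene_l = speedup(PATENT_SYSTEM_TFLOPS, BLUEGENE_L_TFLOPS)

print(f"vs. K computer: {vs_k_computer:.1f}x")
print(f"vs. BlueGene/L: {vs_bluegene_l:.0f}x")
```

Rounded, the results match the "about 12 times" and "381 times" figures in the text.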
A massive patent filing (#20110219208) from January of this year, with more than 649 pages and 2263 individual claims and descriptions, explains that the basic architecture of the system consists of 1024 compute node ASICs per rack, built into 512 racks (a total of 524,288 nodes and 8,388,608 cores). Each compute node holds BlueGene/Q’s 4-way hardware-threaded quad-core PowerPC A2 CPU architecture that effectively creates a processing system with 16 cores for each node. IBM said that each unit, in fact, has 18 cores: one core is used to improve chip yield, one is used for system control and 16 are available for actual computation. Each node includes 32 MB of memory, which is sliced into 16 equal parts to be accessed by each core. The total memory bandwidth per node is 563 GB/s. In comparison, the Sequoia system will have 1,572,864 cores, 98,304 compute nodes and 96 racks.
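The node and core totals for both systems follow directly from the rack counts. The sketch below reproduces them, assuming 1024 nodes per rack and 16 compute cores per node as stated in this article; the names are illustrative only.

```python
# Figures taken from the article.
NODES_PER_RACK = 1024
COMPUTE_CORES_PER_NODE = 16  # the spare and system-control cores are not counted


def system_size(racks: int) -> tuple[int, int]:
    """Return (compute nodes, compute cores) for a system with the given rack count."""
    nodes = racks * NODES_PER_RACK
    cores = nodes * COMPUTE_CORES_PER_NODE
    return nodes, cores


patent_nodes, patent_cores = system_size(512)   # system described in the patent
sequoia_nodes, sequoia_cores = system_size(96)  # Sequoia

print(f"Patent system: {patent_nodes:,} nodes, {patent_cores:,} cores")
print(f"Sequoia:       {sequoia_nodes:,} nodes, {sequoia_cores:,} cores")
```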
Each node or “cell” is a self-contained SoC processing system that integrates “a plurality (e.g., four or more) of processing elements each of which includes a central processing unit (CPU), a plurality of floating point processors, and a plurality of network interfaces.” There will also be 1 GB of 1.33 GHz DDR3 memory.
The nodes are “interconnected by links to form a [5-dimensional or 'hypercube'] torus network [with direct memory access or DMA], each processing node being connected by a plurality of links including links to all adjacent processing nodes; enable the computing system to be partitioned into multiple, logically separate computing systems.” As a result, BlueGene/Q can be split into several instances of supercomputers that work on multiple tasks simultaneously. BlueGene/L, by the way, came with a 3-dimensional torus interconnect alongside auxiliary networks and I/O.
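To illustrate what "links to all adjacent processing nodes" means in a torus, here is a minimal sketch that lists the neighbors of a node with wraparound in every dimension. The 4x4x4x4x4 shape is a hypothetical example, not the actual BlueGene/Q layout, and the function is not taken from the patent.

```python
from typing import List, Tuple

Coord = Tuple[int, ...]


def torus_neighbors(coord: Coord, dims: Coord) -> List[Coord]:
    """Return the distinct neighbors of `coord` in a torus of shape `dims`.

    Each dimension contributes a -1 and a +1 neighbor, wrapping around at the edges.
    """
    neighbors: List[Coord] = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            candidate = list(coord)
            candidate[axis] = (candidate[axis] + step) % size
            neighbor = tuple(candidate)
            # Skip duplicates (dimension of size 2) and self-links (size 1).
            if neighbor != coord and neighbor not in neighbors:
                neighbors.append(neighbor)
    return neighbors


# Hypothetical torus shapes, chosen only for illustration.
dims_5d = (4, 4, 4, 4, 4)
dims_3d = (4, 4, 4)

links_5d = torus_neighbors((0, 0, 0, 0, 0), dims_5d)
links_3d = torus_neighbors((0, 0, 0), dims_3d)

print(len(links_5d), "links per node in a 5-D torus")
print(len(links_3d), "links per node in a 3-D torus")
```

With more than two nodes along every dimension, a 5-dimensional torus gives each node 10 direct neighbors, compared with 6 in a 3-dimensional torus such as the one used in BlueGene/L.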
IBM claims that “novel packaging technologies are employed for the supercomputing system that enables unprecedented levels of scalability, permitting multiple networks and multiple processor configurations. […] Smaller development, test and debug partitions may be generated that do not interfere with other partitions.”
According to IBM, each node will consume about 30 watts of power, which is pretty impressive for a complete 16-core system, but adds up to a substantial amount in an 8,388,608-core environment. The ASICs alone will consume more than 15.7 MW of power, not including network, storage and cooling requirements, which suggests that this system should come with its own power plant. Sequoia is estimated to draw about 6 MW of power.
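The 15.7 MW figure is simply the per-node power multiplied by the node count. A quick sketch, using only the numbers given in this article (30 W per node, 524,288 nodes, 16 compute cores per node) and illustrative variable names:

```python
# Figures taken from the article.
WATTS_PER_NODE = 30.0
TOTAL_NODES = 524_288
COMPUTE_CORES_PER_NODE = 16

# Power drawn by the compute ASICs only; network, storage and cooling are excluded.
asic_power_watts = WATTS_PER_NODE * TOTAL_NODES
asic_power_megawatts = asic_power_watts / 1_000_000
watts_per_core = WATTS_PER_NODE / COMPUTE_CORES_PER_NODE

print(f"ASIC power: {asic_power_megawatts:.2f} MW")
print(f"Per compute core: {watts_per_core:.3f} W")
```

That works out to roughly 15.73 MW for the ASICs, or a little under 2 W per compute core.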
Wolfgang Gruener in Products on September 09