
Voices: The future of computers: Power wall

( 01 Feb 2012 )
Russell Fish III, EDN

The quest for speed is not new.

"Captain," the old lady said excitedly, "I will give you some of my lard to use if you will beat that riverboat in the race."

"Madame," said the Captain, "you have a deal."

The Captain had the lard put on the logs in the boiler, and the old lady watched in excitement as the fire in the boiler grew hotter, the great wheels turned faster, the riverboat quivered and shook. The boats drew even, but they could not pass their rival.


Other than boiler explosions and extensive loss of life, steamboat racing was not that different from a CPU over-clocking convention. More speed, more heat ... until something breaks. Both competitions really concern money. Faster boats and faster chips are worth more.

It's all about energy
Energy is the ability to do work, and it is power exerted over a period of time. In computers, power is either “useful” or “wasted”. Useful power does work when the computer executes instructions or performs other operations. It is usually called "dynamic power" because it varies with the speed of computer operation. Dynamic power constantly stores and removes electrical charges from the billions of capacitors found in a current microprocessor. If the microprocessor quits clocking, dynamic power consumption stops.

Wasted power is consumed whenever a computer is powered on, even if the clock is stopped. For this reason it is usually called "static power". In integrated circuits, static power is consumed by the billions of resistors inherent in the leaky transistors that constitute a circuit. Even if a microprocessor is dead stopped, the static power consumption is draining energy from its power supply or battery.

Microprocessor power consumption is doubly bad for users thanks to the First Law of Thermodynamics. In essence, energy is never destroyed, it just changes form. In the computer world this means the hundred watts being consumed by the latest multicore Xeon changes form to 100W of heat.

Why we should care about power consumption
One obvious reason we should care is that power is neither free nor infinite. On the small scale, Apple's A4 powers the iPad. Intel's Atom powers the HP Slate. One operates for 10hr, and the other for five. One is the overwhelming market leader and the other is ... not.

On the larger scale, supercomputers of today are the video games of tomorrow. Study of their problems can give insight into future problems of computers of all sizes.

Supercomputer centers and cloud server farms purchase energy around 10 cents a kilowatt hour. The power consumption average of the Top 10 supercomputers is 4.3MW. If those machines run at full capacity, the power budget for a single year is $3.7M. In addition, the 4.3MW of heat generated must be disposed of in some fashion, usually by re-circulating refrigerated water through the chips.
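
The arithmetic behind that annual figure can be checked with a short Python sketch. The function name and the assumption of a constant, year-round load are illustrative and not from the original analysis; only the 4.3MW and 10 cents per kilowatt hour come from the text above.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_energy_cost(power_mw: float, dollars_per_kwh: float) -> float:
    """Return the yearly electricity cost in dollars for a constant load."""
    power_kw = power_mw * 1000.0
    return power_kw * HOURS_PER_YEAR * dollars_per_kwh


# Figures quoted in the article: 4.3 MW average, 10 cents per kWh.
cost = annual_energy_cost(power_mw=4.3, dollars_per_kwh=0.10)
print(f"${cost / 1e6:.2f}M per year")
```

The result is a little under $3.8M, in line with the figure quoted above, and it covers only the electricity for the machines themselves, not the cooling.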

The Swiss National Computer Center pumps water from Lake Lugano at 45m depth at a temperature of 6 degrees Celsius. The pump is connected by an 80cm pipe to two 13 ton, 6m high suction baskets. Three pumps push nearly 7,000 gallons per minute to the computer center.

Why we should care about heat
Heat is the enemy of engines and electronics. In engines heat compromises the mechanical characteristics of materials causing reduced reliability and eventual failure. In electronics, heat alters both the mechanical and electrical characteristics of semiconductor circuits, circuit packaging, and electrical wiring.

In 2008 Charlie Demerjian analyzed overheating of NVIDIA GPUs. In short, as the multicore GPU chips got really hot, the electrical connections between the die and the package substrate separated. As much as 40 percent of some products experienced early failure.

Early in its life, Microsoft's Xbox 360 faced a similar problem that users called the "Red Ring of Death". Infant mortality of early systems was reported to be between 23.7 and 54.2 percent. Some analysts claimed the failures resulted from the multicore IBM Cell derivative overheating, which caused solder connections to melt and separate from the circuit board, much like NVIDIA's problem.

Temperature also affects performance in the following non-catastrophic ways:
- Increasing temperature decreases transistor speed.
- Increasing temperature increases transistor leakage.

Imagine running across a basketball court unimpeded. Now, imagine running across the same court while being pummeled by basketballs hurled from all directions. This is what happens to electrons that carry charge when a transistor turns on. The greater the temperature, the faster the basketballs come. You must work harder and harder to cross the court. So must the electrons carrying the logical information.

The increasing leakage with temperature causes circuits that are overheating to consume more power. As they consume more power, the die temperature increases further. As the temperature increases, so does the leakage, and so on, until either the thermal protection circuit kicks in or the chip destroys itself.
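
This feedback loop can be sketched as a toy Python model. Everything numeric here is a hypothetical assumption chosen for illustration: leakage is assumed to double every 20 degrees, die temperature is assumed to equal ambient plus a thermal resistance times total power, and the 105 degree trip point stands in for the thermal protection circuit. None of these values come from a real chip.

```python
def settle_temperature(dynamic_w, leak_w_at_ref, theta_c_per_w,
                       ambient_c=25.0, ref_c=25.0, doubling_c=20.0,
                       trip_c=105.0, max_steps=1000):
    """Iterate the leakage/temperature loop until it settles or trips.

    Returns (temperature_c, tripped). All default values are hypothetical.
    """
    temp_c = ambient_c
    for _ in range(max_steps):
        # Assumed model: leakage doubles every `doubling_c` degrees.
        leak_w = leak_w_at_ref * 2.0 ** ((temp_c - ref_c) / doubling_c)
        # Die temperature implied by the current total power.
        new_temp_c = ambient_c + theta_c_per_w * (dynamic_w + leak_w)
        if new_temp_c >= trip_c:
            return new_temp_c, True       # thermal protection kicks in
        if abs(new_temp_c - temp_c) < 0.01:
            return new_temp_c, False      # loop has converged
        temp_c = new_temp_c
    return temp_c, False                  # gave up without converging


# Same chip, good cooling (0.3 C/W) versus poor cooling (0.8 C/W).
for theta in (0.3, 0.8):
    temp, tripped = settle_temperature(50.0, 10.0, theta)
    state = "tripped thermal protection" if tripped else "settled"
    print(f"theta={theta} C/W: {state} at about {temp:.1f} C")
```

With the better heat sink the loop converges to a stable temperature. With the worse one, each pass adds more leakage than the cooling can remove and the model runs away until the trip point.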

Up against the wall, just how bad is it?
The smart guys say it's bad.

From the National Research Council, "The Future of Computing Performance, Game Over or Next Level?":
"Even as multicore hardware systems are tailored to support software that can exploit multiple computation units, thermal constraints will continue to be a primary concern."

"...fundamental power and energy constraints mean that even the best efforts might not yield a complete solution."

"...there appears to be little opportunity to significantly increase performance by improving the internal structure of existing sequential processors."

"Even when new parallel models and solutions are found, most future computing systems' performance will be limited by power..."

"...it is an open question whether power and energy will be showstoppers..."


From the DARPA ExaScale Computing Study:
"...there are four major challenges..."

"The Energy and Power Challenge is the most pervasive of the four..."

"...microprocessors of the future will not only NOT run faster than today, they will actually decline in clock rate."


Study chairman Peter Kogge lent his cheery view:
"The party isn't exactly over, but the police have arrived, and the music has been turned way down."

The uniform conventional wisdom of the experts seems to be, "All is woe".

Another brick in the wall
To understand this dystopian view of the future of computers, we can examine various attempts at mitigating power consumption and the limits of those attempts.

The most important efforts to reduce power consumption have come from semiconductor process improvements. Cheap reliable MOS (metal oxide semiconductor) transistors enabled invention of the monolithic microprocessor.

We can thank a handful of men for the gift of MOS transistors and the technology to manufacture them. An incomplete list includes:

Jean Hoerni – inventor of the "planar process"
Bob Noyce – inventor of the "silicon integrated circuit"
Kerwin, Klein, and Sarace – inventors of the "self aligned silicon gate"
Andy Grove and others – solvers of heavy metal contamination

Most contemporary engineers know Grove only as Intel's management guru and author of "High Output Management". A small group that goes back 40 years knows he literally wrote the book on semiconductor device physics.

An even smaller group knows he was a major intellect in solving the "game over" technical problem that could have prevented the microprocessor from ever being invented.

In the late 1960s MOS integrated circuits could be manufactured fairly easily. However, in many cases after a few weeks or months, the devices would cease to function. Some mechanism was causing the transistors to leak more current over time, and at some point, the transistors would fail as shorts. This low reliability made the devices commercially unsuitable.

Grove figured out that heavy metal ions were contaminating the circuits and that these ions were migrating over time in a way that caused the failure. This led to the introduction of "gettering", a process step that removed the heavy metals causing the contamination, and a step that is used to this day.

"Moore's Law" (or more accurately Moore's Trend) describes the tendency for the number of transistors that can economically be placed on an integrated circuit to double every two years.

The technological insight that makes Moore's Law work is called "Dennard Scaling," after IBM scientist Robert Dennard. In his 1974 paper published in the IEEE Journal of Solid State Circuits, Dennard postulated:

MOSFETs continue to function as voltage-controlled switches while all key figures of merit such as layout density, operating speed, and energy efficiency improve provided geometric dimensions, voltages, and doping concentrations are consistently scaled to maintain the same electric field.

To the layman this means that as you make transistors smaller, they get better. Furthermore, as the power supply voltage decreases, power consumption decreases with the square of the voltage. This is the reason that Intel spends billions to be the first company to produce commercial volumes of a smaller process. Dennard Scaling predicts they will always have the fastest parts, the lowest power, and the lowest costs.
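
A rough Python sketch makes the scaling argument concrete. It assumes the standard textbook estimate for CMOS switching power, P = a * C * V^2 * f, which is not spelled out in this article, and it applies ideal constant-field scaling: shrinking dimensions by a factor k divides capacitance and voltage by k and multiplies frequency by k. The starting values for one switching node are hypothetical.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """Textbook CMOS switching-power estimate: P = a * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


def dennard_scale(capacitance_f, voltage_v, frequency_hz, k):
    """Apply ideal constant-field scaling by a factor k > 1."""
    return capacitance_f / k, voltage_v / k, frequency_hz * k


# Hypothetical starting point for a single switching node.
c, v, f = 1e-15, 1.2, 2e9
p_before = dynamic_power(c, v, f)
p_after = dynamic_power(*dennard_scale(c, v, f, k=1.4))

# Power per node falls by k^2 while the node runs k times faster.
print(f"power per node falls by {p_before / p_after:.2f}x")

# Halving the voltage alone cuts switching power to one quarter.
print(f"half voltage: {dynamic_power(c, v / 2, f) / p_before:.2f} of original")
```

Under these ideal assumptions, k^2 as many transistors fit in the same area, each using 1/k^2 of the power, so power per unit area stays constant while speed improves. That is the free lunch the rest of this section describes running out.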



For 40 years the microprocessor business has lived off Dennard, but now transistor dimensions are approaching the atomic level and scaling is reaching its limits.

The gate is the electrical connection that controls the MOS switch. The gate is separated from the rest of the MOS transistor by an insulating layer. As this layer gets thinner, the transistor performance improves. However, at a certain point, the insulating layer is so thin that it leaks electrons.

Silicon dioxide was the insulator of choice for three decades, but as gate leakage became an increasing problem, it was replaced by other materials that were less likely to leak. These materials were known as high-k (for high dielectric constant).

As power supply voltages were reduced, the voltage which caused the transistor to turn on was necessarily reduced. Currently the threshold voltage is as low as 0.3V. The closer that voltage gets to zero, the more difficult it becomes to turn the transistor completely off.

In fact, the transistors are so leaky in a current multicore microprocessor that it consumes 50W of power while standing still.

Reducing the supply voltage causes additional problems by increasing the current for the same power level. For an oversimplified example, assume a 100W multicore microprocessor running on 1V. The power pins must provide 100 amps of current. For this reason, power pins constitute 70 percent of the pins on some packages.

In addition, these huge currents cause voltage droop across the internal power buses. If the droop is great enough, the circuits it connects will cease operation.
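
The current and droop arithmetic can be sketched in Python using nothing more than P = V * I and Ohm's law. The 1 milliohm bus resistance and the comparison voltages of 3.3V and 1.8V are hypothetical values picked for illustration. The 100W at 1V case is the oversimplified example from the text.

```python
def supply_current(power_w, voltage_v):
    """Current the power pins must deliver: I = P / V."""
    return power_w / voltage_v


def droop_fraction(current_a, bus_resistance_ohm, voltage_v):
    """Fraction of the supply voltage lost across the power bus (I * R / V)."""
    return current_a * bus_resistance_ohm / voltage_v


BUS_RESISTANCE_OHM = 0.001  # hypothetical 1 milliohm power bus

for vdd in (3.3, 1.8, 1.0):
    amps = supply_current(100.0, vdd)
    droop = droop_fraction(amps, BUS_RESISTANCE_OHM, vdd)
    print(f"{vdd:.1f} V supply: {amps:6.1f} A, droop {droop:.1%} of supply")
```

At the same power, lowering the supply raises the current in inverse proportion, and the droop as a fraction of the supply grows with the inverse square of the voltage. In this sketch a bus that loses under one percent at 3.3V loses ten percent at 1V.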

Circuit tricks
Computer architects have some tricks that further reduce power beyond that achievable by process improvements alone.

Some microprocessors adjust the internal voltage of different sections of the logic depending upon what operation is being performed. This is known as "Dynamic Voltage Scaling". By running certain sections at a lower voltage than the rest of the chip, power in that section can be reduced.

When taken to the limit, entire sections of a chip can be powered down when they are not needed. This trick is less helpful than it used to be since many microprocessors have 2/3 of their area filled with memory caches. Caches cannot be powered down without losing their contents.

"Dynamic Frequency Scaling" similarly reduces the clock speed of some circuits under certain conditions. Dynamic power reduces linearly with reductions in clock speed. At state of the art process nodes, static power is greater than dynamic power for many microprocessors. As a result, frequency scaling is less helpful than in the past.

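A small Python sketch shows why frequency scaling loses its punch once static power dominates. It assumes a hypothetical 100W part split evenly between static and dynamic power, and a compute-bound job whose runtime is inversely proportional to clock speed. Both assumptions are simplifications for illustration.

```python
def total_power(static_w, dynamic_w_at_full, clock_fraction):
    """Static power is fixed; dynamic power scales linearly with clock."""
    return static_w + dynamic_w_at_full * clock_fraction


def energy_for_job(static_w, dynamic_w_at_full, clock_fraction,
                   seconds_at_full=1.0):
    """Energy in joules for a compute-bound job (runtime ~ 1 / clock)."""
    runtime_s = seconds_at_full / clock_fraction
    return total_power(static_w, dynamic_w_at_full, clock_fraction) * runtime_s


# Hypothetical chip: 50 W static, 50 W dynamic at full clock.
for fraction in (1.0, 0.5):
    watts = total_power(50.0, 50.0, fraction)
    joules = energy_for_job(50.0, 50.0, fraction)
    print(f"clock x{fraction}: {watts:.0f} W, {joules:.0f} J per job")
```

Halving the clock trims power by only a quarter in this model, and because the job takes twice as long while the leakage keeps flowing, the energy spent per job actually rises.
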
…And the wall came tumbling down
There is another way to attack the Power Wall. Venray designs CPU cores to be built on commodity DRAMs.

When Venray engineers designed TOMI Aurora (4-core 64M) and TOMI Borealis (8-core 1 GB), they started with several advantages over legacy approaches:
1. DRAM processes were inherently low leakage, because if the transistors leaked, the memory would forget.
2. The capacitance of buses between cores and memory was really small.
3. DRAM processes produce about the cheapest transistors in existence.

They also had several disadvantages:
1. DRAM transistors were about 20 percent slower than logic transistors at the same process node.
2. DRAMs were mostly analog devices and sensitive to high current spikes.
3. DRAM processes usually had 3 layers of metal for connection compared to 10 or 12 layers in microprocessor processes.

The primary technique used to save power was to invent a really simple computer architecture that was both efficient on "Big Data" benchmarks and parsimonious in transistor count. The resulting Borealis core was 22K transistors. This was easily routable with three metal layers. The caches added an additional 393K per core.

Power was reduced even further by making extensive use of differential signaling wherever possible. DRAMs already did much of their work in differential, so this was fairly straightforward. The result was a 2.1GHz, 98mW 32-bit CPU core.

The future
For nearly 100 years, steamboat designers incrementally improved engine designs, boiler construction, materials, and fuels. Captains plying the Mississippi cargo trade pushed the technology envelope of heat and speed, eventually reaching the unbelievable rate of 9mph.

Then trains were invented and put the boats out of business. The same might happen to makers of today's legacy CPUs.


Related articles
The future of computers: Multicore and the Memory Wall
The Future of Computing Performance, Game Over or Next Level?


About the author
Russell Fish III's three-decade career dates from the birth of the microprocessor. One or more of his designs are licensed into most computers, cell phones, and video games manufactured today. Russell and Chuck Moore created the Sh-Boom Processor which was included in the IEEE's "25 Microchips That Shook The World". He has a BSEE from Georgia Tech and an MSEE from Arizona State.


References
http://americanfolklore.net/folklore/2010/10/riverboat_racing.html

http://en.wikipedia.org/wiki/First_law_of_thermodynamics

Static power is 40% of total power (Sec. 5.2): http://users.eecs.northwestern.edu/~rjoseph/publications/cmp-adapt.pdf

Comparing iPad and Slate: http://www.engadget.com/2010/04/05/hp-slate-to-cost-549-have-1-6ghz-atom-z530-5-hour-battery

Kogge: http://spectrum.ieee.org/computing/hardware/nextgeneration-supercomputers/0

Power Consumption of Top 10: http://www.top500.org/lists/2011/06/press-release

Supercomputer cooling: http://hpc-ch.org/blog/category/cscs/

NVIDIA failure: http://www.theinquirer.net/inquirer/news/1004378/why-nvidia-chips-defective

Microsoft Ring of Death: http://en.wikipedia.org/wiki/Xbox_360_technical_problems

http://sites.nationalacademies.org/CSTB/CurrentProjects/CSTB_042221

DARPA report: http://www.er.doe.gov/ascr/Research/CS/DARPA%20exascale%20-%20hardware%20(2008).pdf

http://insidehpc.com/2011/01/28/is-the-party-over-for-exascale-ambitions

"The Flood of Mighty Waters": http://books.google.com/books?id=3UcaAQAAIAAJ&pg=RA1-PA191&lpg=RA1-PA191&dq=%22all+is+woe%22&source=bl&ots=s7f6NRk0TB&sig=CcwPUZE9pEDn1a7pAK6A3cz7nrI&hl=en

http://en.wikipedia.org/wiki/Another_Brick_in_the_Wall

http://en.wikipedia.org/wiki/Planar_process

http://en.wikipedia.org/wiki/Integrated_circuit

http://en.wikipedia.org/wiki/Self-aligned_gate

http://scitation.aip.org/getpdf/servlet/GetPDFServlet?filetype=pdf&id=APCPCS000683000001000003000001&idtype=cvips&prog=normal&bypassSSO=1

http://www.amazon.com/High-Output-Management-Andrew-Grove/dp/0679762884

http://www.amazon.com/Physics-Technology-Semiconductor-Devices-international/dp/0471329983

Gettering: http://scitation.aip.org/getpdf/servlet/GetPDFServlet?filetype=pdf&id=APCPCS000683000001000003000001&idtype=cvips&prog=normal&bypassSSO=1

http://en.wikipedia.org/wiki/Moore's_law

Dennard: http://www.ieee.org/portal/cms_docs_societies/sscs/PrintEditions/200701.pdf

http://en.wikipedia.org/wiki/High-k_dielectric

p.6 Sec. 4.1 http://207530779760502934-a-1802744773732722657-s-sites.googlegroups.com/site/mingchenhomepage/published-papers/ics11.pdf

Dynamic Voltage Scaling: http://en.wikipedia.org/wiki/Dynamic_voltage_scaling

Dynamic Frequency Scaling: http://en.wikipedia.org/wiki/Dynamic_frequency_scaling

http://en.wikipedia.org/wiki/Joshua_Fit_the_Battle_of_Jericho

http://en.wikipedia.org/wiki/Big_data

http://www.edn.com/article/520499-Future_of_computers_Part_2_The_Power_Wall.php






 