Nvidia's graphics brawn powers supercomputing brains
The company's graphics chips are finding a foothold in neural networks, a biology-inspired form of computing that is moving from research to commercial tasks like Google's photo recognition.
Nvidia, trying to move its graphics chips into the supercomputing
market, has found a niche helping engineers build brain-like systems
called neural networks.
For years, the company has advocated the idea of offloading processing
tasks from general-purpose central processing units (CPUs) to its own
graphics processing units (GPUs). That approach has won over some
researchers and companies involved with neural networks, which reproduce
some of the electrical behavior of real-world nerve cells inside a
computer.
Neurons in the real world work by sending electrical signals around the
brain, but much of the actual functioning of the brain remains a
mystery. Neural networks in computers, somewhat perversely, emulate this
mysteriousness. Instead of running explicit programming instructions to
perform a particular job, they're "trained" by handling source data
that creates communication patterns among many nodes in the neural
network. The trained neural network can then be used to recognize patterns -- cat pictures, for example, as in one Google research project that's now commercialized as part of Google+ Photos.
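To make that concrete, here's a minimal sketch -- not drawn from any of the systems in this story, with made-up layer sizes and weights -- of how one layer of an already-trained network can be evaluated on an Nvidia GPU using CUDA. Each GPU thread handles one artificial neuron, combining the inputs with weights learned from example data rather than with hand-written rules.

// Minimal sketch (not from the article): evaluating one layer of an
// already-trained neural network on the GPU. The weights would come from
// training on example data rather than hand-written rules; here they are
// just placeholder values, and the layer sizes are arbitrary assumptions.
#include <cstdio>
#include <cuda_runtime.h>

#define INPUTS  4   // signals feeding into the layer
#define OUTPUTS 3   // artificial neurons in the layer

// Each GPU thread computes one neuron: a weighted sum of the inputs
// followed by a sigmoid activation.
__global__ void forward_layer(const float *weights, const float *bias,
                              const float *input, float *output)
{
    int neuron = blockIdx.x * blockDim.x + threadIdx.x;
    if (neuron >= OUTPUTS) return;

    float sum = bias[neuron];
    for (int i = 0; i < INPUTS; ++i)
        sum += weights[neuron * INPUTS + i] * input[i];
    output[neuron] = 1.0f / (1.0f + expf(-sum));  // sigmoid squashing
}

int main()
{
    // Placeholder "learned" parameters and one input example.
    float h_weights[OUTPUTS * INPUTS] = {
         0.2f, -0.5f,  0.1f,  0.4f,
        -0.3f,  0.8f,  0.6f, -0.1f,
         0.7f,  0.2f, -0.4f,  0.3f };
    float h_bias[OUTPUTS]  = {0.1f, -0.2f, 0.05f};
    float h_input[INPUTS]  = {1.0f, 0.5f, -1.5f, 2.0f};
    float h_output[OUTPUTS];

    float *d_weights, *d_bias, *d_input, *d_output;
    cudaMalloc(&d_weights, sizeof(h_weights));
    cudaMalloc(&d_bias,    sizeof(h_bias));
    cudaMalloc(&d_input,   sizeof(h_input));
    cudaMalloc(&d_output,  sizeof(h_output));
    cudaMemcpy(d_weights, h_weights, sizeof(h_weights), cudaMemcpyHostToDevice);
    cudaMemcpy(d_bias,    h_bias,    sizeof(h_bias),    cudaMemcpyHostToDevice);
    cudaMemcpy(d_input,   h_input,   sizeof(h_input),   cudaMemcpyHostToDevice);

    forward_layer<<<1, 32>>>(d_weights, d_bias, d_input, d_output);
    cudaMemcpy(h_output, d_output, sizeof(h_output), cudaMemcpyDeviceToHost);

    for (int n = 0; n < OUTPUTS; ++n)
        printf("neuron %d: %f\n", n, h_output[n]);

    cudaFree(d_weights); cudaFree(d_bias); cudaFree(d_input); cudaFree(d_output);
    return 0;
}

Real speech- and image-recognition systems run far larger versions of this arithmetic, across many layers and many examples at once -- which is where GPU acceleration starts to pay off.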
One Nvidia customer is Nuance, which uses neural networks to develop speech recognition systems that ultimately end up in places like cars or tech support phone lines. "We have been working with GPUs for over four years, but the recent models -- specifically the 'Kepler' line from Nvidia -- are providing the most substantial benefits," said Nuance Chief Technology Officer Vlad Sejnoha in a statement. "We use a large-scale computing grid composed of a mixture of CPUs and GPUs, and are achieving an order of magnitude speedup over pure CPU-based baselines."
Neural network experts at Stanford University -- including Andrew Ng, who's worked on neural networks at Google -- have been working on marrying GPUs to neural networks. In a paper for the International Conference on Machine Learning, they describe their approach to the thorny problem of getting the right data to the right GPU.
"Attempting to build large clusters of GPUs is difficult due to
communications bottlenecks," they wrote in the paper, but the
researchers' approach "might reasonably be packaged into optimized
software libraries" to help others with the problem.
High-performance computing is in the news this week, with the International Supercomputing Conference taking place in Leipzig, Germany.
GPUs are particularly well suited to doing large numbers of calculations
that can take place in parallel. CPUs such as Intel's Core line are
generally designed for tasks that run sequentially instead of being
split into independent chunks, though multicore models of the last
decade are increasingly parallel.
Still, general-purpose CPUs are not as parallel as GPUs, and Nvidia has made inroads into the Top500 list of the fastest supercomputers, with GPUs giving 39 machines a processing boost.
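To see why the architectures differ, here's another minimal sketch with made-up sizes and values -- not code from any of these machines -- showing the same scale-and-add calculation written two ways in CUDA: as a sequential CPU loop and as a GPU kernel that gives each array element its own thread.

// Minimal sketch (not from the article) of why GPUs suit data-parallel
// work: a scale-and-add operation written as a sequential CPU loop and as
// a CUDA kernel in which each of thousands of threads handles one element
// independently. The array size and values are arbitrary assumptions.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// CPU version: one element after another.
void saxpy_cpu(int n, float a, const float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// GPU version: each thread computes exactly one element.
__global__ void saxpy_gpu(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;               // about a million elements
    size_t bytes = n * sizeof(float);
    float *x = (float *)malloc(bytes);
    float *y = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    float *d_x, *d_y;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover every element.
    saxpy_gpu<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
    cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost);

    printf("y[0] = %f\n", y[0]);         // expect 4.0
    cudaFree(d_x); cudaFree(d_y);
    free(x); free(y);
    return 0;
}

Whether the massively threaded version wins in practice depends on how much data has to be shuffled between CPU and GPU memory -- the kind of communications bottleneck the Stanford researchers describe.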
Intel isn't standing idly by while GPUs gain ground. It's got its own
accelerator technology, the Xeon Phi co-processors, which plug into
servers' PCI Express expansion slots. The current fastest supercomputer,
the Tianhe-2, gets a boost from the Phi chips.
On Monday, Intel detailed its second-generation design, code-named
Knights Landing, which will be built on the next-generation 14nm
manufacturing process that enables more circuitry to be crammed onto a
given-size chip. Two significant changes: the Knights Landing co-processors will be able to plug straight into a regular CPU socket,
which should make programming easier, and they'll have a built-in memory
controller for faster data transfer.
In addition, Intel introduced new products based on the
current-generation Phi chip, including the 7100 that doubles available
memory to 16GB, the less expensive 3100, and the 5120D that can be
mounted in a chip socket for high-density computing designs that lack
room for a plug-in card.
Intel argues its approach is easier to program. Nvidia's GPU programming technology, called CUDA, has been maturing for years, though.
At the supercomputing conference, Nvidia announced CUDA 5.5, which lets
programmers use GPU-based programs on machines with ARM-based CPUs, not
just x86-based CPUs such as those from Intel. ARM processors are selling
like hotcakes for use in smartphones and tablets because of their energy-efficient designs, and now they've begun expanding into some corners of the laptop and server market, too.
Nvidia itself has licensed the ARM design for its own Tegra processors,
so it's got a vested interest beyond just graphics chips in seeing CUDA
and ARM succeed.
CUDA on ARM systems isn't just for supercomputers, though. Roomba is
using the technology for image recognition to help robots navigate
better, and it can be used for things like video game physics engines on mobile devices, giving realistic physical behavior to blowing curtains or splashing water. In both scenarios, saving battery
life is important, said Roy Kim, marketing manager for Nvidia's Tesla
group.
"If the workload is right, you're going to get about ten times the performance per watt over a CPU implementation," he said.