The addition of multiple cores to microprocessors has created a significant opportunity for parallel programming, but a killer application is needed to push the concept into the mainstream, researchers said during a panel discussion at the Hot Chips conference.
Most software today is still being written for sequential execution and programming models need to change to take advantage of faster hardware and an increasing number of cores on chips, panelists said. Programmers need to write code in a way that enables tasks to be divided up and executed simultaneously across multiple cores and threads.
A lot of focus and money have gone into building fast machines and better programming languages, said David Patterson, a computer science professor at the University of California, Berkeley, at the conference in Stanford on Monday. Comparatively little attention has been paid to writing desktop programs in parallel, but applications such as gaming and music could change that. Users of such programs demand the best real-time performance, so programmers may have to adopt models that break up tasks over multiple threads and cores.
For example, novel forms of parallelism could bring improve the quality of music played back on PCs and smartphones, Patterson said. Code that does a better job of separating channels and instruments could ultimately generate sound through parallel interaction.
The University of California, Berkeley, has a parallel computing lab where researchers are trying to understand how applications are used, which could help optimize code for handheld devices. One project aims to bring desktop-quality browsing to handheld devices by optimizing code based on specific tasks like rendering and parsing of pages. Another project involves optimizing code for faster retrieval of health information. The lab is funded primarily by Intel and Microsoft.
Berkeley researchers are trying to bring in parallelism by replacing bits of code originally written using scripting languages like Python and Ruby on Rails with new low-level C code. The new code specifically focuses on particular tasks like analyzing a specific voice pattern in a speech recognition application, Patterson said in an interview Wednesday. The code is written using OpenMP or MPI, application programming interfaces designed to write machine-level parallel applications.
Experts are need to write this highly specialized parallel code, Patterson said. It reduces development time for programmers who would otherwise use Python and Ruby on Rails, which make application development easier, but do not focus on parallelism, Patterson said in the interview. The lab has shown specific task execution jump by a factor of 20 with the low-level machine code.
The concept of parallelism is not new, and has been mostly the domain of high-performance computing. Low levels of parallelism have always been possible, but programmers have faced a daunting task with a lack of software tools and ever-changing hardware environments.
“Threads have to synchronize correctly,” said Christos Kozyrakis, a professor of electrical engineering and computer science at Stanford University, during a presentation prior to the panel discussion. Code needs to be written in a form that behaves predictably and scales as more cores become available.
Compilers also need to be made smarter and be perceptive enough to break up threads on time so outputs are received in a correct sequence, Kozyrakis said. Faulty attempts to build parallelism into code could create buggy software if specific calculations are not executed in a certain order. That is a problem commonly referred to as race conditions. Coders may also need to learn how to use multiple programming tools to achieve finer levels of parallelism, panelists said.
“There’s no lazy-boy approach to programming,” Patterson said at the conference.
Memory and network latency have created bottlenecks in data throughput, which could negate the performance achieved by parallel task execution. There are also different programming tools for different architectures, which make it difficult to take advantage of all the hardware available.
Many parallelism tools available today are designed to harness the parallel processing capabilities of CPUs and graphics processing units to improve system performance. Apple, Intel, Nvidia and Advanced Micro Devices are among the companies promoting OpenCL, a parallel programming environment that will be supported in Apple’s upcoming Mac OS X 10.6 operating system, also called Snow Leopard, which is due for release Friday. OpenCL competes with Microsoft, which is promoting its proprietary DirectX parallel programming tools, and Nvidia, which offers the CUDA framework.
OpenCL includes a C-like programming language with APIs (application programming interfaces) to manage distribution of kernels across hardware such as processor cores and other resources. OpenCL could help Mac OS decode video faster by distributing pixel processing across multiple CPU and graphics processing units in a system.
All the existing tools are geared toward different software environments and take advantage of different resources, Patterson said. OpenCL, for example, is geared more toward execution of tasks on GPUs. Proprietary models like DirectX are hard to deploy across heterogeneous computing environments, while some models like OpenCL adapt to only specific environments that rely on GPUs.
“I don’t think [OpenCL] is going to be embraced across all architectures.” Patterson said. “We need in the meantime to be trying other things,” like trying to improve on the programming models with commonly used development tools, like Ruby on Rails, he said.
While audience members pointed out that parallelism has been a problem for decades, the panelists said that universities are now taking a fresh approach to working on multiple programming tools to enable parallelism. After years of funding chip development, the government is also paying more attention to parallel processing by funding related programs.
Kozyrakis said Stanford has established a lab that aims to “make parallel application development practical for [the] masses,” by 2012. The researchers are working with companies like Intel, AMD, IBM, Sun, Hewlett-Packard and Nvidia.
An immediate task test for developers could be to try to convert existing legacy code in parallel for execution on modern chips, Berkeley’s Patterson said. A couple of companies are offering automatic parallelization, but rewriting and compiling the legacy code originally written for sequential execution could be a big challenge.
“There’s money to be made in those areas,” Patterson said.