Part 1: A look at how it works
No chip in years has caused as much excitement as the Cell processor developed by IBM, Sony and Toshiba. It promises to be the most important microprocessor of the decade, with potentially enormous repercussions for how the industry computes, and how the rest of us use digital media. It will power the PlayStation 3 as well as technical and commercial computing systems.
Technical details of Cell will be disclosed at the International Solid State Circuits Conference in San Francisco next week. In anticipation, we’ll look first at how Cell works and then, tomorrow, at what it means for the industry and consumers.
Excitement about Cell has already led to some wild and poorly informed speculation, as Ars Technica’s Jon Stokes rued last week. But earlier in the month, Microprocessor Report’s Tom Halfhill published an investigation into a detailed patent, filed in 2001 and made public by the USPTO in October, and he was kind enough to discuss it with us. We’ll refer to it as the ‘734 patent.
The ambitious scale of the project is one of the most remarkable aspects of Cell.
“It isn’t just a single microprocessor or even a family of processors,” writes Tom. “It’s a top-to-bottom architecture for a broad range of computing systems, from servers and workstations at the high-end to game consoles, PDAs, digital TVs, and other consumer electronics at the low end”.
How does it look?
The ‘cell’ which gives the chip its name doesn’t refer to the hardware, but to a virtual clump of software which roams the system looking for computing resources. The patent refers to a “cell object” – program and data – and it can even roam across LANs or WANs, to find another Cell-based device.
A Cell chip consists of one or more independent execution units, each with its own register file and banks of RAM. A program can commandeer as many of these units as resources allow to create a temporary execution pipeline. These pipelines are dynamically configurable and can lock out other processes from grabbing their hardware resources. “The Cell architecture introduces a whole new meaning to the term ‘self-modifying code’,” notes Tom drily.
The ‘734 patent calls the basic hardware unit a PE, or ‘processor element’. Rather confusingly, a PE consists of a ‘processor unit’ or PU, and an array of attached, er, processing units or APUs. The patent, Tom notes, says that the “preferred” PE configuration is eight APUs. The “preferred embodiment” of an APU is 128KB of SRAM, 128 x 128-bit registers, four integer units and four floating-point units. Some of these may be specialized for tasks such as shading.
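To make the nesting of PE, PU and APU concrete, the preferred embodiment described above can be written down as a small data model. This is only an illustrative sketch: the class and field names (`APU`, `ProcessorElement`, `sram_kb` and so on) are our own invention rather than the patent's, the PU is not modeled at all, and the only numbers used are the ones quoted above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class APU:
    """One attached processing unit, using the 'preferred embodiment' figures."""
    sram_kb: int = 128        # local SRAM, in kilobytes
    registers: int = 128      # number of registers
    register_bits: int = 128  # width of each register
    integer_units: int = 4
    float_units: int = 4


@dataclass(frozen=True)
class ProcessorElement:
    """A PE: a controlling PU (not modeled here) plus an array of APUs."""
    # Eight APUs is the configuration the patent calls "preferred".
    apus: tuple = field(default_factory=lambda: tuple(APU() for _ in range(8)))

    def total_sram_kb(self) -> int:
        return sum(apu.sram_kb for apu in self.apus)

    def total_integer_units(self) -> int:
        return sum(apu.integer_units for apu in self.apus)

    def total_float_units(self) -> int:
        return sum(apu.float_units for apu in self.apus)


pe = ProcessorElement()
print(len(pe.apus), "APUs")
print(pe.total_sram_kb(), "KB of APU-local SRAM")
print(pe.total_integer_units(), "integer units")
print(pe.total_float_units(), "floating-point units")
```

Tallied this way, a single preferred-configuration PE carries 1,024KB of APU-local SRAM and 32 each of integer and floating-point units.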
Inside each software cell are ‘apulets’. These aren’t necessarily self-contained programs, stress MPR, but seem more like serialized objects. Amongst the many mysteries yet to be revealed about software cells is how the chip schedules such tasks, not just amongst onboard PEs but also amongst other Cells.
“Imagine an apulet running on your PDA that depends on a result coming from another apulet running on a computer in Norway,” writes Tom. The Cell processor must make its best guess, based on network latencies, how to distribute the workload. The designers have set themselves an awesome challenge.
Halfhill also notes that the Cell’s architecture is more flexible than Java’s sandboxes, because a software cell can encapsulate several processes, or part of a single process. There’s no evidence, he points out, that Cell implements JVMs in hardware: it’s much more subtle than that. For security purposes, Cell’s hardware restrictions may prove to be the most controversial aspect of the chip.
Some interesting design decisions have been made in creating the memory architecture:
“It’s hard to avoid the conclusion that Cell processors will have an extraordinarily secure but cumbersome memory model. For each main-memory access, the processor would have to consult four lookup tables… Three of those tables are in DRAM, which implies slow off-chip memory references; the other table is in the DMA controller’s SRAM. In some cases, the delays caused by the table lookups might eat more clock cycles than reading or writing the actual data. The patent hints that some keys might unlock multiple memory locations or sandboxes, perhaps granting blanket permission for a rapid series of accesses, within certain bounds.”
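The cost concern in that passage can be illustrated with a toy latency model. The cycle counts below are invented placeholders, not figures from the patent or from MPR; only the structure (three table lookups in DRAM, one in the DMA controller's SRAM, then the data access itself) comes from the description above. The function names are likewise hypothetical.

```python
# Hypothetical latencies in clock cycles; placeholder values for illustration only.
DRAM_LOOKUP_CYCLES = 100
SRAM_LOOKUP_CYCLES = 5
DATA_ACCESS_CYCLES = 100


def access_cost(dram_tables: int = 3, sram_tables: int = 1) -> dict:
    """Cycles spent on table lookups versus the data access they guard."""
    lookup = dram_tables * DRAM_LOOKUP_CYCLES + sram_tables * SRAM_LOOKUP_CYCLES
    return {
        "lookup": lookup,
        "data": DATA_ACCESS_CYCLES,
        "total": lookup + DATA_ACCESS_CYCLES,
    }


def amortized_cost(accesses: int) -> float:
    """Average cycles per access if one set of lookups covers a whole burst.

    This models the 'blanket permission' idea as a single up-front check,
    which is our assumption about how such a scheme might behave.
    """
    cost = access_cost()
    return (cost["lookup"] + accesses * cost["data"]) / accesses


print(access_cost())
print(amortized_cost(1), amortized_cost(16))
```

With these placeholder numbers the lookups cost roughly three times as much as the data access they protect, which is the situation the quote warns about. Spreading one set of lookups over a burst of accesses brings the average back toward the raw access time.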
The Cell architecture isn’t just a blueprint for a new kind of chip, but for a massively distributed global computing network. Each Cell is given a GUID, a global identifier. Your PlayStation may be hosting processes that began life on a Cell on the other side of the world. Remember that the architecture enables a strict, locked-down machine to be built, with access to memory tightly controlled. Since DRM is predicated on allowing uniquely identified media to run, or not run, on a specifically authorized piece of hardware, this gives system designers much more scope to build systems which can both restrict and track the content they play.
There may be more benign uses: Cell clearly makes a very sophisticated building block for distributed grid computing too. “A hypothetical Cell processor with eight of these APUs could achieve 32 BOPS and 32 gigaFLOPS at only 250MHz,” writes Tom. Or a teraflop at 1GHz. This is an order of magnitude higher than today’s workstations in what could be a low-cost, low-power machine.
If Cell fulfills its promise, Intel is facing its greatest challenge since the turn of the 1990s, when RISC processors seemed to be extending an unbeatable performance lead, and when Microsoft was porting Windows NT to every RISC platform it could: MIPS, Alpha and PowerPC. But the remarkable P6 core (which first appeared in the Pentium Pro) saw the performance gap narrow, and the alliances arrayed against Intel stumbled and fragmented.
This time, Cell is aimed at a different market, one that Wintel has failed to conquer – the living room.