Pentium 4
Recently Intel introduced its newest line of Pentium 4 processors, built on the new Prescott core. In this paper I will discuss how the Pentium 4 processor works and the changes that have been made since its release, focusing mainly on the modifications in the newest Pentium 4s with the Prescott core. I will also briefly compare the performance of some of the different types of Pentium 4s.
The Pentium 4 line of processors spans a large range of clock speeds, from 1.7GHz up to 3.6GHz in the Prescott chip. All Pentium 4s are built on the same Netburst microarchitecture, but they come with a variety of front-side bus speeds, chip layouts, and cores. For example, at 2.8GHz one could choose from four different Pentium 4s: the 2.8GHz (a Northwood core with a 533MHz front-side bus), the 2.8C (Northwood again, but with an 800MHz bus), the 2.8A (Prescott with a 533MHz bus), or the 2.8E (Prescott with an 800MHz bus). In all, Intel has released four versions of the Pentium 4, each with slight improvements over the last.
The first Pentium 4 (Willamette) was introduced in November 2000 to replace its predecessor, the Pentium 3. It was Intel's first entirely new chip architecture since the 1995 Pentium Pro. The biggest difference was Intel's introduction of the Netburst microarchitecture, which involved structural changes to how processing takes place within the chip. These changes include: a 20-stage pipeline, which lets the processor run at higher clock frequencies; a rapid-execution engine, which runs the arithmetic logic units at twice the core frequency so that simple integer instructions execute in half (rather than a whole) clock cycle, reducing latency; a 400 MHz system bus (a 100 MHz clock carrying four transfers per cycle), which enables transfer rates of 3.2 gigabytes per second; an execution trace cache, which improves cache efficiency and reduces latency by storing already-decoded sequences of micro-operations; and improved floating-point and multimedia units along with advanced dynamic execution, all of which enable faster processing for especially demanding applications such as digital video, voice recognition, and online gaming.
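The 3.2 GB/s figure follows from simple arithmetic: the Pentium 4 data bus is 64 bits (8 bytes) wide, and its "400 MHz" rating is a 100 MHz clock carrying four transfers per cycle. The short Python sketch below reproduces that calculation. The function name `fsb_bandwidth_gb_s` is hypothetical, and the result is a theoretical peak, not a measured throughput.

```python
# Peak front-side-bus bandwidth = transfers per second x bytes per transfer.
# Assumes the Pentium 4's 64-bit (8-byte) data bus and a quad-pumped bus,
# i.e. four data transfers per base clock cycle.

BUS_WIDTH_BYTES = 8      # 64-bit data path
TRANSFERS_PER_CLOCK = 4  # quad-pumped: 100 MHz clock -> 400 million transfers/s


def fsb_bandwidth_gb_s(base_clock_mhz: float) -> float:
    """Theoretical peak bandwidth in decimal gigabytes (10**9 bytes) per second."""
    transfers_per_second = base_clock_mhz * 1e6 * TRANSFERS_PER_CLOCK
    return transfers_per_second * BUS_WIDTH_BYTES / 1e9


if __name__ == "__main__":
    # Base clocks of 100, 133.3 and 200 MHz give the 400, 533 and 800 MHz buses.
    for base_mhz in (100.0, 400.0 / 3, 200.0):
        rated = base_mhz * TRANSFERS_PER_CLOCK
        print(f"{rated:.0f} MHz bus: {fsb_bandwidth_gb_s(base_mhz):.2f} GB/s")
```

The same formula gives about 4.27 GB/s for the 533MHz bus and 6.4 GB/s for the 800MHz bus mentioned earlier.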
In January 2002 Intel released a new version of the Pentium 4 with a new core, Northwood. Northwood combined an increase in the second-level cache size from 256KB to 512KB with a transition to a new 0.13 micrometer (130 nm) process technology. Because it was built from smaller transistors, the chip could run faster and yet consume less power. It also used a new socket (Socket 478), which unfortunately made upgrading earlier Socket 423 systems impossible. Bus speeds were also increased, first to 533MHz and then to 800MHz for processors running at 2.4GHz or more. Hyper-Threading, which I will discuss later, was also introduced on some Pentium 4s that ran at an 800MHz bus speed.
In September 2003, Intel announced it would release yet another version of the Pentium 4, which it called the Extreme Edition. The design was nearly identical to the Pentium 4 (to the extent that it would run in the same motherboards), differing mainly by an added 2 MB of Level 3 cache. Intel aimed the Extreme Edition at computer gamers, but some viewed it as an attempt to keep up with its competitor AMD's release of the Athlon 64, nicknaming it the “Emergency Edition”. The effect of the added cache was mixed. In office applications, the Extreme Edition was generally a bit slower than the Northwood, owing to the higher latency added by the L3 cache. Some games benefited from the added cache, particularly those based on the Quake III and Unreal engines. However, the area that improved the most was multimedia encoding, where the Extreme Edition was faster not only than the other Pentium 4s but also than the Athlon 64s.
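One simple way to see why an extra cache level can help some programs and hurt others is the standard average memory access time (AMAT) formula: hit time plus miss rate times miss penalty, applied level by level. The Python sketch below assumes the L3 is checked before main memory, so every L2 miss pays the L3 lookup even when the L3 also misses. The function name `amat_from_l2` and every latency and miss rate in it are hypothetical round numbers for illustration, not measurements of the Extreme Edition.

```python
# Average memory access time (AMAT) = hit time + miss rate x miss penalty,
# applied level by level. All numbers below are hypothetical, chosen only
# to illustrate the trade-off, not measured Pentium 4 figures.

from typing import Optional, Tuple


def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    return hit_time + miss_rate * miss_penalty


def amat_from_l2(l2_hit: float, l2_miss_rate: float, mem_latency: float,
                 l3: Optional[Tuple[float, float]] = None) -> float:
    """AMAT seen at the L2; l3 is an optional (hit_time, miss_rate) pair.

    With an L3, each L2 miss first pays the L3 lookup time, and only the
    L3 misses go on to main memory.
    """
    if l3 is None:
        penalty = mem_latency
    else:
        l3_hit, l3_miss_rate = l3
        penalty = amat(l3_hit, l3_miss_rate, mem_latency)
    return amat(l2_hit, l2_miss_rate, penalty)


if __name__ == "__main__":
    L2_HIT, L2_MISS, MEM = 20.0, 0.05, 200.0  # hypothetical cycles / rate
    L3_HIT = 35.0                             # hypothetical L3 latency
    print("no L3:                 ", amat_from_l2(L2_HIT, L2_MISS, MEM))
    print("L3 rarely hits (90%):  ", amat_from_l2(L2_HIT, L2_MISS, MEM, (L3_HIT, 0.9)))
    print("L3 often hits (30%):   ", amat_from_l2(L2_HIT, L2_MISS, MEM, (L3_HIT, 0.3)))
```

With these made-up numbers, the L3 raises the average from 30 to 30.75 cycles when it rarely hits, but lowers it to 24.75 cycles when it hits often. This is only a simplified picture, but it is consistent with the mixed results above: the extra level pays off only for workloads that reuse enough data to hit in it.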
On February 1, 2004, Intel introduced its latest, and probably last, Pentium 4 with a new core codenamed “Prescott.” The changes start with Intel's fabrication process: with Prescott, Intel moved the Pentium 4 from a 130nm to a 90nm process. The smaller transistors are intended to allow Prescott to reach higher clock speeds. Intel improved its manufacturing process in a number of ways to make this shrink possible, most notably by using a strained silicon substrate. When the silicon is stretched slightly, its lattice of atoms spreads out and opens up, allowing electrons to flow more freely. This lower resistance, in turn, allows for smaller gate lengths and faster transistors. In total, the changes shrink the Pentium 4 die to 122 mm², from 145 mm² for Northwood, even though Prescott's 125 million transistors are more than twice Northwood's 55 million.
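The die figures above can be turned into a transistor density and compared with what an ideal shrink would predict: if every dimension scaled by 90/130, area would scale by the square of that ratio. The Python sketch below uses only the numbers quoted in this paper; the names `CORES`, `density_m_per_mm2`, and `ideal_area_scale` are hypothetical.

```python
# Transistor density from the die figures quoted in the text, compared with
# the area reduction an ideal 130 nm -> 90 nm linear shrink would give.

CORES = {
    # name: (process in nm, transistors in millions, die area in mm^2)
    "Northwood": (130, 55, 145),
    "Prescott": (90, 125, 122),
}


def density_m_per_mm2(name: str) -> float:
    """Millions of transistors per square millimetre."""
    _, transistors_m, die_mm2 = CORES[name]
    return transistors_m / die_mm2


def ideal_area_scale(old_nm: float, new_nm: float) -> float:
    """Area factor if every feature shrank by new_nm / old_nm in both dimensions."""
    return (new_nm / old_nm) ** 2


if __name__ == "__main__":
    nw = density_m_per_mm2("Northwood")
    pr = density_m_per_mm2("Prescott")
    print(f"Northwood: {nw:.2f} M/mm^2, Prescott: {pr:.2f} M/mm^2")
    print(f"density gain from quoted figures: {pr / nw:.2f}x")
    print(f"density gain from ideal shrink:   {1 / ideal_area_scale(130, 90):.2f}x")
```

With these figures, Prescott packs about 1.02 million transistors per mm² against about 0.38 million for Northwood, roughly 2.7 times the density, while an ideal shrink predicts about 2.09 times. Treat the comparison as a rough check only, since a process name such as "90nm" does not correspond exactly to every dimension on the chip.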
The Prescott architecture team made additional tweaks to the Netburst microarchitecture. The most notable change is that Netburst's main branch prediction/recovery pipeline grew from 20 stages to 31 stages in Prescott. By making each pipeline stage simpler, Intel allows the processor to run at higher clock speeds. Additionally, the associativity of the L1 data cache was increased from 4-way to 8-way when its size doubled (from 8KB in Northwood to 16KB in Prescott). The new 1MB unified, write-back L2 cache is 8-way set associative, as in past Pentium 4s, and uses 128-byte cache lines.
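The cache parameters above determine how a cache is organized: the number of sets is the size divided by (ways × line size), and an address splits into a tag, a set index, and a byte offset. The Python sketch below applies this to the figures in the text. The 32-bit address width and the 64-byte L1 line size are assumptions made for illustration (the text does not give them), and `cache_geometry` and `split_address` are hypothetical helper names.

```python
# Set-associative cache geometry: sets = size / (ways x line size).
# The L2 figures (1MB, 8-way, 128-byte lines) come from the text; the
# 32-bit address width and 64-byte L1 line size are assumptions.


def cache_geometry(size_bytes: int, ways: int, line_bytes: int,
                   address_bits: int = 32) -> dict:
    """Return the set count and how an address splits into tag/index/offset."""
    sets = size_bytes // (ways * line_bytes)
    if sets == 0 or sets & (sets - 1) or line_bytes & (line_bytes - 1):
        raise ValueError("set count and line size must be powers of two")
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    return {
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": address_bits - index_bits - offset_bits,
    }


def split_address(addr: int, geom: dict) -> tuple:
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & ((1 << geom["offset_bits"]) - 1)
    index = (addr >> geom["offset_bits"]) & ((1 << geom["index_bits"]) - 1)
    tag = addr >> (geom["offset_bits"] + geom["index_bits"])
    return tag, index, offset


if __name__ == "__main__":
    l2 = cache_geometry(1024 * 1024, 8, 128)
    print("Prescott L2:", l2)
    # Doubling both size and associativity leaves the number of sets unchanged.
    print("Northwood L1D sets:", cache_geometry(8 * 1024, 4, 64)["sets"])
    print("Prescott L1D sets: ", cache_geometry(16 * 1024, 8, 64)["sets"])
    print("0x12345678 ->", split_address(0x12345678, l2))
```

For the L2 this gives 1,024 sets, with 7 offset bits (128-byte lines) and 10 index bits. The L1 comparison shows that, under the assumed 64-byte line, doubling both size and associativity keeps the set count fixed: the 8KB 4-way and 16KB 8-way caches both have 32 sets.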
The instruction schedulers for x87 and all generations of SSE instructions were enlarged to better find parallelism in multimedia code, as was the effective size of the queues that feed all the schedulers. Larger scheduler queues reduce allocator stalls, letting the allocator logic keep assigning micro-ops to the individual scheduler queues that follow it in the pipeline while also processing machine resource requests from new micro-ops entering the allocator stage. A dedicated integer multiplier has also been added. Previously, integer multiplies were executed on the floating-point multiplier, which increased latency because the operands had to be moved to the FP unit and the result routed back to the integer unit.
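To make the allocator-stall argument concrete, here is a toy cycle-by-cycle model in Python. It is a sketch under heavy simplifying assumptions: a single scheduler queue instead of Prescott's several, made-up widths of two micro-ops allocated and two issued per cycle, and hypothetical wait times. The function name `simulate` is illustrative, not anything from Intel's documentation.

```python
# Toy model of allocator stalls: micro-ops are allocated in program order into
# one scheduler queue and issued out of order once ready. The single queue,
# widths, and wait times are simplifying assumptions, not Prescott's design.


def simulate(uop_waits, queue_size, alloc_width=2, issue_width=2):
    """Return (total cycles, allocator stall cycles).

    uop_waits[i] is how many extra cycles micro-op i must sit in the queue
    (e.g. waiting for a slow operand) before it can issue.
    """
    if queue_size < 1:
        raise ValueError("queue_size must be at least 1")
    queue = []        # earliest issue cycle of each queued micro-op, oldest first
    next_uop = 0      # index of the next micro-op waiting for allocation
    cycle = 0
    stall_cycles = 0
    while next_uop < len(uop_waits) or queue:
        # Issue: send up to issue_width ready micro-ops, oldest first.
        ready = [i for i, t in enumerate(queue) if t <= cycle][:issue_width]
        queue = [t for i, t in enumerate(queue) if i not in ready]
        # Allocate: place up to alloc_width new micro-ops; a full queue stalls.
        allocated = 0
        while allocated < alloc_width and next_uop < len(uop_waits):
            if len(queue) >= queue_size:
                stall_cycles += 1
                break
            queue.append(cycle + 1 + uop_waits[next_uop])
            next_uop += 1
            allocated += 1
        cycle += 1
    return cycle, stall_cycles


if __name__ == "__main__":
    # One slow micro-op followed by seven independent ones (hypothetical).
    waits = [10] + [0] * 7
    for size in (2, 8):
        total, stalls = simulate(waits, size)
        print(f"queue size {size}: {total} cycles, {stalls} allocator stall cycles")
```

In this tiny run both queue sizes finish in 12 cycles, because the slow micro-op sets the pace, but the two-entry queue leaves the allocator stalled for five cycles while the eight-entry queue never stalls. The slow micro-op occupies a slot in the small queue, so the allocator can only bring in one new micro-op per cycle instead of two.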