HP Compaq Deskpro EN 6233, Compaq Deskpro EN 6266, Compaq Deskpro EN 6300, Compaq Deskpro EN 6300A, Compaq Deskpro EN 6300C White Paper

White Paper
October 2001
Prepared by: Workstations Division Engineering, Compaq Computer Corporation
Comparison of Intel Pentium III and Pentium 4 Processor Performance
15WD-1101A-WWEN
Contents
Introduction
Comparison of Pentium 4 and Pentium III Architecture Benefits
Case for Performance
  SYSmark 2001
  3D WinBench 2000 – Processor Test
Summary
Additional Micro-Architecture Detail
  Pentium III
  Pentium 4
  Impact of DirectX 8.0
Abstract: This white paper summarizes key technology advancements in the Intel® Pentium® 4 processor compared with the previous-generation Intel Pentium III processor. Common benchmark workloads are discussed to illustrate which areas of computing will benefit the most from this new architecture. Results of Compaq benchmark testing, comparing results for both processors, are included to demonstrate the performance gains realizable with the new processor.
Comparison of Intel Pentium III and Pentium 4 Processor Performance White Paper 2
Notice
The information in this publication is subject to change without notice and is provided AS IS WITHOUT WARRANTY OF ANY KIND. THE ENTIRE RISK ARISING OUT OF THE USE OF THIS INFORMATION REMAINS WITH RECIPIENT. IN NO EVENT SHALL COMPAQ BE LIABLE FOR ANY DIRECT, CONSEQUENTIAL, INCIDENTAL, SPECIAL, PUNITIVE, OR OTHER DAMAGES WHATSOEVER (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF BUSINESS PROFITS, BUSINESS INTERRUPTION, OR LOSS OF BUSINESS INFORMATION), EVEN IF COMPAQ HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
The limited warranties for Compaq products are exclusively set forth in the documentation accompanying such products. Nothing herein should be construed as constituting a further or additional warranty.
This publication does not constitute an endorsement of the product or products that were tested. The configuration or configurations tested or described may or may not be the only available solution. This test is not a determination of product quality or correctness, nor does it ensure compliance with any federal, state or local requirements.
©2001 Compaq Information Technologies Group, L.P. Compaq, the Compaq logo, Evo and Deskpro are trademarks of Compaq Information Technologies Group, L.P. in the U.S. and/or other countries. Intel, Pentium, and Celeron are trademarks of Intel Corporation in the U.S. and/or other countries. Microsoft and Windows are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. All other product names mentioned herein may be trademarks of their respective companies.
Comparison of Intel Pentium III and Pentium 4 Processor Performance White Paper prepared by Workstations Division Engineering
First Edition (October 2001) Document Number 15WD-1101A-WWEN
Introduction
This White Paper provides information useful in understanding the differences between the Intel® Pentium® 4 processor and the previous-generation Pentium III. A discussion is included about the architectural differences between the two processors and the performance benefits they provide. When evaluating performance, there is no single performance test (“benchmark”) that can completely describe the performance of a complex system like a modern microprocessor or personal computer. It is important to obtain the complete performance picture. In other words, the system should deliver high performance across the entire spectrum of applications, such as productivity, multimedia, 3D and Internet. Each of these application categories carries a unique set of computation and data movement characteristics; thus it is important to realize how each class of application would or would not benefit from the new architecture. It is also important to recognize the investment protection delivered: the new architecture provides a reasonable performance gain for current applications while providing headroom for future growth as more and more ISVs take full advantage of the new architecture. With that in mind, a non-uniform gain in performance is to be expected, as some classes of current applications lend themselves more to the new architecture than others. Using the Compaq Deskpro EN platform equipped with a 1 GHz Pentium III processor as the baseline, benchmark results of the new Compaq Evo D500 platform equipped with the 1.7 GHz Pentium 4 processor are presented as a comparison of the two architectures.
Comparison of Pentium 4 and Pentium III Architecture Benefits
As Internet and digital media become more pervasive in modern computing, the Pentium 4 processor is optimized for a new level of digital audio, video, photography and 3D performance. For corporate users, the Pentium 4 offers excellent performance with added headroom for future applications such as:
• Java technology and XML, which will be increasingly enabled in Office XP, Windows® XP and Web services
• Enhanced 3D rendering for business analysis, video decompression for e-learning, and peer-to-peer interaction for improved collaboration
• Secure connections with support for the latest encryption technology for data transfer and e-Commerce transactions
How are these potential enhancements possible with this new processor? Let's explore the micro-architecture enhancements in the Pentium 4 processor:
Representing a breakthrough to a new level of computing, the Pentium 4 processor is a completely redesigned version of the earlier Intel IA32 processor architecture (the Pentium III) that maintains backward compatibility with existing applications. This means the Pentium 4 processor protects users' current investment in existing applications while providing new optimized instructions, registers, and data structures for future applications.
The Pentium 4 processor is optimized for transferring and handling large data sets. This means the customer will see significantly improved performance over previous-generation Pentium III processors in applications that handle and require large amounts of data. This applies to all vertical applications and many horizontal applications, such as financial analysis applications, where handling large data sets is the norm. For a limited number of horizontal applications, such as Microsoft Word, performance is not enhanced and can even suffer, though differences are typically made up for by the faster processor speeds enabled by the new processor architecture. However, it should be noted that the trend in office applications is toward greater and greater use of graphics. Use of graphics presupposes the existence of data-intensive graphics-generation applications, which benefit greatly (and noticeably to the user) from Pentium 4 enhancements. Moreover, as noted above, the trend is also toward increasing use of Java technology and XML in Office XP, Windows XP and Web services. Nevertheless, in the short term, if the customer’s need is primarily for office applications and there is a budget constraint, the Pentium III may still offer an acceptable solution. However, the customer should be aware that Compaq expects that, in the near future, office applications will be handling much more data, requiring the architectural advantages the Pentium 4 possesses.
Perhaps more important for the user is the fact that higher processor speeds from Intel will be available in the future only in the Pentium 4. The Pentium III will offer no further increases in processor speed. (Intel will continue to refresh Celeron processors, however.) This is illustrated in Figure 1, which shows the roadmap of Intel processor technology. This means that, regardless of the application, improvements in performance can only be obtained through the greater processor speed available from Intel in the Pentium 4 processor.
Figure 1: Roadmap of Intel Processor Evolution
On the surface, the architecture of this new class of Pentium 4 processor looks the same as the Pentium III, but after one drills further down, the Pentium 4 is significantly enhanced to give better levels of performance in terms of frequency and instructions executed per clock. These are the two variables that determine how fast an application executes, as defined in the following performance equation:
Performance = MHz (Frequency) x Instructions executed per clock (IPC)
The Pentium 4 processor addresses the two variables in the performance equation with the new underlying silicon/logic implementation of what Intel calls the NetBurst micro-architecture. The NetBurst micro-architecture more specifically attacks the frequency and IPC variables of the performance equation with its advanced … µm and (… µm shortly after) silicon process technology, its redesigned architecture of the complete instruction pipeline, its execution engine, and its extension to the existing instruction set. As we move forward through this paper, the benefits of this will be more clearly explained.
More detailed information can be found after the summary section.
Case for Performance
Applications generally can be divided into two classes: 1) floating-point-based applications that are memory- and bandwidth-intensive, and 2) integer-based and basic office productivity applications. Recalling the performance equation mentioned above, the IPCs achievable by these two classes of applications vary greatly due to the variation of branches in application code. This variation of branches affects the predictability of code flow. A higher probability of correct prediction yields a higher potential IPC and, therefore, higher performance. Floating-point-based multimedia applications tend to have branches that are very predictable and thus have a higher IPC potential. As a result, these applications scale very well with frequency and benefit greatly from the new architecture of the Pentium 4. However, integer-based and basic office productivity applications tend to have more random branches in application code, and are thus more difficult to predict. The result is less efficient use of the Pentium 4 architecture on these applications. However, since Pentium 4 processors are available at higher frequencies than the Pentium III, performance is still enhanced according to the performance equation.
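The effect of branch predictability can be demonstrated with a small simulation. The sketch below uses a textbook two-bit saturating counter, which is an assumption made for illustration and not the multilevel predictor actually implemented in the Pentium 4. The two branch streams are synthetic: a loop-style branch that is taken 99 times out of 100, and a branch whose outcome is random.

```python
import random


def prediction_accuracy(outcomes):
    """Fraction of branches a two-bit saturating counter predicts correctly.

    Counter states 0-1 predict "not taken"; states 2-3 predict "taken".
    """
    counter = 2  # start in the "weakly taken" state
    correct = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken == taken:
            correct += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return correct / len(outcomes)


# Synthetic loop branch: taken 99 times, then falls through once, repeated.
loop_branches = ([True] * 99 + [False]) * 100

# Synthetic data-dependent branch: taken or not with equal probability.
rng = random.Random(0)
random_branches = [rng.random() < 0.5 for _ in range(10_000)]

loop_accuracy = prediction_accuracy(loop_branches)
random_accuracy = prediction_accuracy(random_branches)
print(f"Loop-style branches: {loop_accuracy:.1%} predicted correctly")
print(f"Random branches:     {random_accuracy:.1%} predicted correctly")
```

The loop-style stream is predicted almost perfectly, while the random stream is predicted correctly only about half the time. This gap is what separates the IPC potential of the two application classes.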
SYSmark 2001
SYSmark 2001 is a suite of application software and associated benchmark workloads developed by the Business Applications Performance Corporation (BAPCo). It is a tool that measures system performance on popular business-oriented applications in the Microsoft Windows operating system. SYSmark contains twelve (12) application workloads that are divided into two categories: Office Productivity and Internet Content Creation.
Figure 2: Comparison of Pentium III with Pentium 4 in SYSmark 2001 Benchmark Tests
Figure 2 clearly illustrates the Pentium 4 performance advantages over the Pentium III. It is also clear that performance gains in the Office Productivity workload are less dramatic than those in the Internet Content Creation workload. In the Internet Content Creation workload, where the typical workload is streamed in nature (Windows Media Encode, for example), the application tends to have branches that are very predictable, resulting in performance that scales very well with frequency and benefits greatly from the new architecture of the Pentium 4.
3D WinBench 2000 – Processor Test
3D WinBench 2000 measures system-level 3D performance, including the CPU and graphics subsystem. To understand the processor 3D performance, this benchmark suite includes the Processor Test, which measures the CPU-intensive portion of the 3D graphics pipeline: the geometry and setup stage.
Figure 3: Comparison of Pentium III and Pentium 4 in 3D WinBench 2000 Processor Test
To display 3D objects on a 2D computer screen, it is much easier to represent 3D objects as a collection of polygons (usually triangles) than as curved surfaces. The larger the number of triangles used to represent the 3D object, the more closely the approximation resembles the mathematical description of the 3D object. The process of breaking up a 3D object into triangles is called tessellation and involves an enormous number of floating-point vector calculations. Objects in the real world have material properties and reflectivity, and these affect how the objects interact with light; the more lighting from various sources and angles, the more realistic the object or scene. Again, calculations of light effects on 3D objects require large numbers of complex floating-point vector calculations. The CPU index performance gain in the 3D WinBench 2000 Processor Test benchmark, illustrated in Figure 3, resulted from the increase in floating-point performance of the Pentium 4 processor.
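The relationship between triangle count and approximation quality can be shown with a simple tessellation. The sketch below is a minimal illustration, not the method used by 3D WinBench 2000: it tessellates a unit sphere into latitude and longitude bands (the function names and the two resolutions are assumptions) and compares the total triangle area against the exact surface area of the sphere.

```python
import math


def sphere_point(theta, phi):
    """Point on the unit sphere for polar angle theta and azimuth phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def triangle_area(a, b, c):
    """Area of a 3D triangle: half the length of the edge cross product."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    cross = (ab[1] * ac[2] - ab[2] * ac[1],
             ab[2] * ac[0] - ab[0] * ac[2],
             ab[0] * ac[1] - ab[1] * ac[0])
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def tessellate_sphere(stacks, slices):
    """Split the unit sphere into 2 * stacks * slices triangles.

    Each latitude/longitude patch becomes two triangles. At the poles one of
    the two is degenerate (zero area) but is still counted.
    """
    triangles = []
    for i in range(stacks):
        t0 = math.pi * i / stacks
        t1 = math.pi * (i + 1) / stacks
        for j in range(slices):
            p0 = 2 * math.pi * j / slices
            p1 = 2 * math.pi * (j + 1) / slices
            a, b = sphere_point(t0, p0), sphere_point(t1, p0)
            c, d = sphere_point(t1, p1), sphere_point(t0, p1)
            triangles.append((a, b, c))
            triangles.append((a, c, d))
    return triangles


def surface_area(triangles):
    """Total area of a triangle mesh."""
    return sum(triangle_area(*t) for t in triangles)


exact_area = 4 * math.pi  # surface area of a unit sphere
for stacks, slices in [(8, 16), (32, 64)]:
    mesh = tessellate_sphere(stacks, slices)
    error = abs(exact_area - surface_area(mesh)) / exact_area
    print(f"{len(mesh)} triangles: relative area error {error:.3%}")
```

Every vertex and every triangle area in this sketch is a handful of floating-point multiplications, additions and square roots, so a finer mesh buys accuracy at the price of proportionally more floating-point work.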
Summary
The Pentium 4 architecture offers significant innovations compared to earlier Pentium III technology. These innovations lead to breakthroughs in performance that are measured and substantiated by testing reported in this white paper.
The Pentium 4 processor is optimized for transferring and handling large data sets, so customers will see significantly improved performance over previous-generation Pentium III processors in applications that handle and require large amounts of data. Floating-point-based multimedia applications tend to have branches that are very predictable and thus have a higher IPC (Instructions executed Per Clock) potential. Integer-based and basic office productivity applications tend to have more random branches in application code, and are thus more difficult to predict. This means the IPC potential is not high, but the fact that the Pentium 4 is available at higher frequencies than the Pentium III results in increased performance with these applications.
It is important for the user to note that higher processor speeds from Intel will be available in the future only from the Pentium 4. The Pentium III will offer no further increases in processor speed. (Intel will continue to refresh Celeron processors, however.) At some point, regardless of the application, improvements in performance can only be obtained through greater processor speed. The customer should be aware that Compaq expects that, in the near future, office applications will be handling much more data, resulting in the need for the increased processing power and efficiency that the Pentium 4 offers.
Additional Micro-Architecture Detail
Figures 4 and 5 provide an overview of the micro-architectures of the Pentium III and Pentium 4 processors, respectively.
Pentium III
Figure 4: Pentium III Micro-Architecture Overview
Pentium 4
Figure 5: Pentium 4 NetBurst Architecture Overview
Again, the NetBurst micro-architecture attacks the frequency and IPC variables of the performance equation with its advanced … µm and (… µm shortly after) silicon process technology, its redesigned architecture of the complete instruction pipeline, its execution engine, and its extension to the existing instruction set, as follows:
• 20-stage pipeline, as compared to a 10-stage pipeline in the Pentium III: a smaller workload per stage, but a significantly faster execution time per stage
• Execution Trace Cache to remove the long latency associated with the instruction decoder from the main execution loop, where it sits in the Pentium III
• Rapid Execution Engine, where multiple Arithmetic Logic Units (ALUs) run at twice the core frequency, resulting in higher execution throughput, reduced execution latency, and an extension of the total number of execution ports to seven (7), as compared to five (5) in the Pentium III
• Advanced Transfer Cache with much higher throughput, at 54.4GB/s for a 1.7 GHz Xeon (32 bytes x one transfer per clock x 1.7 GHz), to feed the data-hungry execution units, as compared to 16GB/s throughput at 1 GHz in the Pentium III
• Advanced Dynamic Execution with very wide windows of instructions (126 instructions versus 42 instructions in the Pentium III) from which the execution units can choose to execute, thus avoiding dependency stalls that would prevent execution units from doing useful work. In addition, a 4KB branch target buffer (as compared to 1KB in the Pentium III) and a multilevel advanced branch prediction algorithm keep detail on the history of past program branches, thus reducing the misprediction rate by approximately 33% as compared to the Pentium III.
• 400 MHz System Bus with enhancements to the signaling scheme and bus protocols, thus featuring data bandwidth and bus transfer efficiencies much higher than those of the Pentium III, as follows:
  – 200% data bandwidth improvement (3.2GB/s (8 bytes x 400 Mtransfers/s) versus 1.06GB/s (8 bytes x 133 Mtransfers/s))
  – 17% latency improvement for first critical data read
  – 46% latency improvement for 64-byte read
  – 25% latency improvement for data write
  – 64% latency improvement for 64-byte write
  – New cycles every two clocks at 200 MHz versus every three clocks at 133 MHz
  – 200% snoop bandwidth improvement (3.2GB/s (64 bytes/2 clocks @ 100 MHz) versus 1.06GB/s (32 bytes/4 clocks @ 133 MHz))
  – Higher concurrent requests
  – Faster interrupt servicing (bus message versus I/O cycles)
• Streaming SIMD (Single Instruction Multiple Data) Extensions 2 (SSE2) with 144 new instructions that deliver 128-bit SIMD integer arithmetic operations and 128-bit SIMD double-precision floating point to reduce the number of instructions needed to complete a task or program, effectively increasing IPC.
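The bandwidth figures quoted in this list follow from simple arithmetic: bytes per transfer multiplied by transfers per second. The sketch below reproduces them as a check. The function names are illustrative assumptions; the input values are taken from the list above.

```python
def bandwidth_gb_per_s(bytes_per_transfer, mega_transfers_per_s):
    """Peak bandwidth in GB/s (1 GB/s = 1000 MB/s)."""
    return bytes_per_transfer * mega_transfers_per_s / 1000.0


def improvement_percent(new, old):
    """Improvement of new over old, as a percentage of old."""
    return (new - old) / old * 100.0


# Advanced Transfer Cache: 32 bytes x one transfer per clock x 1.7 GHz.
cache_bandwidth = bandwidth_gb_per_s(32, 1700)

# System bus: 8 bytes x 400 Mtransfers/s versus 8 bytes x 133 Mtransfers/s.
pentium_4_bus = bandwidth_gb_per_s(8, 400)
pentium_iii_bus = bandwidth_gb_per_s(8, 133)
bus_improvement = improvement_percent(pentium_4_bus, pentium_iii_bus)

print(f"Advanced Transfer Cache: {cache_bandwidth:.1f} GB/s")
print(f"Pentium 4 system bus:    {pentium_4_bus:.2f} GB/s")
print(f"Pentium III system bus:  {pentium_iii_bus:.2f} GB/s")
print(f"Bus bandwidth improvement: {bus_improvement:.0f}%")
```

A bus that delivers roughly three times the bandwidth is a 200% improvement, which is how the 3.2GB/s and 1.06GB/s figures relate to the percentage quoted above.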
Impact of DirectX 8.0
Optimized usage of the SSE/SSE2 extensions, together with code flow optimization that takes advantage of the new NetBurst micro-architecture, allows graphics drivers to make use of DirectX 8.0 programmable vertex and pixel shaders to produce significant performance gains, as illustrated in Figure 6.
Figure 6: DirectX 8.0 Performance Improvements
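The way SSE2 reduces instruction counts can be estimated with an idealized model. The sketch below is an assumption-laden illustration, not a description of any driver or of DirectX 8.0 itself: it counts only arithmetic instructions, ignores loads, stores and loop overhead, and assumes each 128-bit SSE2 instruction processes as many elements as fit in one register.

```python
import math

SSE2_REGISTER_BITS = 128  # SIMD register width stated in this paper


def lanes(element_bits):
    """Number of elements of the given width that fit in one SSE2 register."""
    return SSE2_REGISTER_BITS // element_bits


def arithmetic_instruction_count(n_elements, element_bits, simd):
    """Idealized count of arithmetic instructions for one pass over an array."""
    if not simd:
        return n_elements  # one scalar instruction per element
    return math.ceil(n_elements / lanes(element_bits))


n = 1000
print("Double-precision (64-bit):",
      arithmetic_instruction_count(n, 64, simd=False), "scalar versus",
      arithmetic_instruction_count(n, 64, simd=True), "SIMD instructions")
print("32-bit integer:",
      arithmetic_instruction_count(n, 32, simd=False), "scalar versus",
      arithmetic_instruction_count(n, 32, simd=True), "SIMD instructions")
```

Fewer instructions for the same work means more work completed per clock, which is the IPC side of the performance equation.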