
Server CPU Design Trends: Back To Simplicity And TLP

This article introduces some basic concepts behind a trend in processor design that appears to be becoming more and more obvious: thread-level parallelism (TLP/Thread Level Parallelism), an approach aimed at making multi-threaded applications run more efficiently.

Like many other industries, the computer industry has concepts that cycle between popular and obsolete: powerful or simple, supercomputer or cluster, fat client or thin client, special-purpose or general-purpose, revolution or evolution. In processor design, a major trend since the 1980s has been RISC-style design, which uses a simpler instruction set to simplify the core and deliver higher execution efficiency. It is a classic example of the KISS (Keep It Simple, Stupid) principle, although not all RISC processors are actually that simple. In the 1990s the focus shifted to instruction-level parallelism (ILP/Instruction Level Parallelism) and clock frequency, and processors grew more complex. Now the pendulum appears to be swinging back toward simpler cores designed around thread-level parallelism.

Use Multithreading To Improve Performance

Multi-processor workstations and servers have become very common; millions of such machines have been sold over the years. With the introduction of Intel’s Hyper-Threading technology, multi-processor techniques have also begun to reach the desktop. Applications that demand speed often ask for more than a single processor can deliver, and spreading the work across multiple processors raises throughput. Writing programs that use multiple processors adds some complexity, but that is not a big deal as long as the performance gain justifies the overhead.
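To make the extra complexity concrete, here is a minimal C++ sketch (my own illustration, not code from the article; the array size and the summation task are arbitrary assumptions) of splitting an independent workload across however many hardware threads the machine reports:

#include <cstddef>
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

// Sum a large array by splitting it across as many worker threads as the
// hardware reports. Each thread handles an independent slice, so no locking
// is needed; the partial sums are combined after all threads have joined.
int main() {
    std::vector<double> data(1000000, 1.0);
    unsigned n = std::thread::hardware_concurrency();  // logical CPUs reported
    if (n == 0) n = 4;                                  // fall back if unknown

    std::vector<double> partial(n, 0.0);
    std::vector<std::thread> workers;
    const std::size_t chunk = data.size() / n;

    for (unsigned i = 0; i < n; ++i) {
        std::size_t begin = i * chunk;
        std::size_t end = (i + 1 == n) ? data.size() : begin + chunk;
        workers.emplace_back([&, i, begin, end] {
            partial[i] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0.0);
        });
    }
    for (auto& t : workers) t.join();

    double total = std::accumulate(partial.begin(), partial.end(), 0.0);
    std::printf("threads: %u, total: %.0f\n", n, total);
    return 0;
}

The point of the sketch is simply that the work must be divided and the results recombined by the programmer; that extra bookkeeping is the overhead the paragraph above refers to.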

In fact, very few critical applications lack multi-threaded support; where support is missing, it is usually because development costs are too high or because the hardware on the market does not support it. For example, desktop 2D and 3D graphics hardware is mostly driven by a single thread for its primary processing. High-end SGI graphics systems use hundreds of processors, and 3D rendering software is often highly parallelizable and could be used the same way on desktop graphics systems, but most desktop systems today have only one processor. Sony’s PlayStation 3 game console, however, will use multiple processors, which means games on that platform can benefit from multi-threading for better performance.

Multiprocessor systems have existed for a long time, but the technique of achieving multithreading through multiple processor cores on one chip has only recently appeared. This technique, called chip multiprocessing (CMP/Chip Multi-Processing), puts several cores inside a single processor and has already become common in some mission-critical embedded applications. All of the major CPU companies are planning CMP products, usually by adding another identical core to an existing design, and these products currently target the server market.

This may sound boring and simple, but it raises a deeper issue: the trade-offs involved in optimizing processor cores for CMP, which is the main subject of this article. The end result of these optimizations is KISS again, making the CPU core design simpler and more efficient. Unlike the RISC trend of the 1980s, however, this time no change to the processor’s instruction set is required.

This change touches the design and implementation of the operating-system kernel, as well as compilers and thread-related software. With a CMP processor whose cores each support multi-threading, a system with only one processor can run a large number of threads. The more sensibly those threads are scheduled, the less likely they are to block, and the higher the system’s execution efficiency. There are further optimizations as well, such as more efficient locking of code sections and drivers written to be friendlier to multi-threaded processors. Inadequate support for multi-threading models is currently a real problem for operating systems, and in the consumer market, hardware implementations of multi-threading will also take time.
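As one hedged illustration of what “more efficient locking of code sections” can look like in practice (a generic technique sketched in C++, not a specific proposal from the article; the ShardedCounters class, shard count, and key names are made up for the example), a shared structure can be split into shards that each carry their own lock, so the many threads of a TLP-optimized processor rarely contend:

#include <array>
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// A counter table split into shards, each guarded by its own mutex.
// Threads updating keys that land in different shards never block each other,
// unlike a design that wraps one global lock around the whole table.
class ShardedCounters {
    static constexpr std::size_t kShards = 16;
    struct Shard {
        std::mutex mu;
        std::unordered_map<std::string, long> counts;
    };
    std::array<Shard, kShards> shards_;

    Shard& shard_for(const std::string& key) {
        return shards_[std::hash<std::string>{}(key) % kShards];
    }

public:
    void increment(const std::string& key) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> lock(s.mu);  // lock only this shard
        ++s.counts[key];
    }

    long get(const std::string& key) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> lock(s.mu);
        auto it = s.counts.find(key);
        return it == s.counts.end() ? 0 : it->second;
    }
};

int main() {
    ShardedCounters counters;
    std::vector<std::thread> workers;
    for (int t = 0; t < 8; ++t)
        workers.emplace_back([&counters, t] {
            // Each thread hammers its own key; distinct keys usually hash to
            // distinct shards, so the threads rarely wait on one another.
            const std::string key = "cpu" + std::to_string(t);
            for (int i = 0; i < 100000; ++i) counters.increment(key);
        });
    for (auto& t : workers) t.join();
    std::printf("cpu0 = %ld\n", counters.get("cpu0"));
    return 0;
}

With a single global mutex instead, every increment from every thread would serialize on the same lock, which is exactly the kind of bottleneck that matters more as the number of hardware threads grows.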

Lose Weight, Gain Performance

In discussions of where CPU design is heading, the traditional design is often called “fat” in contrast to multiple thin, TLP-optimized cores. A fat design has only one core and is optimized mainly for single-threaded performance; improving the efficiency of multi-threaded programs is a secondary concern. A TLP-optimized processor is the opposite: single-threaded performance gives way to multi-threaded performance, and the design goal is to maximize the throughput of a group of active threads rather than the speed of any single thread.

Two aspects are worth considering in reaching this goal. One is to use smaller CPU cores, which usually leads to higher execution efficiency; the other is optimization of the memory system. Because TLP-optimized processors already target specific kinds of programs, they may be specially optimized for applications such as high-performance computing (HPC, as in IBM’s and Sun’s HPC implementations) or network-sensitive workloads (as in some embedded designs or Sun’s Niagara processor); general-purpose servers (such as IBM’s and Sun’s mainstream POWER and UltraSPARC processors) can also use this technique.

Now, let’s look at how a server processor designed for heavy workloads, such as IBM’s POWER or Sun’s UltraSPARC, handles the two aspects mentioned above.

A simple metric for processor performance is the number of instructions actually executed per second, but this is hard to compare when processors use different instruction sets. Another approach combines the average number of instructions executed per processor cycle (IPC) with the clock frequency in gigahertz: BIPS, equal to average IPC × frequency, where BIPS means billions of instructions per second. In real tests the average IPC is always lower than the maximum number of instructions the CPU can execute in parallel (its instruction-level parallelism, ILP). Running the latest server benchmarks on the latest processors, IPC is rarely greater than 1.0; on a Xeon processor it is rarely greater than 0.5, far below the theoretical maximum, and the numbers are worse still when the hardware and software are not well tuned.
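To make the metric concrete (hypothetical figures: the 0.5 IPC echoes the Xeon observation above, while the 3 GHz clock is assumed purely for illustration): a 3 GHz core that sustains an average of 0.5 IPC delivers only 0.5 × 3 = 1.5 BIPS, no matter how many instructions it can issue per cycle in theory.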

Using a 3 GHz six-way core, a fat processor has a peak processing power of 18 BIPS. A single-threaded program, however, rarely sustains more than 1.5 IPC, for about 4.5 BIPS, while a multi-threaded server workload can reach 2.0 IPC, raising throughput to 6 BIPS. Using eight 1.5 GHz three-way TLP-optimized cores, the peak is 36 BIPS. If a single thread actually achieves 1.0 IPC and a multi-threaded server workload achieves 2.0 IPC per core, the thin design delivers roughly 1.5 BIPS to a single-threaded program but 24 BIPS to multi-threaded programs, a four-fold improvement for multi-threaded work, which is the reason we do this. Why these “thin” cores can reach higher aggregate speed while shrinking in size and power consumption is a question we will discuss in a future post.
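Tallying the figures above (a rough C++ sketch; the IPC and frequency numbers are the illustrative assumptions stated in this article, not measurements of real chips):

#include <cstdio>

// Tally the article's example figures: BIPS = cores x IPC x GHz,
// with the peak taken at the core's maximum issue width.
int main() {
    // "Fat" design: one 6-way core at 3 GHz.
    double fat_peak   = 1 * 6.0 * 3.0;   // 18 BIPS theoretical peak
    double fat_single = 1 * 1.5 * 3.0;   // 4.5 BIPS for one thread at 1.5 IPC
    double fat_server = 1 * 2.0 * 3.0;   // 6 BIPS on multithreaded server code

    // TLP-optimized design: eight 3-way cores at 1.5 GHz.
    double thin_peak   = 8 * 3.0 * 1.5;  // 36 BIPS theoretical peak
    double thin_single = 1 * 1.0 * 1.5;  // 1.5 BIPS for a lone thread at 1.0 IPC
    double thin_server = 8 * 2.0 * 1.5;  // 24 BIPS across all cores at 2.0 IPC

    std::printf("fat : peak %.0f  single %.1f  server %.0f BIPS\n",
                fat_peak, fat_single, fat_server);
    std::printf("thin: peak %.0f  single %.1f  server %.0f BIPS\n",
                thin_peak, thin_single, thin_server);
    std::printf("multithreaded gain: %.0fx\n", thin_server / fat_server);
    return 0;
}

The 24 BIPS versus 6 BIPS comparison is where the four-fold claim for multi-threaded work comes from, at the cost of lower single-threaded throughput.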
