Information about Hyperthreading



Hyper-threading, officially called Hyper-Threading Technology (HTT), is Intel's trademark for its implementation of simultaneous multithreading on the Pentium 4 microarchitecture. It is a more advanced form of super-threading that debuted in U.S. patent 4,847,755 (Gordon Morrison et al.); it is used on Intel Xeon processors and was later added to Pentium 4 processors. The technology improves processor performance under certain workloads by providing useful work for execution units that would otherwise be idle, for example during a cache miss. A Pentium 4 with Hyper-Threading enabled is treated by the operating system as two processors instead of one.

Performance

The listed advantages of Hyper-Threading are improved support for multi-threaded code, the ability to run multiple threads simultaneously, and improved reaction and response time.

According to Intel, the first implementation only used an additional 5% of the die area over the comparable non-hyperthreaded processor, yet yielded performance improvements of 15–30%.

Intel claims up to a 30% speed improvement compared with an otherwise identical Pentium 4 without simultaneous multithreading. The improvement is very application-dependent, however, and some programs actually slow down slightly when Hyper-Threading Technology is turned on. This is due to the replay system of the Pentium 4 tying up valuable execution resources, thereby starving the other thread. (The Pentium 4 Prescott core gained a replay queue, which reduces the execution time needed for the replay system, but this is not enough to completely overcome the performance hit.) Any such performance degradation is unique to the Pentium 4 (due to various architectural nuances) and is not characteristic of simultaneous multithreading in general.

Details

Hyper-Threading works by duplicating certain sections of the processor—those that store the architectural state—but not duplicating the main execution resources. This allows a Hyper-Threading equipped processor to pretend to be two "logical" processors to the host operating system, allowing the operating system to schedule two threads or processes simultaneously. Where execution resources in a non-Hyper-Threading capable processor are not used by the current task, and especially when the processor is stalled, a Hyper-Threading equipped processor may use those execution resources to execute another scheduled task. (The processor may stall due to a cache miss, branch misprediction, or data dependency.)

Except for its performance implications, this innovation is transparent to operating systems and programs. All that is required to take advantage of Hyper-Threading is symmetric multiprocessing (SMP) support in the operating system, as the logical processors appear as standard separate processors.
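Because the logical processors appear as ordinary separate processors, software that wants to know which of them share a physical core has to consult topology information supplied by the operating system. The following is a minimal sketch, assuming a Linux system whose /proc/cpuinfo output carries the "processor", "physical id" and "core id" fields; the function name group_logical_cpus and the SAMPLE text are hypothetical and used only for illustration.

```python
from collections import defaultdict


def group_logical_cpus(cpuinfo_text):
    """Map (physical id, core id) -> sorted list of logical processor numbers.

    Assumes /proc/cpuinfo-style text: one "key : value" pair per line,
    with a blank line between logical processors.
    """
    groups = defaultdict(list)
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip()] = value.strip()
        # Skip entries that lack topology fields rather than guess at them.
        if not {"processor", "physical id", "core id"} <= fields.keys():
            continue
        key = (int(fields["physical id"]), int(fields["core id"]))
        groups[key].append(int(fields["processor"]))
    return {key: sorted(cpus) for key, cpus in groups.items()}


# Hypothetical excerpt: two physical processors, each exposing two logical ones.
SAMPLE = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 0

processor : 2
physical id : 3
core id : 0

processor : 3
physical id : 3
core id : 0
"""

if __name__ == "__main__":
    for core, cpus in sorted(group_logical_cpus(SAMPLE).items()):
        print(core, cpus)
```

On a live Linux machine the same function could be fed the contents of /proc/cpuinfo; any group containing more than one logical processor indicates execution resources shared between them.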

However, it is possible to optimize operating system behavior on Hyper-Threading capable systems, such as the Linux techniques discussed in Kernel Traffic. For example, consider an SMP system with two physical processors that are both Hyper-Threaded (for a total of four logical processors). If the operating system's process scheduler is unaware of Hyper-Threading, it would treat all four processors the same. As a result, if only two processes are eligible to run, it might choose to schedule those processes on the two logical processors that happen to belong to one of the physical processors. Thus, one CPU would be extremely busy while the other CPU would be completely idle, leading to poor overall performance. This problem can be avoided by improving the scheduler to treat logical processors differently from physical processors; in a sense, this is a limited form of the scheduler changes that are required for NUMA systems.
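The scheduling problem described above can be illustrated with a toy placement function. This is only a sketch under stated assumptions, not the Linux scheduler's algorithm: the name pick_cpus, the topology dictionary, and the numbering of logical processors are all hypothetical, and the "naive" branch simply shows one unlucky choice an unaware scheduler might make.

```python
def pick_cpus(num_tasks, topology, ht_aware=True):
    """Choose logical CPUs for runnable tasks.

    topology maps a physical processor id to its list of logical CPU ids.
    """
    if not ht_aware:
        # Unaware scheduler: all logical CPUs look alike, so this sketch
        # simply takes them in numeric order.
        all_cpus = sorted(cpu for cpus in topology.values() for cpu in cpus)
        return all_cpus[:num_tasks]

    # Aware scheduler: hand one task to every physical processor before
    # giving any physical processor a second task.
    chosen = []
    depth = 0
    while len(chosen) < num_tasks:
        progressed = False
        for package in sorted(topology):
            cpus = topology[package]
            if depth < len(cpus) and len(chosen) < num_tasks:
                chosen.append(cpus[depth])
                progressed = True
        if not progressed:
            break  # more tasks than logical CPUs
        depth += 1
    return chosen


# Two physical processors, each with two logical processors.
TOPOLOGY = {0: [0, 1], 1: [2, 3]}

if __name__ == "__main__":
    print(pick_cpus(2, TOPOLOGY, ht_aware=False))  # both tasks share processor 0
    print(pick_cpus(2, TOPOLOGY, ht_aware=True))   # one task per physical processor
```

With two runnable tasks, the unaware branch places both on logical processors of the same physical processor, while the aware branch spreads them across the two physical processors.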

Security

In May 2005 Colin Percival presented a paper, Cache Missing for Fun and Profit, demonstrating that a malicious thread operating with limited privileges can monitor the execution of another thread, allowing for the possibility of theft of cryptographic keys. Although the implementation details are specific to the Intel Pentium 4 (chosen "for reasons of availability"), the techniques in general apply to any system that caches or pages memory; see also virtual memory and timing attack.
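The general idea of observing another thread through a shared cache can be shown with a purely abstract model. The sketch below is not the code or the exact method from Percival's paper: it simulates a hypothetical eight-set direct-mapped cache as a Python list, and the set numbers, function names, and the "ownership" check that stands in for a timing measurement are all assumptions made for illustration.

```python
NUM_SETS = 8  # hypothetical tiny direct-mapped cache shared by both threads


def victim_access(cache, secret_bit):
    # The victim's memory access depends on a secret bit: in this model it
    # touches a line mapping to set 5 for a 1 bit and to set 2 for a 0 bit.
    set_index = 5 if secret_bit else 2
    cache[set_index] = "victim"


def spy_round(secret_bit):
    # Prime: the spy fills every cache set with its own lines.
    cache = ["spy"] * NUM_SETS
    # The victim runs on the sibling logical processor and evicts one line.
    victim_access(cache, secret_bit)
    # Probe: the spy re-reads its lines. On real hardware an evicted line
    # would show up as a slow access; this model just checks ownership.
    evicted = [i for i, owner in enumerate(cache) if owner != "spy"]
    return 1 if evicted == [5] else 0


def recover_bits(secret_bits):
    # The spy never reads the secret directly; it infers each bit from
    # which of its own cache lines went missing.
    return [spy_round(bit) for bit in secret_bits]


if __name__ == "__main__":
    print(recover_bits([1, 0, 1, 1, 0]))
```

The model leaves out everything that makes a real attack difficult, such as measurement noise, set associativity, and synchronization between the two threads.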

Future

Older Pentium 4 based CPUs use Hyper-Threading, but the current-generation Pentium M based cores (Merom, Conroe, and Woodcrest) do not. Hyper-Threading is a specialized form of simultaneous multithreading (SMT), which is reportedly in Intel's plans for the generation after Merom, Conroe, and Woodcrest.

More recently, Hyper-Threading has been criticized as energy-inefficient. For example, ARM, a company specializing in low-power CPU designs, has stated that SMT can use up to 46% more power than dual-CPU designs. Furthermore, it claims that SMT increases cache thrashing by 42%, whereas dual core results in a 37% decrease.[1] These considerations are claimed to be the reason Intel has dropped SMT from new cores.

Intel has claimed that Nehalem, planned for 2008, will see the rebirth of Hyper-Threading. Nehalem is projected to contain up to 8 cores and to scale effectively to 16 or more threads.[1]

References

1. ^ [2]

External links

Security
Performance problems

Sources

Replay: Unknown Features of the NetBurst Core [3]


This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.