ParallelFox and HyperThreading

Years ago, Intel added a HyperThreading feature to their CPUs before dual-core processors were available. More recently, Intel reintroduced the technology into their “Core i” series of processors. What is HyperThreading and how does it affect ParallelFox? Let’s start with the Wikipedia description:

Hyper-threading works by duplicating certain sections of the processor—those that store the architectural state—but not duplicating the main execution resources. This allows a hyper-threading processor to appear as two “logical” processors to the host operating system, allowing the operating system to schedule two threads or processes simultaneously. When execution resources would not be used by the current task in a processor without hyper-threading, and especially when the processor is stalled, a hyper-threading equipped processor can use those execution resources to execute another scheduled task. (The processor may stall due to a cache miss, branch misprediction, or data dependency.)

What this boils down to is that a single thread/process will not utilize all of the execution “slots” or “units” in a CPU core. This is especially true when the processor is “stalled”, meaning that the processor is waiting on something before it can continue. This may be due to the inherent design of the CPU, or because it is waiting for data to be accessed from main memory. HyperThreading allows a second thread/process to utilize the unused execution slots. Generally, this is a good thing and can provide a 15-30% performance boost to parallel processing.

However, in cases where there is heavy competition among threads for the same execution slots and other resources, HyperThreading can be slower than running a single thread on each core. The examples that ship with ParallelFox exploit this weakness. On a single-core HyperThreading CPU, the “after” examples are actually slower than the “before” examples. Of course, this was not intentional. The reason is that the examples simulate work rather than resemble real-world code. Here is the SimulateWork() function:

Procedure SimulateWork
   Local i
   For i = 1 to 1000000
      * Peg CPU
   EndFor
EndProc

While this code does a good job of pegging a CPU core at 100%, it also causes the same few instructions to be executed millions of times. With HyperThreading enabled, competition between the two threads for the same CPU resources is extreme. In a real-world scenario, there would likely not be this much competition for resources and HyperThreading would be beneficial.

As with most things, your mileage may vary. If you find that your code runs slower with HyperThreading, you can tell ParallelFox to use only half of the “logical” processors and start only one worker per physical core. Here is example code for that:

* Use only physical cores
If Parallel.DetectHyperThreading()
   Parallel.SetWorkerCount(Parallel.CPUCount / 2)
EndIf
Parallel.StartWorkers("MyApp.EXE")

ParallelFox uses WMI (Windows Management Instrumentation) to detect if HyperThreading is enabled. WMI has shipped with windows since Windows 2000. However, WMI can only detect HyperThreading on Windows XP SP3, Windows Server 2003, and later versions, because that is when Microsoft introduced the required APIs. On previous versions of Windows, Parallel.DetectHyperThreading() will always return .f. even if HyperThreading is enabled.

There are several other features I want to add to ParallelFox, but at this point, I think it is feature complete for version 1.0.  Also, very few issues have been reported from previous versions, so I am moving this release up to release candidate status.

Download ParallelFox at VFPX.