The last word on Hyperthreading (4930k)

Hyperthreading works like this:

  • CPUs have always consisted of “architectural state” and “execution units”
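  • The architectural state (registers, program counter, flags - everything that defines one thread of execution) and the single instruction stream fed from it, however, cannot keep up with today’s execution units (which can execute more than one instruction per clock)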
  • The original Pentium already introduced a concept called “superscalar execution”, which doubled the number of integer pipelines (U and V) and thereby raised throughput rather than making any single unit faster - this was necessary because back in those dark ages, executing even one instruction per clock cycle was almost unheard of (integer multiplication and division took many, many clock cycles… far more than today)
  • One day, the tables turned - suddenly we had CPUs which could execute multiple instructions in a single clock cycle, and the architectural state became the bottleneck (a single thread could not hand out enough work to the execution units)
  • To solve this problem, Intel implemented “simultaneous multithreading” (SMT), which we all know by its brand name Hyperthreading (HT) - SMT, however, basically only doubles the (rather small) architectural state, while leaving the number of execution units as it is
  • Result: much better utilization of otherwise mostly idling execution units
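
The utilization argument above can be sketched as a toy model. This is only an illustration under loud assumptions, not how a real core schedules work: the core has a fixed number of issue slots per cycle, each thread offers a made-up number of ready instructions per cycle (zeros stand for stalls), and instructions that do not fit in a cycle are simply dropped instead of being carried over. The names `utilization`, `thread_a` and `thread_b` are hypothetical.

```python
def utilization(ready_per_cycle_by_thread, slots_per_cycle):
    """Fraction of issue slots used when the given threads share one core."""
    cycles = len(ready_per_cycle_by_thread[0])
    issued = 0
    for cycle in range(cycles):
        # Instructions ready in this cycle, summed over all hardware threads.
        ready = sum(thread[cycle] for thread in ready_per_cycle_by_thread)
        # The core cannot issue more than it has slots for.
        issued += min(ready, slots_per_cycle)
    return issued / (slots_per_cycle * cycles)


# Invented per-cycle counts of ready instructions; zeros model stalls.
thread_a = [2, 0, 1, 0, 3, 0, 1, 1]
thread_b = [0, 2, 1, 3, 0, 1, 2, 0]

single = utilization([thread_a], slots_per_cycle=4)        # 0.25
smt = utilization([thread_a, thread_b], slots_per_cycle=4)  # 0.53125
```

With these invented numbers a single thread leaves three quarters of the slots idle, and the second thread fills part of the gaps without any extra execution units.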

That's the theory.

In practice, there are a few things keeping HT / SMT from scaling 100%:

  1. Threads whose instruction streams compete for the same execution units, with no instructions for other execution units that could be executed in between (in order or out of order, doesn’t matter) - audio software is affected by this a little (music software uses general-purpose CPUs as DSPs, but the problem is not as big as it may seem at first glance)
  2. Theoretically, so-called “cache thrashing”, if the L1 / L2 caches are too small for the working sets of both threads (which is NOT the case with Cubase and similar applications, btw…!)
  3. Also, back in the dark ages, there was - from what I have heard - a problem with some OSes which didn’t load the physical and logical cores in the proper order (putting two threads on the two logical cores of one physical core while other physical cores sat idle) - Windows 7 and Windows 8 / 8.1 DO NOT (!) have this problem
  4. The effort put into the HT implementation and its quality, which is far better today than back in the old Pentium 4 days (one has to add that the P4 was an extraordinarily bad CPU overall anyway)
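
Point 1 in the list above can be illustrated with another toy model. It is a rough sketch, assuming one execution unit per unit name, at most one instruction per thread per cycle, strict in-order issue and single-cycle instructions; none of this matches a real Intel core. The function names and the unit labels `"fp"` and `"int"` are made up for the example.

```python
def cycles_on_shared_core(threads):
    """Cycles until all threads finish when they share one core.

    Each thread is a list of unit names, one per instruction (toy model).
    """
    positions = [0] * len(threads)
    cycles = 0
    while any(pos < len(t) for pos, t in zip(positions, threads)):
        busy = set()  # units already taken in this cycle
        start = cycles % len(threads)  # rotate priority so nobody starves
        for i in range(len(threads)):
            idx = (start + i) % len(threads)
            if positions[idx] < len(threads[idx]):
                unit = threads[idx][positions[idx]]
                if unit not in busy:
                    busy.add(unit)
                    positions[idx] += 1
        cycles += 1
    return cycles


def smt_speedup(a, b):
    """Run a and b back to back versus together on the two logical cores."""
    sequential = cycles_on_shared_core([a]) + cycles_on_shared_core([b])
    return sequential / cycles_on_shared_core([a, b])


same_units = smt_speedup(["fp"] * 8, ["fp"] * 8)        # 1.0: no gain
different_units = smt_speedup(["fp"] * 8, ["int"] * 8)  # 2.0: perfect scaling
mixed = smt_speedup(["fp", "int"] * 4, ["fp", "fp", "int", "int"] * 2)
```

Two threads hammering the same unit gain nothing in this model, two threads using disjoint units scale perfectly, and a mixed workload lands somewhere in between, which is the situation real software is usually in.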