Cubase 10, Windows 10 and multi-core (14+ cores)

Hi uarte. No, I never did the registry changes, nor the audioengine.properties trick (which I didn’t even know existed before reading this thread). It was clear to me since upgrading to Cubase 9 that, if I wanted to work with virtual instruments at low latencies without spikes, the only solution at the time was to disable hyperthreading, so I simply did that and moved on, until now. In fact, reading about these possible multi-core improvements was the main reason I purchased the update to Cubase 10.

Of course, I’ll do my best to help Steinberg / Fabio if there’s something in particular I can test on my rig for them.

Thanks Jorge, I hope Fabio and Steinberg in general can sort this out. I think the general direction is positive with C10, but you should have seen an improvement. I need to test out some more heavy duty projects soon to see if higher CPU load makes a difference for me here. So far so good for me though on an average size project, the improvements with C10 have been noticeably good. But I haven’t had the time yet to really push it as hard as your project appears to do.

Out of curiosity, what plugins are you using in your session? If I have the same ones, I’ll try setting up a session that pushes the system more with those plugins.

Fabio, can you please clarify exactly what this change should or shouldn’t allow? Exactly how many computational threads are allowed now? Can a large-scale CPU be expected to be used fully, e.g. an AMD Ryzen Threadripper 2990WX with 32 cores and 64 threads? Or is this just a “fix” to throttle Cubase so it only uses a certain percentage of the available threads and no dropouts occur?

If this is not a total fix and still won’t allow full use of large processors like that, is it expected there will ever be a “full fix”? The number of cores and threads available to us grows every year. Some of us are running synths or plugins that need high single-thread CPU performance, and thus we need these processes spread out among all our cores (with or without multithreading enabled).

Microsoft claims to have been working closely with you on this issue for over 10 months now:
https://answers.microsoft.com/en-us/windows/forum/windows_10-performance/windows-10-limits-max-number-32-of-threads-with/e3a47fc2-9547-4fea-b830-042a552f56a9

So I can only hope you have found a total solution between the two of your companies. Otherwise, I would like to know what you suggest people do long term in order to fully use these big processors. If this is expected to work with Cubase fully, I will order one of these processors and can post test results in a few weeks. But I need to be clear on what to actually expect. I don’t want to buy a $2000+ processor for $800 worth of performance.

If ASIO Guard is enabled, then why do you have such a heavy load on “real-time peak”? Maybe some plug-in does not support ASIO Guard, or it is disabled for some plugins?

With ASIO Guard it should look like this (ASIO Guard is set to “Normal”, and my RME AIO is set to a latency of 128 samples):
asioguard.jpg
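
For anyone wondering what a 128-sample buffer means in time, here is a minimal sketch of the arithmetic. The sample rates below are assumptions (the post does not say which rate the RME is running at), and this only covers a single buffer, not the full round-trip latency the driver reports.

```python
def buffer_latency_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Duration of one audio buffer in milliseconds."""
    return buffer_samples / sample_rate_hz * 1000.0


# Assumed sample rates; the thread only states the buffer size (128 samples).
for rate in (44100, 48000, 96000):
    print(f"{rate} Hz: {buffer_latency_ms(128, rate):.2f} ms per buffer")
```

The shorter the buffer, the less time the real-time threads have to finish each block, which is why the real-time peak meter reacts so strongly at low latency settings.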

@jorge
Have you done the usual optimizations in the BIOS?
Disable any and all power saving and turbo boost, and clock all cores the same. I would clock a 5960 at 3 GHz on all cores as a starting point and increase the clock if the temperature allows it.
Set the memory timings to what the manufacturer recommends for the memory sticks; the auto setting in the BIOS sometimes gets it wrong.
Also:
Update any and all drivers; even non-audio devices can have a disastrous influence if they cause interrupts.
LatencyMon is a good tool for finding problematic programs and drivers.
Most of this can be Googled if you need more information.

Anybody see any performance enhancement with a 3930K on 10?

I’ve been doing more tests during the weekend and I must say that, yes, according to the ASIO loads, there’s an improvement of about 5-10% between Cubase 9.5 and 10 in my system, which is indeed nice. However, as I said before, even with this gain in performance, I still can’t work reliably with hyperthreading activated due to the real-time glitches.

Well, nothing really esoteric I guess: 8-10 instances of Kontakt with single instruments, a couple of TAL-U-NO-LX, a couple of Reaktor ensembles… and, for effects, mainly Soundtoys and Waves, with NLS being the most used plugin, as it’s on almost every track.

To be honest I don’t really know, as I don’t usually check the real-time peak meter while working on projects, and the average load has always stayed very steady with hyperthreading disabled. But you may be right, I’ll recheck the ASIO Guard status per plugin, just to be sure.

Yes, all the usual optimizations in BIOS and Windows are done, and all the drivers are updated. I even swapped my good but old Nvidia GT-640 graphics card for a more modern Nvidia model, and also for a Radeon card, but neither made a big difference in real-time performance for Cubase. All these tests were made when I purchased the 5960x machine a couple of years ago, just before I finally decided to disable hyperthreading for the time being and call it a day, as the computer performs well enough for my needs.

Hello, sorry for the (forced) absence.
Thanks to all who posted their experience - feedback pretty much matches our in-house tests.

@Jorge: I’d recommend getting in contact with support - if it’s just one project, it should be relatively easy to find out what is going on.
(Spanish support is a few metres away from me, by the way :) )

1809 didn’t change anything about MMCSS as far as I’m aware, but I wouldn’t be surprised if the registry key was wiped.
Cubase 10 just addresses resources differently and spawns threads in a more complex way; it doesn’t really do something ‘different with MMCSS’.

The new engine works and addresses resources in a very different way, but it’s way too complex to be explained here and I frankly don’t have full information myself yet. Also, the engine behaviour can be adapted to particular cases, if needed.

But:
– No, Cubase does not throttle anything
– No, it does not limit the amount of cores used in any way
– Plugins that require high single-core speed will most likely continue to do so (load spread does not depend on Cubase alone)
– Cubase 10 can make use of as many logical threads as are available, but it should be mentioned again that not all CPUs are designed for this kind of workload: thread synchronisation and number of cores, base-clock speed vs. number of cores, instruction sets and CPU architecture all need to be taken into account (but there are recent tests from DAW builders that hint that this ‘trend’ is changing)

(Microsoft provided us with information and the registry work-around, but the long-term solution Pete talks about is something totally different to this and requires an engine re-write with different APIs)

Hyper-threading needs to be reactivated in your BIOS, usually in the CPU Frequency / Advanced section.
That CPU does NOT require the registry key, so hopefully you didn’t apply any…

Unlikely - the new behaviour is meant to improve performance with more than 14 cores.
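
As a rough illustration of who might benefit, here is a small sketch comparing the CPUs named in this thread against the 14-logical-core figure quoted here for pre-10 versions. The thread counts are public spec-sheet values with hyperthreading/SMT enabled (only the 2990WX figure is stated in the thread itself), and the comparison is an assumption about how to read that figure, not a description of what the engine does internally.

```python
OLD_LOGICAL_CORE_LIMIT = 14  # figure quoted in this thread for versions before Cubase 10

# Logical thread counts with hyperthreading/SMT enabled (spec-sheet values).
CPUS = {
    "i7-3930K": 12,
    "i7-5960X": 16,
    "i9-7900X": 20,
    "Threadripper 2990WX": 64,
}


def exceeds_old_limit(logical_threads: int, limit: int = OLD_LOGICAL_CORE_LIMIT) -> bool:
    """True if the CPU exposes more logical threads than the old limit."""
    return logical_threads > limit


for name, threads in CPUS.items():
    status = "above" if exceeds_old_limit(threads) else "at or below"
    print(f"{name}: {threads} logical threads, {status} the old limit")
```

With hyperthreading disabled the logical count drops to the physical core count, so a 5960X would then sit below the figure as well.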

Be careful with Waves. Some of them do not like Hyperthreading…

+1

Please someone from Steinberg, who has knowledge, clarify that this CPU will work to full potential with Cubase.

As I wrote above, Cubase 10 addresses resources differently, so the limitation to 14 logical cores is now gone.
Theoretically, any CPU can be used, and if issues arise the engine can be further tweaked.

But not all architectures are created equal or are tailored to a certain kind of processing - 2990WX at full potential? Difficult to say, honestly, the same CPU might work differently, even just switching the motherboard.
On the SCAN main page, http://www.scanproaudio.info/, you currently find an article about Cubase 10 directly on the homepage, with pretty detailed info on the 2990WX performance and benchmarks against other chips.
It’s really worth a read, particularly on the performance and overhead loss, which according to their test isn’t DAW-specific.
Also interesting to read about the stable OC figures.

Thanks for the link.

Since I’m pretty sure this “improvement” does no more than automatically determine the maximum possible count of realtime threads, instead of reading it from a file like audioengine.prop:
Why do you bother all the users with huge “beta” trials then? It would be more convenient for everyone to provide a small tool/view that shows the auto-detected count in the GUI, so we can report it here.

(Still a bit pissed about the answer from Steinberg support a year ago: that I should contact Microsoft myself to ask for removal of the MMCSS limitation, and that Steinberg can do nothing about it. I am sure your voice is heard when you contact MS, or at least heard better than mine.)

As I wrote in the OP, the way Cubase 10 addresses resources and spawns threads is completely re-designed - NOT just automatic detection of the max threads.

Thanks for clarifying.
Are those new techniques and the complete re-design then disabled if something is specified in the audioengine.prop file?
Because otherwise I do not really get what you are trying to get feedback on, considering the thread limit of the OS stays the same.
It should be the same for all kinds of hardware - or are those improvements only active if the CPU could hit the limits?
It seems irrelevant to me HOW Cubase spawns threads, as long as it does not spawn more than the limit of realtime threads.

But I have to admit I have very limited insight into this - and it is more or less off topic.

Still, I would prefer it if Cubase and MS addressed the root cause itself, or at least provided a timeline for it, so that we can benefit from recent multi-core processors.

The feedback I’m gathering is to get more info from users I may not be in contact with and make sure the improvement is positively affecting as many systems as possible and that it matches our expectations.

It is general and always active: by all means a different design.
The improvements should be more visible in two main cases: 1. high core-count, and 2. low latency on multi-core machines (but the processing overhead to sync the threads on the OS side is not affected, of course).

It is very relevant, as the way Cubase spawned threads previously is what made the MMCSS limitation exactly 14 (if Cubase didn’t spawn prefetch threads at all, which is not possible and just an example, the limit would have been 28).
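
One way to read the 14 vs. 28 figures is as simple slot arithmetic. This is only a sketch under two assumptions that are not confirmed anywhere in this thread: that 28 MMCSS slots were effectively usable by the engine, and that the old engine needed two such threads per logical core (one realtime, one prefetch). The function name is hypothetical.

```python
def max_usable_cores(usable_slots: int, threads_per_core: int) -> int:
    """How many logical cores can be served before the slot budget runs out."""
    return usable_slots // threads_per_core


USABLE_SLOTS = 28  # assumption: inferred from the "28" figure in the post above

# Assumed old behaviour: one realtime thread plus one prefetch thread per core.
print(max_usable_cores(USABLE_SLOTS, threads_per_core=2))  # 14

# Hypothetical case from the post: no prefetch threads at all.
print(max_usable_cores(USABLE_SLOTS, threads_per_core=1))  # 28
```

Under this reading, changing how threads are spawned changes how far a fixed OS-side budget stretches, even though the budget itself stays the same.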

Microsoft won’t change it, as this would have ‘system-wide performance implications’ (quoting Pete Brown here) and, as mentioned previously, “the long-term solution Pete talks about is something totally different and requires an engine re-write with different APIs”, and this is not something that can be done quickly and without very extensive testing.

Thanks for taking the time, this is useful (interesting and motivating) information.
I will have a closer look whether the change in strategy has effect on my projects (i9 7900X).

Is there any way to influence/reduce the number of prefetch threads, e.g. by structuring the project/busses/channels in a certain way?

Be warned about the architecture of AMD’s latest multi-die CPUs. Scan Pro Audio has already shown how this configuration is not ideal for real-time audio applications. The inter-core data transfer in the interposer introduces delays compared to on-die transfer and essentially results in NUMA-like behaviour, as on a multi-socket workstation. This is the same reason multi-socket is not always ideal for pro audio. See the articles:

http://www.scanproaudio.info/2017/08/14/first-look-at-the-amd-threadripper-1920x-1950x/
http://www.scanproaudio.info/2018/08/24/threadrippers-2990wx-2950x-on-the-bench-just-a-little-bit-of-history-repeating/

Fabio, are you saying that this “engine re-write with different APIs” is what we got with Cubase 10 or is this some interim step to get to that point?

No and no. This is what needs to be done to fix the whole thing (or better said: this should have been started a very long time ago).
Apparently the decision to do so has not been taken yet - C10 only tries to improve on the current problem.