Intel’s Arrow Lake chips aren’t winning any awards for gaming performance, but I think their new E-cores deserve a gold star

When Intel first announced its new Core Ultra 200S series of desktop processors, all using its latest Arrow Lake architecture, eyebrows around the world collectively shot up. Not because Intel had wrestled power consumption in games back under control and not because Hyper-Threading was gone. The shock was that Intel publicly admitted that Core Ultra 200S chips just weren’t as good in gaming as 14th Gen Core models.

Just how much worse wasn’t obvious until we finally got to test out the Core Ultra 9 285K and Ultra 5 245K. In one fell swoop, Intel let go of the gaming crown it had been fighting AMD and its 3D V-Cache equipped processors over.

Intel presentation slide for its Core Ultra 200S processors

(Image credit: Intel)

But deep inside the chips, almost hidden under the overwhelming disappointment of Arrow Lake, are some little gems that I think deserve a lot more credit than they’re getting. They also bode well for the future of Intel’s hybrid architecture.

I’m talking about Arrow Lake’s Efficient cores (aka E-cores), codenamed Skymont. Traditionally, anything to do with gaming has been handled by the full-fat P-cores (P for Performance) and while that’s still the case with Arrow Lake, the removal of simultaneous multithreading from the P-cores means that the little E-cores now have more to do.

We got a hint of just how much Intel had improved its E-cores when it launched Lunar Lake, its architecture for mobile platforms. But talk is one thing: actual performance figures are what matter.

So, to that end, after I’d finished testing and reviewing the Core Ultra 9 285K, I ran our suite of CPU gaming benchmarks again (with the same test setup), but this time in two configurations: (1) All E-cores disabled, so just eight P-cores, and (2) one P-core and seven E-cores.

This way the CPU would always have eight threads on offer, but in the second configuration, the E-cores would be called upon to do a lot more work than normal.
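As a quick sanity check on the thread counts, here is a minimal sketch. The `CoreConfig` class and its field names are purely illustrative, and it assumes the stock Core Ultra 9 285K layout of eight P-cores and 16 E-cores with one thread per core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreConfig:
    """Hypothetical helper describing which cores are left enabled."""
    name: str
    p_cores: int
    e_cores: int
    threads_per_core: int = 1  # no Hyper-Threading on Arrow Lake

    @property
    def threads(self) -> int:
        # Every enabled core contributes the same number of hardware threads.
        return (self.p_cores + self.e_cores) * self.threads_per_core


FULL_CHIP = CoreConfig("8P+16E (stock)", p_cores=8, e_cores=16)
P_ONLY = CoreConfig("8P+0E", p_cores=8, e_cores=0)
MOSTLY_E = CoreConfig("1P+7E", p_cores=1, e_cores=7)

for cfg in (FULL_CHIP, P_ONLY, MOSTLY_E):
    print(f"{cfg.name}: {cfg.threads} threads")
```

Both test configurations land on the same thread count, so any difference between them comes from the type of core doing the work rather than the number of threads available.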

Cyberpunk 2077

First up is Cyberpunk 2077 and one can clearly see that this particular game doesn’t mind being limited to just the P-cores and isn’t super happy with having to rely on the E-cores. That’s because CD Projekt’s engine for the game (REDengine 4) is primarily designed around an eight-core CPU but will make use of more cores, if available.

But it’s actually not as bad as you might think. Compared to the full core configuration, the 1P+7E setup is only 14% slower on average, though the 1% low figures are worse, being 22% lower.
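For anyone unsure what those two numbers mean, here is a rough sketch of how an average frame rate, a 1% low, and a percentage deficit can be worked out from a list of frame times. Benchmarking tools define 1% lows in slightly different ways, so the definition used here (the frame rate across the slowest 1% of frames) is an assumption, and the frame times are made up purely for illustration, not taken from my results.

```python
def average_fps(frame_times_ms):
    """Mean frame rate: number of frames divided by total elapsed time."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def one_percent_low_fps(frame_times_ms):
    """Frame rate across the slowest 1% of frames (always at least one frame)."""
    slowest_first = sorted(frame_times_ms, reverse=True)
    count = max(1, len(slowest_first) // 100)
    return average_fps(slowest_first[:count])


def percent_change(baseline, value):
    """Signed change of value relative to baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


# Made-up capture: 990 smooth frames at 10 ms and 10 stutters at 25 ms.
frame_times = [10.0] * 990 + [25.0] * 10

print(f"average: {average_fps(frame_times):.1f} fps")
print(f"1% low:  {one_percent_low_fps(frame_times):.1f} fps")

# A run averaging 86 fps against a 100 fps baseline is 14% slower.
print(f"change:  {percent_change(100.0, 86.0):.0f}%")
```

A handful of long frames barely moves the average but drags the 1% low right down, which is why the two figures can shift by different amounts from one core configuration to the next.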

Compared to the P-cores, the E-cores have less L2 cache (a cluster of four shares 4 MB, whereas one P-core gets 3 MB all to itself) and the maximum boost clock is around 1.1 GHz slower. With regards to the latter, one P-core can run up to 24% faster than any E-core.
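The clock speed gap is easy to check. This sketch assumes Intel’s listed maximum boost clocks for the Core Ultra 9 285K, 5.7 GHz for the P-cores and 4.6 GHz for the E-cores; the variable names are just for illustration.

```python
# Assumed maximum boost clocks for the Core Ultra 9 285K, in GHz.
P_CORE_BOOST_GHZ = 5.7
E_CORE_BOOST_GHZ = 4.6

gap_ghz = P_CORE_BOOST_GHZ - E_CORE_BOOST_GHZ

# How much faster a P-core can clock, relative to an E-core.
p_core_advantage = (P_CORE_BOOST_GHZ / E_CORE_BOOST_GHZ - 1) * 100

# The same gap expressed the other way round, relative to a P-core.
e_core_deficit = (1 - E_CORE_BOOST_GHZ / P_CORE_BOOST_GHZ) * 100

print(f"gap: {gap_ghz:.1f} GHz")
print(f"P-core advantage: {p_core_advantage:.0f}%")
print(f"E-core deficit:   {e_core_deficit:.0f}%")
```

Note that the percentage depends on which core you treat as the baseline: a P-core clocks about 24% higher than an E-core, but an E-core is only about 19% slower than a P-core.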

While you’ve got little chance of pushing the E-core clock speed up to the same level as a P-core, they can be overclocked a fair bit. I’ve not had a chance to fully explore this yet, due to BIOS problems with early Z890 motherboards, but I bet that 22% deficit in the 1% lows could be pulled back by upping the E-core’s clocks.

Baldur’s Gate 3

Baldur’s Gate 3 paints a similar picture to Cyberpunk 2077, and for the same reasons. Running with just eight P-cores doesn’t cause much of an issue, but replacing seven of them with E-cores induces a performance hit.

However, it’s only a 16% reduction on average, compared to the full chip, and the 1% low values drop by even less, just 15%.

Now, one might argue that the game is just relying almost entirely on one P-core, which is why the performance reduction isn’t very big. There is some merit to that argument as games aren’t multithreaded in the same way as an offline renderer or video encoder is.

Baldur’s Gate 3 does rely on more than one core though, so in the 1P+7E configuration, the E-cores are still doing plenty of the tasks required. And if you think the E-cores look impressive here, you’re in for a shock with the next game.

Homeworld 3

No, those figures aren’t a mistake. I’ve repeated them on three different motherboards and Homeworld 3 genuinely runs better on a Core Ultra 9 285K when it only has one P-core and seven E-cores.

Yes, the average frame rate is down, and I’m not suggesting that if you do have any Arrow Lake CPU, you should go and start disabling cores left, right, and centre. But that’s a 31% improvement to the 1% low figures!

It’s not immediately obvious why this game performs better on the E-cores compared to the P-cores but one possible explanation is L3 cache latency.

The whole chip has 36 MB of L3 cache but it’s actually split up into slices, with each P-core and E-core cluster having its own little slice. Cores access the other slices via the internal ring bus.

In the case of the 1P+7E configuration, those cores are all right next to each other, meaning L3 cache accesses are going to be fractionally quicker than when every core is in play.

Metro Exodus: Enhanced Edition

Metro Exodus is the oldest game in our CPU benchmark suite and it’s the least multithreaded of them all, putting the majority of the processor workload onto just a few cores.

But as you can clearly see, the game really doesn’t mind having to use E-cores. In fact, it looks like it prefers not having a full Arrow Lake chip, and this is down to thread scheduling.

This is why Intel recommends using its Application Optimization tool (APO) and Metro Exodus does run better when that’s installed and active. The gains aren’t massive but the average frame rate and 1% lows are both improved when using it.

Total War: Warhammer 3

I’ve saved the best for last. Warhammer 3 is another game that needs Intel’s APO to run better but even so, those Skymont E-cores are doing sterling work. Like Metro Exodus, this game is mostly single-core focused and the rest of the threads involved aren’t particularly intensive.

I suspect that L3 cache latency is playing a bigger role here than in Metro Exodus. It all makes me wonder whether Intel needs not just a bespoke thread scheduler to help games run better, but also a way to completely park individual cores to create the perfect configuration.

Our CPU test suite also includes Factorio but my tests show no difference between the three core configurations, hence there’s no separate graph for that game. But I do have one more to show you.

Power consumption

They’re called E-cores because they’re efficient in terms of die space and power consumption. A single E-core cluster, comprising four units, takes up roughly the same amount of die space as one P-core.

This graph clearly demonstrates just how much less power they also need. Some of this is down to the fact that they’re clocked a fair bit lower, but even so, E-cores absolutely sip at energy in gaming. It’s a different story when the whole chip is loaded up with threads (e.g. a Blender test) but in this situation, it’s an impressive sight.

I don’t think we’ll ever see Intel release a desktop CPU that’s almost entirely E-cores but if the next generation of Core Ultra processors can have them running a bit faster, Team Blue could well be back in with a fighting chance of snatching the gaming crown from AMD.

Until then, just stick with Raptor Lake or any of AMD’s AM5 socket chips. Arrow Lake has disappointed an awful lot of people, but those Skymont E-cores are darned impressive to me.
