
Maximum PC

 Post subject: Once more: Tackling the game/multicore problem
PostPosted: Sat Oct 18, 2014 3:15 pm 
Smithfield

Joined: Sun Jun 18, 2006 7:37 pm
Posts: 5459
I'm opening the table to figure out how various games use multiple cores in a processor. My assertion is that not every game is stuck on the "two core" or "two thread" idea people keep pushing around. My issue is that people state this as if it's common knowledge with some credible starting point in history. I mean, I guess if you had formed that opinion in 2000 and never updated it, sure. But this is 2014; multi-core systems have been a staple of computing since 2008 (2005 if you count the enthusiasts).

With that said, let's cover a few primers so people are caught up on terms and their usage (in a spoiler, so as to minimize the wall of text)

Spoiler: show
Processes and Threads
Let's use a baking analogy, since cooking is a nice way to picture how a program works.

A process is basically the program itself. In baking terms, a process is the entire recipe for baking a cake; the act of baking that cake in its entirety is running the process. A thread is a subunit of the process that forms an independent set of instructions. For instance, one thread could be heating the oven, another gathering the ingredients. These are parallel tasks. If you had one person, that person could only set the oven or get the ingredients, one at a time (order doesn't matter). With two people, one could set the oven while the other gets ingredients. You could add another person, but since the oven is a one-person job (there's only one oven in our example), that person would have to gather other ingredients instead. However, you can only add so many people before you've broken your tasks down to elementary steps. Say our ingredients are flour, milk, and eggs: you could only assign four people total to these two tasks before adding more does nothing.
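To make the analogy concrete, here's a minimal Python sketch (the function and ingredient names are just stand-ins for the baking tasks): one thread handles the oven while three others gather ingredients in parallel.

```python
import threading

done = []

def preheat_oven():
    done.append("oven")  # one-person job: only one oven in the analogy

def gather(ingredient):
    done.append(ingredient)  # each ingredient is an independent subtask

# Four workers total: one for the oven, one per ingredient.
workers = [threading.Thread(target=preheat_oven)]
workers += [threading.Thread(target=gather, args=(i,))
            for i in ("flour", "milk", "eggs")]

for w in workers:
    w.start()
for w in workers:
    w.join()  # wait for every subtask before "baking" can proceed
```

Adding a fifth thread here would accomplish nothing, which is exactly the "elementary tasks" point above.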

Resources
One of the major headaches with multithreading is resource management. Consider our baking example once more with four workers (by the way, we'll say they're very dumb, very literal, and are basically robots). Our resources are:

  • The oven
  • The ingredients (which we'll say are still flour, eggs, and milk)
  • The utensils (bowl, mixer, pan)

Those gathering ingredients need to be aware of what ingredients they already have so nobody gets assigned double duty. Beyond that, all of them should know whether the oven has been set. Then there's the problem with the utensils: say one worker is assigned to the bowl for mixing but is waiting on the mixer. Another worker is off to get it, but the mixing worker can do nothing until the mixer arrives. In another scenario, if the oven is not set, the whole process is effectively halted until it is. And then there's the issue of scheduling all the workers so you maximize utilization without doing extra work that's meaningless.
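The mixer contention above maps directly onto a lock. Here's a hedged sketch where `mixer` is the hypothetical shared utensil: any worker who needs it must wait for whoever holds it to finish.

```python
import threading

mixer = threading.Lock()  # only one mixer exists; workers take turns
mixed = []

def mix(batch):
    # A worker blocked here is stalled exactly like the analogy's mixing
    # worker waiting for the mixer to arrive.
    with mixer:
        mixed.append(batch)

threads = [threading.Thread(target=mix, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All four batches get mixed, but never two at once: the lock serializes access to the shared resource.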

Amdahl's Law
Amdahl's Law, or argument, describes how much a process can benefit from adding more cores. If a fraction p of the task can be parallelized, the speedup on n cores is 1 / ((1 - p) + p/n). For example, if 50% of your task can be parallelized, a second core yields only about a 33% improvement, and each additional core yields progressively less. In short, the maximum speedup approaches the inverse of the serial (non-parallelizable) fraction as more cores are added, so with 50% parallelizable you can never exceed a 2x speedup no matter how many cores you throw at it. Like it or not, not everything (or maybe anything) can achieve 100% parallelization.
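The diminishing returns are easy to see numerically; a quick sketch of the standard Amdahl formula (pure Python, nothing game-specific):

```python
def amdahl_speedup(p, n):
    """Speedup for a task whose fraction p is parallelizable, on n cores."""
    return 1.0 / ((1.0 - p) + p / n)

# With 50% of the work parallelizable, the speedup can never exceed 2x:
for n in (1, 2, 4, 8, 1_000_000):
    print(n, round(amdahl_speedup(0.5, n), 3))
# 2 cores give ~1.33x, 4 cores ~1.6x, and a million cores still under 2x.
```

Each doubling of cores buys less than the last, which is exactly the falloff the tests below keep running into.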

Making a Program Multi-threaded
In a complicated program like a game, there are a few ways to tackle getting a game rolling:
  • Run every step in sequence. For instance: 1) main game code (handle input, etc.), 2) run physics, 3) run AI, 4) render graphics, 5) play sound.
  • Break up steps into threads, run in parallel if possible. It may still follow the previous example, but perhaps now you can run the physics and AI together, then run the graphics and sound.
  • Break up the threads further into smaller units, run them all in parallel when possible. For instance with the AI routine, process each entity in parallel, rather than in sequence.
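As a rough sketch of the second approach (names like `run_physics` and `run_ai` are purely illustrative, not real engine code), physics and AI can be dispatched together and the frame assembled once both finish:

```python
from concurrent.futures import ThreadPoolExecutor

def run_physics(state):
    return state + ["physics"]  # stand-in for a real physics step

def run_ai(state):
    return state + ["ai"]       # stand-in for a real AI step

def frame(state):
    # Physics and AI are independent of each other, so run them together,
    # then combine the results before rendering (a per-frame pool is
    # wasteful in real code; it keeps the sketch self-contained).
    with ThreadPoolExecutor(max_workers=2) as pool:
        phys = pool.submit(run_physics, state)
        ai = pool.submit(run_ai, state)
        return phys.result() + ai.result()
```

The third approach would go further and split `run_ai` itself into one task per entity.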

So how does this affect gaming performance?
GPUs are still bound to the CPU because the CPU needs to feed the GPU jobs. At some point, the CPU gathers up the results from the other subroutines, figures out what the GPU needs to do with them, and issues the commands. If the CPU is too busy with other game code, it doesn't send jobs to the GPU as fast as you'd like, and hence, FPS drops.


The Setup and Test
The hardware setup is the following:
  • CPU: Intel Core i5-4670K with a maximum clock speed of 1.6GHz
  • Motherboard: ASRock Z87E-ITX
  • RAM: 8GBx2 DDR3-1600 Crucial Ballistix Sport
  • GPU: EVGA GeForce GTX 980 Superclocked (Shaders: 1241MHz to 1342MHz. Memory: 7010MHz)
  • OS: Windows 8.1 Professional Update 1

Wait, wait wait... "LatiosXT, WTF is wrong with your CPU?" you might ask. I deliberately downclocked my CPU to amplify the effects of CPU dependencies.

The test will run an assortment of games and one benchmark utility.
  • Unigine Heaven - 640x480, all options turned off or minimal.
  • Unreal Tournament - 640x480, all options turned off. Ran CTF-Face with 16 bots
  • Unreal Tournament 2004 - 640x480, all options turned off or minimal. Ran CTF-Face with 32 bots
  • Unreal Tournament 3 - 800x600, all options turned off or minimal. Ran CTF-Suspense with 31 bots
  • Company of Heroes 2 - 1024x768, all options turned off or minimal, except physics (High, presumption is it's a CPU based one). Used the benchmarking tool
  • Sleeping Dogs - 640x480, all options turned off or minimal. Used the benchmarking tool.
  • Crysis 2 - 800x600, "High" preset.
  • F1 2013 - 640x480, all options turned off or minimal. Captured at Monaco Grand Prix after one lap
  • ARMA 3 - 640x480, "Lowest" preset.
  • Max Payne 3 - 640x480, all options turned off or minimal.

FRAPS was used in some tests to capture FPS. Simulating an x-core system was done by setting the process affinity. You also might be thinking... how come no multiplayer? My answer is that it's harder to get consistent results (although I should fire up a bot-based game and crank it up). You're probably also wondering, "Why all low settings?" This eliminates the GPU from the equation as much as possible, since it finishes rendering each frame in the fastest time possible. Given that I have a GTX 980, how quickly it can spit out frames is a very good indicator of how the CPU affects gaming performance under the conditions I give it.
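For the curious, process affinity on Windows boils down to a bitmask of logical CPUs (Task Manager's affinity dialog and `start /affinity` set the same thing). A tiny sketch of how such a mask is built; `affinity_mask` is a hypothetical helper, not a Windows API:

```python
def affinity_mask(cores):
    """Build a bitmask with one bit set per logical CPU index."""
    mask = 0
    for c in cores:
        mask |= 1 << c
    return mask

# Restricting a process to cores 0 and 1 ("simulating a dual-core"):
print(hex(affinity_mask([0, 1])))  # 0x3, as in: start /affinity 3 game.exe
```

Each test run below just widens this mask by one core.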

I also chose games with a more or less real-time requirement. Sure, Civilization V is supposed to make use of all those cores, but it only happens at points where you don't care what the gaming performance is (when the computer opponents take their turns).

Results
Code:
Game                      FPS 1 Core  FPS 2 Cores  FPS 3 Cores  FPS 4 Cores
Unigine Heaven                     4            8           99          104
Unreal Tournament                100       76.923          100          100
Unreal Tournament 2004            70           64           71           86
Unreal Tournament 3            15.58         76.9        105.6       124.36
Company of Heroes 2 Min         3.05         8.94        11.82           13
Company of Heroes 2 Max          9.5        27.81        39.45           43
Company of Heroes 2 Avg         6.65        20.71        29.61           32
Sleeping Dogs Avg               35.1         71.8       101.25        118.6
Sleeping Dogs Max               78.1        149.9        190.6        235.6
Sleeping Dogs Min                 11         35.5         44.6         46.3
Crysis 2                          34           76          118          130
F1 2013                           20           41           75           95
ARMA 3                             0           51           60           65
Max Payne 3                        9           38           61           74


Handy graph:
Spoiler: show
Image


Conclusions
Let's do this by app:

Unigine Heaven: Rendering engine is capped at three cores of utilization... which is odd considering how low it is at one and two.
Unreal Tournament: No surprises (that blip must've been me being human). Though it's also interesting that the maximum FPS is about 100 in the setup.
Unreal Tournament 2004: This one I'll buy is a single core game, but it's definitely not a "one/two thread" game (in fact, UT had a dozen or so).
Unreal Tournament 3: While there was a huge jump from one core to two, there were still noticeable jumps from two to three and three to four. UE3 is definitely parallelizing some tasks, but they don't impact overall performance that much (AI, maybe?)
Company of Heroes 2: This is the most interesting one to me, as it shows an exponential decay in the gains on all stats, meaning the game engine is probably parallelized about as much as it possibly can be.
Sleeping Dogs: This one I also found pretty interesting. While the minimum FPS fell flat after adding a third core, the maximum FPS kept climbing in major steps. The average, though, climbed linearly... so this could imply Sleeping Dogs has a good degree of parallelization.
Crysis 2: Significant jumps from one to two and two to three, but not as dramatic as three to four (but still noticeable). Might be in the same vein as CoH2.
F1 2013: This one I thought for certain was going to be rather flat, but there was a linear increase as cores were added, indicating F1 is also very parallelized.
ARMA 3: The results made me scratch my head. Yes, the game practically froze at 1 core. At 2 cores and beyond, though, it stayed more or less the same. Seems like Bohemia Interactive was fibbing about how efficient ARMA 3 is on multicore systems.
Max Payne 3: While it'd probably be better to test RAGE on, say, GTA IV, Max Payne 3 I figured was a nice "another shooter to test" game. But it also showed the same decay as CoH2, indicating at least some parallelization.

So there we go. If there's one takeaway from this, it's that it disproves the claim that games are "limited to two cores", because if that were the case, these games would fall flat after the second core is enabled. I could do a few more games since I have a variety in my Steam account (I'm thinking Shogun 2 and Skyrim would be good ones... I don't know why I didn't do them).

Oh, and if you're curious, going by Passmark's score, the i5-4670K at that clock speed represents the same performance level as a Core 2 Quad Q6505, Athlon II X4 650, Core i3-2100, or an AMD FX-7750 at four cores; an AMD A6-3600, Pentium G2020, or Athlon II X4 605e at three cores; a Celeron B840 or Athlon 64 FX-72 at two cores; and a bunch of laptop processors from 2006 at one core.


 
 Post subject: Re: Once more: Tackling the game/multicore problem
PostPosted: Sun Oct 19, 2014 6:35 am 
Coppermine

Joined: Wed Mar 25, 2009 4:40 pm
Posts: 732
Wow! That is really impressive. Not only did you do a nice job of analyzing the issue, but from the little testing I've done, I know how time consuming this can be.

I was curious. Since you mention you lowered the CPU speed to amplify the scenario, did you run your test with the CPU at full speed? If yes, did you still see an increase with added cores, or were the cores sufficient at stock speeds such that the games didn't have to utilize 3+ cores?

Second, and here's the real question: let's say I'm building a rig mainly for gaming. Sure, I'll surf the web and do some other mundane items, but my real purpose for building it is gaming. Based on my budget, I've identified two options for the CPU, a higher clocked i3 4370 (3.8GHz) or a lower clocked but 4 core i5 4460 (3.2GHz) (for the sake of this scenario, let's say they're priced the same). Which one would you recommend I get?


 
 Post subject: Re: Once more: Tackling the game/multicore problem
PostPosted: Sun Oct 19, 2014 6:51 am 
Smithfield

Joined: Sun Jun 18, 2006 7:37 pm
Posts: 5459
btdog wrote:
I was curious. Since you mention you lowered the CPU speed to amplify the scenario, did you run your test with the CPU at full speed? If yes, did you still see an increase with added cores, or were the cores sufficient enough at stock speeds that they didn't have to utilize 3+ cores?

Actually, that brings up a good point. Depending on how the game is structured, a faster CPU could get everything done on just one or two cores. Going under the assumption that the graphics job builder is a single-threaded task, that could give the appearance that games are only capable of using two cores maximum. So based on that, the games that were decaying exponentially (Company of Heroes 2, Crysis 2, Max Payne 3) should see an even faster drop-off.

Quote:
Second, and here's the real question: let's say I'm building a rig mainly for gaming. Sure, I'll surf the web and do some other mundane items, but my real purpose for building it is gaming. Based on my budget, I've identified two options for the CPU, a higher clocked i3 4370 (3.8 ) or a lower clocked but 4 core i5 4460 (3.2) (for the sake of this scenario, let's say they're priced the same). Which one would you recommend I get?

We'll find out once I get the results!

Hyperthreading is a curious beast, though, since my CPU doesn't have it. I could attempt this test on my laptop, which does have an i7-4700HQ.


 
 Post subject: Re: Once more: Tackling the game/multicore problem
PostPosted: Sun Oct 19, 2014 1:39 pm 
Smithfield

Joined: Sun Jun 18, 2006 7:37 pm
Posts: 5459
At btdog's request, I ran the same tests but with the full potential (actually a little more) of my processor. This time the CPU will operate at a maximum frequency of 4.4GHz. I've also modified my testing a little. I'm using FRAPS to capture a minute of game time, repeating the same scenarios for each capture (except two cases... due to preconceived presumptions). If the game has a benchmark utility, I've opted for that instead. And I've added another game. So once more, here are the games and the test scenarios:

  • Unigine Heaven: Benchmark utility.
  • Unreal Tournament: Used "stat fps" to view FPS in-game. Presumption is it wouldn't change across cores. Ran CTF-Face with 16 bots as a spectator
  • Unreal Tournament 2004: Ditto for Unreal Tournament. Ran CTF-FaceClassic with 32 bots
  • Unreal Tournament 3: Used FRAPS. Ran CTF-Suspense with 31 bots.
  • Company of Heroes 2: Used benchmark utility
  • Sleeping Dogs: Used benchmark utility
  • Crysis 2: Used FRAPS. Captured at the first level just before you go outside and learn about the visor.
  • F1 2013: Used FRAPS. Used the Monaco Grand Prix track, starting the capture session on the green light
  • ARMA 3: Used FRAPS. Played Showcase -> Infantry. Started the capture as soon as the Sergeant orders to go forward
  • Max Payne 3: Used FRAPS. Capture at the first level after Max slides down to take down a mook.
  • Shogun 2: Total War - Used the 720p DX11 benchmark utility

Results
Code:
Game/App                    FPS 1 Core   FPS 2 Cores   FPS 3 Cores   FPS 4 Cores
Unigine Heaven - FPS             4.6         13.5         309.5          315.9
Unigine Heaven - Score           115          340          7797           7957
Unigine Heaven - Min FPS         3.7         10.5          72.3           77.5
Unigine Heaven - Max FPS         5.1         23.1         565.4          583.9
Unreal Tournament             303.03       303.03        303.03         303.03
Unreal Tournament 2004           150          150           150            150
Unreal Tournament 3 - Min         29           86           181            193
Unreal Tournament 3 - Max         44          158           253            259
Unreal Tournament 3 - Avg     38.133      128.917       224.783            235
Company of Heroes 2 Min        13.41        48.56          60.4          61.51
Company of Heroes 2 Max        35.02        72.71         90.73          99.11
Company of Heroes 2 Avg        26.89        57.67         71.66          76.17
Sleeping Dogs - Min             89.7        175.8         234.7          268.2
Sleeping Dogs - Max            196.7        363.6         456.4          537.2
Sleeping Dogs - Avg             15.8         84.4          99.3          109.5
Crysis 2 - Min                    61          104           149            136
Crysis 2 - Max                   136          217           246            249
Crysis 2 - Avg                86.367      157.383       188.117        202.067
F1 2013 - Min                     66          112           116            115
F1 2013 - Max                     85          121           123            124
F1 2013 - Avg                 72.417      116.75        119.417          119.7
ARMA 3 - Min                     WNR           80           113            115
ARMA 3 - Max                     WNR          112           154            157
ARMA 3 - Avg                     WNR       96.767       132.417         138.85
Max Payne 3 - Min                 43           79           148            161
Max Payne 3 - Max                 96          177           227            249
Max Payne 3 - Avg             63.717      116.167       175.683          199.9
Shogun 2: Total War - Min         44           94           101            104
Shogun 2: Total War - Max         99          222           246            245
Shogun 2: Total War - Avg         70          172           187            187


And a handy graph (this time I found a better way to represent the data)
Spoiler: show
Image


Conclusions
Most of the games showed an exponential decay this time! And curiously, a lot of them top out once the third core is added, or even the second. Take F1 2013: when I lowered the clock speed, the game showed linear improvements with each core added; at full speed, it plateaued after the second core. Even Sleeping Dogs, which also showed great multi-core scaling at lower speeds, jumped up greatly at two cores but not so much at three, as far as the average goes (though the min and max had noticeable improvements). ARMA 3 turned out to be another headscratcher, increasing performance noticeably at three cores over two. Of course, Unreal Tournament and Unreal Tournament 2004 showed no surprises.

Now someone might take this as "see? clearly modern games can't take advantage of more than two cores!"... except the previous testing showed that some games improve consistently as more cores are added. What's going on here? I think the overall takeaway is that no matter how well you've parallelized your program's tasks, if the throughput of each individual core is high enough, fewer fast cores can keep up with more slower ones to a certain degree. Here's a scenario to consider. Say you have a quad-core processor and three tasks: A, B, and C. A takes 1 unit of time to complete, B takes 2 units, and C takes 3 units. Assume that all three tasks are completely independent, do not wait on any input, and are simply ready to go.

A naive approach to scheduling is to simply give each core one of the tasks. The total time to completion is 3 units of time. However, this is actually wasteful. In this scenario, all other things considered equal, a dual-core processor will perform the same as the quad core. How? Since the minimum time to completion is 3 units, smartly schedule tasks A and B, which equal 3 units of time, to one core. Give the other core task C. You are now using two cores and achieving the same throughput. Similarly, if we had four tasks, D, E, F, G with time to completion of 1, 2, 1, and 4 units of time respectively, the same scheduling routine will still allow the dual-core processor to keep up with the quad-core processor.
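The scheduling argument above can be sketched as a greedy longest-task-first scheduler (an assumption for illustration; real OS schedulers are preemptive and far more dynamic):

```python
import heapq

def makespan(tasks, cores):
    """Total time to finish all tasks, assigning each (longest first)
    to the currently least-loaded core."""
    loads = [0] * cores
    heapq.heapify(loads)
    for t in sorted(tasks, reverse=True):
        # Pop the least-loaded core, add the task, push it back.
        heapq.heappush(loads, heapq.heappop(loads) + t)
    return max(loads)

print(makespan([1, 2, 3], 4))     # 3 units: one task per core
print(makespan([1, 2, 3], 2))     # still 3: A+B on one core, C on the other
print(makespan([1, 2, 1, 4], 2))  # 4: D+E+F on one core, G on the other
```

Both scenarios from the text come out the same on two cores as on four, which is the whole point: the extra cores were wasted.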

There's also the fact that a fast enough processor with fewer cores can do the same amount of work as more, slower cores. Consider once more tasks A, B, and C with completion times of 1, 2, and 3 units. If you assign all three tasks to one core, the total time to completion is 6 units. But double that core's speed and it finishes in 3 units, matching the quad-core; quadruple it and it's faster still. It doesn't have to be clock speed either, but throughput.

In short, no matter how many tasks you have and no matter how broken down each task is into subtasks, you can and possibly will run into a scenario where the scheduler (which is OS dependent) gives one core a bunch of smaller routines and another a larger routine, canceling out the benefits of adding more cores. The only exception is if every task has the same time to completion or very close to it. This is why computations like image effects and video compression scale very nicely with multi-core systems. It doesn't matter what pixel you have, you're going to apply the same exact math formula as everything else. Even if the operation is "pixel * value" and the pixel value is 0, the computer is going to calculate it (well, possibly. It may be smart enough to figure out that it's 0).
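The image-effect case can be sketched as mapping one formula over every pixel. `brighten` here is a made-up stand-in for a real filter; the point is that every element gets the exact same work, so the tasks divide evenly across cores:

```python
from multiprocessing.dummy import Pool  # thread-backed pool, enough for a sketch

def brighten(pixel, factor=2):
    # Same formula for every pixel, even zeros, just as described above.
    return min(pixel * factor, 255)

pixels = [0, 10, 100, 200]
with Pool(4) as pool:
    out = pool.map(brighten, pixels)
print(out)  # [0, 20, 200, 255]
```

Because every `brighten` call costs the same, no core ends up stuck with a disproportionately large task, which is why this kind of workload scales so cleanly.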

So are modern games (and by that I mean games made for last-generation consoles) multi-core capable? I think the answer is yes. But the problem is that we throw such fast parts at them (games were designed for aging parts, after all) that those parts can get the same amount of work done with fewer cores. There's also the case where some of these games were hitting the limits of the GPU. Yes, that means that's the absolute fastest the GPU can render those games.

Actually...
I went into this experiment knowing very well that doing this test at my processor's full potential would yield results that bolster the assertion that games are only "dual/tri-core capable". This is why the first test was run with a lowered processor speed. I know that OS scheduling algorithms are designed around multi-core processors to make use of those resources efficiently; efficiency means better energy savings, which is what the trend back in the mid-2000s was going for. In other words, I wanted to see if I could make games break those scheduling algorithms and start using all of the cores: force one core to 100% utilization as much as possible, so that when I enable another core the OS scheduler goes "Oh, I have another worker, I'd better start sending jobs to it", rather than floating all those tasks around the available cores with the most efficient scheduling.

And honestly, game developers (the people who actually do the coding) are not stupid. They're the closest thing to a marriage of application programming and systems programming, which requires an intimate knowledge of how the system works. Game developers want to squeeze the most out of the target system (or at least the "lowest common denominator", as PC gamers like to scoff). In fact, deferred rendering, which is starting to become the norm for many game engines, was developed on a console (the first Xbox) to achieve certain lighting effects that many modern games use.

So to answer this question:
Quote:
Second, and here's the real question: let's say I'm building a rig mainly for gaming. Sure, I'll surf the web and do some other mundane items, but my real purpose for building it is gaming. Based on my budget, I've identified two options for the CPU, a higher clocked i3 4370 (3.8GHz) or a lower clocked but 4 core i5 4460 (3.2GHz) (for the sake of this scenario, let's say they're priced the same). Which one would you recommend I get?

The quad-core is more of an investment in how long you can go without needing to upgrade the CPU. Assuming games get more complicated, the dual-core with Hyperthreading will hit the dreaded and mysterious "CPU bottleneck" sooner than the quad-core. Even though Hyperthreading presents a logical core per physical one, it doesn't actually double throughput; most pin the gain at about 20% at best. But it's really hard to say when the dual-core will run dry versus when the quad-core will.


 
 Post subject: Re: Once more: Tackling the game/multicore problem
PostPosted: Wed Oct 22, 2014 2:53 pm 
Team Member Top 500

Joined: Thu Jul 21, 2005 10:45 pm
Posts: 3036
Location: Central Florida
Nice work!!! But you do realize, don't you, that multi-threading as you've been using it refers to process multi-threading, which is what Windows reports. When we say an application or game is dual-threaded, we mean it's written to take advantage of 2 logical CPU cores; when it's multi-threaded, it can take advantage of more than 4 logical CPUs, which has no direct correlation to process multi-threading. Instead of saying that F1 2013 is "dual-core aware", we say that it is "dual-threaded". It's no different than minute and minute... one is generally considered a measure of time, the other simply means "very small".

UT3 did surprise me, but maybe because it was updated since release??? Or maybe I was just told incorrectly by a supposed developer at the AMD Tech Tour '07 that it was dual-core aware??? The only other surprise in your lists was Shogun 2: Total War, I'm pretty sure I was told it was quad-core aware, but doesn't seem to benefit beyond 3 cores, or maybe it's just that the modern GPU unburdens it enough to not be noticeable???

LatiosXT wrote:
The quad core is more of an investment into how long you can go without needing to upgrade the CPU. Assuming games get more complicated, the dual-core with Hyperthreading will hit the dreaded and mysterious "CPU bottleneck" sooner than the quad-core. Even though Hyperthreading does give the impression of a logical core per physical one, it doesn't actually give a high throughput. Most pin it at about 20% at best. But it's really hard to say when the dual-core will run dry and when the quad-core will.

On that... games & game engines have to be made HyperThreading aware to be able to utilize HT properly or at all, so the dual-core with HT will run out of steam very quickly vs. a quad-core without. Unfortunately, the only multi-core CPU I have now is my i7-4700HQ, but at least it has HT, so I'm able to play around with HT a little bit, and I can tell you that the Unigine engine is definitely not HT aware. I don't have my games loaded onto the laptop yet, so I'm not sure which of the games I own do or don't benefit from HT.

My results with Unigine Valley Benchmark - settings below
Image

I didn't start playing with HT cores till I reached 3 cores - for 3 core, I used CPU 0, 3, & 6 - for 4 core I used all HT CPU cores (1, 3, 5, 7) - In Windows on an Intel CPU with HyperThreading, the real cores are the even numbered cores, and the HT cores are the odd numbered cores.

No affinity changes - FPS 17.5 - Min 10.3 - Max 33.4 - Score 733

1 core - FPS 4.1 - Min 3.4 - Max 4.6 - Score 170

2 core - FPS 8.7 - Min 3.4 - Max 14.0 - Score 364

3 core w/HT - FPS 9.8 - Min 4.1 - Max 16.7 - Score 411

3 core no HT - FPS 17.0 - Min 7.5 - Max 33.4 - Score 712

4 core all HT - FPS 5.5 - Min 3.1 - Max 20.4 - Score 229

4 core no HT - FPS 17.5 - Min 9.6 - Max 34.0 - Score 731


Also did some screenshots during each test to show CPU Utilization levels, and you're going to notice something odd when I get down to 3 core & 4 core without HT....

Single core running on CPU 4 - if you look closely you can see a couple down dips between scenes on CPU 4
Image

Dual core running on CPU 0 & 4
Image

triple core running on CPU 0, 3, & 6 - notice core 3 is sitting at zero utilization.
Image

Triple core running on CPU 0, 2, & 4
Image

Quad core running on CPU 1, 3, 5, & 7
Image

Quad core running on CPU 0, 2, 4, & 6
Image

As you can see, the triple core no HT test appears to be using 4 cores, while the quad core no HT test appears to be using 3 cores... No, I didn't mix up the screenshots, though initially I thought I had, so I re-ran the test yesterday morning and came up with the same odd CPU utilization graph. My best guess: the Unigine engine doesn't like it when you Alt-Tab out to change CPU affinity???

Also going to show why you cannot look at CPU utilization graphs with Vista or later to see how many cores a game uses.... Anyone can verify this by running Cinebench 11.5 or R15 too. In this example, I'm using Cinebench 11.5, & manually set "custom number of render threads" to 4 from "Preferences":
Image
Notice, core 1 (a HT core btw...) has the largest load, but 2, 3, 5, & 6 all have medium to high utilization, & all cores have some utilization even though Cinebench was told to only use 4 CPUs. This is because Windows is maximizing available resources. I ran this test twice, deleting & reinstalling Cinebench after the first run; each run had the same behaviour & both scored 5.56.

I then manually forced Windows to only use real cores & ran twice again, deleting & reinstalling Cinebench after the first run, each of these runs scored 5.52
Image

Then finally I ran it on all 8 threads.
Image
6.32, which when calculated against the 4 CPU runs shows HT contributing only 14.5%

Unfortunately, since the laptop auto-adjusts CPU core speed on the fly, this isn't exactly "high accuracy data", & it's most likely just a fluke that I got the same score twice for two different tests.


 
 Post subject: Re: Once more: Tackling the game/multicore problem
PostPosted: Wed Oct 22, 2014 8:26 pm 
Smithfield

Joined: Sun Jun 18, 2006 7:37 pm
Posts: 5459
chaosdsm wrote:
Nice work!!! But you do realize, don't you, that multi threading as you've been using it refers to process multi threading which is reported by Windows. But when we say an application or game is dual-threaded, that means it's written to take advantage of 2 logical CPU cores, or when it's multi-threaded, it can take advantage of more than 4 logical CPU's which has no direct correlation to process multi-threading. Instead of saying that F1 2013 is "dual-core aware", we say that it is "dual-threaded". It's no different than minute and minute.... one is generally considered a measure of time, the other simply means "very small".

The problem I have with the term "dual threaded" is that it's made up, and anyone with an understanding of computer systems should call BS on it. Games are application software; they're meant to run on a theoretical computer whose specs they don't know. Oh sure, installers can make a system check against enumerations, but get past that and attempt to run, and the program will run as long as it's not using anything special hardware-wise. I got Oblivion to run on a system that technically didn't meet requirements (the GPU was a GeForce FX 5600 instead of the required FX 5700). It ran slower than a slug about to die, but it had no idea it was running on inferior hardware and didn't say "hey, get a better GPU!".

And to really make sure I have my stuff down, I went and had a look at Modern Operating Systems, 3rd Edition by Andrew Tanenbaum. There's a chapter on how Windows Vista works. Yes, a bit old, but it's NT6 and NT6 is what we use. Anyway, it lays out how Windows' scheduler works: "Threads are the kernel's abstraction for scheduling the CPU...". The scheduler runs preemptive, priority-based queues. Threads from each process fill these queues, and the kernel scheduler has no idea which thread belongs to which process. That is, it doesn't pick a process and then a thread; it just picks a thread. And when it picks a thread, it schedules it on whatever resource is available. If you set a processor affinity, the scheduler will just pick among the flagged processors instead of any that are available. The process has no control over this scheduling, hence the process has no idea how many cores it has and thus can't limit itself to scheduling on two cores at any given time. Scheduling is the job of the kernel, not the application, because if the application were in charge, we'd be back to cooperative multitasking.

So if games have multiple threads (Unreal Tournament 2004, for instance, could have a dozen), why don't they scale? It's because of how the threads were programmed to interact with each other. Traditionally, games were designed around a synchronous model, where each thread waits for another thread to complete. If we had, say, threads for I/O, game logic, animation, physics, and video, the runtime would look like this: video waits on physics and animation, physics and animation wait on game logic (which includes AI and all that), and game logic waits on I/O. But once multicore processors became a thing, developers started switching to an asynchronous model: the game logic, animation, and physics routines run in parallel while the video thread just grabs the latest completed data at some point in time. Besides that, game devs needed to get their act together on using multiple cores effectively once the Xbox 360 and PS3 shipped as multicore systems (or, like I said before, if games were still truly "dual-threaded", we should fire the entire game industry for wasting that hardware for 8+ years). The moment you decouple threads from each other, there's no "it can only run on two cores" deal. Either everyone waits in line, or everyone goes and does their own thing. If you had an assembly line with four workers, would you tell them "sorry, only two of you can work at once" when all four can work?
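To illustrate the difference, here's a minimal Python sketch with stand-in stage functions (real engines do this in C++ with far more machinery): the synchronous chain where each stage blocks on the previous one, versus a decoupled simulation thread whose latest published state the renderer simply grabs:

```python
import threading

# --- Synchronous model: each stage blocks on the one before it, ---
# --- so the frame takes the sum of every stage's time.          ---
def read_input():          return "io"
def run_game_logic(data):  return data + "+logic"
def run_physics(data):     return data + "+physics"
def render(data):          return data + "+render"

def frame_synchronous():
    # video waits on physics, physics waits on logic, logic waits on I/O
    return render(run_physics(run_game_logic(read_input())))

# --- Asynchronous model: the simulation runs on its own thread and ---
# --- publishes state; the renderer grabs whatever is latest.       ---
latest_state = {"frame": 0}
state_lock = threading.Lock()

def simulation_thread(n_frames):
    for i in range(1, n_frames + 1):
        new_state = {"frame": i}   # stand-in for logic + physics work
        with state_lock:
            latest_state.update(new_state)

def render_latest():
    with state_lock:               # never waits for the sim to finish a frame
        return dict(latest_state)

sim = threading.Thread(target=simulation_thread, args=(100,))
sim.start()
sim.join()                         # joined here only to keep the demo deterministic
print(frame_synchronous())         # io+logic+physics+render
print(render_latest())             # {'frame': 100}
```

In a real engine the renderer polls `render_latest()` every frame while the simulation keeps running; neither side gates the other, so extra cores actually get used.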

So anyway... I went back and ran the most interesting game between us, F1 2013. Along with the 60-second FPS capture, I left Task Manager up to watch CPU utilization. I'm not going to post pictures because I'm feeling too lazy for that, but:
Code:
              FPS 1 Core    FPS 2 Core    FPS 3 Core        FPS 4 Core
1.6GHz Min           DNR            49            58               101
1.6GHz Max           DNR            64            96               118
1.6GHz Avg           DNR        56.617        83.083             109.8
1.6GHz CPU%          DNR       95%/92%   90%/90%/88%   90%/90%/88%/88%
4.35GHz Min           51           114           114               115
4.35GHz Max           82           121           123               124
4.35GHz Avg         73.5         117.7       119.017            119.25
4.35GHz CPU%        100%     100%/100%   80%/80%/80%   72%/65%/65%/62%
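For reference, this is how the min/max/avg rows can be derived from a frame-time capture (the frame times below are made-up illustrative numbers, not my actual F1 2013 data):

```python
# Hypothetical per-frame times in milliseconds from a capture run;
# these are made-up numbers, not the F1 2013 results above.
frame_times_ms = [8.3, 9.1, 7.9, 10.0, 8.6]

fps_samples = [1000.0 / t for t in frame_times_ms]
fps_min = min(fps_samples)
fps_max = max(fps_samples)
# True average FPS = total frames / total elapsed seconds
fps_avg = len(frame_times_ms) / (sum(frame_times_ms) / 1000.0)

print(round(fps_min, 1), round(fps_max, 1), round(fps_avg, 1))  # 100.0 126.6 113.9
```

Note that the average is total frames over total seconds; naively averaging the instantaneous FPS samples would overweight the fast frames and report a higher number than what you actually experienced.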


What can I take from this? Perhaps that we're overthinking it. We assume that more CPU performance automatically means better graphical performance, and that if F1 2013 was idling 40% of the time with four cores, it should be using that CPU power to push more frames to the GPU. But the other factor I'm wondering about is my GPU's maximum potential: what if the frame rates on the GTX 670 I used to have weren't as high as these? I don't know.

But I can still take away that the game kept all of the cores fairly busy, that it responds accordingly when you enable more cores (at least when per-core performance is low enough to matter), and that games are not limited by some arbitrary number of cores.

Some further reading:
http://www.geeks3d.com/20100418/game-en ... resources/
http://www.gamasutra.com/view/feature/1 ... ngine_.php (note this is from 2005)
http://www.gamasutra.com/view/feature/2 ... basics.php

