Maximum PC

 Post subject: How hard is it to change a program to use multiple cpu/gpus?
Posted: Fri Apr 06, 2012 5:25 pm
Million Club

Joined: Thu Aug 07, 2008 6:30 am
Posts: 231
I'm a non-programmer, just wondering. I often play an MMO that's pretty well developed, but according to the developers it can still only make use of a single CPU and a single GPU. Would changing a program to enable it to make use of multi-core CPUs and SLI/X-Fire be that difficult?


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
Posted: Fri Apr 06, 2012 5:34 pm
Clawhammer

Joined: Sun Jun 18, 2006 7:37 pm
Posts: 4983
Multi-GPU is a little easier, since basically the only "universally compatible" mode is alternate frame rendering.

However, multi-core programming is trickier, because work that is split among cores must be independent: the pieces can't require each other's results. It's also usually something you have to design for from the start; otherwise it takes a lot of code review to see what can be parallelized. But as an example (and I'm not sure if it's proper), if you had the following:
Code:
A = 2;
B = 3;
C = 0;
A = B * 2;
B = A + C;
C = A + B;


The last three operations can't be split up, because they all depend on each other in some form or fashion.
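For contrast, here's a trivial C++ sketch (hypothetical code, just to illustrate the idea) of work that CAN be split, because no element depends on any other:
Code:
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Each element's new value depends only on that element, so the two
// halves can run on separate cores with no coordination at all.
void scale_range(std::vector<int>& data, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        data[i] *= 2;                       // never reads another element
}

int main() {
    std::vector<int> data(1000, 3);
    std::size_t mid = data.size() / 2;
    std::thread t1(scale_range, std::ref(data), std::size_t{0}, mid);
    std::thread t2(scale_range, std::ref(data), mid, data.size());
    t1.join();
    t2.join();                              // both halves done; data is safe to use
}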


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
Posted: Sun Apr 08, 2012 9:40 am
Million Club

Joined: Thu Aug 07, 2008 6:30 am
Posts: 231
Thanks for the reply. That's pretty much what I had guessed about the CPUs; this is just the first time I've had it confirmed by anyone. That's too bad. A lot of the players have lesser PCs and need all the help they can get. :(


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
Posted: Mon Apr 23, 2012 1:35 pm
Bitchin' Fast 3D Z8000*

Joined: Tue Jun 29, 2004 11:32 pm
Posts: 2555
Location: Somewhere between compilation and linking
twistedspark wrote:
Would changing a program to enable it to make use of multi-core CPUs and SLI/X-Fire be that difficult?

This is actually a fairly difficult question requiring knowledge and insight from quite a few different perspectives.

In engineering terms, it can be vastly more difficult to write multi-process or multi-threaded software (the more general term is concurrent software) than normal, non-concurrent software. The only real exception involves what are known as embarrassingly parallel problems.

Given that a video game isn't going to fall into that problem category, we need to consider how much speedup we're likely to achieve from writing parallel code and compare this to an estimate of the additional cost/effort. The speedup can be estimated using Amdahl's law. The additional effort is going to depend on the type of game, how it can be structured in terms of design, etc. I would guestimate that a multi-threaded version of a typical video game would cost at least 3x as much to develop as a single-threaded version (and I don't think that it will provide the company with anywhere close to 3x the profit).
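To put a rough number on Amdahl's law (the 60% figure below is made up purely for illustration, not measured from any real engine):
Code:
#include <cstdio>

// Amdahl's law: speedup = 1 / ((1 - P) + P/N), where P is the fraction
// of the work that can be parallelized and N is the number of cores.
constexpr double amdahl(double p, int n) { return 1.0 / ((1.0 - p) + p / n); }

int main() {
    std::printf("%.2f\n", amdahl(0.60, 4));  // 60% parallel on 4 cores -> ~1.82x
}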

From a theoretical computer science perspective, the problems that benefit from parallel programming (or, equivalently, parallel circuitry) fall into the NC complexity class. This gets to the crux of why LatiosXT can state that rendering is more easily accomplished on multiple GPUs while the pseudo-code he provided is not as easy to parallelize: the first problem is in NC (the fact that multi-core GPUs exist is evidence of this assertion) and the second isn't.


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
Posted: Tue Jun 12, 2012 12:16 pm
Bitchin' Fast 3D Z8000

Joined: Mon Jun 14, 2004 4:04 pm
Posts: 985
Location: Earth
twistedspark wrote:
I'm a non-programmer, just wondering. I often play an MMO that's pretty well developed, but according to the developers it can still only make use of a single CPU and a single GPU. Would changing a program to enable it to make use of multi-core CPUs and SLI/X-Fire be that difficult?


SLI/X-Fire abstracts multi-video-card computing by fooling the application into thinking it is talking to one video card. So as long as you code against a framework (in this case DirectX on Windows), a lot of the work comes for free. Making use of multi-core CPUs is a little more challenging, but Microsoft has made it simpler to develop for with the Task Parallel Library. That said, games mostly need GPU power, and nVidia and AMD have done a stupendous job of writing drivers and APIs that abstract a lot of the hard work.
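For the curious, the same "fire off tasks, join them later" shape that TPL gives you in .NET looks roughly like this in standard C++ (a sketch with hypothetical stand-in jobs, not TPL itself):
Code:
#include <future>

// Two independent jobs -- hypothetical stand-ins for real engine work.
int load_textures() { return 1; }
int decode_audio()  { return 2; }

int main() {
    // Start both tasks; the runtime decides how to spread them over cores.
    auto t1 = std::async(std::launch::async, load_textures);
    auto t2 = std::async(std::launch::async, decode_audio);
    int done = t1.get() + t2.get();   // block only when the results are needed
    return done == 3 ? 0 : 1;
}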


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
Posted: Tue Jun 12, 2012 4:38 pm
Bitchin' Fast 3D Z8000*

Joined: Tue Jun 29, 2004 11:32 pm
Posts: 2555
Location: Somewhere between compilation and linking
Aha... LatiosXT has provided me with a pretty good functional programming exercise! I'm certain that he regrets posting this now! JK... =)
LatiosXT wrote:
But as an example (and I'm not sure if it's proper), if you had the following:
Code:
A = 2;
B = 3;
C = 0;
A = B * 2;  //A = 6
B = A + C;  //B = 6
C = A + B;  //C = 12

Note -- I added the comments with the current values for each variable in the code above.

While the code that LatiosXT provided was meant to be a trivial example illustrating one of the dependency problems that exist when writing concurrent programs, it is also a pretty good representation of something we might have to compute as programmers. A, B, and C might represent coefficients in an equation or system of equations. They might represent growth rates or some other mathematical model. By rewriting the code in more of a functional programming style, we can distill a bit of the essence of what is really happening.

Code:
(defun twice (x) (* 2 x))
(setf a 2 b 3 c 0)
(psetf a (twice b)
       b (+ (twice b) c)
       c (+ (twice b) (twice b) c))
(values a b c)  ;;returns 6 6 12

Aha... A is really just twice B; B is twice B plus C; and so on. Easy enough. Sure, this isn't parallel programming, but abstraction is usually the most important step in writing good software. Now, if both the equation and the initial coefficients are known at compile time, we could rewrite this code using _even_ a C-style macro to set the final values of A, B, and C at compile time. I tend to prefer self-preservation over self-punishment, so I've written the macro in Lisp.

Code:
(defun twice (x) (* 2 x))       ;an ordinary function, usable at macroexpansion time
(defmacro twice-eq (a b c)
  (let ((b0 3) (c0 0))          ;initial coefficients, known at compile time
    `(setf ,a ,(twice b0)
           ,b ,(+ (twice b0) c0)
           ,c ,(+ (twice b0) (twice b0) c0))))

;;The compiler would expand (twice-eq a b c) into ...
CL-USER> (macroexpand-1 '(twice-eq a b c))
(SETF A 6 B 6 C 12)  ;;... a simple assignment form

Alright, the example that LatiosXT gave is so simple that with little effort we can reduce it to O(1) at run time (by performing the computation at compile time); or, if we didn't know the coefficients until run time, the run-time cost is just that of evaluating the given function. The point is that taking a more functional approach allowed us to identify a pattern. We're likely to encounter far more complex problems at work/school, though. Perhaps both the initial values and the function are not known until run time (i.e., the user selects the function). Or the user might select a probabilistic function whose computation must occur at run time (due to issues with seeding a rand function from the system clock). In general, a functional programming approach makes handling these tasks easier than an imperative programming approach does.
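(For the C-family readers: modern C++ gets the same compile-time folding without the preprocessor, via constexpr -- again, just a sketch of the same equation:)
Code:
// With b0 = 3 and c0 = 0 known at compile time, the compiler folds
// a, b, and c down to the constants 6, 6, and 12 -- no run-time work.
constexpr int twice(int x) { return 2 * x; }
constexpr int b0 = 3, c0 = 0;
constexpr int a = twice(b0);                    // 6
constexpr int b = twice(b0) + c0;               // 6
constexpr int c = twice(b0) + twice(b0) + c0;   // 12
static_assert(a == 6 && b == 6 && c == 12, "folded at compile time");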


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
Posted: Fri Jun 15, 2012 7:12 pm
Team Member Top 50

Joined: Sat Jun 25, 2005 11:04 am
Posts: 1026
I don't see how functional programming would really add anything in this instance. Maybe because this is too simple of an example. By the way, do you have any recommendations for resources to learn functional programming (both the theory and actually programming in it)?


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
Posted: Sat Jun 16, 2012 9:01 am
Clawhammer

Joined: Sun Jun 18, 2006 7:37 pm
Posts: 4983
The example was too simple, but it was the best I could come up with to show data dependencies. For all we know, A, B, and C could be variables accessible from different threads, or different processes. For instance, if you have a game with physics, AI, and graphics routines, I would think the physics routine needs to finish first, otherwise the AI can't decide how it should move; in turn, both need to finish so the graphics routine can render the scene properly.
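One common way engines cope with that chain (a sketch with hypothetical stage functions, not from any particular game): keep the stages of a single frame in order, but overlap frame N's rendering with frame N+1's simulation, since they touch different snapshots of the world.
Code:
#include <future>

struct Frame {};
Frame simulate(int n) { (void)n; return Frame{}; }  // physics + AI for frame n
void  render(const Frame&) {}                       // draw a finished frame

int main() {
    Frame current = simulate(0);
    for (int n = 1; n < 100; ++n) {
        // start simulating the next frame on another core...
        auto next = std::async(std::launch::async, simulate, n);
        render(current);        // ...while this frame is being drawn
        current = next.get();   // wait for the simulation before moving on
    }
    render(current);
}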


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
Posted: Sun Jun 17, 2012 3:09 am
Bitchin' Fast 3D Z8000*

Joined: Tue Jun 29, 2004 11:32 pm
Posts: 2555
Location: Somewhere between compilation and linking
mag wrote:
I don't see how functional programming would really add anything in this instance. Maybe because this is too simple of an example.

In this case, I was using the term functional programming pretty loosely, suggesting that thinking in these terms might provide some additional insight into a problem. By substituting the function twice() for the addition and multiplication operations, all of the dependencies were removed.
Code:
(defun twice (x) (* 2 x))
(setf a 2 b 3 c 0)
(psetf a (twice b)                  ;;ALL OF THESE ASSIGNMENTS CAN OCCUR IN PARALLEL NOW
       b (+ (twice b) c)
       c (+ (twice b) (twice b) c))
(values a b c)  ;;returns 6 6 12


mag wrote:
By the way, do you have any recommendations for resources to learn functional programming (both the theory and actually programming in it)?

I've learned FP mainly through books in my personal library and by studying Lisp programming (online and offline), so this is one case where my knowledge of a subject is far more practical than theoretical. I believe a paper by Backus in '78 was one of the foundations for a lot of the later work in FP; however, unless your math background is pretty solid, I would suggest skipping that paper for the time being and focusing on some lower-hanging fruit. Start with the FP article on Wikipedia. If you haven't done much programming outside of mainstream Java and C++, I would suggest reading about continuations, coroutines, and generators. Hell, just watch all of the SICP lectures while you're at it. =)

Defmacro has a pretty good introduction/overview of FP. It's a bit too verbose but easy to get through. I believe that Joel on Software also discusses FP a bit in an article on MapReduce.

This recently caught my eye. I've just started reading it, but I've liked most of it...
http://matt.might.net/articles/programm ... oroutines/

I'll keep an eye out for some more articles over the next few days and update the thread if I come across anything.


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
Posted: Wed Jun 20, 2012 7:51 pm
Team Member Top 50

Joined: Sat Jun 25, 2005 11:04 am
Posts: 1026
Gadget wrote:
mag wrote:
I don't see how functional programming would really add anything in this instance. Maybe because this is too simple of an example.

In this case, I was using the term functional programming pretty loosely, suggesting that thinking in these terms might provide some additional insight into a problem. By substituting the function twice() for the addition and multiplication operations, all of the dependencies were removed.
Code:
(defun twice (x) (* 2 x))
(setf a 2 b 3 c 0)
(psetf a (twice b)                  ;;ALL OF THESE ASSIGNMENTS CAN OCCUR IN PARALLEL NOW
       b (+ (twice b) c)
       c (+ (twice b) (twice b) c))
(values a b c)  ;;returns 6 6 12

Based on this example, I don't see a real benefit, since 2*b would have to be computed for a, b, and c (as illustrated below). c could be further improved to (x=4*b, c=x+c), but you'd only save the difference between an assignment (y=c) and an addition (x=x+x). I realize a, b, and c could be operations that take a lot longer than basic instructions, but I'd think these lower level optimizations would be handled by the compiler, especially in higher level languages.

Code:
Parallel:
a            b            c
a=2*b       y=c         x=2*b
             b=2*b      x=x+x
             b=b+y      c=x+c
// y=c is done so there is no dependency between b and c

Serial:
a=2*b
b=a+c
c=a+b

Gadget wrote:
I've learned FP mainly through books in my personal library and by studying Lisp programming (online and offline), so this is one case where my knowledge of a subject is far more practical than theoretical. <snip>

Thanks for those resources, I'll take a look at them. Could you list some of the books? (I've got some use/lose points on Safari Books.)


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
Posted: Sun Jun 24, 2012 6:28 pm
Bitchin' Fast 3D Z8000*

Joined: Tue Jun 29, 2004 11:32 pm
Posts: 2555
Location: Somewhere between compilation and linking
mag wrote:
Based on this example, I don't see a real benefit, since 2*b would have to be computed for a, b, and c (as illustrated below). c could be further improved to (x=4*b, c=x+c), but you'd only save the difference between an assignment (y=c) and an addition (x=x+x). I realize a, b, and c could be operations that take a lot longer than basic instructions, but I'd think these lower level optimizations would be handled by the compiler, especially in higher level languages.

Actually, compiler optimization is a good point. Because (pure) functional programming languages do not have side-effects, the compiler can often perform optimizations that would not be possible in an imperative language like C, which doesn't offer such guarantees about side-effects. In this example, a _smart_ compiler would recognize the twice function, compute it once before starting the assignments, and then perform the assignments in parallel. Further, if the language supports lazy evaluation, the assignments could be skipped entirely, because the variables a, b, and c aren't used after the assignment statements.
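Spelled out in C-ish terms, the transformation looks like this (a sketch of what the optimizer may do, not real compiler output):
Code:
// Because twice() is known to be pure, the shared subexpression can be
// hoisted and computed once; the three writes below are then fully
// independent of one another and could be issued in parallel.
constexpr int twice(int x) { return 2 * x; }

void assign(int& a, int& b, int& c) {
    const int t  = twice(b);   // common subexpression, evaluated once
    const int c0 = c;          // snapshot of c before any writes
    a = t;
    b = t + c0;
    c = t + t + c0;
}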

mag wrote:
Thanks for those resources, I'll take a look at them. Could you list some of the books? (I've got some use/lose points on Safari Books.)

Before I forget again, there are several decent FAQs/tutorials on functional programming available from both the Haskell and Clojure communities. If you're mainly interested in ideas on concurrency, I would suggest looking into Clojure. Also, there are several Channel9 videos by Microsoft Research which cover functional programming (IIRC, there is a 6+ part series on the subject). While on the topic of MS, the LINQ Wikipedia page contains links to several functional programming concepts (e.g., Select -> Map and Aggregate -> Fold).

As far as books go, I would start with the textbook that was used in your programming languages course. I have Concepts, Techniques, and Models of Computer Programming and Design Concepts in Programming Languages. Both have a fair amount of information on functional programming.

Next I would turn to the SICP book which contains a fair bit of functional programming (especially sects 1.3, 2.2, 3.1 and 3.5). This textbook is geared towards writing software, not debating the merits of functional programming.

On Lisp by Paul Graham has two chapters (IIRC) about functional programming, and he discusses some of the merits of FP throughout the book. The book is primarily about writing macros in Lisp, hence the name, On Lisp. You can download this book from his website.

Common Lisp the Language, 2nd edition, by Guy Steele (abbreviated CLTL2) is the reference book for the Common Lisp language. You won't find much discussion of the merits of functional programming, but the book does cover pretty much every functional programming technique at some point in the text (e.g., Appendices A and B cover Series and Generators). This might be available online as well.

I recently downloaded books on Parallel R, Parallel Programming in .NET, and Hadoop, but I haven't had a chance to read any of them. Hopefully I'll be able to start one of them next week.

For anyone trying to improve their programming skills or entering an AI-related field, I would highly recommend Paradigms of Artificial Intelligence Programming by Peter Norvig. This book isn't about functional programming; however, it is considered by many people to be one of the best books on programming ever written. It covers a very wide range of topics: how to program in Common Lisp; how to write AI programs; the history of several AI problems and programs, including code; and treatments of several styles of programming.


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
Posted: Sun Jun 24, 2012 9:02 pm
Bitchin' Fast 3D Z8000*

Joined: Tue Jun 29, 2004 11:32 pm
Posts: 2555
Location: Somewhere between compilation and linking
I forgot an IBM tutorial article... Statistical programming with R: Part 2. Functional programming and data exploration.
Quote:
<snip>This installment also explored how R's functional programming style aids in rapid exploration of patterns and data and analytic scenarios. </snip>


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
Posted: Sun Jun 24, 2012 9:54 pm
Team Member Top 50

Joined: Sat Jun 25, 2005 11:04 am
Posts: 1026
Gadget wrote:
mag wrote:
Based on this example, I don't see a real benefit, since 2*b would have to be computed for a, b, and c (as illustrated below). c could be further improved to (x=4*b, c=x+c), but you'd only save the difference between an assignment (y=c) and an addition (x=x+x). I realize a, b, and c could be operations that take a lot longer than basic instructions, but I'd think these lower level optimizations would be handled by the compiler, especially in higher level languages.

Actually, compiler optimization is a good point. Because (pure) functional programming languages do not have side-effects, the compiler can often perform optimizations that would not be possible in an imperative language like C, which doesn't offer such guarantees about side-effects. In this example, a _smart_ compiler would recognize the twice function, compute it once before starting the assignments, and then perform the assignments in parallel. Further, if the language supports lazy evaluation, the assignments could be skipped entirely, because the variables a, b, and c aren't used after the assignment statements.

After reading your defmacro link, I was going to post something similar to this. Basically, if a, b, and c are basic instructions, then any compiler will optimize them; but if they are more complex functions, then a functional compiler will still be able to optimize them, since there aren't any side effects, while compilers for other paradigms won't.

Gadget wrote:
<snip>book recommendations</snip>

I'll take a look at those resources. Thanks.


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
Posted: Wed Jun 27, 2012 6:33 am
Team Member Top 100

Joined: Fri Sep 17, 2004 5:35 pm
Posts: 1176
Wow, it's cool to see Gadget bringing computational complexity theory into the discussion -- kudos!

To add my 2 cents to the discussion, I think this is a great example of a simple question with no simple answer, other than the trite "it depends". As Gadget was saying, some algorithms will not see any performance improvement from the ability to use multiple processors, while others will. It depends mainly on the amount and type of parallelism exposed in the algorithm.

SMP (the multi-core model you're used to) systems are best for algorithms with coarse-grained parallelism; that is, when you have a number of mostly-independent tasks which must all be computed. Programmable hardware (which I suspect we will start seeing more and more of) is best for very fine-grained parallelism. Take the example of reversing the bit order in a word: most CPUs will require tens of cycles to do this, but in hardware it's just a matter of changing the wiring. No need for any logic, and this circuit would be extremely fast! Just note that the bit-level parallelism inherent in the problem can be effectively exploited in hardware, but cannot be effectively exploited on a CPU without hardware dedicated to this purpose.

GPUs are best for SPMD (single program, multiple data) meso-parallelism, where there is a common operation which must be run repeatedly on a large data set, usually with some amount of local communication between tasks. Instruction-level parallelism will be ignored here because it's mostly a hardware design and compiler problem.
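To make the bit-reversal example concrete: the best a CPU can do in plain software is something like the classic mask-and-shift "butterfly" below -- Theta(log w) steps instead of Theta(w), and still nowhere near the free rewiring hardware gets.
Code:
#include <cstdint>

// Swap adjacent bits, then pairs, nibbles, bytes, and halfwords:
// five steps for a 32-bit word.
uint32_t reverse_bits(uint32_t x) {
    x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);
    x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
    x = ((x & 0x0F0F0F0Fu) << 4) | ((x >> 4) & 0x0F0F0F0Fu);
    x = ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
    return (x << 16) | (x >> 16);
}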

But, you were specifically asking about how difficult it is to parallelize a game engine to effectively use multiple CPUs and GPUs. I think the GPU part has mostly been answered, but I disagree with Gadget's guestimate that it would take a work factor of 3x. In most games there is a decent amount of coarse-grained parallelism: there's the physics engine, the rendering engine, the network engine, etc. All of these parts must certainly share data, but it's relatively infrequent and the tasks are mostly autonomous. It probably would not take that much effort to split these tasks into separate threads/processes, but proper synchronization and the non-determinism imposed by true SMP may make things tricky. Of course, the speedup you would see is dependent on the relative runtimes of each of these tasks. If you separate the physics engine into a separate thread, even assuming miraculous cost-free synchronization, you won't see much of a speedup if it only took 10% of the runtime in the first place (cf. Amdahl's Law).

I think your question was more of an engineering question than a theoretical one. So, instead of looking at established academic work, we should look at the marketplace. If you look into the architectural differences of the Xbox 360 and the PS3, you'll immediately see some pretty striking differences. The Xbox 360 is basically a 3-core SMP PowerPC system with a standard PC graphics card; the PS3 is a very highly specialized 8-core vector processor (similar to those used in supercomputers for decades). If you can properly expose enough of the right types of parallelism in your game engine, the PS3 will knock the crap out of the Xbox 360. But, it turns out game developers have a very difficult time writing code to effectively use the PS3's powerful resources. On the Xbox 360, we see game developers coping pretty well, and they seem to be able to effectively use all 3 cores.
"A few" cores seems to be fairly straightforward for developers to design for. More cores mean more parallelism has to be exposed (which is a terrible game of rapidly diminishing returns for the effort invested), and synchronization and non-determinism will quickly turn your dreams into a nightmare.

So, my opinion is that it wouldn't be much effort to enable limited coarse-grained parallelism in most game engines, but it may be difficult to utilize all the cores effectively.


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
Posted: Wed Jun 27, 2012 4:58 pm
Bitchin' Fast 3D Z8000*

Joined: Tue Jun 29, 2004 11:32 pm
Posts: 2555
Location: Somewhere between compilation and linking
Kybo_Ren wrote:
Wow, it's cool to see Gadget bringing computational complexity theory into the discussion -- kudos!
Thank you, sir.

Kybo_Ren wrote:
Programmable hardware (which I suspect we will start seeing more and more of) is best for very fine-grained parallelism.
Really? I haven't followed hardware closely for quite some time now. Are there plans for incorporating FPGAs into mainstream hardware? That would raise all kinds of interesting questions/issues in a multi-programming environment.

Kybo_Ren wrote:
Take the example of reversing the bit order in a word: most CPUs will require tens of cycles to do this, but in hardware it's just a matter of changing the wiring. No need for any logic, and this circuit would be extremely fast! Just note that the bit-level parallelism inherent in the problem can be effectively exploited in hardware, but cannot be effectively exploited on a CPU without hardware dedicated to this purpose.
Good example! Which I will gladly upgrade to great if you provide a strong practical reason -- I couldn't come up with anything offhand, but I suspect there might be some uses related to information theory. Keeping in mind that it will still be much slower than a dedicated hardware solution, can anyone come up with a way to reduce the time complexity from Theta(word_size/2) down to Theta(1)?

Kybo_Ren wrote:
But, you were specifically asking about how difficult it is to parallelize a game engine to effectively use multiple CPUs and GPUs. I think the GPU part has mostly been answered, but I disagree with Gadget's guestimate that it would take a work factor of 3x. In most games there is a decent amount of coarse-grained parallelism: there's the physics engine, the rendering engine, the network engine, etc. All of these parts must certainly share data, but it's relatively infrequent and the tasks are mostly autonomous. It probably would not take that much effort to split these tasks into separate threads/processes, but proper synchronization and the non-determinism imposed by true SMP may make things tricky.
Unfortunately, estimating the additional work required for implementing any type of parallelism is going to be extremely difficult. Conceptually, it may not seem difficult to implement some degree of parallelism in a game engine. A really good team of developers with experience in this area might actually complete coding without incurring much additional expense. However, parallelism is going to introduce quite a bit of additional complexity in all of the other phases as well, especially testing and maintenance, which I suspect will be _at least_ 5x more expensive based on my experience helping maintain/fix a couple of defense projects w/ multiple processes (using shared memory in Unix). Finding the cause and then correcting a bug in a parallel program is truly one of the worst levels of hell for software developers.

Kybo_Ren wrote:
I think your question was more of an engineering question than a theoretical one. So, instead of looking at established academic work, we should look at the marketplace.

This is the reason that I began the preceding paragraph with unfortunately. It is unfortunate that we can't utilize academic materials to help derive an answer to this question because _software engineering_ is mostly a sham! There are far too many people working on new engineering methods (in order to obtain consulting contracts?), formulas and theorems based on personal conjecture, and far too little work based on quantitative data analysis. Unfortunately, this situation is likely to persist because corporations and governments aren't collecting or disseminating the necessary data.


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
Posted: Wed Jun 27, 2012 7:09 pm
Clawhammer

Joined: Sun Jun 18, 2006 7:37 pm
Posts: 4983
I think no-instruction-set computing (NISC) would be good for compute. Apparently NVIDIA's Kepler runs on this style.

Before you go "But Kepler sucks at compute performance!" -- that's because, like VLIW, it requires a highly tuned compiler.


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
Posted: Wed Jun 27, 2012 9:16 pm
Bitchin' Fast 3D Z8000*

Joined: Tue Jun 29, 2004 11:32 pm
Posts: 2555
Location: Somewhere between compilation and linking
LatiosXT wrote:
I think no-instruction-set computing (NISC) would be good for compute. Apparently NVIDIA's Kepler runs on this style.
Huh... sounds interesting... although it also sounds like NO instruction set is a bit of a white lie. Maybe I should look into compiler research. The boss will never tell you that optimization/performance doesn't matter, plus we're becoming more and more dependent on good compilers.

LatiosXT wrote:
Before you go "But Kepler sucks at compute performance!" -- that's because, like VLIW, it requires a highly tuned compiler.
I should really do a hardware review... is VLIW still considered immature or slow?


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
Posted: Wed Jun 27, 2012 10:04 pm
Clawhammer

Joined: Sun Jun 18, 2006 7:37 pm
Posts: 4983
Gadget wrote:
Huh... sounds interesting... although it also sounds like NO instruction set is a bit of a white lie. Maybe I should look into compiler research. The boss will never tell you that optimization/performance doesn't matter, plus we're becoming more and more dependent on good compilers.

The idea is that there's no instruction set. I almost want to say it's like FPGAs with VHDL, but I can't; it's probably something that's way over my head. All I know is there's no instruction decoder: the processor executes operations directly from memory.

On a semi-related note (or maybe not), it's like asking what makes a processor RISC. Apparently the only thing that makes something RISC is that its only memory operations are load and store.

And on another note, it's funny you mention that we're becoming dependent on compilers. Embedded experts also expect us to become dependent on code generators, which, in fact, I already am. My supervisor has adopted QP, which is a bitch to hand-write. But with the tool the author provides, you basically draw a bunch of UML state charts, which handle all the hard parts; you just have to add in the event handlers.

Quote:
I should really do a hardware review... is VLIW still considered immature or slow?

Depends on the application. AMD's Radeon HD 5000-6000 series was VLIW-based. A Russian processor called the Elbrus 2000 uses VLIW and at 300MHz can compete with... I forget what, but I keep thinking a Pentium 4 @ 1.5GHz. Anyway, it's probably not that it's immature or slow; it's just that both VLIW and NISC require a high degree of predictability. For compute programs, that's perfect. For event-driven software, I can't really imagine it.


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
Posted: Wed Jul 04, 2012 1:56 am
Team Member Top 100

Joined: Fri Sep 17, 2004 5:35 pm
Posts: 1176
Gadget wrote:
Really? I haven't followed hardware closely for quite some time now. Are there plans for incorporating FPGAs into mainstream hardware? That would raise all kinds of interesting questions/issues in a multi-programming environment.

There have been plans for this for years, but none have borne much fruit. People have tried everything from FPGAs on PCMCIA cards to FPGAs directly in processor sockets, to hard PPC cores embedded directly in FPGA logic. Some of the reasons they haven't been very successful outside a small niche are that they're costly, have little utility for your grandma's web browsing, vary widely in resources and capability, and there is no standard abstract interface to them. Cost can be mitigated by scale, but the rest of the problems are very much still problems today. I think we will start to see some reconfigurable logic incorporated directly onto SoCs (like Xilinx's Zynq) destined for mobile phones as a means to decrease energy consumption, thereby giving grandma some utility.

The last two points are the real deal-breakers. The ISA abstraction has proved very useful for CPUs: you can buy CPUs with vastly different resources but use the same, unmodified software on both, and your more capable CPU will just 'magically' work faster. There is no obvious abstraction useful for porting FPGA bitstreams; everything about them is as device-specific as it gets. The closest thing to a common abstraction would be behavioral RTL, but even that is light-years away from being one, not to mention completely inaccessible to software developers (even with modern high-level synthesis tools). Even worse, it would take hours of compute time just to create the bitstream for a specific accelerator...

Heterogeneous computing is a very hot topic, and projects like LiquidMetal and OpenCL are moving toward the direction of seamless portability between multiple models of computation. Even with these challenges, there are a number of possible workarounds: libraries, precomputed bitstreams available OTA, 'warp processing', etc. will all play a role in bringing these devices to the mainstream.

Gadget wrote:
Good example! Which I will gladly upgrade to great if you provide a strong practical reason -- I couldn't come up with anything offhand, but I suspect there might be some uses related to information theory. Keeping in mind that it will still be much slower than a dedicated hardware solution, can anyone come up with a way to reduce the time complexity from Theta(word_size/2) down to Theta(1)?

Well, it was just a particularly simple and illustrative example. I'm not proposing CPU designers incorporate such logic in their chips :). If you want a more useful example of an operation with large amounts of bit-level parallelism which typically can't be exploited by CPUs, I would suggest counting the number of bits set in a word. Intel recently introduced the 'popcnt' instruction to do just this, but software has to be specifically coded to take advantage of it. It's a surprisingly common operation, and is useful in pretty much anything from parity checking to bitvector manipulation in DBMSs.
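For reference, the portable software fallback is the well-known SWAR divide-and-conquer version (compilers also expose intrinsics such as GCC's __builtin_popcount):
Code:
#include <cstdint>

// Sum bits in pairs, then nibbles, then bytes; the final multiply
// gathers the four byte sums into the top byte.
uint32_t popcount32(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                   // 2-bit sums
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);   // 4-bit sums
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                   // 8-bit sums
    return (x * 0x01010101u) >> 24;                     // total in top byte
}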


Gadget wrote:
Unfortunately, estimating the additional work required for implementing any type of parallelism is going to be extremely difficult. Conceptually, it may not seem difficult to implement some degree of parallelism in a game engine. A really good team of developers with experience in this area might actually complete coding without incurring much additional expense. However, parallelism is going to introduce quite a bit of additional complexity in all of the other phases as well, especially testing and maintenance, which I suspect will be _at least_ 5x more expensive based on my experience helping maintain/fix a couple of defense projects w/ multiple processes (using shared memory in Unix). Finding the cause and then correcting a bug in a parallel program is truly one of the worst levels of hell for software developers.

I agree. Without direct experience with the specific code involved, the best anyone can do is guess. I also have experience working with parallel programs on multiprocessors, and it can indeed be an absolute nightmare to diagnose even the simplest bugs. Personally, I hope those who use their technological knowledge to enable political oppression (I'm looking at YOU, Blue Coat programmers!) and sell out free speech tools to the Saudis (I'm looking at YOU, Moxie!) are reserved the final circle in hell: debugging highly-parallel, tightly-coupled programs on a multiprocessor system with no cache coherency, in perpetual Sisyphean punishment.

LatiosXT wrote:
I think no-instruction-set computing (NISC) would be good for compute. Apparently NVIDIA's Kepler runs on this style.

Before you go "But Kepler sucks at compute performance!" -- that's because, like VLIW, it requires a highly tuned compiler.

I'm not so convinced this is really the way forward. It only exacerbates the problems VLIWs present. VLIWs haven't been adopted on a large scale, but it's not really because of the compiler's obligation to statically schedule everything; Intel's IA-64 compiler is quite good at this. The most serious problem is that the notion of binary-compatible software goes right out the window, even among CPUs sharing a common instruction set! NISC has the exact same problem, but even worse: now you don't even have a common instruction set!

People have tried to mitigate this problem for years; remember the Crusoe? It was a VLIW under the hood with an x86 JIT unit. The idea was the chip would be compatible with existing software, perhaps see some speedup where the JIT unit did a good job, and give great performance for device-specific code. Sounds like a great idea; how did it work out? Well, does anyone remember the Crusoe???

Gadget wrote:
Huh... sounds interesting... although it also sounds like NO instruction set is a bit of a white lie. Maybe I should look into compiler research. The boss will never tell you that optimization/performance doesn't matter, plus we're becoming more and more dependent on good compilers.
<break>
I should really do a hardware review... is VLIW still considered immature or slow?

It's not really a lie. We're used to an ISA as a specification for interacting with the CPU's control FSM, which in turn manages the microarchitectural details necessary to actually effect your command. In this case, there isn't much need for a decoder and control logic: nearly all the microarchitectural details are handled by the compiler, which encodes its commands for the functional units directly. It entirely skips the abstraction layer ISAs provide; hence, "no instruction set". I suppose you could say the instruction set is just the set of all possible combinations of FU commands.

I don't think VLIWs were ever thought of as slow. VLIWs can be extremely fast -- after all, the Itanium saw some significant adoption in supercomputers. With a perfect compiler, code can be scheduled optimally, which is not even close to feasible with dynamic issue OOO CPUs (this is not to say that 'perfect compilation' is feasible in software, either -- last time I checked, we were up to sequences of no more than 7 instructions...).
I also don't think VLIWs are considered immature. The idea has been around for decades and decades. They're not widely used (unless you consider DSPs to be VLIW machines) simply because development is difficult and they don't meet the needs of the market at a feasible cost.

LatiosXT wrote:
The idea is that there's no instruction set. I almost want to say it's like FPGAs with VHDL, but I can't; it's probably something that's way over my head. All I know is there's no instruction decoder: the processor executes operations directly from memory.

I would say it's more closely related to CGRAs than RTL for FPGAs. It kind of takes the idea of static scheduling to the next level. VLIWs still schedule most microarchitectural details, while this is giving the code generator even more fine-grained control over the microarchitecture. I see why you say this, though. It makes binary code look much more like a bitstream than a program.


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
Posted: Wed Jul 04, 2012 8:01 am
Clawhammer

Joined: Sun Jun 18, 2006 7:37 pm
Posts: 4983
Notice how I said that NISC should be adopted for compute, not for general applications. Not to mention that VLIW (or something VLIW-like) has been used extensively on GPUs. The problem with VLIW/NISC is that the software has to be predictable enough for a static scheduler, and the compiler has to be smart enough to sort instructions ahead of time. In a graphics pipeline, no problem.

