 Post subject: Re: How hard is it to change a program to use multiple cpu/g
PostPosted: Thu Jul 05, 2012 1:07 am 
Team Member Top 100

Joined: Fri Sep 17, 2004 5:35 pm
Posts: 1176
Ah, that wasn't so clear. Yes, perhaps we will see NISC chips appearing in very specialized domains. I don't know whether this will start appearing in GPUs, but I guess time will tell.
I don't know what you mean by "software has to be predictable enough for a static scheduler", though. The compiler is obviously very tightly coupled to the architecture, and knows the exact latency of any EU op (save for memory accesses with caches, I suppose). Where is the unpredictability in the software? We're talking about static vs. dynamic issue scheduling of operations to the available EUs, not speculation, right? I'm also unsure what you mean by "[the] compiler has to be smart enough to sort instructions ahead of time". All compilers must schedule instructions, right? You mean it must be careful to compute operations in parallel when possible? This isn't really a big deal; intraprocedural dependency analysis is relatively time-consuming but well understood.
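
To make that last point concrete, here's a minimal sketch (plain C, assuming a hypothetical machine with two EUs; the cycle assignments in the comments are only illustrative) of the kind of independence a static scheduler exploits:

Code:
/* A static scheduler only needs the dependency graph: t0 and t1 don't
 * depend on each other, so they can be issued to two EUs in the same
 * cycle; t2 needs both, so it issues one multiply-latency later. */
int sum_of_products(int a, int b, int c, int d)
{
    int t0 = a * b;    /* independent of t1 -> EU0, cycle N */
    int t1 = c * d;    /* independent of t0 -> EU1, cycle N */
    int t2 = t0 + t1;  /* depends on t0 and t1 -> later cycle */
    return t2;
}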

And Gadget, I just thought of a practical use case for a bit-reversal instruction: an FFT butterfly :).
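
For anyone curious, here's roughly what that looks like in software (plain C): the input-reordering step of an iterative radix-2 FFT needs exactly this loop, and a single hardware bit-reversal instruction would replace the whole thing.

Code:
/* Reverse the low 'bits' bits of index i -- the reordering step of an
 * iterative radix-2 FFT moves element i to position bit_reverse(i, log2(N)). */
unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1);
        i >>= 1;
    }
    return r;
}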


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
PostPosted: Thu Jul 05, 2012 8:43 am 
Smithfield

Joined: Sun Jun 18, 2006 7:37 pm
Posts: 5544
Kybo_Ren wrote:
Ah, that wasn't so clear. Yes, perhaps we will see NISC chips appearing in very specialized domains. I don't know whether this will start appearing in GPUs, but I guess time will tell.
I don't know what you mean by "software has to be predictable enough for a static scheduler", though. The compiler is obviously very tightly coupled to the architecture, and knows the exact latency of any EU op (save for memory accesses with caches, I suppose). Where is the unpredictability in the software?

If you have conditional statements or are allowing interrupts, then you face a certain level of unpredictability.

Quote:
We're talking about static vs. dynamic issue scheduling of operations to the available EUs, not speculation, right? I'm also unsure what you mean by "[the] compiler has to be smart enough to sort instructions ahead of time". All compilers must schedule instructions, right? You mean it must be careful to compute operations in parallel when possible?

The reason I mentioned compute is that computing is mostly number crunching. There might be some conditional statements here and there, but I imagine all you're really doing is throwing some data into a bunch of math to solve, then rinse, lather, and repeat as necessary. Since math tends to be a predictable thing with clearly defined rules, you can axe a complicated scheduler that's trying to account for conditional branches and dependencies.

Like, let's take pixel shading. You're trying to figure out the final 32-bit value of a pixel, and there are possibly millions of them. However, since all the lighting sources and modifications (texture maps) are known, you really don't need a complicated scheduler to generate an image. And since figuring out a pixel's color isn't dependent on its neighbor, the operation is highly parallel.
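
To illustrate that independence, a minimal sketch (plain C with OpenMP; shade() is just a toy stand-in for the real lighting/texture math):

Code:
#include <stdint.h>

/* Toy "shader": a gradient standing in for the real lighting/texture work. */
static uint32_t shade(int x, int y)
{
    return ((uint32_t)(x & 0xFF) << 16) | ((uint32_t)(y & 0xFF) << 8) | 0xFFu;
}

/* Each pixel depends only on (x, y) and read-only inputs, never on a
 * neighboring pixel, so the loop nest can be split across as many
 * execution units as the hardware offers. */
void render(uint32_t *image, int width, int height)
{
    #pragma omp parallel for collapse(2)
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            image[y * width + x] = shade(x, y);
}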


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
PostPosted: Thu Jul 05, 2012 4:16 pm 
Bitchin' Fast 3D Z8000*

Joined: Tue Jun 29, 2004 11:32 pm
Posts: 2555
Location: Somewhere between compilation and linking
Kybo_Ren wrote:
Gadget wrote:
Really? I haven't followed hardware closely for quite some time now. Are there plans for incorporating FPGAs into mainstream hardware? That would raise all kinds of interesting questions/issues in a multi-programming environment.

There have been plans for this for years, but none have borne much fruit. People have tried everything from FPGAs on PCMCIA cards to FPGAs directly in processor sockets, to hard PPC cores directly embedded in FPGA logic.

I didn't think about it before, but I guess the physics boards that started appearing a few years back are one example. I'm assuming they're ASICs, but I'm not sure.

Kybo_Ren wrote:
I'm not so convinced this is really the way forward. This only exacerbates the problems VLIWs present. VLIWs haven't been adopted on a large scale, but it's not really because of the compiler's obligation to statically schedule everything. Intel's IA-64 compiler is quite good at this. The most serious problem is the notion of binary-compatible software goes right out the window, even among CPUs sharing a common instruction set! NISC has the exact same problem here, but even worse: now you don't even have a common instruction set!
Prior to this moment, I hadn't given this much thought... so be gentle if I really screw this one up. =)

Binary compatibility is a nice feature, but it does have at least one drawback. The [probably not so] obvious drawback is that commercial software may not be compiled or written to take advantage of many of the features of your particular processor [remember, be gentle. =)]. I'm sure there are reams of interesting academic and industry papers dealing with this issue, and granted, most applications will not benefit greatly, but there are special cases in which having some extra muscle is really nice (eg media encoding, matrix calculations, etc). Assuming this is true, is binary compatibility really all that beneficial in this day and age? Why shouldn't a decent installation program be able to determine the type of machine and then install an appropriate binary?
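
As a rough sketch of what that installer (or loader) could do on x86, using GCC's CPU-detection builtins -- the builtins are real, but the binary names are placeholders I made up:

Code:
#include <stdio.h>

/* Pick whichever prebuilt binary fits this machine; the feature checks are
 * GCC builtins, the file names are hypothetical. */
int main(void)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        puts("install app-avx.bin");       /* vectorized build */
    else if (__builtin_cpu_supports("sse2"))
        puts("install app-sse2.bin");
    else
        puts("install app-generic.bin");   /* lowest common denominator */
    return 0;
}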

Kybo_Ren wrote:
It's not really a lie. We're used to an ISA as a specification for interacting with the CPU's control FSM, which in turn manages the microarchitectural details necessary to actually effect your command. In this case, there isn't much need for a decoder and control logic: nearly all the microarchitectural details are handled by the compiler, which encodes their commands directly.
I was suggesting that it could be considered a white lie because an ISA serves as an interface between the hardware and software. Even if the hardware is programmable (which I don't believe is the case for NISC), there has to be an explicit interface somewhere; otherwise, how are the compiler folks going to know what to do? Sure, the decoder and instruction set may no longer exist, but the hardware-interfacing role still exists, which is why I called it a white lie. In a NISC architecture, do the compilers still generate "object code" or some other intermediate code?

From a theoretical CS perspective, I can't imagine why this architecture wouldn't be reducible to a standard Turing machine. From a computer architecture perspective, are there compelling reasons for adopting it? Does the elimination of the decode step translate into a potentially large gain in performance? Will compilers be able to customize branch prediction to the individual application to such a degree that it makes sense to adopt this architecture? I read on Wikipedia that the design of the ISA is very costly, but I wonder whether this is due to the actual complexity of designing a decoder. Does most of the cost saving come from simply shifting hardware development to software (which can also be substantial)?


 Post subject: Re: How hard is it to change a program to use multiple cpu/g
PostPosted: Thu Jul 05, 2012 7:55 pm 
Bitchin' Fast 3D Z8000*

Joined: Tue Jun 29, 2004 11:32 pm
Posts: 2555
Location: Somewhere between compilation and linking
LatiosXT wrote:
The reason I mentioned compute is that computing is mostly number crunching. There might be some conditional statements here and there, but I imagine all you're really doing is throwing some data into a bunch of math to solve, then rinse, lather, and repeat as necessary. Since math tends to be a predictable thing with clearly defined rules, you can axe a complicated scheduler that's trying to account for conditional branches and dependencies.

I would have to say that I tend to agree if you limit 'math' or 'number crunching' to formulaic calculations; linear algebra and basic statistical calculations, in particular, appear to be fairly straightforward, requiring little or no branching. However, I don't want to leave anyone with the impression that this is true of mathematical computation in general. Take the predicate function isPrime(), for example. The basic algorithm is simple and involves a single branch:
Code:
/* Trial division by known primes up to sqrt(n); primes[] assumed precomputed. */
int limit = (int)sqrt((double)n);
for (int i = 0; primes[i] <= limit; i++)
    if (n % primes[i] == 0)
        return false;   /* some prime divides n: composite */
return true;            /* nothing up to sqrt(n) divides n: prime */

How do we handle branch prediction for this algorithm? While there are infinitely many primes, the natural density of the primes is zero, which suggests that for a randomly chosen n the divisor test will come up true quite often. In fact, half of all natural numbers are divisible by the very first prime. Yet for any particular odd number, most primes are not divisors, especially for the kinds of candidates a 'number cruncher' is actually likely to be testing for primality. It isn't at all clear to me which case is the more likely.

Granted, this is a simple example, and I believe that most processors are capable of performing both code paths simultaneously (Kybo_Ren -- please correct me if I'm wrong!), but I wanted to illustrate that even a simple mathematical function can lead to unexpected difficulties. There is also a more subtle computation in the above algorithm, one performed frequently in mathematical formulas: an iterative method, such as Newton's method, is used to arrive at the square root of n (the same is true for sin, cos, tan, ln/log, and many other mathematical functions). I don't believe we can make assumptions about the most likely case at each iteration (whether the current estimate is too low or too high); otherwise someone would probably have developed a better method taking this into account. Also, I don't believe the processor will be able to handle the entire 'code path', for lack of a better term, because the potential number of paths grows exponentially (i.e. 2^num_iterations).
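
To put the square-root point in code, here's a minimal Newton's-method sketch (plain C; the starting guess and tolerance are arbitrary choices of mine, not taken from any real math library). The loop's trip count, and therefore the taken/not-taken pattern of its branch, depends on the input value.

Code:
#include <math.h>

/* Newton-Raphson square root: x_{k+1} = (x_k + n/x_k) / 2. Assumes n > 0. */
double newton_sqrt(double n)
{
    double x = n, prev;
    do {
        prev = x;
        x = 0.5 * (x + n / x);               /* one Newton step */
    } while (fabs(x - prev) > 1e-12 * x);    /* iteration count depends on n */
    return x;
}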

While it is true that math tends to have well-defined rules, this doesn't necessarily translate into well-predicted branches. I don't think it is a coincidence that in an era when mathematical computation was the primary use for computers, both Fortran and Lisp included syntax/forms to simplify writing code that handles multiple conditions. Finding the derivative of a function, for example, involves the application of a few straightforward rules. However, performing this task requires some kind of pattern matching followed by the application of the matching rule, and I don't believe the processor can do much in the way of branch prediction for that case either. Randomized algorithms, Monte Carlo methods, and chaotic/non-linear systems seem to be other areas in which branch prediction is going to suffer. Integration, for example, is often performed using randomization, and many computational statistics algorithms rely on randomization as well.
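
As a small illustration of that last point, a Monte Carlo sketch (plain C, estimating pi; the constants are arbitrary). The branch in the loop is taken with probability of about pi/4, but which way any individual iteration goes is effectively random, so a branch predictor can do little better than guessing the overall bias.

Code:
#include <stdio.h>
#include <stdlib.h>

/* Estimate pi by sampling random points in the unit square and counting how
 * many fall inside the quarter circle. The branch outcome follows the random
 * draw, so it is inherently hard to predict. */
int main(void)
{
    const int N = 10000000;
    int inside = 0;
    srand(42);
    for (int i = 0; i < N; i++) {
        double x = rand() / (double)RAND_MAX;
        double y = rand() / (double)RAND_MAX;
        if (x * x + y * y <= 1.0)     /* taken ~78.5% of the time */
            inside++;
    }
    printf("pi ~= %f\n", 4.0 * inside / N);
    return 0;
}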

Just some thoughts.

