Here’s the second part of our exclusive QuakeCon interview with John Carmack. In the first part of our conversation, Carmack discussed his hopes for Quake Live and id Software’s new gaming direction in Rage. This time around, he gets more into the heady technical stuff with his thoughts on Nvidia’s CUDA, physics accelerators, general-purpose computing, and ATI’s rumored Fusion technology.
MaxPC – Can we talk about PhysX and GPUs and CUDA and stuff like that for a sec?
John Carmack – I was well known as not being a supporter of the PhysX accelerators. It’s always felt like a gimmicky plan with people setting up a company to be acquired. For years, the question has been: what do you do any time Intel delivers something more with processors and more cores? It’s never really proven out right, and there are a lot of reasons for it.
For one thing, you can’t scale AI and physics in general with your gameplay, while with graphics you can scale. You can’t design a game that requires fancy AI and then turn off the fancy AI for the low-end systems, because practically that’s not possible. Similarly for physics, if it’s anything other than eye candy, you also can’t scale. If the building is going to fall down, you need to know whether you’re going to be able to get past it on the high end or the low end.
So what’s happened, of course, is that PhysX has degenerated to fancy eye candy. You’ve got your fields of grass, you’ve got your walls of blocks that come tumbling down, and things that aren’t crucial to the actual game, and that is just a fancy cookie that you throw at the player, which admittedly has some value. So in terms of the general-purpose acceleration, it was clear even when AGEIA was starting that the graphics processors were going to be getting more generalized, and we never thought that they had any special sauce in their hardware that was fundamentally going to be better.
With the CUDA approach, I think Nvidia is being very wise: they’ve brought in something early on so some people could start getting some things done with CUDA. So they’ve got a community of high-performance computing research guys working with CUDA, and it’s great because it’s so important to get that out of your labs and into a customer’s lab and just see how things work in the real world. They’re going to have several generations of extra insight over Intel by the time Larrabee ships.
Right now the switch between GPU and CUDA is a really heavyweight switch. In the next-gen stuff, it’s much more lightweight so you can toggle back and forth, and in the future, it’s all mix and match. They’ll [eventually] run GPU and CUDA processes simultaneously, and it opens up a lot more avenues for computation. There are still some fundamental worries that I have about vector length on there, where all of these things that are set up to be GPUs first have very long vector lengths. So while you may have 128 or so banks of threads, each of them is doing 32 things at the same time. I still see a huge potential for miserable utilization where, even if you could suck up all the threads, if you don’t have something that can use wide vectorization, you wind up with only 5% utilization.
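The utilization worry can be made concrete with a little arithmetic. The Python sketch below is an illustrative model only, not anything described in the interview: it assumes a lane that has no useful work sits idle for that step, and the names `GROUP_WIDTH`, `NUM_GROUPS`, `lane_utilization`, and `machine_utilization` are hypothetical. The numbers 32 and 128 are taken from Carmack's description above.

```python
# Illustrative model only: the names and the idle-lane assumption are
# not from the interview; 32 and 128 are the figures Carmack quotes.
GROUP_WIDTH = 32   # "32 things at the same time" per bank of threads
NUM_GROUPS = 128   # "128 or so banks of threads"


def lane_utilization(active_lanes, width=GROUP_WIDTH):
    """Fraction of one group's lanes doing useful work in a single step."""
    if not 0 <= active_lanes <= width:
        raise ValueError("active_lanes must be between 0 and width")
    return active_lanes / width


def machine_utilization(active_lanes_per_group, width=GROUP_WIDTH):
    """Average lane utilization across all groups (0.0 if there are none)."""
    if not active_lanes_per_group:
        return 0.0
    total = sum(lane_utilization(a, width) for a in active_lanes_per_group)
    return total / len(active_lanes_per_group)


if __name__ == "__main__":
    # Every group is busy, but each has only one lane of useful work.
    scalar_style = [1] * NUM_GROUPS
    # Every group is busy and every lane has useful work.
    vector_style = [GROUP_WIDTH] * NUM_GROUPS
    print("one useful lane per group:", machine_utilization(scalar_style))
    print("all lanes useful:         ", machine_utilization(vector_style))
```

Under this model, one useful lane out of 32 is 1/32 (about 3%) and two useful lanes is 2/32 (about 6%), so the "5%" figure reads as a rough estimate for code that cannot fill the wide vectors even when every group of threads is occupied.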
MPC – That combined with the heavy switch is disastrous right now, right?
JC – Yeah, you can’t really use it in a game right now. It doesn’t make a lot of sense, but it’s going to in the near future, and by the time we get to next-gen console stuff, all of that is going to be nicely, finely integrated. Right now you have this continuum from a general-purpose processor like we’ve got as the main CPUs on the 360 or the PS3, then you’ve got things like the Cells, which are general-purpose processors, but they’re all wide vector with no caches and special DMAs. Then you’ve got things like CUDA thread processors, and each one is more hassle than the one before it.
The CUDA processors are moving up, clearly. They’re going to get caches and more general-purpose programming abilities, but they’re not going to be all the way to what Larrabee is doing, which is really independent processors with a couple of cores and a couple of threads. It’s going to be interesting to see how all that plays out. My suspicion is that for a lot of the applications they’re designing and benchmarking for, Intel will wind up having good performance. But the internals of it, the software that they write for it, is pretty ugly, while the code that you could write for CUDA is pretty clean.
I think Intel is going to be fine on the peak performance numbers and will probably have a process advantage, which is always one of Intel’s big hammers. So it’s going to be interesting how Nvidia’s greater experience in utilizing all this parallelism plays out versus the kind of might that Intel is going to have in their raw process advantage in applications.
MPC – Where does ATI fall into this?
JC – We’ve gotten the pitches on the Fusion project and how they’re putting it together with the more general-purpose stuff, like with the AMD CPUs on it. We have less insight into that than we have into other projects. In general, ATI doesn’t have quite as good developer relations support as we get from Intel and Nvidia. Again, it’s going to be interesting to see how all that plays out. I know their market share isn’t doing real well on the different PC cards.
MPC – Do you think an open API will help them?
JC – It’s a tough thing because I think that trying to spec an API for experimental hardware like this is really tough, and like I said last night, it’s very different from what it was with graphics, where we had examples of all that research that had been done, and we knew how to do it, and we were just cleaning it up and doing it better.
But in this type of situation, anybody who gets up there and acts like they know exactly the way things are going is just putting up a good front, because the work just hasn’t been done yet. Nobody has written major applications that are working on these things, and one of these approaches may turn out to be fundamentally better than the other. We just don’t know which one.