Two Gaming Technologies Explained: A White Paper Round-Up
Gaming, as always, is in the middle of a transformation. Major developers are focusing on creating 3D-ready platforms, while others, like Nintendo and Microsoft, are trying to take us beyond controllers by developing games that require physical movement and in-game interaction.
The brave new world of gaming will be an interesting one indeed, so we decided to take a look at two of the pioneering technologies that may change games forever: Microsoft's Kinect and autostereoscopy.
Microsoft Kinect
Microsoft's unique input device for the Xbox has opened up some very intriguing possibilities. But how exactly does it work?
Kinect is, perhaps, the most significant product Microsoft has developed since Windows itself. It has the potential to impact not only gaming but also general computing, communications, and media. It’s an evolutionary platform blending sight, sound, and software that, if developed well, could become a revolutionary UI.

Sight
Kinect’s console includes an RGB camera—the same type found in webcams and cell phones across the globe. Currently, it’s a device with a 640x480 resolution capable of capturing 30 frames per second. It’s not 3D.
An avatar, in this context, is simply a wireframe representation of the player that has been mapped with recognition points. These points correspond to the wireframe’s joints (wrists, neck, elbows, shoulders, hips, and so on, in the case of human beings) and are what allow the system to reproduce accurate player motion onscreen in real time. “Real time,” in this case, entails a reported 200ms lag, including screen response time, thanks to processing overhead and the usual screen refresh timing. It’s possible to reduce this using a faster CPU, but in general, 200ms is right on the border of human perception.
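As a rough sanity check on that figure, the lag can be restated in camera frames. The snippet below is only arithmetic on the numbers quoted above (30 frames per second and a reported 200ms lag); the function name is made up for illustration.

```python
def lag_in_frames(lag_ms: float, fps: float) -> float:
    """Convert an end-to-end lag in milliseconds into a number of camera frames."""
    return lag_ms * fps / 1000.0

# Figures quoted in the text: 30 frames per second and a reported 200ms lag.
frames_behind = lag_in_frames(200, 30)
print(frames_behind)  # 6.0
```

In other words, the onscreen avatar trails the player by roughly six captured frames.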
This is basically the same motion-capture process that has been used for the last decade or so in sports games, among other things, to record athletes’ movements accurately for reproduction during the game’s playback. But those professional systems rely on keyframes to animate the motion, while Kinect bypasses the static recording of pre-existing motion and instead reproduces the motion of the live player (tracked at 20 points) as the action proceeds.
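To make the idea of tracked points concrete, here is a minimal sketch of what an application might do with three of them: compute the bend of an elbow from shoulder, elbow, and wrist positions. The joint names, the coordinate values, and the `joint_angle` helper are all assumptions for illustration; this is not the Kinect driver's data format.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.sqrt(sum(x * x for x in ba)) * math.sqrt(sum(x * x for x in bc))
    # Clamp to guard against rounding just outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))

# Hypothetical joint positions in meters (x, y, z) for one frame.
skeleton = {
    "shoulder_right": (0.20, 0.40, 2.00),
    "elbow_right": (0.20, 0.10, 2.00),
    "wrist_right": (0.50, 0.10, 2.00),
}

elbow = joint_angle(
    skeleton["shoulder_right"], skeleton["elbow_right"], skeleton["wrist_right"]
)
print(round(elbow, 1))  # 90.0
```

A game could evaluate angles like this on every frame to pose the onscreen avatar or to detect a gesture such as a raised forearm.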
Perhaps more mundane but nonetheless important, the combination of infrared and RGB cameras also allows Kinect to provide facial recognition that can automatically log a player on to the Microsoft network as well as associate the player with a previously used avatar. A recent update, called Avatar Kinect, gives the console the power to recognize players’ facial expressions and display them onscreen. In context, this ability can be used in several preconfigured venues (currently all thinly disguised chat room environments) to communicate with other players both verbally and through facial expressions. Apply notions of affective computing—which posits that systems will soon be capable of reacting to human facial expressions and emotions—and you can see why this is such a big deal.
The entire Kinect console sits atop a pedestal, much like those of 1960s lava lamps. Unlike (most) lava lamps, the Kinect pedestal has a built-in tilt motor that lets the entire console move. The tilt range is about 27 degrees, and it’s used in conjunction with the cameras’ 57-degree horizontal and 43-degree vertical fields of view to give the system a greater ability to track you as you move around.
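Those angles translate directly into how much of a room the cameras can see. As a rough sketch using simple pinhole geometry (an idealization; the function name and the 2 meter distance are illustrative assumptions), the visible width or height at distance d is 2 * d * tan(fov / 2).

```python
import math

def visible_extent(distance_m: float, fov_degrees: float) -> float:
    """Width (or height) covered by a camera at a given distance, pinhole model."""
    return 2.0 * distance_m * math.tan(math.radians(fov_degrees) / 2.0)

distance = 2.0  # meters from the sensor; an example value
width = visible_extent(distance, 57.0)   # horizontal field of view from the text
height = visible_extent(distance, 43.0)  # vertical field of view from the text
print(f"{width:.2f} m wide, {height:.2f} m tall")
```

At 2 meters this works out to a window roughly 2.2 meters wide and 1.6 meters tall, which the tilt motor can then shift up or down to keep a player in frame.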
Nestled alongside the RGB camera are an infrared emitter and an infrared camera. The former bathes the immediate area in infrared while the latter collects the radiated and reflected information for spatial analysis. The Kinect combines the 2D RGB image with the IR background fill to build a recognizable object that sits at a distance "L" from the system and is located along the X, Y, and Z (3D) axes.
Sound
Although you may hear a barely perceptible whir coming from the console, that’s the only sound it makes. There are no speakers inside the Kinect. Instead, the interior sports four microphones: three on the lower-right end and a single one on the lower-left side. All four face downward.
The quartet forms a spatial sound array that samples incoming audio and compares the four streams, separating background noise from speech and different voices from each other. It’s effective to about 4 meters from the console.
While noise-cancellation microphones have been around for years, Kinect faces the unique challenge of typically having the TV or receiver speakers close to the mics while the human voices are farther away. The acoustic-echo-cancellation techniques used in common speakerphones work well when the talker is near the microphone, but for Kinect the situation is reversed: the unwanted sound is close and the voices it needs to recognize are distant. Software created by the Speech Group at Microsoft Research in Redmond solved the problem.
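For readers unfamiliar with echo cancellation, the basic idea can be sketched with a textbook normalized least-mean-squares (NLMS) adaptive filter: because the system knows what it sent to the loudspeaker, it can learn how that signal arrives at the microphone and subtract it. This is a generic illustration, not the Microsoft Research algorithm, and the echo path, step size, and test signal are invented for the example.

```python
import random

def nlms_echo_canceller(far_end, mic, taps=4, step=0.5, eps=1e-8):
    """Adaptively estimate the loudspeaker-to-microphone echo path with NLMS.

    Returns the learned filter weights and the echo-reduced microphone signal.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    cleaned = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # what remains after removing the estimated echo
        power = sum(h * h for h in history) + eps
        weights = [w + step * error * h / power for w, h in zip(weights, history)]
        cleaned.append(error)
    return weights, cleaned

# Simulated setup: white noise plays from the TV speakers and reaches the
# microphone through a made-up three-tap echo path. No one is speaking.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, 0.3, 0.1]
mic = [
    sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(far_end))
]

weights, cleaned = nlms_echo_canceller(far_end, mic)
print([round(w, 3) for w in weights])
```

In this idealized run the learned weights should settle close to the simulated echo path, and the residual in `cleaned` shrinks toward zero. A real living room adds reverberation, multiple loudspeaker channels, and people talking over the TV, which is where the hard part lies.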
Software
The Kinect console does not have a processor, which is surprising considering all that’s expected of it. The console did have one when it was first announced (Project Natal in 2009) but Microsoft withdrew the internal CPU and decided to let the processing power of the Xbox handle matters. Kudo Tsunoda, the mastermind behind Kinect, insists that the add-on uses “less than one percent” of the Xbox 360’s processing power.
To help achieve that, Microsoft lowered the camera’s frame rate from the 60fps quoted at its announcement in 2009 to 30fps at its commercial release. Even at 30fps, that would place a heavy burden on the algorithms that run the console, except that the bulk of the overhead has been mitigated by locating those algorithms in the Xbox console as Kinect drivers.
These drivers describe a human’s position in Cartesian space, handle reverberation problems, and suppress loudspeaker echoes in the stereo acoustic-echo-cancellation algorithm. They do all this and more based on comparisons to decision forests (collections of decision trees) in conjunction with thousands of stored samples.
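A decision forest classifies an input by asking several decision trees and taking a majority vote. The toy sketch below shows only that voting mechanism; the trees, the two feature values, and the body-part labels are hypothetical and hand-written, whereas a real forest would be learned from a large set of stored samples.

```python
from collections import Counter

# Each tree is a nested tuple: (feature_index, threshold, left, right),
# where left/right are either another such tuple or a string label (a leaf).
TREE_A = (0, 0.5, "hand", (1, 0.2, "elbow", "shoulder"))
TREE_B = (1, 0.3, (0, 0.4, "hand", "elbow"), "shoulder")
TREE_C = (0, 0.6, "hand", "shoulder")
FOREST = [TREE_A, TREE_B, TREE_C]

def classify_with_tree(tree, features):
    """Walk one tree until a leaf label is reached."""
    node = tree
    while not isinstance(node, str):
        feature_index, threshold, left, right = node
        node = left if features[feature_index] < threshold else right
    return node

def classify_with_forest(forest, features):
    """Return the label that most trees vote for."""
    votes = Counter(classify_with_tree(tree, features) for tree in forest)
    return votes.most_common(1)[0][0]

# Two made-up feature values for a single input.
print(classify_with_forest(FOREST, [0.3, 0.1]))  # hand
print(classify_with_forest(FOREST, [0.9, 0.8]))  # shoulder
```

Because each tree is just a chain of threshold comparisons, evaluating a forest is cheap, which fits the article's point that the work can run as drivers on the Xbox without much overhead.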
Continuum
There is no technical reason why a Kinect console could not be attached to any computing device that was loaded with the algorithms it needed to function. While that might be slightly difficult for the traditional BIOS/OS arrangement found in most contemporary computers, a UEFI environment would clear the way for the archetypal house of the future—run by voice commands and gestures with only its own facial recognition algorithms needed to provide security.
By the time you read this, it’s likely that Microsoft will have made some form of Kinect-related announcement at the 2011 Electronic Entertainment Expo in Los Angeles. Early speculation is that Microsoft’s purchase of Skype might herald advanced video conferencing—such as predefined avatars with full expressions instead of true video images, to keep the CPU overhead down. And somewhere in the far-out reaches of time and space, what might a Kinect for PC/Mac be able to do with an über CPU?
It’s going to be an interesting future.