Two years ago, Microsoft’s Kinect for Windows gave the PC eyes. Now Microsoft researchers are teaching it to see.
For too long, PCs sat mute, dumbly waiting for users to type on their keyboards or insert a disk. Then they became connected, reaching out to others when users commanded. At Microsoft’s Silicon Valley Techfair last week, company researchers showed how they’re taking the PC in a new direction, combining machine vision with a new independence so the PC can recognize and interpret what it sees and present that information in a useful context.

Unlike Google and other Silicon Valley companies, Microsoft has traditionally operated more like a public university than a private company, hosting research showcases like this one once or twice a year. Yes, it holds some of its research close to its vest, especially work that later emerges in products, like its Cortana digital assistant. But many more projects are released to the public, both to show off the company’s technical expertise and to demonstrate potential directions for the company.
In all, Microsoft researchers showed off about 18 projects last week. We selected five, four of which incorporate Kinect in some way. And no, we don’t think Microsoft hit a home run with each. After all, future successes are often built upon past failures.
Your webcam: the next Kinect
If you’ve been attentively reading our coverage of Microsoft’s Build, this presentation by Vivek Pradeep shouldn’t seem all that new: Microsoft Kinect for Windows executive Michael Mott exclusively revealed to PCWorld that Microsoft is actively working to use conventional webcams as depth cameras, much like its Kinect.
In the video, Pradeep and his colleague show off what they call MonoFusion. Conceptually, it’s pretty simple to explain: using an unmodified webcam, the two researchers pan the camera over a scene. Behind the scenes, the Microsoft software interprets what it sees from a depth perspective, creating 3D models of the objects in Kinect-like fashion. The software then applies a color map, or texture, to the objects, essentially transforming video of a collection of stuffed animals into models of the animals themselves.

What MonoFusion sees: the raw feed, a depth map (right) and textured 3D objects.
What Microsoft created, Pradeep noted, is a simple and powerful SDK to take the imagery and export the 3D models into a game or augmented reality application. It certainly has all sorts of possibilities.
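Microsoft hasn’t published how MonoFusion works internally, but any pipeline of this kind ultimately has to turn a per-pixel depth estimate into 3D geometry. The minimal Python sketch below shows just that back-projection step under a simple pinhole-camera assumption; the function name, the intrinsics, and the flat fake depth map are all illustrative, not anything from Microsoft’s SDK.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Convert a per-pixel depth map (meters) into an Nx3 point cloud.

    depth: 2D array of depth estimates, one per pixel.
    fx, fy, cx, cy: pinhole camera intrinsics (focal lengths, principal point).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx          # pixel column -> metric X
    y = (v - cy) * z / fy          # pixel row    -> metric Y
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no depth estimate

# Example: a fake 480x640 depth map at webcam resolution.
depth = np.full((480, 640), 1.5)   # pretend everything is 1.5 m away
cloud = backproject_depth(depth, fx=525.0, fy=525.0, cx=320.0, cy=240.0)
print(cloud.shape)                 # (307200, 3)
```

A real MonoFusion-style system would also have to estimate that depth map from camera motion in the first place, and then fuse successive point clouds into textured meshes; those steps are beyond this sketch.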
Floating displays with gesture recognition
About a year ago, Microsoft researcher Jinha Lee unveiled a spectacular 3D desktop that used the combination of polarized glass and some intelligent software to create the illusion of a desktop with depth. Now Microsoft researcher Tim Large has developed a second, physical “floating display” to provide a somewhat similar illusion.
The idea, Large said, is to take a conventional 2D display and apply a series of plastic films above it, “tuned” to a particular range of light emitted by the LCD monitor. As demonstrated in the video below, the films project a second screen that “floats” in the air above the monitor. A second researcher, Yutaka Tokuda, also showed that it’s possible to superimpose the second screen content above the main display, using Kinect to help fine-tune the illusion.
While both Large and Tokuda indicated that the idea was to bring digital creations to life via the second display, it’s difficult to see what Microsoft hopes to accomplish here. While overlaying information on top of video has become commonplace on news reports, weather forecasts, and football games, cluttering a PC screen with too much extraneous information can be problematic. It appears that both are playing with the idea of focus, calling out specific elements for a closer look.
In March 2013, Microsoft concentrated on showing how Perceptive Pixel’s massive touchscreen displays could serve as gigantic video whiteboards to empower employees. So far, the floating display appears to be little more than a headache waiting to happen.
ViiBoard: collaborative annotation
But if Microsoft’s floating display seems like a stretch, the ViiBoard feels like an extension of today’s workplace. The concept is extremely simple: By pairing a Kinect sensor with a Perceptive Pixel (PPI) display, users who approach the display as a whiteboard are recognized, and whatever they write is color-coded and stored.
The demo by Yinpeng Cheng, a senior research engineer for Microsoft (who refers to his project as VTouch), shows an impressive amount of polish. As the user approaches the display, for example, it dims. Waving a hand upward reveals a user menu, which follows the user as he or she moves to either side of the screen. And if a user shows 10 fingers in a “typing” gesture, a keyboard appears. Even pen strokes made with a finger or a stylus can be color-coded depending on which hand the user is employing, and a squiggle can be quickly erased with just a gesture.
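Microsoft hasn’t said how VTouch maps Kinect skeleton data to these behaviors, but the demo implies some form of rule-based dispatch over tracked joints. The Python sketch below is purely illustrative: the joint names, thresholds, and action strings are assumptions, not the actual VTouch logic.

```python
# Hypothetical sketch of rule-based gesture dispatch over Kinect-style skeleton
# frames. Joint names, thresholds, and actions are illustrative only.

def interpret_frame(joints, fingertip_count):
    """joints: dict of joint name -> (x, y, z) in meters, with y pointing up."""
    head_y = joints["head"][1]
    right_hand_y = joints["hand_right"][1]
    torso_z = joints["spine"][2]

    if fingertip_count >= 10:
        return "show_keyboard"      # both hands posed as if typing
    if right_hand_y > head_y:
        return "show_user_menu"     # a hand raised above the head
    if torso_z < 0.6:
        return "dim_display"        # the user has stepped up to the board
    return "idle"

frame = {"head": (0.0, 1.6, 0.5), "hand_right": (0.2, 1.8, 0.5), "spine": (0.0, 1.0, 0.5)}
print(interpret_frame(frame, fingertip_count=2))  # -> "show_user_menu"
```

In practice a system like this would also need per-user identity (to color-code each person’s strokes) and debouncing so that a passing wave doesn’t trigger the menu, but the dispatch idea is the same.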
Cheng refers to VTouch as a way to make the whiteboard (or what he calls a “touch board”) more valuable through collaboration. And that’s the direction the small crop of free, third-party office suites has been heading, in addition to duplicating the functionality of Microsoft Office. So far, that effort has been confined to document sharing and collaboration. But for businesses that regularly gather in conference rooms for meetings, or hold collaborative discussions between participants in different locations, VTouch or the ViiBoard could point to future improvements in Skype or in PPI-specific apps.
Real-time tracking of animals (and you)
While the Microsoft Research Computational Ecology and Environmental Sciences group set out to assist biologists and other animal researchers in tracking and learning about animals in the wild, the technology they developed could certainly apply to either law enforcement or the military.
Lucas Joppa, a scientist with Microsoft Research, said his research consists of three elements: Zootracer, a software algorithm for tracking objects recorded on video; Mataki, a 7-gram, GPS-enabled tracking device; and an unmanned drone designed to wirelessly communicate with, and even lock on to, whatever’s carrying the Mataki device, such as an elephant or a car.
Zootracer starts off rather dumb: it can distinguish objects recorded on video or via Kinect, but just barely. Joppa demonstrated how the user needs to “teach” the algorithm to identify an object, like a bee, as it moves between points. But after the user pauses the video and identifies the bee a few times, the software begins tracking it as it moves around the screen.
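Joppa didn’t detail Zootracer’s algorithm, but the interaction he describes (pause, click the object a few times, then let the software follow it) maps onto a familiar pattern: build an appearance model from a handful of user labels, then score candidate positions in later frames. The Python sketch below illustrates that pattern with a simple color-histogram model; every name and threshold here is hypothetical, not Zootracer’s actual method.

```python
import numpy as np

def patch_histogram(frame, cx, cy, size=15, bins=8):
    """Coarse color histogram of a small patch centered on (cx, cy)."""
    half = size // 2
    patch = frame[cy - half:cy + half + 1, cx - half:cx + half + 1]
    hist, _ = np.histogramdd(patch.reshape(-1, 3),
                             bins=(bins, bins, bins), range=[(0, 256)] * 3)
    hist = hist.ravel()
    return hist / (hist.sum() + 1e-9)

class LabelledTracker:
    """Learns an appearance model from a few user clicks, then tracks."""

    def __init__(self):
        self.templates = []

    def add_label(self, frame, cx, cy):
        # The user pauses the video and clicks the object (e.g. the bee).
        self.templates.append(patch_histogram(frame, cx, cy))

    def locate(self, frame, candidates):
        # Score each candidate position by histogram intersection with the
        # learned templates, and return the best match.
        best, best_score = None, -1.0
        for cx, cy in candidates:
            h = patch_histogram(frame, cx, cy)
            score = max(np.minimum(h, t).sum() for t in self.templates)
            if score > best_score:
                best, best_score = (cx, cy), score
        return best

# Usage sketch: label the bee in two paused frames, then track it.
# tracker = LabelledTracker()
# tracker.add_label(frame0, 120, 80)
# tracker.add_label(frame5, 140, 92)
# bee_xy = tracker.locate(frame6, candidates=[(x, y) for x in range(20, 620, 10)
#                                                     for y in range(20, 460, 10)])
```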

Microsoft’s unnamed drone.
That’s fine, as bees go. But to learn more about larger animals and their environments, Microsoft developed Mataki, a sensor package that can be attached to an animal. Using short-range wireless mesh communication, Mataki can dump its knowledge onto another sensor. And if that’s not good enough, Microsoft developed drones that can swoop in and capture the data—or, using the GPS data streaming from the sensor, “lock on” and follow a specific Mataki sensor from the air.
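Microsoft hasn’t explained how the drone “locks on,” but at a minimum it has to convert the Mataki’s streamed GPS fixes into a heading and distance to fly. The short Python sketch below does that with the standard haversine and initial-bearing formulas; the function name and the sample coordinates are illustrative only.

```python
import math

def heading_and_distance(lat1, lon1, lat2, lon2):
    """Initial compass bearing (degrees) and great-circle distance (meters)
    from the drone's position to the latest GPS fix from the tag."""
    R = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)

    # Haversine great-circle distance
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    distance = 2 * R * math.asin(math.sqrt(a))

    # Initial bearing, measured clockwise from true north
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    bearing = (math.degrees(math.atan2(y, x)) + 360) % 360

    return bearing, distance

# Drone at one waypoint, tagged animal a few hundred meters away.
print(heading_and_distance(-1.2921, 36.8219, -1.2900, 36.8250))
```

Repeating that calculation on every incoming fix, and steering toward the result, is all “following” amounts to in the simplest case; the interesting engineering is in handling stale fixes and keeping a safe standoff distance.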
The latter capability certainly should raise some eyebrows. But it wouldn’t be surprising if Joppa were quietly hired away by Amazon: shopping or courier services would love to know exactly where you are, and cutting delivery times by a few minutes could mean more efficient shopping drones—even if they’re science fiction for now.
Printable electronics
3D printing is becoming ever more ubiquitous, but doing it requires an investment of time, money, and materials in a 3D printer and substrate. Alternatively, “subtraction” printers can etch away a block of material. In some ways, we’ve been “3D printing” computer chips for decades, etching away silicon using photolithography. Likewise, “printing” computer circuits using metallic ink from an inkjet printer has become almost commonplace, to the point where you can do it at home.
Microsoft researcher Steve Hodges (beginning at 1:13) showed off an interesting combination of the two, however. Using a small number of tiny printed circuit boards with embedded chips, Hodges printed the metallic connective traces onto a piece of photo paper with a modified $100 inkjet printer. Then, using electrically conductive double-sided tape from 3M, he simply stuck the micro-PCBs to the paper. Voila: a quick-and-dirty homebrewed product, such as a motion sensor.

Researchers used electrically conductive double-sided tape to connect chips to photo paper.
Some design shops already use 3D printers today to instantiate models of all sorts of things. Adding cheap and easy logic boards isn’t especially practical for long-term development, but it seems at least as useful as a 3D printer for quick prototyping.
We’re just on the cusp of understanding what it means for a PC to see, hear, and understand. So far, we as a culture have been very suspicious of intrusions into our personal lives—even as we’ve marveled at how creations like digital assistants can make those lives better. And it may be that while we try to limit the ability of wearables like Google Glass to spy on elements of our social lives, we will allow the PC further access into our homes and offices. What PCs will do when they open their eyes and look around, however, is still in the lab—for now.
Updated April 28 to include a video report from IDG News Service.