After building tools that guess your age, classify dog breeds, and find celebrity look-alikes, Microsoft researchers have launched a new tool for identifying the contents of photos.
With CaptionBot, users can upload any photo, and Microsoft will use various recognition services to describe what’s happening. This includes identifying celebrities, recognizing emotions, and describing basic objects that appear in the scene.
We’ve seen this type of party trick before. Last year, Wolfram Alpha released a similar tool, which remains available at ImageIdentify.com. But while Wolfram’s tool seems a bit better at identifying specific flora or fauna, CaptionBot is more descriptive about the scene itself. (Given one photo of a dog, for example, Wolfram identifies it as a golden retriever, while CaptionBot describes it as “a dog standing on top of a grass covered field.”)
Given Microsoft’s recent snafu with the Tay chatbot, which users goaded into spewing racism and misogyny, it wasn’t long before people started testing CaptionBot against potentially offensive images. But Microsoft isn’t biting this time. The bot refuses to work with porn, saying it “may be inappropriate content so I won’t show it,” and, according to Business Insider, it will not identify photos of Adolf Hitler, replying instead, “I’m not feeling the best right now.” (The bot did, however, identify a picture of Joseph Stalin as “a man wearing a hat” who looks happy.)
It’s unclear how much of that behavior is a deliberate response to the Tay debacle, but at least some of the blocking is by design. To create CaptionBot, Microsoft relied on its Bing Image Search API, Emotion API, and Computer Vision API, the latter of which is able to recognize famous people and block unwanted or adult content.
Why this matters: Although CaptionBot itself is just a time-waster, the underlying technology is a big part of Microsoft’s recent strategy, which involves providing ready-made AI and machine learning tools to businesses and developers. CaptionBot illustrates the potential impact image recognition tools can have on modern computing, but also how far they have to go before they can fully stand in for a human eye.