Google has open-sourced its Show and Tell model for automatically captioning images. This is an excellent example of how neural networks work: train the model on examples – in this case, human-captioned images – and then let it loose on new images. From the Venture Beat article:
Google trains Show and Tell by letting it look at images and the captions that people wrote for those images. Sometimes, if the model thinks it sees something going on in a new image that’s exactly like a previous image it has seen, it falls back on the caption for that previous image. But at other times, Show and Tell is able to come up with original captions. “Moreover,” Shallue wrote, “it learns how to express that knowledge in natural-sounding English phrases despite receiving no additional language training other than reading the human captions.”
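The fallback behavior described above can be illustrated with a toy sketch: compare a new image's feature vector against the training set, and if a stored image is similar enough, reuse its human-written caption; otherwise, hand off to the caption generator. This is a simplified illustration, not Google's actual implementation – the feature vectors, threshold, and `generate_caption` stub here are all hypothetical.

```python
import math

# Hypothetical "training set": image feature vectors with human-written captions.
TRAINING_EXAMPLES = [
    ([0.9, 0.1, 0.0], "a dog playing fetch in a park"),
    ([0.0, 0.2, 0.95], "a plate of pasta on a wooden table"),
]

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def caption_image(features, threshold=0.98):
    """Reuse a stored caption if a training image is nearly identical;
    otherwise fall through to generating a new caption."""
    best_sim, best_caption = max(
        (cosine_similarity(features, f), c) for f, c in TRAINING_EXAMPLES
    )
    if best_sim >= threshold:
        return best_caption          # fallback: reuse the human caption
    return generate_caption(features)  # original caption path (stubbed)

def generate_caption(features):
    # Stand-in for the neural decoder that composes a new sentence.
    return "<newly generated caption>"
```

A query almost identical to a training image returns the stored human caption, while anything sufficiently different goes through the generation path – mirroring the two behaviors the article describes.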
As the article points out, there are many more players looking to do the same thing. Imagine how much easier life in editorial would be if all the B-roll came in organized like this.