Advances in Content Recognition
At the current stage of technology development, we are largely limited to adding Content Metadata manually. If we want the people, the scene, or the action described, we need to add Keywords or Notes to achieve that. I don’t expect that to be the case in the future. Technology from Clarifai and Google gives us clues to that future.
My friend Cirina Catania introduced me to Clarifai a few weeks back. Clarifai takes still images and provides keywords for each uploaded image. (There is an API for developers to use.) The results are both remarkable and ultimately useless.
Upload an image and you will quickly discover that Clarifai is doing a clever comparison of visually similar images, and applying known keywords to the new image. That is remarkably accurate, and a real indicator of what can be done.
The problem is that Clarifai returns too many keywords. One thing I have discovered over the years is that having too many keywords is as bad as having too few.
If there are too many keywords, the content is spread thin: you end up with long lists of keywords with few clip selects in each, or every clip appears in many collections, making the content you need hard to find.
If you have too few keywords, then content is not identified, and you spend large amounts of time scrolling through clips trying to find that quote “you know you heard once”!
We considered integrating Clarifai into Lumberjack, but the volume of keywords generated made it unwieldy.
Google recently published a blog post, “A picture is worth a thousand words,” which shows real promise in the content description area, rather than keywording. The natural language descriptions would be great as a way to provide useful clip names, or could even be placed in a searchable notes field.
Google and Clarifai are attempting two different things, both of which show promise. Clarifai seeks to provide descriptive keywords. What exactly are keywords? From an article I’m in the process of writing:
A keyword is one or two words that summarize the key idea in a range of media.
Key word(s): key idea. The details will vary from project to project, but keywords are an organizational structure to bring together material with the same key idea.
Keywords are for organizational purposes. Descriptions are for describing. Both are useful, but I am more encouraged by Google’s research than I am by Clarifai’s, at least at the current state of development. Clarifai is the closest to being usable because it already has an API, but without some way of prioritizing or filtering the keywords, the overabundance of results will hinder, rather than help, organization.
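To make “prioritizing or filtering” a little more concrete, here is a minimal sketch in Python of one way it could work. It assumes each keyword arrives with a confidence score between 0 and 1, which is my assumption for illustration and not a description of what Clarifai actually returns. The `Tag` type, the `filter_keywords` function, the thresholds, and the sample data are all hypothetical.

```python
from typing import NamedTuple


class Tag(NamedTuple):
    """A keyword plus a confidence score (hypothetical structure)."""
    keyword: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def filter_keywords(tags, min_confidence=0.9, max_keywords=5):
    """Keep only confident keywords, best first, capped at max_keywords."""
    # Drop anything below the confidence threshold.
    confident = [t for t in tags if t.confidence >= min_confidence]
    # Highest confidence first, so the cap keeps the strongest keywords.
    confident.sort(key=lambda t: t.confidence, reverse=True)
    return [t.keyword for t in confident[:max_keywords]]


# Made-up tags for a single image, purely for illustration.
sample_tags = [
    Tag("beach", 0.98),
    Tag("ocean", 0.97),
    Tag("sky", 0.95),
    Tag("travel", 0.91),
    Tag("nobody", 0.85),
    Tag("summer", 0.80),
    Tag("sunset", 0.99),
]

print(filter_keywords(sample_tags, min_confidence=0.9, max_keywords=3))
# prints ['sunset', 'beach', 'ocean']
```

Even a crude cap like this turns seven keywords into three. Whether the three that survive are the ones that capture the key idea of the clip is a separate question, and that is the hard part.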
When we come to the individual clip level, a good description of the content is what a lot of logging is about. At this point Google’s work is a research project with no available API, but I sure would like to have this technology to label my clips for me. It is also very deep and very, very clever. When you read the article, note the combination of complex technologies that go into a simple task.