Over recent years, I’ve read a lot on Apple* but only during the flight back did I start reading anything on Google: In the Plex by Steven Levy. While I’m not yet finished, it struck me that the fundamental difference between Google and Apple is “who’s in control”.
With Google, engineers rule. Data rules. Everyone else is in the service of the engineers.
At Apple, designers rule. (Design in the full sense of how something operates and feels, not just how it looks).
And right there is the difference between the two companies. All else leads from that fundamental focus.
*Becoming Steve Jobs by Brent Schlender & Rick Tetzeli
Design Crazy by Max Chafkin
Insanely Simple by Ken Segall
Inside Apple by Adam Lashinsky
Steve Jobs by Walter Isaacson
It seems the smartest way to make money right now is to have a startup in speech recognition, machine learning, neural networks or another Artificial Intelligence-related field.
TechCrunch reported late last week that Apple had acquired another machine learning company:
Presumably to beef up its efforts in AI and machine learning across the company.
Not to be left behind, Google:
…said that it’s acquired API.ai, a startup with tools for speech recognition and natural language understanding….
In addition to its developer tools, Api.ai offers a conversational assistant app with more than 20 million users.
I would expect the purchase is to beef up speech recognition in its AI assistant, Google Now.
Google have open sourced its Show and Tell model for automatically captioning images. This is an excellent example of how neural networks work: train the model with examples – in this case human-captioned images – and then let it loose on new images. From the Venture Beat article:
Google trains Show and Tell by letting it take a look at images and captions that people wrote for those images. Sometimes, if the model thinks it sees something going on in a new image that’s exactly like a previous image it has seen, it falls back on the caption for that previous image. But at other times, Show and Tell is able to come up with original captions. “Moreover,” Shallue wrote, “it learns how to express that knowledge in natural-sounding English phrases despite receiving no additional language training other than reading the human captions.”
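That fallback behaviour – reuse a known caption when a new image looks near-identical to one already seen, otherwise compose something original – can be sketched in a few lines. This is purely an illustration of the idea, not Google’s implementation: the feature vectors, the similarity threshold and the training examples below are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def caption_image(new_features, training_set, threshold=0.95):
    """Reuse a training caption when a new image is near-identical to
    one already seen; otherwise signal that an original caption must
    be generated (which the real model does with a neural network)."""
    best_caption, best_score = None, 0.0
    for features, caption in training_set:
        score = cosine_similarity(features, new_features)
        if score > best_score:
            best_caption, best_score = caption, score
    if best_score >= threshold:
        return best_caption          # fall back on the known caption
    return "<generate new caption>"  # model composes an original one

# Toy "training" images, each reduced to a hypothetical feature vector
training = [
    ([0.9, 0.1, 0.0], "a dog playing in the park"),
    ([0.1, 0.8, 0.3], "a plate of food on a table"),
]

print(caption_image([0.9, 0.11, 0.01], training))  # near-identical image
print(caption_image([0.2, 0.2, 0.9], training))    # genuinely new image
```

The first call matches a training image closely enough to reuse its caption; the second doesn’t, so the model would have to compose one.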
As the article points out, there are many more players looking to do the same thing. Imagine how much easier life would be in editorial if all the B-roll came in organized like this.
In this latest episode of The Terence and Philip Show, Terry and I discuss metadata, my citizenship, smart APIs, Artificial Intelligence and more.
Karl Soule has been with Adobe for over 10 years, focused on the Pro Video and Broadcast market. Karl traveled worldwide as an Adobe Video Evangelist, inspiring video professionals on five continents. For the last 5 years, Karl has been living in Singapore and working in Asia, helping to grow the broadcast business and drive awareness of the video tools for that growing market. Currently based in Los Angeles, Karl is supporting the Hollywood and Broadcast markets for the US West Coast.
Terry Curren pointed me to this example where IBM Watson (one of the Smart APIs I referred to a couple of weeks back) was tasked with determining whether or not an Artificial Intelligence could “cut” a movie trailer. This is the result, with a very interesting insight into how they did it at the end.
IBM Watson pulled the selects based on action and emotion, but an editor created the trailer from the selects. Still, being able to locate the highlights and determine emotion is a big step forward.
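The division of labour described above – machine ranks the candidate moments, human cuts the trailer – amounts to scoring clips and keeping the top few. A minimal sketch of that selection step, with entirely made-up clip names and action/emotion scores standing in for whatever Watson’s analysis actually returned:

```python
# Each candidate clip carries hypothetical "action" and "emotion"
# scores in [0, 1], as a visual-analysis API might report them.
clips = [
    {"name": "hallway_chase", "action": 0.90, "emotion": 0.40},
    {"name": "quiet_goodbye", "action": 0.10, "emotion": 0.95},
    {"name": "establishing", "action": 0.05, "emotion": 0.10},
    {"name": "confrontation", "action": 0.70, "emotion": 0.80},
]

def pull_selects(clips, count=3):
    """Rank clips by combined action + emotion score and return the
    top candidates -- the machine's 'selects'. Assembling them into
    a trailer remains the human editor's job."""
    ranked = sorted(clips, key=lambda c: c["action"] + c["emotion"],
                    reverse=True)
    return [c["name"] for c in ranked[:count]]

print(pull_selects(clips))
```

Nothing here edits anything; it just narrows the haystack, which is exactly the step the Watson experiment automated.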
The extensive article by Steven Levy – The iBrain is Here – is a fascinating read on how Apple are using Machine Learning, neural networks and Artificial Intelligence across product lines. It’s well worth the time to read through, but this quote from Phil Schiller stood out:
“We use these techniques to do the things we have always wanted to do, better than we’ve been able to do,” says Schiller. “And on new things we haven’t been able to do. It’s a technique that will ultimately be a very Apple way of doing things as it evolves inside Apple and in the ways we make products.”
How could all this align with editing? Speech-to-text; keyword extraction (just like Magic Keywords in Lumberjack System); sentiment extraction; image recognition; face detection and recognition; speech-controlled editing (if anyone really wants that), and the list goes on.
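To make one item on that list concrete, here’s the crudest possible keyword extraction: count the non-trivial words in a transcript and keep the most frequent. Real systems (Magic Keywords included) are far more sophisticated; the stop list, transcript and function name below are all mine, purely for illustration.

```python
from collections import Counter

# A tiny stop list; a real system would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "we", "it",
              "is", "was", "on", "that", "this", "i", "for", "then"}

def extract_keywords(transcript, count=5):
    """Pick the most frequent non-trivial words from a transcript
    as candidate keywords."""
    words = [w.strip(".,!?\"'").lower() for w in transcript.split()]
    words = [w for w in words if w and w not in STOP_WORDS]
    return [word for word, _ in Counter(words).most_common(count)]

transcript = ("We shot the interview in the harbour, then the harbour "
              "b-roll, and graded the interview footage overnight.")
print(extract_keywords(transcript, 3))
```

Even this toy version surfaces “interview” and “harbour” as the things the clip is about, which is the metadata an editor actually wants attached to footage.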
I’d like to believe the Pro Apps Team are working on this.
Buried in an article called The iBrain is Here about Apple’s use of Artificial Intelligence across a wide range of products and purposes was this gem:
Machine learning…. It even knows what good filmmaking is, enabling Apple to quickly compile your snapshots and videos into a mini-movie at a touch of a button.
At one level this is certainly true, and likely. After all, Greg and I spent a summer analyzing how I made documentary-style edits. It was a fascinating experience for me, analyzing why “that” was the right place to start b-roll over an interview.
I then had to turn that analysis into rules of thumb that Greg could program. This was the basis of the (now gone) First Cuts app. That work will resurface at some time. It’s too valuable not to.
It’s a competition piece, so if you’d all like to go to http://indi.com/7fqks and vote for Marlon Braccia, we’d appreciate it.
Edited in FCP X, I used significant amounts of speed changes, chroma key, crop and blur on the background. Those in LA can see it in person, and learn in detail how it was done, at the August 24 meeting of LACPUG.
Bloomberg reported yesterday about a Hedge Fund ‘Robot’ that “outsmarted” its human “master”. The quotation marks are all mine because it’s self-learning, so it doesn’t really have a master, but rather someone who created it.
Still, the performance in the quoted instance is quite impressive. It’s currently in charge of about $35 million in investment.