CAT | Artificial Intelligence
While projecting the changes that Artificial Intelligence (AI) and Machine Learning (ML) might bring about in the future, it was interesting to look back and see just what didn’t exist 10 years ago. Keep in mind that the Internet itself is only just over 30 years old.
It’s relatively easy to get an overview of the current state of Artificial Intelligence (AI). It’s probably easier still to understand the benefits of Machine Learning (ML), particularly where it’s already applied to common tasks that we can benefit from now, because we’re fitting those new technologies within existing frameworks.
What is much harder to determine is how machine learning will be directly applied to post production processes, and what role AI will take in our collective production future.
In September 2010 Apple purchased Swedish facial recognition company Polar Rose, and today we learn they’ve purchased Israeli startup RealFace: “a cybersecurity and machine learning firm specializing in facial recognition technology”.
What is different between the two purchases is that this latest is based on machine learning.
…the startup had developed a unique facial recognition technology that integrates artificial intelligence and “brings back human perception to digital processes”. RealFace’s software is said to use proprietary IP in the field of “frictionless face recognition” that allows for rapid learning from facial features.
Another step towards our software identifying and labelling people in our media.
In the Overview I pointed out that most of what is being written up as Artificial Intelligence (AI) is really the work of Learning Machines/Machine Learning. We learnt that Learning Machines can improve your tax deduction, do the work of a paralegal, predict court results, analyze medical scans, and much more. It seems that every day I read of yet another application.
There are Learning Machines readily available to all comers, but there are ways to benefit from them without even using one.
Over the last couple of years I’ve become more and more interested in the ways that the research being done into Artificial Intelligence (AI) might be applied to production and post production. In this article I’ll be giving an overview of what AI is at this stage of development, and what technologies are being used. Later articles will cover immediate and future applications and implications.
2016 was a year of consolidation and growth for Greg and me: citizenship, green card, artificial intelligence and a house and yard dominated the year. 2017 looks like being another interesting and exciting year.
One of the smart algorithms that developers can call on is Sentiment Analysis (by that or another name). Sentiment Analysis simply reads the sentiment – positive, neutral or negative – from a body of messages. It can also provide the same information on single ‘documents’, which could be transcripts.
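To make the idea concrete, here is a minimal sketch of sentiment classification in Python. It is not MonkeyLearn's API or anyone's production method: the word lists and the `sentiment` function are hypothetical, and a real service would use a trained model rather than a hand-written lexicon. It only shows the shape of the result, one of positive, neutral or negative, for a single 'document' such as a line of transcript.

```python
import re

# Hypothetical, deliberately tiny word lists, for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}


def sentiment(text):
    """Label a piece of text as 'positive', 'neutral' or 'negative'."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    # Score is the count of positive words minus the count of negative words.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for line in ["I love this great camera",
                 "The audio was terrible",
                 "We shot on Tuesday"]:
        print(line, "->", sentiment(line))
```

Run over a transcript line by line, something like this would give each line a label that could, in principle, be attached to a clip as metadata.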
MonkeyLearn – one of the providers of these smart algorithms – has an example of sentiment analysis from the current electoral cycle.
My question is: does this sort of metadata about the content of media provide any benefit for post production processes, such as sorting or organizing footage? Or is this something you’d ever want to search for?
I, along with many other people, have been beta testing SpeedScriber, an unreleased app that combines the power of an API for speech to text with a well thought out interface for correcting the machine transcription. Feed the SpeedScriber output to Lumberyard (part of Lumberjack System), extract Magic Keywords, and in a very short period of time (dependent largely on FCP X’s import speed for the XML) you have an organized, keyworded Event with a fully searchable Transcript in the Notes field.
Microsoft claim a milestone with their Cortana speech to text transcription service, hitting an accuracy rate of 94.1%, or a failure rate of 5.9%, which is reportedly the same accuracy as you’re paying $1 or $2 a minute for right now.
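Failure rates like this are usually reported as a word error rate: the number of word substitutions, insertions and deletions needed to turn the machine transcript into a trusted reference transcript, divided by the number of words in the reference. The sketch below is a generic way to compute that figure in Python, assuming simple whitespace-separated words. The function name is my own and this is not Microsoft's evaluation code.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the quick brown fox", "the quick green fox")
    print(f"error rate {wer:.1%}, accuracy {1 - wer:.1%}")
```

Accuracy is simply one minus the error rate, so the two figures for any one test should always add up to 100%.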
No human transcriber is completely accurate. There are generally some words that are unclear, or technical terms not known to the human transcriber that need correcting in a transcript.
I’ve also been one of the beta testers on SpeedScriber, which is built around an automatic engine, and have been very impressed with the accuracy, particularly with American accents. Accuracy dropped a bit when it had to deal with my still-mostly-Australian accent.
One of the powerful ways Artificial Intelligence ‘learns’ is by using neural networks. Neural Networks are trained with a large number of examples where the result is known. The Neural Network adjusts until it gives the same result as the human ‘teacher’.
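The "adjusts until it gives the same result" idea can be sketched with the simplest possible case: a single artificial neuron (a perceptron) learning the logical AND of two inputs. This is an illustrative assumption on my part, not how any particular product is built. Real neural networks have many layers and use more sophisticated training, and the function names and the learning rate of 1.0 here are just choices made for the example.

```python
def predict(weights, bias, inputs):
    """Output 1 if the weighted sum of the inputs plus the bias is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(examples, epochs=20, learning_rate=1.0):
    """Nudge the weights whenever the output disagrees with the known result."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            # error is 0 when the neuron agrees with the 'teacher',
            # otherwise +1 or -1 depending on the direction of the mistake.
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Training examples where the result is known: logical AND.
AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    weights, bias = train_perceptron(AND_EXAMPLES)
    for inputs, target in AND_EXAMPLES:
        print(inputs, "->", predict(weights, bias, inputs), "expected", target)
```

The neuron knows nothing beyond the examples it is given, so whatever patterns are in those examples are exactly what it learns.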
However, there’s a trap. If that source material contains biases – such as modeling Police ‘stop and frisk’ – then whatever biases are in the learning material will be carried into the subsequent AI modeling. This is the subject of an article in Nature, There is a blind spot in AI research, and also the premise of Cathy O’Neil’s book Weapons of Math Destruction, which raises not only that issue but also the problem of “proxies”.
Proxies, in this context, are data sources that are used in AI programs that are not the actual data, but rather something that approximates the data: like using zip code as a proxy for income or ethnicity.
Based on O’Neil’s book, I’d say the authors of the Nature article are too late. There are already institutionalized biases in very commonly used algorithms in finance, housing, policing and criminal policy.