A few years ago, we considered supporting transcripts in Lumberjack System. At the time our goal was to quickly prepare for an edit, and transcriptions took days and cost serious money.
Two years ago we supported the alignment of time-stamped transcripts to Final Cut Pro X Clips and, a year ago, introduced “magic” keywords, derived by a cognitive service. Since Lumberjack doesn’t (yet, I might emphasize) support a speech-to-text service internally, what are the options, and what do they tell us about the state of play for transcription in April 2017?
Until recently, the only way to create a transcription was for a trained human to listen to the audio and type the transcript. These services tended to have multi-day turnaround and be expensive. There are hundreds of smaller shops, but Take1 and Rev are prominent examples.
Take1 service the Hollywood Media and Entertainment industry, as well as productions in their home of the UK, with the advantage of being in a different time zone. Take1 customize their output to the needs of the client, and will prepare Lumberjack-ready transcripts upon request. Time stamps are part of their standard service. Because of the high level of human attention, Take1 are at the higher end of the price spectrum at $2 a minute (basic service).
Rev (like most services now) use a computer transcription as the base and a human editor to correct it. Time stamps are not standard, and they do not offer Lumberjack-ready transcripts, but their pricing makes them attractive. Although turnaround is officially 24 hours, experience lately has been closer to a 3-4 hour turnaround for $1 a minute. If you need time-stamped transcriptions, it will require an extra pass by you or someone on your team.
These services make no secret of their roots in a machine-driven Cognitive Service. Some make it known which underlying service is being used; others prefer not to say, but there are many, as I have recently written. While the underlying services are suitable for programmers, hooking into them directly isn’t for beginners, which is why several companies have wrapped these speech-to-text APIs in a user-friendly interface.
I suppose I should mention that the developers of all these apps, and some of the people involved with the services previously mentioned, are friends of mine.
The first that I was aware of is SpeedScriber, now in beta for nearly a year (but one relatively easy to join via the website), where the Cognitive Service is hidden within an excellent editing and management interface. The developer appears focused on creating the most accurate transcriptions with minimal effort, and getting them aligned with clips in FCP X. SpeedScriber requires minimal correction – usually with speaker identification – and is priced at 50c per minute.
Discussed over various beverages during the FCP X Creative Summit 2016, and now publicly announced, are two more upcoming releases, from CoreMelt and Digital Anarchy.
Scribomatic focuses more on making media searchable (functionally similar to PhraseFind and its ilk) with basic storyboarding tools, and is well integrated with FCP X.
Transcriptive is another new tool from Digital Anarchy. It’s the only one that offers a choice of either IBM Watson as the transcription engine, or Cambridge UK based Speechmatics. Speechmatics offers a lot more languages than IBM’s Watson does, so it’s a smart option with a robust European market. Again the focus is on search – as well as subtitles – in an Adobe Premiere Pro CC Panel, integrated in the app.
Shortly after NAB finished, Wired published an article on speech-to-text, highlighting the Trint service. Trint would also seem to be using Speechmatics, based on language selection (and proximity to Cambridge). At 25c per minute, it’s the least expensive option with time stamps. While the editing interface isn’t as comprehensive as SpeedScriber’s, it’s fairly easy to use and functional. Not bad for a browser-based app. Trint is also the least expensive way to get Lumberjack-ready transcripts.
You could go cheaper than that at the wholesale rates for the Cognitive Services, but you’ll be writing a lot of code. The inevitable conclusion, though, is that transcription – and the other Cognitive Services – will become commodities.
But we have to rethink our workflows to take advantage of commodity transcription. Simply commodifying transcripts isn’t enough.
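To put the per-minute prices quoted above side by side, here is a minimal Python sketch of the arithmetic. The rates are the ones mentioned in this post; the 10-hour shoot, the `RATES_PER_MINUTE` table and the `transcription_costs` function are hypothetical, made up purely for illustration, and real invoices may differ (minimum charges, rush fees, add-ons such as time stamps).

```python
# Per-minute rates as quoted in this post (US dollars, April 2017).
RATES_PER_MINUTE = {
    "Take1 (basic service)": 2.00,
    "Rev": 1.00,
    "SpeedScriber": 0.50,
    "Trint": 0.25,
}


def transcription_costs(minutes, rates=RATES_PER_MINUTE):
    """Return a {service: cost in dollars} mapping for the given running time."""
    return {name: round(rate * minutes, 2) for name, rate in rates.items()}


if __name__ == "__main__":
    # Hypothetical example: 10 hours of interview footage.
    footage_minutes = 10 * 60
    costs = transcription_costs(footage_minutes)
    # List the cheapest option first.
    for name, cost in sorted(costs.items(), key=lambda item: item[1]):
        print(f"{name:<24} ${cost:>9,.2f}")
```

For that hypothetical 600 minutes of footage, the spread runs from $150 at 25c a minute to $1,200 at $2 a minute, which is the gap that makes transcribing everything, rather than selected interviews, start to look practical.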