Useful Speech-to-Text is hard!

I was saddened, but not really surprised, by this week’s announcement that Adobe were pulling Speech-to-Text transcription from Premiere Pro, Prelude and AME. As Al Mooney says in the blog post:

Today, after many years of work, we believe, and users confirm, that the Speech to Text implementation does not provide the experience expected by Premiere Pro users.

I am saddened to see this feature go. Even though the actual speech-to-text engine was somewhat hit or miss, there was real benefit in the ability to import a transcript (or script) and lock the media to the script. So it’s probably worth keeping the current version of Premiere (or one of the other apps) to retain the syncing function, as the apps will continue to support the metadata if it’s in the file.

Coincidentally, we recently had a feature request for a transcription-based workflow in Final Cut Pro X. When asked how he’d like it to work, the requester (unintentionally) described the workflow in Premiere Pro!

In fact, I’d almost implore Adobe to keep the ability to import a transcript and align it to the media using a speech analysis engine. That way the industry would have an alternative to Avid’s ScriptSync auto-alignment tools (previously powered by Nexidia), which are currently unavailable in Media Composer. The ability to search hundreds of media files by word-based content, via their transcripts, is extremely powerful for documentary filmmakers.

And yes, there is the Nexidia-powered Boris Soundbite, but there is one problem with this waveform-matching approach: there is no content metadata, nor anything (like text) we can use to derive content metadata.

