The present and future of post production business and technology | Philip Hodgetts

Workflow Extensions are definitely my favorite new feature from FCP X 10.4.4. I had long been jealous of Panels in Premiere Pro CC, even with their limitations. Being able to put an interface to someone’s (our) app within the NLE seemed like a nice feature. I think Apple’s Workflow Extensions are superior because they run native code (not through a JavaScript/HTML interface).

Of course, we immediately get questions about when we’re going to put all our apps in Workflow Extensions. It happens with every new Apple technology release. “When will you do an iOS version?” “When are you going to create a Watch app?”

Since these are Workflow Extensions we need to think about workflow. What makes sense to be in the host app, and what does not? What makes sense are workflow apps that you “touch” and then get straight back to FCP X. Workflow apps where you go away from FCP X, perform some combination of activities and then go back to FCP X, do not make sense.

Thus, asset management, review and approval, and training apps make sense. You want to view the reviewer comments in FCP X, in the native timeline or clip being commented on. You want to search for a clip and bring it to FCP X.

As we proved in 1999 with the first of our training products, The DV Companion for FCP, having the instructional video floating over the app makes a lot of sense. (Technically Workflow Extensions don’t float, but they are there in the app, so it’s much the same.)

A limitation on Workflow Extensions is that they must have a single interface window, so document-based apps aren’t suitable.

So, when it comes to the Intelligent Assistance Software and Lumberjack System apps, it makes sense for some to become Workflow Extensions, and not others, based on workflows. Apps like Producer’s Best Friend – where you generate reports and then get back to FCP X – or Sync-N-Link X – where you have clips in FCP X that you want synchronized and immediately sent back to FCP X – make a lot of sense.

Conversely, Change List X makes less sense as a Workflow Extension because the output is not used in FCP X at all. Similarly, the two translation apps don’t make much sense as Workflow Extensions.

For Lumberjack System, Lumberyard makes sense as a Workflow Extension because – again – it uses Event Clips as the input and the result is updated in FCP X. noteLogger and backLogger make no sense as Workflow Extensions because they are used before FCP X. They are, as is real time logging in the iOS app, “pre editing” tools to be used before the NLE.

Similarly, Lumberjack Builder not only makes no sense as a Workflow Extension, it isn’t even possible. Builder takes an input from FCP X (Event Clips) but the work continues in Builder. You can update an Event with Keywords logged in Builder (because it’s more efficient) but Builder is really designed as a companion NLE to FCP X, again to be used before finishing work is done in FCP X.

Transcription Workflow Extensions only make sense if you haven’t really thought through the workflow. While it wasn’t automated transcription, Lumberjack was the first to bring transcripts into FCP X, back in early 2015 for the OJ Simpson documentaries. We discovered that even a perfect transcript in FCP X is still a terrible workflow. Searching is difficult, and there’s no way to build a story based on text, the way transcripts are actually used.

Getting transcripts into FCP X is solving the wrong problem. The real problem is working with transcripts in a way that makes sense. That’s why Greg and I spent a lot of time thinking about the workflow, and realized there was no way that transcript workflows could be grafted onto FCP X – and it was, and is, our belief that this is not a high priority for the Pro Apps team. So we built an entirely new kind of NLE for text driven editing.

Because Builder is a document-based app – meaning you can have multiple documents open at once, with each document carrying multiple stories – it can’t be shoehorned into a Workflow Extension. More importantly, it would be the wrong thing to do: FCP X is not the place to be editing with text.

So, when thinking about Workflow Extensions, consider what workflow problem they solve. Where the functionality is used IN an FCP X workflow, it probably belongs in a Workflow Extension. Where it independently enhances FCP X workflows, either before or after the primary FCP X work, it isn’t appropriate for a Workflow Extension.

In a rather interesting article on creative collaboration, Here Comes the Automation: How AI is Poised to Change Filmmaking, we get this quote:

“When a distinguished but elderly industry executive says that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.” — Clarke’s Law No. 1, slightly modified

It led me to think of how many of the creative tools we use every day were simply impossible a few years back. You don’t have to go back too far to be in a pre-Internet era. Non-linear video editing is less than 30 years old. A million dollar DaVinci Resolve suite is now a free download from that Internet!

We carry HD and 4K capable cameras on portable computers good enough to edit that footage with. (Speaking of which, check out LumaTouch for a look at what can be done on those iPhones and iPads carrying the camera.) Creative storytelling is more accessible than ever.

Our creative tools are in a constant state of evolution – a.k.a. change – and we’ve only just started realizing how “artificial intelligence” (i.e. machine learning based) tools are going to work their way into creative tools and workflows. This will likely fundamentally change the way we interact with creative tools, much the way non-linear editing of video on computers did 25 years ago.

Being open to change is essential, otherwise we risk being that “elderly industry executive” saying something is impossible that others are doing every day!

I’ve certainly learnt to stop saying “that’s impossible” because it’s rarely true for very long.




Thanks to an introduction by a mutual friend, I had the opportunity to chat with Alex LoVerde of SyncOnSet, and it struck me that the best technology is one driven by a direct, and often personal, need. It also struck me how different two ostensibly similar “metadata companies” can be.





I don’t always cross post my appearances on Larry Jordan’s Digital Production BuZZ, but I thought I did a particularly good job explaining the basics of AI and Machine Learning, and how they might apply in production, so I’m sharing this one.

Philip Hodgetts: The Basics of AI – Explained




Where we’ll be at IBC 2018 (September 5, 2018)

Greg and I will be at IBC 2018 and we’re looking forward to seeing you there.

If you’d like to pick our brains for up to an hour, then schedule a meeting with us. We’ll run through your workflow and offer suggestions on where there might be efficiencies, or we’re happy to demonstrate the innovative Lumberjack Builder. If you’re a Lumberjack customer, we’d love to hear how you’ve been using it and how it could be better for you. We’ll even buy you a beer!

Other than those meetings we’ll be mostly hanging around the Atomos ProRes RAW Theater as that seems to be the center of FCP X action this year.

As there’s no Supermeet this year, those of us who’d normally see each other there are celebrating the Not Very Supermeet instead, so come join us.




Web APIs (Application Programming Interfaces) allow us to send data to a remote service and get a result back. Machine learning tools and Cognitive Services like speech-to-text and image recognition are mostly online APIs. Trained models can be integrated into apps, but in general these services operate through an API.

The big advantage is that they keep getting better, without the local developer getting involved.
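To make that concrete, here’s a minimal sketch of what calling such a service typically looks like. The endpoint, field names and job states are hypothetical stand-ins (every vendor’s API differs), but the submit-then-poll shape is common to most of them:

```python
import time
import requests

# Hypothetical endpoint and credentials -- stand-ins, not a real vendor API.
API_BASE = "https://api.example-transcription.com/v1"
API_KEY = "your-api-key"
AUTH = {"Authorization": f"Bearer {API_KEY}"}

def transcribe(audio_path: str) -> str:
    # 1. Upload the audio and start a transcription job on the service.
    with open(audio_path, "rb") as f:
        resp = requests.post(f"{API_BASE}/jobs", headers=AUTH,
                             files={"audio": f})
    resp.raise_for_status()
    job_id = resp.json()["id"]

    # 2. Poll until the remote service finishes; all the heavy lifting
    #    happens on its "heavier iron", not on the local machine.
    while True:
        job = requests.get(f"{API_BASE}/jobs/{job_id}", headers=AUTH).json()
        if job["state"] == "done":
            return job["transcript"]
        time.sleep(5)

print(transcribe("interview.m4a"))
```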

Nearly two years ago I wrote of my experience with SpeedScriber*, which was the first of the machine learning based transcription apps on the market. At the time I was impressed that I could get the results of a 16 minute interview back in less than 16 minutes, including prep and upload time. Usually the overall time was around the run time of the file.

Upload time is the downside of web based APIs and is significantly holding back image recognition on video. That is why high quality proxy files are created for audio to be transcribed, which reduces upload time.

My most recent example, sourced from a 36 minute WAV, took around one minute to convert to an archival quality m4a, which reduced the file size from 419 MB to 71 MB. The five times faster upload – now 2’15” – compared with more than 12 minutes to upload the original, more than compensates for the small prep time for the m4a.
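The post doesn’t say which tool did the conversion; as one illustration, ffmpeg (driven here from Python) can make an m4a proxy like that in a single pass. The 256 kbps AAC bitrate is my assumption, chosen because it matches the roughly 6x size reduction quoted above:

```python
import subprocess

# Convert the PCM WAV to an AAC m4a proxy for faster upload.
# 256 kbps is an assumed "archival quality" bitrate; a 48 kHz 16-bit
# stereo WAV (~1536 kbps) shrinks by roughly 6x at that setting.
subprocess.run(
    ["ffmpeg", "-i", "interview.wav", "-c:a", "aac", "-b:a", "256k",
     "interview.m4a"],
    check=True,
)
```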

The result was emailed back to me 2’30” later. That’s 36 minutes of speech transcribed, with about 98% accuracy, in 2.5 minutes – more than 14x real time. The entire time from instigating the upload to having the finished transcript back was 5’45” for 36 minutes of interview.
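As a quick sanity check, the arithmetic behind those figures, using only the numbers quoted above:

```python
runtime = 36.0              # minutes of interview audio
wav_mb, m4a_mb = 419, 71    # file sizes before and after conversion
prep = 1.0                  # ~1 minute to make the m4a
upload = 2 + 15 / 60        # 2'15" upload
process = 2 + 30 / 60       # 2'30" until the transcript was emailed back

print(f"{wav_mb / m4a_mb:.1f}x smaller file")           # ~5.9x
print(f"{runtime / process:.1f}x real time")            # ~14.4x
print(f"{prep + upload + process:.2f} min end to end")  # 5.75 min = 5'45"
```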

These APIs keep getting faster and can run on much “heavier iron” than my local iMac, which is no doubt part of the reason they are so fast – but that’s just another reason they’re good for developers. Plus, every time the speech-to-text algorithm gets improved, every app that calls on the API gets the improvement for free.

*I haven’t used SpeedScriber recently but I would expect that it has similarly benefited from improvements on the service side of the API it works with.




Speech-to-Text: Recent Example (July 19, 2018)

For a book project I recorded a 46 minute interview and had it transcribed by Speechmatics.com (as part of our testing for Lumberjack Builder). The interview was about 8600 words raw.

The good news is that it was over 99.8% accurate: I corrected just 15 words out of a final 8100, an error rate of less than 0.2%. The interview had good audio. I’m sure an audio perfectionist would have made it better, as would recording in a perfect environment, but this was pretty typical of most interview setups. It was recorded to a Zoom H1n as a WAV file. No compression.

Naturally, my off-mic questions and commentary were not transcribed accurately, but they were never expected or intended to be. Although, to be fair, the audio was clear enough that a human transcriber would probably have got closer.

The less good news: my one female speaker was identified as about 15 different people! If I wanted a perfect transcript I would probably also have cleaned up the punctuation, as it wasn’t completely clean. But the reality is that people do not speak in nice, neat sentences.

But neither the speaker identification nor the punctuation matter for the uses I’m going to make of it. I recognize that accurate punctuation would be needed for a Closed (or open) Captioning output, but for production purposes perfect reproduction of the words is enough.

Multiple speakers will be handled in Builder’s Keyword Manager and reduced to one there. SpeedScriber has a feature to eliminate the speaker ID totally, which I would have used if a perfect output was my goal. For this project I simply eliminated any speaker ID.

The punctuation would also not be an issue in Builder, where we break on periods, but you can combine and break paragraphs with simple keystrokes. It’s not a problem for the book project as it will mostly be rewritten from spoken form to a more formal written style.

Most importantly for our needs, near perfect text is the perfect input for keyword, concept and emotion extraction.
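As a toy illustration of that last point – not how Builder actually does it – even the crudest keyword extraction, counting non-stopwords, works far better on clean text than on an error-ridden transcript:

```python
import re
from collections import Counter

# A deliberately simple sketch: real keyword/concept/emotion extraction
# uses proper NLP, but every approach benefits from near perfect input.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "it", "is",
             "was", "that", "this", "for", "on", "with", "as", "at", "be"}

def keywords(transcript: str, top_n: int = 10):
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(top_n)

with open("interview.txt") as f:
    print(keywords(f.read()))
```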




Putting Words in Their Mouths! (June 26, 2018)

While researching further into Machine Learning to gain a better understanding of what’s possible and how it might be applied, I found a couple of audio related articles. Though mostly still in the lab, this research will guarantee the perfect Frankenbite in the future!





On the night of the Supermeet 2011 Final Cut Pro X preview, I was told that this was the “foundation for the next 10 years.” Well, as of last week, seven of those ten years have elapsed. I do not, for one minute, think that Apple intended to convey a ten year limit to Final Cut Pro X’s ongoing development, but maybe it’s smart to plan obsolescence: to limit the time an app continues to be developed before its suitability for the task is re-evaluated.





I speak as both a customer of software (among other things) and a developer of niche software, and in both voices I want to scream “Read the Help” many times a day.

We get many emails where someone has tried to use one of our apps and “it hasn’t worked” and they’re “really stressed”. At least 80% are solved by copying and pasting part of the Help. For sure it’s annoying for us to write the Help and then have to provide it in bite size chunks to the customer. It takes time and that costs us money, but that’s not the reason you should read the Help.

Reading the Help will reduce your stress and get you answers faster.



