For my entire production life, I’ve heard the trope that you can have any two of “Fast, Good or Cheap,” and it has held true that entire time. Except that today’s release of Lumberjack Builder version four completely breaks the paradigm. For the first time you can have fast, great (accurate) transcripts for free, upending the long-held “truism.”
Anyone who’s been in the production industries for any length of time has already navigated many technology changes: from analog to digital, SD to HD, SDR to HDR and so on. Avid Media Composer was first out of the digital Non-Linear Editing gate, and pretty much every other interface was inspired by the design decisions made there, until Final Cut Pro X rethought some of those assumptions.
Now there is a whole new generation of editing tools taking very different approaches to video storytelling: from RunwayML, through Builder NLE, Reduct and Synthesis.io, to whatever Adobe’s Project Blink evolves into, each solving the problem of organizing media into compelling stories in its own way.
It’s no secret that I started to learn to sing back in 2014 because I had been very bad at Karaoke. Eight and a bit years later I’m “competent” as a singer. What’s mildly annoying is that, in far less time, researchers have taught AI entities to sing, and even have them accepted as students in a musical academy.
A deepfake allows us to animate any face and have it say whatever we want that face to say. Useful for resurrecting long-dead actors (or recently dead ones, to finish a movie), but incredibly dangerous when it, inevitably, moves into the political arena. Exonet describes the problem in great detail in this video: from 2017, when you needed a lot of money and a lot of source material to base the deepfake on, to 2022, when only 20 seconds of video are required to make a believable deepfake.
While video dominates most communication and entertainment, it seems that Artificial Intelligence thrives on text. There are a lot of new and quite diverse tools for image generation, 3D model creation, human avatar movement, animation and editing, all driven with text prompts. While text prompts are the current interface, they are not the end game; alternatives are already in various stages of development and availability.
In the Amplified Actors article last September, I wrote about some experiments from 2018 onward at the BBC and NHK with synthetic newsreaders. Those “experiments” are now productized and available from Virtual Human startup Hour One, with their text-driven AI anchors and newsrooms.
If I were starting my career now, I’d be focusing on Artificial Intelligence and Unreal Engine. One for making me more creative, and one for opening a whole storytelling universe “on my desktop.” Given that I focused on post production because it had air conditioning, being able to control my “shooting universe” in air conditioning is definitely appealing. Unreal Engine has become a standard cinematic storytelling tool, and now, with Netflix’s Validation Framework for it, it has de facto approval. Netflix has used Unreal Engine on 1899 and Super Giant Robot Brothers, among others.
A Validation Framework not only ensures that all settings are correct and all required files are included, but also provides a support system to guide users through correcting any errors. And because I can, I’ve included a couple of Unreal Engine cinematic examples.
Netflix has approved Unreal Engine on more than the productions noted above, and each iteration has been a learning experience, but one where each production has been “starting over.” With so many settings and permutations it’s easy to get the deliverables wrong. Enter Netflix’s Validation Framework for Unreal Engine, which is a plug-in for the app.
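In spirit, a validation framework boils down to checking project settings against a required deliverables spec and reporting fixable errors. Here is a minimal Python sketch of that idea; the setting names and values are hypothetical illustrations, not Netflix’s actual rules or the plug-in’s API.

```python
# Hypothetical deliverables spec: every key must match exactly.
# (Illustrative values only, not real Netflix requirements.)
REQUIRED = {
    "color_space": "ACEScg",
    "frame_rate": 23.976,
    "resolution": (3840, 2160),
}

def validate(settings: dict) -> list:
    """Compare project settings to the spec; return human-readable errors."""
    errors = []
    for key, expected in REQUIRED.items():
        actual = settings.get(key)
        if actual != expected:
            errors.append(f"{key}: found {actual!r}, expected {expected!r}")
    return errors

# A project with one wrong setting gets one actionable error back.
project = {"color_space": "Rec.709", "frame_rate": 23.976,
           "resolution": (3840, 2160)}
print(validate(project))
```

The value of doing this in a plug-in, rather than in a delivery QC pass, is that errors surface while the artist can still fix them cheaply.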
You can read the specific benefits from the framework in their blog entry.
15 Graphic Demos
Take a look at a representative sampling of the cinematic potential.
The Matrix Awakens: An Unreal Engine 5 Experience
A “fan fiction” Matrix experience rendered in Unreal Engine. These examples render in real time on modest hardware like Xbox and PlayStation, which is why it’s very popular for previsualization. This particular example shows the (current) limitations of Metahumans: they are heavily dependent on the performance capture.
It’s an extended example which shows some of the cinematic possibilities of Unreal Engine. Like every creative endeavor, the level of finish and realism is driven by the skill of the creatives.
Speaking of Metahumans
There’s an obvious attempt at a Metahuman Neo in the Matrix Awakens experience above; it’s harder to recreate a real person than to create a completely fictional character. Here’s an example of Hermione Granger Made Using Unreal Engine 5.
If you want to see more of what Unreal Engine is doing in the cinematic space, search “Unreal Engine Demo” on YouTube and you’ll find dozens of examples.
Whether or not Artificial Intelligence can ever be creative is an ongoing question. In the Unreal blog they highlighted a project Words, camera, action: Creating an AI music video with Unreal Engine 5. As is usually the case with creative projects, it was a collaboration between human creators and a multiplicity of Machine Learning ‘algorithms’ used primarily to apply looks.
If you’ve used Final Cut Pro, Premiere Pro or Blackmagic’s DaVinci Resolve you are already using some Artificial Intelligence/Machine Learning tools. Final Cut Pro is catching up with the others in its use of Machine Learning. It’s likely that some of these standalone tools, or features from them, will migrate into an NLE near you, but in the meantime here are five amazing tools you could, and should, be using now.
Rotoscoping tools have been consistently evolving and improving, from the days of b-splines in Commotion to today’s Rotobrush 2 in Adobe After Effects, but rotoscoping remains one of the most tedious post production tasks, well suited to Machine Learning. RunwayML’s approach uses object recognition for both object extraction and background fill. RunwayML uses a monthly subscription based on output format: up to 720p output is free and 1080p output is just $15 a month. ProRes and 4K output is $35 a month and is apparently their most popular subscription.
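Conceptually, the segmentation-driven workflow splits each frame into an object matte and a background fill. In the NumPy sketch below, a hand-made mask stands in for what a real object-recognition model would predict per frame; this is a toy illustration of the idea, not RunwayML’s actual code.

```python
import numpy as np

# A tiny 4x4 "frame" with a bright object in the center.
frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[1:3, 1:3] = 255

# Stand-in for a model-predicted segmentation mask: True where the object is.
mask = frame.mean(axis=2) > 128

# Object extraction: keep object pixels, zero everything else (a matte).
extracted = np.where(mask[..., None], frame, 0)

# Naive background fill: paint over the object with the mean background color.
bg_color = frame[~mask].mean(axis=0).astype(np.uint8)
filled = np.where(mask[..., None], bg_color, frame)
```

Real tools replace the hand-made mask with a per-frame model prediction and the mean-color fill with learned inpainting, but the extract/fill split is the same.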
Custom Music Creation
One of the technologies I was in awe of when I was a more active editor was Smartsound’s Sonicfire Pro, which customized music tracks to match a specified video length. Very clever stuff, and indeed invented there. Having experienced Sonicfire Pro, Soundraw was immediately familiar when I came across it.
In Soundraw’s interface you choose a Mood or Video Theme, Style, Length and Instruments. In about 15 seconds Soundraw will generate 15 new, original and unique tracks for you. Choose one to edit and you have even more control over tempo, key, instruments and mix. You can load a video to synchronize with.
Soundraw’s durations lack the precision of Sonicfire Pro, as they only increment in 30- or 60-second intervals, but for royalty-free music I’ll take it. Two plans from $16.60 per month.
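The practical consequence of fixed increments is simple: round the length you need up to the next 30- or 60-second boundary, then trim the excess in the NLE. A tiny sketch (the function name is mine, not Soundraw’s):

```python
import math

# Round a required music length UP to the next fixed increment,
# since Soundraw only generates in 30- or 60-second steps.
def soundraw_length(video_seconds: float, increment: int = 30) -> int:
    return math.ceil(video_seconds / increment) * increment

print(soundraw_length(95))      # a 95 s video needs a 120 s track
print(soundraw_length(61, 60))  # with 60 s steps, 61 s needs a 120 s track
```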
Royalty-free, unique custom models that have never existed
One way to avoid being busted using Stock Images to populate your advertising campaign, or to fill a bit of b-roll, is to make sure you never use a photo of a real person. Enter Generated Photos, where every face is unique and completely fake! You have complete control over how diverse you want your fake population to be.
The mainstay of many production businesses is the “talking head” shoot, which requires organizing the on-camera talent, gear and crew, and a location. Not every “talking head” shoot is for something wonderful. More often they are fairly basic corporate communication or education pieces, often melded with a PowerPoint-style presentation. Synthesis.io takes your text input – typed or pasted in – and animates one of their many Avatars to speak that text.
Lumberjack System is using Synthesis.io to replace me in social media and help videos. It typically takes me about 5-10 minutes to set up and preview the audio and about 15 minutes for a rendered two minute video. The base plan allows for 10 minutes of video for $30 per month.
Here’s an example of a version one Avatar, which is pretty good, but you will see minor “uncanny valley” moments if you watch closely. They are rolling out their “more natural” version 2.0 avatars at the moment, but I haven’t experienced them yet. One nice editing feature of the new version is that all sentences start and end on the same frame! If only our real-world presenters were so consistent!
AI Studios is a new competitor to Synthesia.io, while Rephrase.io uses similar technology to customize one presentation into thousands of unique videos.
When I drafted my Becoming an Amplified Creative article in May last year, I predicted that something like these Avatars was coming in “2-3 years.” In my pre-publication revision three months later, I had to include Synthesis.io and two months after that, I was a customer! This technology is evolving rapidly and it won’t be long before they are indistinguishable from live shoots with humans.
In all cases, these are real people who have been “sampled” in 10 minutes of video before being processed into the Avatars. There are dozens of Avatars and over a hundred languages. All performers have given permission and were compensated, according to Synthesis.io.
Visual processing is where Machine Learning has made the most advances. Research papers (technology developments not yet products) can regenerate faces in great detail from very low resolution or damaged inputs, fill in detail to upscale images, and much more. Topaz Labs have released tools for upscaling and de-noising that are invaluable for documentary work, where the originals often leave much to be desired in a 4K world!
Their Video Enhance AI not only upscales, but de-noises, de-interlaces, does some restoration and frame rate conversion. At US$200 it seems like magic to someone who started with ¾” quality! Adobe Camera RAW edged out Video Enhance AI for upscale quality in an A.I. Upscaling Software Shootout at Pro Video Coalition.
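For contrast with ML upscalers, which synthesize plausible new detail, classic upscaling only resamples the pixels it already has. This nearest-neighbor sketch in NumPy shows the baseline that tools like Video Enhance AI improve on; it is a toy illustration, not any vendor’s algorithm.

```python
import numpy as np

# Nearest-neighbor upscaling: duplicate each pixel into a factor x factor
# block. No new detail is created, which is exactly the limitation that
# ML-based upscalers address by hallucinating plausible detail.
def upscale_nn(img: np.ndarray, factor: int) -> np.ndarray:
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

small = np.array([[10, 20],
                  [30, 40]], dtype=np.uint8)
big = upscale_nn(small, 2)  # 2x2 -> 4x4, each pixel becomes a 2x2 block
```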