Seamless Video Editing – A Look Toward the Future

A provocative first paragraph:

A new application being developed by researchers at UC Berkeley and Adobe Systems aims to do just that…helping editors identify the best spot to make a cut based on audio and visual features of raw footage. The program can auto-generate seamless transitions to make the cuts visually smooth and undetectable.

Which sounds exciting, until you read later:

This tech seems useful for working with on-camera interviews (with only one subject), but in its current state it doesn’t seem like it would be effective at tackling more complex shooting situations.

So, which is it? Both and neither. Understanding how and why we make edits is complex, but it is, or will be, doable. Finding the base information on which to apply that algorithm is even harder. But it is inevitable, given that an enormous amount of editing is not highly “creative” but somewhat routine. Certainly not for every type of edit, and not for every project.

I have long argued that this type of technology will be developed and applied. When we were developing First Cuts, the algorithm would produce a result and it would be “off” in some way – simply not what I would have done as an editor. That forced an examination of how I would have made the edit, which then led to needing to quantify why I made the edit there.

That part was not easy, although I am fortunate to have a brain almost equally balanced between left and right – creative and analytic.

In layman’s terms: Spots of the video where there is little audio or on-screen movement are given priority as ideal spots to cut, and are plotted on a “cut suitability” timeline. If necessary the application will insert natural-looking pauses to bridge two cuts together. From the product demo (embedded below) it appears that editors can simply delete text from the transcript view and the application will go to work creating a seamless transition. An additional feature allows for one-click removal of “ums” and repeated words.
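
To make the “cut suitability” idea concrete, here is a minimal sketch in Python. It is my own illustration, not the researchers’ algorithm: the function names, the equal weighting of audio and motion, and the 0.8 threshold are all assumptions, and it takes for granted that you already have one audio-energy value and one motion value per frame.

```python
def normalize(values):
    """Rescale values to the 0..1 range (all zeros if the input is flat)."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def cut_suitability(audio_energy, motion, audio_weight=0.5):
    """Score each frame from 0 to 1; quieter, stiller frames score higher.

    audio_weight is a hypothetical tuning knob, not a value from the research.
    """
    if len(audio_energy) != len(motion):
        raise ValueError("audio_energy and motion must have the same length")
    audio = normalize(audio_energy)
    movement = normalize(motion)
    return [
        1.0 - (audio_weight * a + (1.0 - audio_weight) * m)
        for a, m in zip(audio, movement)
    ]


def best_cut_points(scores, threshold=0.8):
    """Return the frame indices whose score meets the (assumed) threshold."""
    return [i for i, score in enumerate(scores) if score >= threshold]


# Made-up per-frame measurements: the subject pauses around frames 2 and 3.
audio = [0.9, 0.8, 0.1, 0.0, 0.7]
motion = [0.5, 0.6, 0.1, 0.0, 0.9]
scores = cut_suitability(audio, motion)
candidates = best_cut_points(scores)
```

With these made-up numbers the quiet, still frames in the middle come out as the candidates. A real tool would have to derive the per-frame values from the footage itself, which is the hard part.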

They can go back one step. In an interview situation you generally have two voices: breaking an interview up on voice changes, and then paragraph breaks (which is what this research seems to be doing, but adding in the analysis of motion in video) is “trivial” once we get reliable speech transcription.
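
As a rough sketch of why I call this “trivial”, assume the transcription step already hands us utterances tagged with a speaker and timestamps. The Utterance type, the segment function, and the 1.5-second pause that stands in for a paragraph break are all hypothetical choices of mine, not anything from the research.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str   # label assumed to come from the transcription step
    start: float   # seconds
    end: float     # seconds
    text: str


def segment(utterances, max_pause=1.5):
    """Group utterances, starting a new segment on a voice change or long pause."""
    segments = []
    for utterance in utterances:
        previous = segments[-1][-1] if segments else None
        same_voice = previous is not None and previous.speaker == utterance.speaker
        short_gap = previous is not None and utterance.start - previous.end <= max_pause
        if same_voice and short_gap:
            segments[-1].append(utterance)
        else:
            segments.append([utterance])
    return segments


# Made-up interview fragment: one question, then an answer with a long pause in it.
interview = [
    Utterance("interviewer", 0.0, 2.0, "How did the trip begin?"),
    Utterance("interviewer", 2.2, 4.0, "Take us back to day one."),
    Utterance("subject", 4.5, 6.0, "We left at dawn."),
    Utterance("subject", 8.0, 9.0, "The weather turned quickly."),
]
segments = segment(interview)
```

The two interviewer lines land in one segment, and the subject’s answer splits in two at the long pause. Everything here depends on the speaker labels and timings being right, which is exactly why reliable transcription comes first.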

Reliable speech transcription is the key to unlocking an enormous amount of metadata-driven tagging/keywording and driving these sorts of automatic assembles. At this stage I see this more as an editor’s tool than for finished projects, although there are some applications in exploring large amounts of video material. (Something I hope to demonstrate by the end of the year using some of the Solar Odyssey footage.)

Should we go down this path? That’s an irrelevant question because, with downward budget pressures dominant in the industry, it’s inevitable. Those who can work smarter – using all the tools at their disposal – will continue to be needed.

And I firmly believe that the emotionally compelling, heart-tugging edit is going to remain beyond the ability of a computer for the balance of my lifetime.