Can a computer replace an editor?

Before deciding whether a computer is likely to replace an editor, we need to discuss exactly what the role of an editor is: the human being who drives the software (or hardware) that edits pictures and sound. What do they bring to the production process? Having determined that, we can consider what a piece of computer software might be capable of, now or in the future.

First off, I think we need to rid ourselves of the idea that there is just one “editor” role, even though a single term covers a vast range of roles in post-production. Editing an event video does not use the same skills and techniques as editing a major motion picture; documentary editing is different from episodic television; and, despite the expectation of similarity, documentary editing and reality television require very different approaches. There is a huge difference between the skills of an off-line editor (story) and an on-line editor (technical accuracy), even if the two roles are filled by the same person.

So let’s start with what I think will take a long time for any computer algorithm to be able to do. There’s no projection from current technology – in use or in the lab – that would lead to an expectation that an algorithm would be able to find the story in 40 hours of source and make an emotionally compelling, or even vaguely interesting, program of 45 minutes. Almost certainly not going to happen in my lifetime. There’s a higher chance of an interactive storytelling environment à la Star Trek’s Holodeck (sans solid projection). Conceptually that type of environment is probably less than 30 years away, but that’s another story.

If a computer algorithm can’t find the story or make an emotionally compelling program, what can it do? Well, as we discovered earlier, not all editing is the same. There is a lot of fairly repetitive and rather assembly line work labeled as editing: news, corporate video, event videography are all quite routine and could conceivably be automated, if not completely at least in part. Then there is the possibility of new forms of media consumption that could be edited by software based on metadata.

In fact, all use of computer algorithms to edit relies on metadata – descriptions of the content that the software can understand. This is analogous to human logging and log notes in traditional editing. The more metadata software has about the media, the more able it is to create some sort of edit. For now, most of that metadata will come from the logging process. (The editor may be out of a job, but the assistant remains employed!) That is the current situation, but there’s reason to believe it could change in the future – more on that later in the piece.

If we really think about what we do as editors on these more routine jobs, we realize that there is a series of thought processes we go through, and underlying “algorithms” that determine why one shot goes into a given context rather than another shot.

To put it at the most basic level, consider editing content from an interview. Two shots of the same person have audio content we want in sequence, but the effect is a jump cut. [If two shots in sequence feature same person, same shot…] At this point we choose between putting another shot in there (say, from another interview) or laying in b-roll to cover the jump cut. […then swap with alternate shot with same topic. If no shot with same topic available, then choose b-roll.]
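As a rough sketch, the bracketed rule above could be written down along the following lines. Everything here is an assumption made for illustration: the `Shot` fields, the `resolve_jump_cuts` name, and the idea of placing the alternate shot or b-roll between the two offending shots. It is not the logic of any real editing product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Shot:
    """Hypothetical log entry for one shot (all field names are assumptions)."""
    name: str
    person: str   # who is on screen; empty for b-roll
    framing: str  # e.g. "medium", "close-up", "b-roll"
    topic: str


def is_jump_cut(a: Shot, b: Shot) -> bool:
    # Same person in the same framing, back to back, reads as a jump cut.
    return a.person == b.person and a.framing == b.framing


def resolve_jump_cuts(sequence: List[Shot],
                      alternates: List[Shot],
                      broll: Optional[Shot] = None) -> List[Shot]:
    """Place an alternate shot, or failing that b-roll, between jump-cut pairs."""
    result: List[Shot] = []
    spare = list(alternates)  # copy so the caller's list is not modified
    for shot in sequence:
        if result and is_jump_cut(result[-1], shot):
            prev = result[-1]
            # First choice: an alternate shot on the same topic that does
            # not itself create a jump cut on either side.
            pick = next(
                (alt for alt in spare
                 if alt.topic == shot.topic
                 and not is_jump_cut(prev, alt)
                 and not is_jump_cut(alt, shot)),
                None,
            )
            if pick is not None:
                spare.remove(pick)
                result.append(pick)
            elif broll is not None:
                # Fallback: cover the join with b-roll.
                result.append(broll)
            # With neither available, the jump cut is left in place.
        result.append(shot)
    return result
```

Note what the sketch leaves out: it has no way of judging whether the alternate shot carries the story or the emotion as well as the original would have.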

That’s a rudimentary example and doesn’t take into account the value judgment that the human editor brings as to whether another interview conveys the story or emotion as well. Most editors are unfamiliar with their underlying thought processes and not analytical about why any given edit “works” – they just know it does but ultimately that judgment is based on something. Some learned skill, some thought process, something. With enough effort that process can be analyzed and in some far distant time and place, reproduced in software. Or it could except for that tricky emotional element – the thing that makes our storytelling interesting and worth watching.

The more emotion is involved in your storytelling output, the safer your job – or the longer it might be before it can be replaced. 🙂

Right now, the available examples of computerized editing – Magic iMovie and Muvee Auto Producer – use relatively unsophisticated techniques to build “edited” movies. Magic iMovie essentially adds transitions to avoid jump-cut problems and builds to a template; Muvee Auto Producer requires you to vet shots (thumbs up or down) then uses a style template and cues derived from audio to “edit” the program. This is not a threat to any professional or semi-professional editor with even the smallest amount of skill.

However, it is only a matter of time before some editing functions are automated. Event videography and corporate presentations are very adaptable to a slightly more sophisticated version of these baby-step products. OK, a seriously more sophisticated version of these baby-step products, but the difference between slightly and seriously is about 3 years of development!

In the meantime, there are other uses for “automated” editing. For example, I developed a “proof of concept” piece for QuickTime Live! in February 2002 that used automated editing as a means of exploring the bulk of material shot for a documentary but not included in the edited piece. Not intended to be direct competition for the editor (particularly as that was me), it was intended as a means of creating short edited videos customized in answer to a plain language query of a database. The database contained metadata about the clips – extended logging information, really. In addition to who, where and when, there are fields for keywords, a numeric value for the relative usefulness of the clip, and a field for keywords to search for b-roll. [If b-roll matches this, search for more than one clip in the search result, edit them together and lay b-roll over all the clips that use this b-roll.]
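To make the shape of that metadata concrete, here is a minimal sketch of what one clip record and a very naive plain language search might look like. The field names, the stop-word list, and the `search` and `group_by_broll` helpers are all hypothetical; the original proof of concept is not reproduced here, and its real query handling was presumably more capable than simple word matching.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClipRecord:
    """One logged clip; field names are illustrative assumptions."""
    clip_id: str
    who: str
    where: str
    when: str
    keywords: List[str]
    usefulness: int  # higher means more useful; the scale is an assumption
    broll_keywords: List[str] = field(default_factory=list)


# A tiny, hand-picked stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "about", "what", "did", "say",
              "is", "in", "on", "and", "to"}


def search(records: List[ClipRecord], query: str) -> List[ClipRecord]:
    """Match query words against keywords, who and where; best clips first."""
    terms = {w.strip("?.,!").lower() for w in query.split()} - STOP_WORDS
    terms.discard("")
    hits = []
    for rec in records:
        # Single-word matching only: a multi-word name would need more work.
        haystack = {k.lower() for k in rec.keywords}
        haystack |= {rec.who.lower(), rec.where.lower()}
        if terms & haystack:
            hits.append(rec)
    return sorted(hits, key=lambda r: r.usefulness, reverse=True)


def group_by_broll(hits: List[ClipRecord]) -> Dict[str, List[str]]:
    """Find b-roll keywords shared by more than one clip in the results,
    so those clips can be edited together under the same b-roll."""
    groups: Dict[str, List[str]] = {}
    for rec in hits:
        for kw in rec.broll_keywords:
            groups.setdefault(kw, []).append(rec.clip_id)
    return {kw: ids for kw, ids in groups.items() if len(ids) > 1}
```

The point of the sketch is only that the richer the logging, the more the software has to work with; the matching itself is deliberately simple.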

So, right now, computer editing can be taught rudimentary skills. This particular piece of software knows how to avoid jump cuts and cut to length based on the quality criteria. It is, in fact, a better editor than many who don’t know the basic grammar of video editing. Teaching the basic grammar is relatively easy. Teaching software to take some basic clips and cut into a news item or even basic template-based corporate video is only a matter of putting in some energy and effort.
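One plausible reading of “cut to length based on the quality criteria” is a simple greedy selection: keep the highest-rated clips that fit within the target duration, then put them back in story order. The sketch below assumes exactly that; the `cut_to_length` name and the tuple layout are made up for illustration, and the actual software may well have worked differently.

```python
from typing import List, Tuple

# (name, duration in seconds, usefulness rating), listed in story order.
Clip = Tuple[str, float, int]


def cut_to_length(clips: List[Clip], target_seconds: float) -> List[str]:
    """Greedily keep the most useful clips that fit within the target length."""
    # Rank clip positions by rating, best first (ties keep story order).
    ranked = sorted(range(len(clips)), key=lambda i: clips[i][2], reverse=True)
    chosen = set()
    total = 0.0
    for i in ranked:
        duration = clips[i][1]
        if total + duration <= target_seconds:
            chosen.add(i)
            total += duration
    # Restore the original running order for the final cut.
    return [clips[i][0] for i in sorted(chosen)]
```

A greedy pass like this is easy to reason about but will not always find the best possible fill for the target length, which is probably acceptable for a rough first assembly.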

But making something that is emotionally compelling – not any time soon.

Here’s how I see it could pan out over the next couple of years. Basic editing skills from human-entered metadata – easy. Generating that metadata by having the computer recognize the images – possible now but extremely expensive. Having a computer edit an emotionally compelling piece – priceless.

It’s not unrealistic to expect, probably before the end of the decade, that a field tape could be fed into some future software system that recognizes the shots as wide, medium, close-up, etc.; identifies shots in specific locations and with specific people (based on having been shown examples of each); and transcribes the voice content and the text in signs and other places in the image. Software will recognize poor exposure, loss of contrast and loss of focus, eliminating shots that do not stand up technically. Nothing here is that difficult – it’s already being done to some degree in high-end systems that cost more than $300,000 right now. From there it’s only a matter of time before the price comes down and the quality goes up.

Tie that together with a template base for common editing formats and variations and an editing algorithm that’s not that much further on than where we are now and it’s reasonable to expect to be able to input one or more source tapes into the system in the afternoon, and next morning come back to review several edited variations. A couple of mouse-clicks to choose the best of each variation and the project’s done, output to a DVD (or next generation optical disc), to a template-based website, or uploaded to the play-out server.

None of this is far-fetched. Developing the basic algorithm was way too easy and it works for its design goals. Going another step is only a matter of time and investment. Such is the case with anything that is repetitive in nature: ultimately it can be reproduced in a “good enough” manner. It’s part of a trend I call the “templatorization” of the industry. But that’s another blog discussion. For now, editors who do truly creative, original work need not be worried, but if you’re hacking together video in an assembly-line fashion, start thinking of that fall-back career.

5 replies on “Can a computer replace an editor?”

  1. An interesting read, Philip. I don’t feel too threatened as an editor though. Editorial assistance/auto-logging, etc. is plausible, but as you say, creative/emotional is the toughie and that’s where I draw my job security from. As someone who is fairly familiar with current state of the art AI, and its significant shortcomings, I think it would be a very long time indeed before software could do the sort of creative ‘thinking’ needed for creative storytelling…even for corporate video.

    Nice to see a new blog late last night! Good job as always.

  2. Another thought-provoking blog, Philip!

    In the mid 80’s I recall seeing a seminar from a music professor who was doing research on “electronic” music, which was still pretty new at the time. He had an early Mac, and we all oohed and aahed over its abilities to play polyphonic music and imitate various “analog” instruments. One of his discoveries was particularly interesting to me, and I wonder how it applies to automated video. As the quality of the digital “performance” improved (that is, the computer’s ability to play a piece with “feeling” and the natural ebb and flow of a live performer, rather than simply ticking off notes metronome-style with exact precision), the apparent quality of the instrument sounds got worse. They were always the same samples, but somehow listeners noticed the synthesized sound of the instruments more when they were played with “feeling”. I tend to think it was a contextual thing – computer sounds with a computer performance is a natural “fit”, but computer sounds with a natural performance was somehow jarring.

    This was almost twenty years ago, but I wonder if this kind of perceived discontinuity will be evident in auto-edited video.

    I could see this kind of auto-editing being a time saver for editors, simply by grouping subjects together with appropriate b-roll, doing simple string-togethers that make it easier to see your footage grouped together by content.

  3. There will always be the few skilled craftsmen who continue to define the excellence of the art form while the majority of users lumber away refining adequate mediocrity.

    All “bad” techniques become acceptable if repeated enough. Look at the acceptance of jump cuts as a result of MTV exposure to work done by people who had access to equipment but not “training” in how to make information flow smoothly and seamlessly. Remember, too, the perennial desire to destroy tradition and explore the sensations available only through novelty.

    All breakthroughs eventually become cliches if widely adopted.

    Can algorithms be programmed to successfully mutate?

  4. All I want in the next 10 years is a reliable mouth-morph plug-in with a pitch-perfect mimicry audio flammerjammer and then I can fix any story that comes into my cybernated oxygen tent in two hours or less.

    okay, maybe three.

  5. And then on Judgement Day Skynet became self-aware and the machines started the war against man…
