Metadata is Expensive

As big a fan of all metadata as I am, I do acknowledge that the most useful metadata is also the most expensive to obtain. Technical metadata from the camera comes at almost no cost: set the camera correctly and metadata on frame size, codec, frame rate, Timecode, Time of Shoot, etc. comes effectively free.

Not so content metadata about where the shot is, who is in the shot, what people are saying, what people are talking about (which requires understanding), or where people are. That metadata takes time (and mostly human involvement) to add, making it quite expensive.

Back in 2008, when we released First Cuts for FCP, we knew the power of metadata to kick-start the editing process for non-scripted production. First Cuts didn’t reach its potential because the expense of the metadata offset some of the benefits.

That’s why I am so interested in the potential for Machine Learning to reduce the cost of acquiring Content Metadata. Once we can derive the metadata affordably, we can use that to kick-start the creative process and avoid the paralysis of an empty timeline!

With a Little Bit of Luck

Over the last five years I’ve been learning to sing for my own enjoyment. Over that time I’ve met many talented students of the same vocal coach. It’s a constant reminder that there are way more talented people than there are parts and fame for them!

The opening montage in A Chorus Line also demonstrates how few opportunities there are for talented dancers. Getting a long-running show (on Broadway or TV) is a life-changing event, but getting one usually involves a dose of good luck.

For every talented actor, dancer or singer that breaks through, there are dozens – probably hundreds – of equally talented people who were not in the right place at the right time: whose parents couldn’t get them to that audition, who couldn’t get time off parenting to make their first single, or however else life interferes.

Obviously we prepare for the opportunity (although I doubt I’ll ever be talent scouted for my singing!) because having the opportunity and being unprepared won’t work either. The more prepared you are, and the more ways you present yourself, the more likely the opportunity is to happen.

Luck is morally neutral. Good luck happens to some, and not to others. Bad luck also happens indiscriminately. Like I said yesterday, life is unpredictably good and bad.

There are no small decisions!

We rightly focus on the “big” decisions of life: high school major, college choice, partner, where to live, what job to take, etc. These are important decisions and will affect your future.

The funny thing is, some of the smallest decisions I’ve made have had the biggest effect on my life. Deciding to leave a fairly boring nightclub to go to another, then overhearing a comment that may have been about me and changing my mind, turned out to be close to one of the most important decisions of my life, as I met Greg that night.

Some decisions are obviously “big,” but five seconds either way could be the difference between being in the path of an out-of-control car, or not. A life-changing moment one way, barely noticed the other. Five seconds.

A stray bullet could radically change the path of our lives, with zero forethought or planning.

Or a chance meeting in a hallway at NAB might lead to one of the most productive partnerships or radical innovation the industry has seen!

It’s probably best that we don’t realize there are no small decisions, because to give every minor decision the attention it could deserve would be crippling. I find it much easier not to dwell on it too much.

How to teach a Machine

In a follow-up Tweet to my comments about Resolve 16, the Tweeter suggested they could use the Neural Engine to “improve translation of XML from other apps,” which led me to spend a little time explaining (and oversimplifying) how we train machines for Machine Learning.

There are three approaches: training sets, oppositional training, and a simple goal.

Most Machine Learning neural networks are trained using massive amounts of “qualified” data: examples that have been tagged with the desired results to train the machine, then more known examples to test the machine. This has been the approach used for facial recognition, speech-to-text, emotion extraction, color grading, etc. Most of the things we’ve known Machine Learning for so far have been trained using this type of data set.
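To make the “train, then test” idea concrete, here is a toy sketch in Python. It is nowhere near a neural network: it simply averages the tagged examples for each label and matches new examples to the nearest average. The features (brightness, motion), the tags and every number are invented for illustration.

```python
# Toy "training set" approach: learn from tagged examples, then check
# the result against known examples the model has never seen.
# All data here is invented for illustration.

def train(examples):
    """Average the feature values for each tag (a nearest-centroid model)."""
    sums, counts = {}, {}
    for features, tag in examples:
        total = sums.setdefault(tag, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[tag] = counts.get(tag, 0) + 1
    return {tag: [v / counts[tag] for v in total] for tag, total in sums.items()}

def predict(model, features):
    """Pick the tag whose average is closest to the new example."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda tag: distance(model[tag]))

def accuracy(model, examples):
    """Fraction of known examples the model tags correctly."""
    correct = sum(1 for features, tag in examples if predict(model, features) == tag)
    return correct / len(examples)

# Hypothetical (brightness, motion) measurements with their tags.
TRAINING = [((0.9, 0.1), "interview"), ((0.8, 0.2), "interview"),
            ((0.3, 0.9), "action"), ((0.2, 0.8), "action")]
# Held-back examples, used only to test the trained model.
TESTING = [((0.85, 0.15), "interview"), ((0.25, 0.7), "action")]

model = train(TRAINING)
print(accuracy(model, TESTING))
```

The important part is the split: the machine only ever learns from TRAINING, and TESTING tells us whether what it learned carries over to examples it has not seen.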

Oppositional training (more commonly called adversarial training) is where you have one machine – for example – trying to create a human face, and another machine determining if what is presented is a human face. Because these are machines, they iterate very, very quickly and some amazingly realistic human faces have resulted.

Then there’s the Simple goal. The clearest example I’ve seen so far is where a bipedal stick figure in a simulated environment was given the challenge “Stay upright and move forward.” After millions of iterations and experiments the goal was achieved, so they made it more complex with terrain.
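A rough sketch of the simple goal idea, again in Python and again heavily simplified. There is no physics here and no stick figure: an “attempt” is just a list of ten stride lengths, a stride longer than 1.0 counts as falling over, and the machine keeps any random experiment that gets it further forward. The reward function, the numbers and the simple keep-what-works search are all my assumptions, not how the original experiment was built.

```python
import random

def reward(steps):
    """Score an attempt: distance moved forward before 'falling over'.
    A stride longer than 1.0 counts as a fall and ends the attempt."""
    distance = 0.0
    for stride in steps:
        if abs(stride) > 1.0:
            break
        distance += stride
    return distance

def learn(iterations=5000, seed=0):
    """Try small random changes and keep only the ones that score better."""
    rng = random.Random(seed)
    best = [0.0] * 10                        # start by standing still
    best_score = reward(best)
    for _ in range(iterations):
        trial = list(best)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-0.2, 0.2)   # one small random experiment
        score = reward(trial)
        if score > best_score:               # keep only improvements
            best, best_score = trial, score
    return best, best_score

best, score = learn()
print(round(score, 2))
```

Nobody tells the machine how to walk. It is only told how well each attempt did, and the behavior emerges from a very large number of cheap experiments.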

Given those gross oversimplifications of some very clever technology, let’s examine the idea of improving XML translation. I’m not aware of any existing training set, or how you’d go about creating a training set that had perfectly matched timelines in two different apps, and the XMLs that they represent. The matching timelines would have to be built manually (or corrected manually) so they were a perfect representation of the other app’s timeline. Not particularly practical.

I don’t see how we can simplify the request to use either oppositional training or boil it down to a simple goal.

Now, another Tweeter suggested they use the Resolve Neural Engine to improve cut detection, and that’s an entirely reasonable and feasible goal, as we have a substantial body of timelines with EDLs representing the cuts.
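Here is why existing EDLs make cut detection trainable, as a deliberately tiny Python sketch. Each frame is reduced to one hypothetical average-brightness number, the known cut points stand in for an EDL, and the “learning” is nothing more than picking the brightness-change threshold that best reproduces those known cuts. A real detector would work on actual pixels with a far richer model, and this is not a description of how Resolve does it.

```python
def frame_differences(frames):
    """Absolute change between each frame and the one before it."""
    return [abs(b - a) for a, b in zip(frames, frames[1:])]

def learn_threshold(frames, known_cuts):
    """Pick the difference threshold that best reproduces the known cuts.
    known_cuts holds the frame indices where a new shot starts (from an EDL)."""
    diffs = frame_differences(frames)
    labels = [(i + 1) in known_cuts for i in range(len(diffs))]
    best_threshold, best_correct = 0.0, -1
    for candidate in sorted(set(diffs)):
        correct = sum((d >= candidate) == label for d, label in zip(diffs, labels))
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold

def detect_cuts(frames, threshold):
    """Return the frame indices where the change meets the threshold."""
    return [i + 1 for i, d in enumerate(frame_differences(frames)) if d >= threshold]

# Hypothetical average brightness per frame, with cuts known from an EDL.
TRAINING_FRAMES = [0.20, 0.22, 0.21, 0.80, 0.82, 0.81, 0.40, 0.41]
TRAINING_CUTS = {3, 6}

threshold = learn_threshold(TRAINING_FRAMES, TRAINING_CUTS)

# New footage the "model" has not seen before.
NEW_FRAMES = [0.50, 0.52, 0.10, 0.12, 0.11, 0.90]
print(detect_cuts(NEW_FRAMES, threshold))  # prints [2, 5]
```

The timelines supply the footage and the EDLs supply the right answers, which is exactly the matched pair that is missing in the XML translation case.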

Ultimately, what we can do with Machine Learning will come down to how we train it, which is why I am not expecting a Machine-based editing tool for a very long time.

Resolve 16 and Machine Learning Thoughts

While I’m still to see a demo, there are a few announced features that I definitely think are in the right direction, particularly those driven by Resolve’s “Neural Engine.” It seems, like Adobe Sensei and Apple’s CoreML, to be a playback engine that implements the Machine Learning models in practical tools.

Improved retiming, facial recognition, color matching, color balance and upscaling are the first implemented features. These are in line with what I have been expecting from ML: smart features that make the process easier for editors. All Resolve’s current Neural Engine driven features are better ways (faster) to do things we’ve been doing for years (other than facial recognition).

Adobe have already implemented ML driven features in their apps, and marketing tools. I’d hope that Avid have taken likely future use of ML into account in their NAB-announced revamp of Media Composer, although I doubt we’ll see any ML driven features there for many years to come, for two reasons. The Media Composer market is largely not ready for it, and Avid will have enough on their plate bringing a newly rewritten version to maturity. I expect we’ll see ML in Media Central before any desktop app.

Apple have made very good use of ML across all their products. It’s why Apple Mail predicts mailboxes for you, among dozens of little features across their devices. They have an excellent playback engine in CoreML, and developers for macOS or iOS have access to some of IBM Watson’s models. The Pro Apps team were also advertising a position for a Machine Learning specialist in late 2017.

Thanks to the Content Auto Analysis (that no-one uses) they even have a pipeline within the app to bring ML derived keywords into the app.

I’m sure we’ll see ML driven tools in FCP X at some future time. It largely depends on the priorities within the Pro Apps Team. I’d love to see a big focus on ML in a future FCP X, but there are those who would rather see collaboration, dupe detection or scrolling timelines.

What Drives NLE Choice?

One of the best things about conferences (like NAB) is the opportunity to discuss topics of the day with other smart people. I frequently imagine that the best part of being in a larger company would be the robust discussions during feature development.

In one of those discussions, I found myself saying “Track based NLEs are for people who make video for others; FCP X is for people who make video for themselves.” Even if true it would be a gross oversimplification, but indulge me for a minute.

The original NLE for making TV for a lot of “someone else’s” is of course Avid’s Media Composer. Absolutely track based.

The NLEs that followed – Media 100, Final Cut Pro (classic), etc. – were all track based and shared conceptual similarities with Media Composer, because that’s what people expected. That they were used to make video for other people was almost a given: that gear was expensive so it had to generate a good income.

NLE software became affordable when it was separated from a hardware component, so FCP Classic started to empower those who would have otherwise gone to a production company, to start doing it themselves.

At that time I was still producing video for others, but our good friends in Sydney, who “produced video” by hiring camera operators, cameras, edit bays and expertise, bought a DV camera and FCP and started doing it themselves. It was FCP 1.2.5 that brought us together when I answered a forum question for them, before we realized we were in adjacent cities.

All generalizations are going to be inaccurate somewhere. Trackless FCP X is definitely being used by those making videos for others. And Resolve, Media Composer and Premiere Pro all have individual users making video for themselves, but those using track based tools tend to have been in the industry the longest, and they know track based tools because the only way we could afford them was by making video for others.

Now there are hundreds of ways people are making video for themselves, as part of their primary job or as their primary income, and they tend not to have come out of a traditional production workflow. FCP X is definitely easier for those people to pick up and be productive, which I think is why it is so popular (outside the traditional industry circles).

Some Reflections on NAB 2019

NAB 2019 was almost like returning to the past as we were podcasting from the show floor, just like 2004 – 2007! Back then it was DV Guys/Digital Production BuZZ and we had a huge case of gear for an audio only show.

Fast forward a decade and a half, and we’re live streaming with Switcher Studio, adding multicam video switching to the live stream! Of course, we all know that no-one watches live streams, but we had some quality interviews recorded for OWC Radio hosted by Cirina Catania interviewing creatives about tech.

This year’s production gear would fit in a bag about a quarter the size of our audio only rig of earlier days! Not to make it too easy, we added lighting.

Greg mans the control table for audio (the mixer), video (the iPad) and screen support material (the MacBook Pro).

In return for assistance with the podcast OWC gave us a home on the corner of the booth for NAB. Let’s just say that OWC Radio got a lot of attention while the corner table was mostly empty.

Lumberjack System was one of the sponsors of the Content Creators Celebration where we mostly handed out finger lights and glow sticks, while talking to people about how Lumberjack gives them more time to be creative.

The highlight of the week was, without a doubt, the Faster Together Stage where LumaForge did all the heavy lifting organizing a truly inspirational evening. The Faster Together stage replaced the Supermeet on the Tuesday night. Dan Berube and Michael Horton – the people behind the Supermeets – called an end to a successful 18 years in January of this year.

Now, I’ve heard some criticism that the Faster Together stage was “very corporate” because one company was hosting, but in reality, the event was way less corporate than the Supermeets, where the stage presentations were mostly sponsored. Occasionally Avid or Adobe would put a creative up on stage, but mostly it was “dog and pony” shows of the latest, just announced, features of their flagship apps. Not that there’s anything wrong with that.

When I first talked with Sam Mestman about the Faster Together Stage he indicated he wanted to move the event back toward creativity and community – away from the corporate approach!

He, and a very dedicated team from LumaForge, delivered on that with a stage full of editors, producers, colorists, YouTube stars and technologists. Not a corporate or product presentation in sight.

Our Faster Together table highlighting all the ways that Lumberjack System gives you more time for what you are really passionate about.

We had planned to demonstrate the entire Lumberjack Workflow at our table, but I didn’t expect our reliable NEX 7 to simply stop recording!

As a team, Lumberjack was able to contribute to the on stage presentations by doing what we started out doing: logging during the shoot. Chris Fenwick, Alan Seawright and Brad Olsen (and probably others) had planned to interview guests at the event before it started, asking them six questions like what inspired them to get into production, and edit those into six on stage presentations during the event.

Ambitious yes, but Chris recruited our Greg Clarke to log during the shoot, and used Lumberjack to do six string-outs, one for each question. This gave Chris a huge head start and he made the screen time for the presentations. After the event he posted on Twitter:

Could NOT have pulled it off without @philiphodgetts and Greg from @LumberjackSys – their logging system made the quick turn possible. I’m SO thankful they were able to assist. 

Chris Fenwick @chrisfenwick

After more interviews for OWC Radio on the Wednesday, we packed up the production gear on the booth, had a relaxed evening when we finished, and returned home on the final day of the show.

I Am Grateful for My Friends

Driving back from NAB last week I realized how grateful I am to my friends, not only for their friendship, but for what I’ve learnt from them.

From Larry Jordan I’ve learned to be a better talk show guest. Larry always makes sure his audience “keep up” with the conversation, and has frequently guided me to “back up a little and explain” before we go forward. He’ll probably need to do it again, but I am learning to make sure those listening are understanding.

From Sam Mestman, Cirina Catania and Michael Horton I am relearning the importance of community, and that we’re not in the software business, we’re in the business of giving people time. Time they can spend doing what’s really important to them.

From Cirina I’m learning, not only how to be more positive, but also how to simplify topics when I discuss them.

I’m sure there are dozens of other lessons I’ve learnt from friends, but these come to mind right now.