The present and future of post production business and technology

What are the different types of metadata we can use in production and post production?

I’ve been thinking a lot about metadata – data about the video and audio assets – particularly since we use metadata extensively in our Intelligent Assistance software products and to describe the media items for sale in Open TV Network. And the new “Faces” and “Places” features in iPhoto ’09 show just how useful metadata can be.

Back in the days when tape-based acquisition ruled, there wasn’t much metadata available. If you were lucky there would be an identifying note on or with the tape. For linear editing that was all that was available at the source – the tape. The only other source metadata was frame rate, frame size and tape format, and perhaps some user bits carried with the timecode. With a linear system that was all you could use anyway.

With non-linear editing we moved media into the digital domain and added further metadata: reel names, clip names, descriptions and so on. And with digital acquisition formats we’re getting more source metadata from the cameras.

But there are more types of metadata than just what the camera provides and what an editor or assistant enters. In fact we think there are four types of metadata: Source, Added, Derived and Inferred. But before I expand on that, let me digress a little to talk about “explicit” and “implicit” metadata.

These terms have had reasonable currency on the Internet, and there’s a good post on the subject at Udi’s Spot, “Implicit kicks explicit’s *ss.” In this usage, explicit metadata is what people deliberately provide (like pushing a story to the top of Digg), while implicit metadata is based on the tracks we inadvertently leave.

Actions that create explicit metadata include:

  • Rating a video on YouTube.
  • Rating a song in your music player.
  • Digging a website on Digg.

Actions that create implicit metadata include:

  • Watching a video on YouTube.
  • Buying a product on Amazon.
  • Skipping past a song in your music player as soon as it gets annoying.

We didn’t think those terms were entirely useful for production and post production, so instead we propose the four types noted above.

Source

Source Metadata is stored in the file from the outset by the camera or capture software – EXIF data in a still image, for example. It is usually immutable. Examples:

  • timecode and timebase
  • date
  • reel number
  • codec
  • file name
  • duration
  • GPS data
  • focal length, aperture, exposure
  • white balance setting
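
As a concrete illustration, here’s a minimal sketch of pulling source metadata out of a clip with ffprobe (part of FFmpeg). The tool choice and the file name are my own illustration, and which of these fields is actually present varies by camera and wrapper format:

```python
import json
import subprocess

def read_source_metadata(path):
    """Ask ffprobe for the metadata the camera wrote into a clip at capture."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    info = json.loads(result.stdout)
    video = next(s for s in info["streams"] if s["codec_type"] == "video")
    return {
        "codec": video["codec_name"],                 # e.g. "prores" or "h264"
        "frame_size": (video["width"], video["height"]),
        "frame_rate": video["r_frame_rate"],          # e.g. "30000/1001"
        "duration": float(info["format"]["duration"]),
        # Timecode and creation date only appear if the camera recorded them
        "timecode": video.get("tags", {}).get("timecode"),
        "date": info["format"].get("tags", {}).get("creation_time"),
    }

print(read_source_metadata("A001_C002.mov"))  # hypothetical clip name
```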

Added

Added Metadata is beyond the scope of the camera or capture software and has to come from a human. It can be added by a person on set (e.g. with Adobe OnLocation) or during the logging process. Examples:

  • keywords
  • comments
  • event name
  • person’s name
  • mark good
  • label
  • auxiliary timecode
  • transcription of speech (not done by software)
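
To make the distinction concrete, here is a minimal sketch of a logging record where every field has to be typed in by a person. The structure and field names are my own illustration, not any particular logging tool’s schema:

```python
from dataclasses import dataclass, field

@dataclass
class AddedMetadata:
    """Metadata a logger enters by hand; none of it can come from the camera."""
    keywords: list[str] = field(default_factory=list)
    comments: str = ""
    event_name: str = ""
    people: list[str] = field(default_factory=list)
    good: bool = False          # the "mark good" flag
    label: str = ""
    transcription: str = ""     # typed by a human, not produced by software

# A logger annotates a clip after reviewing it
log = AddedMetadata(
    keywords=["interview", "exterior"],
    comments="Best take; slight wind noise at the top.",
    event_name="Product launch",
    people=["Jane Smith"],
    good=True,
)
```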

Derived

Derived Metadata is calculated using a non-human external information source. Examples:

  • speech recognition software can produce a transcription
  • a language algorithm can derive keywords from a transcription
  • locations can be derived from GPS data using mapping data (e.g. Eiffel Tower, Paris, France), or even whether a location is urban or rural
  • recalculation of duration when video and audio have different timebases
  • OCR of text within a shot.

Derived metadata is in its infancy, but I expect to see a lot more of it over the next few years.
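
The duration recalculation in that list is simple enough to sketch. Assuming we know each track’s length in its own units and its timebase (the counts below are illustrative), exact rational arithmetic keeps video frames and audio samples in agreement:

```python
from fractions import Fraction

def duration_seconds(units: int, timebase: Fraction) -> Fraction:
    """Convert a track length from its own units into seconds."""
    return units / timebase

# Video counted in frames at 29.97 fps, audio counted in samples at 48 kHz
video = duration_seconds(1800, Fraction(30000, 1001))   # 1800 frames
audio = duration_seconds(2882880, Fraction(48000))      # 2,882,880 samples

print(float(video), float(audio))   # both 60.06 seconds: the durations agree
```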

Inferred

Inferred Metadata is assumed from other metadata, without recourse to an external information source. It may be used to help obtain Added metadata. Examples:

  • time of day and GPS data can group files that were shot at the same location during a similar time period (if this event is given a name, that name is Added metadata)
  • if the time-of-day timecode for a series of shots falls within one period across different locations, and there is a big gap before the next shot’s timecode, it can be assumed that those shots were made together at a series of related events (and if they are named, the names become Added metadata)
  • facial recognition software can recognize the same person in three different shots (Inferred), but it needs to be told the person’s name and whether its guesses are correct (Added)

We already use inferred metadata in some of our software products. I think we will be using more in the future.
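
The first inference in that list – grouping shots by time of day and GPS proximity, with no external lookup – can be sketched in a few lines. The gap and distance thresholds here are assumptions for illustration:

```python
from datetime import datetime, timedelta

def group_by_time_and_place(clips, max_gap=timedelta(hours=2), max_offset=0.01):
    """Group clips shot near the same spot within a similar time window;
    each group is one inferred event that a person can later name."""
    groups = []
    for clip in sorted(clips, key=lambda c: c["time"]):
        last = groups[-1][-1] if groups else None
        near = last is not None and \
            abs(clip["lat"] - last["lat"]) < max_offset and \
            abs(clip["lon"] - last["lon"]) < max_offset
        recent = last is not None and clip["time"] - last["time"] < max_gap
        if near and recent:
            groups[-1].append(clip)   # continues the same inferred event
        else:
            groups.append([clip])     # a gap in time or space starts a new one
    return groups

# Two clips minutes apart at one spot group together; the third stands alone
clips = [
    {"name": "A001", "time": datetime(2009, 5, 1, 10, 0), "lat": 48.8584, "lon": 2.2945},
    {"name": "A002", "time": datetime(2009, 5, 1, 10, 20), "lat": 48.8585, "lon": 2.2946},
    {"name": "A003", "time": datetime(2009, 5, 1, 16, 0), "lat": 48.8606, "lon": 2.3376},
]
print([[c["name"] for c in g] for g in group_by_time_and_place(clips)])
# [['A001', 'A002'], ['A003']]
```

Naming one of the resulting groups is exactly the handoff described in the list: the grouping is Inferred, the name is Added.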

So that’s what we see as the different types of metadata that are useful for production and post production.



Comments

4 responses to “What are the different types of metadata we can use in production and post production?”

  1. John A. Mozzer

    In your paragraph about “the days when tape-based acquisition ruled…”, you mentioned timecode, but you didn’t mention the recording date and time. I know from personal experience that miniDV camcorders record the date and time in every single frame, in addition to timecode. And I believe that all of the professional DV camcorders do as well. Recently, David Pogue wrote an article, “Moving Taped Past to Hard-Drive Future”, New York Times, April 14, 2010, about his experience capturing family miniDV tapes with Final Cut Pro, and his horror when he discovered the date stamps were missing. He started his project all over, capturing into iMovie, in order to see the date and time information. Actually, I found out that Final Cut does not discard the information – it is still present with the DV data in the QuickTime file(s). But as you know, Final Cut doesn’t use it.

    1. DV timecode covers both time-of-day and “SMPTE-like” timecodes. Date was optional – some cameras recorded it, some didn’t. No professional NLE uses it. (Years ago Radius had a DV editing tool that showed the TOD TC as well as the “SMPTE-like” TC.)

      Philip

      1. Tbob

        “No professional NLE uses it” is a strong statement. While it’s not an NLE, you can view both in REDCINE-X for RED cameras, as RED has realized both are useful.

        1. Philip

          As I was specifically referencing DV timecode (time of day), I’m not sure how REDCINE-X is relevant, as it cannot read DV/HDV time-of-day TC either.