Category Archives: Interesting Technology

The Advantage of Web APIs

Web APIs (Application Programming Interfaces) allow us to send data to a remote service and get a result back. Machine learning tools and cognitive services like speech-to-text and image recognition are mostly online APIs. Trained models can be integrated into apps, but in general these services operate through an API.
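
To make that concrete, here is a minimal Python sketch of what ‘send data to a remote service and get a result back’ looks like from the developer’s side. The endpoint, credentials and response format are hypothetical, not any particular vendor’s API.

```python
import requests

API_URL = "https://api.example.com/v1/transcribe"  # hypothetical endpoint
API_KEY = "your-api-key"                           # hypothetical credential

def transcribe(audio_path: str) -> str:
    """Upload an audio file to a (hypothetical) speech-to-text web API
    and return the transcript text."""
    with open(audio_path, "rb") as audio:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"audio": audio},
            data={"language": "en-US"},
            timeout=600,  # large uploads can take a while
        )
    response.raise_for_status()
    # Assume the service returns JSON like {"transcript": "..."}
    return response.json()["transcript"]

if __name__ == "__main__":
    print(transcribe("interview.m4a"))
```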

The big advantage is that they keep getting better, without the local developer getting involved.

Nearly two years ago I wrote of my experience with SpeedScriber*, which was the first of the machine-learning-based transcription apps on the market. At the time I was impressed that I could get the results of a 16-minute interview back in less than 16 minutes, including prep and upload time. Usually the overall time was around the run time of the file.

Upload time is the downside of web-based APIs and is significantly holding back image recognition on video. That is why high-quality proxy files are created for audio that is to be transcribed; they cut the upload time dramatically.

My most recent example, sourced from a 36-minute WAV, took around one minute to convert to an archival-quality m4a, which reduced the file size from 419 MB to 71 MB. The roughly five-times-faster upload (now 2’15”, compared with more than 12 minutes to upload the original) more than compensates for the small prep time for the m4a.
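
Here is one way to script that prep step, as a minimal Python sketch built around the ffmpeg command-line tool (assumed to be installed); any competent audio encoder will do the same job.

```python
import subprocess
from pathlib import Path

def make_upload_proxy(wav_path: str, bitrate: str = "256k") -> Path:
    """Convert a WAV to a much smaller AAC (.m4a) file before uploading.

    Assumes the ffmpeg command-line tool is installed and on the PATH.
    At around 256 kbps a 36-minute stereo WAV (~419 MB) comes out at
    roughly 70 MB, in line with the numbers above.
    """
    src = Path(wav_path)
    dst = src.with_suffix(".m4a")
    subprocess.run(
        [
            "ffmpeg", "-y",        # -y: overwrite the output if it exists
            "-i", str(src),        # source WAV
            "-c:a", "aac",         # encode the audio as AAC
            "-b:a", bitrate,       # target audio bitrate
            str(dst),
        ],
        check=True,                # raise if ffmpeg reports an error
    )
    return dst

# proxy = make_upload_proxy("interview.wav")
```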

The result was emailed back to me in 2’30”. That’s 36 minutes of speech transcribed with about 98% accuracy in 2.5 minutes, or more than 14x real time. The entire time from starting the upload to having the finished transcript back was 5’45” for 36 minutes of interview.
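
For anyone who wants to check the arithmetic, the figures line up like this (a trivial Python check):

```python
# A trivial check of the numbers above.
interview_minutes = 36
prep_minutes = 1.0          # WAV to m4a conversion
upload_minutes = 2.25       # 2'15"
transcribe_minutes = 2.5    # 2'30" until the transcript came back

print(interview_minutes / transcribe_minutes)              # 14.4x real time
print(prep_minutes + upload_minutes + transcribe_minutes)  # 5.75 minutes = 5'45"
```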

These APIs keep getting faster, and they run on much “heavier iron” than my local iMac, which is no doubt part of the reason they are so fast. That’s just another reason they’re good for developers. Plus, every time the speech-to-text algorithm is improved, every app that calls on the API gets the improvement for free.

*I haven’t used SpeedScriber recently, but I would expect that it has similarly benefited from improvements on the service side of the API it works with.

Maybe 10 Years is Enough for Final Cut Pro X

On the night of the Supermeet 2011 Final Cut Pro X preview I was told that this was the “foundation for the next 10 years.” Well, as of last week, seven of the ten have elapsed. I do not, for one minute, think that Apple intended to convey a ten-year limit to Final Cut Pro X’s ongoing development, but maybe it’s smart to plan obsolescence: to limit the time an app continues to be developed before its suitability for the task is re-evaluated.


Looking back on 2017 on the Digital Production BuZZ

I was honored to be invited, as one of many, to provide my thoughts on 2017: which technologies were important, and what major changes happened.

Here is a link to the full show –
Here is a link to the transcript –
Or, if you want to go direct to my segment:
MP3:

NAB 1998 in Retrospect

While researching my journey through my earlier writings on metadata and interactive storytelling, I came across my ‘review’ of NAB 1998, thanks to the Wayback Machine. This was the year everyone was coming to terms with ATSC – digital broadcast – and how it was to be implemented. From my review it seems my attention was on interactivity and QuickTime 3, neither of which is surprising.


Transcription Services: State of Play

A few years ago, we considered supporting transcripts in Lumberjack System. At the time our goal was to prepare quickly for an edit, but transcriptions took days and cost serious money.

Two years ago we added support for aligning time-stamped transcripts to Final Cut Pro X clips, and a year ago we introduced “magic” keywords derived by a cognitive service. Since Lumberjack doesn’t (yet, I might emphasize) support a speech-to-text service internally, what are the options, and what do they tell us about the state of play for transcription in April 2017?


The Danger of AI Modeling

One of the powerful ways Artificial Intelligence ‘learns’ is by using neural networks. Neural networks are trained with a large number of examples where the result is known; the network adjusts until it gives the same result as the human ‘teacher’.
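
As a toy illustration of that ‘adjust until it matches the teacher’ loop, here is a single artificial neuron, in Python, learning a logical AND from labelled examples. Real networks have vastly more weights, but the principle is the same.

```python
# A single artificial neuron trained on labelled examples. The "teacher"
# here labels the logical AND of two inputs; the neuron keeps nudging its
# weights until its answers match those labels. Illustrative toy only.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.1

def predict(x):
    return 1 if weights[0] * x[0] + weights[1] * x[1] + bias > 0 else 0

for epoch in range(25):
    for x, label in examples:
        error = label - predict(x)           # how far from the teacher's answer
        weights[0] += learning_rate * error * x[0]
        weights[1] += learning_rate * error * x[1]
        bias += learning_rate * error

print([predict(x) for x, _ in examples])     # [0, 0, 0, 1] -- matches the labels
```

The neuron has no judgement of its own; it simply converges on whatever the labels tell it.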

However, there’s a trap. If the source material contains biases – such as modeling police ‘stop and frisk’ data – then whatever biases are in the learning material will be carried into the subsequent AI model. This is the subject of an article in Nature, “There is a blind spot in AI research”, and it is also the premise of Cathy O’Neil’s book Weapons of Math Destruction, which raises not only that issue but also the problem of “proxies”.

Proxies, in this context, are data sources used in AI programs that are not the actual data, but something that approximates it: for example, using a zip code as a proxy for income or ethnicity.
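
To make the idea concrete, here is a toy Python sketch with entirely invented data: a ‘model’ that is only ever asked about zip codes still ends up sorting people by income.

```python
from collections import Counter, defaultdict

# Invented toy data: the "model" is never given income directly,
# but zip code stands in for it almost perfectly.
training_data = [
    {"zip": "90210", "income_bracket": "high"},
    {"zip": "90210", "income_bracket": "high"},
    {"zip": "90210", "income_bracket": "high"},
    {"zip": "10453", "income_bracket": "low"},
    {"zip": "10453", "income_bracket": "low"},
    {"zip": "10453", "income_bracket": "low"},
]

# "Learn" the majority income bracket seen for each zip code.
by_zip = defaultdict(Counter)
for row in training_data:
    by_zip[row["zip"]][row["income_bracket"]] += 1

def guess_income(zip_code: str) -> str:
    """Predict an income bracket from the zip code alone."""
    return by_zip[zip_code].most_common(1)[0][0]

print(guess_income("90210"))  # "high" -- decided purely from the zip code
```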

Based on O’Neil’s book, I’d say the authors of the Nature article are too late. There are already institutionalized biases in very commonly used algorithms in finance, housing, policing and criminal policy.