Revinar

Add Intelligence to your Media with Rev + iconik



Webinar Transcription

Danny Lambert
We have a really exciting topic for today's webinar. It's myself, Danny Lambert, on the marketing team here at Rev, in partnership with iconik and Mike Szumlinski, who you see beneath me. And we're going to be covering how to add intelligence to your media. What we mean by that is using the power of artificial intelligence to make managing your media assets a lot easier, a lot more efficient and a lot faster.

So before we begin, let's go through a couple of housekeeping items that will make your experience, as well as ours, much better. This is a webinar, which means it's view-only for attendees. You should be able to hear and see Mike and me, but we can't hear or see you. So if at any point during the webinar you'd like to ask a question, you'll see a Q&A option at the bottom of your screen. Please use that for questions; it makes it easier for us to field them and answer them during the Q&A section at the end. If you just want to chat in comments, you can also do that in the chat section at the bottom.

We also have something really special for you: live captions are available on this webinar. If you go to the bottom of your screen, you'll see a closed caption option denoted by a CC icon. Click that and you can show or hide the subtitles and change their formatting. What's really cool is that we're going to be talking about Rev AI's artificial intelligence and ASR technology, and these captions are actually being powered by that technology as we speak. So we're dogfooding, if you will: that is the ASR technology in action, and it's really cool to experience. We'll also be sending out a recording of this with captions, along with the slides, about two days after the event. So if you have any questions or want to share this with anybody, we'll be distributing it once this event is over.

So before we begin, again, my name is Danny Lambert. I'm on the marketing team here at Rev, and it's an absolute pleasure to have Mike Szumlinski on with me. He's the Chief Commercial Officer at iconik and is going to be doing a really awesome demo and intro to how iconik works and to the partnership between Rev AI and iconik. Mike, thank you so much for joining. Do you mind giving a brief introduction in your own words for the people who are on today?

Mike Szumlinski
Yeah. Thanks Danny. My name is Mike Szumlinski, as Danny said. I am the Chief Commercial Officer for iconik. So as you guys know, I'm the guy that talks to you all about the product and convinces you that it's worth taking a look at. I've been working with the team since its inception and been with the company in some way, shape or form for about seven years now. And I've been working within media and entertainment around data management, archival strategies, that sort of thing for around 20 years now. God. Man. So yeah, that's a little of my background. I'll hand it back over to you Danny to dive in.

Danny Lambert
Yeah, I absolutely love the pre- and post-COVID headshots: the beard before and the beard after, onsite and now at home. It's like the perfect reality.

Mike Szumlinski
Outside and inside, right?

Danny Lambert
Awesome. So what we'll be covering today, we'll talk a little bit about managing media assets with artificial intelligence and go into improving your production workflows with AI and rich metadata. And then the real meat and potatoes, the exciting part is Mike is going to actually go into a live demo of how you can use artificial intelligence using iconik to make your current media workflow faster and easier. And then as I mentioned, we'll open it up to a Q&A section at the end, so get your questions in while we're presenting so we have a chance to get to all of those.

So, talking about managing your media assets with AI: I think one of the most interesting things is that artificial intelligence, AI as a category, has probably been one of the biggest buzzwords over the past couple of years. Much of the time people don't know how it's impacting their world, whether it's actually a real thing, or whether it's just a buzzword brands use to sell you all sorts of shiny objects. Our goal in this presentation is to pull back the curtain on that and show you exactly how you can, and should, be using artificial intelligence to make your workflow and your life, both personal and business, easier.

So if you've ever worked with Rev or heard of Rev, you're probably most familiar with our human services: human transcription, captioning, and subtitles. You give an audio or video file to Rev, and human transcriptionists on the other side process it, usually in less than 24 hours, and return a caption, subtitle, or transcript file. The difference with ASR, automated speech recognition, which is what Rev AI provides, is that the audio and video are processed automatically. It's a speech recognition engine that's been trained on over 50,000 hours of human-transcribed audio and processes millions of minutes of audio and video monthly, so it keeps refining itself into a more accurate speech recognition engine. You feed in audio and get back transcripts and captions. Ultimately, the output of that is a highly accurate, easy-to-integrate API that you can plug into any software you want to use to output captions and transcripts, just as Mike is going to show iconik has done.
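
To make "plug into any software" concrete, here is a minimal Python sketch of handing a hosted media file to a speech-to-text API. It makes three assumptions based on our reading of the public API: the endpoint is Rev AI's asynchronous job endpoint (`https://api.rev.ai/speechtotext/v1/jobs`), authentication uses a bearer token, and the request body carries a `source_config.url` field. Check these against the current Rev AI reference before relying on them. The helper name `build_transcription_request` is hypothetical, and the function only builds the request without sending it.

```python
import json
import urllib.request

# Assumed endpoint for Rev AI's asynchronous speech-to-text jobs; confirm it
# against the current Rev AI API reference before relying on it.
REV_AI_JOBS_URL = "https://api.rev.ai/speechtotext/v1/jobs"


def build_transcription_request(access_token: str, media_url: str) -> urllib.request.Request:
    """Build (but do not send) a request asking Rev AI to transcribe a hosted file.

    The "source_config" body field is an assumption about the API's schema.
    """
    body = json.dumps({"source_config": {"url": media_url}}).encode("utf-8")
    return urllib.request.Request(
        REV_AI_JOBS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",  # token tied to your account
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would submit the job. Transcription then completes asynchronously, so a client comes back for the result later. That is the same "submit now, results arrive shortly" pattern Mike demonstrates inside iconik.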

If you're new to ASR, or even if you're experienced with it, one of the biggest performance indicators of how good a speech recognition technology is is the word error rate, or WER. This is essentially the number of word-level mistakes the speech recognition engine makes (words it substitutes, drops, or inserts) relative to the number of words actually spoken. It's a little more complicated than that, so I'll share a blog article in the chat that goes into more detail on how word error rate is calculated. But in a study comparing different vendors of ASR technology (us, Google, Speechmatics, and Amazon), Rev actually had the lowest word error rate, by a pretty decent margin. You can see we're at about 15.7%, compared to Google's 20% and Amazon's 27%. This is what we mean by constantly processing millions of minutes of audio: the speech recognition engine learns from it and gets better and better, and ultimately more accurate.
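
The conventional definition of WER is the word-level edit distance divided by the number of words N in the reference (human) transcript. The edit distance counts substitutions S, deletions D, and insertions I, so WER = (S + D + I) / N. The Python sketch below illustrates that standard definition. It is not Rev's own scoring code, the function name `word_error_rate` is hypothetical, and real benchmarks usually normalize punctuation and casing more carefully before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using word-level Levenshtein distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "add intelligence to your media with rev and iconik"
    hyp = "add intelligence to your media with rev an iconic"
    print(f"WER: {word_error_rate(ref, hyp):.1%}")  # WER: 22.2% (2 substitutions / 9 words)
```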

When people are shopping for ASR services like ours, they usually weigh a combination of word error rate, from an accuracy perspective, and how easy it is to integrate with Rev compared to some of the other options on the market. I'm sure Mike is going to get to both of these things when he talks about iconik's process of integrating Rev AI into the iconik product. So without further ado, I'll kick it over to Mike, who can tell you a little more about iconik and what they do, and then show us how it works.

Mike Szumlinski
Great. Thanks, Danny. So, as Danny said, we're a media management, or really a collaboration, tool in the cloud. It's a software-as-a-service platform, and one of the things we really focus on is being able to organize, find, and recover media content over time. That's done in a variety of different ways, and metadata tagging is a very large part of it. When we first launched iconik, it had simple AI object-recognition tagging available. Now we've got the integration with Rev, which also lets us use transcriptions as searchable metadata across all of your different items.

So what this really allows you to do is search not only for what is in the shot, but also for what is being said in the shot, or in the audio feed, because we have quite a few customers using it for both audio and video. When we dive further into the demo, you'll get to see that. What this really lets you do is streamline how you organize your content. Because iconik uses what we call a hybrid cloud model, the high-resolution data can live in the cloud or on premises. A lot of other solutions require you to upload 100% of your content to the cloud. We instead let you bring your own storage, on-prem or from remote cloud providers, and generate low-resolution proxies from that content. We can then often use those proxies to generate the AI tagging, which means that if you have 600 terabytes worth of data, you don't have to move it all to the cloud to make it searchable with cloud services. And if you have a mix of different vendors or providers for your storage, that all works as well. When you tie this all together, we're able to build an organizational structure that lives as a superset of your storage. In the past, your storage was your only organizational structure, and you were always trying to tie storage systems together under one big namespace. Now disparate storage can be brought together and searched by any number of metrics, and that data doesn't all have to live in the same place. So we can kick over to the next slide here.

So again, our focus within the scope of AI is to provide visibility into the data you have inside your system. That's data important enough to keep around, but maybe not important enough to have a human being go through and tag it. We can often recover a lot of value from that data simply by AI object tagging it or transcribing it. That's going to let your production process grow quickly. Say you have all this archive content sitting in files and folders, with 10,000 hours of spoken word in there. How do you know who said what, and when, without this sort of data associated with it?

So you can go to the next slide now. This is where we integrate our AI-powered... or, well, Rev's AI-powered transcription behind our entire model. We can very simply have any piece of content that has audio tracks sent automatically to Rev. We convert that voice to text and timestamp every word, so we know exactly where it is, and the transcript can chase along while you're playing back the content. You can also fix that data if there are errors. Rev is pretty darn good; I believe it was around 86% accuracy, Danny? But there's still about 14% of things you might want to fix for the future. You can fix that right within iconik and keep it stored there forever, and you can also use iconik as an export tool now. Any additions or modifications you've made to the transcript inside iconik can be exported as raw text transcripts, or even as caption files. We're going to show all of that in a few seconds.
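
Mike's "chase along" behavior depends on every word carrying a start and end time. The Python sketch below is a hedged illustration, not iconik's actual code. It assumes a hypothetical `Word` record with times in seconds and uses binary search to find the word under the playhead. That is one straightforward way a player could highlight the current word in a long transcript.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcribed word with its timing, in seconds from the clip start (hypothetical shape)."""
    text: str
    start: float
    end: float


def word_at(words: list[Word], playhead: float) -> Word | None:
    """Return the word being spoken at `playhead`, or None during a pause.

    Assumes `words` is sorted by start time and that words do not overlap.
    """
    starts = [w.start for w in words]  # a real player would cache this list
    i = bisect_right(starts, playhead) - 1  # last word starting at or before playhead
    if i >= 0 and playhead < words[i].end:
        return words[i]
    return None


if __name__ == "__main__":
    transcript = [Word("we're", 0.0, 0.3), Word("talking", 0.3, 0.8), Word("laptops", 1.2, 1.9)]
    for t in (0.5, 1.0, 1.5):
        hit = word_at(transcript, t)
        print(t, hit.text if hit else "(pause)")
```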

So, just to reiterate what we're talking about here: within iconik, we can share any content, wherever it lives, with anyone anywhere in the world, because the proxies and the metadata live in the cloud. That means even if you have a large on-prem SAN, you can open up access to that data to outside users. I'm sure many of you have been in this scenario: you've got a large shared storage system, but other stakeholders in the business can't access that SAN. Or, even worse, they can access the SAN or NAS but don't have the software needed to play back the kind of content that's on it. iconik opens it up that way.

The other option that you've got is that iconik really does allow you to find an asset from multiple paths. Not only do we have the search capabilities, we also have these things called collections, which are virtual folder structures, and we also have saved searches, and we also have the ability to do relationship management within the system, which means just relating individual assets to each other via arbitrary relationships, and that means a lot of different rabbit holes you can dive down to find the same piece of content in multiple different organizational structures.

And this all flows right into Adobe. You can take all of this information, find what you're looking for, and even make markers and notes right from the transcripts that flow into Premiere as well. When you tie all this together, it makes it easier to drive forward the things you're looking to do.

So with that said, I think we can dive into a little bit of a demo here.

Danny Lambert
I'll stop sharing my screen and let you pick it up.

Mike Szumlinski
Yep. So Aaron will share mine here, and we'll do desktop one. We are now inside iconik, and the very first thing I'm going to do is show off some of the speed Rev is capable of. I'm going to search for a clip from a remote edit webinar that some of you may have actually attended a while ago; Tim and I from our team did it. You can see it's an hour and 20 minutes long. You can also see my beard's a little shorter, so it was a while ago, and I'm not in my kitchen anymore. So hey, man, times change. What I'm going to do within iconik is simply click the "transcribe" button, and it asks, "Hey, which language do you want?" This is in English, so I'll go ahead and hit "transcribe," and iconik submits that transcription job to Rev in the background.


The reason I wanted to do this first is to give you an idea of how quickly this thing actually goes. It's an hour and 20 minutes, so I'll let you set your own expectations of how fast it should come back. In the meantime, I'm going to hop really quickly into some content I've already run a transcription against, which lets you see the different views within iconik. This is another longer-form clip, about an hour and 11 minutes, that I did a little while ago. I can click on any word within the clip, and that will hop me immediately to that location in the clip.


The other thing I can do is if I just hit play, it's actually going to chase along with exactly what's being said in realtime. So as people speak, we're going to get the feedback with this little white outline in realtime, and I've muted the audio right now, but I can unmute it real quickly.

So hopefully some of that came through on my microphone there, but you can see that we're very quickly getting that feedback in realtime.

The other thing I have the ability to do is you can see that there's diarization here. So there's different speakers. Those different speakers have different names, and Rev gave us back the fact that there were multiple speakers, but in this case, I actually know that these people have names, and who they are, and that sort of thing. So I could very simply go up here and say "manage speakers," and I was able to go through the various different speakers and give them names so we knew exactly who was talking at any given time. A pretty powerful capability as well, so you're not just stuck with Speaker 1, Speaker 2, when you're searching in the future.
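
Conceptually, "manage speakers" is a relabeling step over the diarized output. Diarization produces generic labels, and a person maps each label to a name once for the whole clip. A minimal Python sketch under that assumption follows. The `Segment` shape and the label strings are hypothetical, not Rev's or iconik's actual data format.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    """A diarized stretch of speech: who spoke, when (seconds), and what was said."""
    speaker: str  # generic label from diarization, e.g. "Speaker 1" (hypothetical)
    start: float
    end: float
    text: str


def rename_speakers(segments: list[Segment], names: dict[str, str]) -> list[Segment]:
    """Swap generic speaker labels for real names, leaving unmapped labels as they are."""
    return [replace(seg, speaker=names.get(seg.speaker, seg.speaker)) for seg in segments]


if __name__ == "__main__":
    diarized = [
        Segment("Speaker 1", 0.0, 4.2, "Thanks, Danny."),
        Segment("Speaker 2", 4.2, 6.0, "Over to you, Mike."),
    ]
    for seg in rename_speakers(diarized, {"Speaker 1": "Mike Szumlinski"}):
        print(f"{seg.speaker}: {seg.text}")
```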

You can search inside the transcription directly from the page, too. If you search for certain phrases, we immediately bring them back and highlight them if they're said multiple times. You can see that "laptop" was said a few times in this presentation, and we can hop right to any one of those occurrences, whenever it was said. We can also switch from this digest view into what we call common view, which lets us modify entire statements.
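
The in-clip search Mike shows, finding every place "laptop" was said and jumping to it, can be approximated by scanning the timestamped words for a matching run of tokens. This Python sketch is illustrative only. The `Word` record is hypothetical, and the case- and punctuation-insensitive matching is an assumption. A production search would use an index rather than a linear scan.

```python
from __future__ import annotations

import string
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds from clip start
    end: float


def _normalize(token: str) -> str:
    """Lowercase and strip surrounding punctuation so "Laptop," matches "laptop"."""
    return token.lower().strip(string.punctuation)


def find_phrase(words: list[Word], phrase: str) -> list[tuple[float, float]]:
    """Return (start, end) times for every occurrence of `phrase` in the transcript."""
    tokens = [_normalize(t) for t in phrase.split()]
    if not tokens:
        return []
    n = len(tokens)
    hits = []
    for i in range(len(words) - n + 1):
        window = words[i : i + n]
        if [_normalize(w.text) for w in window] == tokens:
            hits.append((window[0].start, window[-1].end))  # jump target and end
    return hits
```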

So if, for some reason, the transcript came back incorrect... I'm going to search really quickly for "iconic" here, because I know iconik sometimes gets spelled incorrectly. I can go in and fix it at the word level, with word-for-word timing, so we can see exactly where that word is. But sometimes it's not easy to do that, and you actually want to work at the phrase level. So if we kick into timing phrase view, we see "I'm having beer for the nuns." Maybe that wasn't what we actually said, and maybe it should be "nouns." I don't know what was actually said here, but we can change the entire phrase and it maintains its timing within the clip. We can still click any one of these phrases to play just its in-to-out point: click the little play button, and it plays just that in-to-out point as part of this particular clip.
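
The key property of phrase-level editing is that the text (and, as Mike shows next, the speaker) can change while the phrase's in and out points stay fixed. Here is a hedged Python sketch of that rule. It uses a hypothetical `Phrase` record rather than iconik's real schema.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Phrase:
    """A phrase-level transcript segment with fixed in and out points, in seconds."""
    start: float
    end: float
    speaker: str
    text: str


def edit_phrase(
    phrases: list[Phrase], index: int, *, text: str | None = None, speaker: str | None = None
) -> list[Phrase]:
    """Return a copy of `phrases` with one phrase's text and/or speaker corrected.

    The in and out points are deliberately left untouched, so the corrected
    phrase still plays back over exactly the same span of the clip.
    """
    changes = {}
    if text is not None:
        changes["text"] = text
    if speaker is not None:
        changes["speaker"] = speaker
    updated = list(phrases)
    updated[index] = replace(updated[index], **changes)
    return updated
```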

You can see this one runs from 1 minute, 32 seconds, 7 frames to 1 minute, 41 seconds, 7 frames, so the in-to-out point that played was exactly nine seconds long. There are lots of ways to change things here, and this is also where we can change who the speaker was. Say Johnny didn't actually say this. Maybe he sounds a little like me, and I was the one who said it, or Jason did. I can just switch the phrase over to that other person, and the transcript for that item is updated accordingly.
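
Mike's arithmetic, nine seconds between the in and out points, is just timecode subtraction. The frame rate isn't stated in the demo, so the Python sketch below assumes non-drop-frame timecode at a hypothetical 24 fps. Both points share the same frame number (07), so the result is nine seconds at any frame rate. Drop-frame timecode (for example, 29.97 fps) needs extra handling that this sketch does not attempt.

```python
def timecode_to_frames(timecode: str, fps: int) -> int:
    """Convert non-drop-frame "HH:MM:SS:FF" timecode to an absolute frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not 0 <= frames < fps:
        raise ValueError(f"frame field {frames} is out of range for {fps} fps")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def duration_seconds(tc_in: str, tc_out: str, fps: int) -> float:
    """Length of the span from tc_in to tc_out, in seconds."""
    return (timecode_to_frames(tc_out, fps) - timecode_to_frames(tc_in, fps)) / fps


if __name__ == "__main__":
    # The phrase from the demo: 1 min 32 s 7 fr to 1 min 41 s 7 fr, at an assumed 24 fps.
    print(duration_seconds("00:01:32:07", "00:01:41:07", fps=24))  # 9.0
```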

Now, the other thing we can do within this is we can also highlight entire phrases. So maybe I wanted to highlight this phrase right here, because we're doing a doc series and I want to turn that into a marker for Premiere. So I can say use this in my cut and make a quick comment. And that makes a time-based comment and those comments can flow as markers into Premiere. So if I pulled down Premiere onto this screen really quickly, so you guys can see it and log into iconik, back to Premiere, bingo, and we'll go open that exact same clip.

And what I can do in this case is I can open either the asset or the asset proxy. If I open the asset, that's actually going to point to the high res. If the high res is locally available to me on my local file system, it'll just link to it immediately. But in this case, it's in the cloud and it's an hour and a half long and I don't want to download an entire hour and a half right now while we're on this webinar. So I'm going to click open the asset proxy. And what this is going to do is now just grab the proxy of that item and we can track the status of how that's going. Hopefully my Internet's working pretty well right now so we can go take a look at our transfers, and see that we've already got half that hour and 20 minute long call or clip downloaded.

By the time the clip is done, it'll just drop it right into the Premiere timeline for me. And there we go, boom, importing the files, and there's our clip. You'll also notice there's already a marker in the clip, and the marker, "use this in my cut," already has my in and out points associated with it. So I used the transcription that came from Rev inside my iconik panel to make, more or less, a paper cut. Then I quickly turned those selections into comments, which gave me all my in and out markers within Premiere as well. Above and beyond this, a lot of this comes down to search. We went to this particular clip, saw it, and saw that there was a transcript. But sometimes you want to search at a higher level.

You can see now that there's transcription text over on the side here, and I'll use that same search term. I'll look in all my transcripts for any time anybody said the word "laptop," and we can see that these two clips came back. You'll also notice the hour-and-20-minute clip that didn't have a transcription about four minutes ago. It already has a transcript, because it returned results for the word "laptop." If I hop into that clip and look at it, we can see the transcript returned for that entire hour-and-20-minute discussion, where I was clearly very wordy because I was speaking most of the time.
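
Searching across every asset at once is how the freshly transcribed hour-and-20-minute clip showed up in the "laptop" results. That kind of search is typically served by an inverted index from words to the places they occur. The following Python sketch shows the idea in miniature. The asset IDs and input shape are hypothetical, and nothing here describes how iconik actually stores or indexes transcripts.

```python
from __future__ import annotations

import string
from collections import defaultdict


def build_index(transcripts: dict[str, list[tuple[str, float]]]) -> dict[str, list[tuple[str, float]]]:
    """Map each normalized word to every (asset_id, start_seconds) where it occurs.

    `transcripts` maps a hypothetical asset ID to its timestamped words.
    """
    index: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for asset_id, words in transcripts.items():
        for text, start in words:
            key = text.lower().strip(string.punctuation)
            if key:  # skip tokens that were pure punctuation
                index[key].append((asset_id, start))
    return index


def assets_mentioning(index: dict[str, list[tuple[str, float]]], term: str) -> list[str]:
    """Return the sorted IDs of every asset whose transcript contains `term`."""
    hits = index.get(term.lower().strip(string.punctuation), [])
    return sorted({asset_id for asset_id, _ in hits})
```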

But in general, this means we've got a very strong tool for using AI and transcription to drive content search and editorial review. That's especially true for long-form archival or documentary filmmaking, where you need to quickly find when certain phrases are said just by running through transcripts. I think it's important to note here, too, that the cost of using the Rev service through iconik is actually a little under $2 an hour. So running hours and hours of footage through the system isn't cost-prohibitive the way it is with some other services. With that, I'll hand it back over to Danny, because I think we've covered most of the demo around the transcription service, and we'll take it from there.

Danny Lambert
Mike, while you still have the demo up, do you mind just showing what the process is like for people who either already use iconik or those who would like to, to actually set up the integration between the two?

Mike Szumlinski
There's no integration to set up between the two. When you have an iconik account, you simply go to any asset you want and click the transcription button, and you get the transcription back. If you have an enterprise agreement with Rev with discounted pricing, we also support bringing your own AI. You can say, "Hey, I've got a new account, and it's with Rev," and put in the access token associated with your account. Then it bills directly to Rev instead of us. That said, and don't tell the guy on the phone, we're actually a little cheaper than using Rev directly if you don't have an enterprise account. So we're a pretty strong driver of being able to do this stuff cost-effectively.

Danny Lambert
Yeah. Mike, I can't tell you how impressive that is. It's not only how fast the transcript comes back, but also being able to search all of your assets and get those cuts into Adobe Premiere so seamlessly. It's truly incredible compared to waiting on someone on your team to transcribe it, or on a human transcription service, and being able to integrate all of it so quickly is truly impressive.

Mike Szumlinski
There's actually one other thing I forgot to show while we were going through this. I mentioned it when we were speaking a little earlier: you'll notice we have "download as text" and "download as WebVTT" options. The first lets you download the diarized transcript, with the modifications you've made, as a plain text file. So here we go. Now I can open this up in Word, throw it into a Google Doc, do whatever I want with it, and have plain text to read.
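
A diarized plain-text export is mostly a formatting pass. Consecutive segments are grouped by speaker, with a name above each block, much like the layout of this transcript. The Python sketch below assumes a hypothetical list of `(speaker, text)` pairs as input. iconik's actual export format may differ.

```python
from __future__ import annotations


def to_plain_text(segments: list[tuple[str, str]]) -> str:
    """Render (speaker, text) segments as speaker-headed paragraphs.

    Consecutive segments from the same speaker are merged into one paragraph.
    """
    blocks: list[tuple[str, list[str]]] = []
    for speaker, text in segments:
        if blocks and blocks[-1][0] == speaker:
            blocks[-1][1].append(text)  # same speaker keeps talking
        else:
            blocks.append((speaker, [text]))  # speaker change starts a new block
    return "\n\n".join(f"{speaker}\n{' '.join(texts)}" for speaker, texts in blocks)
```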

We also support exporting it as a WebVTT file, which is a true timed caption file. If I open this up, you can see it's got all the timing markers for those statements in the standardized WebVTT format. Pretty soon we'll also support SRT, SCC, and a few other formats for conversion, and, believe it or not, import as well. So maybe you've been using Rev for a while and already have a lot of captioning information associated with your assets. Then you don't necessarily have to re-Rev them, so to speak. You can import the caption files themselves as transcripts. They won't have diarization, obviously, but you'll at least get the timed transcript into the system, and for everything moving forward, you can use the AI to generate it. So yeah.
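
For reference, a WebVTT file is a plain-text `WEBVTT` header line followed by cues separated by blank lines. Each cue has a `start --> end` timing line plus cue text. The Python sketch below writes that structure from timed, diarized segments. The input shape is hypothetical. Using `<v Name>` voice spans to carry speaker names is one option the format allows, not necessarily what iconik emits.

```python
from __future__ import annotations


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def _escape(text: str) -> str:
    """Escape the characters WebVTT cue text reserves for markup ("&" first)."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def to_webvtt(cues: list[tuple[float, float, str, str]]) -> str:
    """Build a WebVTT document from (start_s, end_s, speaker, text) cues.

    Speaker names go into WebVTT voice spans (<v Name>) and are assumed to be
    plain names without "<" or ">" characters.
    """
    lines = ["WEBVTT", ""]
    for start, end, speaker, text in cues:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(f"<v {speaker}>{_escape(text)}")
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)
```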

Danny Lambert
That is really impressive. Mike, if you wouldn't mind dropping your screen share, I can pull up just our close out here. All right.

All right, so before we hop into questions: maybe some of you want to organize all your media assets in one location, use the AI that Mike showed you, make all of your assets searchable, and streamline your workflow. If so, you can visit iconik.io and start using this today. It's immediately available. It's not some beta or anything like that; it's a process you can start using immediately for your media workflows. And if you're at all interested in the Rev AI API for any of your processes, you can just go to revai.com. I know we saw one question in here about support for Final Cut Pro. Mike, do you want to speak to that at all? I know you already typed in an answer.

Mike Szumlinski
Yeah, I can speak to it out loud too. We're planning on releasing our Final Cut Pro X integration either in Q4 this year or early Q1 of next year. We were waiting for the latest FCPX to drop, because it had a lot of fun new features we wanted to implement, and that just happened about a week and a half or two weeks ago. So now we can get to work on all the things we've wanted to do with Final Cut Pro X and iconik since the early days.

Danny Lambert
Awesome. I'm not seeing any other questions. Let me stop my share, make sure there aren't any in chat. Oh yeah. How does it work if a video file already has a transcription that we paid for and is corrected? I think Mike, you might've touched on that at the very end there, but you may want to reiterate the upload feature.

Mike Szumlinski
Yeah. We're going to be adding, very soon, the ability to import existing closed captioning files. Say you've taken a transcript, modified it, and saved it in a captioning format we support: SRT, SCC, or WebVTT, which are the most popular by far. You'll be able to upload that file, associate it with an asset, and convert it into our transcription view so you can keep using it that way.
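
The caption import Mike describes would start by parsing the caption file back into timed segments. Here is a hedged Python sketch for SRT, chosen because its structure is simple: numbered blocks, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line and then one or more text lines. The function name and the tuple output are hypothetical. iconik's importer and its handling of SCC or WebVTT are not shown here.

```python
from __future__ import annotations

import re

# "HH:MM:SS,mmm --> HH:MM:SS,mmm", the SRT timing line (SRT uses a comma before milliseconds).
_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content: str) -> list[tuple[float, float, str]]:
    """Turn SRT caption blocks into (start_s, end_s, text) segments.

    SRT carries no speaker labels, so the result has no diarization.
    """
    segments = []
    normalized = content.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):  # blank lines separate blocks
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                parts = match.groups()
                text = " ".join(t.strip() for t in lines[i + 1 :] if t.strip())
                segments.append((_to_seconds(*parts[:4]), _to_seconds(*parts[4:]), text))
                break  # one timing line per block
    return segments
```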

Danny Lambert
Thank you, Mike. Is there any future support planned for Rev's non-AI, human-processed services?

Mike Szumlinski
There is an API for that. If there's enough demand from our customers, I don't see why we wouldn't take a look at it, because we've obviously built the framework around that. We already have the concept of multi-day jobs, in the sense that we support Amazon Glacier, where retrievals can take 48 or 72 hours or something like that. So waiting for something that happens outside a computer process is already possible within iconik. We'd just need enough customer demand to drive that forward.

Danny Lambert
Thank you, [inaudible 00:02:55], for that question. It looks like that is it. Mike, we'll share this recording, as I mentioned... Oh, sorry, there's a follow-up: "I'm making a documentary and want to have all my assets together and searchable, not a CC file."

Mike Szumlinski
Unless your data has timing associated with it, we obviously won't be able to provide any of the timing information. Associating text with an asset, as what we call a segment, isn't that difficult, but we'd have to use the API to do it. You could essentially extract all that data and make it a transcription-based tag inside our system, and it will show up. But without timing information, your searchability will be limited to the top-level asset itself. If that's a half-hour-long clip, you'll still be searching through it manually, because we won't know at what second things happen. So it is doable, but it's not a point-and-click sort of thing.

Danny Lambert
And then they followed up about timing: does it read time codes? I think that's what was asked.

Mike Szumlinski
It has to be in a format that is a standardized format. So if you just simply have a text file that just has raw time code at various different places, that's not a standardized format. So it wouldn't necessarily be easy to do. It would be doable with our API and scripting and we've got integrators that can help with that as well. So it's not that it's impossible by any means, but it's not going to be a point and click or a drag and drop sort of thing.

Danny Lambert
Awesome, it looks like that's all the questions we've received. Mike, if people have questions while they're watching the recording, or think of something afterwards, what's the easiest way for them to send those questions to your team?

Mike Szumlinski
You can just hit us up at info@iconik.io for a general inquiry. You can also reach me at mike@iconik.io if you'd like, but know full well that I get a lot of email, so response times may vary. And if you're an existing customer and want to learn more about this, you can always open a support ticket with us or reach out to your CSR, and they can help you with that sort of thing.

Danny Lambert
Awesome. If you ever want to ask any Rev AI related questions, you can always email me directly. It's danny@rev.com. While we're saying that Mike, we did have another question come in and it was, what is the price?

Mike Szumlinski
Our pricing is primarily consumption-based: how many people log in during a calendar month, and how much storage is used, primarily for proxies. We don't charge you for your own storage. And then the AI itself obviously does cost money to run. That's all available on iconik.io. If you click the Plans button at the top of the page, there are some estimates. If you click the "create custom estimate" button midway down the page, there's a full-on calculator where you can play with all the different pricing and see exactly how much it might cost for your scenario. If you've got any questions after you've played with it, like "Hey, I don't quite understand this," just hit us up. We're happy to walk you through it.

Danny Lambert
Awesome. Well, if there aren't any more questions, Mike, I can't thank you enough for coming on. It's a really amazing demo. It shows really well, and I'm sure the features can save tons of people, and tons of media workflows, a lot of effort and a lot of time. As you mentioned, if you'd like to reach out to either of us, you can do so via the email addresses we provided, or reach out at revai.com or iconik.io. Any closing words, Mike?

Mike Szumlinski
No, everybody should just go buy iconik because it's rad. That's all I've got to say, you know.

Danny Lambert
Fair enough. Well, thank you again for your time, Mike, and I really appreciate it.

Mike Szumlinski
Yep, thanks everybody. Have a good day.