
Voices of Video
Explore the inner workings of video technology with Voices of Video: Inside the Tech. This podcast gathers industry experts and innovators to examine every facet of video technology, from decoding and encoding processes to the latest advancements in hardware versus software processing and codecs. Alongside these technical insights, we dive into practical techniques, emerging trends, and industry-shaping facts that define the future of video.
Ideal for engineers, developers, and tech enthusiasts, each episode offers hands-on advice and the in-depth knowledge you need to excel in today’s fast-evolving video landscape. Join us to master the tools, technologies, and trends driving the future of digital video.
VR and MultiView: How Tiled Media is Revolutionizing Sports Viewing
Step into the future of video consumption with Rob Koenen, founder and chief business officer of Tiled Media, as he unveils the revolutionary potential of VR and MultiView technologies. These innovations are fundamentally transforming our relationship with video content by putting viewers in control.
The core of this transformation lies in Tiled Media's sophisticated "tile streaming" technology, which allows viewers to seamlessly switch between different camera angles or create personalized viewing arrangements from multiple simultaneous feeds. Imagine watching your favorite football match and being able to follow a specific player even when the main broadcast camera is focused elsewhere, or viewing multiple tennis matches at once during a tournament - all while maintaining perfect synchronization and optimal video quality.
Rob demonstrates this groundbreaking technology live, showing how viewers can drag, resize, and arrange video elements on their screen to create truly personalized viewing experiences. What makes this approach unique is its remarkable efficiency: unlike competing solutions that require multiple encoders or decoders, Tiled Media's approach uses a single hardware decoder while intelligently managing bandwidth by only retrieving video at the resolution it's being displayed.
The applications extend far beyond sports. From music education that allows students to focus on specific instruments in an orchestra to electronic program guides that show live previews of what's actually playing on each channel, this technology opens endless possibilities for engagement. And while VR adoption has been slower than initially expected, Rob shares his vision for how these immersive technologies will eventually transform how we connect with events and each other across distances.
Ready to experience the future of video? Discover how these innovative viewing technologies could change your relationship with the content you love and give you unprecedented control over what you see and how you see it.
Stay tuned for more in-depth insights on video technology, trends, and practical applications. Subscribe to Voices of Video: Inside the Tech for exclusive, hands-on knowledge from the experts. For more resources, visit Voices of Video.
Jan Ozer:Welcome to NETINT's Voices of Video. Today's episode is about VR and MultiView, a technology that lets your viewers choose among different available feeds from the same or different events and create their own immersive viewing experience. We're talking today with Rob Koenen, founder and chief business officer of Tiled Media, which develops and supplies VR and multi-view technologies to publishers like Sky Sports, BT Sport and LG U+. If Rob's name sounds familiar, it should. Rob has worked with codecs going back to the start of H.264, when he founded and served as the president of the MPEG Industry Forum. He also founded and served as the first president of the VR Industry Forum and has initiated and guided work on MPEG-I, the standard for immersive media in MPEG. Rob, you're a busy guy. Thanks for joining us.
Rob Koenen:Pleasure. Thank you, Jan, for having me.
Jan Ozer:So why don't you walk us through your background real quickly? Take us through your education, take us through your job history, and end up at Tiled Media.
Rob Koenen:Very briefly, I don't want to make it too long. I started in electrical engineering and studied information theory, actually a little bit of expert systems, things that are coming back now in the form of AI. My first real job was at KPN, the Dutch incumbent telco, doing a lot of video coding research, basically image communication research. I then moved to Intertrust in Silicon Valley in the US, where I worked on DRM systems. Then it was back to the Netherlands, where I worked for TNO, the largest national research institute, again doing media research. Then, in 2016, we showed some of the technology we had developed at TNO at IBC. It was called tiled streaming, and we applied it to VR, which was a hype at that point. Long story short, we decided to set up our own company, spin it out of TNO, and founded Tiled Media to provide tiled streaming services for virtual reality.
Jan Ozer:What's the high-level vision for the company?
Rob Koenen:The vision is that we want to have the most advanced streaming in the world, and we think we have the most advanced audiovisual player in the world. It's all founded on tile streaming, which means that we see video as composed of a number of tiles. I hope we get to talk about this a little bit, but all of these tiles together can make up a single video, which could be a VR video of insane resolution, or it could be the multi-view that we'll get to talk about, where we compose what the customer sees out of all sorts of tiny little elements. And, since this is a technical podcast: we push them all through a single decoder.
Jan Ozer:So tell us about VR, tell us about MultiView. VR was a huge promise seven years ago.
Rob Koenen:It's still a promise. It hasn't grown as fast as everybody would have hoped, including us, but it's healthy and alive, and we've done amazing things in VR. We've done the Premier League with two customers; we still do with Sky. Sky has a couple of Premier League football matches every week, and it's an amazing user experience. But VR is a lot of things; we focus on VR video.
Rob Koenen:VR video, I would say, is still nascent. I think it's the bridge from making video something for gamers to making VR something for a larger audience, but it's slowly growing. We're doing interesting projects.
Jan Ozer:How did that evolve into MultiView?
Rob Koenen:Well, by the same token, we saw that VR wasn't growing as fast as we had always hoped, and we realized the basis of our tiled streaming technology actually wasn't VR until we decided to apply it to VR. So we went back to the basics and said there's more we can do with this. For one thing, we saw video consumption change. We saw the need for increasing interactivity and engagement, and we decided to build a multi-view product. That was about two years ago, I think, and it's now about ready. It has taken a bit of time, but it's interesting, and it was kind of prescient, because if you see what's going on now in the market, Apple is starting to do multi-view, YouTube is starting to do multi-view, so it seems the timing is right.
Jan Ozer:Why don't you show us what multi-view is, what it looks like?
Rob Koenen:Yes. I will need to do something that's slightly complicated, which is move the camera to my iPad; screen sharing doesn't work for this, so I'm doing it on a real iPad. So this is streaming video. You see one stream here, and what we have here is what we call a 25-item multi-view clip, or experience. The first thing you can see is that we've got all these moving thumbnails. They're not stills, they're actually moving. I can switch to any of these immediately, and this is actually streaming, right? It's not a demo; it streams from our CDN account.
Rob Koenen:But I can do other interesting things. I can drag things in, and by clicking on a thumbnail, I'll change the main video. What actually happens here is interesting too, because it actually changes what is being retrieved from the network. But I can go a little bit overboard and add things. You could imagine that these are cameras at a race or at a golf event. This is kind of a busy background, so let me give it a bit of a simpler background. But there's a lot of flexibility here. You can do a grid; these could be the channels you normally watch. This is what I wanted to show you. I think this gives an impression of what it does, right?
Jan Ozer:The early adopters, are they doing this for multiple views of a single event, or are they doing it to enable different views of different events? What's the application?
Rob Koenen:So we're just rolling this out, but the most interest right now is from people that have multiple cameras. There are these events, it could be racing, it could be, like I mentioned, golf, it could be athletics, where there's stuff happening. The director is forced to make a choice, but there's stuff happening elsewhere.
Rob Koenen:You may have a favorite athlete. You want to follow them, you want to be able to switch to them, but you also don't want to lose the story that the director tells you, because directors have amazing tools at their fingertips to tell stories. Sometimes you just have a favorite athlete that the director isn't following, and you want to keep track of them, and this is where the technology shines. The other example, where we see this deployed by other parties, is the NFL. It's called Sunday Ticket, I think, where with YouTube you can get like four matches in a certain fixed configuration in a grid. There's also the case where you want to watch multiple matches at the same time; you could imagine a tennis tournament. But the most interest that we are seeing right now is for these events that inherently have a number of cameras, a number of views, where the director just can't show you everything that you might want to see.
Jan Ozer:What are we looking at? Break the technology down into the encoder, streaming and player sides of all that.
Rob Koenen:I will show the ways that this could be done in principle, and then the way that we do it. There are a couple of approaches to this, and the first is many encodes, which means basically you encode all the possible permutations, and YouTube does something like this. But take an event like Formula One, right? Formula One already broadcasts cockpit cameras. They have like 24 feeds. You can see the number of permutations and combinations is going to be completely insane.
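As a back-of-the-envelope illustration of that combinatorial explosion (the slot count is an assumption for illustration, not Formula One's or Tiled Media's actual layout):

```python
from math import perm

# Pre-encoding every ordered arrangement of feeds across on-screen slots.
feeds = 24   # e.g., Formula One's cockpit feeds
slots = 5    # assumed: one main window plus four picture-in-picture positions

layouts = perm(feeds, slots)  # 24 * 23 * 22 * 21 * 20
print(f"{layouts:,} distinct layouts to pre-encode")  # 5,100,480
```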
Jan Ozer:The viewer chooses a feed and they're stuck with that, so they can't customize.
Rob Koenen:Right now they can choose in that app. They can choose either a cockpit camera or they can choose the director's cut; they can't choose both at the same time. If you wanted to do that with this approach, the many-encodes approach, it would be completely impossible. Another approach is what we call the many-decoders approach: you encode all of the videos individually at a number of different ABR levels and resolutions, and then you use the fact that most devices have more than one decoder available, not all, but most. And that's precisely the issue with this approach: it's hard to do cross-device. You have to adapt to all of the individual devices. Sometimes you have to use software decoding, which quickly eats the battery. What's also an issue is that syncing is a complete nightmare; it's hard to keep all of these decoders in sync. And there's ABR fighting between all of these different decoders, so you see them switch up and down in quality level. So that's also not a very good approach.
Jan Ozer:Who's using that approach? You mentioned YouTube is using the first approach, and I know Apple's in this space. What are they doing?
Rob Koenen:Apple is doing this, indeed, that's our understanding, and that's why it's only working on Apple devices. And then yet another approach is cloud edge processing, which means basically you do the interaction on a server, and it requires a separate server per customer. So if you've got an event that attracts millions of users, this is insanely expensive. It's very hard to scale and, depending on how you do it, the interaction is slow. I know there are ways of speeding it up, but even if you can make it fast enough, it's still very, very difficult to scale.
Rob Koenen:So that obviously sets it up for our approach, which we call Mosaic MultiView. Basically this uses tiled encoding, or actually tiled decoding. What happens here is that we take all the videos at the resolution that they are on the screen, and then we create a single frame from them, which goes through the single hardware decoder on the device. There's some logic in the player: the player knows the resolution each video has on the screen, so it only retrieves it at that resolution. There's no bandwidth being wasted. And the nice thing about this is that the interaction is completely local. Everything you saw me do on that screen just now is local composition with videos that are being decoded. But as soon as I switch something, it actually switches the retrieval. It's all HTTP streaming, so it actually switches what's being retrieved from the CDN. In a nutshell, that's it.
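A minimal sketch of the per-element selection logic described here; the ABR ladder and function names are hypothetical, and this is illustrative rather than Tiled Media's implementation:

```python
# Each on-screen element is fetched at (roughly) its displayed resolution,
# and all selected tiles then feed one shared hardware decoder.

RUNG_HEIGHTS = [2160, 1080, 720, 480, 240]  # assumed ABR ladder

def pick_rung(displayed_height: int) -> int:
    """Smallest ladder rung that still covers the on-screen size."""
    for h in sorted(RUNG_HEIGHTS):
        if h >= displayed_height:
            return h
    return max(RUNG_HEIGHTS)

def plan_requests(layout: dict[str, int]) -> dict[str, int]:
    """Map feed id -> rung to retrieve, given feed id -> displayed height."""
    return {feed: pick_rung(height) for feed, height in layout.items()}

# One main window plus thumbnails: only the main feed is fetched large.
layout = {"main": 1080, "cam2": 180, "cam3": 180, "cam4": 180}
print(plan_requests(layout))  # {'main': 1080, 'cam2': 240, 'cam3': 240, 'cam4': 240}
```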
Jan Ozer:Let's keep this up, just for the sake of the discussion. So what does this look like on the encoding side? I'm bringing in, in this case, I guess, five separate streams, and I'm creating five separate encoding ladders. Is that correct?
Rob Koenen:Yeah. This one actually goes up to 10, but it's as many cameras as you get in; for each of them, you'll have an encoding ladder being encoded and written to a CDN origin server. That's what the encoding side entails. There are a few modifications in the encoding that allow us to do the merging of the bitstream client-side. What we're actually doing is retrieving snippets of bitstreams and then rewriting these snippets into a single bitstream. It's almost like recombining DNA. And you're feeding that single bitstream through the hardware decoder in the device, which is why it's not just bandwidth-efficient, it's also power-efficient.
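A toy illustration of that snippet-rewriting idea; real HEVC slice-segment rewriting is considerably more involved, and every structure below is a stand-in:

```python
from dataclasses import dataclass

@dataclass
class TileSlice:          # stand-in for an HEVC slice-segment NAL unit
    feed: str
    payload: bytes        # entropy-coded slice data, left untouched
    tile_address: int     # slice segment address, rewritten below

def rewrite_into_composite(snippets: list[TileSlice]) -> list[TileSlice]:
    """Assign each fetched snippet a slot in the composite frame's tile grid."""
    return [
        TileSlice(s.feed, s.payload, tile_address=slot)
        for slot, s in enumerate(snippets)
    ]

fetched = [TileSlice("main", b"...", 0), TileSlice("cam2", b"...", 0)]
composite = rewrite_into_composite(fetched)  # one bitstream, one decode call
```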
Jan Ozer:It feels like you're doing a lot of hard work in the player. I mean, what are the CPU requirements for making this happen?
Rob Koenen:It's not that bad. There is hard work, but it's all relative in the sense of what's going on in a player. There's a module that defines the strategy, which determines what videos to retrieve, at what bitrates and at what resolutions, and then there's a little bit of bitstream rewriting going on, but that's not all that CPU-intensive. And the decoding itself again uses the hardware decoder in the device, so it's not a software process. It's not that bad. We've done this with a customer in India, where there are some pretty low-end devices, and we've done it for VR, without much of a problem.
Jan Ozer:What's the installed base of available decoders for this? If I'm a publisher and I want to implement this, take me through it. I guess I care about smart TVs, I care about mobile, I care about computer playback. Where does this work, and where doesn't it?
Rob Koenen:This relies on the tiling capabilities of HEVC, or H.265. But H.265 is pretty universally deployed, certainly in mobile devices, and also in TVs and set-top boxes, et cetera. And, interestingly for us, it's starting to be pretty universally deployed in web browsers as well. So far we've never been able to address the web, but we're now planning to have a web player SDK available at the end of the year, which will be a huge step for us. It's very interesting. HEVC, H.265, is, from my perspective, universally deployed, which is amazing, because I wouldn't have been able to build my business without it.
Jan Ozer:So I need HEVC decode. Is that hardware, software, either?
Rob Koenen:It could be either, but in this case we always end up using a hardware decoder, because it's so universally available.
Jan Ozer:So what's this look like on a smart TV? I mean, is it HTML5 so it just works, or do I need a driver or some kind of player for each TV set?
Rob Koenen:Yeah, we haven't done a lot of smart TV work yet, and it does require a fairly deep level of integration with the platform, so we're taking one at a time. But, as you suggest, with an HEVC decoder becoming available in browsers, that also opens up new avenues to smart TV platforms. As far as set-top boxes go, there's a lot of Android out there now, which we have deployed on. It's very straightforward, I would say, not just pretty straightforward.
Jan Ozer:Do you have any use cases of this actually working in the field, or any trials or anything you can talk about, even if you don't identify the publisher?
Rob Koenen:We've done interesting tests; I have to be really careful now. Some 20 feeds, live, with something that could really benefit from multiple cameras, and it worked without a hitch. It was really encouraging, and I hope to be able to deploy it soon and then be able to talk about it.
Jan Ozer:What does the player side look like? Because if I'm a publisher, if I'm Sunday night TV, I'm assuming that people are watching their TV for the main feed. So maybe TVs aren't as important as I think they are. But, you know, the US Open is top of mind today because it's in progress.
Rob Koenen:People see it as a second-screen device. But what we have also shown is casting, and it works really nicely. The good thing is, if you want to do interactive video, there's nothing like a mobile device to do the interaction. A TV is very hard to interact with seamlessly, right? The thing I just did, dragging videos around, putting them in my favorite spot, enlarging them, shrinking them, is very, very hard to do with a remote control, even if you have, like LG's, a smart remote. But what does work is casting, Chromecast or AirPlay, and it works really nicely. And obviously the thing I showed with the grid is more amenable to interaction on TV sets, where you just select and click and enlarge, and maybe have a few picture-in-pictures available in fixed positions.
Jan Ozer:So what does this look like if I'm the publisher? Going back to the tennis tournament, typically for television I'm integrating all my cameras into a single feed. So at some point I guess I need to create different feeds and feed them through different encoders. Take it from there.
Rob Koenen:Yeah, every event that we have done, be it multi-view or VR, basically uses the same pattern. People do a mezzanine encode of their camera or their feed, they use SRT to send it to our cloud platform, we do the transcoding, and we egress it to their CDN origin server. It's a very standard workflow. When we did this with the event I was talking about, it was only a matter of hours: our partner made the streams available in AWS, we picked them up, and we were up and running. So, as long as you have the individual camera feeds available, and many, many organizations do, it's very straightforward.
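A minimal sketch of that contribution pattern, assuming an ffmpeg build with SRT support; the source file, endpoint, stream id and bitrates are placeholders:

```python
import subprocess

# Hypothetical contribution step: a mezzanine-quality encode pushed over SRT
# to a cloud transcoding platform. Endpoint and stream id are placeholders.
cmd = [
    "ffmpeg", "-re", "-i", "camera1.mp4",                 # read source at native rate
    "-c:v", "libx264", "-preset", "slow", "-b:v", "50M",  # high-bitrate mezzanine video
    "-c:a", "aac", "-b:a", "192k",                        # audio
    "-f", "mpegts",
    "srt://ingest.example.com:9000?mode=caller&streamid=cam1",
]
subprocess.run(cmd, check=True)
```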
Jan Ozer:And it's all HEVC encode. What does it look like in the cloud? Are you using software or hardware transcoders in the cloud?
Rob Koenen:We're using software in the cloud. It's interesting. Let me take an example: we did the Beijing Games, and we did 8K VR180, which is mind-boggling resolution, higher than any current headset can display. In this case we had, let me get this correct, two cameras. For VR we do a number of different GOP versions, because we need to be able to switch on every single tile. So per camera we had three GOP versions, short, intermediate and long, and then we had ABR levels.
Rob Koenen:So in the end there is one single manager that manages 900, literally 900, parallel HEVC encodes: spawns them, waits for them to complete, then collects the results. It does clever packaging, because we use a clever packaging scheme. If you have your headset on and you move around, you will retrieve different tiles from the sphere, so you have to be really quick. So we package cleverly, so that adjacent tiles are likely to be available at the CDN edge cache if you move your head around. But, to answer your question, there's just an enormous number of tiny tiles, which could be 480 by 480 or 640 by 640, or maybe even a bit larger, churning away at encodes, which then get packaged up and written to a CDN origin server.
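Purely illustrative arithmetic for how the parallel-encode count can reach a figure like 900; the tile and ladder counts are assumptions, not the production's actual numbers:

```python
cameras      = 2   # the Beijing VR180 production
gop_versions = 3   # short, intermediate, long (for fast tile switching)
abr_levels   = 3   # assumed ladder depth
tiles        = 50  # assumed tile count per projection

print(cameras * gop_versions * abr_levels * tiles)  # 900 parallel HEVC encodes
```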
Jan Ozer:And the playback side is a player that they would download and install?
Rob Koenen:Yes, the player would have our SDK player, which is pretty much a complete AV player, at the heart of an app, an application that provides a nice user experience. Our AV player is at the heart of it, and it handles everything: which tiles do I need to retrieve, all the retries, all the rewriting. And, interestingly, in the case of multi-view you can do some very interesting ABR strategies when your bandwidth is dropping. With normal video, you can only drop the quality of the entire video. Here you could say, OK, bandwidth is too low, I'll prioritize the main feed. Or maybe I should use stills for the thumbnails, or not retrieve the thumbnails at all. You can do all sorts of clever strategies now. It opens up a whole world of new ABR research, almost.
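A hypothetical sketch of such a priority-driven strategy, protecting the main feed first; the ladder, bitrates and logic are invented for illustration:

```python
# Under a bandwidth cap, the main feed keeps its rung and lower-priority
# thumbnails degrade (or drop to stills) before the main view suffers.

LADDER = {2160: 16.0, 1080: 5.0, 720: 3.0, 480: 1.5, 240: 0.5}  # height -> Mbps, assumed

def allocate(budget_mbps: float, elements: list[tuple[str, int]]) -> dict[str, int | None]:
    """elements: (feed id, desired rung height), ordered main feed first."""
    plan: dict[str, int | None] = {}
    for feed, desired in elements:
        # Walk down the ladder until this element fits in the remaining budget.
        for height in sorted(LADDER, reverse=True):
            cost = LADDER[height]
            if height <= desired and cost <= budget_mbps:
                plan[feed], budget_mbps = height, budget_mbps - cost
                break
        else:
            plan[feed] = None  # nothing fits: fall back to a still, or skip
    return plan

print(allocate(5.8, [("main", 1080), ("cam2", 240), ("cam3", 240)]))
# {'main': 1080, 'cam2': 240, 'cam3': None}
```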
Jan Ozer:And you're saying that this is all HEVC at this point?
Rob Koenen:Yes, it's all HEVC, correct, because it relies on the tiled structure we talked about.
Jan Ozer:Is that carried through in VVC as well? Is that going to be, I guess, kept alive?
Rob Koenen:VVC makes it even easier to implement this. If you use VVC, it will be a long while before it's as widely deployed as HEVC, but yes, no question. We participated in MPEG for a long time and we contributed some of our thinking to it, but people were already well aware of tiled encoding and all the flexibility that was required for this, and there was a lot of VR work going on back then. So, yes, it's in VVC as well. It's a little harder in AV1, because there are some parameters that you can only set at the global level, whereas in HEVC you can set them at the tile level, which makes it easier to do this in HEVC.
Jan Ozer:Talk about the role of standards in this market. If I'm the NFL, do I go to one provider and not really care whether it works widely, or is there an opportunity to work together on standards, as you've done so successfully in the past?
Rob Koenen:Let me say that everything we do is built on standards. We use standard HEVC encoding. All of our streams are individually decodable; certainly if I look at the multi-view stuff (it's a little harder for VR), any of these individual feeds will just play in ffplay or in VLC, no problem. The packaging is standard MP4; we contributed some things to MP4 as well to make it efficient, certainly in the case of VR, but it's all standard MP4. And then all the retrieval is just standard HTTP retrieval, byte-range requests, which is what makes it so scalable. By the way, I think I said before: we wouldn't have been able to build our company if it hadn't been for standards, and notably the HEVC standard being so widely supported in all devices.
Jan Ozer:What's the commercial side of this? If I'm a publisher, what do I spend money on?
Rob Koenen:It depends, obviously. But basically you spend your money on a technology license: you get our cross-platform SDK and you can build your apps around it. Then, depending on what you want to do, we will deploy our transcoders in your cloud account, or we will use our cloud transcoders, mostly for VR. For multi-view we're seeking to cooperate, and are actually already cooperating, with encoding vendors to make their encoders tiling-enabled, and one is coming; again, I can't name names.
Rob Koenen:It's not like VR. VR encoding is really hard, so we want to control it: we want to either do it in our cloud platform or control what we deploy, for instance, to your AWS account. But for MultiView it's a lot simpler, and it's easy to enable third-party encoding vendors to make their encoders tiling-enabled. And the magic, all of the retrieval magic, all of the sync magic (by the way, we didn't really touch on it, but it's all frame-accurately synced, which we obviously need for VR, so we carry that over to multi-view), most of our magic is on the client side.
Jan Ozer:is that going to relate to the number of players? I mean, how is that going to? How is that cost going to scale?
Rob Koenen:We are too young to give you a standard answer, but in general we would like to grow with the success of our customers. Basically, we seek a minimum level of engagement, because otherwise we can't pay our developers, and then we seek to grow with the success of our customers. That could be active devices, or minutes streamed; those are some of the parameters.
Jan Ozer:So convince me that this is a market that's going to prosper. I mean, Apple's in it, YouTube's in it. And what's Apple after? Apple wants to sell hardware. So what has Apple shown us that tells us this market is going to be quite dynamic going forward?
Rob Koenen:What Apple and Google are telling us is that this is something that people want. It's clear that especially the younger generations have a much more dynamic way of interacting with content, with video. People used to be a fan of a sports club; now they are a fan of a player. They want to follow that player, maybe, so that's changing. I think it's a really good fit for a more interactive and immersive way of consuming content, and we're starting to see some real traction for it, so I'm confident it will catch on. And yes, YouTube and Apple are cases in point. They've started to do it. It's very inflexible and, may I call it, a little bit primitive; we think we have a better solution, obviously. But it's being trialed and it's being used, and, interestingly, it's being criticized for its lack of flexibility. I heard Dan Rayburn in a podcast recently saying that about NFL Sunday Ticket: that it only had certain configurations, which was a shame.
Jan Ozer:Okay, we've got a question in; now's a good time for it. What are the bandwidth implications of this approach? I mean, are you sending X times the bits to the viewer, or is it still a 1080p-type stream?
Rob Koenen:Basically, the bandwidth is the sum of all the videos that you're retrieving, but we're retrieving them at the resolution at which they're being displayed. So if you see a thumbnail, it's not being retrieved at HD resolution. If I switch, remember I did the switch between the thumbnail and the main video, it actually switches around what's being retrieved. And we learned how to switch really fast, because we had to do this for VR: if you move your head in VR, you don't want to wait a second for a high-resolution tile. So we did a lot of switching-time optimization. But we actually only retrieve what you see, at the resolution that you see it; that's how I usually explain it. So obviously, if I do an HD video and a PIP, then the PIP will increase the bandwidth, but it doesn't increase it unnecessarily. And our software also takes screen sizes, et cetera, into account, so it doesn't retrieve unnecessarily high-resolution content.
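With invented bitrates, the arithmetic of "pay only for the displayed resolution" looks like this:

```python
# Illustrative numbers only: each element costs the rung matching its
# displayed size, so the total stays far below N full-resolution feeds.
main_1080p = 5.0   # Mbps, assumed
thumb_240p = 0.5   # Mbps, assumed

multiview = main_1080p + 4 * thumb_240p
naive     = 5 * main_1080p
print(f"{multiview} Mbps vs {naive} Mbps for five full-resolution feeds")
# 7.0 Mbps vs 25.0 Mbps
```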
Jan Ozer:So it's not X times 1080p; it's some small increment over the 1080p.
Rob Koenen:Yeah, and then thumbnails are packaged together cleverly, so there's like four thumbnails in one small video in one tile. It's tricks like this that we use to optimize the bandwidth.
Jan Ozer:What's the typical resolution of the main display? Is this 1080p stuff, or is it mostly 4K at this point?
Rob Koenen:No, it's 1080p, or maybe even lower if you want, 720p, but typically I think 1080p is a good one.
Jan Ozer:Question: what about the Apple headset? Is that total VR, or is that Tiled Media as well? What do you know about that?
Rob Koenen:I've applied for one, for an SDK. The good thing about the Apple headset: our stuff works on all Apple platforms, and it also has standard hardware decoders. Interestingly, it's a collaboration with Unity; all of our headset work relies on Unity. So I think, and I hope, that we should be able to port our VR stuff to the Apple headset pretty quickly. The stuff that we did in Beijing, which our production partner Cosm produced in 8K 180, that's exactly the resolution you need. Interestingly, for the Apple headset it's about a 14.5K 360 equivalent. But what Apple also does, if you allow me to say a few more words, is they really have a focus on this multi-view paradigm. If you look at the interface, it's sort of a multi-view paradigm, so I think it fits. It would be an amazing fit, and it would be an amazing experience, to watch a golf tournament, or maybe a tennis tournament with all these different matches going on, and be able to switch between them, et cetera, in a headset. You get all these virtual displays right at your disposal.
Jan Ozer:What's the middle ground, the augmented reality, I guess? Where does this fit into that, or are they totally different things?
Rob Koenen:So we focus on video. If you look at the Apple headset, specifically the Vision Pro, it's also an augmented reality device, because you have this see-through thing, the environment is there, and so you can just put virtual screens in your environment. It's a good match.
Jan Ozer:Getting back to the VR market, that's kind of where you started. Is that ever going to happen, and why has it been so slow?
Rob Koenen:It's a longer cycle than people think. First, headset adoption. Second, doing a good production. Third, making a good user experience. And related to this, having a broad enough audience. It's a virtuous cycle, right, but it's happening only very slowly. Only if there are enough people to reach will it make sense to do a good production, and only if you do a good production will enough people put on the headset and start watching games in VR. And again, I'm focusing on video in VR. So it's a cycle that just takes some time to get launched, but it's happening.
Rob Koenen:And what's encouraging is there's a lot of development in headsets and headset quality. And again, the things that we enabled in Beijing (we didn't produce them ourselves, but we enabled them with our streaming technology) will look amazing in a Vision Pro, because of the higher resolution than current headsets have. Totally mind-boggling. And if I see how much people already appreciate what Sky does with its Premier League matches today, then I'm convinced that this will fly. And then you add the social bit, because you can go to the match and talk to your friends who are also in a headset, even if they're remote. I'm convinced that this will fly. It's just that we wanted it to deploy faster, so we thought we'd work on multi-view as well, before VR becomes the big hit that it will eventually be.
Jan Ozer:You tend to talk about sports almost exclusively, for both VR and multi-view. What are the follow-on types of productions that you see working well with these technologies?
Rob Koenen:Sports is probably one, two and three, but there's also interest in music. We have one customer looking at multi-view for music education, which is really nice. They record an orchestra from different angles, and then people can play along and switch views, et cetera. So that's one, but there's also pop music. And another one is just multi-channel interfaces.
Rob Koenen:Right now, if you have an EPG, it may have pictures. If you're lucky, the pictures are relevant, and if you're very lucky, they're also right, in the sense that they show what's actually on TV at that moment. If you make an EPG with multi-view, you will just show what's on each channel right now. You can make a grid of your favorite channels; it's completely customizable. So I think it has an application in EPGs as well. Sports, music, EPGs, and maybe training, maybe just filming things from different angles and being able to switch to a real close-up but also zoom out. But sports is still one, two, three, not just because that's what people want to interact with; it's also because there's a lot of money being spent on sports.
Jan Ozer:Is there any magic on the audio side, or is that just simple, you know, one feed?
Rob Koenen:That's another good question. Audio obviously is relatively low bitrate compared to video. Even in our demo app, the Tiled Media Player that people can find in the app stores, we allow options like keeping the audio with the main feed while you're switching auxiliary feeds, even if they're enlarged, or keeping the audio with whatever you decide to be the main feed. Or, conceivably, you could even mix audio. Say you have a camera on a motorbike and a director's cut; you may want to be able to mix the individual audio with the director audio. That's all doable and supported.
Jan Ozer:I have a question on AV1. Do you have AV1 ready or is that another development that has to happen before you can actually deploy that?
Rob Koenen:No, we don't have it ready. We've looked at it, and we think it's probably feasible, but the only reason for us to deploy it would be if HEVC were no longer as widely supported and AV1 was, and I don't see that happening anytime soon.
Jan Ozer:We've apparently got some people my age in the audience, or maybe a little bit younger, and they're asking what was the magic bullet that H.264 had, and are we ever going to see that again? So, going back to your days with the MPEG Industry Forum, what happened to make H.264 such a success?
Rob Koenen:Well, it started with ITU and ISO coming together and deciding to have one standard, which was, what is it, the end of the 90s? Yes, midway through the 90s. There's always been this pattern of splitting apart and getting back together, going back to H.261: MPEG-2 and H.262 were the same, then H.263 and MPEG-4 split off again, and H.264 and MPEG-4 Part 10 came together again.
Rob Koenen:So it started with the world's experts coming together and deciding: we will join our forces and we will make the world's best codec. And there was an objective process. This is the best part of standardization, I will say, when people compete and then work together and create something. That's amazing. And then I had a small role, I would say. When this was all ready, we decided: OK, so MPEG-4 is ready, what are we going to do with it now? We have to set up some sort of an industry forum for it.
Rob Koenen:It started as the MPEG-4 Industry Forum, which took the standard to the market. And then, pretty quickly, we found out that the gating factor would probably be the licensing, because for MPEG-2 it was clear: it was all hardware, and there was a royalty per chipset. But MPEG-4 was all going to be software, and how do you deal with that? So we had a number of workshops with licensees and licensors, and the good thing was that certain companies were both licensees and licensors; people that had IP could also look at it from the licensee side, because you need to be reasonable. I think some people at some point got dollar signs in their eyes, but we were able to have a number of workshops to discuss the issues.
Rob Koenen:It was not easy, because as soon as you start to discuss IP, you get into antitrust issues, et cetera. So we had lawyers involved, et cetera. But we managed to do that, and out of it, basically, AVC became licensable, and this was a process that took many more years, by the way. And I think what also helped, and stop me if I'm going too long, Jan, was that Microsoft, and then SMPTE, published this VC-9 codec, which became VC-1, and I think it put some pressure on the AVC licensors to get their act together. There was real competition.
Jan Ozer:Interesting. So the fact that there were alternatives, which I guess in retrospect never had a chance of succeeding, because they were never going to succeed on the broadcast side, pushed the technology owners to come up with reasonable terms. I guess the royalty for MPEG-2 was like $2.50, and it was down to 20 cents for H.264. And I guess you're telling me that the justification for that was you could do it in software. So instead of just TV encoders and TV decoders, we're looking at every browser and every mobile device. Of course, mobile devices didn't exist when H.264 came out.
Rob Koenen:No, but we were working on them. We had these projects; we had a dream of mobile audiovisual terminals, Dick Tracy watches. They exist now.
Jan Ozer:Now I feel old. You mentioned that was the great thing about standards. The negative everybody points to with standards is the royalty side. Do you see that as a hindrance to future generations of technology? Or do you see it as just kind of the price you pay for all the innovation that you're getting the benefit of?
Rob Koenen:I see it as a fact of life and nothing comes for free. And as long as it's reasonable, nobody will complain.
Jan Ozer:HEVC has gotten a lot of bad press, and certainly a lot of it was, you know, well-deserved at the start. But, as you mentioned, it is incredibly widely deployed at this time, which is why you have the technology, and it's now also in browsers, so for all intents and purposes it's almost universally deployed right now.
Rob Koenen:And for all the bad press, which I think is a bit overrated.
Jan Ozer:Well, listen, we're out of questions and we're out of time. Rob, I appreciate you making time for this. Tell us what you're going to be showing at IBC and where people can find you.
Rob Koenen:They can contact me. We don't have a booth, so I'm going to be roaming the halls and doing lots of meetings, but I'll be showing what I showed just now, with a few more interaction modes, and I'll be talking about how this can revolutionize streaming and make it just a lot more engaging.
Jan Ozer:That's great. I should say that NETINT also has a booth at IBC, and if you go to our homepage, netint.com, we'll have a pop-up that shows you where that booth is and how to get in touch with our people there. Rob, again, have a great show, and thanks for taking the time today.
Rob Koenen:Thank you, Jan, it's been a real pleasure. Thanks for the insightful questions.
This episode of Voices of Video is brought to you by NETINT Technologies. If you are looking for cutting-edge video encoding solutions, check out NETINT's products at netint.com.