Denoised
When it comes to AI and the film industry, noise is everywhere. We cut through it.
Denoised is your twice-weekly deep dive into the most interesting and relevant topics in media, entertainment, and creative technology.
Hosted by Addy Ghani (Media Industry Analyst) and Joey Daoud (media producer and founder of VP Land), this podcast unpacks the latest trends shaping the industry, from generative AI and virtual production to hardware and software innovations, cloud workflows, filmmaking, TV, and Hollywood industry news.
Each episode delivers a fast-paced, no-BS breakdown of the biggest developments, featuring sharp analysis, under-the-radar insights, and practical takeaways for filmmakers, content creators, and M&E professionals. Whether you're pushing pixels in post, managing a production pipeline, or just trying to keep up with the future of storytelling, Denoised keeps you ahead of the curve.
New episodes every Tuesday and Friday.
Listen in, stay informed, and cut through the noise.
Produced by VP Land. Get the free VP Land newsletter in your inbox to stay on top of the latest news and tools in creative technology: https://ntm.link/l45xWQ
Is Nano Banana Google's New Image Model? Plus a world model you can edit in real-time!
AI tools for film production are advancing rapidly, with multiple new releases this week that could transform creative workflows. Hosts Joey and Addy dive into Nano Banana (a suspected Google image model), several new world models from Tencent and Skywork, FantasyPortrait's multi-character animation capabilities, and updates from SIGGRAPH including Meta's new hyperrealistic VR headset prototype. Plus, practical updates to fal's workflow tools, Autodesk's free Flow Studio tier, and a fun color grading game.
--
The views and opinions expressed in this podcast are the personal views of the hosts and do not necessarily reflect the views or positions of their respective employers or organizations. This show is independently produced by VP Land without the use of any outside company resources, confidential information, or affiliations.
All right. Welcome back to Denoised. We're gonna do our Friday weekly roundup of all the new AI stories for film production. There's a lot. At first I was like, hmm, I don't know, there wasn't anything big or groundbreaking this week. But as I was going through the links, I realized there was actually quite a bit of interesting stuff, and some new stuff under the radar that we'll point out, that might be big in a few weeks.
For sure. Cumulatively, AI is leveling up incrementally every week, and we're covering it. It's a lot of fun.
It's funny, I think there was a comment, even just last week, like, GPT-5? Not that big a deal.
Somebody had AI fatigue.
Yeah, I know. And it's funny, because we're so used to so many new stories that any one of these a year ago would've been like, whoa, groundbreaking news. And now we just keep getting all these crazy updates and it's like, oh, okay.
Yeah. If you think about like all the VC money and investor money, it's all funneled into this.
Yeah.
And so there's, yeah, people working on this constantly.
All right. First one. Nano Banana.
I almost spat out my coffee.
All right. So I've seen Nano Banana pop up a little bit on X, and I was like, what is this? This is a new image model that appeared on LMArena, which is the main website where companies upload their new models.
Yeah. And they get tested and ranked. When you see "this model ranked" blah, blah, blah, that's from LMArena.
It reminds me of the Colosseum. I know, it's like an actual arena. Are you not entertained? Are you not generating?
So there's a new model that showed up called Nano Banana, and a lot of people think this is a new model from Google. It's not confirmed yet, but it seems most likely: either a separate model or something to replace Imagen, their current model.
And the examples I've seen show very good world understanding and are very realistic. Looking at this one, that's pretty good.
It was a side shot of a woman, and the prompt says, hey, have her look at the camera. And everything about the person's face, I don't know what she really looks like, but it looks like an accurate rendition of what this person would look like.
Yeah.
The fact that you can do this prompt-based, with just text, and not actually put a spline or anything else on the image, that's pretty impressive.
And then this one too, where I think they gave it the same image but had it generate a complete character profile.
Okay, like a passport photo from this random photo.
And the consistency of the person looks...
Wow.
Accurate. That's impressive. So yeah, very similar to FLUX Kontext or ChatGPT image: really impressive world understanding and simple text prompting to modify images.
I mean, to me it feels like it's so much more than just having a larger sample set. Like more billions of parameters.
Yeah, it's beyond that. It's starting to take on a level of intelligence.
Yeah. And just understanding. There was, I forgot the specific name, but I was digging into it, because I've talked about how ChatGPT image is just so good at modifications for specific things. And it does something special: when it generates, you can see it generate in blocks.
Yeah, like a printing thing? A printer matrix.
Yes. Remember, it was doing a line scan? A scan line, yeah.
It was a different method, like pixel diffusion, or some diffusion method different from the one we've talked about that most image generators use, where it takes the whole image and then diffuses it. And I wonder if this is adapting that as well. Interesting. That would be a whole new architecture under the hood.
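A quick aside for listeners who'd rather see the two generation styles than hear them described: below is a toy numpy sketch contrasting whole-image diffusion-style refinement with committing the picture top-to-bottom in raster order, which is what that scan-line look suggests. The "model" here is a fake that just pulls toward a fixed target; it makes no claim about what ChatGPT or Nano Banana actually run under the hood.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a learned image: an 8x8 grayscale gradient.
target = np.linspace(0.0, 1.0, 64).reshape(8, 8)

def diffusion_style(steps: int = 10) -> np.ndarray:
    """Whole-image refinement: start from noise, nudge the ENTIRE
    canvas toward the target a little on every step."""
    x = rng.normal(0.0, 1.0, target.shape)
    for _ in range(steps):
        x = x + 0.5 * (target - x)  # crude stand-in for one denoising step
    return x

def raster_style(noise: float = 0.01) -> np.ndarray:
    """Raster-order generation: finalize rows one at a time, top to
    bottom, which is why the image appears to 'scan' into view."""
    x = np.zeros_like(target)
    for row in range(target.shape[0]):
        x[row] = target[row] + rng.normal(0.0, noise, target.shape[1])
    return x

print(np.abs(diffusion_style() - target).mean())  # error shrinks everywhere at once
print(np.abs(raster_style() - target).mean())     # rows are committed in order
```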
I also wanna point out, because I was trying to find more info about Nano Banana, and it just shows how crazy fast things are moving in AI, the whole vibe-website, opportunist-market thing: someone, I couldn't find the exact date, bought nanobanana.ai and spun up this whole site called Nano Banana that's like, oh hey, you can use our AI image editor. I think it's just running on a FLUX Kontext API, but the fact that they saw Nano Banana trending and spun up this whole website, with all these features, most likely vibe coded, in literally less than a day, is crazy.
The power of AI to exploit AI. I love it. And the color scheme, the banana logo, it's on point. Google should just hire this guy, or girl, whoever knows how to vibe code.
Yeah. All right, other updates. Kind of a group of world model updates. We talked about Genie 3 last week, and that was the big one. That was huge, impressive to watch. I feel like now it's revenge of the world models, where there's a new slew of other companies going, hey, we've got stuff too.
So this first one is Yan from Tencent.
Mm-hmm.
And it's a world model. They're calling it foundational interactive video generation. And the interesting thing is, it does a lot of similar things to Genie 3: you can generate from a text or an image prompt and start navigating this world.
It's persistent. It remembers things that it generates, and if you move back to that thing, it remembers it's still there. But it's composed of three separate models. Also, this one says it can run at 1080p at 60 frames per second, which for games is cool.
Yeah. I think Genie was 24 frames per second. But the interesting thing about this is it comprises three core modules: Yan-Sim, Yan-Gen, and Yan-Edit. Yan-Sim enables high-quality simulation of interactive video environments, Yan-Gen uses text and image prompts to generate the video, and Yan-Edit supports multi-granularity, real-time editing of the interactive video, so you can be in the world and edit it. Which reminded me, going back to Cristobal, of that demo of generating a world and speaking to it, modifying it in real time. Yes. This is like the first actual practical implementation.
Did they beat Cristobal to it?
I mean, I don't know if this is even out yet, or if it's just a research paper. Is that accurate to say?
That's Addy's way of saying come on the podcast.
Yeah, you could come on the podcast and we could talk about it.
So on the editing side, they have some demos of doing a whole style transfer: you're in the world and they just change the complete style of it. Or structure editing, like a character's moving and then you generate a wall in the world. So the editing thing, I think, is the big one.
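To help picture how three modules like that might hand off to each other, here's a purely hypothetical Python sketch. The module names come from the announcement; every class and signature below is invented for illustration, since we haven't taken Yan for a spin.

```python
from dataclasses import dataclass

# Hypothetical interfaces only -- not Yan's real API.

@dataclass
class WorldState:
    latent: object  # stand-in for the latent scene representation

class YanGen:
    """Text or image prompt -> initial interactive world (hypothetical)."""
    def create_world(self, prompt: str) -> WorldState:
        return WorldState(latent=f"scene: {prompt}")

class YanSim:
    """Advance the world one step per player action (hypothetical)."""
    def step(self, state: WorldState, action: str) -> WorldState:
        return WorldState(latent=(state.latent, action))

class YanEdit:
    """Apply a real-time style or structure edit (hypothetical)."""
    def apply(self, state: WorldState, edit: str) -> WorldState:
        return WorldState(latent=(state.latent, f"edit: {edit}"))

# The loop the demos imply: generate, walk around, edit mid-session.
gen, sim, edit = YanGen(), YanSim(), YanEdit()
state = gen.create_world("a rainy neon alley")
for action in ["move_forward", "turn_left"]:
    state = sim.step(state, action)
state = edit.apply(state, "restyle the whole world as watercolor")  # style edit
state = edit.apply(state, "add a wall in front of the character")   # structure edit
```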
That was the promise of Gaussian splats, remember, a few years back? Like, no, we have editing capabilities coming to Gaussian splats.
No, I don't remember that. What was that?
A lot of the Gaussian splat startups were working on ways where, while you're in the splat, you can delete this building or whatever. But I guess it never really came to fruition. The really interesting thing I'm noticing here, if you look at the headline, is that it says AAA-level simulation, obviously.
Super targeted at games. And again, it goes back to what I was saying during the Genie 3 coverage: everybody's coming after the Unreal and Unity space. It's like, move over, real-time renderer, here comes the AI renderer. And I think it's gonna be like that scene in Indiana Jones where he swaps the treasure for the sandbag. The consumer's not even gonna notice that the game is generated with AI.
Well, there are two elements there. There's the world creation, mapping it out and planning it and the experience, and then there's the rendering.
Yeah.
And the AI rendering part, that's DLSS territory, that's part of that.
Yeah, but DLSS is not able to fully generate; it's just able to make something better.
Right, it's using AI to speed up the quality and the rendering. But that's based off an existing world, you built something.
And then you just want it to look better and run faster on hardware, right? So the generating-the-world part, that's where, from a game developer's perspective, I feel like this ties into the same question as the filmmaking space: okay, is it gonna be a new game generated every time for the user? Or is it more of a prototyping method, an initial world-creation method, where you save that and store it elsewhere? I'm curious. I don't know as much about game development.
Again, we haven't taken any of this for a spin, sorry, folks. But my guess is the people developing this know it's for AAA, so of course persistent world generation is at the top of the list.
Yeah, this speeds up generating a massive map.
What's really interesting is that under the AAA-level simulation headline, they're using something called a 3D variational autoencoder.
Okay, what is that?
Remember the VAE block in ComfyUI? This is a 3D version of it. It's essentially going from a latent space, the world is being generated in latent space, and now the VAE is not outputting a 2D thing, it's outputting a 3D thing. I think that piece alone is magic, because now you could attach it to other models, say Genie 3's latent space, out through a 3D variational autoencoder to a 3D object. The problem with Gaussian splats was that you're still in a Gaussian splat world. It wasn't really meshing with a 3D world; it was fundamentally a different type of asset. But here it feels like you're outputting something that could blend right back into Unreal or Unity or Maya.
Right. As long as you get that mesh, it's something you could keep using in a persistent space.
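To make the 2D-versus-3D distinction concrete: a standard image VAE decodes a latent into an H×W picture, while a 3D VAE decodes into a volume. Below is a generic PyTorch sketch of that idea, emphatically not Yan's actual architecture, with arbitrary layer sizes. And note a voxel grid still needs something like marching cubes before Unreal or Maya can ingest it as a mesh.

```python
import torch
import torch.nn as nn

class Tiny3DVAEDecoder(nn.Module):
    """Generic sketch: latent vector -> dense voxel grid.
    A 2D VAE decoder would stack ConvTranspose2d and emit (C, H, W);
    swapping in ConvTranspose3d emits (C, D, H, W) instead."""
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 4 * 4 * 4)
        self.up = nn.Sequential(
            nn.ConvTranspose3d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose3d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose3d(32, 1, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),  # per-voxel occupancy in [0, 1]
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        x = self.fc(z).view(-1, 128, 4, 4, 4)  # seed a small 4^3 volume
        return self.up(x)                      # upsample to a 32^3 grid

voxels = Tiny3DVAEDecoder()(torch.randn(1, 64))
print(voxels.shape)  # torch.Size([1, 1, 32, 32, 32])
```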
Yeah. I wonder where inworld.ai is. They were a startup that was very early in the game to do exactly this, AAA gaming with AI. If I haven't heard from them, if we're not covering them on our Fridays, then they're kind of behind on all of this.
And I think of prime examples like GTA 6, which has taken forever to develop. What if you could speed up making the map? Even something like GTA, which is sort of based on real-world maps: take the Google Maps data and speed up the foundational level of building out these worlds.
Yeah. And you can take something like Cesium, which ties into Google Maps and builds a 3D representation of the planet. So if you have a latitude and longitude coordinate for your game, you could be on this exact street in Google Maps and there's a 3D representation of this house.
And everything is accurate. So it's like a digital twin of the world.
Yeah. If anyone out there is more in the game development world, I'd love to know what you think about this, because not knowing as much about AAA game development, I'm curious how this would fit in.
Yeah, I bet the AAA game folks are rolling their eyes right now. Nothing's gonna change.
Same thing as when we see the RIP Hollywood stuff.
Yeah, exactly. It's like, this is not gonna happen. "I made an Oscar-nominated short film for $200."
And then the other one, from Skywork, is Matrix-Game 2.0.
And this is another world model, but they're saying it's the first open-source, real-time, long-sequence interactive world model.
Long sequence meaning just a longer video?
Maybe. Well, we had that limit with Genie 3, because it's about how much memory it can save to keep the world persistent. Genie 3 can only run for a few minutes. This one doesn't give an exact number, but it says minutes of continuous video.
Mm. Mm-hmm.
It's running at 25 frames per second. Interactive: move, rotate, explore. So another world model, but this one's open source.
Yeah, I think these are great steps into this world.
You know, over the next year or two, we're gonna forget about these early steps and just marvel at the strides we've made.
Yeah. Think of how far image generation has come over a year or two. These world models are like the DALL-E 2 of image generation.
Do you even remember 5.25-inch floppy disks? You don't? Well, I do, because the 3.5s took over and became the standard, but the 5.25 was the gateway into the floppy disk model. In the same way, we're not even gonna really remember these beginning steps.
The 5.25s were actually floppy. The hard plastic ones were not floppy.
Oh, but you mean like literally floppy?
Floppy. They were literally floppy, you could bend them. What was it, 700, 800 kilobytes per disc? The black ones. The plastic ones were 1.25 megabytes.
Yes, I think. But actually the black ones, they were like...
I don't think you were getting megs, dude.
No, no, it was kilobytes, like 600 or 700 kilobytes. It was enough to put Doom on one disc.
You could, yeah. I remember I had this tool where I wanted to back up a file, and it would split the zips into one-megabyte chunks. It was like, put in your first disc, okay, put in your second disc, okay, third disc.
All right, we could talk about this forever. I'll do one more for you: AOL used to send out those free discs, and you'd just reformat them and have another extra disc.
Oh, I didn't know you could do that.
Yeah, I just used 'em as coasters.
They had a cool Lord of the Rings promo one, when the movies came out. Oh, that's awesome. All right, enough nostalgia. And then, whoa, there's actually another world model update, but this one's not new. We talked about it a month or two ago.
How do you pronounce it, Joey? Gosh, I don't know why my American brain keeps messing it up. Hunyuan. Onion. Onion. Yes, onion.
I need to see you order at a Chinese restaurant.
Hunyuan's GameCraft, which we covered a month or so ago, was a similar-ish world model. I don't think it had as persistent a memory, maybe it did. Anyway, they open-sourced it. So not new in the sense that it's existed, but now it's open source and you can go mess around with it.
Yeah. The Chinese companies, Tencent, Alibaba, these guys are quick, cranking through it. Crank, then release. Crank, release.
You think there's some bigger thing at play here?
I don't know if it's just the way to compete with US models or companies: we'll make something similar but just push it out there, so maybe more people adopt it because it's free, and then they're more dependent on it.
What is their strategy? Obviously they're not making money from any of this, and this costs millions and millions of dollars to make.
I mean, with something like Tencent, this stuff's getting rolled into TikTok, or will be.
You mean Seedance, by ByteDance?
Yeah, yeah, those other models. Alibaba, I don't know.
Yeah, that's the thing. Maybe I don't know the Chinese market enough to know what their media landscape is, but it feels like, to us, we're just getting stuff for free here.
Yeah, like a nonstop bargain.
I just don't know if it's a free-to-adopt play to get everyone using it, and then they charge later. That would make sense, but I don't know.
Okay. I am curious.
Yeah. And some of the AI researchers I've talked to are all impressed by the Chinese models. Everything Tencent drops, even on the image and video generation side, is solid.
We talked about it a few weeks ago, like Seedance, or Wan 2.2, which is, yeah, completely run locally, right? That one is like the top run-it-locally-on-your-computer ComfyUI video model. And Seedance is great, I've used that for a bunch of stuff. I think more and more people are picking up on it and realizing it's actually a really good, decently priced video generation model that can handle a lot of different things.
Yeah. I think Caleb at Curious Refuge dropped that video where he compares Seedance to Veo 3, and supposedly it's better in a lot of ways.
It depends on the thing. I think Veo 3 is still sort of king in one-shot generation, because it's the only one, I think, that can generate audio with the video.
So if you're trying to make something rough, if you're trying to make people talk, maybe there's no better tool, because of the lip sync situation and everything else. For a one-shot, where you don't have to mess around with it too much afterwards, Veo 3 is probably still king at that. But Seedance is great for giving it different inputs and getting outputs, and also for not spending three bucks a generation.
Yeah.
Really good. Okay, let's see, the other one. FantasyPortrait. This one's interesting. It's from Alibaba, and I believe it's open source, but this is character animation with multiple people. That's been a limitation with a lot of the character animator tools, like Runway Act-Two or Hedra or HeyGen, where if you give it some driving audio and a still image and you want the image to start speaking the audio, it's usually limited to only one person in frame at a time. If you wanted to do two, you'd have to run them separately and composite it later using traditional methods. FantasyPortrait says it can do multiple-character portrait animations. And this looks like it's at a research paper level.
And they've provided GitHub code, so all somebody has to do is grab this and turn it into a product.
Yeah, you can run and test it. So I've got some videos of the driving performance: a video with multiple people, then an image with multiple people, and each character gets the correct performance.
Yeah, even this one with the presidential debate, the split screen from the debate.
Oh, that's great. They do have a sense of humor, those researchers.
And then it works in different styles. I'm gonna try this one out, because we're working on some speech-driven character projects. Cool to see development here, because character animation and driving performances, it works on animals too.
Talking dog, talking pug.
Yeah. I mean, this is all in deepfake territory, right? The big fear with AI a couple years ago was, what if somebody makes a video of some politician saying something horrible and that starts World War III? I think we know now that we can sense AI versus not, in a lot of these situations.
For now?
For now. For now. Oh God, yeah. It's like that meme: oh, I'm seeing fewer AI-generated images on the internet, I guess AI went away. And then you realize, oh, it's because you can't tell the difference.
Yeah. I think it goes back to what we've talked about before: the camera manufacturers, for news-gathering sources, are gonna have to have some sort of verification in the files, something to verify legitimacy.
Sam Altman had a take on this. He was interviewed by Cleo Abram; I think I mentioned it last time.
So Cleo had asked, what's gonna happen when we can't differentiate AI anymore? And his answer, more or less paraphrasing here, was: we're just gonna have to live with it.
I mean, I think so, yeah.
Yeah. I mean, his argument was really on shaky ground. He said something to the effect of: the iPhone is lying to you, because when you take a picture, the iPhone is doing so much processing that by the time you see it, it's not really reality anymore, it's a processed image. And in the same way you're accepting that as reality, you're gonna have to accept some of the AI stuff as reality. Again, he's out there, dude.
I mean, yeah. He's got the whole Worldcoin thing and the UBI stuff, yeah.
Okay, SIGGRAPH. SIGGRAPH happened this week. The big one. I believe it's still happening, actually, because one of these papers is presenting today.
Yeah, I've been getting pictures at midnight from parties. People I know are there having drinks, like, oh, I wish you were here. I was like, thanks.
So it's big. SIGGRAPH, what's the official title? It's the Special Interest Group on Computer Graphics.
It's probably the most respected and longest-running conference of computer vision researchers. Traditionally it was computer vision and machine learning, then in the middle there was a big rendering focus, a computer graphics focus, and now it seems like a lot of it has pivoted to AI.
Yeah, but focused on the scientific and research part. It's all the white papers: computer graphics, image generation, AI image generation. The very nerdy science part.
SIGGRAPH moves media and entertainment forward every year. When those papers drop, it takes a few months for them to turn into products, and those products move the world. You have people like Disney Research, a big arm of Disney in Zurich, who drop their annual papers there. And people like NVIDIA do the same.
Yeah. I didn't see a ton of super new, interesting things coming out of it, but a few things. There was an opening talk from Ed Catmull, co-founder of Pixar, and someone asked him about AI. To paraphrase the answer: AI will create lots of good things and lots of bad things. We don't know where it will go, but it's not going away, so artists need to engage with the technology.
Don't we say that on Denoised? I feel like we say that a lot of the time. But yeah, wanted to recount that. The things I saw coming out of it that were interesting, actually three things: one, in NVIDIA's opening keynote, they talked about Cosmos, which is their world model.
Yeah, they've had it for a while.
Yeah, I think it was an update, but this one definitely seemed geared more towards automation and infrastructure. Autonomous vehicles.
Autonomous vehicles, physical AI with robots. That's really what it's built for.
The other one was GAIA, and this just popped up on my radar.
Yeah, GAIA is pretty awesome, man.
And I think they're literally giving their talk right now as we're recording this, so somebody will have more info, or the video, next week. But it's generative, animatable, interactive avatars with expression-conditioned Gaussians. Basically, it's a human-looking person generated with Gaussian splats, and a bunch of sliders you can adjust to manipulate the look of the person, all generated with Gaussian splats. Which is unique.
I think we're gonna be looking at the video as we're talking about it.
Yeah. You can see the quality there is pretty impressive. As this woman is adjusting her expression, she's also shifting in parallax and perspective, and it's all holding up pretty decently.
What did you say this reminded you of, a music video?
The Michael Jackson video. Black or White.
Oh, yeah. I was gonna say, I don't even remember the name of the song. I know the video, but I forgot the name.
People have done a lot of recreations of that video with AI recently. It always hearkens me back to, how did they actually do it back then? They didn't have AI.
Was it an early version of computer morphing?
I believe so, yeah. But it looked great.
I'm gonna say it was by PDI, who did the Terminator T-1000 liquid going through the prison gate. That was a very early version of a digital human. Okay, so here, my prediction is that this is potentially gonna change the way we do video conferencing and remote work.
Okay, how do you see that?
Because if you look at the Google Beam product, their really fancy conference system previously called Project Starline, it's essentially two spaces connected by the internet, and the reason it's so immersive is the other person is, I think, potentially represented by a Gaussian splat. The Google Beam system has multiple cameras, so it's almost like a volumetric capture of you, and it's somehow compressing and sending that over the network, and on the other side you're being recreated with a Gaussian splat. So if you shift your head around a little bit, you actually see the left side or the right side of the other person's face.
So if NVIDIA pulls this off and releases it as a technology other people can adopt, let's say somebody like Zoom, then I think we could potentially see a real big level-up in video conferencing.
You think there are any M&E applications? Like, does it work as a Gaussian splat, or what are the limitations of doing actual visual effects with it?
Very little, to be honest. Digital humans are most usable when they're rigged, when they're skinned. It's some of the Houdini stuff we've talked about: you put muscle sims on, or MetaHumans. I think that's where you get repeatability and a high level of control. So I don't see this stuff being really useful for primary character work. However, for crowd work, background characters, stuff like that, or...
Something where you need something lighter weight, real-time. So like that streaming stuff? Maybe real-time livestream avatars.
Yeah. Or metaverse applications.
You know, remember when Mark Zuckerberg was interviewed by Lex Fridman and they were in the metaverse, and their torsos and faces were represented by Gaussian splats, or some form of volumetric, however Meta's doing it?
Yeah. Similar with the Vision Pro, how it scans you to represent you on video calls and stuff.
So both of those technologies, although they were impressive at the time, weren't quite lifelike. And I think this is a step forward in quality. So imagine this in the metaverse.
Yeah.
Yeah, looking at the demo, the detail is great. All right, one other thing I saw out of SIGGRAPH is from Meta, speaking of Meta. It's a prototype next-gen version of their display glasses, their headsets, called Tiramisu. Hyperrealistic VR: high contrast, roughly three times the contrast of the Meta Quest 3, with a wider angle of view and brighter.
Those are all good things.
Yes. And in some of the demos I saw on X, people trying it out were saying things like, excellent, wild. I mean, it's definitely a prototype. Looking at the image right now, it's like the cartoon where the coyote's eyeballs shoot out, it sticks out a lot. So it does currently look like this, it's a prototype. But wild to see where this technology is going.
Yeah. So it's interesting that they put the retina claim on there. Do you know what retina is? What Apple markets as Retina is actually not retina. I mean, I know it's just higher pixel density, right?
So in optical science, or color science, when a display is quote-unquote retina, that means you can't tell the pixels apart. You can't see the pixels; visually it's as lifelike as how you and I perceive the real world. So if there's a display this close to your face, you obviously need far more pixels than a movie theater screen that's hundreds of feet away. For them to achieve retina on a headset, you're talking 8K-plus, perhaps 16K, in resolution.
Do you know if that's more dense than the Apple Vision Pro?
I don't know. The Apple Vision Pro, I don't think, is retina.
Oh, you don't think so?
They may use that term for marketing, but not in the truest sense of the word.
In the truest sense of the word, is it a combo of the pixel density but also the display distance?
Both, because it's an equation, right? It's not a set number, because it depends on how far away you'd normally be looking at the display.
So here they're saying 60 pixels per degree.
So you can imagine one degree of your eye at that close a distance is not much real estate, maybe a fraction of a millimeter, and within that you need 60 pixels. So you're talking about an extremely pixel-rich environment.
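For the curious, the arithmetic behind that claim is simple: 60 pixels per degree is the classic "retinal" threshold of roughly one pixel per arcminute of 20/20 vision. Here's a minimal back-of-the-envelope in Python; the field of view is an assumed example number, not a Tiramisu spec.

```python
# Back-of-the-envelope: horizontal pixels per eye at "retina" density.
ppd = 60           # pixels per degree, ~1 pixel per arcminute (20/20 acuity)
fov_degrees = 110  # ASSUMED horizontal field of view; varies by headset

pixels_per_eye = ppd * fov_degrees
print(pixels_per_eye)  # 6600 -> roughly 7K horizontal per eye, 8K-plus territory
```

Which is why the numbers land in 8K-plus, perhaps 16K, territory once you account for both eyes and vertical resolution.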
Yeah. I know there are some other headsets that were flatter too, but I don't know the specifics. I'm trying to scan... oh, they're called Boba, Boba 3. It's basically a flatter version of the headset.
Yeah. I mean, nobody knows better than Meta that the biggest barrier to headsets is the fact that you have to wear a headset.
Yeah, it looks like the Boba has the same style Sony adopted with their professional headsets, where it's sort of a flip-up, flip-down design. In a professional environment, if you're going back and forth, you can keep it on your head and just flip the screen up. It works if you're in an industrial environment and you need to look at something, flip it up and down. Like a welder's glass kind of thing.
Exactly.
The coolness factor isn't as big of a deal there. Just don't do it on a plane. We talked about this. Don't be a glasshole.
But yeah, did we talk about how that guy had no reason to be putting his arms everywhere? You could just sit there and probably pinch to operate it.
Yeah, I didn't know you could do that, but he was grabbing stuff, hands in the aisle, while we were all trying to get on the plane.
Yeah.
All right, what else have we got? SkyReels. So SkyReels popped up. This is similar, another audio-to-character animation. At first, when I watched their demo reel, I was like, I don't know, it doesn't look that good. But then I saw some of their other tests and examples, and some of it, the human performance on top of an image, did look pretty good. I think the company is SkyReels, another AI company. I haven't used them. Have you? No. So I might give it a shot, because like I mentioned, we're working on a project with some audio-to-character animation. But it's starting to get on people's radar. Like all these things, it probably works well with some stuff and not so well with other stuff.
Yeah. A challenge with video generation is length: the longer you go, the more it starts to hallucinate, the more memory you use, and there are all types of problems.
Yeah, this one did say, I think, that it can do longer videos. Oh, actually, this reminds me, I forgot to save it, but one other update this week in the character realm is Pika. Pika also rolled out a new update, though I think it's just coming to their app.
We haven't talked about Pika in a while. Not since we had a little octopus run around the table.
Yeah. It's their own version of a character animator.
Mm.
Where you can give it audio or a performance video, sort of their own Runway Act-Two kind of feature.
Right. But it sounded like it was just rolling out on their app. I think they're focusing more on mobile, the social space, and the creator space than the desktop space.
Yeah. They're probably looking at the user data, and maybe most of the generation is happening on mobile.
Yeah. And talking to other people about this stuff, it's probably just a bigger addressable market for these companies to sell $10 subscriptions than the M&E space, where people will pay the same subscription but demand more.
When I think desktop, I think you and me, people trying to do professional work. Versus mobile, everybody.
Right. Or the generation that grew up native on CapCut and stuff for video editing and video creation.
What's Premiere? Oh, you mean Adobe Rush? That I never used. I use CapCut.
So, fal. I think I've talked about them before. I'm a fan. I'm a fan of fal.
Yeah. Basically, between fal and Replicate, they'll take all of the AI models pretty quickly, put them on their own servers, and then you can just call them up with an API and pay per usage. So if you don't have a powerful computer, or for models you just can't run on a computer anyway, they're a good one-stop shop. You get one API key, one central billing, and you can run and use any AI model, relatively cheap too. They usually either give you the rate the company's API is charging, or, if they're hosting it on their own service, it's a pretty reasonable compute rate.
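To give a flavor of how lightweight that is in practice, here's a minimal sketch using fal's Python client. Hedge: the model ID below is just a plausible example from fal's catalog, and you'd need the fal-client package installed and a FAL_KEY in your environment; check fal's docs for current model names and parameters.

```python
# pip install fal-client ; export FAL_KEY=...   (one key, one central bill)
import fal_client

# Example model ID -- swap in any model from fal's catalog.
result = fal_client.subscribe(
    "fal-ai/flux/dev",
    arguments={
        "prompt": "a film set at golden hour, 35mm, shallow depth of field",
    },
)
print(result["images"][0]["url"])  # hosted URL of the generated image
```

You pay per generation, and fal's servers do the heavy lifting, which is the whole pitch for people without a big GPU.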
Yeah. I'll talk about this in the future, but one of my pet projects has been trying to bring fal's APIs into ComfyUI so that you can call them up there.
That's a good one.
So you're doing everything in the cloud but still building a workflow.
Exactly. You build your workflow in Comfy but call up fal APIs. There are some nodes that already exist, but they haven't been updated, and fal updates stuff all the time, so I was like, I just want a way to bring in my own APIs. That's a pet project I'm working on. I'll talk about it later.
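For anyone wondering what a project like that involves, ComfyUI custom nodes are just Python classes with a declared interface, so a fal-calling node is mostly plumbing. This is a rough illustrative sketch, not Joey's actual code; the model ID and output handling are assumptions.

```python
import io
import numpy as np
import requests
import torch
import fal_client  # assumes the fal-client package and FAL_KEY are set up
from PIL import Image

class FalTextToImage:
    """Minimal ComfyUI custom node: forwards a prompt to a fal API and
    returns the result as a ComfyUI IMAGE tensor (batch, H, W, C in 0-1)."""

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"prompt": ("STRING", {"multiline": True})}}

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "generate"
    CATEGORY = "api/fal"

    def generate(self, prompt):
        # Illustrative model ID -- use whichever fal model you actually want.
        result = fal_client.subscribe("fal-ai/flux/dev", arguments={"prompt": prompt})
        url = result["images"][0]["url"]
        img = Image.open(io.BytesIO(requests.get(url).content)).convert("RGB")
        arr = np.asarray(img).astype(np.float32) / 255.0
        return (torch.from_numpy(arr).unsqueeze(0),)  # add the batch dimension

# How ComfyUI discovers nodes dropped into the custom_nodes folder.
NODE_CLASS_MAPPINGS = {"FalTextToImage": FalTextToImage}
NODE_DISPLAY_NAME_MAPPINGS = {"FalTextToImage": "fal Text to Image"}
```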
But fal did just release a big update to their own canvas-based editor.
Much needed. Much needed.
So this looks cool. It's called Workflows 2.0, and basically you can drag and drop all of the fal nodes and APIs, build out your own workflow, and then just run it on fal. So they completely stomped on your work a little bit.
Yeah. I still think there are advantages to Comfy, because you get the files saved to your computer, so if you're doing a bunch of batch stuff, you can have them all saved locally. But this also looks like a great solution, and I don't think there's any additional cost. You're just paying for the generation APIs you're running, and it's a really nice workflow.
And now they're getting into the Invoke, LTX, Flora territory.
Exactly. Where it's a very Comfy-esque UI without the technical barrier of Comfy, if that makes sense.
Exactly. The canvassy click-and-drag thing. It's just nice that they have this built in, an easier way to use their API models without being a programmer.
I mean, we talked about this when we did our ComfyUI episode. I don't understand why Comfy doesn't do this.
What, as a cloud service or as a generation service? What do you mean?
I mean, they could easily build a Comfy in the cloud that connects to a bunch of GPUs.
Oh, why they don't have that, right. Because there are third parties that offer Comfy as a service in the cloud. Why Comfy doesn't do it themselves, I don't know. I mean, Comfy has the API nodes.
They're more ready for it than anybody else, because they're the universal standard in the world of AI development.
Yeah, they've been around the longest. They've cornered that market. Established.
I don't know why they're not running their own cloud hosting service.
I would imagine services like fal or Invoke, or anybody in this competitive space, I wonder if they take the JSON files that Comfy workflows are stored in and start to build that into their own environments.
Oh, I see. Do you even need that, though? Most of these APIs are pretty self-contained. It's not like how Comfy works with a local model, with the CLIP and the VAE. You don't see the CLIP or the VAE here. When you call up the API, the API just already does it.
True. I don't know. Okay. I mean, I think it's a cool tool; go mess around with it. I know we turned someone onto... not Invoke, what's the other one? Not Flora, not Invoke?
There's the other one, the flow chart one?
Yes, their canvas tool. Actually, the one we're thinking of is, I think, the most powerful one, because it's layer-based and node-based, with a canvas-based tool. I think it was Invoke. Someone reached out and said we actually turned them onto Invoke, and they've been able to build out a bunch of cool concept commercial videos.
Yeah, using these all-in-one platforms.
Invoke seems powerful and purpose-built for our industry, versus for social media content or anything like that.
Yeah. It gives you a lot more control, a one-stop shop for building out these flows, especially if a node-based workflow works better with how your brain operates.
And also, Invoke has layer-based stuff on top of the node-based stuff.
So like layers, you can have masks and stuff, regional inpainting stuff, and then run different functions on nodes for that specific section.
Oh, that's cool. All right, other updates. This did sort of come out of SIGGRAPH, but I just realized the press release says SIGGRAPH 2024, and I was like, is this the right article, is this from this year? I know this article did come out this year. Basically, Autodesk has released a free version of Flow Studio.
So Flow Studio was Wonder Dynamics' Wonder Studio, which Autodesk acquired and rebranded to Flow Studio. Wonder Dynamics is a great tool where you can give it a video and it automatically tracks, replaces, and rigs an actual character into the video. Then you bring that into your professional 3D software and use it as a first pass to start animating.
I could be wrong here, but from what I've heard, I don't think Wonder Dynamics, or Flow, integrates very well with Maya. Since the acquisition, Wonder Dynamics is still not super well integrated into other Autodesk products like MotionBuilder, Maya, Max. You'd run it as a separate product, then export and import. I don't think there's a direct path, which is a shame, because Autodesk is certainly a company capable of doing something really well executed, elegant, and available for M&E.
Yeah. So they have a new free tier. Actually, I'm looking at the pricing, because I haven't checked it out since I used Wonder Dynamics last year, and back then it was either you could try it or it was a hundred bucks a month, no in between, which is kind of pricey. Now they do have a bunch more tiers. The pro tier is 95 a month, and they have a free tier where you can run it. You get 300 monthly credits, each generation is a handful of credits, and there are some duration caps. Oh no, it is watermarked. I was gonna say it's not watermarked, but it is. Still, you can mess around with it and export stuff, you can export to other scenes in the free tier. I'm guessing if it gives you the files, the files aren't gonna be watermarked.
Yeah, I'm also guessing this will help drive adoption. My guess is they're having a hard time with recurring usage and daily active users, so this is a freemium model, if you will.
Yeah, it's good to be able to mess around and try it. And they have two other plans in between: below the pro hundred-dollars-a-month plan, there's a $10 light plan and a $45 standard plan. It's good that they have more in-between options to get more usage.
It also goes to show that no matter how brilliant your AI technology is, at the end of the day it's still a business, and a business has to generate revenue and offset margins and all that.
Yeah. You need money to keep developing and to run this stuff.
Because Wonder Dynamics was one of the early ones, right? They got acquired almost two years ago now.
Was it two years?
Yeah, I'm gonna say it was pretty early on. So their lifecycle is actually much further along than Runway or Luma, and they've actually gotten acquired by a big company, versus Runway or Luma. You can see the interesting curve of how an AI company matures, and now they're at a maturity point where they need to generate real revenue.
Yeah. So that's where they're at. Autodesk bought them, and it's like, well, now we've acquired you, we need the ROI, we need this acquisition to make money.
Yep. All right, another quick one: Gemini is now also rolling out memory. I think we talked about Claude adding memory last week.
Memory is one of the best functions. The main reason I go back to ChatGPT is because it remembers stuff. But now Claude remembers stuff and Gemini remembers stuff, so that makes these way more useful.
You think they're using floppy disks?
There's someone in the back just like, we need more! Swapping the discs out.
Yeah, that's how it remembers. Sorry about that joke, folks. All right, the last one. This one's a fun one. A colorist sort of vibe coded, I don't know if they actually vibe coded it, but I'm gonna assume so. Tobia Montanari Lughi, sorry if I butchered the name, shout out to Tobia. He built a color grading game called Match the Grade.
It basically shows you an image of some shapes and colors on one side, and then you have a couple of sliders and you adjust the source to try to match the grade. Then you check it, and it gives you a score based on how well you matched it.
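Mechanically, a game like that only needs two pieces: apply the player's slider values to the source image, then score the distance to the reference. Here's a toy numpy sketch of that loop with a simple lift/gain grade and an error-based score; the real game's controls and scoring are its own, this is just the general idea.

```python
import numpy as np

rng = np.random.default_rng(7)

def apply_grade(img: np.ndarray, lift: float, gain: float) -> np.ndarray:
    """Toy grade: lift raises the blacks, gain scales overall brightness."""
    return np.clip(img * gain + lift, 0.0, 1.0)

def score(attempt: np.ndarray, reference: np.ndarray) -> float:
    """Map mean absolute error to a 0-100 score (0 error -> 100)."""
    return max(0.0, 100.0 * (1.0 - np.abs(attempt - reference).mean()))

source = rng.random((64, 64, 3))                      # stand-in shapes and colors
reference = apply_grade(source, lift=0.05, gain=1.2)  # the hidden target grade

player_try = apply_grade(source, lift=0.04, gain=1.15)  # the player's slider guess
print(round(score(player_try, reference), 1))           # close guess -> high score
```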
It's fun.
Yeah. We all could use more color exercises to sharpen our eyes.
Oh, for sure. And even just to learn or get started with this stuff. I think it's a really fun example. Wait, this is the best score I've ever gotten.
Should we make an episode? Oh, 95 is great.
Literally, the image is just way brighter. It has a 60-second time limit, and when I messed around with it before I was getting like 60, 70.
Now I'm gonna beat you, dude. Doing this on the show, I get 95. This was a very easy image to do, though. Anyway, shout out to Tobia. This is a fun game, and a really good use of, I'm assuming, vibe coding to build something fun and practical.
We should do an episode of you and me playing this competitively for an hour. Most boring episode ever. We'd just livestream it live on Twitch, color in common. If you want to see that, let us know, because I'm thinking no, but if you're into it, we'll do it. We'll do it if you guys want us to.
All right, that's pretty much a wrap for this episode. Links for everything we talked about are, as usual, at denoisedpodcast.com.
Thank you for your support on YouTube. Hope you got to see the Freepik episode with CEO Joaquin.
Yeah, check out the interview.
Give that one a watch.
Yeah.
All right. Thanks everyone. We'll catch you next week.