Denoised

Wan 2.5 Brings Talking AI Video

VP Land Season 4 Episode 62



Alibaba unveils Wan 2.5, an AI model competing with Veo 3 that generates video with integrated audio and speech in a single output. In this week's AI Roundup, Joey and Addy analyze the flurry of new AI models and tools transforming media production, including Qwen-Image-Edit-2509, Google's Mixboard, Topaz's new upscaling models, Nano Banana's integration with Photoshop, and more.


--

The views and opinions expressed in this podcast are the personal views of the hosts and do not necessarily reflect the views or positions of their respective employers or organizations. This show is independently produced by VP Land without the use of any outside company resources, confidential information, or affiliations.

All right. Welcome back to Denoised. It is time for our weekly roundup of all the AI news stories in filmmaking.

Here's AI Roundup. 

Let's get into it.

All right. We're back. Good to see you again, remotely this time, Addy.

Yeah. Good to see you too. Uh, we're not in person again, but, uh, we're still in the same town. Yeah. 

You know, traffic, it's later in the day, so you might as well be in two different states. All right. Uh, so big week in AI for the open source model world.

Sort of. There's a caveat to that, but probably the biggest story this week is Wan 2.5. I texted you, I'm like, hey, did you hear about 2.5? And you're like, Kling? I was like, no, Wan. Another model, they dropped another one. But this one's... well, that's why I said, this is sort of where my quotes around "open source" were.

This one is not public or open weights yet. I'm assuming they will eventually release it. So you have to use it through their API or through their website, or Comfy has an integration with it. But it's basically the next level of Wan. I don't know why they skipped the point-three and point-four versions, but it is the next level, and it's sort of a Veo 3 competitor.

It can generate video with audio, with speech, all in one platform, or in one output.

Yeah. Yeah. And it's really powerful. Well, first of all, because it's completely open source and free, like anybody can just grab it and use it in their...

Well, that's the thing. This one, you can't. Right now you have to use it through the API and pay.

Yeah, that's what I'm saying. Oh, I mean, I'm assuming eventually there will be an open-weight model you could run locally, but right now, they're calling it a 2.5 preview, and you have to use it through their API or through Wan's website or through any of the partner integrations.

Oh, interesting. I wonder why they're doing that. Maybe because they want to just see how it's being used and feed that into further improving the product. But the tests that I've been seeing online are a lot of character-to-character video transformations. So, you know, think of it as puppeteering for animation rigs and things like that.



I think you're mixing it up with another model. Again, there was Wan2.2-Animate, which is a...

Got him again. Are you gonna fire me? No, but I think now we need to make something where it's like, got him again. Nailed him.

Oh, okay. So... God, I'm so embarrassed, and I'm on camera.

Well, I mean, it also goes to show that there are so many new updates. I think we covered it. Okay.

We got Wan2.2-Animate, which is open source.

Open source, and you can download it. It is basically like an open source

Runway Aleph or a Runway Act-Two, where you can give it a driving video for a performance and a still image of a character, and you can animate, drive the performance of that character. That's free, open source. You can load it in Comfy.

They've got workflows and stuff. That's like a week old, so it's a million years old already in AI land. This week was Wan 2.5, the newest model, a preview model. You can't download it quite yet; you gotta use it on the API. This thing takes either text or image input, but the big lift of this is it generates audio.

It can generate speech inside the video. It's one unified model, the closest thing we've seen to Veo 3. Yes. Out of, like, any other model.

Absolutely. Absolutely. And yeah, they're making a big fuss about it being multimodal, which means you can train it on audio, video, and text at the same time. I'm guessing it's gonna be very fine-tunable, just the way Wan 2.2 is, yeah.

And, you know, what that means is, if you have the likeness of a person, let's say... I don't know, I'm just pulling an example here. Like, you take Robert De Niro's physical likeness, his vocal likeness, and then you could train the model to have a digital Robert De Niro that is outputting audio and video. Interesting.

Yeah. Having that fine-tuned control.

Yeah. So that's just a digital human example. Other examples could be taking entire styles of story. So let's say you are building a world, and that world has its own noises and sounds in it. We talked about Star Wars having, like, the lightsaber sound and all the signature sounds of Star Wars.

So let's say you designed a world like that where it's not just a visual design, but a sonic design as well. You could train the model to have those two things output together and work together. 

Yeah. That's a good point. Yeah, I ran a test, one quick shot inside Comfy, using the model.

And the nice thing about it is you do have a lot of resolution and framing options: your usual one-to-one, but you can also do full horizontal or full vertical, starting from 480p all the way up to full 1080p outputs. For duration, you can do either five seconds or ten seconds, and then you have the option of whether you want it to generate audio or not.
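(For reference, here's a rough sketch of what a Wan 2.5 request with those options might look like over a generic HTTP API. The endpoint URL and field names are hypothetical placeholders; only the option values themselves, 480p up to 1080p, five or ten seconds, and the audio and watermark toggles, come from what's described in the episode.)

```python
import requests

API_URL = "https://api.example.com/wan/v2.5/text-to-video"  # hypothetical endpoint
API_KEY = "your-api-key"

payload = {
    "prompt": "Michael Bay style 180-degree orbit around a hero mid-explosion",
    "resolution": "1080p",   # 480p up to full 1080p
    "aspect_ratio": "16:9",  # 1:1, full horizontal, or full vertical
    "duration": 10,          # seconds: 5 or 10
    "generate_audio": True,  # unified audio/speech in the same output
    "watermark": False,      # on by default; can be turned off
}

resp = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=600,  # video generation can take a while
)
resp.raise_for_status()
print(resp.json().get("video_url"))
```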

And then, by default, it came with a watermark turned on, which is a bit weird. It had this little Chinese watermark in the bottom corner. I was like, why is that there? And then I found that you could just turn it off. I will say, look, it was maybe a dollar, I think, for a 1080p ten-second generation.

So a lot less than Veo 3. And assuming they eventually release this as open source, then you can just run it on your own hardware, which, depending on your hardware, could take a long time or not. But I'll say the quality... this was like a quick test for some action-shot prompt.

Oh yeah. Not that great. Yeah, it still has, like, that AI-ness feel.

Yeah, it definitely feels synthetic. I ran the same thing in Veo 3. I mean, I'm sure you prompted for that camera rotation thing, 

right? I did. I asked it for, like, a Michael Bay 180 orbit kind of thing. So that one is a bit more of a push-in.

Yeah. Oh, it didn't get Michael Bay at all then, because Michael Bay goes around the character. Yeah, it didn't get that.

But, uh, Veo 3 didn't get that either, but Veo 3 definitely has more detail.

Oh, well, Moonvalley would probably nail that camera move.

Moonvalley. You could tell it the camera move.

Yeah, with the spline. There's, I'm guessing, a spline camera-mover thing in it. Yeah, that was my one quick test. It was definitely, obviously, not definitive, but it's good to have another Veo 3 competitor.

Yeah. Who makes Wan? Is it Alibaba?

Yes. Yeah, Alibaba.

So maybe Alibaba's strategy here is to get market adoption with early versions of Wan, 2.1 and 2.2, and then start charging people when they get to 2.5.

Yeah, possibly, 'cause we have wondered what the business model is for all the other models being free, essentially. I mean, it's probably to a point, too, where eventually, if you want to process powerful enough stuff, then you want to use their service, to not have to wait an hour or two for one shot to generate.

I mean, their whole playbook has been open models, so I would be surprised if, with this one, they're just like, no, you have to use it through an API and pay for it.

I think the benefit to Wan over Veo 3, if I were to pick one versus the other, is that the only way you can customize something in Veo 3 is if you give it reference images and you sort of just trust it to build those references into the generation. With Wan, at least with Wan 2.2, you have a really high level of control, in that you can build a

LoRA on the side and then attach it to the generation. So I'm guessing if we have the same level of LoRA and fine-tuning available for 2.5, that in itself could be a major game changer.

Yeah. To be able to put these in pipelines and do a lot more customization, that's a huge lift.

And then you can start to combine multiple LoRAs at the same time. So, you know, if, for example... I hate this example, but you have your Star Wars world. Like, you have your galactic world LoRA, which is building the backgrounds and sort of just putting you into that environment. And then you can combine it with the actor's likeness LoRA.

Mm-hmm. 

So then you have control over who the people are. And obviously, with audio attached to them, they would even sound like the actor.
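(A minimal sketch of that stacking idea, using Hugging Face diffusers' multi-adapter LoRA API against the open Wan 2.1 weights. The LoRA file paths are hypothetical, and whether an eventual open Wan 2.5 release loads the same way is an assumption.)

```python
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# Open-weights Wan 2.1 checkpoint on the Hub; 2.5 availability is TBD.
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

# Hypothetical local LoRA files: one world/style LoRA, one character LoRA.
pipe.load_lora_weights("loras/galactic_world.safetensors", adapter_name="world")
pipe.load_lora_weights("loras/actor_likeness.safetensors", adapter_name="actor")
pipe.set_adapters(["world", "actor"], adapter_weights=[0.7, 1.0])  # blend both

frames = pipe(
    prompt="the actor walks through a neon market on a desert world",
    num_frames=81,
).frames[0]
export_to_video(frames, "combined_loras.mp4", fps=16)
```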

Yeah. I'd be curious, too, if there's more control over the audio, if this releases as an open model.

Because right now it's sort of in the same bucket as Veo 3, where you can just describe the audio that you want in the prompt. But maybe there would be a way where you can give it a voice sample, or you can give it audio input, something else, to have more control over what the audio part of the multimodal video generation would be.

Yeah. And also, can you generate completely new audio? Like not just dialogue, but Foley or soundtrack and things like that?

Yeah, it did a similar thing in that test clip. My audio's not hooked up right now, but it added, like, some explosion sounds and some music sounds.

How did it sound?

You know, similar to Veo 3, where it's like, oh yeah, that sounds like it kind of fits, but it also sounds like it was ripped from a movie. Sort of

garbage. 

I did another test where I had, like, a talking-dog podcast, and I said, make the dog say something. And it did do it, but the lip sync wasn't as good as Veo 3 in this one test. I want to caveat: these were just one-off test outputs. It was not a very thorough test.

The Veo 3 audio, at least the examples that I'm thinking of... I mean, that audio is not usable at all for anything professional.

No. Unless you want it to be weird, gimmicky funny.

It makes for a great LinkedIn post, but, you know. Yeah. But with this, at least...

I mean, the audio sounds weird, but the lip sync is usually pretty close to spot-on. I just noticed here in the Wan update that, with the new model, they specifically called out image capabilities, with advanced image generation and image editing control. And this is interesting, because this was sort of like when 2.2 came out: it was intended to be a video generation model, but then a lot of people realized that if you just do one frame, it's actually also a really good image generation model. And so it seems like they're also leaning into that too, where it's like, oh, you can use the same model and same control for just doing image outputs and editing.
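(The "one frame" trick as a sketch: ask the video pipeline for a single frame and keep it as a still. Again, this assumes the open Wan 2.1 weights in diffusers; whether 2.5 ships the same way isn't known yet.)

```python
import torch
from diffusers import WanPipeline

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

# Request exactly one frame from the video model and save it as an image.
out = pipe(
    prompt="studio portrait, 85mm lens, soft key light",
    num_frames=1,
    output_type="pil",  # return PIL images rather than a numpy array
)
out.frames[0][0].save("portrait.png")  # first frame of the first (only) video
```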

Yeah. I've used Wan for image generation in the past. It's equally as good as any image generator, if not better. I think the one thing video generation models have over image generation models is that when you go into the latent space, you have not just a notion of the things that need to be generated, right, like the tree or the sky or the person, but you also have a notion of time. And that really helps with image generation, because if you're generating somebody that's in motion, or they're doing something, the AI model is aware of how they're moving in time. And it's almost like a spatial understanding, an animated understanding of that person, which really helps with achieving a slightly higher level of quality.

Mm-hmm. 

Yeah. All right. So, I mean, Wan 2.5, excited to see where that goes. All right, next up, also from Alibaba: Qwen's open source image generator got an update. So Qwen-Image-Edit has existed, but now there is a new version with a terrible name, Qwen-Image-Edit-2509, assuming that's for the September 2025 update.

But they basically said they rewrote the entire model, and it does text-based editing on the level of, like, a Nano Banana or a Seedream, but it's open source and free. There are Comfy workflows you can download and run, or you can use it on platforms that have integrated it.

Yeah. I'm on the Qwen blog right now, and the main AI researcher, he's all over it.

He puts his likeness on a bunch of the tests. On the blog post? Yeah, if you go to the Qwen blog post, the first one's really funny. They just take two arbitrary people and put them into a wedding photo. It's kind of a bleak showcase of the future, where it's not just synthetic people, but synthetic marriages.

You see the guy who has all of these fake relationship photos of him and Taylor Swift that he posts on Facebook? Yeah,

this is right up that alley, if you're into that kind of thing. Look, Qwen actually is quite well known in the open source community. It's been around for a while, and, along with Stable Diffusion and Flux and some of the other open source image generation models, it's like a building block for a lot of custom workflows.

And this is cool too. This also kind of ties into why it's powerful to have Comfy in the workflow, 'cause you could use the pose node, OpenPose, to draw out the exact pose you want and build that into the workflow. And so then it poses your image generation with exactly what you want. So it really gets into that fine-tuned control of being able to maneuver and tweak things to get the outputs that you want.
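(For anyone following along outside of Comfy, here's one way to produce that pose conditioning with the controlnet_aux package, which does the same job as ComfyUI's OpenPose preprocessor node. How the resulting pose map gets wired into a Qwen-Image-Edit graph depends on the specific workflow.)

```python
from controlnet_aux import OpenposeDetector
from PIL import Image

# Extract a stick-figure pose map from a reference photo.
detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose_map = detector(Image.open("reference_pose.jpg"))
pose_map.save("pose_map.png")  # feed this in as the pose control image
```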

Yeah. And I wonder if there's any overlap at Alibaba between the Qwen technology and the Wan technology.

Right. Yeah, I was wondering that, because those came out around the same time.

I mean, yeah, they're two powerful products with different names. I've been curious: are these completely separate teams, completely separate models, or is there a lot of stuff pulled from the same training or models?

Yeah. I mean, training models is so expensive, as you know, and I wouldn't be surprised if the Wan foundational model is just under the hood of the Qwen model, and Qwen is just running an additional couple of layers of network on top of that to achieve all of the customization and adjustability.

Yeah. And then they're also showcasing high text adherence, both in English and...

Are we ever gonna have a week where we don't talk about a new model? 'Cause it hasn't happened yet. I don't know what we're gonna talk about then.

Look at this restoration one, too. That's pretty wild.

I know, that's crazy.

Yeah, actually, it's funny: some of the early examples of Nano Banana were photo restoration. Yeah.

Yeah. I mean, it's great. It's a great use case. Yeah. And new week, multiple models... people have been rivals for... sorry.

Oh no. I was saying, another week, more models.

Mo' models, mo' problems.

There are more problems, like keeping track of them.

No, what I was saying was, you know, Google and Alibaba have been tech rivals for a decade, right, or even more. And, you know, I think when Veo came out and Nano Banana came out, they couldn't wait to drop their version of those two things. It's just like an arms race between those two companies.

Yeah. But I am still just so curious about where the dropping of free models comes into play. I mean, I guess maybe the play is because, you know, they're coming from Alibaba, and they're basically kind of like the AWS of China, right? I'm not off base there in saying that. So it's like, oh, these models are cool, you mess around with them, you build 'em into your apps, and then when you wanna deploy and scale it up, you can't run that on your local computer. You gotta power up some servers, and Alibaba would be an option, so you might use their servers to run these models. That's exactly my guess at the business model. All right, next one, another one from Google Labs. This one is Mixboard.

Yeah. Interesting product. So this feels like somewhere between Miro and Nano Banana. If Miro and Nano Banana had a baby, this would be it.

I mean, it is literally Nano Banana, and it's a bit Miro. Yeah. I mean, it's very lightweight. Well, I guess it's too low-lift of a comparison to compare it to, like, Flora or the other AI... not mood boards, but the AI...

Yeah, like node-based tools. This is definitely not a node-based workflow. No. What I would call this is maybe just a visualizer for multiple outputs, an organizer of ideas, perhaps. Yeah, it's a very nice, simple interface. You got your board features. You can drop images in down here on the bottom. You can just start generating stuff, and it'll just throw it in there. You don't have any model selections; everything's just running through Nano Banana.

You know what would be nice is if it did have model selections, or if Freepik or somebody like that released this product within their platform.

Well, we'll have news about that in a second. Yeah, it's simple. And then, if you just click on an image, it adds it as a source image. If you highlight multiple images, it adds them as multiple source images. So you can just tell it what you want.

Again, if you're ideating and you're going through a lot of generations, let's say you're ideating on a movie and you're going through a hundred different generations, you have to organize them.

Mm-hmm.

At a shot level, at a sequence level. I mean, this is one way to do it.

Or maybe you have a batch of your characters and then a batch of some locations. You click on the character, click on the location, and then say, you know, make a medium shot from a high angle of this.

Yeah. It's like a digital whiteboard thing.

Yeah. And it's just convenient, 'cause it isn't really anything new that you can't do in any of the other photo editing tools. But it's nice, because you don't have to download the files and re-upload them. It's like, here, you wanna do something different?

You just highlight 'em, you tell it what you want, it makes a new image, and then you just repeat. I'm not quite clear, as far as plans, whether you have to be on a paid plan to use this or if there are any usage limits. It's in their Labs features, so that's usually not, I don't know, as restricted.

But I'm sure there are some usage limits.

Yeah, I'm on an enterprise Google Workspace account, and I was able to use it just fine, like it's part of Gemini and Nano Banana.

Yeah, I mean, I have a Workspace account too, so I know I'm logged in, and I have a Google AI plan. But if I was just on my regular Gmail account with, like, no Google AI plan, I don't know if there's a different usage limit.

Because there's no indicator here of, like, credits or usage or anything like that. So, I don't know. I mean, I assume, I'm guessing not.

Yeah, I mean, look, if you want a free Nano Banana hack, this might be the way to use it. But I'm sure if you churn out a bunch of stuff, they will eventually yell at you.

But as of right now, I can't quite tell if there actually is a usage limit. Yeah. The other thing: it's got some other handy tools here too, I'll say, to either make more images like this one, or remove the background, which is also a handy feature to have. Absolutely. Yeah. So yeah, cool new tool from Google.

I'm surprised at how good AI is at removing backgrounds, just in general. It's across all the models.

Yeah, it's funny how rotoscoping, or just removing-background stuff, became pretty much solved.

All right. Next one is Google Flow. It has a grab bag of updates in Flow. So Flow is Google's web product for their video creation tools. It's where, if you want to use Veo 3, that's sort of the home base, if you're using Google's built-in product. They added a couple of new features. One is prompt expanders, and this is actually kind of cool.

Basically, you can assign sort of presets of your scene or your look inside Flow. So when you do your scene generations, it can have some persistent memory to keep some elements the same, like keep the style the same or the location the same, so you can kind of build things out. It has more prompting options to keep some elements consistent in your prompting while you try to change or modify other elements. So I think that's a nice, useful kind of AI filmmaking feature.

It's like a natural evolution of a platform. Or I guess enough user feedback was gathered where they're like, yeah, we obviously need a notion of what we're working on, memory of what we're working on.

Yeah. I mean, this is something you could have built yourself when you're building out a prompt, like in a Gemini Gem or in your Claude Project, where you can kind of build settings and have them incorporated into the prompt every time you generate one. But if this can save you the step of having to round-trip between one tool to make your prompts and one tool to make your videos, awesome. Great.

Yeah, to be honest, the Google tool set is still a little bit unorganized, in my opinion. It's still just a little bit all over the place, I think.

I mean, yeah, I think with a lot of their stuff, they kinda just throw a bunch of products out there, see what sticks, and then potentially double down. That's also why I'm always a little bit hesitant about fully adopting a Google tool into my workflow. Because, and I think we talked about this a while ago, I have a long history of using Google tools over the last 15, 20 years that they end up killing, whether it was Google Notebook or Google Reader.

I was not a big user of Google Wave, but I don't know if anyone remembers Google Wave, when they tried to build an alternative to email. Yeah.

You talked about this before. I'm guessing you must have lost something significant in one of those product discontinuations.

I mean, they were really good products that I used a lot. And then, you know, it just left a void in my life, not having Google Notebook around.

Yes. Which you're still trying to fill, still trying to this day. You should just go to Alibaba. Yeah,

so, I mean, I don't think Flow's going anywhere. But, you know, if you're like, oh, Google Mixboard, that's gonna be my new mood board... it's in the Labs features, and I wouldn't be surprised if one day they're just like, well, you know, we'll just toss it out. Bye-bye. Yeah.

Yeah. 

Other update in Flow too, and this also ties into making the platform easier to use: now, when you have your images, your frames, inside Flow, you can instantly edit those images with Nano Banana. So Nano Banana is, like, integrated inside Flow, so you can modify your images, again saving you that round-trip step of pulling your images into different platforms, which is always good to see. And then also, speaking of Nano Banana: now, in the Photoshop beta, Nano Banana is integrated inside Photoshop.

So you can... that's amazing. ...prompt changes, and it comes in as a layer. You can also use it with the Harmonize feature, which blends elements together, which we talked about a few weeks ago. I think there was that guy who built that plugin that we talked about a few episodes ago, and we were like, well, it's only a matter of time before Adobe just does it natively.

Yeah. And it totally makes sense. It's solving that last-mile problem, right? It's what you just talked about: eliminating all the round-tripping when you're doing a giant project. Imagine the number of times... if it's, like, three operations per image times a hundred images, that's a lot of time you're just wasting.

Yeah. And to be in the main tool set that you're using and just have everything there, so you just call up what you need but stay in your tool set, and have that control, and send just the layers you need, and save all those export steps. It's an awesome setup for doing that, and it's nice to see Adobe opening up.

Adobe opening up their ecosystem to third-party models is probably the smartest thing they've done in a long time, because the stuff is just so powerful now. Yeah. Way more powerful than Firefly could have ever been.

Yeah. I will say, 'cause I've been messing around with Firefly Boards, which is actually the best Firefly product that Adobe has made. It's just a good board generator, sort of like the Mixboard thing we just talked about. I think we were both out of town when Firefly Boards came out, but it's basically the same idea. It's a mood board, but you can call up a bunch of image models, you can call up a bunch of video models, and it also tracks the prompts that you used, and it tracks the model that you used on that image. So if you move that image around to other apps, it'll track where it came from, which is just nice for, kind of, AI authenticity and tracking where data came from.

Yeah. 

But it has Firefly 4, their newer image model, built in. And I messed with it, and it was actually decent. Yeah, Firefly was sort of the punching bag of AI models in its early versions. But I was pleasantly surprised by it.

You're also a really nice guy.

Yeah, well, I mean, I try to be optimistic, but...

You're very easy on things.

Well, no, I mean, we've talked about how the first Fireflies were pretty useless, but the newer ones were pretty good. Good for Firefly.

Good for Adobe, good for Firefly. Yeah. But Firefly Boards is actually a pretty handy product that they've built.

Well, you know, it's funny you bring that up. That's, like, the number one dilemma with any image generation model: when you go the commercially safe route, at least the notion of what commercially safe is today, you're looking at hundreds of millions of images to train on.

Mm. 

But if you go with publicly available data, then you're looking at billions and billions of images to train on. And so maybe you can't have both licensed, commercially safe training data and output quality at the same time. These are some of the things that are still up in the air.

Yeah.

To be figured out. All right. And then, speaking of boards, another update from Freepik. They've sort of been teasing this; they're calling it Freepik Spaces. It's not out yet, I think they have a waitlist or some beta testing, but it's basically a node-based board version of Freepik. So similar to Weavy or similar to Flora, but inside Freepik.

That's cool. This is cool to see.

Shout out to Joaquín. Yeah, he was on our show a few episodes back. I mean, look, you and I use Freepik. We're not sponsored by Freepik; we just use it. We pay for it,

but I like their model. Yeah. 

I pay for it too. And it's just such a well-designed aggregator of everything that's available today, more or less.

Yeah. They've got the models in there, right there. And, I mean, the unlimited image generation just removes that mental barrier, where it's just like, yeah, cool. I feel free to make pics.

Exactly. Yeah. 

I'm also hoping that this helps, 'cause the one weakness, and we talked about this in our interview with Joaquín, is their collaboration tools are kind of non-existent. So if, you know, we're working on multiple projects and we're trying to do images in the same feed, their organization is kind of rough. Stuff gets mixed in together all the time, it's kind of hard to find elements from other projects, and it's hard to collaborate with other people. So I'm hoping the board feature also adds more kind of collaboration tools and features into Freepik as a platform.

Yeah. I mean, it keeps a history of everything you generate, but it'd be nice to put things into project buckets or something like that. Yeah.

Where it's like, oh, we're working on this project; everything we generate should just stay in this project. And then if you flag it, like a select or something, you can organize it there, just to keep better project separation.

You know who does that really well? Frame.io. Just the way it sorts out projects and keeps track of who should access what and so on.

Oh, you mean just more from, like, an asset management perspective? Yeah, I like Frame.io for that. I was just jogging my brain, like, can you generate AI in Frame.io?

No, Joey. Excuse me...

Frame.io, my favorite model. Yeah, they dropped it last night.

Yeah. I mean, they did finally add transcription to videos a few months ago, so that was a cool feature.

Yeah. 

All right. Next up, some other updates from Topaz. They dropped a couple of new upscaling models. One is Starlight Sharp, the newest model in their Starlight family, and it basically can upscale with an extra level of definition not present in any other offering.

Yeah, it's interesting that it's a diffusion-based video restoration model, so it's doing a lot of infilling with generative AI, is what I'm guessing.

Yeah, I mean, that's what their initial Starlight model did, so this one just seems extra sharp. But yeah, the Starlight model is good for both synthetic data, like AI-generated stuff that you wanna upscale, and just restoring video footage that you want to look sharper. Like, I've run the original Starlight on some archival video, and I do struggle with my brain trying to decipher it. It looks good, but it's like, does it look really good and sharp and I'm just not used to it, or is it too far into the uncanny valley weirdness phase? I've had a tough time figuring out whether it just looks like it should have looked, you know, from upscaling, like, a 360p video to HD, or uncanny-valley weird. But it's good. It's a really good upscale.

I upscaled some old VHS tapes of my childhood and had a look. The shots that it did really well on are, like, closeups and medium shots. But then when it's super wide, like if you have the beach and the palm trees but then you have the tiny people, it just can't figure out what's there.

Yeah, when there's just no detail there. And the wide shots with, like, crowd stuff or details in the background... yeah, it still doesn't do a great job, 'cause it just doesn't know what was supposed to be there.

Well, that's the thing. I think if you integrate a VLM into the solve, then it could look at the frame and detect that it's a beach, and that those dots on the beach are people, and then tell the model to generate people.

Yeah.

So it knows that that should be a crowd, that should be sand, that should be a building skyline in the far distance that's kind of blurry.

Maybe they're working on it already. 

I'm sure they are. And the other model is...

We are.

Yeah, like, I guess we know: we are, we are aware of this. Yeah. But the other model from them is Nyx, Nyx XL.

And so this one's specifically focused on denoising videos while preserving detail. This is just cool: you've got some older footage, or you've got some super zoomed-in iPhone footage that you're trying to sharpen up. You know, I like these kinds of models where they're very specific, like, you've got a very specific problem with your video, and here's how you can fix it. This is probably in... oh, it's in Topaz Video. I know they've been doing some rebranding, 'cause initially it was Topaz Video AI, which was software you ran on your computer, and these models would run locally. I think they've been shifting a lot more stuff to where it's just a subscription package and stuff runs in the cloud, so you don't have to have a beefy computer to run Topaz anymore.

Yeah, I mean, video restoration must be so GPU-intensive. First of all, how many people out there have a computer good enough to do this? So you're significantly limiting your customer base if you just have a local version. And there's also the opportunity to mark up cloud access and usage. I mean, that's just recurring revenue at its best.

Yeah, now that you mention that, it's kind of funny that they seem to have found this new use case, too, with so much AI-generated video that needs upscaling. 'Cause before, Topaz was sort of this thing you knew about if you were maybe in the documentary space and you're like, we've got this archival footage, how can we improve the quality of it? Or you needed to fix a video with issues. Now it has so many other use cases and such a bigger market of users who are just like, I made an AI video, I need to make it sharper and make it look better.

Yeah. Topaz is kind of a name that is saturating into the mainstream a little bit.

Mm-hmm. 

I was talking to somebody outside of our industry the other day, and they knew what it was, and I was really shocked, because with most AI tools, there's stuff that you and I talk about that nobody else has any clue about.

Yeah, outside of, like, ChatGPT, right. Even Claude, I'm like, hmm, I don't think most people know what Claude is. Speaking of Claude...

The painter. 

So, this new integration with Claude Code and Figma. Nice. It's through MCP, the Model Context Protocol, which we talked about a couple of months ago, which is sort of like Claude's protocol where AI can talk to other apps.

This integrates with Figma. So the idea is you can build out your interface in Figma and just point Claude Code at it, and it will automatically look at your interface and build out your app.

So. 

But honor the interface that you built. So yeah, it kind of solves front-end design. You build out the front end; it programs everything.

And give our viewers an idea of what Figma is, for those of them who don't know what it is.

I mean, yeah, Figma is basically a web-based design tool.

For user interfaces and stuff like that, I would say. Yeah, I mean, it's sort of like Illustrator, 90% of the interface, if that's the best comparison.

I was saying it's sort of like a web version of Illustrator, but I don't know if Illustrator is a good comparison, 'cause people might not know what that is either.

Oh yeah. No, it's like Illustrator on steroids. So you can do page transitions: when you click a button, what that button opens up. And so you can actually wrap up quite a bit of your user interface in Figma without ever writing a line of code. And it's really easy to use; there's no technical hurdle to overcome. But then the challenge is, how do you take this really sleek user interface that you designed in Figma and actually implement it in the application that you're building?

So I think this is a game changer, because most of the time, UI designers aren't gonna write code, and the people that write code don't understand UI very well, right? So this is bridging those two worlds really well.

I mean, yeah. And there are a lot of jokes, too, where you build out something that looks really cool in Figma, your design looks awesome, but then when it actually gets built and turns into a functioning app or website, that design completely falls apart.

Yeah. If you go to any modern website now that has the fancy buttons, and as you scroll the website things happen, or if you're running any app outside of your native Android or iOS apps, chances are it was designed in Figma. It's, like, the number one tool for UX designers across the world.

So, yeah, this is cool. I'm excited to see how people use this. And also, I'm thinking, this is an example where it's integrating with a popular app. What happens when you have Claude integrating with something that's more photo-related or video-related, and it can do a powerful automated rough kind of assembly? Sort of Claude-based editing.

It's funny, you touched on MCP, Model Context Protocol. The more I hear about it, the more it's sort of just replacing traditional API connections.

Yeah. 

Going that route. It's like a more sophisticated version of how two applications talk to each other.

Yeah. I've found pros and cons with them, 'cause I've been messing around with Claude Code, and I was trying to connect it to Airtable. I initially tried the MCP connection, and it would sometimes just take a long time to find the thing, because it's using AI. So the advantage of MCP is you don't have to give it defined data like you do with APIs; APIs need a very defined data set and fields and all sorts of information. MCP can be like, yeah, find me this task on this thing, or this record on this thing, and it can understand the abstract language. It just took a lot longer, and the thing I was trying to do was just automate a bunch of fields fast. And so then I found it was actually quicker and easier to just have it use the API, and it built a script. I don't know how to do any of this stuff. I just tell Claude Code, like, do this for me, and it's like, okay, and built a Python script to connect to Airtable.

Well, first of all, it's pretty cool that you took on such a technical challenge, and it's crazy that you almost got it figured out.

I mean, it figured it out; the API version worked faster. So now, when it needs to update stuff, it just runs its little Python script, and it's like, okay, done, and it's updated. So basically, I'm saying there are pros and cons to the API and MCP, but MCP is definitely for, I think, stuff like this Figma integration.

Like you wouldn't be able to do that any other way. 
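(The kind of small script described here, going straight at Airtable's REST API instead of through MCP for fast bulk field updates. The base ID, table name, and field names are placeholders.)

```python
import os
import requests

BASE_ID = "appXXXXXXXXXXXXXX"  # placeholder Airtable base ID
TABLE = "Tasks"                # placeholder table name
URL = f"https://api.airtable.com/v0/{BASE_ID}/{TABLE}"
HEADERS = {"Authorization": f"Bearer {os.environ['AIRTABLE_TOKEN']}"}

def update_fields(updates):
    # Airtable's REST API accepts at most 10 records per PATCH request.
    for i in range(0, len(updates), 10):
        resp = requests.patch(
            URL, json={"records": updates[i:i + 10]}, headers=HEADERS, timeout=30
        )
        resp.raise_for_status()

update_fields([{"id": "recXXXXXXXXXXXXXX", "fields": {"Status": "Done"}}])
```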

Yeah. And we're gonna see more MCP.

Mm-hmm. We had that Blender integration demo a couple of months ago, which... I haven't really seen anything new come out of that, but I'm sure we'll see more of that as well. Yeah. For, like, you know, build out a basic animation of this character. It just kind of builds out the groundwork, so you can jump in and modify it, but at least you have a starting point.

Yeah. I wonder why not some of the more complex, enterprise-level tools. Houdini, Maya, Unreal come to mind, Unity. It's just ripe for the taking. Somebody just has to come in and build a really kick-ass user assistant feature.

Oh, that integrates with these tools. 

Yeah, like if you train a custom model on all things Maya. Learning Maya takes years, decades even, right? What if you could shortcut that to, like, two weeks?

Yeah. I don't have any disagreement with that. That would be very powerful.

Yeah. I guess the question is, where's the money in it, and why would you do it? I mean, you could expand your user base, but then it's like, are people still gonna get Maya? Or is it just better to build a more simplified 3D program?

Yeah, exactly that. Like, start from scratch. I don't know. I went to the ComfyUI meetup yesterday; that was at Fabric in Venice. And there's definitely a growing number of people in the crowd who are from a traditional M&E background and are AI-curious and wanting to figure out how to adopt these tools. So there was a good group of people who were traditional video editors, and then there was also someone who'd been in VFX for 20 years. He was like, I was around during the formation of Nuke and all of these other tools and saw the shift to node-based VFX work. And then he's like, I see ComfyUI, and this is the next VFX thing. This is the next Nuke, for the AI VFX age. So I thought that was a very interesting comment from someone who's been around in VFX for...

Oh, that's a long time. It's very encouraging to see some of the tides shifting.

Mm-hmm. And I think, we talked about it before, the folks that have expertise in VFX, those are the game changers of our world, because they have the eye, they have the discipline. They'll put 12 hours in front of a computer, no problem, right? All they have to do is just learn this new stuff.

Yeah. Yeah, a hundred percent. 

Alright, and the last thing on our list of the AI Roundup is Suno. The controversial music platform is back with a new version called V5. Now, it's kind of cryptic; we're having a hard time finding what's new with it, but it is better. And how is it better?

With most audio models, as you can imagine, it's all about clarity, control, the ability to separate out the tracks, so to have stems, and the ability to have more control over vocals. So perhaps they have better vocal generation, you know, a cleaner, higher-frequency sound and things like that.

Yeah.

Yeah, I'm guessing better vocals, better quality. Yeah. I mean, I found Suno out of the box is pretty good when I'm trying to make some background beats or something. I mean, the lyrics are funny. I wouldn't publish anything.

But that's one thing, and we've talked about that before, that Suno actually does well: I think it'll do voice-to-voice, so you can give it, like, a scratch reading or scratch singing of something, and it'll turn that into actual vocals.

That's cool. Yeah, I can see that being good if you're a musician and trying to brainstorm ideas or test things out more quickly.

Yeah. People talk about the future, but I saw Spotify just came out with some new guidelines around clarifying when songs were AI-generated, to sort of fight back on the flood of AI-generated music without any human input. So you've got this kind of battle: Suno V5 coming out, where it's probably gonna sound even better, more realistic, and higher quality, and Spotify and everyone else fighting against the flood of AI stuff being dumped out.

I'm so torn ethically when I listen to AI music, as well as looking at some of the AI, quote-unquote, movies. But the music hits harder for me, because it's so much closer to real music.

I mean, the music thing is also, like, one button and you get something that you can listen to and sounds kind of decent, where there's no human input at all, really, aside from my prompt.

And then if you think about how long people need to learn an instrument, to play it proficiently, at a level where the output is usable for music...

Mm-hmm.

I mean, you're shortcutting all of that with just a prompt. It just doesn't seem fair.

Well, the AI Suno band can't go on tour and play for you live. So true. If you're a real musician, you've got the live show buffer.

I mean, I feel... bleak. I don't know, I'm feeling bleak today for some reason. So the dystopian view of the future is: the AI band is just projected up on an LED wall with digital avatars, and you're jam-packed physically with a concert-goer next to you, and you guys are all looking at the screen with digital performers?

No, we're all looking at the AI band inside the metaverse, in our Meta headsets.

That's right. That's right. Yes. The Meta Ray-Bans.

Yeah. Yeah. Meta Ray-Bans, Vive Mars...

And then, even worse, it could be like that version of the social media app, I forgot what it's called, but it's basically a fake newsfeed where you log on and you post stuff, and then a bunch of AI bots reply to you and kind of give you that dopamine boost of, you know, kind of going viral. But it's all fake. So it's you at your concert in the metaverse, surrounded by the whole crowd, but the whole crowd is AI bots.

Bots, yeah. 

Yeah. It's you, AI bots in the crowd, watching your performance. Oh, is that bleak? Is that a bleak enough future painted? It's way more bleak

than mine. I don't know what you've been doing all morning, but yeah. Okay. You beat me to it. 

I was just going there. Just riffing.

You're just riffing. That's great. You know, I'm kind of glad you didn't come over to the house today.

No. I was... log off, buddy. All right, well, a good place to wrap it up. Yeah. On that note, links for everything we talked about are at denoisedpodcast.com or down in the show notes on YouTube.

If you're listening to this on Apple Podcasts, do us a tiny little favor: leave us a five-star review. If you scroll all the way to the bottom of your iOS app, it's down there. Most people can't find it, but yeah, it's all the way at the bottom. All right,

thanks, everyone. We'll catch you in the next episode.