Denoised
When it comes to AI and the film industry, noise is everywhere. We cut through it.
Denoised is your twice-weekly deep dive into the most interesting and relevant topics in media, entertainment, and creative technology.
Hosted by Addy Ghani (Media Industry Analyst) and Joey Daoud (media producer and founder of VP Land), this podcast unpacks the latest trends shaping the industry—from Generative AI and Virtual Production to hardware and software innovations, cloud workflows, filmmaking, TV, and Hollywood industry news.
Each episode delivers a fast-paced, no-BS breakdown of the biggest developments, featuring insightful analysis, under-the-radar insights, and practical takeaways for filmmakers, content creators, and M&E professionals. Whether you’re pushing pixels in post, managing a production pipeline, or just trying to keep up with the future of storytelling, Denoised keeps you ahead of the curve.
New episodes every Tuesday and Friday.
Listen in, stay informed, and cut through the noise.
Produced by VP Land. Get the free VP Land newsletter in your inbox to stay on top of the latest news and tools in creative technology: https://ntm.link/l45xWQ
Kling O1, Seedream 4.5, Z-Image: AI's Biggest Week
Addy and Joey analyze Kling O1's superior video modification capabilities versus Runway Aleph, Z-Image's emergence as an open-source alternative to proprietary image models, and Seedream 4.5's unique batch generation feature. Plus, they compare new releases from Runway Gen-4.5, LTX Retake, FLUX.2, and TwelveLabs' Marengo 3.0.
--
The views and opinions expressed in this podcast are the personal views of the hosts and do not necessarily reflect the views or positions of their respective employers or organizations. This show is independently produced by VP Land without the use of any outside company resources, confidential information, or affiliations.
Welcome back to Denoised. It is time for the AI roundup. You ready, Addy? Always ready, Joey. AI roundup, here we go. What do we got? We got Kling. Is that getting old? I don't think so. I think this time we're gonna have to change your hat.
Z-Image, Runway, a new Seedream model. A lot of stuff. Let's get into it.
All right. Welcome back, Addy. I'm glad we're doing this one remote, since you...
Yeah, there's a contagion in this household right now. So yeah, thank you for, uh, not being here, Joey.
Thank you for the consideration. All right, so, busy week. What do we say? Christmas came early from the AI companies.
It just keeps coming. They really dropped them all after Thanksgiving, and we've kind of been delaying recording the episode. Every time I'm like, ah, it's a hot topic, and then a new model comes out that morning.
You're so smart on your timing. I was like, let's go today. Joey's like, one more day. Seedream comes out. Wait for it.
All right. So we got a bunch, but I think the biggest one, especially the one I've been seeing on my feed a bunch, is Kling's new multimodal model, O1.
O1. Omni 1.
Yeah. It's kind of being described as the Nano Banana of video. This one has multimodal understanding.
You can give it an existing video and tell it what you want to change, similar to Runway Aleph, but the outputs I've been seeing from this look a lot better. A lot more consistent. Mm-hmm. A lot sharper.
Yeah. What is it? Is it native 1080p or 2K? What are we looking at resolution-wise?
You can upload a video up to 200 megabytes, 2K. I think the output's gonna be 1080, but I don't see confirmation. It definitely would not be more than 1080, though.
1080p feels like the bare minimum in today's video generation world.
Yeah, for sure. And if it's a true 1080, 'cause, uh, even with Veo, if you work in Flow, it's like a 1080, but you have to upres it, so it's really a 720 that you're kind of upresing. A true 1080 is good.
Yeah, and I believe, I could be wrong, but I believe, uh, most of the diffusion models, image and video included, are square kernels. So it's either like 1024 by 1024 or, you know, 1280 by 1280, whatever. None of the video models natively run at a 16:9 aspect. Okay. Yeah. So they're kind of just faking it. Yeah, they're probably generating something bigger and then cropping the top and bottom to give you that cinematic aspect ratio.
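For the curious, here's the arithmetic behind that idea as a minimal Python sketch. The dimensions are illustrative assumptions, not any specific model's actual internals:

```python
# Illustrative only: how a square-native diffusion model might fake 16:9.
# The 1280 base size is an assumption, not any vendor's real spec.

def fake_widescreen(square_size: int = 1280, target_ar: float = 16 / 9):
    """Generate square, then center-crop top/bottom to a widescreen frame."""
    width = square_size
    height = round(square_size / target_ar)   # 1280 / (16/9) = 720
    cropped_rows = square_size - height       # pixel rows thrown away
    return width, height, cropped_rows

w, h, cropped = fake_widescreen()
print(f"{w}x{h} frame, {cropped} rows cropped from a {w}x{w} generation")
# -> 1280x720 frame, 560 rows cropped from a 1280x1280 generation
```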
All right. So yeah, some of the examples I've seen: you can give it a video, similar to what we've seen with Aleph, but much sharper. Remove objects, change people, change locations, change scenes. I've seen another interesting trick where you can give it a video input with an overlaid still image.
So in this one they kind of gave it a video.
Just like Nano Banana.
Yeah, but in this case it's actual video. So you can give it this video input and then say, composite this car and change the style of the landscape to look fuzzy. And we got this Kling O1 output with the car composited, driving down the street.
Super accurate. And in this one they did a test against Aleph, and you could see Aleph sort of kept the scene but did a bunch of random stuff.
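If you want to try this kind of video-to-video edit programmatically, here's a minimal sketch using fal's Python client. The endpoint id and argument names are assumptions for illustration; check fal's model page for Kling O1's actual schema before using:

```python
# Minimal sketch of a video-to-video edit call through fal's Python client.
# Endpoint id, argument names, and response shape are assumptions.
import fal_client

video_url = fal_client.upload_file("driving_plate.mp4")   # source video
ref_url = fal_client.upload_file("car_reference.png")     # overlay/reference image

result = fal_client.subscribe(
    "fal-ai/kling-video/o1/video-to-video",  # hypothetical endpoint id
    arguments={
        "video_url": video_url,
        "image_url": ref_url,
        "prompt": "Composite this car into the shot and make the landscape look fuzzy",
    },
)
print(result["video"]["url"])  # assumed response shape: URL of the modified clip
```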
That's interesting, that this is happening. As far as the entire ecosystem of all the models goes, O1 is being compared more to Aleph than it is to Veo 3.1, because it's essentially a video-to-video model.
Yeah, and I think Aleph is the only thing we've really had so far where you can give it an existing video and, through a text input or image input, modify elements of that video. There hasn't really been anything else that can do that. There was sort of the Veo 3 thing where you can modify the video, but that only worked on a video that you generated in Veo and then wanted to change later. You couldn't just upload a video and change it.
Right.
Kling, this one's the only one outside of Aleph where you could... I'm sorry, I will also, in fairness, mention Luma's Modify Video. You can do something similar with that too, but I had similar outputs to Aleph with it, where it would sometimes do it and sometimes make things a bit too mushy.
Yeah. So it seems, in our world of film and television, the most applicable models are obviously gonna be Kling O1, Aleph, Luma Modify, and then Veo 3.1 for any type of text-to-video or image-to-video.
Yeah. And we got another update from Kling that also dropped today, in addition to Kling O1.
Yeah, well, this is a demo video from Martin Le Block at Freepik, a Kling hype video that they did. And you can kind of see this sharp detail with, uh, different character reference images and character performances. I'd be curious, for character performances, how this compares to Wan 2.2, right?
'Cause you could kind of do that same workflow of having a live actor's human performance and then giving it a reference image of another character.
Sounds like a good comparison. 'Cause obviously I've seen a lot of Aleph comparisons, or Nano Banana comparisons. I haven't seen any Wan 2.2 Animate comparisons. So it could be good for that.
You're giving me ideas for my next extra tour video. Thank you for this. It's mind-blowing how good even the demo videos are, because, I don't know if you remember, when we started this podcast about a year ago, having temporal video consistency was a big deal, right? Like, as you have a face and the camera rotates around you, just having the face be a face as the camera's moving was challenging enough. Now that problem's totally solved, and we're onto the next part of the problem, which is fidelity and resolution and quality. Mm-hmm. This looks good.
I would love to know, though, with all these hype videos... I would love them to do a video with, like, a counter of what number generation each shot was. Kling I'm maybe a little less suspicious of, but, um, when we get to Runway's output, sometimes I have a tough time replicating the results that they demonstrate.
Yeah. We could go off on Runway. I think we have the Gen-4.5 release on our AI roundup. It's coming up later, so we'll talk about Runway then.
Yeah. Um, I've heard the 4.5 has felt a bit like, uh, the meme of the pool, where Runway was the hype for the day and now it's kind of at the bottom of the pool.
Oh, it's so brutal being an AI company nowadays. I mean, your model hits the market and within six hours it's outdated or trumped by something else. It's crazy. Um, it's good for us because, hey, we can do two of these episodes a week and still not have enough time to cover everything.
I know. Okay.
Other Kling thing. So Kling O1 wasn't the only news this week. This morning, as we're recording this, Kling 2.6 dropped. Let me look at the inputs, but the big thing with this is it can do audio. So it's a bit more in line with Veo 3, Veo 3.1. Yep. Generating, uh, audio with the outputs.
Let's see what else we got here. We got some demos from fal of character performances. I will say, the voices sound pretty robotic and not that great in this astronaut clip. "Tell my family I love them. Oh, the stars are beautiful from here."
That's the uncanny valley of audio, Joey.
Yeah, it's not that great.
So making a similar comparison then, would you say that Wan 2.2 Animate and Omni 1 are more or less one-to-one, and then Wan 2.5 and Kling 2.6 are one-to-one?
Oh, that's a good comparison. From the quick look that I've had, yeah, I think Wan 2.5 and Kling 2.6 seem to be kind of on par, as well as Veo 3.1.
Yeah. Uh, Wan 2.2 Animate, I mean, that can only do one specific thing. Whereas with Kling O1, you could do this character performance transfer, but you could also just modify your video or tell it what you wanna change and stuff. And you could throw a bunch of references into the video generation.
You could do references, yeah. So I'd say Kling O1 is much more powerful than Wan 2.2. The other thing is, Wan 2.2 is limited to like 720p resolution, whereas, we're not a hundred percent sure, but we think Kling is 1080.
It's interesting. In late 2025, we're seeing the video models almost, uh, hit a fork in the road and go in two different directions. In one direction, which is the main direction, you have Veo 3.1, Kling 2.5, Wan 2.5, Runway Gen-4.5, which is the best image-to-video you can do. The other direction is: take an input video, modify it, and output a new video. That's Luma Modify, Aleph, and now Kling O1 and Wan 2.2 Animate.
Because, I mean, the modify-video stuff obviously has a lot of practical applications, but it's still, you know, you're basically taking whatever you shot your video at, and then you run it through the model. Now you're compressing that video into whatever the model spits out, which, since they have not hyped it, we know is gonna be like a 720, maybe 1080, 8-bit, super compressed video. So if you're a professional filmmaker or in a professional VFX pipeline, it's still not there yet, obviously. Mm-hmm. It's solvable. Mm-hmm. As usual, this is the worst it will ever be.
Totally right.
Yes. Don't fret on that even for a second, because six months from now we could completely overcome this hurdle, right? I think the building blocks are all there to do the video-to-video, quote-unquote, VFX. The last mile problem still persists, which is taking the output from the variational autoencoder, essentially what AI creates, and making the bit depth, resolution, color space, and fidelity high enough that you can feed it right back into a traditional production.
And have the flexibility that you're used to. 'Cause you make the shot, and then it's gotta fit in the pipeline, and then you gotta color correct, 'cause you gotta make it match every other shot, and you gotta composite or blend or, you know, use other traditional tools to bring it all together.
Totally. And I mean, I have not tested Ray 3 yet, but I think Ray 3 is a, um, good move in that direction.
Mm-hmm.
But I'm sure Google will have an answer for that, right? Sooner or later, Veo is gonna have some type of HDR capability, maybe past 4K, and then perhaps the ability to output in raw color spaces, right? So you can have way more color adjustability in post.
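One small, concrete piece of that last-mile problem, sketched in Python: promoting an 8-bit display-referred frame to linear floats before it re-enters a VFX pipeline. This is just the standard sRGB transfer function; a real pipeline would also handle the target color space and an EXR write:

```python
import numpy as np

def srgb_to_linear(frame_8bit: np.ndarray) -> np.ndarray:
    """Promote an 8-bit sRGB frame to 32-bit linear light.

    Standard sRGB EOTF: one small step of the 'last mile'. AI video arrives
    as 8-bit display-referred pixels, while VFX pipelines want
    scene-referred linear floats (usually written out as EXR).
    """
    x = frame_8bit.astype(np.float32) / 255.0
    return np.where(x <= 0.04045, x / 12.92, ((x + 0.055) / 1.055) ** 2.4)

# Stand-in for one decoded 1080-class frame from a generated clip.
frame = np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8)
linear = srgb_to_linear(frame)
print(linear.dtype, float(linear.min()), float(linear.max()))
```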
And just one final note with Kling: it takes a text-to-video or image-to-video input, so the usual inputs are there. But if you want more control, or to modify video, then you would go Kling O1.
What's next, Joey? Next update.
Ooh, this website looks like it's from 1998. I dunno why we have this one saved. GIGAZINE. The next update is, uh, from Alibaba: Z-Image. An image model. What do you got about this, Addy?
Z-Image is, uh, being perceived as the next SDXL. What I mean by that is it's completely being embraced by the AI creator community as an open-source, easy-to-adjust, easy-to-modify model. If you remember, two, three years ago, SDXL was the darling of the, uh, creator community, because you could easily build LoRAs for it, publish those LoRAs on Civitai, or perhaps completely fine-tune, uh, full-weight train the model. It was the basis for a lot of products back then; SDXL was the basis for Midjourney before Midjourney had their own model. A lot of the SDXL folks went over to FLUX, and then FLUX was born. And so Z-Image gives you that same level of capability and tuning, fully open source, but with today's quality standards, right? SDXL is no longer relevant at the quality standard of today's market, but Z-Image is. You know, if you're gonna go neck and neck with, let's say, a Nano Banana, then this is, I think, in that realm of quality.
Yeah. And they've got three different variants. There's Z-Image Turbo, a distilled version that can run on more consumer-grade hardware. There are Comfy workflows, and you could run this in Comfy, uh, locally on your computer, or on Comfy Cloud. That's right. You know, it's everywhere, there are APIs.
And that's, I think, the main appeal: it's super lightweight, right? Yeah. It can run on, uh, lower-end GPUs. It can run locally in Comfy. Mm-hmm. So you can go to town on modifying this on your own.
Then they've got Z-Image Base, which is the non-distilled foundation model, and Z-Image Edit, which is for image editing tasks, more in line with, like, Nano Banana-style editing.
But yeah, there's this showcase that they have on their, uh, launch page. As you were talking about, the photo quality of the images is photorealistic, much more up to date with the higher-end locked-off models. Proprietary models, yeah. Stuff where you don't have the weights.
So yeah, the quality here is excellent. Also accurate, uh, bilingual text rendering, so English and Chinese.
hopefully other languages as well.
Yeah, eventually. So yeah, the demos that we see here, on their own it's like, oh, I've seen this in Nano Banana or other things. But the power here is that this is a model that is open source. You could download it, you could run it, you could build it into whatever workflow you want. Yep. Modify it. Yeah. So it's a very, uh, powerful model.
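A minimal sketch of what "download it and run it" might look like with Hugging Face diffusers. The repo id, and whether diffusers supports Z-Image's architecture out of the box, are assumptions; check the model card for the real instructions:

```python
# Sketch of running an open-weights image model locally with diffusers.
# Repo id and pipeline support are assumptions; see the actual model card.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",     # hypothetical Hugging Face repo id
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = pipe(
    prompt="Street portrait at dusk, natural skin texture, 35mm photo",
    num_inference_steps=8,          # distilled/Turbo models run in few steps (assumed)
).images[0]
image.save("z_image_test.png")
```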
Yeah. And, um, folks like, uh, you know, fal, RunPod, Replicate, Freepik. Mm-hmm. The big AI aggregators of the world have already ingested this into their ecosystems. So you can go run Z-Image today, if you like, on most big platforms. Uh, I'm sure Comfy Cloud is probably gonna have it soon if they don't already.
I think it's there already.
Yeah.
And I've been seeing some LoRAs and stuff pop up for this already. Yeah. So this is a good step for open source.
One of the more interesting workflows that I've seen with Z-Image is, um, not generating, uh, a base layer, but doing an image modification where you're adding detail to an existing generation. We've talked about AI sheen, plasticky skin, and that kind of stuff before. I've seen Z-Image workflows where you can put an AI image through it, and it comes out on the other end having much more realistic skin. You know, sort of like an upscale.
Upscale-ish. I think much better than an upscale, honestly. Okay. Yeah. 'Cause it's not just pixel resolution, it's actually changing the image. Okay.
That's a good workflow. Yeah. I've seen similar workflows with, like, ADA and Anna as well, for sort of upscaling and slight style changes. But to have that... I mean, it also sounds like you could find or build a LoRA that just specifically does that with, uh, Z-Image?
Yeah, that's exactly what it was. It was, uh, a LoRA that plugs into Z-Image's main model for that. Okay.
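A sketch of that detail-restoration pass as a low-strength image-to-image run with a LoRA loaded, using diffusers. The model and LoRA repo names here are placeholders, not real releases:

```python
# Sketch of the "add realism back" pass: run an existing AI render through
# image-to-image at low strength with a skin-detail LoRA loaded.
# Both repo names below are hypothetical placeholders.
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "some-org/z-image-base",                 # hypothetical base model repo
    torch_dtype=torch.bfloat16,
).to("cuda")
pipe.load_lora_weights("some-org/skin-detail-lora")  # hypothetical LoRA

src = load_image("plastic_looking_render.png")
out = pipe(
    prompt="photorealistic skin texture, natural pores, film grain",
    image=src,
    strength=0.3,   # low strength: keep composition, change surface detail
).images[0]
out.save("detailed.png")
```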
All right. Uh, next up, going back to what we talked about before: Runway Gen-4.5. This is a new update to their foundational model. And we've got the hype reel here, which, as we talked about, look, the reel looks great, you know, super clear, coherent shots. Some of the shots do fall apart, but most of them are fantastic. I would just love to know how many generations each one took to get to this. 70,000?
Joey, this isn't a Coca-Cola ad.
So to clarify with this too, this is their video model, so this would definitely be text-to-video, and I assume also image-to-video. They haven't released this model yet; you can't use it yet. I think it's coming out this week on the platform. But this is separate from Aleph, so it doesn't have anything to do with modifying an existing video. This is more just pure video generation.
Yeah, I would consider this their flagship product at Runway right now. Mm-hmm. Probably their most state-of-the-art. I did see a couple of podcasts that, uh, Cristobal was on right when this launched, and I'm always fascinated by CEOs of AI companies, you know, we've interviewed a few of them. Mm-hmm. Like, what is their internal clockwork? What is motivating them to build the things they're building? And it's fascinating what Cristobal said. He essentially made a one-to-one comparison of what Runway's building to a world simulator, which is kind of like Nvidia's Cosmos model or the Google Genie model. It just happens to do media and entertainment now, but eventually it'll do everything. What are your thoughts on that?
All roads lead to robots. Yeah, we've talked about that before. All these world models are so the robots and cars can understand the world, because film production applications are that tiny little bit of the addressable market.
I appreciate the ambition and the vision of these AI CEOs. They certainly need it in a space that's very ambiguous. Having said that: Runway, you guys still haven't cracked film and TV, so don't just move on yet. The stuff is not quite usable. Would you agree?
Yeah. I mean, unless you're doing some insert shots or something. I have found, just from a user experience standpoint, it takes me more generations to get something usable from Runway than it would with, like, Veo or Seedance. So I tend to rely on Veo or Seedance first, because I'm more likely to get what I'm looking for out of a few shots, and Runway I don't use as much.
Yeah, it's funny that, um, we always talk about film and TV as a small slice of the overall market, but it's arguably one of the most difficult of those slices to crack.
Yeah. I mean, maybe you can't specifically put your finger on it, but we have uncanny valley, you know, when something feels kind of off. I will say, look, I'm looking at some of their highlight demo videos and, you know, the real-world understanding and physics... Um, I don't know if this is a shot at, uh, the Genie 3 demo. Yeah. But they have the wall being painted with blue paint, which was the, uh, good demo in Genie 3 from Google. Right. But this shot, look, the guy flipping a mirror back and forth and it's getting the reflection correct. That's crazy.
Yeah, there's definitely some physics magic going on within this engine, for sure.
Yeah. So I think the physics is definitely better, and I'm sure the quality's better. I'd be curious to test it out, 'cause maybe you're getting better outputs on the first or second generation versus the issues I had before. Like, the success rate is probably much higher. Is it getting me what I'm looking for? So I'm curious to try it out. I got my Runway subscription, so...
Yeah. But again, it's a shame that they dropped at the same time as Kling O1, which I think will take the bigger spotlight for the moment, because video-to-video is just, in my opinion, the more exciting of the two.
Yeah. Not to sound jaded already, but it's just like, oh, you do text-to-video or image-to-video and it looks cool. Okay.
You sound totally jaded. Yeah, that bar has gradually gone up for us. We're so past the, uh, point of "oh, look what AI can do" to the point of "all right, but can it really do it?" You know, now we're more of a realist, uh, in terms of what AI's capabilities are.
And also, we know that, yes, AI is very good. It can make very good-looking single shots. But can we get the control and quality to make something more coherent, uh, and work in a more professional filmmaking pipeline?
Yeah, for sure. I mean, you're stating the obvious from the studio world. I'm sure that's what everybody within those four walls is probably thinking about: how can we replace our existing pipelines with this stuff? You know?
Right. How can we modify it so that this can work in that workflow?
Yeah. All right, the other thing that dropped this morning: uh, one of my favorite image models, Seedream, from ByteDance.
Yeah, ByteDance, the same parent company as TikTok. They have Seedream and Seedance. Seedream is the image model, Seedance is the video model. Mm-hmm. So Seedream 4.5, uh, dropped today. I mean, with Nano Banana and everything else, it feels like it's just getting up to parity with what some of the other models can do. You know, more consistency with reference images, more fine-tuned editing controls, better text, graphic layouts, and design text. Yeah.
Multi-image editing. Yeah. Nano Banana Pro, I think, came out maybe six months too early, and these models are now catching up to it.
Yeah, I will say Seedream is still the only one that does that thing I like, where it can batch generate multiple images in the same, yes, latent space generation. There was a hack someone posted online, and I was, um, talking online with Matt Workman, uh, from Cinematography Database. Yeah, great channel. Basically someone was using Nano Banana to create a grid of images, like a nine-by-nine grid. Kind of the same hack: you're getting the image output in the same latent space, so you get consistent characters and locations and shot variety. And then you would ask Nano Banana like, hey, upres, you know, frame three. But Seedream does that in one shot. Wow. It just gives you the output images in the same generation. Yeah. And I haven't seen any other models do that yet, so I think that's still very cool.
You're right, that is a big differentiator, especially in our world, where we need that consistency from shot to shot.
Yeah. I've used it for character generation stuff. Instead of just making one character sheet, it's like, oh, just make me high-quality character images of, uh, you know, different angles of the same character that I need.
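Here's roughly what that single-request batch generation could look like through an aggregator like fal. The endpoint id and the batch parameter name are assumptions; consult fal's schema for Seedream 4.5's actual interface:

```python
# Sketch of Seedream-style batch generation: one request asks for a set of
# related images so the character stays consistent across all outputs.
# Endpoint id, parameter names, and response shape are assumptions.
import fal_client

result = fal_client.subscribe(
    "fal-ai/bytedance/seedream/v4.5/text-to-image",  # hypothetical endpoint id
    arguments={
        "prompt": (
            "Character sheet: the same red-haired detective in a trench coat; "
            "front view, profile, three-quarter view, and full body"
        ),
        "max_images": 4,   # assumed name for the batch-count parameter
    },
)
for i, img in enumerate(result["images"]):   # assumed response shape
    print(i, img["url"])
```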
Damn, that reminds me of the next video we're about to drop. I should have used Seedream for exactly what I was doing. All right, well, now I got more video ideas for the holidays.
There's no end of testing.
Yeah, how much time I got, right? Stay tuned. To our audience: we have a lot of exciting testing videos and podcasts that are hopefully gonna drop in December. And then, uh, we'll get into the new year with a big recap.
This is a good demo from fal, from Seedream, of, uh, I don't know if this is an input of an existing image, and then they extracted it into a white area. Lemme map that address, 'cause that's here in LA.
Yeah, lemme see. But if that is true, it's basically a picture of a tote bag with very small text of its address in LA and the phone number for Canyon Coffee.
And then it was extracted and displayed on, like, a white background, but the text detail is preserved. This is normally an issue I've had, where small text and details get garbled when you do, um, AI transfers.
That is a real place. Canyon Coffee's on Echo Park Avenue. So that's a real bag, probably.
Yeah. So I think this is an input image of this real bag, and then they said, you know, place the bag on a white backdrop. So that's cool. Yeah.
That's amazing. That's a huge market, a huge industry, right? If you're talking about every single e-commerce company that's selling clothing, all the Fashion Novas of the world and, um, you know, the Abercrombies or whatnot, uh, I mean, they could essentially replace traditional photography with this, which is scary.
I think also, once it gets fast enough and cheap enough, the user could do a virtual try-on. I know that's kind of tough now, because the cost of the compute to generate that for every user on the website, uh, is expensive. But eventually it'll probably get cheaper. You remember that Google app that we tested a few months ago?
Which one?
Remember the one where I wore a clown outfit?
Oh no, did I do that? Yeah. Um, I'll Google it. I vaguely remember this. I think it's Doppl. D-O-P-P-L. Google Doppl. I have not heard of Google Doppl since we did that episode. We did an episode on it.
I vaguely remember, because I think we talked about that, and we talked about how they had kind of a whiteboard tool area as well. This is part of Google Labs, right? Yes, one of the Labs products. But you're saying this virtual try-on thing reminds you of that?
Yeah, I mean, they not only figured out the images but also the video portion. It was a video-to-video workflow, so, mm-hmm. Uh, actually it was an image-to-video workflow, so the model would then walk and turn so you could see the backside and everything.
What else have we got? Camera controls, virtual try-on details, image quality enhancement, material preservation when editing. So yeah, it can do a lot of stuff that seems on par with Nano Banana, you know, still pretty high quality, and, uh, it's just good to have another option out there.
Yeah.
And the other thing is, Seedream is natively 4K output, isn't that right?
Yeah. So it has really high quality; the pixels are all there for you. Nano Banana Pro does do 4K now as well. Right. All right, next one. Uh, this is an update from LTX, a new feature for 'em called Retake.
And this is interesting, 'cause this is in the realm of video modification, but you can give it an existing video, mark off a specific section of the video that you wanna modify, and redo just that section while preserving the beginning and end. I like this kind of specific frame modification; the only other tool I've seen that could do something similar-ish is, uh, Moonvalley. Mm-hmm. Where they sort of have, like, frame-specific editing. So this is a cool update. Okay, in the Retake demos that I've seen, the quality is a little, mm, iffy, but, um, I like where they're going with it.
Okay, so let me understand this correctly. You give it an in frame and an out frame, and then it'll generate before the in frame and after the out frame?
No. You give it an existing video. Mm-hmm. And then, I guess in their interface, you can mark a specific section of that. So here it's marking this actor performance, which reminds me of Se7en or something, one video take. Mm-hmm. And then you mark off this one section in the middle and give it direction to change it.
Oh. So it's like inpainting within the timeline.
Yeah. Giving it a specific range of the clip that you wanna modify and leaving everything outside of that range untouched.
That's amazing.
I'm saying that's maybe why the outputs don't feel as natural or realistic.
Right. But it's interesting where it's going. And putting aside all the issues of changing an actor's performance with or without their permission, the quality is not there, but the, um, concept and execution are very interesting.
Yeah. I like this concept. I mean, maybe you could do it with Kling O1 if you give it specific timings in the prompt of things you wanna change, but this gives you that control very narrowly. Like, I like the rest of the shot; I just want to change this thing in this very specific moment.
Yeah. And it also blends in with the overall clip, because you're just taking a segment of it, so you don't have to do editing magic to blend it all.
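Conceptually, the Retake idea reduces to regenerating only a marked frame range and splicing it back in. Here's a toy Python sketch of that shape, with a stand-in for the actual model call:

```python
# Conceptual sketch of "inpainting within the timeline": regenerate only a
# marked frame range and keep everything outside it untouched.
# regenerate() stands in for whatever model does the heavy lifting.
from typing import Callable, List

def retake(frames: List, start: int, end: int,
           regenerate: Callable[[List, str], List], direction: str) -> List:
    """Replace frames[start:end] with a regenerated segment of equal length."""
    head, middle, tail = frames[:start], frames[start:end], frames[end:]
    new_middle = regenerate(middle, direction)
    assert len(new_middle) == len(middle), "segment length must be preserved"
    return head + new_middle + tail

# Toy usage: strings stand in for frames; a real pipeline passes decoded video.
frames = [f"frame_{i}" for i in range(120)]
edited = retake(frames, 40, 72,
                lambda seg, d: [f"new_{f}" for f in seg],
                "deliver the line faster, tighter pacing")
print(edited[39:42])  # boundary: one original frame, then regenerated ones
```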
Yeah, that too. So, interesting take on this.
I'm curious. And, uh, LTX is the one that's left, uh, that hasn't been acquired. Invoke and Weavy are already gone.
They also have some of their own models, though. They're not just a tool set. I mean, they have this model, and they have, I think it's called LTX 2, which we didn't talk about. That's right. It's, um, its own video generation model. It can do like 50 frames a second, it can do audio output, and I think it can do like one-minute-long generations. So maybe they're not up to par with the quality of some of the other models, but they've been doing things the other models aren't doing, like super long generations and higher frame rates. So I think it's a good strategy, yeah, to just do something different.
They're definitely unique in the AI ecosystem in what they're doing.
So yeah, some of the use cases for this are rephrasing dialogue, changing dialogue. Mm-hmm. Again, putting aside the permission issues from performers. Just, can you do it? Focusing on the technical ability to do this; the rights stuff, you know, and the ethical stuff, put that aside for right now, for this conversation.
Yeah, I think, uh, consent and capability are two different buckets in AI at the moment.
Refine timing and pacing, adjust delivery.
You know, this is interesting, because, what did I just see? I just saw a breakdown from, I think it was, um, what's his name, Todd? The VFX artist who worked on Star Wars and a bunch of films; he's big on Twitter. He did a video breakdown of VFX shots, and it was, uh, Dungeons and Dragons, mm-hmm, or, um, one of those movies. It was basically a long breakdown of the pacing of a shot: the intro of a character who rides up on a horse, then the camera does a 360 around them, and then they shoot a slingshot. And the timing of that shot as they shot it didn't work great, and they wanted to tighten it up. It was a very long, extensive VFX process to literally just cut 10 frames out of this thing, right? To speed up the pacing. So maybe not at the quality it is right now, but something like this Retake feature, whether it's LTX or another model, it's those kinds of use cases: VFX work that just needs to tighten or clean things up.
Yeah. I mean, directors love to fix it all in post, right? Mm-hmm. This is one of the things that delays movie productions: how many times do you want to go back, reshoot it, and redo the whole thing? So if you can prevent some of that, great, that's money in your pocket.
Way cheaper to mess around with stuff in post than on set.
Right. All right, next one. What you got? This one's definitely been on the bottom of the pool: FLUX.2. Oh, come on, FLUX.2's great. No, I mean, this was announced on the 25th, so, you know, we're like a week behind this, and it is in the deep end of the pool. Yeah, collecting dust.
Black Forest Labs, I find this company really fascinating. They're in Germany. Mm-hmm. And, um, like I said, uh, some of them came from Stable Diffusion and formed this new company. They are the David to the Goliaths, right? They are still hand-making models and competing with Nano Banana or Kling or Seedream, all these big trillion-dollar companies. And, uh, they're doing it with, like, I don't know, 50 people, a hundred people. So anything that comes outta Black Forest Labs, I always do a double take, because I love rooting for the small guys. Yeah. I mean, it's impressive what they've done with a small team.
Yeah, FLUX Kontext. Uh, if you remember when we covered Nano Banana 1, FLUX Kontext was quite neck and neck with Nano Banana 1. And so, mm-hmm, this is, I believe, their answer to Nano Banana Pro, which is FLUX.2. I would say the only negative thing here is their model is just a little too heavy. Um, I think it's 90 gigabytes, if I'm not mistaken, of VRAM needed, and most GPUs at home, if you're running Comfy at home, can't do that. So this is really a model designed to run in the cloud, on A100 GPUs that are only available in the cloud. Having said that, the quality, the image modification capabilities, in my opinion, with the limited testing that I did, are maybe a good second place to Nano Banana Pro.
Okay, no, that's good to know. And you could potentially work this more into custom pipelines as well, which FLUX was good for. Whether you can run it locally or not, you could still run it on a cloud, in, for instance, Comfy, and then add LoRAs and kind of customize this a lot more than you could with a proprietary model.
Yeah. The little bit of hilarity that ensued was that this dropped the same time Z-Image dropped, and Z-Image is, like, I don't know, I'm gonna say under 20 gigabytes, a super lightweight model. Yeah. And so the creator community, the ComfyUI community, if you will, embraced Z-Image right away, and they're like, whoa, FLUX.2 is just too heavy, too big for us to really tweak and modify.
Yeah, it's still good to know about. A lot of the improvements are kind of on par with what we've seen in the other models: multi-reference image support, uh, image detail, photorealism, better text rendering, better prompt following, world knowledge, and output or image editing at resolutions up to four megapixels. Mm-hmm. Also, it reminds me of Black Forest cake.
Which is delicious.
That's a good cake. Also German, I believe. Uh, yeah. And there's a variety of models. FLUX.2 Pro is their state-of-the-art top model.
Yeah. FLUX.2 is their latest and greatest at the moment.
Yeah. And again, um, there you go. It's already probably built into Firefly. fal, for sure, I've seen it on there. I'm sure Comfy has it already. So yeah, it's already proliferated into the AI ecosystem.
Did this fold in Kontext, or is Kontext gonna be...?
I think it's a new product. I think Kontext is probably just a different product altogether.
So eventually there would be, like, a FLUX.2 Kontext. Yeah, probably. I think this is a proper foundational model for them and a new architecture going forward. That's my guess, because it doesn't seem like it's as much about editing existing images, but more about generating new images from various inputs. Yeah.
And again, um, we're so spoiled and so jaded, because in today's episode we have Z-Image, FLUX.2, Seedream 4.5. Mm-hmm. And if you're gonna compare the three side by side, I mean, the difference is minute. They're all really good. We're swimming in them. Like a golden age of AI image models.
I know. I was just like, which option do I pick for the most photorealistic? Exactly.
And this is just after covering Nano Banana Pro like a couple weeks ago. So there is no shortage of good technology for you to pick from.
Yeah, I mean, that's a good question too. Once you get to that point, what makes you decide? We've seen a lot of photorealistic stuff. I think some of the deciding factors depend on what your project is. If it's not photorealistic, we've seen some models just work better with different styles or different types of images or different prompting techniques than others. Mm-hmm. And also, we don't have a cost comparison, but it would come down to cost, though images are less of a concern there than video generation. Yeah.
And, uh, I know we're comparing literally pennies to pennies as far as infra cost goes. If you compare that cost to actual photography or actual videography, it's like 1% of an actual shoot, right? Yeah. So we really shouldn't even be looking at pennies, whether it's on the cloud or not. Um, I will say, on the question you asked earlier, which model to go with for what: when you brought up the Seedance, rather, the Seedream example with creating nine images in the same latent space, that's a thing very unique to that model. So when you need that, you use that. Um, for the testing that I did, which we're gonna show in an episode for you guys, I love Nano Banana Pro, strictly because of the image fidelity; for me, having pixel detail, mm-hmm, was, like, number one. So I just went with Nano Banana Pro for that. Yeah.
All right, last one. This one's a quick one. Not generative, but an AI update: TwelveLabs. Not to be confused with ElevenLabs. TwelveLabs has foundational models that analyze video, so they're good for basically turning your whole video library into searchable indexes and doing all sorts of stuff you want with that. They launched a new version of their model, Marengo 3.0.
Basically smaller embeddings, faster indexing, temporal and spatial reasoning. So not just looking at individual frames, but kind of understanding more of the video over time, like whether an object is fast-moving or slow, and better identification of, like, images and text inside the videos and stuff. So basically this is for, you know, if you have your video library and you want AI understanding of your content, to search or to, you know, find things or put things together. Or, if you're building AI agents for production-type work, you're gonna need the eyes and the brain for it.
Yeah. This is how AI can understand what videos you have, what you shot, so you can do things with them later. So it's cool to see a new improvement of their model. They've kind of been one of the biggest players in town for video understanding, uh, on a temporal level. Yeah.
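The workflow they're describing looks roughly like this: index a library, then query it in natural language. The sketch below uses a made-up REST endpoint and response shape purely for illustration; TwelveLabs' real API and SDK will differ, so consult their docs:

```python
# Illustrative sketch of video-understanding search in the Marengo spirit:
# query an already-indexed footage library in natural language.
# The URL, payload, and response shape are hypothetical, not a real API.
import requests

API_KEY = "tl-..."                              # placeholder key
BASE = "https://api.example-videoai.com/v1"     # hypothetical endpoint

resp = requests.post(
    f"{BASE}/search",
    headers={"x-api-key": API_KEY},
    json={
        "index_id": "my-footage-library",
        "query_text": "drone shot of a coastline at golden hour",
        "search_options": ["visual", "audio"],
    },
    timeout=30,
)
resp.raise_for_status()
for hit in resp.json().get("data", []):         # assumed response shape
    print(hit["video_id"], hit["start"], hit["end"], hit["score"])
```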
And, um, I saw the TwelveLabs guys at a conference a few months ago. Their presentations are amazing, the stuff that they show. So, big fan of these guys. Hopefully they continue to do good work. Yeah.
So, good update here. And that's pretty much our post-Thanksgiving roundup.
Yeah, that was a great roundup, and I'm glad we waited. We got everything in one episode, and guess what? The next one, we're gonna have more, 'cause this industry just can't stop innovating. After we stop recording, there's gonna be more news. Um, I do want to give a special shout-out, and I believe I've shouted him out before: Paul Trapani, thank you for your comments consistently on Spotify. That really helps. If you're listening on Apple Podcasts or Spotify, we would love a five-star review if you haven't done so. It really helps the algorithm more than you think.
Also, Spotify Wrapped is coming out now, and if, uh, we have made your top whatever, five, ten list of podcasts you listen to, please let us know. Tag us or, uh, send us over your Spotify Wrapped. We'd love to see it and share it out. Yeah, we'd love to see that; I'm curious, at least. Links for everything talked about at denoisedpodcast.com. Thanks for watching, everyone. We'll catch you in the next episode.