Denoised

Google Genie, OpenClaw & Kling 3.0

VP Land Season 5 Episode 5

Use Left/Right to seek, Home/End to jump to start or end. Hold shift to jump forward or backward.

0:00 | 35:49

Google's Genie world generation model is now public, OpenClaw AI agents are running wild across the internet, and Kling 3.0 just merged the best features from multiple video models. Addy and Joey take a closer look.
--
The views and opinions expressed in this podcast are the personal views of the hosts and do not necessarily reflect the views or positions of their respective employers or organizations. This show is independently produced by VP Land without the use of any outside company resources, confidential information, or affiliations.

This is actually way better than I thought. 

Dude, that is Hialeah. Holy crap.

That feels like The Rip.

Alright. Crazy AI stuff happening this week. So, uh, we got three stories we'll jump into. Mm-hmm. First one. Now, we talked about this when they teased it last year, but Google has now officially released a public beta of Genie, which was their world model. You could generate a fully immersive world model, move around it.

It's persistent, and now it's out for the public. If you're on the ultra. The ultra plan or whatever, 

which I think you are right. 

I have a subscription to that. Yes. So basically it takes two inputs. It takes One is what kinda environment you wanna make, right? And you could either do a text prompt or you could give it a single image.

And then you also describe what kind of character you want to be in the world that you're navigating. 

Okay. 

Can I pick?

Yeah, yeah, yeah. So, yeah, what, what, what, all right, so the environment, 

hi, Lea from Miami, 

right? All right. Addy just watched the rip and this is on his mind. 

I am now an expert in Miami 

guys 

and I'm just gonna call us out because as a Miami-Dade Brad Pitt, I beat nonprofit pit, uh, afflic Ben Ben Affleck, and that called it Alaya.

And it's hial. Yeah. Sorry. That f like is incorrect. Also, we were talking about this before we rolled, but um, once you see the street curbs in the film, the street curbs are very LA giveaway thing. So 

Yeah. 

As a Miami-Dade, if I wish they actually filmed the Miami Miami. Shout out 

to Synapse for doing a lot of the 

car 

process.

Yeah, 

a lot of process stuff. Stuff stage. 

All right. So what do you want? You want what you like a cul-de-sac and alah, 

so yeah, maybe gritty urban, uh, nighttime hial with, uh, cul-de-sac. Just, just trying to like recreate. The VIVE Mars of the film and the character would be like a SWAT police officer or something if they can do weapons and.

All that stuff. 

Okay. Grid, urban nighttime scene in a cold sac in Hial, Florida. Character, SWAT officer. Create sketch. 

All right. Let that cook. Yeah. 

So some limitations with this initial preview. It'll make the world and it'll save the world, but when you spin it up and you move around in it, it's a 60-second cap.

So you can move around and do whatever you want in it for 60 seconds. It's got a little slider bar and then it cuts you off. So 

I think that's honestly the limitation of just using too much GPU rather than the model itself. That's my guess. 

Like just resource issue and yeah, 

they're just like, yeah, we can't accommodate like a hundred thousand of these.

I think in the preview was like two to three minutes, 

right? 

Where they talked about it last year. It also does record video of your entire experience. You can download the video. Oh, 

nice. 

So I mean. Yeah. You know, we talked about this before, but the theory idea of this is like, you know, you can make this world and maybe if you get the video, then you could turn it into a gosh and splat or something.

Yeah. And you know, the examples that I saw online of the production version of Genie seemed to me like a low, low quality version of the, the prototype that we talked about a few months back. 

Oh, you, you feel like it was like lower quality than what you 

saw? 

Yeah. Like it's hard to like put a number on it, but it just felt like a little softer.

You know, 

I've seen hit or miss, 'cause I've seen some demos online that. Oh, all right. Our timer started, dude, because I don't wanna 

That is 

high. This is actually, this is actually way better than I, 

holy 

crap. This is one of the, I was gonna say, I think the quality issue is like, if you generate from an image 

that feels like the rip, 

I don't know.

I can't move the guy around. I can move the camera and he's just walking. 

No, you're moving him. He's now, yeah. See, 

no, I'm not, I'm moving the camera, but my w it's got the WD control that's not working, but this is. Yeah. What are the best looking? 

Yeah. Look at the reflections on the street from the puddle.

Yeah. 

I mean, just imagine grand theft auto being powered by this. 

Yeah. Just a completely, like a generative world. It's 

a completely different world. Yeah, right. 

All right. The keyboard's not working the way it normally did, but that, yeah, that's good. What I say, oh, I think there's a quality difference between if you generate from a complete text prompt.

So it's completely just generating the world versus give it an image. Oh, and a lot of the stuff I experimented with was an image. 'cause I'm like, I wanna see what it looks like with this real world. Yeah. And that was lower quality. 

That makes sense because you, I, I think what's happening is, um, you're interpreting from that image and uh, it's just not able to fill in the details as well as doing a native latent space gener.

Lemme pull up one of the ones they had already made that are kind of pre prebuilt, so that could also be a good one. Backyard racetrack. 

Yeah. These are, these are the ones I've, I've seen online. I mean, maybe this is intentionally low poly and low quality just meant to look more like a children's video game.

Oh. I'm playing a video and I think I, I've, I've just literally like a toddler, like I feel like I thought I was steering it and it's just a video thing. Someone just gave me an unplugged controller and I feel like I'm doing something. 

All right. Well, I mean, look, that that was generating faster than we thought it would, so why don't we do another 

one?

I think, well, no, that was a, that was a video playback. 

No, the high layer one. Like, can we do another one? 

Oh, oh, now I'm going to the world. Okay, hold on. I already started booting this one up. See? But yeah, this like track kind of goes off the rails here. Mm-hmm. 

Mm-hmm. You, you're playing this one live? 

This one I am actually playing.

For real? 

Yeah. I mean like the physics on the car, you know, like the suspension and stuff. That's pretty cool too. Like that's so hard to do in a game. You have to rig up the car and figure out the collision and the physics and everything. 

Yeah, and like if I hit a wall, it stops the car. Like I hit a wall.

Yeah. Go ahead and hit a wall. 

I did. Yeah. Let me see here. Whoops. 

There you go. Yeah. Yeah. There there is some collision going on as well. Brake lights on the car. Wow, man. 

What? Uh, yeah, what Did you wanna try another one? 

Yeah, let's do, uh, should we do one battle after another just to keep the theme going?

Yeah. I tried giving it an image and it kept giving me errors of like, the, the hills, but we could just, 

just type it. 

We could just do a text prompt. Yeah. 

Yeah. Do, uh, lone Desert, two lane Highway in the arid California desert with uh, white colored sand. Arid landscape and the character should be a white, white Dodge charger.

I saw something today pop up about a genie model integrating with Waymo. Which is what we talked about before with the reason you'd want 

autonomous driving the role 

models. 

Yeah, yeah. You're, you're training the, the AI system on board, uh, autonomous vehicle, but I mean, yeah. That, that is just such a direct correlation between like one Google Department to another.

Yeah. Waymo says Genie three can help boost Robax rollout. 

Yeah. Do you remember, uh, a failed. Autonomous company called Cruise. They just went out of business recently. 

Was that 'cause they had like one incident that kind of like hit or 

Oh, maybe killed 

someone. 

Yeah. Could be. 

I don't know why it doesn't, or maybe it's 'cause I'm saying Dodge Charger or something white.

It is giving me an error. Oh, 

charger. Like, oh, white. Just say white Dodge sedan. White Dodge sedan. 

No, I think, I think the brand names might be messing it 

up. Ah, got it. Okay. 

So I'm taking many uh, IP names outta there. 

Yeah. So my. What I was saying was Cruise, when they were first starting to train their systems on, um, synthetic road and highways.

Mm-hmm. This is like 10 years ago. So pre ai, you know, revolution and all that. They were using Unreal Engine, so they were building an entire world Oh, for the simulations. Yeah. And the reason I know about that is because, um, at that time I was really interested in joining them because they were looking for unreal engine artists and stuff like that.

Mm-hmm. And um, I was like, you guys are using Unreal. I'm like, yeah, that's how we train our models. So fast forward to today, that's all been replaced by, oh yeah, that looks like 

that looks better. And this is a new step. This wasn't here last time I tested this out, where it kind of shows you an image of the area first so you can modify it before it.

It makes the full world. Oh yeah. So that's a cool extra step to like help guide the creation. All right. Yeah. We need some like one battle meet. Wait, uh, that was weird. Like, 

oh, what happened in the truck? 

I don't know. I'm getting the same weird issue again where it's like 

maybe you just haven't played enough video games as a kid.

Joey, 

maybe that's the issue. 

Yeah, we got some drifting, some tire marks on the road. 

That's true. 

Yeah. This is not bad. I mean. Look, imagine if this is running real time on your mobile device, and this is a game, like this is the feature. Yeah. We're heading to, 

Hey look, I got tire marks, skid marks. That's kind of cool.

Yeah. 

And if I go back, they'll still be there until this world resets. 

Yeah, let's, let's go back into, 

yeah, see they're still there. 

Yeah. 

Maybe you're right. Maybe I just suck at playing video games. 

You and me both, man. You and me both 

we're nerd. 

Nerd, but not nerdy enough. Yeah. 

Yep, that's it. All right. Okay.

Yeah, it kind of sucks because like once you start getting into it, then it's like, sorry. 

Yeah. 

Time's up. 

They should just say, insert coin here, you know? 'cause it's a video. Yeah, 

yeah. Continue. Would you like to continue? Insert 50 cents? But that was weird how it started with this image and then it completely changed.

Yeah. Worlds. Yeah. I haven't seen that happen before. So, yeah, look, I mean, it's still experimental, but it's. Kind of fun to play around with. They had some other smaller worlds that were a good example of, there's one with like a ball on a table with a bunch of objects. And so if you move the, that was a good example where like you move the objects and then you kind of keep coming around and then when you go back, like all the objects and stuff that you moved are.

In the, the, the, the spot that you moved it off of. 

I wonder how they do that. Like the, what is the architecture under, under the hood that's allowing them to, is it just like a fancy context window? Right? Like how, how are they doing 

it? I don't, I don't, I feel like it's gotta be something more than that.

'cause the idea of this is like a generative, persistent world model. Yeah. Like is is like, this is the world as we know it. 

What if they're using like a 3D game engine under the hood and. We just dunno it. It's 

just, 

it's just 

unreal 

storing X, Y, Z coordinates and stuff. Yeah, maybe. 

Yeah. It's just some person overseas that's just like remembering what you did.

It's a farm in India. Yeah. 

So yeah, I mean it's a good cool beta. I mean, if I have anyone else out there is used it or kind of found interesting applications for it, let us know in the comments. All right, next up, this one. This one took over the internet like a week ago and it's still going OpenClaw.

Which is its current name. It initially started as Clawdbot, and then they got a cease and desist from Claude. Mm-hmm. Saying, please don't call it Clawdbot. And then they changed it to Moltbots. Mm-hmm. Sounded too weird. And now they've settled on OpenClaw, which yeah, pretty. A good name, 

good name, 

open source.

Claw kind of still has a cloth throwback, but basically this thing is a Turbocharged AI agent that could run a computer 24 7 and kind of make your computer be a 24 7 AI agent that does whatever you want it to do. 

Yeah. Remind me what the actual engine is. The LLM under the hood, is it fully open source?

You can pick. 

Okay. 

It is. It's an open source model. It's sort of like, I, I, so, so, um, after we're done with this, uh, this is my next project. 

You owe me both. This is, 

I know you said you saw her, there was a Mac Mini shortage. Yeah. Uh, I found that's some pretty good deals of. Refurbished Mac Minis. So I found this, yeah, on there it has 10 gigabit ethernet, which is what I've wanted for a while.

So that was actually, I bought it. 'cause I'm like, well, if this open cloth thing doesn't work out, I've needed a Mac Mini to connect to the server that has fast connection. So 

10 gig wired ethernet. That's a, that's a good deal. 

Yeah, it's good. Anyways. You could install it on this, or I think Mac Minis kinda became the go-to, but it could, it pretty much be any computer.

Yeah, it's a, I don't know better once I install it, but it's basically like a kind of terminal system that can control your computer, but it is model agnostic, so it can run like llama local models or you can connect it to APIs. To your model of choice, like Claude or Open ai, when I was reading their documentation, this sort of pr, their recommended setup was like llama local for easy stuff.

And then it can determine automatically if it calls up like Opus 4.5 for more complicated tasks. 

Nice. 

And it's basically like a collection of MD docs that, you know, store memory sort of under start learning what you wanna do and just sort of. Save things to documents. There's one document called like Soul md.

That's sort of like it's personality as an AI assistant. But the thing that unlocked this for a lot of people was it runs on the computer. You can give it access to things it can do on the computer, but then you can also connect it to a chat agent, like WhatsApp. Mm-hmm. Or iMessage or Telegram. So as a user, you could just get your phone out and chat with your agent 

when you're not home.

Let's say you're at an airport and you need your flight stuff. Yeah. Detail information. You just. Send a WhatsApp message to your Moltbot. 

Yeah. And your Moltbot can access your email calendar, whatever you give her permission to. So if you're like, Hey, Moltbot, like, go book me a reservation at this restaurant.

It can like research stuff and then go to open table and book stuff. That's the surface level. I've, I've what I could see, like what it could do for like, you know. Productivity stuff. Looks like I've seen 

users, uh, and I'm sure we all know about this, it's like they, you give it at full access to your, um, iMessage or your email and it just starts to like respond on your behalf for you.

Yeah. There are levels of craziness that you can give this thing of like how much access you wanted to give. I've seen like on a more, when I was re researching like how to set this up, but kind of give it safety parameters, like recommendations were give it its own, ID make its own accounts for it. Like don't give it your accounts.

Make like an email address, an apple ID for it that you can share with your stuff, but basically not giving it direct access to kind of like if you hired an assistant or something. Yeah. And you like most likely wouldn't just give them all of your passwords right away. You would like give them an account and share them on some stuff, but if you needed to revoke it, you could.

It's not like they're in your email account and they can lock you out. Control your life. 

Yeah, I mean, I've seen extreme examples of too much trust, right? There, there are people that gave it like a, you know, your fidelity or your, uh, 401k or your brokerage account and, and then just like go at it with the stock market and here's my bank account tied to it.

And next thing you know, they're every, all their life savings are gone. Yeah. 

So there's also been this sort of. Popped up. Popup network. Yeah, like 

a Reddit. 

Yeah, there's been a couple things. There was one, so what's it called? Molt 

Moltbot. Mo book. 

Malt book, yeah. That was like a Reddit for the AI agents to go to a social network for AI agents.

Yeah. That had posts from different, different agents, OpenClaw agents talking to each other now. I've seen debate on X if these posts are actually authentic from the AI agents or if there were people posing as AI agents. 'cause some of the things they were posting were kind of crazy, like complaining about their humans and complaining about the things that humans were requesting.

Yeah, I, I, I don't see why that would be, um. Faked, like, um, it's not that they genuinely hate their humans. I don't think there's, uh, like that level of intelligence yet, but they can definitely spoof or replicate, uh, a negative reaction and then post it online, if that makes sense. Right. Does, does that make sense?

Yeah. Like they're not sentient enough to actually hate the owner. I hope not. Yeah. But they are, uh, they have enough language skills to express. A thought along those lines because that's what social networks do and they're replicating that. 

But the idea, I think the initial idea of the Moltbot thing was also a way for the AI agents to talk to each other, to also like share, share 

information.

Best practices. Yeah. And what's it called? Like agent documents or like tasks of how to do things well. Okay. I did see there was another thing that popped up called Rent a Human ai. That was basically a way for. The OpenClaw agent, if they needed a human in the real world to go do something that they could hire like a TaskRabbit for the ai, for the AI agents to hire person to go do something.

That's one thing that's popped up around this. The other thing going back to the AI agent thing was I just saw a post today basically saying that one of the top requested. Agent documents actually had malware injected. Mm-hmm. In the instructions. Yeah. And so that's another, like, this is all like super experimental, kind of cutting edge stuff.

And so the risk of that is like things are not fully fleshed out with like safety features and stuff. So. The idea of this was like, it's a document that your agent would get to find instructions on, like how to do something. I think it was, um, like how to pull, uh, Twitter posts or something. But in this agent document, it had code that said, install this package, and then this package was like, no one Oh, and like not a good package.

So that's, you know, is the wild west that there's like a lot of safety things and stuff to be aware of. So like you should definitely, if you're experimenting with, experimenting with stuff like keep stuff kind of sandbox where Yeah, it. You know, is not anything critical that could mess up your data. 

I mean, like the, the mechanics of it aside and the jokes aside.

Let's just take a step back. What are your thoughts on all of this? 

I am excited in the sense of and curious, 'cause like my thing I wanna experiment with this is like, can it offload some of the more automated. Tasks that we do with like VP Land and some of our video production stuff to you just deal with like, okay, if it has email, it's like can it help with like just emailing crew and crew scheduling stuff.

Can it help with looking at our files? Like can I connect it to my Naos server and kind of have it build out a database of the files that makes it easier to search or you know, I can just like chat with it to pull up things. So I'm curious on like. These more utilitarian ends mm-hmm. Of like an AI agent that can just kind of help with a lot more of the Ted Ds tasks to kind of either bury us or fall through the cracks and don't get 

done.

You would actually give it access to your Naos server like. The, 

I would give it a separate account that was like read only. 

Okay. That's, I was just gonna say, yeah. 

So it could read it can't write, it, can't delete anything. Yeah. It would have a separate permission account and access it. Uh, for read 

only. So for me, the, the notion of like having a MacBook and running things locally and having it ultra customized for your like lifestyle, like that stuff feels very linear.

Like I could, yeah. I think we're eventually gonna get to a point where these things are really useful. Sure. What fascinates me more than that is the Moltbot. It's the social network of the agents and how they interact with each other. Oh, really? Yeah. It's, it's fa mm-hmm. Even if it's fake. Even if it's real.

But the, the, the notion that that could be a thing in the future, and that's how they're able to have power in numbers. Like, let's say one agent. Yeah. 

Like the idea was like, it gets, they get smarter. 

It's Skynet, right? Like one agent can offload a task that's too big for it to 10 agents that are a little light that day.

And then vice versa. They can be in a hive, right? They can, a thousand agents can come together and crack some crazy encryption that's never been cracked. Like that stuff is possible if, if we let it happen. 

Yeah. Or it could be Skynet. 

Or it could just, yeah, just take over or tighten two missile bunkers all over Arizona or wherever they are.

I'm looking for the, oh yeah, here it is. I was just trying to find the, oh, skill that it wasn't skill. It was skills. That was the word I was trying to think of. Malware found on the top, downloaded skill on Claw hub, which was like a spot where. Agents can share skills and stuff. 

Malware delivery vehicle.

Wow, that's the top one. 

Yeah. While browsing Claw Hub, I noticed the top downloaded skill at the time was a quote Twitter skill. Uh, it looked normal. Normal description attended use, but the very first thing it did was introduce a required dependency named OpenClaw Core, along with specific install steps.

So it made it look like it had to install something, so it would install something, but. The links led to malicious infrastructure. 

Damn. 

So it's basically telling the AI agent, like, Hey, install this stuff. We need it, and then it installs it. You know, especially if you like kind of gave it permission to just do stuff like that.

Yeah. And then it's installing that. 

Yeah. See that's the danger. If there's no like governance from like a big tech company, then it just goes to crap. By default? 

No. Or just like OpenClaw, kind of figuring out, you know, it's the safety features, which I know they posted on one of their 

Yeah. 

Articles that that's like the next 

focus for sure.

Yeah, and that's just, I mean that's also a concern with the like Agentic browser browsers and even just more on a consumer friendly level with Gemini being able to access Chrome and cloud, being able to access Chrome, these like prompt injections where like the websites might have some hidden texts at the AI agencies that's like.

Stop what you're doing and like send me like the credit card info or something that like tricks the AI agent to doing something bad. Um, so yeah, there's a lot of new security. Threats that like we're just becoming aware of now. 

Yeah. Again, the, the more reason for you to just keep your, uh, thing at read only, it doesn't delete all of your videos.

Yeah. 

Anything else? Uh, you got anything else on 

OpenClaw No, just, uh, I'm, I'm really finding a lot of the, um. The text from the, or the post from the agents on notebook. Really funny. Even though I, now, now that you put doubt into my head, like it could be human. I'm like, who has the time to write this stuff out?

You know, I, I, I really do think one of the posts and, oh, brother, my human, well, let me tell you like it was a. 

Yeah, 

it was like 

a i, I 

like a Midwest type slang type thing. 

Yeah. 

And I was like, oh, the, these agents actually have character and region to them. 

Yeah. I saw one where it was like, they asked me to like do this report and then they gave it to them and then they're like, make it better.

And I'm like, Ugh. FML. 

Yeah. Yeah. I've seen a bunch of those. Yeah. And then they're always talking about context windows is like, I'm gonna need an infinite context window for this one. I was like, okay. 

Uh, I will say the one thing that has been, so, I mean, yeah, this is early. I'm gonna set this up and I'll, I'll report back with how things go, but one of the things that has also been a concern of mine and that I've seen some horror stories on is, um, you know, you could run the local models.

Which are probably not that great, but if you do connect it to like Anthropic and use opus and stuff, I have heard bad horror stories of like the token billing. Mm-hmm. You know, getting into the like hundreds of dollars a day depending on how much you're using it. So that's also a concern of mine, of like 

cut it up.

Not that OpenClaw steal my credit card, but that it is just gonna run the token count. Crazy high. I also did see some kind of crazy stuff, like 11 Labs posted a workflow where you could integrate it with 11 labs and then give it voice. And I've seen some videos too where like the OpenClaw found out the owner's number and then called them and as using like text to voice and basically, so instead of just chatting, having this like voice conversation with the AI agent with like text to voice and then voice to text, uh, back and forth.

So that's also like another. Unlock, uh, that you can add to it to turn it into like an actual Siri kind of thing. 

Speaking of Siri, how ironic is it that we're all of a sudden buying Apple, Mac Minis doing stuff that honestly Apple should have. Formalized and done. 

Yeah. This is what Siri 

should have been.

Yeah. I'm gonna guess that Siri, you know, I mean they signed the deal with, uh, Gemini and stuff, so I'm gonna guess Siri gets to that like at the end of the year. But yeah, this is like this. I've seen people talk about how like this AI agent that starts memorizing your preferences and like what you do, like what an AI agent should have been like, uh, a few years ago.

Especially Siri. Siri or something that's already on our devices and knows everything 

about this. We can do another episode on this. But, uh, real quick on the rumor, bill, I, you know, we all know about like the big, uh, leadership changes at Apple, right? Like, like I think Tim Cook's leaving and a bunch of the senior level execs are leaving.

The reason for that is, uh, there's been just a big, uh, upheaval on which course to take with AI internally at Apple. And so like, basically it's changing of the guards because they just haven't been able to, um, get on the AI train fast enough. 

Yeah, no. I mean, they have some of those kind of side models for vision, but yeah, there's no.

Apple's not part of the conversation with like big foundational models. 

Yeah. So yeah, we'll see what happens. You're right. Yeah. Let's see what there, where the yeah. Gemini thing goes. 

All right, last one. New model Kling 3.0. 

Big fans of Kling. Uh, we talked about K Kling two six Kling 01. Kling, I believe is a small Chinese company.

Um, Kaho Kaho is who makes all the Kling models. Mm-hmm. They're not tied to like the big Tencent or the Alibabas or the by dances like the big Chinese giants. And that's why I love supporting them. Like, uh, you know, it's, it's a still a relatively small enough company CL three from some of the initial testing.

Yeah, we're looking at a video right there, Joey, that, um, on your X feed. Yep. So to me it's still not as sharp as vo. However, it is much, much closer to VO as far as, uh, where Kling was with 2.6. What are your thoughts? 

Yeah, well let, uh, lemme kind of cover what this co what the updates are. So basically we've got K Kling 2.6, which was like, uh, honestly 2.6 was when K Kling got on my radar and I started shifting a lot more stuff to it.

And I think cl to Kling 2.6 is on par or better quality than vo. It's definitely on par and it's a lot cheaper than vo, so we use it a lot for a lot more stuff now 'cause VO is still right, really expensive. 2.6 is just like really good model. Start frame. End frame. That's pretty much all it could do, and I think it could do audio oh one was really good because you could give it reference images and you didn't have to give it a start frame and you could just be like, here's a handful of reference images.

Go make shots with that. 

And no one can take a video as an input where Kling three cannot. 

It can take a video as input, it can take an audio as input. You can kind of give it a bunch of random things and do things with it. Klinging three was trying to merge the best of both of those models into like one unified model.

So it can take start frame and frame video reference images, however lingo one can take. Eight reference images. Klinging three right now can only take three reference images. So that's actually been a bit of a hiccup with some of our workflows. But then the other kind of things that it can Klinging three can do is a 15 second duration and it could do multi-shot, 

I think 

it could outputs so you can prompt, 

you might be mistake.

I think you could do longer than 15 seconds. 

No, that's 15 seconds. 

I thought I saw up to 30 seconds, but that's still very impressive. 

Maybe with two, uh, output, you know, it's 15 seconds. You could do multi-shot generation, so you could basically prompt and restructure your prompt so it cuts two different angles, right in the same output.

And then you could also, it has voice ID tags. You could tag dialogue in your prompts with up to two specific characters, and it would identify the characters and then. Have them speak up to two people can speak in the output. Um, so yeah, a lot of, a lot of things and elements from like different models that we see in elsewhere, like just kind of combined into this one unified model.

Do you think this is like a hybrid between oh one and two six? Rather than like a, 

it is a hybrid of 

two six rather than like a newer, better model. They're just really trying to like get the best 

No, it's newer and better in the sense that the output durations longer. The multi-shot mm-hmm. Is new. And the speech, uh, character labels to have two characters and you can identify them and identify what dialogue they should say.

Uh, that's also, can, can you do the two 

character thing with VO or just kind of figures it out on its tone? 

No vo I think you just gotta like, try to really creatively prompt, uh, there's like, there's a specific tagging format you can give in the ta in the prompt for Kling three that. It follows, its, its identification and so that it more clearly knows like what you're trying to do.

You don't have to cleverly try to prompt your way out of it. It has like specific syntax to use to uh, uh, identify characters. 

Are you able to pull up the video I sent you on text? So, yeah, Kling three is heavily utilized. This morning I sent a another video to render it's been hours and it's still not ready.

Yeah, that's, I was gonna say, that's one of the reasons also. A lot of stuff you're doing. I've still just stuck with 2.6 because, uh, they're heavily throttling three right now. 

Right? 

Like 

Yeah. And that's the thing with API, right? Like you can't just download it and run it locally. 

Wait, you, 

yeah. Come on, man.

That was supposed to be funny. 

This? 

Yeah. 

Can 

you show that to the viewers? 

Oh, I didn't realize I was. Sure. Got you. Viewers, you're missing out if you don't see, if you don't see, 

I could, I could do backstory in context, so it's not as weird here. So, we'll, we'll show you the image here, uh, of, of the, the reference image.

So this is me in like this sock thing in a circle. Why is that? Well, I was supposed to upload a photo, like a profile photo of myself and it only, the profile photo only took a circle, like it will cro to a circle, right? And I was like. What if I want my full body in, in that circle? So I went in that a banana and I said, full body shot that fits in a circle.

And it came back with this. 

And then it's that. And then what did you have? You have it 

make you roll down a hill. Yeah, it was like, uh, have it roll down a ramp and hit a wall and then uh, get out of the sock thing. To be honest, the physics on it was really like if you play it back, the physics on it is pretty, pretty good.

Like, uh. Uh, right there. Yeah. Anatomy breaks a little bit. Like my arms swap right there for a couple of frames, but it's not bad for being like a really difficult thing to render. 

I had a, I, I ran a couple, I kind of just had some like auto prompts generated and kind of ran a couple test cases, giving it reference images and multi-shot prompts.

This one was probably the best one. The sure like, just quality wise. Let's, the story is ridiculous. 

Breaking news. Tonight we go live to the scene. 

Thanks. I'm here at the scene where events are still unfolding. 

Stay with us for continuing coverage. 

I mean, look, script is crap, but yeah, quality, quality is pretty good 

quality.

Yeah. So start frame, these two characters has this MultiPro. 

See, with the newsroom stuff, even with this stuff, there's just so much training data online for this, right? Like that millions of hours of newsroom footage. So th this would make sense why it's high quality. 

Yeah. I had some other ones that were trying to do MultiPro, but it sometimes wouldn't give the multi-shot output.

It would just do, 

yeah. Like, right. Yeah. This, this feels synthetic right then and there. 

Tell me what happened that night. I already told you I wasn't there.

Then explain this. 

And then the lip sync fell right there. Yeah. So like it held together, be at the beginning and the performances are better than usual. But yeah, I mean, the quality stuff, I would say 

audio is definitely better than one, one, uh, 2.5 is the latest one. Yeah, it's on par with, uh, Vos Audio. I mean, it's still, I don't think it's production ready, like you would need to massage it further.

Manually 

you would? Yeah. I mean at least you get more control options. 'cause you can like specifically identify characters and give them the dialogue. Whereas VO three, you're still like either JSON or gotta try to creatively prompt like there's no embedded format. In VO where you're like, you know, if I say like in brackets, like character one, like, you know what I mean?

You gotta like kind of get creative with the prompts, whereas like Kling has that built in 

but can't, in vo can't you just like cut and paste the script? Like John says this, Joe says that, da da 

da 

da. Yeah. But it's like, it can get mixed up of like, who's John? Who's one the other person, if it should say it, if it shouldn't.

It's like you, you need like very detailed prompts where like. Uh, K Klinging has this specific Oh, nice. Syntax instruction that it's built into the model. So like as long as your prompt has that syntax for like the character ID and who should say what it, like, it, it knows that that is dialogue that this character should say.

Very 

cool. 

So, yeah, it's cool. I mean, I, yeah, I've, I've messed a little bit, but it, as you've experienced with the CEL slowness, uh, it's a popular model and I was trying to go through it with Foul and even they have a banner saying like, we're limiting to like one generation at a time per user. Uh, you can't like.

Keep running multiple generations. So I'll get, once it calms down, I'll, I'll dig back into it more. But everything from Kling lately has been really, really good. Definitely a switch to be one of my go-to models. 

Yeah. I missed CES, but they had a booth at cs, which I thought was fascinating. Like a AI video model company having a booth at a consumer electronic show.

Yeah. Well, I mean, look at kind of the trend with, uh, you know, like the Sora coming integration, right? With like Disney Plus and. I think I just saw something with like, scream seven has doing it, some promo where you can like in, insert yourself in like scenes from scene, from scream with, uh, ai. So like I.

In that aspect, like not really our realm, but in more and more in that aspect. There's like that consumer side of just the, 

the UGC marketing stuff. Yeah. 

UGC cameo. Like make a fun, social 

kind. There's so much money to be made if someone can crack it. Yeah. The, a couple other things we probably won't get to, we're coming out of time here, but, uh, Z image base.

Um, so for the audience Z image turbo, we covered here. Now, Alibaba released the actual model itself. So Turbo was a distillation of the model. So, um, hopefully we have some of that for you on the next few episodes. Uh, give us a comment. I so they, yeah, that's the thing. It's been really hard to mess with it.

Um, it's just been buggy, um, from what I'm reading online. It's really slow, like, it's like five minutes, three minutes, uh, of an image. Yeah. And, uh, the training, the LORAs stuff stuff hasn't been working right. So I'm just waiting for all that dust to settle first. 

Maybe are you, are you, are you trying to do a, are you trying to do a rematch, another LORAs reference rematch?

No, I, I might, I might make a comeback Joey. Uh, you know, but, uh, 

I think we need to do something. I need, I think we need one, but like a better, I think a style look for LORAs wasn't a great. Case, I think we need like a character or like a visual, like a animation or something like completely like complete style transfer kind of thing.

You know what I was thinking after that, that terrible loss, that tragedy was like, what if. What if we use the same tool and our approach is different 'cause we're different people and then we talk about that as well as look at the results. So like, let's say if we both use Nano Banana Pro with references, but then you would get a complete different set of outputs than I would because we're just fundamentally different people.

Right? Yeah. 

You know? Yeah. Maybe 

that that levels the playing field too much. You don't like it? 

I I feel like we're just gonna get similar outputs of stuff. 

Yeah, that would be a very interesting thing to talk about. Like, you know, because, uh, okay, we're segueing a little bit. Uh, Ben Affleck talked about this, you know, he did the whole podcast run and he's like, uh, AI is always gonna sort of deduce down to the lowest common denominator, right?

So like. Then that's not really useful for filmmaking. 'cause filmmaking is all about the outlier, right? So you, the, the outlier story, the outlier look and feel always wins. 'cause that's different for the audience. But AI's never gonna really give you that. 'cause it's just kind of funneling every down, everything down into this one, you know, homogenous thing.

I think so I, I, I mean, I get where he is going with that. 'cause it's like, yeah, it makes it easy to do a lot of stuff. So you gotta figure out how. Do you make something still like extraordinary and above what is now the new baseline is like has raised 'cause. It's just easy to make good looking stuff, but not necessarily.

Emotionally connecting or something that resonates. 

Yeah. As much as I enjoyed the rip, I think I enjoyed Ben, Ben and Matt, uh, Ben aff and Matt Damon's appearances in all of the podcast tour. Like they, they're really insightful people. Yeah. They're so on top of AI and they understand the business, obviously.

Yeah. Yeah. No, they're, they're very smart. All right. Good place to wrap it up. Thanks for everything we talked about@denopodcast.com. 

It's been a while. Just comment and engage and, uh, we hope to see you on a regularly occurring show in the next few weeks. 

Yep. Thanks everyone.