Denoised

Why Veo 3.1's New Insert Feature Changes Everything

VP Land Season 4 Episode 67

Addy and Joey dissect major updates to Google's Veo 3.1, including ingredients to video, video markup tools, and improved frame extension. They also examine Runway's new specialized apps for VFX, Netflix Eyeline's research on video reasoning, and NVIDIA's new DGX Spark.

--

The views and opinions expressed in this podcast are the personal views of the hosts and do not necessarily reflect the views or positions of their respective employers or organizations. This show is independently produced by VP Land without the use of any outside company resources, confidential information, or affiliations.

Time for the AI roundup. 

Let's go 

Yeehaw. All right. Welcome to the other studio, Addy.

Yeah. Hey, man. I'm on the west side. It's nice to be out here. A little bit cooler, nicer 

for LA Tech Week. Welcome. 

Yeah, thank you. We've got an AI event we're going to tonight. It is a zoo. All right. Big story this week: a moderate update to Veo 3.1.

Yeah. So it's not a full Veo 4, but a lot of quality improvements and just sort of product updates,

incremental updates.

Yeah. All right. So the big updates in this one: there's now ingredients to video. This was something that was already in Veo 2, but basically, similar to Veo 2 or Nano Banana, you can give it a couple of images, locations, people, objects, and it'll use them in the video. Mm-hmm. So that unlocks a lot more consistent characters or consistent locations.

Consistent things. So it just gives you a lot more control. 

Yeah. 

Over the output, 

as you know, with baking, uh, it's not the ingredients, it's the ratio of ingredients. So how much control do we have over that? 

Well, there was another demo that was interesting from Google, where basically it seems like there's also a markup tool.

And this is all in Flow, specifically, the Google web app that is sort of the portal to use Veo, and that probably has the best options for Veo; you could still do some of this in the API, yeah, and other tools. But it seems like there's a built-in annotation marker. So in the video in Flow, there's a demo from Google's Twitter page where they have an existing video and they just drew a square on the video itself and said, add a person here.

Nice. And then it just added the person on the video. Yeah. So that's sort of like, that's,

that's the best kind of guidance you could give it. Yeah. Kinda like 

the thing we saw with the first image hack, but now it's sort of built directly into the product. And you could do it in a video, not like an image to video.

Yeah. It's like you're modifying the video itself, like we've seen with Runway Aleph. 

Runway Aleph. Yeah. Wan does... no, Wan wouldn't have markup. I'm thinking of the big Muppets,

um, sort of workflow. But that's different.

Runway Aleph. 

Okay. Yeah. That's cool. No, that's very, that's very good. 

Uh, the other update is quality of life: now there's start frame and end frame, where it used to be start frame only. Mm-hmm. So just more improvements to that. And then the other interesting one is there's now an extend feature, which is not exactly new,

'cause other apps have extend features, but they sort of would just take the last frame. Adobe did that for a while. Yeah, they would take the last frame and use that as the first frame for the next one. But this post says that Veo 3.1 uses the last second as the driving force.

Oh, so it goes further back in time.

Yeah. Which 

is cool, because you always get that weird issue where you get that stutter, where the video could just completely change. It's like, that's not what I wanted. It had zero context about what happened before and would just be completely random. So the fact that it takes a full second, yeah,

of the video as context is pretty cool. Temporal consistency. Exactly. Really good. Good job, Google. You guys know what you're doing.
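To make the difference concrete, here is a rough Python sketch of the two extension strategies described above. The frame rate and the generate_continuation function are stand-ins for illustration only; this is not the Veo or Flow API.

```python
# Rough sketch of the two extension strategies discussed above.
# `generate_continuation` is a placeholder, not a real Veo or Flow API call.

FPS = 24  # assumed frame rate for illustration


def extend_from_last_frame(clip_frames, generate_continuation):
    """Older approach: seed the next generation with only the final frame."""
    seed = clip_frames[-1:]          # 1 frame of context
    return clip_frames + generate_continuation(seed)


def extend_from_last_second(clip_frames, generate_continuation):
    """Veo 3.1-style approach: condition on the last second of motion."""
    seed = clip_frames[-FPS:]        # ~24 frames of context
    return clip_frames + generate_continuation(seed)
```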

I haven't taken this for a spin yet. I am really curious. Uh, I wonder if it's available on Freepik.

I think some stuff. I'm curious, that's the issue, you know, with what's available directly on Flow and what's available on Freepik.

But Freepik would be in the same bucket as every other company, where it's using it through the API. Yeah, and looking at Google's blog post, the insert elements, which I just talked about, is where you could drag a box on the video. Ingredients, no, ingredients is your input. So ingredients is like

here's an image of a person, here's an image of something. It's 

like the basic components. Yeah. Yeah. That's like 

instead of a starting frame, it's like a couple of images. Got it. That's the ingredients. I think you can use that on the API. Mm-hmm. The insert is I have an existing video. I'm marking up my video and I want something to change.

Oh. 

Like I want to insert something or I wanna remove something. 

Yeah. I 

think you can only do that on Flow.

It's like video in painting. 

Exactly. 

Yeah. Yes. 

So I think that's just Flow. I don't think that's an API thing.

That's amazing. 
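For reference, here is a loose sketch of how the two features being compared differ as inputs. The request shapes below are illustrative only; they are not Google's actual API schema.

```python
# Conceptual sketch of the two Veo 3.1 features being compared.
# These dataclasses are illustrative only, not Google's API schema.

from dataclasses import dataclass, field


@dataclass
class IngredientsRequest:
    """'Ingredients to video': reference images plus a prompt (API and Flow, per the conversation)."""
    prompt: str
    reference_images: list[str] = field(default_factory=list)  # people, places, objects


@dataclass
class InsertRequest:
    """'Insert': mark up an existing video and describe the change (Flow only, per the conversation)."""
    source_video: str
    region: tuple[int, int, int, int]  # x, y, width, height of the drawn box
    instruction: str                   # e.g. "add a person here"


ingredients = IngredientsRequest(
    prompt="Two friends talking in a diner",
    reference_images=["person_a.png", "person_b.png", "diner.png"],
)
insert = InsertRequest(
    source_video="take_03.mp4",
    region=(640, 360, 200, 400),
    instruction="add a person here",
)
```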

Yeah. So another reason also: Flow, the Google AI subscription, is still, credit for credit, the cheapest way to use Veo 3.

Like, the credit equivalent of the money you're paying there, mm-hmm, is cheaper than if you were to buy through the API. Really? Yeah. If you got a subscription to Google AI Studio, if you're making enough stuff, the math comes out where, at least it was, I don't know if it changed with the API rate dropping, it was cheaper.

It was like half the price to just do it in Google's platform than to pay for the API.

Remind me again, is Sora 2 severely undercutting Google, or are they at the same sort of level?

I think they were a bit less, but I did the math, 'cause there's a Sora 2 API in ComfyUI, and that's my easiest way to just be like, what does this cost?

Yeah, 'cause they'll just tell you the price right there. And so I cranked all of these settings up to the max. So I did HD, the max duration, which was like 12 seconds, and with audio. Mm-hmm. And that was $6. For 12 seconds. That's like a coffee at Starbucks. A little. Yeah. Yeah.

Wow. 

Better not mess it up.

Here we go. I think if you were to do 12 seconds for Veo, it'd probably be a little bit more if you were to max out all the settings, like Veo 3.

So Veo would be over $6 if you maxed everything out, do you think, for that duration?

I mean, Veo, I don't know if 3.1 increased the max duration. I think it's still a similar max duration, maybe five. It has the extend feature, but I think what it can generate is still maybe up to eight seconds. Mm-hmm. I'd have to confirm that. I think some Sora stuff was undercutting, but then when you looked at the actual specifics, it's roughly the same. Not as drastically undercutting as you would think. That was also using Sora 2 Pro, which is the most

expensive version of Sora.

Right. Which was the

version you got if you had a subscription to, yeah, OpenAI Pro or ChatGPT Pro or whatever, the $200-a-month access. Yeah. Or if you're accessing it through the API. Okay. Sora 2 basic is what you get if you use the Sora app. Yeah. Which is lower fidelity.
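The back-of-the-envelope math mentioned above works out to roughly fifty cents per generated second. A quick sketch using only the figures from the conversation; treat the prices as placeholders, since API rates change often.

```python
# Back-of-the-envelope cost-per-second math using the numbers from the
# conversation (roughly $6 for a 12-second, max-settings Sora 2 Pro clip).
# Actual API pricing changes often, so treat these as placeholder figures.

clip_cost_usd = 6.00
clip_length_s = 12

cost_per_second = clip_cost_usd / clip_length_s
print(f"~${cost_per_second:.2f} per generated second")     # ~$0.50/s

# Scaling that same rate to a hypothetical 8-second generation:
print(f"8s at the same rate: ~${cost_per_second * 8:.2f}")
```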

Little bit off topic, but I, I don't know if you were the one that sent it to me.

Uh, there was like a YouTube video of a guy who was talking about, oh, Google, they have Veo 4, 5, 6, 7, 8, and 9, it's just a matter of when they're gonna release them. And I just had to laugh. I mean, it sounded

so silly. Everyone can count numbers. Like, yes, we can. I'm sure there will be a Sora 20 at some point.

Sure. No, 

But no, his claim was that Google already has them, they just haven't released them; it's like they have different versions already built. There

was a weird thing with Veo 3.1 where, like, the weirdest, most obscure AI companies were trying to be the first to announce Veo 3.1, and it was like, where's this information coming from?

Mm-hmm. And are you like leaking stuff that you shouldn't have been leaking? Right. By being an API partner? Uh, yeah. It's like leaving 

the iPhone at the bar kind of thing. Yeah. 

It's like, you just wanna be first to announce something about this, but I'm not gonna trust you if you're the only one saying, oh hey, we got the inside scoop on Veo 3.1.

Yeah. Like until it comes from Google, like I'm not gonna believe what you say. 

Do you think a lot of the Veo improvements are being driven by, like, the Aronofsky feature or the Michael Keaton feature? Like, I'd imagine some real filmmakers are having an impact on AI development.

I would imagine they're taking that into account.

I don't really know how much that weighs on the roadmap or, or these improvements. 'cause also, a lot of these things were things that existed in Veo 2. They hadn't loaded them into Veo 3 yet. 

Yeah. It seems like the arms race is not, well, it's still quality, right? Like, as we see with Sora 2, the quality bar jumps, and then the next one will probably jump even further. It just feels like a different bucket.

Yeah. Like

I feel like Veo 3 is in its own class.

Veo 3 definitely feels more cinematic, filmmaker-focused, where Sora feels, yeah, more realistic quality. Yeah. Sora is more like anything, make anything.

It's like memes.

Yeah. So I guess the question is, is the arms race really on the quality side? It sounds like it's a lot more on the workflow and how you use it.

I think it's a workflow and the control. Yeah. 

Because, I mean, look, I've used Flow a bit, but workflow-wise I still like something like Comfy, where it's a bit quicker and easier if I just need to spit out a bunch of outputs or plug a few things in.

I still find it's the fastest way. Mm-hmm. Versus a web interface where I'm typing a thing and hitting generate and then waiting and then scrolling back. Yeah. And then trying to find the thing I made. Yeah. From a previous version. And the organization tools with all of these are still kind of lacking.

Okay. So as a professional, you're still leaning towards Comfy with like a Veo 3 API node. Yeah. I still just,

like, I keep going back to Comfy and the API. I mean, the problem is I'm paying for the APIs, and so I'd rather use the subscription that I already have the credits on. So that's, that's my personal hiccup.

Sure. 

Yeah. Like, I have Freepik. I can keep going back to Freepik 'cause I have the unlimited image generation. Yeah. And Freepik is definitely the best of a lot of the interfaces, but still, you know, not as good as running stuff locally. Yeah. I

mean, uh, yeah, like we covered on the show, Joaquin doesn't pay us to say this, but it really does deserve it, being such a seamless aggregator of things. Yeah, I mean, I started to pay for it recently after hearing you talk about it. That was a good idea. It was a good idea. I mean, even

Did that being like such a seamless aggregator of things. I, yeah, I, I mean, I started to pay for it recently after hearing you talk about it. That was a good idea. It was a good idea. I mean, even 

it was, yeah, like it was a smart idea to just make it unlimited. You don't have to worry about the, the credit stuff, right.

Yeah, and I mean, I keep hearing Freepik come up more and more as like a central aggregator platform for a lot of people using AI. Yeah. Very cool. All right. Other updates from Runway: they have a new thing called Runway Apps. Nice. So it basically seems like it's kinda like the ChatGPT apps or something, where it's a custom-built workflow that just does one thing, but it's sort of a way where you don't have to worry about the prompting as much.

Mm-hmm. It's not quite clear if they're making them all or if it's an open platform where anyone can make the apps, but, for example, they have one that's like a change-the-weather app. Yeah. So you just give it your input image, you tell it what weather you wanna change it to, but you don't have to get creative with the prompting.

You just like change it to snow and like behind the scenes it has the prompting structure to give you a good output. 

Maybe what it's doing under the hood is it's turning your prompt into, like, it's noting the seed and all the different values of the generation, and then it's creating a custom preset and storing that custom preset.

I mean, it could even be like a low-rank adaptation of the Runway model that it's storing every time. It's pretty brilliant. Like, in the course of a movie, you're probably gonna have 15, 20 different shots that need the weather changed to the weather that's in the movie. So instead of trying to repeat that for all 15 shots and getting 15 different versions of it, you'd rather just use one preset and get it every time.

Yeah. And, yeah, get the output you're looking for and not have to be like, let me reinvent how to prompt again to get the specific change that I'm looking for.
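Here is a minimal sketch of the preset idea being speculated about: capture the prompt engineering and seed once, then reuse it shot after shot. Nothing here reflects how Runway Apps actually works under the hood, and all names are made up.

```python
# A minimal sketch of the "store a preset instead of re-prompting" idea
# speculated about above. Purely illustrative; not Runway's implementation.

from dataclasses import dataclass


@dataclass
class WeatherPreset:
    weather: str
    seed: int
    prompt_template: str = (
        "Change the weather in this shot to {weather}; keep the framing, "
        "characters, and lighting direction unchanged."
    )

    def build_prompt(self) -> str:
        return self.prompt_template.format(weather=self.weather)


snow = WeatherPreset(weather="heavy snow", seed=42)

shots = ["sc12_sh03.mp4", "sc12_sh04.mp4", "sc14_sh01.mp4"]
for shot in shots:
    # One preset, applied identically to every shot that needs the change.
    print(shot, "->", snow.build_prompt(), f"(seed={snow.seed})")
```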

Yeah, I would totally use this for a lot of the color grading stuff. I mean, not to sort of eliminate color grading altogether, but at least get it in the ballpark of what I was thinking, what the color palette should be.

Yeah. Some of the other ones they have: change background, change time of day, relight scene. Yeah. I mean, I think it's a clever idea. I'd be curious, I'm trying to see the demos, I'd be curious to see, 'cause if you were to try to do, like, I need to change this to rain and also change it to daytime, and you keep reprocessing a scene,

I feel like it would just keep warping more.

Yeah, exactly. It'll degrade over time. Yeah. Yeah. Because 

the issues I've had with Runway Aleph are that, depending on the type of shot and what you give it, it sometimes changes too many details.

Mm. Well, maybe Cristobal, you can come on the podcast and tell us 

how it works, tell us how 

it actually works.

But I think it's a clever idea where it's like, okay, I just need to do one specific thing, and someone's figured out a workflow that does that one thing really well. Yeah. I don't have to, like, try to re-figure out how to process this. I was gonna say, like, strategically,

Runway seems to have such a unique path in the way it's approaching the tool sets.

Google is approaching it in a way that I think a normal filmmaker, you know, somebody that's used to a pen and a marquee tool, would use it. Mm-hmm. But Runway is approaching it the way a non-filmmaker would approach it: sort of break down what the biggest challenges with filmmaking are and how do I resolve them with presets or tools or workflows.

And they all sort of have their own DNA. Like, if you look at, you know, Luma, right? Ray3? Yeah. They're solving a certain set of problems in a different way. And then Runway Aleph and now this, you know, they're solving it in another way. Nobody's wrong here. They're all sort of chipping away at the bigger problem, which is, what does filmmaking of the future look like?

Yeah, which is fascinating that we get to live through this era and actually see all of the stuff get built. 

Yeah. I mean, maybe this is just sort of an app thing you run now, but maybe it's a node in the future, and then you've got your clip, and then, to what I was saying before, when it gets good enough where it doesn't warp the video,

Yeah. You can have a stack where it's like, oh, run these clips through the change-weather node to the relight node to the, you know, change-the-lighting-to-daytime node.

Yeah. Also, maybe you can mask it out, like, leave the frame unaltered here, don't touch the frame in this region and just do it in this region, and so on.

Yeah. 

Yeah. Also, just imagine the point when this does become kind of real time, and you just have your video playing back, and you're like, ah, more weather. And it changes, there's like a slider, and you're like, eh, less rain. Right. Turn the rain off. And then

it'll be in DaVinci Resolve, like everybody will have it for $300.

It's crazy. 

Yeah. As you're running it on your DGX Spark, which, uh, we'll talk about in a second. Yeah. 

So, you know, I follow Eyeline closely. Eyeline is the VFX side of Netflix as a business. And look, recently they've completely hit a big reboot button. Previously they were more or less a VFX vendor.

They used to do shots and things like that, and now they're like a proper research arm. Yeah. As well as, yeah, their R&D arm. Yeah, their R&D team is really strong, and a lot of them are here in LA. I think they're spinning up a new team in Korea as well as Vancouver. Uh, Paul Debevec. Mm-hmm. Who's at Eyeline.

I think he's the chief research officer there. 

Yeah. I think that title sounds correct. 

Who is on your panel? Yes, so he's one of the names on this paper. It's called VChain; the title is Chain-of-Visual-Thought for Reasoning in Video Generation. Okay. So I always get to break these research papers down, and I'm the least qualified person to do research papers.

I don't have a PhD. I barely have a bachelor's degree, but let's do it. So when a video generation occurs, you give it the text. That goes into a CLIP encoder. The CLIP encoder is aware of what elements are needed in the video generation, and then it also populates the temporal latent space. So you have the elements as well as a timeline of when those elements are appearing.

Now, for better or worse, that's a random draw. Mm-hmm. Right? Like, the example they're using here is a rock and a feather get dropped, and then the feather slowly kind of does the Forrest Gump feather thing and lands gently, and the rock just goes, ta.

Yeah. 

So if you give this prompt to the video generation model, it'll give you a completely different outcome than you'd like.

The feather could just fall straight down; it would have no notion of a feather falling elegantly. So VChain introduces reasoning and a level of IQ to the video generation. It actually has an inference-time rendition of what is happening and is able to correct it as the frames are generating.

So say you tell it the feather needs to have a sway to it as it drops. After, let's say, it generates 10 frames, it'll go back and say, actually, the feather needs to slow down here for a minute and then go to the left, and then it'll keep going. So it's

like looking backwards to check its work as it's generating.

Yeah. And it's introducing a level of, uh, almost brain power to the video generation in real time as it's inferring. It's like 

a reasoning video model, sort of like with the reasoning models: they're generating stuff, but then they're going back and checking and thinking, oh, is this what I should have been doing or looking at? Exactly.

For video instead of an LLM. Yeah.

It's much more sophisticated than just going from, you know, noise to diffusion to latent space, then decode. It's certainly doing that, but then picture, like, an element of brain power that's supervising the whole thing as it's happening.

Mm-hmm. It's still in the research phase. Now, I'm sure they have a version of it kind of up and running at Eyeline that they're playing with, 'cause this stuff was generated and shown in this research paper here. But it's really interesting work. And this follows up on the Go-with-the-Flow research paper that we covered on the podcast a few months ago.

That's right. Okay. I was like, I remember, something sounds familiar, where it would check itself again. So

that was using something called warped noise, where, you know, it's not just about denoising noise, like the name of our show; it's about warping the noise into shapes that the denoiser can pick up on.
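A toy illustration of that warped-noise idea: instead of sampling fresh noise per frame, carry the previous frame's noise along a motion vector so the denoiser sees coherent structure. This is a simplified sketch, not the actual Go-with-the-Flow implementation.

```python
# Toy illustration of "warped noise": shift the previous frame's noise
# field along the motion instead of drawing fresh noise per frame.
# Simplified sketch only; not the Go-with-the-Flow implementation.

import numpy as np


def warp_noise(noise, flow):
    """Shift a 2D noise field by an integer motion vector (dy, dx)."""
    dy, dx = flow
    return np.roll(np.roll(noise, dy, axis=0), dx, axis=1)


rng = np.random.default_rng(0)
frame0_noise = rng.standard_normal((64, 64))
frame1_noise = warp_noise(frame0_noise, flow=(0, 2))  # subject moving 2 px right
```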

Okay. Right? Yeah. So that will help you really dial in motion correctly. So VChain also is all about control, controlling motion and temporal-domain stuff. So all of the research, I think, is pointing to an area that I don't think somebody like Google or Sora would have figured out, because Eyeline is directly connected to filmmakers.

Mm-hmm. Like more so than Google or Sora. So they would have the need for the highest level of control. 

Yeah. And this looks like a step in that direction. Exactly. Yeah. I'd be curious, 'cause I mean, it's not a model, it's a way for the models to work. So I'd be curious: would you

attach this to a model, or just hope one of the models incorporates this into the way that it generates?

That's a great question. Yeah. I mean, I haven't read the entire paper, but I feel like this is perhaps an LLM engine that ties into video generation models. Yeah. So theoretically, if this VChain node was available in Comfy,

maybe you could use it with a video model. Oh, I see.

Yeah. Basically I'm trying to figure out how do we use this now, because this sounds pretty cool. Yeah. How do we use this to improve, uh, video generations? Yeah, yeah. Like, could you attach this to like a Wan 2.2, like workflow kind of thing? 

So it would tie directly into the KSampler, where, you know, frame by frame the denoising happens, and then, let's say after 10 steps, the VChain does something to correct it, and then after 10 more steps it'll do something else.

Mm-hmm. And so on. 
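A loose sketch of that "check your work every N steps" loop, as the hosts characterize it. This is illustrative Python only, not the VChain paper's actual algorithm or ComfyUI's KSampler code; every function name here is a stand-in.

```python
# A loose sketch of the "correct the generation every N steps" idea
# described above. Generic denoising loop with a reasoning callback;
# not the VChain paper's algorithm and not ComfyUI's KSampler.

def sample_with_visual_reasoning(latents, denoise_step, critique, correct,
                                 total_steps=50, check_every=10):
    """Denoise latents, pausing periodically so a reasoning model can
    inspect the in-progress video and nudge it back toward the prompt
    (e.g. "the feather should sway and fall slowly")."""
    for step in range(total_steps):
        latents = denoise_step(latents, step)
        if (step + 1) % check_every == 0:
            feedback = critique(latents)      # e.g. "feather is dropping too fast"
            latents = correct(latents, feedback)
    return latents
```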

Yeah. So we'll see. That'll be cool. All right. And then the last one that crossed my radar was the DGX Spark, the little gold box. The little gold box, which we've talked about quite a bit. Yeah. And it was first teased at CES. Mm-hmm. It is now shipping. Nice. Finally. Yes. How much is it?

Good question. I don't know what the pricing is, actually. I've heard five grand, but I don't know about starting; lemme see if there's an actual price posted on it. But the thing that popped it onto my radar is ComfyUI,

oh, I can't imagine how nice, posted that they're running it, native

support. Four grand, starting at four grand for a four-terabyte.

I mean, that's a bargain, because you're looking at the high-end GPUs Nvidia makes, mm-hmm, at the four-plus-grand price range, right? Yeah.

I see Comfy has a blog post saying that they're supported on DGX Spark. And let's see, performance-wise, it doesn't beat a full desktop with a 5090, but it can run models and workloads that are too large for even high-end desktop systems.

Yeah, because it's on the Blackwell architecture, which is their next-gen architecture, it probably has higher memory than the RTX line. Uh-huh. Yeah. But the clock speed on RTX is probably faster. Mm. So, the problem with most Comfy instances on most people's desktops is that their GPU can't store the entire model on its own.

And when you have to split that across two different clock rates or cycles, now you're slowing it down by half, right? Mm-hmm. Yeah. So you wanna be able to load that entire thing, that whole thing, onto it. Yeah. Wan 2.2 I think is like 20-plus gigs.

Mm-hmm. Yeah. 

At least, right? So most GPUs' memory is not that high.

Yeah. 

But the Blackwell architecture, I think, is,

okay, so it could do larger workloads, but not as fast.

Yeah. I mean, I'm clearly guessing here, depending on the workload. GPUs are driven by two things: clock rate and memory. Mm-hmm. So, you know, clock rate, that's what gamers are doing when they overclock their GPUs.

Right. They're just making the GPU work faster. Mm-hmm. Even if it's not finishing a cycle of computation, it'll just keep going to the next one. That's what overclocking is. So that clock is directly correlated to how much power it has access to and how much cooling it can do,

'cause you can melt that thing easily, right? So a little box like this is probably not gonna have a high clock rate, because it's a low-power device. It's probably not gonna be cooled as well as a desktop GPU with a giant fan on it or liquid cooling. So that's my guess. But having said that, it's still gonna be way better than any other little box you could buy.

It's the best, the best little box. Yes, exactly. 
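Rough arithmetic behind the "model won't fit in VRAM" point: the only figure taken from the conversation is the 20-plus-gigabyte Wan 2.2 checkpoint; the other capacities and the overhead number are assumptions for illustration.

```python
# Rough arithmetic behind the "model won't fit in VRAM" discussion above.
# Only the ~20+ GB checkpoint size comes from the conversation; the rest
# are assumed, illustrative numbers (exact sizes depend on precision).

model_size_gb = 22          # e.g. a Wan 2.2-class checkpoint, "20-plus gigs"
activation_overhead_gb = 6  # assumed working memory for latents/activations

needed_gb = model_size_gb + activation_overhead_gb

for name, memory_gb in [("typical consumer GPU (assumed)", 16),
                        ("high-end desktop GPU (assumed)", 32),
                        ("unified-memory mini box (assumed)", 128)]:
    verdict = "fits" if memory_gb >= needed_gb else "must be split or offloaded"
    print(f"{name}: {memory_gb} GB -> {verdict}")
```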

All right, cool. Yeah, I'm curious to see that. The Comfy blog said they'll have benchmarks in a future blog post, so I guess after they test it. But yeah, I'm curious. I mean, that seems like the most obvious use for the stuff that we're involved in. What else could you use this for?

Yeah, I 

just wanna give a quick shout-out to our friends at Nvidia. If you wanna send us a DGX box, we'll definitely take it for a spin.

We'll mess around with it. Yeah. But yeah. What, what else? I mean. 

What else would I use DGX 

for in the filmmaking world? What else could this excel at?

Anything in the traditional computer graphics world, I would imagine, would benefit from this.

So, you know, if you're an artist that's using Unreal Engine, real-time renderers, you have a giant desktop that's fricking loud, right? Why not run it on this thing? And you could be on site, mm-hmm, at the shoot. So like virtual production, real time, a little shoulder rig, it can have way better

visualization on it. Yeah. Yeah. You know, anywhere you need localized GPU power, I would imagine this thing would fit the bill. Lightcraft is a good example. Maybe they can run higher-fidelity stuff on there, on that,

or, or process the stuff afterwards faster locally. 

Yeah. And then even stuff like codecs, like compressing and decompressing videos in real time.

A lot of the time that happens on site, on giant workstations, where now you can probably do a lot of this stuff fast. Again, I'm just speculating here, but when you have a fast GPU in a tiny little box, mm-hmm, there's a lot you can do with it.

All right. That'd be cool to check it out. Yeah. All right.

We're good?

Yeah, we're good. Okay. Links for anything we talked about at denopodcast.com.

Thanks for listening on Spotify. We had another five star review. Bam, bam. Here we go. If you're watching on YouTube, give us a comment. Give us a like, and hit the notification bell. Wow, I never thought I'd be saying that, but I'm saying that now.

You should hit the notification bell. 

All right. Thanks for watching, everyone. We'll catch you in the next episode.