Denoised
When it comes to AI and the film industry, noise is everywhere. We cut through it.
Denoised is your twice-weekly deep dive into the most interesting and relevant topics in media, entertainment, and creative technology.
Hosted by Addy Ghani (Media Industry Analyst) and Joey Daoud (media producer and founder of VP Land), this podcast unpacks the latest trends shaping the industry—from Generative AI and Virtual Production to hardware and software innovations, cloud workflows, filmmaking, TV, and Hollywood industry news.
Each episode delivers a fast-paced, no-BS breakdown of the biggest developments, featuring insightful analysis, under-the-radar insights, and practical takeaways for filmmakers, content creators, and M&E professionals. Whether you’re pushing pixels in post, managing a production pipeline, or just trying to keep up with the future of storytelling, Denoised keeps you ahead of the curve.
New episodes every Tuesday and Friday.
Listen in, stay informed, and cut through the noise.
Produced by VP Land. Get the free VP Land newsletter in your inbox to stay on top of the latest news and tools in creative technology: https://ntm.link/l45xWQ
The Topaz AI Workflow That Turns 8-bit AI Into 16-bit Cinema Quality
Joey takes you inside Infinity Fest’s panel on Gaussian Splats and AI in virtual production. Learn how this technology enables lighter, faster 3D environments and how ETC upscales AI outputs to 16-bit EXR files. Plus, Addy and Joey compare AI tools like Nano Banana and Seedream for consistent image generation.
--
The views and opinions expressed in this podcast are the personal views of the hosts and do not necessarily reflect the views or positions of their respective employers or organizations. This show is independently produced by VP Land without the use of any outside company resources, confidential information, or affiliations.
All right, welcome back to another episode of Denoised. We are going to focus on a bit of a recap of Infinity Fest. I hosted a panel there on 4D Gaussian splats and AI in virtual production. Addy couldn't make it, so I'm going to fill him in and fill all of you in while we're doing it. And then we're going to talk about some of the other things, the updates we've been seeing in AI filmmaking workflows. You ready, Addy? Sound good? Awesome.
Alright, so Infinity Fest. It was last Thursday and Friday. Super cool event, a lot of really good panels, a lot of good talks. Focusing on the panel that I moderated: it was a panel on AI innovations in virtual production, but the real thing we were talking about a lot was Gaussian splatting and 4D Gaussian splatting.
Uh, we had Paul- look at you moderating big panels. Big panel. Yeah. Paul Debevec himself.
Joey.
That was good. I'm just, I'm curious as well. I'm just like, I'm just gonna ask you questions I want to know as well. And there were like some good teasers at the end too of things that I, I wish we had more time to get into.
'cause I'm also still curious about them, like triangle Gaussian splats.
Yeah. And, uh, I guess for the audience, I mean, remind all of us, uh, what the AI connection to Gaussian splats is, because we tend to think of that as traditional photogrammetry, even though it's clearly not. It's
different, but it's also not AI. Uh, so on the panel was Paul Debevec, uh, from Netflix's Eyeline Studios, and then also Jason Schugardt, who, uh, works at Nvidia now and has an extensive history in VFX. Yeah, I know Jason. Yeah. And, um, he's great. They made clear that Gaussian splats are not directly AI, but they are related to that field.
So I think the correlation there is Gaussian splats are not generative AI; they're not creating something novel from scratch. But the solve itself, how the, uh, points, the Gaussians, are, uh, correlated in three-dimensional space, is using an AI engine.
Mm mm-hmm.
And yeah, also just to give, like, a short explanation of what this is and how it's useful. Basically, if we kind of compare how we would rebuild things in 3D: the original way, which is still pretty standard, is, uh, polygons, a bunch of little lines connecting everything, and you're building out your shapes. But that can use millions or billions of little dots and polygons to build out whatever 3D object or space you're building. Gaussian splats? Yep. A splat is a floating blob with a, uh, variety of information inside it. And then if you have millions of these blobs with information inside them, that can rebuild, yep, your 3D space or 3D object. Uh, the advantage of this is it can run much faster, much lighter weight than if you were rebuilding the same scene with polygons,
correct? Yeah. I think Gaussian splats are still using a sort of quote unquote real-time rendering, like the fact that when you are moving the camera, it's calculating that novel view in real time. Mm-hmm. But instead of rendering per se, like with ray tracing, you know, bouncing light off of surfaces the way traditional computer graphics does, what it's doing is placing these blobs of color in the 3D space that corresponds to that object. And because they have a Gaussian falloff, you can actually put them right next to each other and they blend in like a seamless object.
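To make the "blob with information" idea concrete, here's a minimal sketch of what one splat typically stores, assuming the common 3D Gaussian Splatting parameterization; the class and field names are illustrative, not any particular library's API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class GaussianSplat:
    """One 'floating blob' in a 3D Gaussian splat scene."""
    position: np.ndarray  # (3,) center of the blob in world space
    scale: np.ndarray     # (3,) per-axis size of the soft ellipsoid
    rotation: np.ndarray  # (4,) quaternion orienting the ellipsoid
    opacity: float        # alpha used when blending overlapping blobs
    sh_color: np.ndarray  # spherical-harmonic coefficients, so color
                          # can shift with the viewing direction

# A scene is just millions of these, sorted and alpha-blended each
# frame -- no ray bouncing, which is why playback stays lightweight.
```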
Mm-hmm. Yeah. And in between this too, there were NeRFs, which kind of had a, a little bit of a moment, neural, uh, radiance fields.
But the thing with NeRFs is you had to process the data, and it sort of got baked in. Is that, is that kind of correct? Where, like, Gaussian splats can kind of be run in real time because the data is in the splat?
Uh, again, that's beyond my understanding, but I, I think you're right in that NeRFs are an outdated technology, and most of the development work that's happening in this realm is now happening in 3DGS.
Mm-hmm. And the interesting thing was that Gaussian splats are not new. It was based on a paper that's, I think, 15, 20 years old, but they sort of just rediscovered that it has this application in 3D space. Yeah. And 4D space, which we also touched on a bit too. So I mean, the idea of 4D Gaussian splats is you are capturing not just the 3D environment, but motion in the 3D environment. So people dancing or moving, and you're able to reconstruct that scene in four dimensions, so 3D space over a period of time, and reposition your camera, move your camera in that 3D space.
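And a hedged sketch of the "3D over time" part: each blob's parameters become time-sampled tracks, and rendering a frame means evaluating every blob at time t. This is illustrative, not any specific 4DGS implementation.

```python
import numpy as np


def splat_position_at(times: np.ndarray, centers: np.ndarray, t: float):
    """Interpolate one splat's center along its captured trajectory.

    times:   (N,) timestamps from the multi-camera capture
    centers: (N, 3) the splat's center at each timestamp
    t:       playback time for the novel view being rendered
    """
    return np.array([np.interp(t, times, centers[:, axis])
                     for axis in range(3)])

# Scale, rotation, opacity, and color can vary over time the same
# way -- that time axis is the fourth "D" in 4D Gaussian splats.
```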
Yeah. So when you say 4D, you're not talking about Interstellar, like the end scene with, uh, I'm talking about
It's three dimensions over time. Yeah. And so, right, like, there was that demo clip I think we talked about on the podcast like months and months ago. Remember that clip of the guy sitting down, I think it was like a, a Chinese YouTuber, and he was sort of sitting down, and then it was like a 4D Gaussian splat.
And the camera was sort of moving around him. Yeah. And then it kind of went a little bit viral in our nerdy sphere because people were saying, oh, it was generated from like one video angle. And then, um, yeah, it came out where it was like, no, they needed 20 cameras to pull that off, but it was still, you know, it was still cool.
But they needed 20 cameras.
Yeah. We, we never caught the BTS. And I think I was, um, like my friend Jim Alde kind of pointed that out. Yeah. Like, just so you know, this is as involved as traditional photogrammetry with like
50 cameras. Well, anyways, the company behind that technology, uh, they were demoing at Infinity Fest as well, uh, I think with a 4D View or something.
And they had the rig and those demo clips set up. But they've been doing a really clever solution for building out a rig with a variety of PTZ cameras where they can capture 4D Gaussian splats. Uh, and, like, so they're
not building the tesseract per se.
What was that? That's the thing from
the Interstellar movie.
I mean, I haven't
seen that in a long time. What was the Tesseract? Oh, come on.
I dunno if this is Carl Sagan. I dunno if this is real Carl Sagan, but Carl Sagan described four dimensions using an object called the tesseract.
Oh, that's right. That was in that, um, Dark Matter book as well.
Yeah. So like, you know, a cube's shadow is a square.
Mm-hmm. A
tesseract's shadow is a cube.
Okay.
Does that break your brain? Yeah.
I'm trying to,
you can't picture it. I'm trying to wrap my brain... can't picture it. No. Yeah. Okay. But, uh, yeah, yeah. When we talk four dimensions, that's where it goes. So, looping back into why Gaussian splats are important
for modern filmmaking.
Okay. So bringing this back into, like, tangible benefits, and also, uh, some of the other panelists and what this had to do with virtual production. Also on the panel we had people like Fernando, uh, Rivas, who's the CEO of Volinga. Um, I did a big interview with him at NAB, so there's extensive stuff on that, but he basically built Volinga, which is an Unreal Engine plugin.
You can take a Gaussian splat scan that you do, and you can make these environments through a variety of techniques. But we also talked about PortalCam from XGRIDS, which is sort of the latest device that kind of does this all in one and makes it easy to do. You could scan a room. We
covered that
here, right? Yeah.
Covered it on this podcast. Yeah. We were talking about, yeah, the PortalCam launch, uh, about a month or two ago. You could take your scan of a room, bring it into Unreal Engine with Volinga, which can load Gaussian splats into Unreal, and they also have ACES color management. And then you can push that onto a wall, and basically you can have your environment based on a real-life environment.
Scan a room, scan a set. They've also been, you know, using that a lot for backups. So you scan a set during production in case you have to do pickups later, or if, like, season two gets picked up and they weren't quite counting on a season two. Uh, for a variety of reasons, you can take a scan of whatever space you want and put it on a wall.
And so that's like a very practical way. Also, they had a, a case study, too, um, that Volinga did, where they scanned, um, Auschwitz, the concentration camp. And that one was, they just do not allow anyone to film at Auschwitz if they're trying to do any type of production. So they did a scan, and now they're making that scan available to productions. If a project is about Auschwitz or wants to use the location, they can put it on a wall and have that space, uh, digitally, so they can film there digitally but not disturb the actual site with production.
There's so many places in the world where bringing that place into a volume or into VFX would be super beneficial.
Mm-hmm. I mean, Auschwitz is one of them. Even something like the pyramids where there's all types of permits and time. Yeah. Or just limitations. Tourists.
Or tourists and convenience. Chernobyl,
you know? Yeah. Like the show.
Yeah.
Like if you were to recreate that in a volume, that's, that's probably the best way to do it.
And, and I think the advantage over traditional photogrammetry, where you take, you know, hours of video or thousands of images, is that there's a big solve process and a lot of it is hand-done. Traditionally it uses, uh, RealityCapture, which was acquired by Epic Games, and you're talking about not just running it on a super fast computer with a lot of GPU power, but an artist manually going in there, cleaning stuff up, aligning things. It's tedious work. Versus a Gaussian splat: you upload the images, and if you shot them correctly, it should more or less quote unquote solve for itself. And then, um, you get this sort of super lightweight file back, which is like a hundred megabytes, not like 10 terabytes or whatever. Yeah. And that can then, in real time, kind of, sort of render as your camera moves?
Yeah. I mean, another practical use, uh, you know, we've seen with Lightcraft Jetset. You know, their, their value prop is you can use your phone as a camera tracker, and you can load your 3D scene into your phone.
So when you're filming, you can roughly comp in and see your 3D space. You could bring in Blender and Unreal scenes, but they're gonna be kind of heavy. But you can also bring Gaussian splats into Lightcraft Jetset on your phone and see this in real time on your phone. And because the splats are much lighter weight, you get a better, um, experience when you're filming.
I'm curious to know what Paul Debevec said at the panel, 'cause as you know, he's, yeah, one of the most respected folks in VFX today. So I'm just curious what he said.
So yeah, I mean, Paul, he kind of talked about a lot of the basics that we sort of just covered now, of like what is Gaussian splatting, both 3D and 4D.
He also sort of alluded to a paper that they've got coming out at SIGGRAPH Asia on super-resolution techniques, uh, to achieve extreme closeups from large volume capture. We didn't get into more detail about that, but that's a paper that they've got, uh, coming out soon. And they also sort of talked about some of the limitations of Gaussian splats.
So the tools to edit a Gaussian splat, like a scan that you make, they're still limited. You know, definitely not as mature as the existing 20-, 30-year pipeline of working with polygons. Yeah. The other thing, the other limitation with Gaussian splats is they are a radiance field representation. So basically the light that you captured during your scan is sort of baked into it.
And so also relighting Gaussian splats is trickier. Um, that's something where currently, right now, something like a combo of Gaussian splats, lidar technology, and traditional photogrammetry kind of comes into play for rebuilding a space if you need to have that flexibility of relighting. Like, that's something that Global Objects specializes in, where they sort of combine all of these techniques.
If you need to have a space that's an Unreal Engine 3D scene but has the flexibility of relighting, that's one of the limitations right now with Gaussian splats
for sure. But the advantage of having the lighting baked in is a lot of the reflections are super accurate.
Mm-hmm.
And it just looks right, where photogrammetry a lot of times
totally messes up, like, glass or metal, shiny objects, whereas Gaussian splats are really good at that stuff. But then again, it's baked in.
And then the other thing he sort of alluded to, but we didn't get into details, and so maybe we'll try to break this down in the future, was, um, you know, we've gone from polygons, we've gone to NeRFs, now we're looking at Gaussian splats. What's after that?
And triangle Gaussian splats are what was alluded to, but we didn't really get into details about that. And I have not looked it up, uh, but I noticed some papers out about that. Uh, I'm guessing it is a combo of polygons and Gaussian splats.
Yeah, I would imagine tris would be more computationally efficient for GPUs, because GPUs are really built for tris and quads. Like, the math behind, uh, just computing a million or a billion tris at a time in parallel, versus a Gaussian splat, which I think is a sphere.
So geometrically it's more complex.
Okay, that makes sense. Okay. So
moving to tris would give you infinitely better performance.
So it'd be, like, sort of, the data stored in, instead of like a furry blob, more of a triangle-y blob.
A triangle-y blob. But that triangle is a flat triangle, which is, uh, a natural unit for modern computer graphics.
Right. Okay. Okay. Yeah. Interesting. All right. That's my guess. We'll, we'll have to dig into this in the future. Yeah. Okay. And then, just to round out who else was on the panel and what they covered: uh, we had Ben Abergel, who works with ETC as a creative technologist. And, uh, we'll talk about The Bends in a bit, but, uh, he worked on Pathways, another project that ETC is producing, and they were initially trying to do
some form of 4D Gaussian splats with a few iPhones. Uh, I believe they recorded a scene with like four iPhones, but it just was not enough data to create a Gaussian splat that you could, like, move around in and still have, uh, good quality. So they shifted to a workflow that was more something we've talked about before: like, iPhones, Lightcraft Jetset, real actors, a green screen space, uh, and then styling the background with like a kitbashed Unreal Engine scene, and then styling that with video-to-video AI later on. The thing, uh, going back to the original 4D Gaussian splat that they were trying to do with the iPhones: I did ask, like, how many iPhones would you actually need to successfully pull that off? And he thinks probably around 20 to 30, the same number that, uh, the other Gaussian splat videos are using.
And then lastly, uh, Elle Roberts. She was the, uh, AI supervisor for The Wizard of Oz upscaling at the Sphere. Or, upscaling's not the word, uh,
..., I believe, did all the work.
Yeah. Yeah. She works with them. She's the creative AI supervisor. And yeah, in a broad sense, we kind of talked about the workflow and pipeline of what they did, but obviously couldn't get into too many details. But
I mean, that was just sort of a whole different project, in taking the original film, sort of breaking it down, and then rebuilding out each scene for the 16K massive landscape of the Sphere.
Yeah, it's a nice cross-section of the industry on that panel. I mean, you have everybody from, like, tenured, you know, one of the foremost experts like Paul, all the way up to, like, you on the AI frontier side.
Mm-hmm. Gabrielle on the, you know, Sphere and location-based entertainment side. And then you also have, uh, Nvidia, you have Jason from Nvidia. Yeah,
no, it was really good. It was a really good, uh, panel. It's a really
cool panel. I'm bummed that I missed
it. Yeah, yeah. And shout out to Corin for assembling that panel.
But, uh, yeah. I also wanna talk about one other thing while we're on Infinity Fest, 'cause there was another really interesting presentation about upscaling regular AI outputs, which are usually, uh, 8-bit RGB, 720 or maybe 1080 output, to 4K cinema-quality, uh, 16-bit EXR files. Which, we've talked about how, like, Ray3 from Luma, their cool thing is they can do that in the output, in the generation.
But what if you don't wanna use Ray3, or you have other outputs? What is this workflow? And so this is from... yeah.
What if you just wanna upscale? You already have the generation done. Exactly.
And so this is a really interesting workflow from, uh, ETC's other project, The Bends, which is sort of this kind of, uh, cute animated short with, like, a deep-sea fish underwater, kind of traveling around and making its way to the surface.
They built out a pipeline using a variety of Topaz models. Uh, the local models, nothing to do with Starlight. And so basically they would take their source footage, they would run it through Nyx's denoise to kind of strip out the noise from it, then run it through, uh, Gaia's upres, which is one of their other models.
Mm-hmm. And then there were a couple other steps in the process, which I don't have in the screenshot that I took, but basically it was like three different Topaz models. And then out of that they were able to get a much more detailed, much higher-quality 16-bit EXR file. And in the color grade and everything else,
they just had a lot more latitude to adjust the shots, match the shots, something that you would, you know, be used to if you were shooting with a cinema-quality camera.
Yeah, it's really interesting work. Uh, the thing that fascinates me more than the resolution upscale is the bit-depth upscale.
So, you know, typically 8-bit is zero to 255, right? So if you have, you know, the sun in the frame, the highest value that sun will have is 255. Mm-hmm. Versus real-world dynamic range, like what our eye sees, you know, it won't be 255, it'll be like a million units. Right. Yeah. So, like, how do you get that crazy of a dynamic range, and how do you create it artificially, uh, interpolated, just from having this kind of crappy source frame, and pull and generate all this data?
Yeah, it was really impressive, and I, I wish I had a shot of the waveform monitors, 'cause they had before-and-after waveforms, and that's where you could really see the latitude you get.
And also the other advantage is it reduces banding a lot. You know, banding is just because, if you're in 8-bit, there's a limited amount of color values. And especially 'cause this short film took place in, like, the deep sea, so there's a lot of dark shots. So even with the kind of gradients in some of the shadows and the dark, you would see the, like, banding lines. And with this upscale process, you could see it in the waveform monitors, where it went from very rigid steps in the waveform monitor to a much more gradual curve, which is what you would want to see, or normally see, where you get more of a, a nice gradient with, with your color space.
And I think a lot of the sort of assumptions about having darker scenes with less light, you know, I'm thinking Game of Thrones, like all the scenes in Game of Thrones, right? You think that you don't need that big of a bit depth 'cause things are dark, when in reality our eyes are actually much more sensitive to shadows and differences in the darker regions. So if you throw more bit depth at the darker regions, you actually get more detail out of it. Mm-hmm. So it still benefits the scene even though there is not a sun in it, for example. It's not a daylight scene. If it's a dark scene, you still need more bandwidth in the bit-depth area.
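Rough numbers behind that, as a back-of-the-envelope sketch (ours, not from the talk):

```python
# Code values available per channel at each bit depth.
levels_8bit = 2 ** 8    # 256 steps (0-255)
levels_16bit = 2 ** 16  # 65,536 steps

# In a deep-sea shot graded into, say, the bottom 10% of the signal,
# 8-bit leaves only ~25 steps to describe every shadow gradient,
# which is exactly where visible banding shows up; 16-bit leaves
# thousands of steps in that same region.
print(int(levels_8bit * 0.10), int(levels_16bit * 0.10))  # 25 6553
```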
Yeah. And also just when you're in the color grade, too, and you have that, that control and flexibility. And this, what the experiment they're doing too, is they're trying to really build out an AI generative workflow, but in a professional Hollywood filmmaking pipeline. So, like, this entire process was supervised by an ASC cinematographer, Roberto Schaefer.
And, uh, you know, he, he's using his eye to analyze the scenes and, like, try to get what he's used to shooting with an ARRI Alexa or Sony Venice, uh, out of these AI outputs. And then the other thing they did was a film grain pass with Photochem, and they showed some comparisons of, like, with grain and without grain.
Mm-hmm. And just, you know, adding the grain just makes it way better. Right. Way better. Yeah. For lack of a better word, way better. Well,
one thing grain really helps with is banding. Like, uh, you know, there's a process called dithering, uh, which, mm-hmm, reduces banding. So it's essentially adding noise into the frame, and grain is kind of noise.
So it does help overcome a lot of the limitations with AI having, you know, 8-bit and low dynamic range.
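A minimal sketch of that dithering idea, assuming a float image normalized to [0, 1]: quantize a smooth dark ramp at 8 bits with and without noise and compare the two.

```python
import numpy as np


def quantize(img: np.ndarray, bits: int, dither: bool = False):
    """Round a float image in [0, 1] down to the given bit depth.

    Adding sub-step noise before rounding (dithering) breaks the hard
    banding edges into fine grain that the eye reads as a smooth
    gradient -- the same reason a grain pass helps 8-bit AI footage.
    """
    levels = 2 ** bits - 1
    if dither:
        img = img + np.random.uniform(-0.5, 0.5, img.shape) / levels
    return np.clip(np.round(img * levels) / levels, 0.0, 1.0)


# A dark, deep-sea-style gradient: hard steps at plain 8 bits,
# noise-smoothed steps with dither.
ramp = np.linspace(0.0, 0.1, 1920).reshape(1, -1)
banded = quantize(ramp, bits=8)
dithered = quantize(ramp, bits=8, dither=True)
```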
Yeah, and I thought he said something really interesting, too. 'Cause, like, when I thought about adding grain to an AI image, you know, or even just a digital image, part of it has felt like, oh, is this cheating?
Or is this, you know, faking it? Because, like, it wasn't actually shot on film, but we're adding film grain to it. But he said something interesting, where he's just like, it doesn't matter what you shoot with. Even if you're shooting this with digital cameras, like, every sensor has a noise profile.
Yep. Like, nothing shoots super clean.
Clinical. Yeah.
Yeah. Like, and if you're doing, you know, even an, uh, animated CG film, if you're doing outputs, that's the only way you could generate something that had absolutely zero noise or grain, and it just looks too weird and too clean.
Yeah. And adding the grain in that process just, you know, helps sell the illusion. And it's not like, oh, you're adding a fake film effect. It's just, you're adding something that replicates what you'd get even if you shot it with a digital cinema camera.
Yeah. You know, the grain part of cinema is so important that a lot of cameras now have built-in grain adjustment.
I don't know if the RED cameras have it just yet, perhaps the V-RAPTOR might, but the ARRI Alexas for sure. You can not only pick the level of grain, like a zero to a hundred, but also the texture of the grain.
Okay, and this is separate from like picking your ISO setting or whatever.
Yep, exactly. So this is something that the camera body will add after acquisition.
So it'll kind of overlay it, and I believe, um, if you're in ARRIRAW format, you can tweak it down the road as well.
That, that would make sense.
All right. So yeah,
those are the highlights for me from, uh, Infinity Fest. I mean, a lot of great panels, but, uh, I know we've talked about this for a bit, and those were the two standouts for me.
Let's talk about some of the tools people are using. Alright.
What are you using? What have been your go-to tools lately?
So I tend to focus more on image generation stuff over video generation. So I'm, mm-hmm, hyper-aware of all of the nuances and the limitations of image generation.
For example, I use Nano Banana, like, I don't know, I make maybe 50 generations a day on Nano Banana, just to kind of figure out what it can and can't
do. Are you doing, uh, when you're using it, are you doing, like, just straight text-to-image, or are you giving it any input stuff?
I'm doing mostly image editing.
So, uh, I'm going through a bunch of... so, uh, on the personal side, I'm building out a website, and for the website, uh, I'm gonna take a bunch of images of, uh, real-world places. Uh, for example, just think of the Huntington Library, right? Beautiful place. If you just take photos of it with no people in it, how do you bring it into Nano Banana and then add the people that you want, like, populate it with a crowd or hero characters and things like that?
So Nano Banana's been really good at that. You can give it a reference of each of the people you want it filled with. And then with, uh, like, the markup tools, you can actually dial in where they'll be sitting or standing, and it preserves the backplate really well. Like, it doesn't distort any of the actual photo.
And then when it puts in the new person or the object, sometimes I'll put a car in, the car will have the reflections of the environment, like an HDR almost. That's
cool. Yeah. Yeah.
So it's, it's doing a lot under the hood, and I, I really have not seen an image model that capable at doing that task yet.
When you said markup tools, are you using the markup tools built into Freepik or are you doing something else?
Freepik. Yeah.
Yeah. Okay. Yeah. And that works really well for, yeah, for precise notes on what you're looking to, uh, create, for sure.
And then, uh, the other thing that I'm doing is obviously the website design and build, uh, and I'm not, like, a web builder. Like, I'm just doing this because I need it, it just needs to get done.
And I figured, like, just this world of AI that we always talk about, why not put it to the test and actually use it? Uh, so I've been using Ideogram 3 for a lot of the, uh, logo and text generation. Oh, okay. And even motifs and stuff, like design elements, uh, of the website. You know, like, for example, if you have, like, a tractor or something and you want, like, a hand-illustrated version of that tractor, you can put it through Nano Banana, get a tractor back, and then put it through Ideogram and make a font in that style, and have the, uh, text reflect the logo and live in the same sort of ecosystem.
That's cool. Have you done anything where you've had an AI, um, mockup of the website and then given it to, like, Claude or something to make the website based on the mockup?
So right now I'm playing around with Framer AI. Framer is, uh... Okay, what's that? Yeah, Framer is, like, a known tool like Wix or Squarespace where you can build websites. Okay. Uh, so they have an AI engine built into it that's quite good. It'll give you a skeleton of a website, and then you can go in and you can, you know, add animation, add text, add another page, and so on.
There's just, there's just so many
AI tools at your disposal. Have you done a comparison with Nano Banana and Seedream? 'Cause I feel like Seedream's probably the closest. Seedream 4. Yeah. Where you can give it 16 or 20 input images at a time.
Um, yeah. How would I say it... I've found Nano Banana is more built for
quote unquote production. Like, it has a more photorealistic output. It, it tends to understand, um, compositing better. It tends to understand, uh, like what I, what I just mentioned, image-based lighting and sort of some of that stuff better. Seedream, I would say, is better for creative, completely novel outputs.
So if you're creating the world from scratch, I think go to Seedream and get a weird, wacky view of the world that you never could have imagined. Uh, versus if you already have a photo and you're just trying to do essentially Photoshop on it, then use Nano Banana.
I will say, I also found this tip, uh, a week or two ago from, uh, Henry Daubrez, uh, for Nano Banana. Where, you know, sometimes I've had issues when I'm trying to give it an image and then change something in it, and the prompting is sometimes, like, "change this" or, you know, "replace this."
He says that the best prompt is "show me." So, "show me an aerial shot," "show me a side shot of that person," you know, "show me blah, blah, blah." Uh, so I've been using that more, and I have found that that does work pretty well. Oh, I love that. That's a prompting trick. I'll keep that trick in mind.
Yeah. And the, the, I mean, the other thing with, like, Seedream and Nano Banana both is, like, you can prompt it, like, a thousand sentences, no problem. It'll, like, yeah, swallow it up, you know. Whereas I think there was, like, a 500-character or 500-line limit on the text encoders before. Um, so if you're in ComfyUI land, if you use, like, T5 or T5-XXL for the CLIP encoder, I think those things have a 500...
not 500-character, 500-token limit, whatever that means.
Oh, okay. Yeah. Versus
like Nano Banana, it doesn't really tend to have a limit. You could just keep going.
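If you're in ComfyUI land and want to check whether a long prompt will get cut off, here's a quick hedged sketch assuming a T5-class text encoder; the actual limit depends on the model you load.

```python
# Count tokens the way a T5-class text encoder would see them.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/t5-v1_1-xxl")
prompt = "A verbose, ChatGPT-boosted paragraph of prompt text..."
n_tokens = len(tok(prompt).input_ids)
print(n_tokens)  # anything past the encoder's limit is silently cut
```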
Do you, have you found like longer prompts work better? So
a lot of times I'll go into ChatGPT and I'll take a small prompt and I'll be like, "Hey, can you boost the prompt?"
That's literally what I say. And it'll come back with, like, a really verbose prompt. And then I'll go back in and adjust some keywords, and then I'll have this, like, giant paragraph that I'll cut and paste into Nano Banana.
Yeah, same. I use that for creating prompts. I'm like, I'm not gonna write a long prompt.
Yep. Yeah. The other thing I've been seeing pop up a little bit more on X, and now, this you could take with a grain of salt, because it's coming from, uh, Emm, who's the co-founder of Scenario, which is a website that helps you make custom models, training models. So he's selling the pickaxe. But I've been seeing some demos of, and talk of, LoRAs, and whether LoRAs are still relevant and where they still come into play.
Sure. We've talked about this before, of just, like, with Nano Banana and Seedream now, do LoRAs still make sense, to train some custom model to get the specific output you want consistently, when you could just prompt it in something like Flux Kontext or Nano Banana, et cetera? Yep. And so this was a demo from him of turning something into an isometric view.
And basically his argument for still doing a LoRA, and specifically a LoRA for Flux Kontext, is you still just get much more consistent outputs at scale. And so this is sort of a demo of, uh, an isometric view of a building, compared to, I'm guessing, probably just a text prompt with an input image. Yeah.
Where you get something that, you know, kind of looks like it, but if you're trying to do something consistent over a series of shots, you won't get that consistency.
Yeah. So the end argument is that you still need LoRAs for the highest level of control.
Yeah. And, I mean, the thing I've also been trying to figure out, too, 'cause, like, to train a Flux Kontext LoRA, you don't just give it a group of images of what you want. You have to give it a, uh, pairing of images. You have to give it, like, input image, output image. Input image, output image.
No, no, no. You give it a small group of images, but you'll have to caption them.
No, I think you have to give it, like, input-output pairs. You have to give it pairs to train it. Like, this is the input, this is the output. You have to train it with a pair of images.
But how do you know what the output is if you haven't generated it yet?
That's the thing. You gotta make the output. Like, so you have to modify, you know, 20 or so images to the output that you want in order to train it.
And so that's where I've been wondering. It's like, so do you just do that with Flux Kontext or Nano Banana yourself, and kind of refine it so it looks really good, so you have that pairing? Or do you modify it elsewhere? That's the part I've been a little fuzzy on. 'Cause with, with some of these models, you have to give it the pairing to train.
You can't just be like, here's, you know... it's not like a Midjourney moodboard, where you just make a moodboard of, like, a bunch of images that you like, and then that sort of trains the model on what type of output you're looking to get.
Oh, that's interesting. I'm looking at, uh, Flux Kontext
LoRA training on Replicate
right now. According to the Google AI Overview, Flux Kontext LoRA training involves preparing paired images, start and end frames of a subject, uploading them to a training service with a unique trigger word, and then running the training process.
Oh, that's interesting. I've, yeah,
so you have to like modify like 20 or so images manually.
To what you want. Yeah,
I mean,
to train it like that... Flux is like
one of the OGs of image models, right? Like, it was Flux and Stable Diffusion, and I think Flux is, uh, built off of Stable Diffusion, if I'm not mistaken. And so the way to train SDXL and some of the older models was to just give it 50 or so, or 30 or so, images
with captions, and then, um, it'll train a low-rank adaptation, which then latches onto the foundational model.
Right. But
you're saying that now you need to have an output attached to an input, no caption needed. Mm-hmm.
And
that's your LoRA training?
Yeah. Like, start point and end point. Repeat 20 or so times.
And that's the data. But then how
do you, like, get a style? Like, let's say you're trying to recreate a very particular style of animation, right? Like black-and-white animation. So you give it a colored image as an input and a black-and-white image as an output?
Yeah, I would guess so. But, like, my question was just: to make that black-and-white image, you just do that by hand,
using whatever process you want? Yeah, I would go to Photoshop and just desaturate it or something, I don't know. Right. But I mean, if you wanted something more stylized... So there's a project we're working on where it's supposed to be an animation look. And what I've been doing... I found that if you try to create the frame, 'cause it has, it has characters, like, you know, we need to have specific locations with specific people in a specific animated style.
Um, I found that if I gave something like Nano Banana or Seedream all of those commands, like people, place, in a certain style, it would, like, melt its brain, and it just wouldn't give me the thing in the style. But I found if I had it make a photorealistic-looking image with the character and the place, that, like, looked the way we want it to look, and then I've just been running that through Flux Kontext with a consistent text prompt of, like, "change this into a painterly, you know, oil-based painting style," then the output image looks pretty good and looks pretty consistent. I haven't found a need to train a LoRA on that, but I'm thinking I could, 'cause I have the start frame and I have the end frame. If I wanted to simplify the process, or if other people on the team were doing it, to make sure it's, like, a more consistent process, I probably could just give it the start frame of the photorealistic-looking image and the end frame of the stylized image, and train a LoRA on that. Huh. It
seems like a pain in the butt.
Yeah. Well, the text prompt works, and, like, as long as everyone just copies and pastes the paragraph of the prompt into Flux Kontext, it, it works pretty well.
So why do I need to train a LoRA? Yeah. So, you know, I'm still on the fence about that.
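For anyone who does want to try it, here's a hedged sketch of what that paired dataset could look like before upload. The layout and naming are hypothetical; every training service expects its own format, so check your trainer's docs.

```python
# Hypothetical prep for a Kontext-style edit LoRA: ~20 hand-made
# (input, output) pairs, where each output is the input pushed
# through the same consistent style prompt.
from pathlib import Path

dataset = Path("painterly_lora_dataset")
for i in range(1, 21):
    stem = f"{i:03d}"
    pair = {
        "input": dataset / f"{stem}_input.png",    # photoreal frame
        "output": dataset / f"{stem}_output.png",  # hand-styled frame
    }
    for role, path in pair.items():
        if not path.exists():
            raise FileNotFoundError(f"missing {role} image: {path}")
```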
On the flip side, uh, some of the, like, closed models that are not open source, you know, uh, Nano Banana, Seedream maybe, um, definitely, you know, ChatGPT: there is no LoRA training. Right. So the only thing they do have is reference image insertion.
Yeah. Or just trying to text-prompt it if you need to change the style. Yeah. And that's where ChatGPT or Claude comes in handy, where you have it write a very detailed text prompt, and because the prompt is very detailed, if you keep running it consistently, you get pretty consistent outputs. Yeah.
Yeah.
The, the jury is still out on which image model is the best. And I, I don't think we'll ever get to, like, the most comprehensive image model ever. Uh, I think, I,
I think it's always gonna be: which image model is the best for what you need to do? Exactly. And
now you're kind of seeing those lanes being defined. Like, I mentioned Ideogram, right?
Like, Ideogram has got the logos and text and font stuff down. Like, that's their lane. Uh, I think Nano Banana, the reason it gets so much attention is because it's just really good at photorealism. But that doesn't mean it's the best model; it's just really good at photorealism. And then you have Seedream, which is really good for creative work and sort of animated looks. Mm-hmm. And stylized stuff. You know, down the road, six months from now, we won't be talking about which model's the best. It'll be more like, what model are you using for X and Y and Z?
Maybe in a future episode we could revisit, 'cause, uh, I keep seeing stuff from xAI, whatever their video generator is called.
Oh yeah, it keeps popping up. It's on
version 0.9. Grok.
Yeah,
Grok 0.9. I, I saw, I saw the Will Smith eating pasta done on that one. It looked pretty
good. Yeah. And I think, um, you know, it's probably getting popular again because, I think, with the back and forth, the boomerang of Sora: like, Sora 2 comes out, "do whatever you want" with IP, and then a week later, complete U-turn of, like, you can't do anything.
And now people are like, I wanted to do whatever I want. And then Grok is like,
you can do whatever you want here. Doors cut wide open.
We don't care.
Yeah. And you gotta remember the whole Sam Altman and Elon Musk beef, right? Like, they're gonna one-up each other at every step of the game as we go on.
Yeah. Yeah. If only there was an Elon cameo on Grok, so you could have a Sam Altman and Elon... Not on Grok, I'm sorry, on Sora. So you could have Sam and Elon duke it out with, uh, Logan Paul. Here's
an idea. You could do it with traditional VFX.
You could do that too.
Actually no, I take it back. You could do it with pretty much any open-weight model, where you just take reference images of both of them.
It's pretty easy.
Yeah. You could do it with, uh, Wan 2.2 Animate. I mean, it has no problem
recreating images of that fight. 'Cause today I just made somebody holding James Cameron, this. And, uh, as I was uploading James Cameron's image, I was like, are they gonna flag that this is a famous person? And it was fine.
It didn't do anything. Hey dude noises. So Joey and I had this idea: why don't we start answering our audience's questions? If you have a question about anything that we've covered in the past, certain AI models or what have you, or even stuff we haven't gotten to yet, submit a question at the following email:
denoised@vp-land.com. Again, it's gonna be in the text here if you're watching it on YouTube. Denoised@vp-land.com, and we'll get those answered.
It is vp-land, with the dash, because someone else is just sitting on vpland and wants to charge thousands of dollars for it. So, not until the show grows way, way bigger.
Yeah, we'd love to hear your questions. Uh, we could do, like, a mailroom episode in the future. Uh, questions about, yeah, just stuff that's happening, or workflow questions. Um, send 'em over. And, uh, links as usual for everything we talked about at denopodcast.com. Thanks for watching. We'll catch you in the next episode.