Ohneis - The Pattern
Tech, AI, design, and the hidden logic behind it all.
One-Click Commercials: How AI Video Changed Everything | with Charlotte
Technology moves fast. Design makes it matter. AI changes everything. This is Ohneis. Here is a number I want you to sit with for a second. A 60-second commercial, the kind you'd see on primetime television, used to cost somewhere between $10,000 and $50,000 to produce. Camera crew, lighting rigs, location fees, post-production, all of it. In 2026, that same commercial can be generated from a text prompt on your phone for under a dollar in about half an hour. That is not a marginal improvement. That is the entire industry flipping upside down. And here is the wildest part: the company that started the whole craze, OpenAI, the most talked-about name in tech, just abruptly shut down its flagship video tool and walked away from regular consumers entirely. So we have a technology that is genuinely reshaping how the world makes moving images, and the company that kicked it all off just left the room. Today I have Charlotte with me, and we are going to pull apart exactly how a computer draws a video out of thin air, what is really going on under the hood, and why this moment matters even if you have never touched an AI tool in your life. Charlotte, welcome. Glad you're here. Let's start with the basics. When someone generates an AI video and it looks jaw-droppingly real, what is the computer actually doing?
SPEAKER_01Okay, so I love this question because the honest answer sounds like science fiction, but it's totally grounded in math. So picture this. Imagine handing a sculptor a massive block of pure chaotic TV static. Just noise, random pixels everywhere. A diffusion model, which is the engine underneath all of these video tools, has spent months studying millions of real videos. It knows what a dog looks like, what fire looks like, what a coffee cup with steam rising off it looks like. And when you type your prompt, what it does is it slowly chips away at that block of static, step by step, removing the incorrect pixels until a clear image emerges. It's literally reversing chaos into order. But here's where it gets interesting for video specifically. A video isn't just one image, right? It's thousands of frames, and every single one of those frames has to be consistent. The coffee cup on frame one cannot randomly vanish by frame 50. And that consistency problem is what stumped everyone for years.
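The "chipping away at static" idea can be sketched in a few lines. This is a toy illustration only, not how any production video model is implemented: the function name `toy_denoise` is hypothetical, and the `target` list is a stand-in for what a trained network would predict. A real diffusion model never sees the answer; it predicts the noise to remove with a learned neural network.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Start from pure noise and remove a fraction of it at every step.

    ASSUMPTION: `target` plays the role of the trained model's knowledge.
    A real diffusion model predicts the noise with a neural network.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # the "block of static"
    errors = []
    for step in range(steps):
        # Stand-in for the learned noise prediction.
        predicted_noise = [xi - ti for xi, ti in zip(x, target)]
        # Remove only part of the predicted noise; the last step removes the rest.
        fraction = 1.0 / (steps - step)
        x = [xi - fraction * ni for xi, ni in zip(x, predicted_noise)]
        # Track how far the sample still is from the clean signal.
        errors.append(sum(abs(xi - ti) for xi, ti in zip(x, target)) / len(target))
    return x, errors


clean = [0.0, 0.5, 1.0, 0.5, 0.0]  # a tiny one-dimensional "image"
result, errors = toy_denoise(clean)
```

The `errors` list shrinks step by step, which is the "chaos into order" behavior described above, just on five numbers instead of millions of pixels.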
SPEAKER_00Right, and that consistency problem is kind of the crux of why video was considered the final frontier for AI. Text was cracked, images were cracked, but video kept breaking down because time is hard. So how did they actually solve it?
SPEAKER_01Mm-hmm, exactly. So the solution came from something called transformers, and I know that word gets thrown around a lot, but here's a way to think about it that actually clicks. A video is basically a flipbook, right? Thousands of tiny pictures played in sequence. The old approach was to look at each page of that flipbook one at a time, which meant the AI had no idea what was coming later, so it would just drift. A coffee cup would morph into something else. Lighting would shift randomly. It was a mess. What transformer models do, and this is what Sora, OpenAI's model, figured out, is that instead of reading the flipbook page by page, you slice the entire flipbook into small 3D cubes that span both space and time simultaneously. So the AI is looking at a little cube that contains, say, the top right corner of the frame across 30 consecutive frames at once. It can see the whole arc of time in that region. Tim Brooks, the lead researcher on Sora, described it almost exactly that way, like having a stack of all the frames and cutting little cubes from it. And that's how the lighting stays consistent, the objects stay solid, and the physics actually hold together. And by the way, if this concept of AI tools having hidden costs and constraints you never see coming sounds familiar, it actually connects to something we touched on in another episode, "The $50,000 app problem nobody told you about," because there is always a bill somewhere.
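The "cutting cubes from a stack of frames" description can be made concrete with a small sketch. This is an illustration under stated assumptions, not Sora's code: the function name `spacetime_patches` and the patch sizes are hypothetical, the video is a plain nested list indexed as `[frame][row][column]`, and the dimensions are assumed to divide evenly by the patch sizes.

```python
def spacetime_patches(video, pt, ph, pw):
    """Cut a video into non-overlapping cubes of pt frames x ph rows x pw columns."""
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    patches = []
    for t0 in range(0, frames - pt + 1, pt):
        for y0 in range(0, height - ph + 1, ph):
            for x0 in range(0, width - pw + 1, pw):
                # Each cube covers the same screen region across several frames.
                cube = [
                    [row[x0:x0 + pw] for row in video[t][y0:y0 + ph]]
                    for t in range(t0, t0 + pt)
                ]
                patches.append(cube)
    return patches


# A tiny 4-frame, 4x4 "video" where each pixel stores its own (t, y, x).
video = [[[(t, y, x) for x in range(4)] for y in range(4)] for t in range(4)]
patches = spacetime_patches(video, pt=2, ph=2, pw=2)
```

Every entry in `patches` holds one region of the picture over two consecutive frames, so anything that reads a patch sees how that region changes over time rather than a single frozen page of the flipbook.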
SPEAKER_00So the flipbook cube thing solves the consistency issue. But that still sounds like an enormous amount of computing power. Like if the AI is analyzing the entire video across space and time simultaneously, that has to be brutal on servers.
SPEAKER_01It is absolutely brutal, yeah. And this is where the second big breakthrough comes in: latent space compression. Okay, so think of it like freeze-dried coffee. Bear with me here. If you wanted to ship a hundred gallons of brewed coffee across the country, the weight and volume would cost a fortune. But if you remove the water, turn it into powder, ship this tiny little packet, and then just add water at the destination, you've moved the same coffee for almost nothing. That's what latent space does. Instead of doing all the heavy computation on raw video pixels, which is like shipping the whole hundred gallons, the AI compresses the video down into a tiny mathematical code. It does all its editing and generation work inside that tiny compressed state, and only at the very end does it decompress or add the water back, so you can actually watch it. The energy savings are dramatic. That's the whole reason these tools became cheap enough to put on a phone.
SPEAKER_00Wait, so when I'm watching a generated video, the AI never actually worked in the pixel space that I'm seeing. It worked in this compressed mathematical shadow of the video and then translated it back?
SPEAKER_01Exactly. The pixels you see are almost like a final rendering step. The intelligence, the actual creative work, happened in a much smaller, denser space. It's a bit mind-bending when you think about it.
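The freeze-dried coffee idea, compress first, do the work on the small version, then expand at the end, can be sketched as follows. This is a toy under loud assumptions: real systems use a learned autoencoder, while this sketch swaps in simple block averaging for the encoder and pixel repetition for the decoder. The names `encode` and `decode` are hypothetical, and the frame size is assumed to divide evenly by `factor`.

```python
def encode(frame, factor):
    """Compress by averaging each factor x factor block into one value."""
    h, w = len(frame), len(frame[0])
    return [
        [
            sum(frame[y + dy][x + dx] for dy in range(factor) for dx in range(factor))
            / factor ** 2
            for x in range(0, w, factor)
        ]
        for y in range(0, h, factor)
    ]


def decode(latent, factor):
    """Expand each latent value back into a factor x factor block of pixels."""
    out = []
    for row in latent:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


frame = [[float(x + y) for x in range(8)] for y in range(8)]  # 64 pixel values
latent = encode(frame, factor=4)                              # 4 latent values
dimmed = [[v * 0.5 for v in row] for row in latent]           # the "work" happens here
restored = decode(dimmed, factor=4)                           # back to 64 pixel values
```

The edit (halving the brightness) touched 4 numbers instead of 64. That is the saving the analogy points at, although a learned latent keeps far more detail than block averaging does.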
SPEAKER_00And then there's the audio side of this, which I think is criminally underrated in these conversations. Because for a long time, AI video was basically a silent film. The visuals could be stunning, but the sound was either missing or completely out of sync. That changed with Google's Veo 3. What did they actually crack there?
SPEAKER_01Okay, this is the one I find most impressive, honestly. So the fundamental problem was that audio and video were always generated separately. The AI would make the video, then someone would bolt sound on top of it afterward. Which is why lip syncing was always slightly off and ambient sounds never matched the visual environment. It always felt slightly uncanny. What Veo 3 figured out was how to compress sound and video into a single shared piece of data, one unified mathematical code, and generate them together in what they call a lockstep process. They're not two separate files being stitched together. They are born from the same mathematical womb at the same moment. I know that sounds dramatic, but it's the right way to describe it. The result is that when a character speaks, the lip movement and the audio are not synchronized after the fact. They were never apart to begin with. And the ambient sound matches the visual environment because they were generated from the same underlying code. That's what ended what people were calling the silent era of AI video.
SPEAKER_00And Netflix noticed. The Eternaut, their show, debuted what was reported as the first use of this kind of video generation in mass market television. Not for an indie YouTube project, for a major streaming platform with millions of viewers. That is a line being crossed that you don't cross back from.
SPEAKER_01Right, and it's worth being clear about what that means practically. We're not talking about replacing actors or rewriting entire productions with AI. What studios are doing right now is using it for visual effects work, backgrounds, atmospheric effects, the kind of stuff that used to require expensive CGI studios for weeks of rendering. But the fact that it's in a Netflix production means the quality bar has been validated at the highest commercial level. And that validation tends to accelerate adoption fast. I mean, once one studio does it and nobody notices, every studio is doing it within 18 months.
SPEAKER_00Which brings us to the plot twist in all of this. Because if the technology is this good, if Netflix is using it, if you can make a commercial for 50 cents, why did OpenAI just shut down Sora? The company that arguably started this entire wave, the company whose research on those 3D flipbook cubes we just talked about, they killed the consumer product. What happened?
SPEAKER_01So this is the part that feels almost poetic in a dark way. The word in tech is compute. It just means the raw cost of running these mathematical engines: electricity, servers, infrastructure. And video generation is orders of magnitude more expensive than text generation. When you type a question to ChatGPT, that's relatively cheap to process. When you ask a model to generate 30 seconds of photorealistic video, you are asking for thousands of times more computation. Sora was hemorrhaging money on consumer subscriptions. Regular people paying maybe $20 a month for a subscription were generating videos that cost far more than that to actually run. The unit economics were broken from day one. And OpenAI is already burning through cash at a scale that makes investors nervous. When you keep a product alive that is structurally guaranteed to lose money on every single user, at some point someone in a boardroom does the math and says, we can't.
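The "broken unit economics" argument is simple arithmetic, sketched here with placeholder numbers. Only the roughly $20 monthly subscription comes from the conversation; the per-clip compute cost and the number of clips are hypothetical values chosen purely for illustration, not reported figures for Sora or any other product.

```python
def monthly_margin(subscription_price, cost_per_clip, clips_per_month):
    """Revenue from one subscriber minus the compute that subscriber consumes."""
    return subscription_price - cost_per_clip * clips_per_month


def break_even_clips(subscription_price, cost_per_clip):
    """How many clips a subscriber can generate before the provider loses money."""
    return subscription_price / cost_per_clip


SUBSCRIPTION = 20.0   # "maybe $20 a month", from the conversation
COST_PER_CLIP = 1.0   # HYPOTHETICAL compute cost per generated clip
CLIPS = 60            # HYPOTHETICAL usage: two clips a day

margin = monthly_margin(SUBSCRIPTION, COST_PER_CLIP, CLIPS)
limit = break_even_clips(SUBSCRIPTION, COST_PER_CLIP)
```

With these placeholder inputs the provider loses $40 a month on this subscriber, and any user generating more than 20 clips is unprofitable. The real figures are not given in the conversation; the point is only that a flat fee plus usage-driven cost turns negative once usage passes the break-even line.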
SPEAKER_00But here's what I find interesting about that decision. They didn't just shut down video generation, they pivoted to enterprise, meaning they're still selling the same technology just to corporations who can afford to pay what it actually costs to run it. So the technology didn't lose, the business model for regular consumers lost.
SPEAKER_01Totally, and that's a really important distinction. The compute cost problem isn't going away overnight, but it is getting cheaper every year. Hardware improves, compression techniques get better, the latent space tricks we talked about keep reducing the energy load. The trajectory is clearly downward on cost. So OpenAI pulling back from consumers now doesn't mean consumers never get this. It means there's a gap right now and other players are filling it. Runway, Kling, Hailuo, these are tools that are right now offering video generation to regular users at price points that are actually sustainable. The gap OpenAI left? It was filled within weeks.
SPEAKER_00Because I want to put a number on this for people who run a shop, a restaurant, a freelance service, anyone who has ever paid someone to make a promo video. With a professional production company, for a 60-second brand video, you're looking at anywhere from $5,000 to $50,000 depending on who you hire. A decent freelancer might do it for a few thousand. With current AI video tools, you can generate a cinematic, fully voiced, perfectly edited 60-second promotional video, not a rough draft but something you can actually post, for under a dollar in compute costs, in about 25 minutes. That is the economic shift happening right now.
SPEAKER_01And I think what people miss when they hear that number is that it's not just about cost, it's about iteration. When you're paying thousands of dollars for a video, you get one shot. You brief the agency, you wait two weeks, they come back with a cut, you have one round of revisions because each revision costs money and time. When it costs a dollar, you can generate 20 versions. You can test which visual style resonates with your audience. You can make one version for Instagram, a different pacing for YouTube, a different tone for a professional LinkedIn post, all from the same underlying prompt, tweaked slightly each time. The creative flexibility is what changes the game, not just the price tag. I mean, a small bakery in a mid-sized city can now have the same quality visual content as a national chain. That is a real leveling of the playing field.
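The iteration argument can be put into numbers using only figures quoted in this episode: a traditional 60-second commercial at $10,000 to $50,000, an AI-generated one at "under a dollar" (taken here as an upper bound of $1), and 20 generated versions. The variable and function names are hypothetical; this is a back-of-the-envelope sketch, not a pricing model.

```python
TRADITIONAL_LOW = 10_000    # low end quoted for a primetime-style commercial
TRADITIONAL_HIGH = 50_000   # high end quoted for the same commercial
AI_COST_PER_VIDEO = 1.00    # upper bound for "under a dollar"
VERSIONS = 20               # "you can generate 20 versions"


def batch_cost(cost_per_video, versions):
    """Total cost of generating several variants of the same video."""
    return cost_per_video * versions


ai_batch = batch_cost(AI_COST_PER_VIDEO, VERSIONS)
low_ratio = TRADITIONAL_LOW / ai_batch    # cheapest traditional shoot vs 20 AI versions
high_ratio = TRADITIONAL_HIGH / ai_batch  # most expensive shoot vs 20 AI versions
```

Even after paying for all 20 variants, the AI batch costs $20, which is 500 to 2,500 times less than a single traditionally produced cut. That margin is what makes per-platform versions and style testing affordable.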
SPEAKER_00Although, and I want to be fair here, there is a shadow side to all of this that we can't just gloss over. These models are trained on billions of videos and images scraped from the internet. And the internet is not a neutral place. It has biases baked into it, about how people look, about what professional means, about whose stories get told. When an AI learns from that data, it learns those biases too. So when you ask it to generate a successful CEO or a family enjoying dinner, the output reflects whatever patterns dominated the training data. That's not a small footnote. That's something anyone using these tools for marketing needs to actively think about and correct for.
SPEAKER_01That's a really fair point, and I'm glad you brought it up, because it gets skipped over in a lot of these conversations. The flip side is that because it's so easy to iterate, you can also catch those outputs and correct them. You can specifically prompt for diversity, you can test your outputs critically before you publish. It requires intention, though. The tool won't do that for you automatically. And I think that's the broader truth about all of this technology. It amplifies intention. If you use it thoughtfully, you get something powerful and genuinely useful. If you use it lazily, you get something that reflects whatever the path of least resistance was, and sometimes that path is not great. That's not unique to AI, to be fair; that's true of most creative tools. If I had to leave people with one thing from this conversation, it would be this. Don't wait for the technology to feel comfortable before you start experimenting with it. The gap between people who are building fluency with these tools right now and people who are waiting for things to settle down is widening every month. And the beautiful thing about where we are in 2026 is that the barrier is low enough that experimenting doesn't require a budget or technical expertise. It requires curiosity and 20 minutes. The small business owner, the solo creator, the freelancer who starts playing with this stuff today, they're not going to be the person asking, how did they do that, when they see someone else's incredible content. They're going to be the one making it.
SPEAKER_00Charlotte, that is exactly the note to end on. You brought the right mix of technical clarity and real-world grounding to this. And that is not easy to do with a topic this dense. I want to say thank you, genuinely. For everyone listening, if this episode made the black box a little less black, do us a favor: subscribe wherever you get your podcasts and leave a review. It takes 30 seconds and it means more than you know. And share this episode with one person who has ever complained about how expensive it is to make a video. The age of the one-click commercial is here. The only question is whether you're going to be the person pressing the button or the person watching someone else do it. We'll see you next time.