Heliox: Where Evidence Meets Empathy

⚙️ P-1 AI Develops Engineering AGI for Physical Systems

by SC Zoomers Season 4 Episode 36

Send us a text

Explore further with this episode's Substack comic and other included resources.

We're standing at the edge of something unprecedented in human history. Not another technological breakthrough that makes our phones faster or our videos sharper, but a fundamental shift in how we solve the complex problems that shape our physical world.

While everyone's been obsessing over ChatGPT writing emails and generating cat poetry, a quieter revolution has been brewing in the engineering world. It's called Engineering Artificial General Intelligence, or E-AGI, and it promises to do something that should terrify and exhilarate us in equal measure: think like the best human engineers, but without the coffee breaks.

On the Evaluation of Engineering Artificial General Intelligence

P-1 AI, a startup founded by former Google DeepMind researcher Aleksa Gordić, along with former Airbus CTO Paul Eremenko, raised $23 million in seed funding.




This is Heliox: Where Evidence Meets Empathy

Independent, moderated, timely, deep, gentle, clinical, global, and community conversations about things that matter.  Breathe Easy, we go deep and lightly surface the big ideas.

Thanks for listening today!

Four recurring narratives underlie every episode: boundary dissolution, adaptive complexity, embodied knowledge, and quantum-like uncertainty. These aren’t just philosophical musings but frameworks for understanding our modern world. 

We hope you continue exploring our other podcasts, responding to the content, and checking out our related articles on the Heliox Podcast on Substack

Support the show

About SCZoomers:

https://www.facebook.com/groups/1632045180447285
https://x.com/SCZoomers
https://mstdn.ca/@SCZoomers
https://bsky.app/profile/safety.bsky.app


Spoken word, short and sweet, with rhythm and a catchy beat.
http://tinyurl.com/stonefolksongs

Curated, independent, moderated, timely, deep, gentle, evidence-based, clinical & community information regarding COVID-19. Running since 2017, the feed has focused on COVID-19 since February 2020, with multiple stories per day, hence a large searchable base of stories to date: more than 4,000 stories on COVID-19 alone, and hundreds of stories on climate change.

Zoomers of the Sunshine Coast is a news organization with the advantages of deeply rooted connections within our local community, combined with a provincial, national and global following and exposure. In written form, audio, and video, we provide evidence-based and referenced stories interspersed with curated commentary, satire and humour. We reference where our stories come from and who wrote, published, and even inspired them. Using a social media platform means we have a much higher degree of interaction with our readers than conventional media, and it provides a significant, positive amplification effect. We expect the same courtesy of other media referencing our stories.


Welcome to the deep dive. This is where we take a big pile of information and, well, pull out the key stuff for you. It's really designed for you, the listener, if you want to get smart on something fast, maybe find a few aha moments without getting totally swamped. Mm hmm. So today our mission is to get our heads around this emerging field called engineering artificial general intelligence, E-AGI for short. You sent us a really interesting research paper about how you'd even evaluate E-AGI, plus a news piece on a company, P-1 AI, that's actually trying to build it. Exactly. Think of it as your quick guide to a field that could, well, really change things.

And that paper you mentioned, it really highlights this core problem: designing physical things, planes, cars, power plants, you name it, is just incredibly complex. Right. It's not like software, is it? Not at all. With physical systems, everything's tangled together. You've got these tight connections between different physics. You often don't have all the data you need. Constraints can change mid-project. The goals might even evolve. It's messy. Yeah, I can see that.

And right now, engineers lean pretty heavily on software tools. Right. Things like CAE, computer-aided engineering, and MBSE, which is model-based systems engineering. That's basically using digital models for everything. Pretty much, yeah. Using models as the central blueprint. But, you know, even these advanced tools, they have limitations. They can have steep learning curves. They often still rely on heuristics, kind of rules of thumb. And you still end up doing a lot of iteration, trial and error. It takes time.

And the paper really zeroes in on that interconnectedness, doesn't it? Like a small temperature change might impact structural strength, that kind of thing? Precisely. That coupling is key. Add in uncertainties about materials, how things will actually operate in the real world. It makes for a really challenging design space. The current tools are powerful, sure, but they're often more like super calculators than, say, creative partners.

So that's where this idea of E-AGI comes in. The paper suggests moving beyond tools that just follow procedures to AI that can actually reason with engineers. We're talking AI that could maybe suggest novel approaches, simulate how a system behaves, weigh different trade-offs, and actually produce structured designs. It's like moving from calculation to cognition. Exactly. Cognitive automation. The paper defines E-AGI as a specific type of AGI, artificial general intelligence, but specialized: focused on expert-level reasoning, analysis, and that creative synthesis part, specifically for engineering physical systems. Interestingly, they explicitly carve out software engineering, at least for now. Okay, just physical systems. That's the scope, yes. The aim is basically to automate the complex thought processes of a really experienced human engineer.

So what does an E-AGI agent actually need to be able to do? The paper lists a few key characteristics. First up is background knowledge and recall. Right. All the basics: engineering laws, formulas, standard practices, design rules, material properties. It needs all that foundational knowledge accessible, like a massive internal library. Yeah, that makes sense. And that ties into their evaluation idea, right? Using Bloom's taxonomy. It does, yes. That's the first level. Level one.
Remember. Bloom's taxonomy, for anyone unfamiliar, is just a way to classify thinking skills, from basic recall up to complex creation. So for E-AGI, remember means accurately recalling facts, like knowing Ohm's law instantly or the properties of a specific alloy. Got it.

Okay, second characteristic, process and tool familiarity. So the E-AGI needs to know how to use those CAE and MBSE tools. Yes, and more than just use them. It needs to understand the workflows, how to connect to simulation pipelines, work with digital design models, and even know the limitations or requirements of different software packages. It has to navigate that whole digital engineering environment. It's not just knowing what the tools are, but how engineers actually use them together in practice. Exactly. The sequence, the inputs, interpreting the outputs in context, that's crucial.

Third is contextual understanding. Being able to look at a design, maybe, and recognize standard components or common system layouts, like spotting a heat exchanger or a specific type of aircraft wing structure. Precisely. Having that sort of built-in recognition of common engineering building blocks and how they usually fit together makes reasoning much more efficient, lets it apply old knowledge to new problems.

Okay, fourth, and this one sounds like a big deal, creative and adaptive reasoning. This is where it goes beyond just applying formulas. Absolutely. This is about exploring genuinely new solutions, maybe taking an idea from one domain and applying it elsewhere, or optimizing a design across multiple competing objectives like cost versus performance. So we're moving up those Bloom levels now. Analyze, create. Definitely. We're getting into analysis, creation, even a bit of reflection. It's about innovation, finding non-obvious solutions. Imagine an E-AGI suggesting a completely new way to bond two materials that maybe a human hadn't considered.

Wow, okay. And the last one is collaboration and communication. So it needs to play well with humans. It has to be a useful team member: understand what the human engineers are trying to achieve, explain its own reasoning clearly, participate in discussions naturally, no dense jargon. Right, because it's meant to be a partner, not just a black box spitting out answers.

That's the core idea: human-AI teamwork.

Now, the paper draws a really important line here: E-AGI isn't just fancy surrogate modeling, like making quick AI approximations of complex simulations. No, it clarifies that. E-AGI should be able to call or use surrogate models, but its core function is the cognitive automation, the thinking part. And it's also different from today's large language models, right? They say LLMs, while great with text, don't really have that structured grasp of physics or the cause-and-effect reasoning across different domains that E-AGI needs. That's a critical distinction. Yeah. LLMs can generate plausible-sounding text about engineering, but they don't inherently understand the underlying physical principles or the causal chains in a complex system in the way E-AGI would need to. You can't just wrap an LLM around an engineering problem and expect it to work robustly.

Which brings us to the huge challenge. How do you evaluate this? How do you test if an AI is actually capable of all this complex engineering thought? Exactly. The paper argues existing AI benchmarks, whether for general AI or very narrow tasks, just aren't suitable. Engineering problems are too context-dependent. There's often no single right answer. So finding a way to measure something this complex and multifaceted, that's the core problem they're tackling. It really is. How do you build a meaningful assessment framework?

And their answer is to adapt Bloom's taxonomy specifically for engineering. They've developed this six-level hierarchy of cognitive abilities for E-AGI. And the clever part is how they link these abstract thinking levels to really concrete engineering tasks. It makes it practical, relevant to what engineers do day to day.

Okay, let's quickly walk through these six levels. Level one, remember. We touched on that: recalling facts and formulas, like what's the thrust equation for a propeller? Simple recall. Level two, understand. This is about grasping the meaning: identifying components in a drone design, knowing what the parameters mean physically. Right. Understanding the semantics, the relationships between parts.
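(A quick editorial aside before the conversation moves on to Level 3: the Level 1 recall example above does have a standard textbook answer. In one common convention, our illustration rather than anything specified in the paper or the episode, propeller thrust is

T = C_T · ρ · n² · D⁴

where C_T is the thrust coefficient, ρ the air density, n the rotational speed in revolutions per second, and D the propeller diameter. Holding RPM and thrust coefficient roughly fixed, thrust scales with the fourth power of diameter, which is exactly the relationship the Level 3 "apply" example below leans on.)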

Level 3: Apply. Using that knowledge: evaluating a design, predicting performance. Like calculating the new thrust if you change a propeller's diameter but keep the RPM the same. Maybe using a simulation tool here. Yes, putting the theory into practice. Applying principles to solve concrete problems, potentially involving standard tools.

Level four, analyze. Getting deeper. Filling in gaps in a design or diagnosing errors, like completing a partial design to meet specs or figuring out why a drone's thrust is too low. It requires some detective work. Breaking down the problem, identifying causal links, understanding why something isn't working as expected. More complex cognition now.

Level five, create. This sounds like the really exciting one. Design synthesis. Generating new designs for requirements, adapting existing ones, exploring trade-offs. The example they give is designing a whole eVTOL aircraft with specific payload, hover time, and motor limits. This is the peak of engineering innovation, really. Exploring the design space, balancing trade-offs, possibly finding truly novel solutions.

And finally, level six, reflect. Reflect. This is meta-reasoning: critiquing its own process, understanding broader implications, spotting limitations in the tools or data. Like questioning why a drone designed with manufacturer data underperforms at high altitude, and recognizing the assumption about air density might be wrong. Exactly. This is expert judgment: stepping back, assessing the whole process, understanding assumptions, identifying potential pitfalls or areas for improvement. Really sophisticated stuff.

Now, to make this framework even more useful, they add three dimensions of problem complexity. First, directionality. Are you analyzing something existing (forward) or synthesizing something new (backward)? Right. Analysis versus synthesis. Second, design behavior. Static properties, like material strength, versus dynamic behavior over time, like vibrations. Static versus dynamic. And third, design scope. Is it a closed-world problem with limited options or an open-world one with many possibilities? Closed versus open. Okay. These dimensions really help classify how difficult the task is. A level one remember task is usually forward, static, closed-world. Simple. But a level six reflect task? That could be backward or forward, dynamic, open-world. Much harder. It shows why you can't just test one type of problem and call it done. Precisely. The complexity varies hugely.

And they add another layer for evaluation: metadata tags to make sure testing covers everything. Yes, the tags, very important. Tags for system type (eVTOL, HVAC, and so on), design scope (component, subsystem, whole system), domain (thermal, electrical, structural, fluids), modeling requirements (steady-state, transient), and applicable standards, like industry codes. These tags are crucial for systematic evaluation. They let you ensure you're testing across different disciplines, complexities, system types. It's like creating a detailed curriculum map for the E-AGI's assessment. You can ensure breadth and depth. Right. So you could specifically test, say, apply-level skills in the thermal domain for HVAC subsystems using transient models. Sure. Very targeted. And you can use these tags to structure the benchmarking. Start with simpler tagged problems and, as the E-AGI improves, increase the complexity across these dimensions. It allows for progressive evaluation. They also mention using these tags to generate new test questions using templates. Like a generic "Given X, what happens if Y?" template could be filled in using different tags to create tons of varied questions. That's a smart way to create diverse evaluation problems efficiently. It ensures you're testing the underlying reasoning ability, not just memorization of specific examples.
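To make that template idea a bit more concrete, here is a minimal, purely illustrative sketch of tag-driven question generation. The tag lists, template wording, and function names are our own assumptions, not the paper's actual tooling; the reference-answer helper simply applies the diameter-to-the-fourth thrust scaling noted in the aside above.

# Illustrative sketch only: a toy version of tag-driven question templating.
# The tags, template, and function names below are invented for illustration.
import itertools

SYSTEM_TYPES = ["eVTOL", "HVAC", "data center cooling"]
DOMAINS = ["thermal", "electrical", "structural", "fluids"]

TEMPLATE = ("[apply | {system} | {domain}] Given a {system} {domain} subsystem, "
            "what happens to steady-state performance if {change}?")

def generate_apply_questions(change="a key component is resized by 20%"):
    """Fill a generic 'Given X, what happens if Y?' template from tag combinations."""
    return [TEMPLATE.format(system=s, domain=d, change=change)
            for s, d in itertools.product(SYSTEM_TYPES, DOMAINS)]

def thrust_ratio_for_diameter_change(d_new_over_d_old):
    """Reference answer for the propeller example: at fixed RPM and (roughly)
    fixed thrust coefficient, thrust scales with diameter to the fourth power."""
    return d_new_over_d_old ** 4

if __name__ == "__main__":
    for question in generate_apply_questions()[:3]:
        print(question)
    # A 10% larger propeller at the same RPM gives about 1.46 times the thrust.
    print(round(thrust_ratio_for_diameter_change(1.1), 2))

The point isn't the specific numbers; it's that tags let you stamp out many structurally similar but not identical problems, which is what makes memorization-resistant benchmarking feasible.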
Like a generic, "Given X, what happens if Y" template could be filled in using different tags to create tons of varied questions. That's a smart way to create diverse evaluation problems efficiently. It ensures you're testing the underlying reasoning ability, not just memorization of specific examples. And you can get some good examples. Level 3, apply. Given a 400 kVV motor and 22.2 V supply, calculate RPM. Simple application. Trade forward calculation. Level 5, create. Design a propulsion system for a 10-kilogram drone minimizing current draw during hover. Much more open-ended. Requires synthesis and optimization. And Level 6, reflect. A complex scenario about an HVAC design flaw, asking the EGI to identify the underlying assumptions that might be wrong. Yeah, that Level 6 example really shows the depth of critical thinking they expect. It's about questioning the inputs and the model itself. So how do they actually score this? For the lower levels, remember, understand, apply, where answers are often clearly right or wrong, they suggest automated scoring. Using symbolic solvers, lookup tables, simulations. Makes sense. For deterministic problems, automation is efficient. But for Analyze and Create, levels 4 and 5, where answers might vary but still be valid, they propose simulation augmented heuristics. Using simulations to check if the EAGI's solution meets constraints or performs well compared to a baseline, even if it's not the only possible solution. Right. So the simulation helps judge the quality and feasibility of the proposed solution, rather than just checking for one specific correct answer. Validating performance targets, checking if a fix actually works. And for level six, reflect the really nuanced stuff. They say you'd likely need expert humans involved or maybe even other advanced AIs acting as judges. Which seems necessary. Evaluating that kind of meta reasoning, the why behind the decisions, the soundness of the thought process, that's hard to capture with truly objective metrics alone. It requires judgment. Okay, so this whole framework gives us a way to think about testing EGI. Okay. Which brings us neatly to the news about P1AI. They're actually trying to build this stuff, right? An EGI called Archie. It seems so, yes. Specifically aimed at physical systems engineering. And look at the co-founders' backgrounds. Airbus, United Technologies, DARPA, DeepMind, Microsoft. That's some serious pedigree in both engineering and AI. And they raised $23 million in seed funding, which is not insignificant. Suggests investors are taking this seriously. Definitely signals strong belief in the potential. So what is Archie supposed to do initially? The article mentions multi-physics and spatial reasoning, understanding how different physics interact and how things fit in 3D space, and focusing on tacks like identifying key design drivers, developing concepts, doing those early rough tradeoff studies, and maybe even helping select the right simulation tools. Again, that cognitive automation idea, not replacing the tools themselves. That aligns almost perfectly with the EAGI concept from the paper. Archie sounds like it's designed to be that collaborative reasoning partner, helping with a higher level thinking and strategy. But one of the big roadblocks mentioned is data, right? There aren't huge public data sets of detailed engineering designs like there are for text or images. It's often proprietary, sensitive stuff. That's a massive challenge, yes. 
Okay, so this whole framework gives us a way to think about testing E-AGI. Which brings us neatly to the news about P-1 AI. They're actually trying to build this stuff, right? An E-AGI called Archie. It seems so, yes. Specifically aimed at physical systems engineering. And look at the co-founders' backgrounds: Airbus, United Technologies, DARPA, DeepMind, Microsoft. That's some serious pedigree in both engineering and AI. And they raised $23 million in seed funding, which is not insignificant. Suggests investors are taking this seriously. Definitely signals strong belief in the potential.

So what is Archie supposed to do initially? The article mentions multi-physics and spatial reasoning, understanding how different physics interact and how things fit in 3D space, and focusing on tasks like identifying key design drivers, developing concepts, doing those early rough trade-off studies, and maybe even helping select the right simulation tools. Again, that cognitive automation idea, not replacing the tools themselves. That aligns almost perfectly with the E-AGI concept from the paper. Archie sounds like it's designed to be that collaborative reasoning partner, helping with the higher-level thinking and strategy.

But one of the big roadblocks mentioned is data, right? There aren't huge public data sets of detailed engineering designs like there are for text or images. It's often proprietary, sensitive stuff. That's a massive challenge, yes. Training data for this kind of AI is inherently scarce and specialized compared to, say, training an LLM on the Internet.

So P-1 AI's solution is interesting. They plan to create large synthetic data sets: generated data, but grounded in physics principles, and even considering supply chain information. That's a clever workaround. If real data is scarce, generate high-quality artificial data based on fundamental laws. It allows them to explore a vast design space computationally and train the AI on the underlying rules, not just specific past examples.

Their first target application is data center cooling systems: complex, important, lots of physics involved. But they plan to expand later to industrial systems, automotive, aerospace, defense. Data center cooling is a smart starting point. It's a challenging multi-physics problem with real-world impact and potential for optimization. Good place to prove the tech.

One of the co-founders, Paul Eremenko, has this vision. He talks about every engineering team having an Archie, like another team member, initially doing the repetitive stuff, but eventually helping us build things we don't even know how to build today. That's ambitious. It's a very compelling vision. It really speaks to the potential for E-AGI to augment human engineers, free them up for more creative tasks, and maybe push the boundaries of what's possible. True innovation. And the other co-founder, Aleksa Gordić, makes the point that this isn't just about slapping an LLM interface onto existing tools. He says it needs fundamental breakthroughs in data and the AI models themselves. Which reinforces that paper's point. Building true E-AGI isn't just an incremental step. It likely requires new architectures, new ways for AI to represent and reason about the physical world. It's a hard problem.

So pulling it all together, it feels like E-AGI is this potentially transformative force for how we design physical things. This paper gives us a really solid framework for thinking about how to measure progress. A much-needed evaluation structure, absolutely. And P-1 AI looks like a serious attempt to make it real, tackling that tricky data problem head-on. It feels like a step towards, maybe even a shortcut to, much more advanced engineering capabilities down the line. I agree. The framework provides the roadmap for assessment, and P-1 AI represents a tangible effort to actually build the vehicle, so to speak. It suggests the field is moving from purely theoretical to practical development.

Which leaves us with a big question for you, the listener, to think about. As E-AGI like Archie gets better, how does the job of the human engineer change? Do we shift away from doing the detailed design work ourselves and more towards defining the problems, setting the goals, overseeing the AI, and managing the overall system integration? What does the future engineer look like in a world with E-AGI partners? That's the key question, isn't it? It's about the evolving human-AI partnership. As AI handles more of the cognitive tasks, human skills in problem definition, critical interpretation of AI results, ethical oversight, and complex-systems thinking probably become even more valuable. It's a fascinating future to contemplate.
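One last editorial aside before the links: to make the "synthetic but physics-grounded" data idea a little more concrete, here is a toy sketch in the spirit of the data center cooling example. It is emphatically not P-1 AI's actual method; the parameter ranges, variable names, and the simple heat balance Q = m_dot * c_p * delta_T are our own illustrative assumptions.

# Toy illustration of physics-grounded synthetic design data: sample design
# parameters at random, then derive the dependent quantity from a real physical
# relation (a basic heat balance) so every record is self-consistent.
import random

CP_WATER = 4186.0  # specific heat of water, J/(kg*K)

def synthetic_cooling_record(rng: random.Random) -> dict:
    """One synthetic design point for a water-cooled rack loop."""
    heat_load_kw = rng.uniform(5.0, 40.0)   # IT heat load per rack, kW
    delta_t = rng.uniform(5.0, 15.0)        # allowed coolant temperature rise, K
    # Heat balance: required mass flow = Q / (c_p * delta_T)
    mass_flow = (heat_load_kw * 1000.0) / (CP_WATER * delta_t)  # kg/s
    return {"heat_load_kw": round(heat_load_kw, 1),
            "delta_t_k": round(delta_t, 1),
            "coolant_flow_kg_s": round(mass_flow, 3)}

def make_dataset(n: int, seed: int = 0) -> list:
    rng = random.Random(seed)
    return [synthetic_cooling_record(rng) for _ in range(n)]

if __name__ == "__main__":
    for row in make_dataset(3):
        print(row)

The design choice worth noticing is that the dependent quantity is derived from the physics rather than sampled independently, so every synthetic record is at least self-consistent; that is roughly what "grounded in physics principles" is getting at.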

Podcasts we love

Check out these other fine podcasts recommended by us, not an algorithm.