
What's Up with Tech?
Tech Transformation with Evan Kirstel: A podcast exploring the latest trends and innovations in the tech industry, and how businesses can leverage them for growth, diving into the world of B2B, discussing strategies, trends, and sharing insights from industry leaders!
With over three decades in telecom and IT, I've mastered the art of transforming social media into a dynamic platform for audience engagement, community building, and establishing thought leadership. My approach isn't about personal brand promotion but about delivering educational and informative content to cultivate a sustainable, long-term business presence. I am the leading content creator in areas like Enterprise AI, UCaaS, CPaaS, CCaaS, Cloud, Telecom, 5G and more!
Building Enterprise-Ready AI Agents While Protecting Your Code
Interested in being a guest? Email us at admin@evankirstel.com
Have you ever wondered how AI coding assistants could work in secure enterprise environments without sending your proprietary code to the cloud? That's exactly what Tabnine is tackling as an enterprise-first AI assistant that predates even GitHub Copilot.
The secret to effective AI agents in enterprise settings isn't just raw model power—it's context. Tabnine creates a comprehensive "map" of your organization by connecting code repositories, documentation, configuration files, and project management tools. This enables their AI agents to understand the environment they're operating in, dramatically improving accuracy while reducing costs. As their representative explains, "Enterprise without context is meaningless."
What makes Tabnine particularly valuable for industries like finance, automotive, and defense is its deployment flexibility. You can run it as a SaaS solution, in your virtual private cloud, or completely air-gapped on your own hardware—partnering with NVIDIA to ensure optimal performance across various GPU configurations. This means sensitive code never leaves your secure environment.
The platform also addresses intellectual property concerns with built-in attribution and provenance tools that flag potentially non-permissive code generated by models. Combined with comprehensive audit logs, this gives organizations control over what AI agents can access and modify within their systems.
Perhaps most importantly, Tabnine is helping to solve the delegation challenge—how do you know if AI-generated code actually does what you want? Their approach emphasizes breaking tasks into well-defined modules that humans can easily verify, providing rigorous specifications, and ensuring generated code follows organizational guidelines rather than some "platonic notion" of good code. While we've reached what they call a "code generation singularity," the "software engineering singularity" is still approaching.
More at https://linktr.ee/EvanKirstel
Speaker 1:Hey everybody, fascinating discussion today, as we talk about building smarter agents without handing your code over to the internet. Let's unpack what this means with Eran. Eran, how are you doing?

Speaker 2:Well, how are you? Thank you for having me.

Speaker 1:Thanks for being here. Really timely and interesting discussion. Before we dive in, maybe talk about Tabnine and the journey and the mission that you're on.
Speaker 2:Yeah, Tabnine is an enterprise-first AI assistant. It means that we serve enterprise customers with the latest, greatest AI agents, but agents that are aware of the enterprise context in which they operate. Maybe one of the biggest challenges for agents in the enterprise is situational awareness: how do you know what you're even working on, right? Typically it's not a greenfield situation, where you are generating an application from scratch. It's more like: hey, you asked for a feature, and I need to read 30,000 lines of code before I can even change the one line that it ends up requiring. So it's a very different agent situation than the greenfield one.
Speaker 1:Fantastic. And what spurred the creation of the company? What were the big challenges that you saw over the past number of years with keeping code private, secure and on point?
Speaker 2:Yeah, so Tabnine originated a couple of years back. Actually, we are the OGs of this area; we predate Copilot. Initially the technology was not very mature. The technology was there, but the market has matured with us, and I think over time people have realized the potential. I think this is the fastest-growing application of gen AI. And maybe the two biggest challenges now with agents are: one, being aware of the context in which you operate, and two, how can we trust the code that is being generated? So much code is being generated automatically by AI that we've kind of reached a code generation singularity already, but it doesn't mean that we've reached a software engineering singularity.
Speaker 1:That's fantastic. And how do you keep agents on task and stop them from wandering? I mean, you've got your own tools. Can you share exactly the magic that's working sort of in your platform?
Speaker 2:Yeah, absolutely. So agents really rely on, if you want to think about it abstractly, two dimensions. One is the horsepower of the underlying LLM, and that keeps improving all the time. We're seeing rapid improvement of models, both API-based models like Claude, but also open-weight models; notably, the Qwen family is doing pretty well these days, and GPT-OSS is another open-weight model that is doing relatively well for agents.
Speaker 2:So one dimension, as I said, is the raw horsepower of the underlying LLM. But the second dimension, which is often overlooked, is the context that is available to the agent. Agents need some map of the organization in order to work effectively in non-greenfield, call it brownfield if you will, situations, and that is what Tabnine is really focusing on: connecting to all sources of information in the enterprise. Think code repositories, documentation repositories, Confluence, Jira, all those things, connecting them as first-class citizens, creating some map of the organization that is then used as context for the agents in whatever tasks they are given. If an agent is not given the Tabnine context engine, it has to constantly rediscover things about the org, which hurts it in terms of accuracy and in terms of token consumption, which translates, at the end, to much higher costs than what you get when working with Tabnine.
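To make the "map of the organization" idea concrete, here is a minimal Python sketch. All names and the data model here are hypothetical illustrations, not Tabnine's actual API: the point is simply that context from several enterprise sources gets indexed once, then served to an agent as a ready-made block instead of being rediscovered on every run.

```python
from dataclasses import dataclass, field

@dataclass
class OrgMap:
    """A toy 'map of the organization': snippets from several enterprise
    sources, indexed by topic, so an agent can pull relevant context
    without re-exploring the codebase."""
    entries: dict = field(default_factory=dict)  # topic -> [(source, text)]

    def ingest(self, source: str, topic: str, text: str) -> None:
        self.entries.setdefault(topic, []).append((source, text))

    def context_for(self, topic: str) -> str:
        """Assemble a ready-made context block for a task, instead of
        letting the agent burn tokens probing files one by one."""
        return "\n".join(f"[{src}] {txt}"
                         for src, txt in self.entries.get(topic, []))

# Ingest from the kinds of sources mentioned above: code, docs, tickets.
org = OrgMap()
org.ingest("repo:billing", "discounts", "Discount logic lives in pricing/rules.py")
org.ingest("confluence", "discounts", "All discounts must be approved by finance")
org.ingest("jira:BILL-42", "discounts", "Seasonal discounts planned for Q4")

print(org.context_for("discounts"))
```

A real context engine would of course index far richer structure (call graphs, cross-repo links, history); this only illustrates the "ingest once, serve as first-class context" shape of the idea.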
Speaker 1:Interesting. And how do you test that the reasoning is genuinely better with your platform? What sort of uplift can you share? How do you measure that?
Speaker 2:Yeah, obviously we have our own internal benchmarks for large codebases, and we have been working, for several years now, with very large enterprises spanning tens of millions of lines of code each.

Speaker 2:We ingest each of those into Tabnine, which understands the lay of the land, creates its own map of the world, and then makes it available to the agents operating in the org. Again, a lot of this technology was already built in the earlier generation of chat, prior to agents, when you had the functionality of "chat with my codebase." Let me ask the question: how do I implement the Christmas discount? Well, implementing a Christmas discount in our codebase takes touching 52 files across multiple services, right? That's the kind of stuff you'll get from Tabnine. Initially it was built to help and guide people working on the codebase, but it has been evolving very rapidly to serve as the foundation for any agent operating in the org, and we're seeing agents being significantly more successful when they're handed this kind of contextual information. Again, progress is being made on both fronts: models are improving and the context engine is improving, and the compound effect is that the agents' ability to operate within an enterprise is improving all the time.
Speaker 1:Yeah, that's very exciting. And you're integrating with NVIDIA's hardware platforms. Which products in particular? And maybe talk about that integration and how it's going?
Speaker 2:Yeah, so we are a very close partner with NVIDIA. Tabnine has full deployment flexibility, so part of the magic of Tabnine is that you can deploy it as SaaS, you can deploy it in your VPC, in your virtual private cloud account behind your firewall, and you can also deploy it on your own hardware, completely air-gapped. You can run Tabnine on bare metal, in which case we partner with NVIDIA and with others to run effectively on the platform. We run inference using NVIDIA NIM, their inference microservices platform, and we also have a partnership with their model team on deploying Nemotron models, which are improving all the time. Maybe the bottom line is that by partnering with various parts of NVIDIA, we're able to provide very effective agents, and very effective AI assistance in general, in air-gapped situations, in any situation, running on a very wide range of different GPUs, H100s and beyond.
Speaker 1:Interesting. So agentic workflows can often sprawl and get out of control, kind of unconstrained, and we've seen pretty high failure rates in early testing and deployment. Any lessons learned from failure modes and other things that you could share?
Speaker 2:Yeah, I think it goes back to the fundamental question of partitioning the work between the human and the agent and finding the right granularity of delegation. A lot of the frustration, I think, comes from over-optimistic delegation without understanding what the agent is and is not capable of doing. What we're seeing smart users do is partition the high-level task into smaller pieces which are, importantly, easy for the human to verify: did the right thing happen or not? I think people underestimate that problem alone: say I delegate something to the agent, the agent goes off and comes back with 30,000 lines of code. How do I know that I got what I wanted?
Speaker 2:That's actually a fundamental problem of delegation, regardless of whether you're delegating to a human or to an AI agent. You are delegating some task, you're getting back massive amounts of code; how do you know that the code does what you intended? So we're seeing, increasingly, a move to more rigorous specifications that are given to the agent, and also to more modular partitioning of the work into smaller pieces, such that each task is (a) well-defined and (b) easy enough to check: OK, I think I got what I wanted, and if not, I can iterate with the agent. So I think that's the art of working with agents: really breaking the task into these manageable pieces. It's really a problem of delegation, and the evolution is that specifications are getting more sophisticated and more rigorous to enable it.
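The delegation pattern described here can be sketched in a few lines of Python. Everything below is a hypothetical illustration (the `Subtask`/`delegate` names and the stand-in agent are invented, not any real product's API): each piece of work carries a rigorous spec plus a human-checkable acceptance test, so you verify small pieces instead of one 30,000-line diff.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Subtask:
    """One small, well-defined piece of a delegated task: a spec plus a
    check a human (or an automated test) can use to verify the output."""
    spec: str
    accept: Callable[[str], bool]  # does the produced artifact meet the spec?

def delegate(subtasks, agent: Callable[[str], str]) -> dict:
    """Run each subtask through the agent, recording output and whether it
    passed its acceptance check, so failures can be iterated on one by one."""
    results = {}
    for task in subtasks:
        output = agent(task.spec)
        results[task.spec] = (output, task.accept(output))
    return results

# A stand-in 'agent' that just echoes the spec; a real one would generate code.
fake_agent = lambda spec: f"code implementing: {spec}"

subtasks = [
    Subtask("add a Discount dataclass", lambda out: "Discount" in out),
    Subtask("write unit tests for Discount", lambda out: "tests" in out),
]
for spec, (out, ok) in delegate(subtasks, fake_agent).items():
    print(f"{'PASS' if ok else 'FAIL'}: {spec}")
```

The design point mirrors the conversation: the acceptance check is deliberately cheap for a human to reason about, which is what makes the granularity of delegation "right."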
Speaker 1:Interesting. You mentioned you're working with some very large clients. I know customers are pretty protective of secrets and their names. Can you share any stories or anecdotes about the customers you're working with and how you're deploying?
Speaker 2:yeah, sure, again, part of top line magic is being able to deploy behind the firewall. So we are working with many customers who are in finance, automotive, insurance, defense, etc. And for these customers it is very valuable to run all these agent workloads without even being connected to the internet, right? So this is deployed either in their virtual private cloud or, as I said earlier, completely air-gapped. I think for those, what we have seen over time is the improvement in models and our ability to deliver the models as they improve very rapidly. We're seeing model evolution in the span of weeks or months.
Speaker 2:Being able to deliver these models to air-gapped or VPC customers is very influential. And one thing, if you're asking about stories, is the ability of Tabnine's context engine to help agents even with relatively weak models, which is maybe not immediately apparent.
Speaker 2:Maybe I'll try to explain that intuitively. When the agent uses a model that is not as strong as the trillion-parameter models out there, an effective context engine serves as a shortcut for the agent. Rather than the agent having to go off and discover things about the codebase incrementally, doing massive exploration, the context engine gives it a ready-made context capsule for the particular situation, which the agent is able to process even with a weaker model. Does that make sense? Maybe another way to put it: the total power of the system is a combination of the LLM power and the context power, and you can compensate for one by over-provisioning the other. You can compensate for a relatively weak model by over-provisioning sophisticated context management, and you can maybe compensate for somewhat weak context with an overly strong model, but again, it's the combination that makes the system successful.
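The "context capsule as a shortcut" intuition can be put in rough token-accounting terms. The numbers and function names below are entirely invented for illustration; they are not measurements from Tabnine or any real system. They only show why pre-built context can cut both cost and the reasoning burden on a weaker model:

```python
# Toy illustration (invented numbers): the same task, done by an agent that
# must explore the codebase itself versus one handed a pre-built capsule.

def exploration_cost(files_to_discover: int, tokens_per_file: int) -> int:
    """Agent without a context engine: reads files one by one to find what
    matters, paying input tokens for every probe."""
    return files_to_discover * tokens_per_file

def capsule_cost(capsule_tokens: int) -> int:
    """Agent with a context engine: receives one ready-made context capsule."""
    return capsule_tokens

discovery = exploration_cost(files_to_discover=40, tokens_per_file=2000)
capsule = capsule_cost(capsule_tokens=6000)
print(f"exploration: {discovery} tokens, capsule: {capsule} tokens "
      f"({discovery // capsule}x reduction)")
```

The same asymmetry explains the accuracy side: a weaker model given a distilled capsule has far less to hold in working memory than one reconstructing the org from scratch.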
Speaker 1:So what happens when intellectual property or safety concerns show up? How do you vet, you know, provenance and license risk, and align with all the enterprise legal standards and that sort of thing, without neutering the model quality that you want?
Speaker 2:Yeah, that's a great question. There's a lot of functionality built into Tabnine exactly on these points. Tabnine, as I said, is an enterprise-first product, and it is also able to work with external API-based models like, you know, Claude or GPT-5, in which case Tabnine protects you from non-permissive code that may be generated by the model. We have our own attribution, provenance and censorship module in Tabnine that flags when non-permissive code may have been generated by the model, in which case you can control the behavior: whether you'd like it censored, or maybe you'd like to just record it in some log. You can control whether it is even presented to the user or hidden from the user.
Speaker 2:There are other modules of Tabnine that serve as an enterprise-level audit log for all the code that has been generated by Tabnine and inserted into the codebase, and for controlling the behavior of the agent and what it is able to do and change inside the organization. We've seen, in the early days, agents changing things that they should not have been changing and agents reading things that they should not have been reading, so you have extensive control around that in Tabnine. Again, this is only improving based on our experience with enterprise customers.
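The policy options described here (censor, log-only, or hide, always with an audit trail) amount to a small routing decision. The sketch below is a hypothetical illustration in Python; the `Action` names, `apply_policy` function and in-memory log are invented for this example and do not reflect Tabnine's actual implementation:

```python
from enum import Enum

class Action(Enum):
    BLOCK = "block"  # strip the flagged suggestion entirely
    LOG = "log"      # record it, but still show it to the user
    HIDE = "hide"    # record it and hide it from the user

audit_log = []  # in a real system: an enterprise-grade, append-only store

def apply_policy(suggestion: str, flagged: bool, action: Action):
    """Route a model suggestion through org policy when the attribution /
    provenance check flags possibly non-permissive code."""
    if not flagged:
        return suggestion  # clean suggestions pass straight through
    audit_log.append({"suggestion": suggestion, "action": action.value})
    if action in (Action.BLOCK, Action.HIDE):
        return None  # nothing reaches the user
    return suggestion  # Action.LOG: shown, but recorded for audit

shown = apply_policy("def sort(xs): ...", flagged=True, action=Action.HIDE)
print(shown, len(audit_log))
```

Note that even the permissive `LOG` choice still writes the audit entry; the org controls visibility and blocking separately from record-keeping, matching the behavior described above.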
Speaker 1:Impressive. And what's the shortest path, the time horizon, between a proof-of-concept kind of agent and a production-grade agent? What are the gotchas, and how quickly can you move to production at the moment?
Speaker 2:I think, again, agents are a reality. People are using them. I use one every day; I don't write code without a Tabnine agent, and I write code every day, so I write it with Tabnine. So in that sense, it is production.
Speaker 2:I think being successful in an enterprise setting is largely a question of, as I said earlier, scoping the granularity of the task for the agent, and also controlling what gets consumed at the end: how do you know that you got what you wanted?
Speaker 2:Again, we are also investing heavily in code review functionality, that is, reviewing the generated code against organizational guidelines, and not just what the model thinks is good or bad code. You actually want the code that is being generated to respect the guidelines of the organization, not just some platonic notion of good code. So Tabnine is heavily investing in scaling and automating the code review process following organizational guidelines, because we believe that's the next frontier of being effective with agents. Generating code is easy, but you need to control which code you actually want to put into the codebase. At the end, it is still us humans who are responsible for the codebase when it breaks, at least for now, right?

Speaker 1:That makes sense. And talk about your roadmap with NVIDIA, maybe a little bit, what you can share in terms of what's next, new capabilities. What are you excited about working with them and others?
Speaker 2:Yeah, I cannot share specific details. I guess, at a high level, we're seeing constant improvements from the Nemotron models, and we're excited about working with NVIDIA on that. Other than that, I think we'll have to wait and see what comes out, but I think there are many exciting things coming, and agents will keep improving. We see that all the time.
Speaker 2:The context engine keeps improving, being able to develop a deeper and deeper understanding of the enterprise, that is, cross-repo, cross-system, connecting and correlating things like, you know, your Confluence pages with your codebase, with your Jira tickets, and getting the same situational awareness as your best engineers. I think, abstractly, we believe that in order to be effective, the AI engineer needs the same observability, the same visibility, as the human engineer, right? When you fix a bug, you need to be able to look at the Zendesk ticket and the Jira ticket and the Confluence page of best practices and the codebase and the history, all together. That situational awareness, together with your memory and your brain, is what makes you a great engineer, and this is what we're building for the AI engineer: the same situational awareness, again, for the enterprise. This is critical. This is the only way to operate within an enterprise. Enterprise without context is meaningless, right?
Speaker 1:Yes, not too much vibe coding going on there.

Speaker 2:Vibe coding, again, is valuable. It's valuable even in the enterprise setting for prototyping things, but when you reach actual production, you need to know the constraints and the overall codebase that you're dealing with. Otherwise, you just reinvent things that already exist, or do things in a way that is not compatible with the existing codebase and the existing org guidelines.
Speaker 1:I hear you. So we're sort of back to work now in September. What are you excited about in the run-up to year-end? Do you have any travel, events or meetups?
Speaker 2:uh, lots of customer meetings yeah, yeah, a lot of uh, customer engagements, uh, meetups, etc. I don't remember off the top of my head. I'm I'm really excited about the next generation of agents. I think what what we're seeing is abilities that were really science fiction just a couple of years back, and I think really what I said initially is that, you know, the co-generation singularity is already here. Software engineer singularity is not yet, but we're rapidly approaching, I think, a situation in which every developer is really a team lead of AI engineers, and these engineers are able to accomplish many of the daily tasks and you just need to kind of break the work for them, plan it a bit and make sure that what you get is maintainable by humans at the end, because, as for now, humans have to deal with the breakages and the maintenance moving forward again, which with some AI systems, but it's still the responsibility of the human. So I'm really excited about that.
Speaker 1:Well, with ultimate power comes ultimate responsibility. I think that was Spider-Man. So, yeah, that was a mic-drop moment; we'll end on that very intriguing note. Thanks for joining, Eran.

Speaker 2:Thank you for having me.

Speaker 1:And thanks for listening, watching and sharing, everyone. Be sure to check out the TV show Tech Impact TV on Bloomberg and Fox Business. Thanks everyone. Thanks, Eran, take care.

Speaker 2:Thank you.