
The Cloud Gambit
The Cloud Gambit Podcast unravels the state of cloud computing, markets, strategy, and emerging trends. Join William Collins and Eyvonne Sharp for valuable conversations with industry mavens that educate and empower listeners on the intricate field of innovation and opportunity.
Seeing Through the Clouds: Observability with Justin Ryburn
Justin Ryburn is the Field CTO at Kentik and works as a Limited Partner (LP) for Stage 2 Capital. Justin has 25 years of experience in network operations, engineering, sales, and marketing with service providers and vendors. In this conversation, we discuss startup funding, the challenges organizations face with hybrid and multi-cloud visibility, the impact of AI on network monitoring, and how companies can build more reliable systems through proper observability practices.
Where to Find Justin
- LinkedIn: https://www.linkedin.com/in/justinryburn/
- Twitter: https://x.com/JustinRyburn
- Blog: http://ryburn.org/
- Talks: https://www.youtube.com/playlist?list=PLRrjaaisdWrYaue9KVLRdq5mlGE_2i0RT
Show Links
- Kentik: https://www.kentik.com/
- Day One: Deploying BGP FlowSpec: https://www.juniper.net/documentation/en_US/day-one-books/DO_BGP_FLowspec.pdf
- Stage 2 Capital: https://www.stage2.capital/
- Doug Madory's Internet Analysis: https://www.kentik.com/blog/author/doug-madory/
- Netflix Tech Blog: https://netflixtechblog.com/
- Multi-Region AWS: https://www.pluralsight.com/resources/blog/cloud/why-and-how-do-we-build-a-multi-region-active-active-architecture
- AutoCon: https://events.networktocode.com/autocon/
Follow, Like, and Subscribe!
You have what you call general partners and you have limited partners. The general partners are the people who are basically running the fund. They're raising money, and then you have what are called limited partners; they're the people putting the money into the fund. I specifically picked Stage 2 because of their particular investment strategy. They're a venture capital fund, so they're investing in startups, and they're specifically trying to raise money from limited partners, from LPs who are go-to-market leaders at other startups.
Speaker 2:Hello and welcome to another episode of the Cloud Gambit. With me is my co-host, Yvonne, who just got done traveling. How was your trip, Yvonne?
Speaker 3:Yeah, yeah, just got back from Google Cloud Next. Workload migration and cloud foundations and all of that, so it was a good week. Glad to be home, though.
Speaker 2:Were you excited to talk about AI all week?
Speaker 3:I think so. I mean, it's always interesting to see what's coming, and we know that it's going to change the world, and so, yeah, it's always fun to talk about what could be.
Speaker 2:Awesome. Any traveling difficulty, by the way, or was it all smooth?
Speaker 3:It was pretty smooth. Yeah, yeah, I booked direct flights, which is always a good trick, and yeah, what I will say is I drove through some pretty good Kentucky flooding to get to the airport, but after that it was good. Had a lot of water here this spring.
Speaker 2:Yeah, lots of that to go around. Well, anyway, with us is Justin, who was actually recently in Kentucky as well doing some sightseeing. Both of us, Yvonne and myself, had the privilege to hang out with Justin at a USNUA event not too long ago in Lexington, Kentucky, of all places. But how are you doing today, Justin?
Speaker 1:I'm doing well, doing well. I actually missed out on Google Cloud Next because I was in Kentucky. We were in opposite places yet at the same time, Yvonne.
Speaker 3:Yeah, hey, the Bourbon Trail is always a good thing.
Speaker 1:It is fun. Like you said, a lot of water in Kentucky, a lot of flooding. It was amazing, just in the few days we were there, how quickly it receded, though.
Speaker 3:Yeah.
Speaker 2:Yeah, my whole backyard was pretty much flooded. We have a creek that runs through the back of our property. That thing was like a river. It was crazy. My son actually took one of those toddler plastic play pools, the round ones, threw that sucker in there, jumped in, and just went. I'm like, yeah, don't do that, you could get hurt. It's a little too vicious to be doing that right now. So, good times. But anyway, thank you so much for joining us. I think we frequent a lot of the same circles. I feel like I see you at pretty much every conference I go to. We ran into each other at re:Invent recently. Was that last year? Yeah, last re:Invent. Yep, that was in December in Vegas.
Speaker 2:Yep, yeah, most of the network automation forum conferences, all sorts of stuff. But yeah, you've been up to a lot of interesting things lately. Before we get into the topics of the conversation, do you want to just give us a little bit about your background, kind of when you started in tech? It doesn't have to be too deep, just a summary.
Speaker 1:Sure. I'd say the summary of my background is that I came up through the networking silo. I'm old enough to have started before there really were silos, but as people specialized in IT, or in technology, networking was what I was passionate about. So I spent a lot of time doing that and, like you said, I now spend my days going to a lot of these conferences and doing either public speaking engagements or helping us staff booths and so forth. My current job title is field CTO with an organization called Kentik, and we do some cloud-related stuff and some traditional network-related stuff, doing what we call network observability. Field CTO: as Yvonne asked a little while ago, what in the heck is a field CTO?
Speaker 1:Yeah, I find it's about 50-50. When I introduce myself, whether people know what that means or not, so I'll explain. And you know, like a lot of jobs in tech, it means different things in different companies. But the way we defined it at Kentik is I do a little bit of brand awareness and thought leadership, so doing some blogs and podcasts and speaking engagements at events, do a bit of executive involvement with our customers, so building some executive relationships with some of our key buyers.
Speaker 1:Really, what that comes down to is translating the use cases for our product from one engagement to another, and how customers have found success with our products in the past. So, for example, if I'm at re:Invent, where you and I ran into each other, William, and I talk to five different customers, I start to pick out themes from those conversations that I can then bring to other executives at our customers and say, hey, I've heard from three or four other customers that they use the product like this and they're really getting value. This is where it's helping them save money, or helping them be more efficient in how they run their operations, by implementing the product in this way. So it helps me bring some value to those conversations, which is really what the executives are looking for: to be able to not just understand your product and how the widgets work, but how they're really going to get a successful business outcome from it.
Speaker 2:Gotcha. So that sounds like it rolls up under sales.
Speaker 1:Yeah, it does. It rolls into the sales organization.
Speaker 3:He's a peer at your particular company, a peer of the VP of sales, but helps him drive revenue, right? Salespeople are willing to go out and beat the bushes and spark a lot of those initial conversations, and they're willing to hear no a lot, and that is an important and valuable role. But the technical peer of the sales leader is really the person who brings trust and credibility to the product and is able to validate the stories that the salespeople tell in a way that's going to resonate with a technical audience. Because at the end of the day, the thing that you're selling has to work, and the job of technical sales leaders is to validate that. It's a pretty important thing to be able to put reality to a lot of the stories that our friends and peers over in sales tell. My accent's getting the best of me here.
Speaker 1:Yeah, and I think what we're seeing is the economy tightening up a little bit, and interest rates are high. There's a lot more pressure than ever on the CFOs to make sure that if they're going to invest money in buying something from a vendor, they're going to have a successful outcome.
Speaker 1:They're going to have an ROI for that spend, right. And so, as you're saying, more and more sales folks are having to have conversations around: well, I understand how the product works, I understand what we're trying to accomplish, but what's the business outcome? How is this going to help me? Where's my ROI going to come from? And sometimes, with things like public cloud, it may be obvious, because they're going to be able to shut down a data center, they're going to be able to be more efficient with their operations, they're going to be able to scale faster. Some of those are obvious. Some products are a little bit less obvious. Unfortunately, my employer's product is like that. It's a very highly technical sale, and it's not as big a brand, a household name, and so forth. Not every executive understands what they may be able to get out of our product. That's where myself and my team come in.
Speaker 2:So I've done some work with one of your counterparts, Phil Gervasi. How would you differentiate what a field CTO does from a tech evangelist? Is it that one is facing the community and one is facing customers, but they're kind of doing the same thing through a different lens?
Speaker 1:Yeah, there's definitely like, if you think about it from a Venn diagram, there's definitely some overlap.
Speaker 1:So, phil and our technical evangelist team, they report into marketing officially, right, and so they're doing a lot of that brand awareness and thought leadership as well that I was talking about.
Speaker 1:They're helping us build pipeline, so they're trying to build awareness in the community that we're out there, that our product exists, so that our sales teams can have those very initial conversations with customers. That's really what they're measured and compensated on: driving what we call top-of-funnel leads and getting more interest in the top of the funnel. Where my role comes in, I do a lot of the same types of tasks when it comes to the external-facing brand awareness stuff, but it's more, let's say, mid-funnel: more with existing opportunities where we're already engaging with customers, or with existing customers, helping them get more value out of the product and helping them expand their usage of it. So yeah, there's definitely some overlap, and I work pretty closely with Phil and the rest of our tech evangelism team. We share a lot of ideas and content and so forth and collaborate on a lot of that.
Speaker 2:Gotcha. And thank you, that was a really good explanation, by the way. I was actually reading a blog the other day trying to pick apart the differences between all the different evangelist roles, you know, sales, field CTO types. A lot of these are new titles or roles that didn't exist that long ago, so it can be confusing, for sure. But sometime earlier last year, I think, you made an announcement on LinkedIn that you were joining Stage 2 Capital as an LP. Actually, I don't want to get too ahead of myself here. Before we get into the details, do you want to just start by defining what a limited partner is in venture in the first place? That might be helpful.
Speaker 1:Yeah, sure. So we'll come back to venture capital in just a second. But the way, I'll say, the SEC defines private money funds, there's a specific designation which I don't remember off the top of my head. When you're setting up one of those types of private money funds, you have what you call general partners and you have limited partners. The general partners are the people who are basically running the fund. They're raising money from private investors to go and invest that money one way or another to get returns. Could be that they're investing in real estate by buying properties. Could be that they're investing in data centers. We hear a lot about companies like BlackRock investing in AI; they could be doing some stuff like that. There are a lot of different ways they could be investing that money, but the general partners are the ones running that private money fund. Think of them like the fund manager in a mutual fund; that's probably the easiest way to think about it. And then you have what are called limited partners. Those are the people putting the money into the fund, and they can have all kinds of different relationships with the fund. Some of them can be just like, hey, I'm giving you my money, I'm trusting that you, as a general partner, are going to return my money to me plus interest, and they're pretty hands-off.
Speaker 1:I specifically picked Stage 2 because of their particular investment strategy. They're a venture capital fund, so they're investing in startups, and they're specifically trying to raise money from limited partners, from LPs who are go-to-market leaders at other startups.
Speaker 1:So they're going out trying to find people to invest in their fund who not only can obviously give them some money that they can then turn around and invest in these startups, but who can also invest some time, based on their experience, to help coach their portfolio companies. So, help their CEOs and say, hey, you've been successful before as a field CTO; what does it look like to build a field CTO team? Or, prior to this role, I actually helped build the SE team here at Kentik, right, the solution engineering team. How do you build out and scale the solution engineering team at a modern SaaS company? As a founder, there are a lot of things that people who are founding a company just don't know, because they don't have the experience, right? Surrounding themselves with people who've been there and done that, who understand what they're going to need to do to be successful and can give them some coaching, is what Stage 2 is trying to accomplish with their LP network.
Speaker 2:So you're basically blending together the venture capital aspect of it with the go-to-market and coaching. Because, let's be real, if you're working for a startup and you want to increase sales, the way to do that is not to quadruple your sales team and do nothing else. There are other things you might want to pay attention to.
Speaker 1:If only it were that easy: four extra reps equals four times extra revenue. Unfortunately, it's a little more nuanced than that, right? There's what they call product-market fit, where you've got to make sure your product has evolved to a phase where there are repeatable use cases. There are a lot of different aspects to that, and founders need some help and coaching in a lot of areas.
Speaker 3:Well, and so you mentioned product market fit. What's the sweet spot there for Kentik, like, what's your ideal customer? Where does Kentik make the most sense? How do you see that playing out in the market?
Speaker 1:Yeah. So, like I said earlier, we're a network observability platform, and what that means is we can ingest data from a lot of different networks. The way we really differentiate ourselves is the variety of different networks that we can ingest data from. We can do traditional data center networks, we can do enterprise SD-WAN environments, and we can do what I call the big four cloud providers: Amazon, Microsoft Azure, Google Cloud, and Oracle Cloud. They all call it something slightly different, but it's essentially VPC flow logs: telemetry data about the traffic flowing through those networks, ingested into our product. We do the same thing for service providers on big carrier networks, so we can provide them network observability around the traffic flowing in those environments. So there are some performance use cases, obviously, there are some planning use cases, and there are some security use cases that customers can use our product for.
Speaker 2:When you say observability, there have been a lot of changes with observability over the past few years, especially with public cloud, with the CNCF and OpenTelemetry kind of setting a framework or a baseline that some vendors begin building on, as far as the technology is concerned. One question I have right out of the gate: do you see a big difference? Kentik is framed as network observability; that's the focus, that's the essence of what Kentik does. But do you see a lot of overlap with cloud products, or are cloud products just different, because they're really giving you observability wrapped up for things other than the network?
Speaker 1:Yeah, I mean again, like most things in tech, right, there's a Venn diagram. What we find with a lot of our customers is that the cloud-native product offerings, depending on the particular cloud we're talking about, some are more mature than others, some are better than others, but the reality is most of them are really only looking after their own cloud network, right? So if I go log into Amazon's portal, they have CloudWatch, and CloudWatch does a great job for customers who are only in AWS. But, and this is a specific example from a customer I was talking to a while back, they have a data center that's using Cisco's, I forget what they call their SDN product, in their data center, and it has its own portal where they can see their data center fabric. They can see all the applications deployed on the data center fabric. They have a portal they can log into and see all that telemetry.
Speaker 1:But if they have an application in there that's going across their SD-WAN and terminating in AWS, they have three different portals they've got to log into to look at that data. They've got to look at it in their data center, they've got to look at it on their SD-WAN, and they've got to look at it in AWS to troubleshoot it. The approach that we've really taken is: let's pull all that data into, sorry, I'm going to use a marketing term, a single pane of glass, so the customers can see across, I guess you would call it, hybrid cloud, right? But there are also multi-cloud and other use cases we see with a lot of customers, where they have some workloads in Google and some in Amazon, there's traffic flowing between them, and they need to be able to do troubleshooting. They need to be able to see the traffic traversing both those clouds so that they have full visibility of all of their traffic across all the various environments. And that's typically really difficult, or near impossible, to do using just the tools that any given cloud provider provides.
Speaker 2:So is that the pain point right there? Being a field CTO, I'm sure you're privy to all sorts of customer pain points. I imagine so. Is visibility the main thing, or are there other big pain points that you see trending across the board with customers?
Speaker 1:I mean, at the end of the day, it all comes back to visibility, right? And the reason they care about that visibility is going to differ from customer to customer, what they're dealing with, and even which team engaged us. So if I were to break it down one layer deeper, what I would say is it comes down to one of three things: performance, cost, or security. A lot of times when cloud customers specifically come to us, they have migrated to the public cloud. Their dev team probably started that initiative, the networking team was brought in late in the game, and so it wasn't well architected. And so they're spending a lot of money on the cloud environment, maybe a lot more than they planned on, a lot more than they hoped for, and they want to be able to see all their cloud traffic to figure out how to re-architect the network to be more cost-effective. That kind of dovetails into performance; those are two sides of the same coin, right? If it's not well architected, if it wasn't well designed, you're going to have some performance implications from that as well.
Speaker 1:The third one is around security, right? A lot of companies, if they had a traditional perimeter firewall type of security posture, when they migrate applications to the public cloud, it's a lot harder to figure out whether the security posture is being enforced correctly. And so being able to see across all those environments, see what's traversing them, see what's being accepted, see what's being rejected, even get proactive alerts when something that used to be accepted is now being rejected, because somebody deployed a new EC2 instance or a new GCE instance and didn't update the firewall filters to allow that traffic in. There are a lot of different things around that that we can help customers with.
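That proactive-alert idea, flagging traffic that used to be accepted but is now being rejected, reduces to a set difference over two windows of flow records. Here is a minimal illustrative sketch, not Kentik's implementation; the field names (`src`, `dst`, `dst_port`, `action`) are hypothetical stand-ins for typical VPC flow log attributes.

```python
# Sketch: flag flows that were ACCEPTed in a baseline window but are now
# REJECTed, e.g. after a new instance was deployed without updating firewall
# rules. Field names loosely mirror VPC flow log attributes; illustrative only.

def find_newly_rejected(previous_flows, current_flows):
    """Return (src, dst, dst_port) tuples accepted before but rejected now."""
    def key(f):
        return (f["src"], f["dst"], f["dst_port"])

    accepted_before = {key(f) for f in previous_flows if f["action"] == "ACCEPT"}
    rejected_now = {key(f) for f in current_flows if f["action"] == "REJECT"}
    return sorted(accepted_before & rejected_now)


previous = [
    {"src": "10.0.1.5", "dst": "10.0.2.9", "dst_port": 443, "action": "ACCEPT"},
    {"src": "10.0.1.6", "dst": "10.0.2.9", "dst_port": 22, "action": "REJECT"},
]
current = [
    {"src": "10.0.1.5", "dst": "10.0.2.9", "dst_port": 443, "action": "REJECT"},
    {"src": "10.0.1.6", "dst": "10.0.2.9", "dst_port": 22, "action": "REJECT"},
]

alerts = find_newly_rejected(previous, current)
print(alerts)  # [('10.0.1.5', '10.0.2.9', 443)]
```

A real system would of course window the comparison over time and suppress flows that legitimately stopped, but the core "was accepted, now rejected" check is this simple.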
Speaker 3:So you guys ingest a ton of data from a ton of sources. Can you talk a little bit about how you manage that scale and availability of that data? Because flow data can get out of hand very quickly from a volume standpoint. So can you talk about how you handle that a bit?
Speaker 1:Yeah, that actually was the original problem statement the company was founded on, right? This is a big data problem. Our founders had, I'll say, tried to solve this on their own when they were running networks, realized it was a lot bigger problem than they had thought, and started Kentik.
Speaker 1:We run our own, I'll say, our own data center; we lease space in a data center and have our own infrastructure that we pull the data into, and we've built our own systems that are optimized for flow data, because flow data has very high cardinality. What that means is that one flow record versus the next may have big differences in the various fields that are in it. A lot of times people want to get down to a given IP address, so you have to store all the IP addresses that are in the flow records. So there are a lot of different challenges with flow data, whether it's coming off a traditional network or from the cloud, versus things like SNMP or streaming telemetry that are a little more structured, I guess, in the way the data is ingested. So we ingest the data, no matter which protocol we get it in from, we normalize it into our own internal format that's slightly compressed, and then we do enrichments.
Speaker 1:So we'll take in the data. I'll just use Google Cloud as an example. We'll take in VPC flow logs from Google Cloud. They'll tell us things like source and destination IP addresses, but what a customer really wants to know is, well, which GCE instance does that belong to? Which project does that belong to? And so what we can do is scrape the API to pull in that metadata and store it. When we ingest a new flow log, we can enrich it, basically adding more columns to each row we ingest that show that additional metadata, and store that along with it. So when they go to query that data, they're able to do the query really fast. Because that's the other challenge we see with a lot of customers who try to build their own stack: if you try to do that correlation at query time, your queries become really complex and really slow to return results.
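The ingest-time enrichment Justin describes, joining each flow record against metadata scraped from the cloud provider's API so queries never need an expensive join later, can be sketched in a few lines. This is a hedged illustration; the field names and the shape of the metadata table are hypothetical, and real VPC flow logs and metadata APIs differ per cloud.

```python
# Sketch: enrich raw flow records with instance/project metadata at ingest
# time, so the correlation cost is paid once per record instead of at every
# query. Field names are hypothetical; purely illustrative.

# Metadata periodically scraped from the cloud provider's API,
# keyed by private IP address.
instance_metadata = {
    "10.128.0.7": {"instance": "web-frontend-1", "project": "prod-web"},
    "10.128.0.9": {"instance": "db-replica-2", "project": "prod-data"},
}

def enrich(flow):
    """Add instance/project columns for the src and dst IPs (or 'unknown')."""
    enriched = dict(flow)
    for side in ("src", "dst"):
        meta = instance_metadata.get(flow[side], {})
        enriched[f"{side}_instance"] = meta.get("instance", "unknown")
        enriched[f"{side}_project"] = meta.get("project", "unknown")
    return enriched

raw = {"src": "10.128.0.7", "dst": "10.128.0.9", "bytes": 5120}
row = enrich(raw)
print(row["src_instance"], row["dst_project"])  # web-frontend-1 prod-data
```

The design choice is the one Justin names: wider rows at ingest in exchange for simple, fast queries, instead of a complex correlation join at query time.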
Speaker 3:Well and so slow that the data is then not useful to you, right? If you're trying to do real-time troubleshooting or understand what's going on in the moment, querying vast volumes of data is not easy to do unless you have very well-tuned data systems and structures to do that.
Speaker 2:Well, one thing I was going to say, bringing a lot of this together, and maybe you can just educate me, I don't even know if this is a marketing tactic with Kentik or what it is, but Doug, I can't remember his last name now for some reason, he does the RCA stuff that he'll post on LinkedIn. Oh yeah, Doug Madory. Yeah, those are so good. I pretty much read all of them end to end; they're all fantastic.
Speaker 2:How did that start? I mean, that's a good marketing exercise for Kentik.
Speaker 1:A hundred percent. So this goes back. He's a tech evangelist as well, so he's a peer of Phil, who we were talking about a little earlier, right? And again, at the end of the day, what he's trying to do is raise awareness that we're out here as a company, as a brand. And, to your point, Doug does a fantastic job of doing post-mortems. Could be on undersea cable cuts, could be on BGP hijacks or leaks that take place out on the internet. Those are the two biggest ones he does most of his reporting on, and he does a fantastic job. It's not really even a Kentik pitch when you read them, right? It's just like, hey, Kentik has an interesting data set. We have interesting big data in our system that we can anonymize to see these things that take place on the internet, things that are interesting to other people in the industry. And so he provides really good analysis around them.
Speaker 1:You know, a lot of these, just call them internet weather events, that take place, whether it's an undersea cable that was cut, either accidentally or as an active attack; he can see changes in that. He can see when the internet shuts down because a political regime has shut it down because they're about to do a coup, or there's a test coming up. There are a lot of different reasons that countries will shut down the internet in their country, knocking their country offline. There's a lot of really interesting, fascinating geopolitical and internet stuff that Doug's able to see. A lot of it just comes down to looking at the BGP routing table, but some of it comes from the aggregated, anonymized flow data that we have in our system as well.
Speaker 2:Yeah, and they're so good. One of them had gotten, and this is why I said it's an amazing marketing exercise, one of them had gotten picked up by a huge tech publication; it was referenced in there. I think it was TechCrunch or one of those. Maybe it was TechCrunch, but that's huge. I mean, that's something marketing teams would pay top dollar to get featured in on some of these pieces.
Speaker 1:Doug's reporting is amazing. I mean, he was actually called by the Washington Post "the man who sees the internet," right, because he can look at these different outages, and he does nice reporting on what was going on during the event, how it happened, and so forth. A lot of the news outlets in this country will pick up the phone and call him when there's any type of internet outage, because he's just got a reputation for doing a good job of reporting on this stuff.
Speaker 1:I don't think I fully answered your original question, though, which was how this got started. Before Kentik, Doug was at another company called Renesys, which was ultimately acquired by Dyn, and then they were acquired by Oracle, and they did BGP route table analysis. So that's how he got started, way back at Renesys, doing analysis of the BGP table to find these types of trends in the industry. He's just been doing it long enough that he's built his own brand and reputation as someone who has really good knowledge on this type of thing and interesting things to say, and, again, he does it in a way that provides value to people beyond just a product pitch, right? Which is really the art behind tech evangelism.
Speaker 2:Yeah, and it's incredible too, because now is the best time to do that type of stuff. So many businesses are starting out in the cloud, and whenever there's an outage or something major happens, whether it's a Cloudflare thing or whatever it is, something happens on the internet for some provider, or multiple providers are impacted, it can cause serious problems, and everybody feels it. It's huge; the whole world knows about it. So coming back and saying, oh hey, this is kind of how that happened, this was the order of operations of what led to XYZ, it's huge.
Speaker 3:It's very cool. So a question that I have: there are always emerging techs and trends. There's OpenTelemetry, which we're hearing a ton about, there are technologies like eBPF, and we can even include AI in this list. What emerging technologies do you see impacting Kentik's business and how you provide observability? Are there new trends you're incorporating into your platform? As you look to the future, where are some of the big impacts that you see, and how are you changing your business to respond to those?
Speaker 1:I think all of those are interesting, but I'm glad you added AI, because that would be my answer Right.
Speaker 1:You know, I'm sure you heard a lot about AI last week at the Google Cloud Next conference too, right? And we're experimenting with it in our product as well. We're still experimenting with a few use cases that we think will be really interesting for customers. The very first one was what we called Journey AI. It's essentially using an LLM to allow customers to ask a question in natural language and get back a visual as a response. For a long time, customers could, and would have to, go into our UI and navigate and build themselves basically the graphical equivalent of a query, right? Think of it like a Grafana dashboard: you can build all these different panels, all this different data and different graphs on it, but you have to know how to do all of that. Your C-level executive is not going to take the time to learn our UI and its nuances versus every other UI they've seen, right? So the idea is, all right, let's give them the ability to go in and say, hey Kentik, what's going on with my network today? Why do I have a performance issue? And for us to turn that into a query, using basically an API into a couple of different LLMs; we have a couple of them they can choose as an option. So that was our first foray, and it actually works amazingly well.
Speaker 1:But we want to take it a step further and say, okay, we have all this rich data: flow, the synthetic tests that we run, all the cloud data we take in, all the SNMP and streaming telemetry data we take in. How do we provide faster root cause analysis? How do we get to the root cause of a problem faster? So we have a new feature we've come out with, called probable cause analysis, where you can basically highlight a spike in traffic in our UI, right-click, and have the AI say what's the most likely cause of this.
Speaker 1:And of course it's going through and looking for correlation in the data points to say, well, at the time I had this spike in traffic, and, maybe I'm picking a bad example, but say Fortnite just released a new season. The increase in your OTT traffic is coming from these three CDNs, and those three CDNs are the ones delivering the majority of the traffic from that new Fortnite release. And then being able to take in even more data to actually suggest remediation is really where we want to go with this: not only can we help you figure out the probable cause of changes to traffic patterns in your network, but we can also try to suggest some things you might do to fix that.
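The correlation step Justin describes can be sketched roughly like this. All of the data, CDN names, and the ranking heuristic here are invented for illustration; this is not Kentik's actual algorithm, just the general idea of ranking which sources account for most of a traffic spike.

```python
# Given traffic by source before and during a spike, rank which sources
# account for the largest share of the increase. All figures are made up.
baseline = {"cdn_a": 100, "cdn_b": 80, "cdn_c": 60, "cdn_d": 50}   # Gbps
spike    = {"cdn_a": 340, "cdn_b": 210, "cdn_c": 150, "cdn_d": 55}

def rank_contributors(before, during):
    """Return (source, share-of-increase) pairs, largest contributor first."""
    deltas = {k: during[k] - before[k] for k in before}
    total = sum(deltas.values())
    return sorted(((k, d / total) for k, d in deltas.items()),
                  key=lambda kv: kv[1], reverse=True)

ranked = rank_contributors(baseline, spike)
top3 = [name for name, _ in ranked[:3]]
print(top3)  # the three CDNs delivering most of the new traffic
```

A real system would correlate many more signals (synthetic tests, routing changes, device telemetry), but the shape of the question is the same: which dimension values best explain the delta?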
Speaker 3:Very cool, that's really cool.
Speaker 2:Yeah, is AI a feature or a product? I think for most tech companies that have been around, it's like what Google did with Google Docs and Gemini: it's really, really useful, but it's not a new product. I mean, they do have new products around Gemini, of course, but it's taking something that was already awesome, something many people out there use, like Gmail and Google Docs, and throwing some really useful AI on it. As a user of Google Docs myself all the time, some of those features are just so natural. Once you use them once or twice, it's like, oh, I can't imagine working without them; it just becomes part of your workflow. And I think at some point it's not even a feature, it's just embedded, and we don't really see it anymore.
Speaker 3:But I think one of the powerful things I've seen folks do with AI, and Justin described it here, is take requests in natural language and translate them so that we query our system of record and hand back meaningful data. Sometimes the AI is not actually processing the data, and it's not doing generative things with it; we're using it as a translation engine to help us say what we want in natural language and translate that into the technical language of our system of record. That way you get the benefit of the generative part, the natural language translation, but you're not getting the hallucination part, because you're turning natural language into a deterministic question to your system of record. And that is where we see a ton of value coming from AI: how do I interact with my systems in a way that doesn't require me to speak to them in their language?
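The pattern Eyvonne describes, using the LLM only as a translator and validating its output before the system of record ever sees it, can be sketched like this. The `fake_llm` stub, the query schema, and the field names are all hypothetical stand-ins, not any vendor's real API.

```python
import json

# Allowed vocabulary for the deterministic query layer; anything the LLM
# emits outside this schema is rejected rather than executed.
ALLOWED_METRICS = {"bytes", "packets", "latency_ms"}
ALLOWED_DIMENSIONS = {"src_asn", "dst_cdn", "site"}

def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a structured query as JSON."""
    return json.dumps({"metric": "bytes", "group_by": "dst_cdn", "last_hours": 24})

def translate(question: str) -> dict:
    """Turn a natural-language question into a validated query dict."""
    raw = json.loads(fake_llm(f"Translate to query JSON: {question}"))
    if raw["metric"] not in ALLOWED_METRICS:
        raise ValueError(f"unsupported metric: {raw['metric']}")
    if raw["group_by"] not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unsupported dimension: {raw['group_by']}")
    return raw  # safe to hand to the system of record

query = translate("What's going on with my network today?")
print(query["metric"], query["group_by"])
```

The key design choice is that the LLM never answers the question itself; it only produces a query that a deterministic backend validates and executes, which is what keeps hallucination out of the data path.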
Speaker 2:Yeah, that's a really good point and that's something that I wanted to ask.
Speaker 2:This is almost like a thought experiment, Justin, but Eyvonne kind of teed it up perfectly.
Speaker 2:In maybe the last three weeks to a month, I've heard from two separate individuals in two separate conversations. The first one, an IT leader at a big company, was telling me, yeah, this AI and networking stuff: when it figures out what the problem is, I just want it to fix it, and we ride off into the sunset. And then I have another person who's like, oh, I hope we never get on the hook of AI actually changing my network flows, changing traffic patterns, and changing routes, because the moment something breaks is the moment we're going to have to stop using it. So these are two really different stories. And all three of us have been very deeply embedded in network engineering for a long time. I know Eyvonne and I were just discussing automation the other day. Like, wow, is it still a thing? We're still not automating networks? It's like, still, why
Speaker 3:are we not there yet?
Speaker 1:Yeah, I mean, that's why AutoCon exists, right? They run it twice a year for that very reason. Yeah.
Speaker 2:Exactly. So how do you frame that with this AI thing, in terms of de-risking it? We know that if we go and do something on the network that causes an outage leading to millions of dollars in loss, the business is going to say, hey, we don't do that thing anymore. Or they'll probably go to the extreme to not do that thing anymore and freeze changes or something crazy.
Speaker 1:Yeah, they're unlikely to take the time to understand the details of what went wrong. They're just going to ban what they perceive as what went wrong.
Speaker 2:Well, then they've got to buy Kentik, you know, right?
Speaker 1:Yeah right, you know I don't have an easy answer here.
Speaker 1:I mean, the thing I would say is there are going to have to be some checks and balances, and you're going to have to build up a little bit of trust over time, whether that trust is in the human beings writing the code or in the AI engine behind the scenes.
Speaker 1:You know, from Kentik and our product strategy standpoint, at the moment we have no plans to actually make any changes on any customer's network. It's not really part of our roadmap or strategy. We have a partnership with your day job employer, William, with Itential, where we're more than happy to do some API integration between the two companies and let them handle the automation, because there's a lot more they can do to build entire workflows and build checks into the product, things that just aren't, at least in the near term, part of our roadmap. And I think that's where a customer can really start to build some trust in AI: suggesting changes, and at some point maybe even going off and making the changes. But I think the first step is just, all right, show me what changes you would make. Yeah, that passes the sniff test. Or, whoa, no, that's a hallucination.
Speaker 2:Let's not do that.
Speaker 3:Let's back that out. And a lot of that just comes down to having a check in the flow chart, if you will, or in the workflow and how you build the automation.
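That kind of check in the workflow, showing the AI-suggested change and requiring a human sign-off before anything touches the device, might look roughly like this. The function names and config rendering are invented for illustration; neither Kentik nor Itential necessarily structures it this way.

```python
def propose_change(suggestion: dict) -> str:
    """Render an AI suggestion as a human-reviewable candidate config line."""
    return f"route-map {suggestion['route_map']} permit {suggestion['seq']}"

def apply_if_approved(suggestion: dict, approved: bool) -> str:
    """Gate: without explicit human approval, nothing is pushed to the device."""
    candidate = propose_change(suggestion)
    if not approved:
        return f"DRY-RUN only: {candidate}"  # shown to the operator, never applied
    return f"COMMITTED: {candidate}"

suggestion = {"route_map": "RM-EDGE", "seq": 10}
print(apply_if_approved(suggestion, approved=False))
print(apply_if_approved(suggestion, approved=True))
```

The point of the pattern is that the default path is the dry run; the commit path only exists behind an explicit approval flag, which is how trust gets built incrementally.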
Speaker 3:Yeah, I think a lot of the time, when something bad happens, we just put in a system to be sure that never happens again, instead of taking a more mature approach like we've seen with SRE, where you have an error budget. You assume there are going to be so many failures over the course of the year, you determine what kind of errors your business can tolerate, you measure that, and you use that error budget to help meter your risk from a technology standpoint. Because when you've consumed all of your error budget: nope, we're not doing anything risky at all.
Speaker 3:But if you don't approach it that way, then ultimately you're going to end up in an environment where nobody ever changes anything because they're so afraid of breaking it. You have to build a culture in your organization that allows for some degree of error. What does it even mean to be a healthy technology organization? I think that's an even deeper problem that AI is not going to solve, unless somebody asks, hey, how do I build a very solid, stable IT organization? And it says, oh, you need site reliability engineering, and here's what you need to do. And then they actually go do it.
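The error-budget arithmetic behind what Eyvonne describes is simple. As a worked example under an assumed 99.9% availability SLO, a 30-day window leaves about 43 minutes of tolerable downtime; once that budget is spent, risky changes are frozen until the window rolls over.

```python
def error_budget_minutes(slo: float, days: int = 30) -> float:
    """Downtime allowed by an availability SLO over a rolling window."""
    return days * 24 * 60 * (1 - slo)

def risky_changes_allowed(slo: float, downtime_so_far_min: float) -> bool:
    """The metering rule: ship risky changes only while budget remains."""
    return downtime_so_far_min < error_budget_minutes(slo)

budget = error_budget_minutes(0.999)       # 30 days * 0.1% unavailability
print(round(budget, 1))                    # 43.2 minutes
print(risky_changes_allowed(0.999, 10))    # True: budget remains
print(risky_changes_allowed(0.999, 50))    # False: budget consumed
```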
Speaker 1:Part of that also has to do with building redundancy into the system, right? A lot of the ways SREs can build in those error budgets is by having a redundant pod of applications: this one pod dies, this other one takes over. There's some redundancy built in, and that way you can have an error budget. If you've built your network, the underlying infrastructure, so that you only have a single path, that single path is critical, because there's no such thing as an acceptable error. For example, I know in the Kentucky area there are a lot of healthcare companies, and it's unacceptable for an emergency room to be offline for a couple of hours while you do a switch upgrade.
Speaker 2:That's just unacceptable right.
Speaker 1:But if there's no redundancy in the system and you can't take that switch offline, then what do you do? Right?
Speaker 2:This is true, yeah.
Speaker 3:Failure domains and canary deployments, all of those fundamentals that have existed for a while. We continue to iterate on best engineering practices, but those are some key ones.
Speaker 2:You're right, though. Both of you had great points about redundancy; the whole point is building your system for failure, so that when it does fail, nobody really notices. I don't know if I've ever used this example on the podcast before, but many years ago I was up in the middle of nowhere in Canada for a wedding, and there were not many applications I could reach from my phone. The service was just okay. But one of the apps that worked, and it boggles me to this day, was Netflix. I could get Netflix and actually watch TV shows and movies, and I always thought that was amazing.
Speaker 2:They have built their platform in such a way that I don't know if I've ever had a Netflix outage. It's highly available. And there's so much out there: they've been good stewards of the technology they've built and also good evangelists for how they've built it. They've published so much on the internet. They had this great blog post; I'll find it and put it in the show notes. I remember reading it when it came out. It was about eventual consistency.
Speaker 2:Basically, being a streaming service, when something launches, how does that work? Asynchronous and synchronous data replication, AZs, cross-region active-active, essentially across the entire planet. It's just incredible, and it really opens things up. I know it's just a streaming service, and a lot of the problems at the companies I worked for in the past are much more challenging: many lines of business, many different applications that do different things with different scopes and different priorities, some that might touch patients, others that might not. Those situations are much more complicated, especially in healthcare. But Netflix has done a really great job of showing, okay, this is what's possible if you want to put in the time and the effort. Like Eyvonne was saying with the SRE mentality, the technology's there; you just have to be able to harness it and change your culture.
Speaker 3:Well, the great thing about the Netflix example is marrying what your business needs to run with what the business can tolerate from the technology, right? Because I've had a Netflix stream die every now and then, and that's probably some container that crapped out on the back end; they just relaunched it when I tried to stream again, and everything continued to run.
Speaker 1:They put in some buffering and cache it locally on your device so that you don't even notice, which is part of the key there too.
Speaker 3:Yep, very much, very much.
Speaker 2:It's amazing. But like I said, I'm on this because it's been my experience. My frustration is, yeah, there are still network devices out there in production that are only reachable via Telnet. That's still a thing. We're in 2025, and that's still reality. And we wonder why we can't automate our networks. We have these gigantic networks with how many vendors, how many interacting services, how many different ways of doing things, and then there's variety. Variety can really kill productivity, and it's not like you can flip a switch. You can't just say, okay, I'm going to refresh every campus switch or branch device that I have, let's go. It's not that easy.
Speaker 1:Yeah, there has to be a business justification for doing those refreshes, right? It's not just because the CLI is old and you need a more modern one. You're going to have to spin that to your execs: what are the other outcomes this is going to help us achieve? Back to the conversation we were having about SRE: if we can improve our uptime, if we can come up with business outcomes that help justify the spend to refresh that equipment, that's different. But just going to your execs and saying, hey, I want to spend millions of dollars to refresh my gear because the CLI only allows Telnet, they're like, I don't care.
Speaker 1:That's not a business priority for me.
Speaker 2:Yeah, exactly. And I have one thing I want to fit in here at the end; I know we're probably coming up on time. There's vendor responsibility and there's customer responsibility. You have to be a good steward of your network, you have to be on top of things, you have to try to set the culture and do what you can with the funds that you have. But what is a vendor's role in this ecosystem?
Speaker 2:You mentioned our teams doing cool things together, and that's one of the reasons I like working where I work: we have a lot of flexibility in the way we secure and do integrations with third-party stuff. We have full support for the OpenAPI spec and Swagger schemas, and we support the different authentication mechanisms, like OAuth 2.0, mutual TLS, and OIDC, on top of the basics like token-based auth. So we have the flexibility to enable environments that have a lot of stuff needing integration, that need to be able to take the old but also work with the new at the same time. But how do vendors move into the future with how they think about this?
Speaker 1:Well, I think as more and more things move to SaaS, we're going to see more and more things be API-first. A lot of what you described works because these vendors support Swagger definitions and very well-defined, standards-based APIs for getting access to the data or to the various functions that your company, or anybody else, needs to integrate with. That's in contrast to some of the stuff we were talking about: if the only access you have to the device is to telnet in and use an old Expect script, or something more modern that interacts with that CLI and looks for certain returns back from it, that's going to be a very brittle, very fragile way to interact with it, versus a much more modern API approach. So the more we see these systems moving to SaaS, the more of these integrations you're going to see become possible.
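The brittleness Justin contrasts can be sketched with invented sample data: an Expect-style approach keys on the exact phrasing of CLI output, which any firmware update can change, while an API contract names its fields explicitly.

```python
import json
import re

# Invented sample outputs standing in for a real device and a real API.
cli_output = "GigabitEthernet0/1 is up, line protocol is up"
api_response = json.dumps({"interface": "GigabitEthernet0/1",
                           "admin_status": "up", "oper_status": "up"})

# Brittle: screen-scraping depends on exact wording and ordering of the
# CLI banner; a cosmetic change in the output breaks the pattern.
scraped_up = bool(re.search(r"is up, line protocol is up", cli_output))

# Robust: the API contract exposes a named field you can rely on across
# versions, so the check reads the data, not the presentation.
api_up = json.loads(api_response)["oper_status"] == "up"

print(scraped_up, api_up)  # True True
```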
Speaker 2:Yeah, 100%. You make me laugh, because I was just wrestling with a problem where I kept having to extend timeouts in different places so I could get one response before moving on to the next thing, based on the stuff I was trying to automate.
Speaker 1:Some things never change. But anyway, yeah, once you've dealt with a modern API, that stuff becomes really frustrating to deal with, right?
Speaker 2:So I guess, as we wrap, where can the audience find you or connect with you if they want to, justin.
Speaker 1:Yeah, I'm on LinkedIn or X or Bluesky. You can just search for Justin; last name is Ryburn, R-Y-B-U-R-N.