The Art of Network Engineering

What is IS-IS?

Andy and Friends


Most network engineers know BGP, OSPF, and maybe EIGRP, but far fewer have hands-on experience with IS-IS. In this episode of The Art of Network Engineering, Andy Lapteff sits down with Russ White and Mike Bushong for a deep, opinionated, and refreshingly honest discussion about routing protocol design in modern data centers.

We explore why BGP has become the default hammer for every networking nail, what we lose when we blend underlay and overlay into a single protocol, and why some of the largest networks in the world still rely on IS-IS for simplicity, scale, and resilience.

This isn’t a “which protocol is best” argument; it’s a design conversation. One about failure domains, operational reality, education gaps, and why many engineers never learn the protocols that quietly power hyperscale networks.

In this episode:

Why BGP is policy-rich but intentionally slow
The architectural value of separating underlay and overlay
How IS-IS works and why it’s simpler than you think
TLVs, scalability, and protocol evolution
Why familiarity often beats good design (for better or worse)
Where RIFT fits and where it doesn’t
The cost of losing deep protocol knowledge as engineers retire

If you’ve ever wondered why networks are designed the way they are, or if you’ve felt uneasy about “just using BGP everywhere,” this conversation is for you.

Subscribe for more conversations where technology meets the human side of IT.

This episode has been sponsored by Meter. 

Go to meter.com/aone to book a demo now! 

You can support the show at the link below.


Find everything AONE right here: https://linktr.ee/artofneteng

00:00
This is The Art of Network Engineering, where technology meets the human side of IT. Whether you're scaling networks, solving problems, or shaping your career, we've got the insights, stories, and tips to keep you ahead in the ever-evolving world of networking. Welcome to The Art of Network Engineering podcast. My name is Andy Lapteff, and in this episode I am joined by luminaries, kings of their field, if you will. We're going to start with Mr. Michael Bushong. Hi, Mike. Mike, you're missing your crown. I'm looking.

00:30
No, I'm sitting on it. The wheels have already fallen off. Hi, Mike. Have you said hi yet? Hey, Andy. Good to be here. Thank you for stepping in. We had somebody dip out and Mike jumped in, so thank you. Russ White, sir. How are you? It's great to see you. I am a year older.

00:52
Russ, I looked up the last time you were on the show and it was almost three years ago; it'll be three years in January. So that's been way too long. And I listened to the episode and I learned more stuff. So you're one of those people that I learn something from every time I talk to them. Anytime you want me on, you know how to find me. I'm always around doing something. Listen, pal, I'm still waiting for my Hedge invite, buddy. Podcast wars. That's doable too. You know, that's fine.

01:20
I've been trying to get Mike on the Hedge. It's always a pain to get him, you know, he's traveling so much. It's hard to get him scheduled. I love your show. I love listening to you communicate about networking. It's always a fantastic time. So this episode is going to be me kind of picking a fight with you about IS-IS. And I say that lovingly. Kevin Meyers, a friend of ours, we did a BGP episode and it blew up and people were in love with it. And I thought, wow, people really want to talk for an hour about a protocol. That's interesting to me.

01:49
So, but it was a good conversation and we were beating up on BGP. I think he quoted you as having said BGP is the trash can of the internet, or trash can of the data center, whatever it was, right? The internet, yes, it is. Right. We just throw BGP on everything. And so somehow in that conversation during the BGP episode, IS-IS came up, and I'll be honest with you, in all my years of, I mean, I have certifications from multiple vendors. I've, you know, I've managed the things and studied the things and I've never come across

02:18
IS-IS. So I think the framing here would be... Yeah, I know. So the framing here is it's a design discussion, I think. And really, why do we need IS-IS, right? I've never used it. And I've managed Fortune 50 global data centers. So it must not be right. I'm being provocative, right? Like, we don't need this thing. You know, change my mind. But I'd like to talk about routing protocols. Like, Mike and I were talking earlier, and we were, you know, we were thinking about some different scenarios, like our analogies, you know, like we don't think about

02:48
what the pipes in our house are made of, right? Like, I really don't care. I just want the water to run. And really, if we think about networking infrastructure, it's just plumbing, right? Highways are another analogy. But I don't know if I care what the pipes are made of. I don't know if I care what routing protocol you're running. They all kind of do the same thing. I mean, I've run RIP, EIGRP, OSPF, BGP, right? EVPN, VXLAN. To me, they all kind of do the same thing. Now, I'm provoking you because I know you're Mr. Design and you're going to tear me apart.

03:17
But it's okay. I don't know where we want to start. Do you want to start with why would you pick one routing protocol over another? Like, who cares which one you use? But I really want to jump into IS-IS because I know nothing about it. Okay. So I'll begin here. First of all, BGP, Tony Li got really mad at me one time when I said this, because I said it in his hearing. BGP to me is less of a pure routing protocol and more of a policy

03:46
distribution system that happens to do shortest, or happens to do loop-free paths sometimes. I mean, 99% of the time it does loop-free. I don't know if 99% is the correct amount of time, but BGP is slow, extremely slow, extremely policy-rich. If you ever throw, I don't know,

04:13
120,000 routes at BGP in a dense topology and try to pull the routes out, you will find that either it will not converge ever, unless you do the correct design work, or you will find that it converges very slowly, like minutes. But that's by design, correct? Because we don't want flaps on the internet to change all the things every minute. Well, it's because it's not a distance vector, it's a path vector protocol, and it's just,

04:40
as Radia would say, it's routing by rumor. Like, it takes a long time, because of the way it works and because it depends on TCP and it has a lot of overhead in dealing with all those 13, or whatever they are, steps. Now we keep adding to the steps in the best path. You say relies on TCP like that's a bad thing. Well, it means it's a bit slower because you're actually one step up, right? It's more, well, I won't even say that it's more reliable.
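The "steps" mentioned here are the tie-breakers BGP walks through to pick one best path per prefix. The following is a minimal Python sketch of a simplified subset of the commonly described ones. The `Path` class and its field names are hypothetical, real implementations have more (and vendor-specific) steps, and this sketch compares MED across all paths, whereas routers normally compare it only between paths from the same neighboring AS.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Path:
    """Hypothetical, simplified view of one BGP path to a prefix."""
    next_hop: str
    local_pref: int = 100
    as_path: Tuple[int, ...] = ()
    origin: int = 0            # 0 = IGP, 1 = EGP, 2 = incomplete
    med: int = 0
    is_ebgp: bool = True
    router_id: str = "0.0.0.0"


def router_id_key(router_id: str) -> Tuple[int, ...]:
    """Turn a dotted-quad router ID into a tuple so it compares numerically."""
    return tuple(int(part) for part in router_id.split("."))


def preference_key(path: Path) -> tuple:
    """Sort key: the smallest key is the most preferred path."""
    return (
        -path.local_pref,          # higher local preference wins
        len(path.as_path),         # then the shorter AS path
        path.origin,               # then the lower origin code
        path.med,                  # then the lower MED (simplified, see lead-in)
        0 if path.is_ebgp else 1,  # then eBGP over iBGP
        router_id_key(path.router_id),  # finally the lowest router ID
    )


def best_path(paths: List[Path]) -> Path:
    """Return the preferred path from a non-empty list of candidates."""
    if not paths:
        raise ValueError("no candidate paths")
    return min(paths, key=preference_key)
```

Even in this cut-down form, every candidate has to be run through an ordered chain of policy attributes before anything is installed, which is the overhead being described.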

05:09
Because it's really not more reliable than flooding in either OSPF or EIGRP or IS-IS. But it is more multi-hop reliable. Like, IS-IS doesn't do multi-hop. And you would really struggle to do multi-hop with IS-IS. How did it get in the data center? Because it was supposed to be for the internet. It wasn't supposed to be an IGP, right? Because Petr Lapukhov said, I need something that'll scale to 120,000, 200,000 routes. And I need something that'll do traffic engineering.

05:34
And at the time he came up with these requirements, number one, nobody believed that IS-IS or OSPF could do 120,000 or 200,000 routes. There's always been this thing that these IGPs can't scale. Now, you can't do 1.2 million routes, but you can do 120,000 routes in IS-IS or OSPF. Following what he, I mean, so at that point though, a bunch of people standardized on it, so it's been proven that it works. And so the question is, if

06:04
the network works, if the data center works, if it scales, if it ain't broke, like, does anybody care at this point? I know that you care, but, like, the average network architect, I think, might ask me why I care. I've asked you before, and so... but, like, because to me it's not just IS-IS, right? Like, there's the whole RIFT discussion and there's like a...

06:30
And the commercial challenge with all of that stuff for people who build product is that the set of people who care are, you know, infinitesimally small compared to the number of people who don't. Which raises a question, like if the average or even, you know, let's say above average person  doesn't see much difference or enough difference to go and  upend their architecture, you know, should we spend time on it at all? So a couple of things. The first is that

06:59
Most of the time when I sit down and spend time with people who are doing large scale fabrics or mid scale fabrics, if I explain to them my reasoning, most of them will go, oh, I agree. We probably should run an IGP separate from BGP. But then the second answer you'll get immediately is, but nobody in my NOC knows how to do anything but BGP. So therefore, no matter how much better I think the design might be, I don't really care because my NOC

07:29
can't support it. They can't support anything but BGP. Now, I have honestly very little sympathy for that answer, but that's me. Hold on. Is there anything more complex than BGP, with everything, when it's doing everything that it does? So I have two problems with using BGP for all of it. The first is we are twisting... This is why I say BGP is the garbage can of the Internet. We are totally... you know, BGP was designed so that you had to manually configure neighbors,

07:58
because I'm talking to a customer, I'm talking to another network that I don't know anything about and it's very policy driven. So I want to  be very intentional. I mean, even the way that BGP is set up  by default, which wasn't always this way, by the way, which is why there's a lot of implementations that don't do this,  but the way the RFC changed it so that you have to have policy configured to advertise something. Everything is meant to be intentional. It's meant to be slow. There you go, Andy.

08:28
not supposed to converge fast, supposed to converge slow, supposed to be very intentional. And then we throw it in a data center and we go, oh, but it needs to converge faster. So you know what we're going to do? BFD. We're going to change everything, everything about the way it works. We're going to configure it so it looks like fancy RIP. That's literally what we do. We're going to make it so the adjacent, the neighbors come up and you don't need to manually configure any peering sessions. Now at some point you're like,

08:58
This is a completely different protocol. Why am I building a completely new protocol on the same packet formats? Just because I can call it BGP. Is it because the people are familiar with it? Like, to Mike's point earlier, who cares? And you said the NOC knows BGP. Like, I came up on a NOC, and if you told us, who all knew BGP, now we all have to learn IS-IS and make it reliable... That's a lift, right? For people who are working 32 tickets a shift. Yeah.

09:22
It is. The beauty of splitting it, of having an IGP and an EGP, is exactly the way we used to design transit networks: I separate my infrastructure from my workload. And today in the data center using BGP, my infrastructure and my workload are blended together in a single table and people play games. Oh, I can make the infrastructure eBGP and the workload iBGP or the other way around. Various people do different things with this.

09:50
But to me, having them in separate protocols makes a lot more sense. Question: is that to constrain failure domains, as you said earlier? Yep. Exactly. That's why. That's one reason. Another, not just failure domains, but it's also security domains. It's also my attack surface is completely different. And one of the reasons I like IS-IS in this role is because IS-IS is not IP based. So you cannot, no, you cannot

10:19
send multi-hop IS-IS packets. Let's take your segue, because when I looked up IS-IS earlier with my new teacher, ChatGPT, the first thing I saw that exploded my brain was it's a layer two protocol, and I refuse to understand or like that fact, not that anybody cares what I like. How the hell is a layer three routing protocol operating at layer two? Please explain, and why. So originally the thinking was exactly the opposite. One of the big concerns about using

10:49
DANE, which you probably don't know what DANE is, but essentially when RPKI came out, one of the things I was working on when I was at VeriSign was: why are we building a separate database? We have this database called DNS. It's already distributed. It's fast enough. I can add a new DNS record, just like I have a CA or a cert or an HTTP or whatever. And I can just add a new one

11:15
that is the X.509 certificate for origin authentication and stick it right in DNS. So now the router can query DNS and it's going to get a cached response back with the correct origin stuff. And it's just going to work. And people are like, no, you can't do that because you can't have routing depend on DNS and DNS depend on routing. It took 13 minutes for me to get lost. So I did pretty well. That's good for me. Following Russ.

11:43
So what does this have to do with building it on layer two? We went to RPKI and DNS and crazy town. Okay. So when IS-IS was invented by Radia and Mike Shand, and... Radia helped invent... Yeah. With Mike. I should have known the person responsible for spanning tree would invent something as terrible as IS-IS. I love her. I'm teasing. It's even funnier than that because

12:11
she invented routing before she worked on... I shouldn't say she invented. She worked on routing before she did spanning tree. Yeah. Spanning tree, she didn't want to do it. Like, she does not like spanning tree. I know she doesn't like Ethernet. She doesn't like spanning tree. It's my favorite part about her. The thing she invented she calls garbage. She likes Ethernet, but still, part of the reason she did TRILL, she and I did TRILL many years ago, tried to do TRILL, was to get back to the original concept

12:41
of routing, even at layer two, don't do this spanning tree stuff. And she's not ever been a huge fan of that. So you were there, let's leverage that. Why was the decision to create another interior gateway protocol made? I guess OSPF wasn't doing something they needed it to do? Nope, it's the other way around. So IS-IS was designed first,

13:06
even really before spanning tree to some degree. And the reason it was layer two was because, again, they thought... first of all, IS-IS is not an IP protocol, right? It was designed for CLNP and CLNS. It was not designed for IP at all. I don't even know what that stuff is. That was just before IP, right? It was transport stuff before IP. No, it's parallel. It's parallel. It's at the same time, OSI stuff, all these ISO protocols. At that time, by the way, there was Banyan VINES,

13:34
which was a layer two protocol that ran on top of something called VIP, which was its own version of IP. And there was Novell NetWare, which ran on top of something called IPX, which was an analog to IP, but it was essentially a layer two. And so the thinking was you don't want a layer three protocol dependent on layer three. If you're going to carry layer three connectivity, it needs to be in a layer two protocol.

14:02
And so that was, I mean, that's part of the reason. It wasn't the first IGP, right? Like again, why did they create IS-IS? What was it for? So the first IGP was what was called the Hello Protocol. It's very similar to RIP, but without the hop count limit, and it crashed. It crashed bad. So they had a flag day and they replaced it with a link state protocol, not IS-IS, but a link state protocol. And that developed eventually into

14:31
the OSI, the ISO protocols and everything else that resulted in IS-IS. So then why did OSPF win? First of all, I don't know that OSPF has won, but that's another entire... Everybody knows it. You and Radia are the only two people, and three other people at hyperscalers, that know IS-IS. Well, I don't know. I mean, most of the large scale networks that I've ever touched, other than perhaps AT&T,

14:58
I think AT&T is probably the lone standout on OSPF, but Sprint has always been... But the question is the same. So, IS-IS kind of, so you create the separation, right? Separation of concerns, and then that gives you some measure of resiliency. So that's a positive. Presumably, you know, at some point, it probably didn't matter at the beginning, but you'll get different scaling characteristics as a result of that. So, okay. But then OSPF comes along. So why create OSPF

15:27
if IS-IS is the answer? And just so you know where the story arc is going: if you create OSPF because IS-IS isn't getting there, and then you use BGP because OSPF wasn't getting there, how do you end up back at IS-IS? Okay. So OSPF was created because way back in the day, most processors were eight-bit processors and they didn't like TLVs at all. Can you tell me what a TLV is?

15:56
Yeah, a type length vector or a type length value, depending on how you want to... So essentially, on the wire, I can tell you what I'm going to tell you as part of the packet. If you think about it as a grammar or dictionary, I can inline the dictionary or the grammar. I can say I'm going to tell you this word and I'm going to give you the context of how I want you to understand this word, inline.

16:25
Is it a header bit that notifies you what's coming? It's a header that tells you, right? Right. So think of Spanish or Greek. Greek in particular is a big one for this, more so, and the conjugations in those languages, right? Like in Greek, oh, you know, if it's a female or a male verb, why would you ever care about such a thing? Because you want to know what word in the sentence

16:55
the verb applies to or the adjective or the adverb applies to. And you do that with gender, gender differentiation and other ways of doing it. You don't do it by word order. In English, we do everything in word order. Where's my subject? It's before my verb. Where's my object? It's after my verb. Greek doesn't work that way. Hebrew doesn't work that way. A lot of languages don't work that way. You actually tell what the object is by the ending on the word tells you.

17:25
This is the subject. It could be the last sentence, the last word in the sentence. It could be the first. It doesn't matter where the position is. So this is the way a TLV works. I don't care where it's positioned in the packet. I'm telling you by conjugating it effectively, by telling you this is your metric. Your metric is going to be six bits wide. And it doesn't matter where I put the metric in the packet, because everyone knows TLV number one, whatever it is, 133, 132, that's going to be the metric.
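To make the TLV idea above concrete, here is a minimal Python sketch of a parser for a buffer of type-length-value records with a one-byte type and a one-byte length, which is the general shape IS-IS uses. The type codes `TYPE_HOSTNAME` and `TYPE_METRIC` and the helper names are hypothetical, chosen only for this illustration; they are not real IS-IS TLV numbers.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical type codes, used only for this illustration.
TYPE_HOSTNAME = 1
TYPE_METRIC = 2


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Build one TLV: one byte of type, one byte of length, then the value."""
    if len(value) > 255:
        raise ValueError("value too long for a one-byte length field")
    return bytes([tlv_type, len(value)]) + value


def iter_tlvs(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, value) pairs in whatever order they appear."""
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = data[offset], data[offset + 1]
        start, end = offset + 2, offset + 2 + length
        if end > len(data):
            raise ValueError("truncated TLV value")
        yield tlv_type, data[start:end]
        offset = end


def decode(data: bytes) -> Dict[str, object]:
    """Pick out the TLVs this parser understands and skip the rest."""
    fields: Dict[str, object] = {}
    for tlv_type, value in iter_tlvs(data):
        if tlv_type == TYPE_HOSTNAME:
            fields["hostname"] = value.decode("ascii")
        elif tlv_type == TYPE_METRIC:
            fields["metric"] = int.from_bytes(value, "big")
        # Unknown types are stepped over using their length field.
    return fields


# Same information, different order, plus one TLV the parser does not know.
packet_a = encode_tlv(TYPE_METRIC, b"\x0a") + encode_tlv(TYPE_HOSTNAME, b"r1")
packet_b = (encode_tlv(TYPE_HOSTNAME, b"r1")
            + encode_tlv(99, b"\xff\xff")
            + encode_tlv(TYPE_METRIC, b"\x0a"))
print(decode(packet_a) == decode(packet_b))  # True
```

Because each record announces its own type and length, position in the packet does not matter, and a receiver that has never heard of type 99 can still step over it and read everything else.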

17:55
So this is great from a flexibility perspective. And this we'll get into, right? Maybe perhaps we'll have time to talk about OSPF versus IS-IS and their flexibility as protocols. But the guys who designed OSPF had two goals. Number one was strip the TLVs because they're bits on the wire. I don't want the bits on the wire. I want the protocol. I mean, look, when I was in the Air Force, we did an entire

18:24
white paper, taking a sniffer and measuring the efficiency of sending an X megabyte file, 10, 20 megabytes, because at that time we weren't doing gig files, over VINES versus IPX with NetWare versus IP versus OSI. We actually did a white paper where we measured which one put more data on the wire.

18:50
Because we were running one and two and three meg per second networks and we were running inverse multiplexers on T1s. So chatty protocols weren't exactly what... a fractional T1 at 64K. And that mattered, where now we're running multi-gig. Who cares? Right. But at the time, IS-IS was considered a much chattier protocol. They wanted to make a less chatty protocol. So that was one reason.

19:16
For bandwidth constraints and also, like you said, the CPUs that were in use couldn't handle all the... That's correct. Yeah. So is your argument that that's all marginal now? So then the constraints... Well, way marginal at this point. And what's happened over time is if I want to support IPv6 in OSPF, I have to build a totally new protocol.

19:41
The whole protocol is built around the V4 address space. All of my fields and all of my packets are fixed length and set to 32 octets or whatever. So BGP could add an address family, and in IS-IS you could do a TLV, but in OSPF you have to refactor the whole... Yes, exactly. So what's happened in the long run is OSPF has become a very complex protocol. I mean, you have however many different types, you know, packet types and everything else. And now they've added TLVs to it.

20:10
Whereas IS-IS, I mean, the biggest complexity with IS-IS as far as packet format goes is, oh, we messed up and put in six octet or six bit metrics. We need bigger metrics than that. Okay. So we'll create a new TLV set to do these new metrics. So I'm a network operator and I want to add a new TLV type, let's say to support V6 or something cool that's new, RoCEv2, whatever it is. Is that an upgrade of the code? Can I just...
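A little arithmetic shows why a six-bit metric became a problem. The sketch below assumes a hypothetical cost scheme (a reference bandwidth divided by link speed) and an assumed 1 Tbps reference; neither comes from the conversation. The 24-bit width used for comparison is the link-metric size carried by the later wide-metric TLVs, which is also not stated in the episode.

```python
def max_metric(bits: int) -> int:
    """Largest value an unsigned field of the given width can hold."""
    return (1 << bits) - 1


def bandwidth_metric(reference_bps: int, link_bps: int) -> int:
    """Hypothetical cost scheme: reference bandwidth divided by link speed."""
    return max(1, reference_bps // link_bps)


def fit_metric(desired: int, bits: int) -> int:
    """Clamp a desired metric into the range a field of this width allows."""
    return max(1, min(desired, max_metric(bits)))


REFERENCE_BPS = 10**12  # assumed 1 Tbps reference, chosen for the example
LINK_SPEEDS_BPS = {"1G": 10**9, "10G": 10**10, "100G": 10**11}

narrow = {name: fit_metric(bandwidth_metric(REFERENCE_BPS, bps), 6)
          for name, bps in LINK_SPEEDS_BPS.items()}
wide = {name: fit_metric(bandwidth_metric(REFERENCE_BPS, bps), 24)
        for name, bps in LINK_SPEEDS_BPS.items()}

print(narrow)  # {'1G': 63, '10G': 63, '100G': 10}
print(wide)    # {'1G': 1000, '10G': 100, '100G': 10}
```

With six bits the ceiling is 63, so under this scheme a 1G and a 10G link end up with the same cost. A wider field in a new TLV type restores the distinction without touching the rest of the packet format.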

20:38
How do you upgrade a TLV? Does it come with the next version or is it a command? It's a version of the code. I mean, you would have to define a new TLV type, get it standardized or run it as an experimental or an opaque or whatever you want to do. But who drives that? Is that IETF? Where would the new TLV... yeah. Okay. That's in the LSR working group. Correct. If the protocols, so let's say they evolve to overcome limitations and kind of the load that

21:07
that those limitations bring. And then you end up removing the limitations. If you've already moved on to whatever the new protocol is, is there still justification to then make another change? And let me give you kind of a weird example. Cloud comes out. Every board in the world gives their CEO a cloud directive. You must go to cloud. How are you getting to cloud? Companies all over the world perform the great lift and shift. They spend a bunch of money.

21:35
They rewrite, they refactor their applications or rewrite them entirely. Now their app, instead of running on-prem, runs in the cloud. There's no difference in what the app does. Maybe it's marginally less expensive in the short term. Then cloud proves out to be, you know, not quite the economic benefit that everybody thought it would have. Do they move their apps back? Like, they don't, they just leave them where they are because you're like, I've already incurred the cost once, the benefit is marginal. I'm not going to go back.

22:05
Yeah, this would depend on how marginal the benefit is, right? So do you think that the benefits are large enough? I mean, like, because you are like an IS-IS zealot, right? Like, you are. I wouldn't say... yeah. OK, go ahead. You look at, maybe not a zealot, right? You're like, he's a zealot. I mean, you write on your sneakers, I heart IS-IS. You're practicing, like, your last name, like I'm going to be Mr.

22:33
You have a love affair with IS-IS. So do you think that the benefit is there? And then I'm going to set you up, like, this is a trap. So just be aware you're about to step into a trap. I've got a follow-up question once you answer this one. So IS-IS was almost dead. Honestly, almost no one was using it. It was very, very, very few people, but several things happened generally around the age of the explosion of the dot-com bubble

23:03
and the growth of transit networks, Sprint and SprintLink in particular, Peter Lothberg and his work in particular. First of all, the original implementation of OSPF in Cisco IOS, classic, old classic Cisco IOS, was marginal. It was okay. It was fine. I don't want to put anybody down or anything, but it was not the most optimal

23:31
implementation as far as shortest path first, flooding, all the components of it. They were there. There was literally a thing on the Cisco online website that Don Slice and I had to go and spend six months proving was incorrect to get it taken off the Cisco online website, which was you should never have an OSPF flooding domain or an area larger than 40 routers. We had so many problems in TAC and global escalation

24:00
with large, with flooding domains larger than 40 routers in our OSPF implementation, they literally put it in the documentation. No flooding domains over 40 routers, no areas, like everything beyond this company, we distributed blah, blah, blah, all this other stuff. Is that a scale constraint of OSPF, just full stop? It's a scale constraint of that particular implementation on that particular hardware. It's not the protocol itself, right?

24:27
So, I mean, be very careful here. The distinction between implementation and... and the reason they had to remove it was because they took an implementation detail, and because they had so much weight in the industry in terms of what was accepted, a post designed to reduce TAC calls turns into a pseudo-standard. That's exactly what happened. That's right. And so during this time, Sprint

24:55
And all these other large providers, these transit providers, they were throwing a thousand, two thousand routers, five thousand routers in their networks. Okay, you design an OSPF network with a thousand routers, with a 40 router limit in your flooding domain. Okay, good luck with that, because that ain't going to happen. You just hit practical reality. So they just tried ISIS. Well, it just happens to have turned out that Tony Lee wrote the original ISIS implementation.

25:24
And there were some very good coders working on it, Henk Smit and some other people. So the IS-IS implementation was beyond excellent. It was such a good implementation. Did the IS-IS documentation lead with that? Like, this is better for scale. No, I'm just... is that an accident? Tony's such and such, like how did they know to do that? They just tried it. They just tried it. Let me jump in with a question. So do you like IS-IS

25:49
because it's a better protocol or do you like it because it was implemented? I like it because I think it's a simpler protocol. Okay. So simpler is doing a lot of work for you there. Yes. Unpack simpler. It's rare networking, right? Yeah. So hang on. basically I figured out we can do five to 6,000 nodes in a single ISIS flooding domain. A single ISIS. Like you don't even need

26:15
flooding domains in IS-IS necessarily; it depends on your route count and your processors. It's like an area, right? Is that...? Yeah, it's an area. It's the same thing. Can you connect IS-IS areas? Yeah. Yeah. Yeah. It's exactly like OSPF. You have an ABR, blah, blah. There are some critical differences, by the way, that again, I prefer the IS-IS implementation for. So IS-IS is a simpler protocol in that I have TLVs. I don't have all these types. I'm not going, oh,

26:45
it's a type five and it hits an ABR and the type five generates an E-bit and a type four with a type whatever it is with this and that. Types and the LSP and the LSPF create a... I don't have any of that stuff, right? But aren't TLVs just a different route type? No. Isn't it the same thing? No. Everything is also backwards in IS-IS from OSPF in some ways. So for instance, in OSPF,

27:14
when I hit an ABR, my default is I create a type two, which essentially says everything that I get in my type one... yeah, it's not type two, it's type three. See, this is why it gets, like... it's a summary LSA. So I create the summary LSA and I say everything that's a network LSA or a link LSA or a router LSA gets summarized. And the way it gets summarized is I take the metric and I attach them to the type

27:44
three, the summary as if all those things were connected to me. Right. That's what I do. Which seems like a good idea because you're summarizing routes, which you're doing some routing table, right? Like, I mean, this is all good. Right.  And so I do it in both directions. Area zero into my outlying area, outlying area into area zero. And if I want to only have a quad zero in my outlying area, I've got to configure all that. I got to make it a totally stubby. I got to think through, am I going to redistribute? Does it need to be a not so totally stubby? Does it need to be a

28:13
totally stubby, does it need, like there's all these different things I got to think about. In IS-IS, you put the same intermediate system in both flooding domains with NET statements and it's automatically a totally stubby area. In fact, it's a totally not so stubby area. It only sends a default from the level two domain into the level one domain. And then you summarize just like you do in OSPF from the level one domain into the level two.

28:41
So your outlying area into your area zero. Like, everything is backwards. I have to intentionally leak the routes. It's embedded in the logic of the protocol instead of in turning the knobs. I just do it and boom, I'm in a totally not so stubby area. Like, the most complex thing I can configure in OSPF is the default. So the configuration is simpler. It's less complex. It's not even just simpler. The protocol itself is simpler in that way to me.
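The default behavior described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any implementation is written: the function names, router names, and costs are hypothetical, and the model is reduced to "a level one router knows its own area's prefixes and installs a default toward the nearest level one/level two router".

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

Network = ipaddress.IPv4Network


def build_l1_table(intra_area: Dict[Network, str],
                   l1l2_routers: List[Tuple[str, int]]) -> Dict[Network, str]:
    """Level one view: area-local prefixes plus a default to the closest exit.

    l1l2_routers is a list of (router name, cost to reach it) pairs.
    """
    table = dict(intra_area)
    if l1l2_routers:
        nearest_name, _cost = min(l1l2_routers, key=lambda entry: entry[1])
        table[ipaddress.ip_network("0.0.0.0/0")] = nearest_name
    return table


def lookup(table: Dict[Network, str], destination: str) -> Optional[str]:
    """Longest-prefix match; returns the next hop, or None if nothing matches."""
    address = ipaddress.ip_address(destination)
    matches = [net for net in table if address in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


# Hypothetical area: two local prefixes and two level one/level two routers.
area_routes = {
    ipaddress.ip_network("10.1.1.0/24"): "r2",
    ipaddress.ip_network("10.1.2.0/24"): "r3",
}
table = build_l1_table(area_routes, [("l1l2-a", 20), ("l1l2-b", 10)])

print(lookup(table, "10.1.2.7"))   # r3
print(lookup(table, "192.0.2.1"))  # l1l2-b
```

The level one table stays at "my area plus a default" with nothing configured to make it so; carrying more specific routes into the area is the step that takes deliberate leaking.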

29:10
I was going to do more like on its own. Then if somebody was like, if you had your way... or, you teach classes, you build out curriculum, you educate people on how to do architecture, you're very committed to having people understand the why. Because the recipe model of building networks leads to

29:33
misunderstanding and complete architectural abdication. And then you end up with things where Amazon, you know, Amazon West goes down. Yep. And an entire company is unreachable because they didn't... They basically said we don't own the architecture. That's Amazon. Yeah. If you were building, let's say you were educating the world today. So let's say that magically we make networking cool again. And IS-IS wins. And then there's

30:01
50,000 network engineers that enter the  workplace over the next couple of years, and you're going to be training them all. What do you tell them? Like what protocol you're like, this is the architecture you should go with. What do you tell them? I would always do an ISIS underlay with a BGP overlay personally. And I'm not going to say that you shouldn't ever use OSPF. That's not, I mean, that's okay if you want to use it. It's just not my preferred way of going about it. It's like using BGP alone for all of your data center fabric. You can do it. I know how to design it. I've done it many times.

30:32
Is it my ideal? No, it's not really my ideal because I know where the holes are. But it's your position that if people were starting from scratch, learning everything from, from scratch, they would find ISIS simpler to deploy and then sort of almost, you know, by definition, simpler to learn. Yes. So when I used to teach OSPF and Cisco TAC, and then I would teach ISIS because they insisted I do OSPF first. Okay. So here's your trap then. You're in for the trap? Uh oh. That's right.

31:02
coming. I warned you ahead of time that I was walking you into a trap. So I would always teach OSPF first. And then when we got to the end of it, people would say, aren't you gonna teach us IS-IS? And I would say just forget half of everything you know about OSPF and you already know IS-IS. Who are these weirdos who ask for IS-IS? Well, I guess. So Russ, you're on the witness stand. And so you are obligated by podcast law to speak the truth, the whole truth and nothing but the truth.

31:31
If you are indeed in a long-term relationship with ISIS, like, what's going on with your side piece, Rift?

31:45
Okay. So I used to say this all the time. RIP is not a routing protocol. It's what you put on your gravestone. No, Rift, Rift, not RIP. Oh, Rift. Oh, I thought you said RIP. Oh no, no, no, no. Yeah. And actually RIP is a perfectly fine protocol in very small networks, but I never deployed it at scale, but Rift. Okay. So Rift is interesting. I like Tony P a lot.

32:09
And we, I'm actually on the Rift drafts, I suppose still. I don't know. Maybe my name is still there. I've looked at them. Humble brag. Huh? Humble brag. I say this to say that I don't hate Rift. It's not like I'm down on Rift. I helped design it. I'm looking it up in real time. I have no idea what it is. Routing in Fat Trees. But if ISIS was objectively better and simpler, why, like why look outside the marriage?

32:39
So Rift is essentially an attempt to make it where you can fire and forget at extremely large scale, because Tony P and I have a bit of a, maybe a small disagreement over the scale to which you can push IS-IS with modifications. Because if you go look up distoptflood, which Mike, you know all about distoptflood. I think you can push IS-IS to

33:08
5,000 routers with distoptflood in a dense environment and have a lot of routes in it and it's fine. So the biggest problem with doing that is that your top of rack switches, which are your cheap switches most of the time in most designs, you have very cheap boxes up there, have very small forwarding table sizes. So if I have to throw 120,000 routes at this thing or 200,000 routes at this thing, those poor edge switches are going to run out of gas really fast.

33:36
And not only that, they share their table size with their filtering tables. So Tony's perspective is that in order to save that memory and make those small boxes viable in a large fabric, you build a protocol that sends nothing but a default down from the spine towards the top of rack. So you reduce the table size. So he basically took IS-IS, he took distoptflood plus IS-IS, and then he said, well, you know what?

34:05
Is there a way for me to only send a default to each one of the top of racks? And so he figured out that he could do basically a distance vector from the spine to the leaf and a link state from the leaf to the spine and make an incredibly scalable fire and forget routing protocol. And it gives you all sorts of interesting things like it does. So distoptflood originally, many years ago,

34:34
had the ability to calculate which stage I'm in. I could actually figure out from the router's perspective, I could say, oh, I am a top of rack. Oh, I am in the spine. Oh, I am in the fabric. I am up at the plane layer. I could actually figure that out. It's not actually that easy. It's mathematically impossible, as Ivan Pepelnjak read my draft and he went, what you're doing doesn't work. And I'm like, uh-oh, that's a problem. So I spent some time with Ivan.

35:04
Yeah, how to make it work. Thank you. This is all right. Correct. No, no, no. It was before this, it was just distoptflood. So that's what I was getting at. You mentioned that a few times. I don't know what distoptflood is. It's a flooding optimization draft for ISIS and OSPF. Dist opt. We were saying dist, distributed optimum flooding. Thank you. Yeah. And so, um, he took all of that work and he added it to his

35:33
distance vector plus link state and came up with Rift. Rift has a lot of really cool properties. My concern with Rift, I suppose, is, I'm not really sure we need it all the time. Like I'm much more of a just use what's already there. Why am I, like you said, Mike, right? Like it's there. Everybody's implemented ISIS. Well, it's there for scale, I think, right? Once you get over that 5,000 ISIS unspoken limit. Okay. Well.
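As a rough illustration of why sending only a default toward the top of rack saves table space, the sketch below compares a leaf's table in a two-tier fabric when every loopback is flooded everywhere against a leaf that holds nothing but a default pointing at its spines. This is not Rift's actual algorithm; the fabric size, the router names, and both helper functions are made up for the example.

```python
# Illustrative only: a two-tier leaf/spine fabric with invented names.
# Neither function implements Rift or IS-IS; they just count table entries.

def flat_link_state_table(me, leaves, spines):
    """Leaf table when every router's loopback is known everywhere."""
    table = {}
    for spine in spines:
        table[spine] = [spine]            # spines are directly attached
    for leaf in leaves:
        if leaf != me:
            table[leaf] = list(spines)    # other leaves via any spine (ECMP)
    return table

def default_only_table(spines):
    """Leaf table when the spines send nothing but a default route down."""
    return {"default": list(spines)}

spines = [f"spine{i}" for i in range(4)]
leaves = [f"leaf{i}" for i in range(64)]

full = flat_link_state_table("leaf0", leaves, spines)
slim = default_only_table(spines)
print(len(full), "entries with flat link state")
print(len(slim), "entry with a default only")
```

The gap widens as the fabric grows, which is the memory argument made for small top-of-rack boxes; the trade-off is that the leaf no longer sees the topology above it.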

35:59
If you're over the 5,000 router limit, you have other problems. But that's your whole bread and butter, right? Don't you specialize in hyperscale with the bajillion routers? I thought that was kind of your thing. If you have a single fabric that's large, and I'm saying that in a single flooding domain. It shouldn't get any... Yeah. A single flooding domain shouldn't get that big, honestly. When you talked about how big those route tables in the top-of-rack switches were, I got a little confused. Like, why are they that big? And then I thought, I assume maybe that's just hyperscale world. Yeah. Right. If you start thinking about...

36:28
If I have a 2,500 router, five stage butterfly, and you start thinking about if I have 120,000 edge ports, all of a sudden I can have 120,000 routes, right? Now, theoretically I shouldn't in ISIS because what I should really carry in my underlay is my loopbacks. Just so BGP can build the BGP sessions and I can build tunnel tails and heads. I should never ever carry workload routes.
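The arithmetic behind that point can be sketched in a few lines. The router and edge-port counts are the ones quoted in the conversation; the counting model (one loopback per router, one route per edge port) and the function name are simplifying assumptions for illustration.

```python
# Figures quoted in the conversation; the counting model is an assumption.
ROUTERS = 2_500
EDGE_PORTS = 120_000

def underlay_routes(routers, edge_ports, carry_workload_routes):
    """Count the prefixes a single flooding domain would have to carry."""
    loopbacks = routers                                   # one loopback per router
    workload = edge_ports if carry_workload_routes else 0  # one route per edge port
    return loopbacks + workload

loopbacks_only = underlay_routes(ROUTERS, EDGE_PORTS, carry_workload_routes=False)
everything = underlay_routes(ROUTERS, EDGE_PORTS, carry_workload_routes=True)

print("loopbacks only:", loopbacks_only)
print("loopbacks plus workload routes:", everything)
```

Under these assumptions the underlay carries 2,500 prefixes when it holds only loopbacks, against 122,500 when workload routes leak into it.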

36:56
My edge routes should not exist. Any of my workload ports should not exist in my underlay. So you don't think we need Rift because your network should never get bigger than a 5,000-router ISIS. I wouldn't say we don't need it. I would say it fits a, it's, it fits a narrower slice. Very small. Number of environments that are going to operate at that scale. And Tony, by the way, will disagree with me. He'll say it's the opposite. And that's cool. That's fine.

37:21
Yeah, but who's right is always an ongoing discussion. He wouldn't say that it's widely applicable in all. He would say that the protocol can work at subscale, but he wouldn't say that everyone's operating at that same scale and that the same scale requirements have proliferated broadly. Yeah, he really targeted Rift at the 2500, 3000 router fabric is what he really targeted it at. If you're down at a hundred routers or even up to about a thousand routers,

37:50
I think he would say, just use ISIS and be done with it. Like this is just not... The only other thing that was kind of cool about Rift is the ability to deploy EVPN, kind of a lightweight cheating version of EVPN. And you don't even need BGP. There is a way to do that with Rift. Blasphemy. Real EVGP. How many people in the world, I mean, ballpark, obviously not specific, like know enough to express like a

38:19
strong opinion about the protocol choices in the underlay. Because you have this discussion, right? But if you go to even like very large networking players, like the number of people that can have that conversation and actually have an opinion. A lot of people can recap, you know, here's how a thing works because they've read about it. But to know it well enough to say, and here's why I agree or disagree, and then I'm willing to debate on it. How many people can do that? It's not very many.

38:49
And I find that unfortunate. I find that a failure in our network engineering training world myself. Do you think it gets better or worse over time? I think it's gotten worse over time. But like from here, like at some point, do you think, like, the protocols become sexy again? No, I don't think so. I don't think so. I think we are moving out of that realm. Um, I think we're going to move. Now, see the thing I like about running ISIS as an underlay is you actually don't have to know much other than just basic troubleshooting.

39:19
It's literally a fire and forget protocol. Like you turn it on, you tell it to run on everything except your workload ports, which is probably, that and figuring out the NET addresses are the two most complex things about running IS-IS. And you just let it go and it just does stuff. And all of a sudden you have IP reachability, V6 and V4 for all your loopback addresses in all your routing tables on the whole fabric.
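Working out NET addresses is often mechanical. One common convention, assumed here and not something the speaker says he uses, derives the system ID from the router's IPv4 loopback by zero-padding each octet to three digits and regrouping the twelve digits into three blocks of four. The area value `49.0001` and the function name are illustrative.

```python
import ipaddress

def net_from_loopback(loopback: str, area: str = "49.0001") -> str:
    """Build an IS-IS NET from an IPv4 loopback using the zero-pad convention."""
    octets = ipaddress.IPv4Address(loopback).packed        # four bytes
    digits = "".join(f"{octet:03d}" for octet in octets)   # twelve decimal digits
    system_id = ".".join(digits[i:i + 4] for i in range(0, 12, 4))
    return f"{area}.{system_id}.00"                        # NSEL is always 00

print(net_from_loopback("10.0.0.1"))      # 49.0001.0100.0000.0001.00
print(net_from_loopback("192.168.1.10"))  # 49.0001.1921.6800.1010.00
```

Because the loopback is already unique per router, the resulting system ID is unique too, which is what lets the NET be generated instead of hand-assigned.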

39:47
It just works. Like, it's not very difficult to do. It's actually really simple. I'm building, I probably shouldn't talk about this very much, but I'm building a test topology I've been trying to play with. And the IS-IS configuration in this little, I'll say more than a thousand router test topology I'm trying to build is eight lines of code and one line of configuration for every interface. I mean, seriously, there's nothing. My BGP configuration is 20, 30, 40 lines of configuration on every box.
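To give a feel for how small such a configuration is, here is a sketch that renders an FRRouting-style IS-IS stanza: a named instance, a NET, and a line or two per interface. It is not the speaker's actual lab configuration; the instance name `labnet`, the NET, the interface names, and the choice of level-2-only are all assumptions.

```python
def isis_config(name, net, interfaces):
    """Render a minimal FRRouting-style IS-IS configuration as text."""
    lines = [
        f"router isis {name}",
        f" net {net}",
        " is-type level-2-only",   # assumption: a single-level fabric
        "!",
    ]
    for ifname in interfaces:
        lines += [
            f"interface {ifname}",
            f" ip router isis {name}",     # enable IPv4 on this interface
            f" ipv6 router isis {name}",   # enable IPv6 on this interface
            "!",
        ]
    return "\n".join(lines) + "\n"

config = isis_config("labnet", "49.0001.0100.0000.0001.00", ["eth1", "eth2"])
print(config)
```

The per-router part stays the same size no matter how many neighbors there are; only the interface stanzas grow.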

40:17
80 lines. IS-IS, it's literally like: router isis labnet or whatever you want to call it. You have to give it a name, a NET address and... Oh, that's it. And then on each interface you say run IS-IS. Okay, I'm done. And all of a sudden you have a routing table. Like it's really pretty simple. It's funny, Russ. I'm thinking of how much pride I used to have in my complex BGP configs

40:46
with my prefix lists and my complicated route maps and you know, all kinds of other stuff we're doing and BFD to make it faster. And we better put UDLD on for one. Like, it was just, it was so complicated and I was so proud of it because look, look at what we've done. Like, Oh yeah. There's a lot of that in network engineering. There's a lot of it. And yeah, it's cultural, I think. Right. And I think it starts. So for me, and where, where I'm going with this is

41:10
My peers and I have never come across ISIS in any of the vendor trainings that we all consume to grow our careers. I don't understand that gap. I don't understand why I've spent a bajillion hours on EIGRP, BGP and OSPF. And hearing what I'm hearing from you about ISIS, I don't know why our educational system is doing us that disservice, if everything you've said is true, which I believe it to be. I think people are scared of it to some degree. They've heard really,

41:38
Oh, the NET address is so long. Oh, it's so complex. It's actually a very simple protocol. In fact, the training is to the point that like, who has it? Somebody and I wrote, I should remember all these things, right? A book on IS-IS, which Mike Shand wrote the intro for, one of the people who invented it. And that is the book that most vendors buy when they want to train somebody on IS-IS, on the protocol. And it's like,

42:07
15 years old, 10, 15 years old now. Nobody ever writes anything about it. It's completely... I think it's the story you told about like the Cisco, the OSPF, you know, break point with 40 routers. Like to me that's an interesting symptom of the actual root cause. The number of people who have enough confidence in almost any of the really deep networking topics, it's declined over the years. And as it declines, then you end up with

42:35
you know, anecdotes and stories and fears and whatever, they travel further and are more durable than truth. And then it will get worse because I think, you know, we are in a bit of an attention span economy. I know Russ has talked about this a lot in other forums. If people are no longer willing to invest the time required to learn the thing, then they'll never develop the confidence or conviction to do anything different than what is done. If that's the case, right? Are the operational concerns, do they outweigh

43:04
architectural benefits and even the operational benefits, by the way. It's an interesting question. And Andy, I mean, when you and I were talking earlier, I'll share the same analogy I shared with you earlier. Like, you know, my wife and I, we, and I always say we because I don't want her to feel bad. It was her though. But we bought a new car, you know, I don't know, five years ago, or whatever it was, we got a Subaru Ascent. And, you know, I'm not a car guy. I don't know anything about cars.

43:34
We got this car. It was nice to drive or whatever. Turns out  it was the first year of this new  like powertrain, like new transmission, new whatever.  And then, you know, does that matter? Like, I don't know much about cars as long as you know, to me, that's plumbing, right? As long as it works. I just want the outcome of being in a car like that. What the outcome of getting from point A to point B. So I didn't care about the details. And then you know what happened? The car broke down when my wife was driving with our two kids,  toddlers at the time.

44:04
and three dogs and it broke down on the freeway and I wasn't there. And then good luck getting a tow and then getting an Uber to pick you up on the freeway to pick up your two kids and your three dogs. That's impossible. So they were stranded for hours until they found an Enterprise rental person who was willing to come pick them up in a car large enough to handle everything. But that was like a crazy ordeal and trying to juggle like with the tow driver or whatever. You know, that's crazy stressful.

44:34
And so we're in the process now of evaluating a new car and you want to know what questions I'm asking about? Like, tell me about the transmission. It turns out the plumbing does matter. You just have to know that it matters. And then once you find out, once it breaks, like it can be catastrophic. And so for the people who don't peek under the hood, the people who don't take the time to understand, I think there's some value in abstracting out the detail. I'm not suggesting you have to go down or that everyone in the company should go down to that level of detail.

45:02
But I do think there's an argument to be made because, and then here's my thrilling conclusion from my analogy. The reason electric cars are interesting, you know, partially, yeah, there's a power or a climate impact or whatever, but like it just has fewer parts. And if you don't care what's under the hood, then you miss some of the actual value that comes out of that. You know, ISIS to me is the equivalent of an electric engine, right? It has fewer parts. It has fewer parts. That's exactly right. And there's a couple of things there.

45:31
First of all, we don't seem to think about the fact that most of the major failures we've seen in large networks recently have not been protocol failures. They've been interaction failures. Correct. Interactions between protocols and something like DNS. DNS in particular is very problematic. And again, like BGP, we throw trash at DNS. Oh, it's DNS, it's distributed. We can do anything we want to with that. Yeah. Till it breaks. Then all of a sudden

46:01
It don't work no more. And now it's a big problem and it takes a long time to reconverge. And the other thing I'll tell you about the decline, you talk about the decline of knowledge. Look, in my world, when I was coming up in the IETF, I spent 20 years in the IETF and I don't know if I'll be doing a lot more of it, but, and in the NOGs and at Cisco Live and everything, I don't believe, and you're going to laugh, Andy and Mike or whatever, but

46:25
I am not the world's best expert at any of these protocols. I'm actually pretty stinky compared to a lot of people that I know. And it saddens me that that is not only true, but that now most of those people have retired and they are gone. Like there are very few people. I hit a problem. Somebody emailed me the other day, like a week ago and said, I have this problem. I'm implementing this RFC and it's IS-IS and it's, um,

46:54
encryption for IS-IS on the wire. And I don't, you know, you were a coauthor on the draft and this is, you know, I'm struggling with this little bit of it. So I read it and I'm like, yeah, that wasn't my part of the draft. So I thought about it and I asked somebody I know who's implemented it. And then I thought, the person I asked is semi-retired. Who am I ever going to ask once they're retired? Who do you go to next?

47:24
And it's true for every protocol in the world. I mean, BGP is the only one where I can think of people who are under 25 or under 30 or under 40 who are as steeped in it as anything I would know or knowing more than I do. Do you think we'll go back to big closed systems, the big black box things? Oh, I hope not. Oh, I hope not. I know that's probably where we're trending. That's where cloud trends to a large degree.

47:52
But I really hope not because  I don't think it's good for the global internet. I don't think it's good for network engineering. I don't think it's good for companies personally. Like I don't know how you.  Yeah, I really, I hope not. I hope the open source, the open standards and open source vision can  weather whatever this storm is we're seeing and come back to life  or be serious. I don't, I don't know how to make it, but that's kind of my.

48:20
my belief. There's a lot of tribal knowledge that dies off as people retire and go off into the sunset. I think it's incumbent upon those of us left here. You know, my little hope is that conversations like this might poke at, you know, some curiosity of folks and have them dig in. That's what happens to me. So as we're wrapping here, I'm looking up what network operating systems support ISIS that I can try to lab this with. Because what I'd like to do, Russ, because I have access to you and why wouldn't I?

48:47
I'd like to go lab this, fire it up because I've never done that and see what happens. Maybe we could talk about it. I have some labs in Containerlab that I can send you, some small labs that run ISIS. I actually don't have any OSPF labs in Containerlab. They're all ISIS and BGP. Again, not because I dislike OSPF, but I just find ISIS easier to configure. But this past year I've leaned into

49:12
Linux and containerized NOSs, and SR Linux being one of them, I see that it supports ISIS and that'll probably be, just you know, based on where I work and what my interests are right now, what I'll... Yeah, so most of my labs are in FRRouting for a very specific reason, because I give them out for training. Yeah. And I built a lot of them before I started working for Nokia, so you know, it's kind of a thing. I will probably rebuild some of them in SR Linux. The other thing is, I don't know, Mike might not like me saying this, but

49:41
One thing about FRRouting is that unlike almost any other protocol stack, I can shut off the protocols I don't care about. So it becomes very, very lean for a lab environment and building large topologies in the lab. It just is the way it works. Most of my labs, I only run IS-IS and BGP. I don't run gRPC. I don't run YANG models. All of that is shut off. And not because I don't like them, not because, you know.

50:09
It's fine. RIP is there. EIGRP is, I just turn them off and I turn off PIM. I mean, I turn off everything. And why? Because I can get the image down so small that I can push a lot of routers very quickly. It improves my lab speed, which has nothing to do with the quality of SRL versus FRRouting versus blah, blah, blah, blah, blah. I don't really care. I'm just trying to build labs. Right? When you turn them off, how do you know they're not still chatting?

50:39
And the reason I'm mentioning this so late in this episode, my friend Lexi Cooper discovered with her oscilloscope that on most vendor NOSs, when you turn off auto-negotiation, it's still sending auto-neg link pulses. And I thought, wow, you can turn off a protocol and it can completely ignore you and keep doing its thing. So I'm being half facetious here. It depends on the architecture of the NOS. Most commercial network operating systems do not have the heritage of FRRouting,

51:09
which is a good thing. Honestly, it's not a bad thing. It's a good thing because most of them are built as complete systems. The protocols interact with each other. They have all sorts of APIs. FRRouting is literally a bunch of routing daemons that were written by different people and thrown together under a single management plane just to make something. So that heritage has some good pieces. The good pieces:

51:35
I can actually go into the daemons file, turn off RIP. And if you go into the router and you say router rip, it says, what are you talking about? I don't even know what you're talking about. Router RIP? What is RIP? Like those commands don't even exist. So you're also an FRRouting zealot. ISIS and FRRouting. Not necessarily. I like FRRouting because I like it for labbing. It's lightweight. It's easy. It's open source. And I am a maintainer. So I do have to be a little. Well, now that I can ping you on Teams, sir.
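For anyone labbing along, the daemons-file idea can be sketched as a small text rewrite. FRRouting's daemons file uses lines such as `ripd=yes` or `isisd=no`; the helper below, whose name and sample contents are invented for the example, flips every such line off except the daemons asked for and leaves other settings untouched. It only transforms a string; writing the result back and restarting FRRouting is left out.

```python
import re

# Matches lines such as "ripd=yes"; option lines like "vtysh_enable=yes"
# contain an underscore and are deliberately not matched.
DAEMON_LINE = re.compile(r"^([a-z0-9]+d)=(yes|no)\s*$")

def enable_only(daemons_text, wanted):
    """Return daemons-file text with only the wanted daemons set to yes."""
    out = []
    for line in daemons_text.splitlines():
        match = DAEMON_LINE.match(line)
        if match:
            name = match.group(1)
            line = f"{name}={'yes' if name in wanted else 'no'}"
        out.append(line)
    return "\n".join(out) + "\n"

# Hypothetical file contents, trimmed for the example.
sample = (
    "bgpd=no\nospfd=yes\nripd=yes\nisisd=no\n"
    "pimd=yes\neigrpd=yes\nvtysh_enable=yes\n"
)
result = enable_only(sample, {"bgpd", "isisd"})
print(result)
```

With only `bgpd` and `isisd` enabled, the other protocol daemons never start, which matches the behavior described above where the RIP commands no longer exist.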

52:05
And if I have any ISIS questions, if you're open to that, you may be, sure, sure, hearing from me. I'm sure that's what you want. You're going to learn some SR Linux. Russ, are there any closing thoughts you have around this? I mean, I would like to learn ISIS and I'm going to lab it because that's what these conversations are about for me. A little bit of curiosity, learn something from somebody smart and then go

52:34
touch it and see what it's like. Do you have any educational material around ISIS? Have you written books or done classes on it? Yeah, I have a book and I have a training course here. I have a couple of training courses I've done. Can you plug the book? If somebody just looks up ISIS Russ White, I guess it'll come up. I mean, look up ISIS. Okay. It's probably one of the very few books, but you know what? You might not be able to get it in physical format any longer. I know a guy. It's really out of print at this point, but you can get it in digital format, Safari books or whatever.

53:03
Yeah. So no, not much else. I mean, thanks for having me on. I need to do this more often. And Mike, I promise I'll learn SRL better and do some labs with it.

53:16
Thanks for coming on guys. For all things Art of Network Engineering, you can check out our Linktree, Linktree forward slash Art of NetEng. We have new merch up. We have a community called It's All About the Journey Discord server. Thousands of folks in there studying, learning, helping each other out. Hop in. If you don't have a community, check it out. It's a great place to be. As always, thanks so much for joining us and we'll catch you next time on the Art of Network Engineering podcast. Hey folks.

53:42
If you like what you heard today, please subscribe to our podcast in your favorite podcatcher. You can find us on socials at Art of NetEng, and you can visit linktree.com slash Art of NetEng for links to all of our content, including the AONE Merch Store and our virtual community on Discord called It's All About the Journey. You can see our pretty faces on our YouTube channel named the Art of Network Engineering. That's youtube.com forward slash Art of NetEng. Thanks for listening.

