Living on the Edge: The Network Resilience Podcast by Opengear

The ‘Engine Room’ at PayPal; Site Operations and the Role of SRE with TJ Gibson, Director of Site Operations at PayPal

March 13, 2021 Steve Cummins Season 2 Episode 2

Site Operations at PayPal covers Incident Response, Network Monitoring, Site Reliability, and a whole lot more. Security is an ever-present concern. AI and Machine Learning are disruptors on the horizon. As the person who runs the Network Command Center – the “Engine Room” at PayPal – TJ Gibson has a lot on his mind.

In this episode of Living on the Edge, TJ walks us through the complexity of running a large corporate network as part of a team of 600 engineers. We talk about the growth of Site Reliability Engineering from its early days as a “bug fixing team” to its current place at the heart of the organization, and throughout the business.

TJ has an interesting view on the three phases of a technology career, and how value is derived at each stage. From deep domain knowledge, to knowing - and being known - as a key contributor, and then taking a broader view that focuses on linking business outcomes and technology solutions. He also shares his thoughts on how SRE as a discipline will continue to grow as a career path for network engineers.

Mentors are important in any career, and TJ gives a hat tip to one of his first managers, Roy Santarella at Vigilant Minds, and also to Wes Hummel at PayPal.

Contact Details and Publications Mentioned in the Podcast
LinkedIn: https://www.linkedin.com/in/tjgibson/
USENIX SRE Conferences:  https://www.usenix.org/srecon
Range by David Epstein: https://davidepstein.com/the-range/
Freakonomics by Levitt & Dubner:  https://freakonomics.com/books/


Steve Cummins (00:00):

TJ Gibson is the director of site operations at PayPal. In his 14 years there, TJ has worked in a number of technical roles and currently runs PayPal's network command center. So TJ, I saw in one of your LinkedIn posts that you said a big part of your job is talking. So I guess podcasts are a natural outlet for you. Thanks for talking to us on the Living on the Edge podcast.

TJ Gibson (00:27):

Absolutely. Thank you so much for having me.

Steve Cummins (00:30):

Great. So first thing is, there's a lot of things I think we can dig into. Site reliability engineering, I think, is something that you're deep into. And I really want to get into that, but before we dive in, just give me a rundown of how the IT organization is set up at PayPal.

TJ Gibson (00:49):

I mean, as anybody would probably imagine, PayPal is a very large organization, but my organization, specifically the organization I belong to, I should say, is called site reliability and cloud engineering. And it includes everything you would traditionally see in sort of a legacy infrastructure and operations organization, as well as some platform measurement capabilities. My specific organization there is sort of incident response, network operations, site reliability monitoring and alerting and response. And then we also have a function called embedded SRE. That really is maybe the thing most closely aligned to a pure SRE role within that organization. But it's an organization of about 600 people today that really has its fingers in everything. I think my boss likes to refer to it as the engine room of PayPal. It's where all of the capabilities come from that our products really leverage to deliver services to our customers.

Steve Cummins (01:43):

Well, that's great. And a lot of us use PayPal a lot of the time and, as with everything, we're never happy when it's not working. So I guess it's thanks to you and the other 600 folks in your group that we're never sitting there and cursing at the PayPal app. So, let's talk about this site reliability engineering. And I think Google is often credited with being the one that really did a lot of the early work and launched the idea of it, but I'm sure it means different things in different organizations. So what does SRE mean at PayPal and how has that changed over the last few years?

TJ Gibson (02:21):

That's a big question. I would say that SRE at PayPal really started before SRE was really a term that we were throwing around in the industry. I think in a lot of ways, in the early days it was almost looked down upon by traditional technologists, right? They looked at it like a bug fixing team or like somebody that was really just coming in to clean up some messes somewhere. And as Google kind of put their thoughts into words and started to really define this as a niche within the industry, that's when we said, "Hey, there's so much overlap with what we're already doing. Let's look at this framework, let's look at this way that they're defining this role and find out how do we squeeze the maximum amount of value out of it." And we've really seen it grow from that point forward.

TJ Gibson (03:02):

Again, I think the way it's structured today has really grown beyond what traditional or pure SRE is. But we've really seen this conglomeration, I guess, or this aggregation of some of the things that PayPal needed, specifically. Some of the things in the industry that were really coming along in terms of technology and resiliency, and then bringing this framework of site reliability engineering as a practice, and really kind of put those things together to deliver what we're delivering today. And really when I think about our mission, it's not just providing these capabilities from an infrastructure or a platform perspective, but it's really ensuring that all of the products that we deliver and all of the capabilities that we expose to customers have this reliability, this resiliency, this fault tolerance, this usability kind of baked into it from the beginning. Much like, I think, the industry 10 or 15 years ago came around to the idea of information security and really that became a niche practice and still is a niche practice, but it's much more just a core element of a lot of products and services that people are delivering today.

Steve Cummins (04:07):

So do you think that that's one of the defining things of SRE? Is it this idea that it's a holistic approach? It's not looking at one piece of the operation or one location, but really looking at the whole networking ecosystem and making sure it's covered sort of from A through Z.

TJ Gibson (04:25):

Yeah. I think the way that we tend to talk about it really is from a customer perspective: everything that a customer would expect from our products and services operates the way that it was intended to. I don't think that it's a niche in the way that we would think about security or networking or application development. But I do think that it's niche in the way that it brings the sort of core skillsets of each of those, or at least the core awareness of each of those domains, under the umbrella and provides this lens of resiliency and operability and scalability kind of to the way that these things come together. I don't know if that was helpful or if that was a little bit too ambiguous.

Steve Cummins (05:03):

No, that makes sense. You mentioned an embedded SRE function and I'm curious, are you talking about that you have folks that are sort of out in the various operating units that are reporting back into your group or is that something else?

TJ Gibson (05:21):

No, I mean, it's essentially that. It is more of a dotted line from the SRE organization out to these various domains. I think a lot of places, when they implement SRE as a specific discipline, will look at filling a role within an engineering team. A lot of times when you talk about an agile team, you'll have your software developers, your product manager and so on. And I think a lot of organizations will bring that site reliability in at that level. The way that PayPal has really approached this is more from a centralized aspect, but building these dotted lines between dedicated teams who have the awareness, not just of the domain that they're embedded in but the business that they're embedded with.

TJ Gibson (05:58):

So within our payments organization or within our identity organization, it's really understanding how their products and their platforms are put together and bringing that SRE discipline or that SRE mindset to help them holistically figure out how do we make this thing the best it can be? How do we make it a great citizen within the application, so that as those handoffs upstream and downstream are happening, all of these reliability and operability components are being accounted for?

Steve Cummins (06:24):

Got it. That makes a lot of sense. So in terms of security, which you mentioned is sort of a similar evolution, is security interwoven with SRE or do you still see that as a separate entity?

TJ Gibson (06:37):

I don't know that I would say it's woven into SRE. I think it's become more a part of everything we do. So yes, it's a part of SRE, but it's not something that SRE is bringing to the table. It's something that we hold ourselves accountable to. It's something that our customers and our business holders hold us accountable for, but really it comes down to the security aspect of it being interwoven into everything we do. If I could kind of project this forward in terms of SRE, I think where I see this progressing in the coming years is that we start to see a lot of these core things today that we would say belong to site reliability engineering adopted and integrated with the ways people work when they're developing products, or when they're deploying new network segments or new data centers.

TJ Gibson (07:26):

I think it will become more just a sort of foundational component of how we do IT, the same way that security has over time. I think we're in a little bit of a transition period here where SRE has kind of come into its own, it's become a mature role or discipline within the industry. And I think that will stay true as we go forward. But I think some of those things that SRE is bringing today will become part of core platforms and become part of the core workflows as opposed to always being a centralized single accountability kind of function.

Steve Cummins (07:57):

And I guess even the role that you have, although I focused in on this part of your job, which is the SRE, I mean, you really have responsibility for the overall site operations. Right. And SRE is a part of that. So, I guess it emphasizes that idea that SRE is a part of everything that you do rather than being looked at as a standalone piece of the business.

TJ Gibson (08:18):

Sure. And I think in a lot of ways, before we actually started recording here, we talked a little bit about DevOps and SRE and what do these things mean and how do they play together? And when I think about my role specifically in that operation center, there's definitely a push to a more federated accountability model. Right. And so it kind of speaks to DevOps, but I don't know that either one of these things could really exist independently. I think they're flavors of the same kind of thing. Right. It's the way that we work and the way that we provide services and capabilities to our customers and making sure that we have the right mechanisms or the right levers in place to be able to respond when we need to, when something's broken, but also to be able to plan and design for failure and to be able to plan and design for customer outcomes.

Steve Cummins (09:05):

And I think, particularly in a big organization like yours with many moving parts, it makes sense that it cuts across it. Do you see any drivers over the next few years that are going to change the way that SRE is implemented, or do you think it's just a continuing evolution?

TJ Gibson (09:25):

I certainly think that there's always going to be changes. I think, as we see more and more large enterprises really grab a hold of this idea of public cloud and really start to move workloads, I think I saw recently that Capital One essentially was declaring victory in their cloud journey. I think that brings an entirely new perspective on sort of large scale applications and site reliability engineering. But I also think going forward there're things that I think we haven't quite yet accounted for. I think machine learning and artificial intelligence is going to bring aspects to our technology stacks that we just don't have an idea of what that means to be resilient and what that means to be operable. And how it is that we use that sort of technology most effectively, I think SRE has a role to play there as well.

TJ Gibson (10:14):

I think when we look at some of the things that SRE is bringing to the table today, in terms of frameworks and structure and accountability, I think a lot of that we will start to see baked into our applications. I think they'll become more natural for people as we're going forward. And so that's going to, I think, create an opportunity for SRE to, again, maybe step up and up-level their viewpoints, similar to maybe how information security practitioners have been able to up-level and make a tighter connection to the policy or the regulatory obligations. I think SREs will be able to continue to look for those opportunities to step up and bring sort of business aspects further down into the stack. And I know that's a little bit fluffy, so I'm happy to kind of call out some context there if we need to.

Steve Cummins (11:00):

Hey, I think when you're talking about trends for the future, if you're not being fluffy, then you're fooling yourself. Right. Because who knows what's coming up, but I think some of the trends you talk about, machine learning and AI, we all know they're going to have a big impact. And I think it's just that question mark of how quickly and in what ways, but it's a fair point, SRE is going to have to adapt too, as everything else is along with that. So just shifting gears a little bit. I'm always curious about career paths and how people get to where they are. So I know you started out in the Air Force, you worked in a couple of networking roles in some other companies, and now you've been at PayPal for a number of years, and you've talked to a lot of technologists in that time, as you mentioned to me, and you have this idea of the three phases in a technical career. So maybe you could just sort of talk through that a little bit.

TJ Gibson (11:57):

Yeah. And I wouldn't say that it's really just limited to the technology career field, but that's obviously where my experience is. And the way I try to talk to my folks about it is, and I think early on in your career, the value that the business perceives from you as an individual, from you as a technologist, really boils down to how much you know. Like what do you know? And I think the deeper you know a particular technology or domain area, the more technology or domain areas you know, the more your value is to the organization. I think you hit a point probably somewhere in that five to 10 year point in your career. And it's obviously very different for everybody, but at some point if you could call it a journeyman, I think you hit a place where it becomes about who you know. It's not possible to know the depth that you have to, it's not possible to have the breadth that you need, in order to deliver the outcomes you're being asked to deliver.

TJ Gibson (12:50):

And so the way that you provide value to the organization becomes much more about how do you bring people together? How do you find the right answers and the right resources within the organization? And I think for the most part, technologists do this pretty well. I think it makes sense. I think it's a pretty natural evolution. I think where a lot of people tend to get tripped up is in that kind of third step, where it becomes more about who knows you. And yeah, there's an element of politics there, every place no matter how apolitical they claim to be, there's always politics, but I don't think that's the core of it. I think it's really about demonstrating value. And so a couple of things I try to tell my people are, what is it that keeps your boss up at night?

TJ Gibson (13:28):

Those are the things that you should be looking for solutions for. Those are the things that should also be keeping you up at night. You need to be looking for, how do you bring your talents, your experience, your responsibilities to bear on that problem statement, to help solve it. And I think that's where a lot of people tend to struggle with wrapping their heads around it. It's not that it's incredibly difficult to do, but it's certainly a mind shift. And so when I say "who knows you," it's about who sees you as the problem solver, who sees you as the one that's going to bring your experiences and skillsets to bear on my problem and understand the context quickly, get to relevance quickly and be able to help me find solutions that technology will solve for my particular business problem. And of course, we're doing all of these things throughout our career.

TJ Gibson (14:12):

But I think where I see people really kind of get stuck in a rut, that 10, 12, 15 year mark in their career, is because they're struggling, maybe I shouldn't say they're struggling, where they really start to have challenges is when they start to get uncomfortable. When they start to have problems solving business problems or connecting with their peers that are stakeholders, they tend to fall back on the things that have worked for them in the past. And so they might go out and say, I've been in the career for 10 years or 15 years, the technology has moved so fast and I've been thinking at this higher level, my problem is I need to go learn the latest and greatest.

TJ Gibson (14:49):

I need to go dive in deeper into whatever technology stack is in front of me. And that is always valuable, but that's not the only answer. I think what the business is looking for from people, technologists specifically, as they reach that higher level in their career is more of a connection back to how technology solves those business problems. Whereas very early in your career, it's very much about how to implement a particular solution that's been given to you or that has been defined.

Steve Cummins (15:16):

Yeah, it's very true. We tend to focus on the technical skills, right. But I think as you move up in an organization and you sort of broaden your influence, it really does come down to being known, knowing how to get things done. I mean, for me personally, this always flags up when I move companies, as I've done in the past. You're in a new job, and you realize at that point how important it was to you in your previous role, because you knew everybody and you knew how to get things done and people knew who you were, and if you change companies or organizations, you often have to relearn that. So I think the three phases you describe make a lot of sense.

TJ Gibson (15:57):

Yeah. And I think the important thing to call out is that it's not that you're leaving the technology career path, right? It's not that technology becomes secondary to the value that you're bringing to the organization. It's about being able to layer on that relationship skillset, that business understanding skillset, that translation skillset in a lot of cases to get from business outcomes into technology solutions. And so it's not about leaving technology behind or ignoring it or not being hands on keyboard anymore. It's about being able to make stronger connections back to the organization, back to where your customers are and back to where the value that you're delivering to them is coming from.

Steve Cummins (16:37):

Yeah, for sure. I think that all makes sense. So having just said how important it is to look at the non-technical side of it, I am curious about the technical training side of things. So in terms of that sort of technical... there's a number of certification paths, like I guess CCIE is sort of the most well-known one. And recently Cisco added a DevOps section to it or DevNet section to it. I'm not aware of anything similar that focuses on this SRE or reliability or even site operations part of it. Is there something out there, a certification path that people would follow, or is that something that you think needs to be worked on?

TJ Gibson (17:25):

There are some folks that are offering different certification paths for SRE, but I haven't seen anything really rise to the top as kind of an industry standard. And I think if I look at kind of how the industry is handling the learning and the training around SRE, USENIX has every year a conference they call SREcon. And I think if you look at some of the agendas or the seminar schedules, excuse me, schedules for these events, these conferences, you'll see the diversity there, right? You can have some very deep tracks on how to use machine learning to bring better insights from your observability platform. Some very deep tracks about how do we build networks for resilience and how do we handle this massive global scale? How do we deal with hybrid cloud?

TJ Gibson (18:10):

But you'll also hear a lot of higher level things. How do we do problem management? How do we do root cause analysis? How do we actually understand where our customers are coming from? How do we determine business logic failures differently from systemic technology types of failures? So I think in a lot of ways, the industry is still trying to figure out what SRE looks like. I think there's still a lot of definition and discovery that's happening. It's much more solid than it was in years past. But I think, again, even if you look at some of those conference schedules year over year, you will see kind of the focusing or the narrowing down of their scope or the solidifying of the way that they're viewing that particular industry career field. So I think when it comes to SRE, if that was of interest to some particular technologist, I don't know that I could point you in a particular direction.

TJ Gibson (18:58):

I think what I would say is breadth at this point in time, I think is extremely important. I don't think it's enough to have only an application development background or only a networking background and be able to step cleanly into an SRE career field and be successful day one. I think you really have to understand all of the sort of constituent parts and how they play together towards the business outcome of reliability, of operability, of performance. And so I think what I would suggest is go get some breadth, right? And this is happening a lot in our data centers already. We're already looking at how do we build our on-prem data centers to be more abstracted, to have the hardware, the physical infrastructure more abstracted for the users, very similar to how you would see an application deployed in AWS.

TJ Gibson (19:46):

So I think if you can kind of start to spread out a little bit and start to look for how can I bring in some of these elements of development and some of these elements of business awareness and some of these elements of observability and response. I think if you can bring those things into your purview and into your skillset, that's where you really start to have kind of the building blocks for being a good site reliability engineer. And I think the questions that I would be asking for, the opportunities I would be looking for internally, is how to leverage all of these things together to deliver reliability as a service or operability as a service or observability as a service kind of thing. I know that's very fluffy, but again, I think the industry is, at least this particular career field, is still in a bit of early stages.

Steve Cummins (20:33):

I actually, I think it's spot on. The pendulum the last couple of years seems to have swung back a little bit. Everybody wanted to be a specialist in something, right? Wanted to be very niche-y. And the words guru and ninja were used far too often for people. But it does seem as though the pendulum has swung back. And there's an understanding of the value of a little more of a generalist approach where you can bring in parts of different skillsets or different backgrounds to combine them and get something done. And I think the world of SRE is probably a perfect situation where that makes sense.

TJ Gibson (21:13):

Yeah. And I don't know if I would go so far as to say SRE is really a generalist career field, but I do think that having that generalized background, that's really the context you need to be successful in SRE. It's not enough to understand how to build the most reliable, most resilient, most scalable network, if the application on top of it doesn't know how to consume and use that benefit. And so it's really understanding, at least at a base level, how all of these things come together and where the strengths of one particular domain compensate for the weaknesses in another domain. So I think you have to have that generalist mindset, even though SRE, I think, is a very specialized skillset or a very specialized role. So maybe in a way it's kind of a mix of both of those worlds and it survives those massive pendulum swings over time. Again, maybe the correlation back to information security.

Steve Cummins (22:05):

There's a really interesting book called Range, I can't remember who wrote it, but I'll put it in the show notes, that talks about exactly this. And the point is get a broad background first because it'll inform anything you do in the future. And it will also let you know which is the area that you really should focus on, whether it's something you're good at or something you have an interest in or a passion for. As opposed to specializing too soon and then realizing you can't broaden out. So it's an interesting book, I think, that might be worth reading to dig further into that topic. So you mentioned resilience a couple of times, and at Opengear that's really our focus, this idea of network resilience. So as someone who really spends their days, I would imagine, thinking about the reliability of the operations, what does the phrase network resilience mean to you?

TJ Gibson (22:57):

I think to me, it really means that we're capable of surviving faults, and that's an overly simplistic answer, but really it comes down to our ability to meet the business needs, to meet the customer expectations, in a way that is efficient and effective and allows us to continue to innovate. And then there's a lot wrapped in that. But really to me, that's really kind of what it comes down to, just being able to respond and react and absorb and grow to what we need. I think sort of the anti-pattern there would be a very specialized... I think about the olden days of mainframes and the AS/400 sitting in the basement that I'm sure a lot of banks still have sitting around. You build this thing that does one thing so well that it really can't evolve and grow with the needs of the business or with technology as they continue. So to me, it comes down to flexibility, maybe is the best way that I can describe it.

Steve Cummins (23:55):

I think you may get the award for the shortest definition of network resilience we've had on this podcast: surviving faults. I mean, it says it all. I like it. Makes a lot of sense.

TJ Gibson (24:06):

Like I said, maybe overly simplistic and not what you were driving at. Maybe that'll drop my stock value a little bit.

Steve Cummins (24:12):

No, because then, like all good interview guests, you went into an explanation of what you meant by those words. So that's spot on. I like that.

I’ve got to believe, in your role, you have a hundred “oh crap” stories where stuff went wrong, and it's kind of why this podcast is called Living on the Edge. Right. Because everybody has those stories of living on the edge. So do you have one that you'd like to share with us?

TJ Gibson (24:40):

Yeah. And I'll set it up a little bit. I was actually putting quite a lot of thought into this. And like you said, there are many, many, many examples. I had some advice early on in my career when I was doing some consulting and I was very nervous about standing up in front of a boardroom full of executives and trying to tell them where their vulnerabilities are. Not so much that I didn't know or didn't have the data to back up what I was saying, but I just had no idea where the questions were going to come from, what the angles and the agendas were of the people in the room. And so I was really kind of stressing out about this big readout that we were going to do. And one of my colleagues sat down and he said, "Look, consulting is like 10% technical and like 90% people."

TJ Gibson (25:21):

The more I've kind of internalized that and I've grown in my career, I feel that's pretty broadly true for technology in general. So a lot of my sort of uh-oh moments or my hairy late nights really come down to people things kind of mixed with technology. So the one that comes to mind, and maybe I'll get in a little bit of trouble for even sharing this story, was actually in the Air Force. My boss at the time had pushed a configuration file to every machine on our network, a relatively small network, but very geographically distributed. So 180 nodes or so spread across 40 countries. That file set the same IP address on every device on the network. And this is in the mid '90s, so a lot of our automation that we take for granted today just wasn't there. It just did not exist.

TJ Gibson (26:08):

And so we spent three days trying to talk pilots and people loading airplanes all over the world through how to change the IP address to something that matched what was going to work on their local network. That to me was one of the hairiest moments, not just because it was so complex and so people focused to resolve, but it really just highlighted how fragile sometimes some of these things we take for granted really are. One simple human mistake essentially shut this thing down for three days. And I think maybe the reason I bring that story out is just the irony at the end, that may resonate with a lot of your people, but my boss got a medal for fixing that problem. So I think that was kind of my funny story I wanted to bring: we made a mistake, we spent a whole lot of time trying to correct for it, and ultimately, as the cause of that mistake, somebody was rewarded.

Steve Cummins (27:00):

Yeah. Unfortunately that's how life works. They forget the mistake, but just remember, oh, yeah. But hey, everybody bounced back. And let's be honest, I guess, that's also what we get paid for. Right? It's not about mistakes happening. It's about how you react to it, but it seems a shame you didn't get the medal instead of your boss.

TJ Gibson (27:19):

Well, look, he was definitely involved as much as I was in putting things back together. And I think in a lot of ways I've carried that with me. I think for a long time, there was a little bit of bitterness. Darn it, you caused a problem and we had to go jump through hoops to fix it. And you got a pat on the back, but in so many ways that leads to where we are today, right? This blameless culture that we're looking to learn, looking at every failure as an opportunity to get better. I think that maybe was an early step in that direction. Maybe I didn't appreciate it for what it was in that moment, but there's so much value in being able to fail miserably. And even because we screwed up, but when you can recover from that and you can learn what happens and you can look for opportunities to build automation or gates or controls that would prevent that mistake from ever happening again, that's a win.

TJ Gibson (28:08):

The fact that we found it when we weren't getting shot at is a tremendously great thing. The fact that my site went down last week or maybe even we took it down last week at 3:00 AM. That's amazing because it didn't happen during the holidays when we weren't expecting it, we actually pushed the button. We knew exactly what happened. We're able to fix it quickly, that kind of stuff. So I think there's even some lessons in that kind of on the edge moment that I think we're putting into practice 15, 20 years later in the industry and it's making us better.

Steve Cummins (28:36):

Well, and it does put it in perspective a little bit. Right. We all think that our problems are big, but you just used that phrase, "Well, at least it didn't happen when we were getting shot at," which is not normally what people have to worry about. So yeah, I guess there was a life lesson in that for ya.

TJ Gibson (28:55):

A little bit, yeah.

Steve Cummins (28:55):

So talking about life lessons, I always like to give people a chance to give a shout out to somebody that's helped them in their career. Somebody who's influenced them or been a mentor. So anybody you'd like to recognize?

TJ Gibson (29:09):

I've actually got two, if that's okay and I'll try to be quick.

Steve Cummins (29:12):

Yeah. Go for it.

TJ Gibson (29:14):

The first one was my boss in my first people management role. It was a role that came to me through a little bit of duress, a little bit of emergency and a need in a crisis. And I had told them, "I don't know about people management. I don't know if this is the right step for me. I don't know if I'm going to like it. I don't know if I'm going to be good at it, but I'm going to give it a shot. Just promise me that if I suck or I don't like it, I can have my old job back." "Yeah, yeah, sure. No problem." So I took this job and about six months into it, I went into my boss's office frustrated. There was too much, I just couldn't get it all done.

TJ Gibson (29:47):

And he sat me down and he gave me some of the best career advice I've ever had. Overly simplified, but it means a lot to me. And he said, "I'm not paying you to get to the bottom of the pile. I'm paying you to put the right things on the top." That's what management's about. And so that idea that the value you see from me is in how I make decisions and how I prioritize and how I respond. That's the value I'm driving, not in my task list, not in getting to the bottom of that pile. That's been some of the best career advice I've ever got. The man's name is Roy Santarella. And I hope at some point he gets to listen to this and know how much that's meant to me. But I think secondly, and maybe this will sound or feel a little brown-nosy, but it's actually my current boss.

TJ Gibson (30:26):

I've known the guy for 10, 11 years now. He's been a cheerleader from the sidelines forever. He's been in my direct reporting line for about the past four or five years now, but he's really just showed me what it means to... just boundless energy. He showed me what it means to care for people. He showed me what it means to set hard expectations and challenge people, to give them the autonomy they need to be successful. Give them the support they need when they've fallen down or when they've hit a challenge or a roadblock. And he does all of this in a way that's productive. It's not ever in a way that's letting me get off the hook, right. There's still strong accountability there. There's still high expectations, but it's done in this supportive way where I know that he's got my back.

TJ Gibson (31:16):

And I think for any leader, that's really the most important thing I could ever ask for from a boss. And I think any technologist should ask for from a boss. Give me the autonomy to do hard things, give me the challenges that are going to help me get better and stronger and grow. But give me the support. Don't throw me out there to fail. Don't throw me under the bus and all of that. So Wes Hummel, as my current boss and longtime mentor, is another one that I would tip my hat to.

Steve Cummins (31:45):

So both Roy and Wes will be very happy to hear that. Last question for you, because people are always trying to work out where do I go look to learn more, whether it's general networking or site reliability, whatever it might be. Any resources, any websites, podcasts that you would recommend?

TJ Gibson (32:05):

I wouldn't even be able to point to one or even a handful, but what I will say is where I get the most value is from very obscure sources. And what I mean by that, the Freakonomics book had an impact on me. And it wasn't in the things that they were talking about. It was in the way that they viewed a problem and the way that they went about finding solutions or finding insights from that problem. I think anything you can do to look for ways to think differently, to think broader, to bring a different perspective to your work, I think that's where you really get the most value. The bits and bytes, the particular configurations, the new technologies, that stuff will just continue to evolve and change. If you have the framework, if you have the baseline, you can continue to stay on top of those things just naturally, at least in your role as those things come to you.

TJ Gibson (32:59):

But I think that unique perspective or that insight that you get from looking at things from a different angle, I think that that's really where I see people shine and succeed. And if I could make just a quick plug, I would say, this is one of the big reasons why diversity is so important in the technology career field. Just because I think for too long, we've had a bit of a myopic view. We've only had a limited number of perspectives brought to our problems. And that's resulted in a lot of things that we probably take for granted today that in my mind are actually fundamental flaws in the way that the internet works.

TJ Gibson (33:32):

You could look at DNS, you could look at IP space, there's a lot there that constrains us going forward and we don't even realize it. So I think that would be my call to action, or that would be my suggestion: let the technology-specific pieces just come to you as they need to, as you bump into them, as colleagues mention them, as you learn about them in conferences, but where you're going to really find your success is being able to bring a unique perspective, either from conversations with other people or from completely obscure sources that really don't have a ton to do with technology. And that's the best way I can non-answer your question. I hope that's okay.

Steve Cummins (34:09):

Yeah. You should be a politician, not answering questions. I actually, I agree. And I have to say Freakonomics is one of my favorite books. I don't know how long it is since I first read it, but this idea of unintended consequences is something that often rattles around in my head: "Okay, we're going to do this, but what is going to happen that we never planned for? And how do you be ready for it?" So, yeah, I agree with you, you have to step outside of reading just the regular stuff and bring some broader perspectives to it. So, I'm not even going to say it's a non-answer, I think it's a really good answer. So I like that.

TJ Gibson (34:47):

For what it's worth, that mindset of unintended consequences or a mindset, again I come from a security background as well, so that mindset from security of always thinking like the hacker, that's what makes a good site reliability engineer is how is this going to bite us? How can this be better? Where are the flaws and the weaknesses in this whole flow? So, yes. I mean whether you get that from Freakonomics or you get that from colleagues, or you get that from other experiences, I don't know that you get that exactly from industry training or industry technology learnings that are pretty typical.

Steve Cummins (35:22):

Yeah. So on that and in terms of diversity, would you hire a former hacker?

TJ Gibson (35:31):

I think it would depend upon the way in which they were communicating to me. I maybe have this flaw for a technologist that I really try to get inside the mind of people, where are they coming from? What's driving them, what are their motivations? But I think your background really has nothing to do with who you are today. We all change. I mean, if we were held to account for things that we did when we were 17, Steve, I'm not sure how old you are, but if we had social media when we were in high school, you probably are second guessing a lot of decisions you made. So I think your background has very little to do with who you are today. And I would be looking more to assess the talent and the capabilities being brought by any particular person.

Steve Cummins (36:12):

Yeah. That's a great point.

TJ Gibson (36:13):

Them being an ex hacker has nothing to do with it in my mind.

Steve Cummins (36:15):

No. That's fair. And it does fit nicely with your point about diversity, right? You have to bring a lot of different pathways to really get a broad understanding of things.

TJ Gibson (36:27):

Absolutely.

Steve Cummins (36:29):

I tell you TJ, this has taken us in a couple of directions I wasn't expecting, which is always fun for me when we're doing these interviews. I do really appreciate you taking time to share your thoughts with us on the podcast and look forward to talking again soon. Thanks very much.

TJ Gibson (36:46):

Thank you so much. It's been a pleasure.