Ben Rogojan is a data engineering solutions architect with expertise in data architecture and statistics. He focuses on developing end-to-end data solutions that help take data from raw format into data products and analytics. He has delivered on projects for clients across healthcare, finance, SaaS, and technology and previously worked as a data engineer at Facebook. He maintains a strong online presence creating content for data engineers and those curious about entering data engineering career paths and is known as the “Seattle Data Guy” online.
In some cases, your data engineer might be all of it, right? Your data engineer will be the one who's modeling and architecting the data as well as building the pipeline, because you don't have very defined lines. And in other cases it might be very well distinguished, where it's like, hey, you've got your ETL developer. You've got your data architect, who plans out from a high level how we're going to manage all this data and what it's going to look like. And then the ETL developer is just going to push that data to match what they're modeling. This episode of Ken's Nearest Neighbors is powered by Z by HP, HP's high-compute, workstation-grade line of products and solutions. Today, I had the pleasure of interviewing Ben Rogojan. Ben is a data engineering solutions architect with expertise in data architecture and statistics. He focuses on developing end-to-end data solutions that help to take data from raw format into data products and analytics. He's delivered on projects for clients across healthcare, finance, SaaS, and technology, and he previously worked as a data engineer at Facebook, now Meta. He maintains a strong online presence, creating content for data engineers and those curious about entering data engineering career paths, and is known as the Seattle Data Guy online. In this episode, we talk about how Ben went from having aspirations of cooking fine food for a living to eventually working as a data engineer at Facebook. We also touch on the origin story of the Seattle Data Guy YouTube channel, coupled with his consulting, and on how data engineers can get out of their own way. I loved the talk with Ben and I'm really happy to share it with you. Ben, thank you so much for coming on the Ken's Nearest Neighbors Podcast. You've done some pretty incredible work as your alias, the Seattle Data Guy, on YouTube.
You've also had a pretty interesting career going from food to technology, working at Facebook (now Meta). I'm really happy to have you on to talk about your experiences on your own journey. And I'm certain that a lot of people can learn about your own progression, as well as about data engineering, through this conversation. So again, welcome. Yeah, no, thanks so much, Ken, for having me on. I'm really excited to be on the Ken and Ben show. I like it. And maybe we'll spin off a podcast called the Ken and Ben show. It's got a good ring to it. So, you know, the first thing I always ask people is where did you first get interested in data? Was there a pivotal moment, something cool that happened, or was it a slow progression over time? Yeah. I mean, I definitely think it's a combination of a slow progression plus, you know, a catalyst point, right? I think one of the initial points of me getting interested in data was, honestly, just watching a show called Numb3rs, which is all about people finding criminals through math, right? Through different sorts of algorithms and basic stats, essentially. So that was the initial point of it. And then I was thinking about how to apply that to where I was working. I was working in the food industry, as you referenced earlier. And I was like, oh, I wonder if we can calculate how many people show up per day based on the day of the week, the weather, the time and month, what's on the menu, things of that nature. And then eventually even drilling down further, like, oh, I wonder if you could predict what they'd order based on some similar factors. So I think that was where it started. And then it eventually came more to a head when I started going back to college and was doing an epidemiology course.
And you learn about the original John Snow and the stuff that he did, some of the data that he put together, and how that actually drove impact and helped figure out where cholera was coming from. So, you know, I think that was the catalyst point, right? Up until that point, data, sure, was interesting, but I think that was the moment of, okay, this is a super interesting subject. I'm already learning programming and I'm learning more about data. Now I want to learn how to combine these things. And I hadn't yet read the Harvard Business Review article about data scientist being the sexiest job. So I didn't know there was a term for this, but eventually I came out into the industry and found that, yeah, there are a lot of people already doing this. So that was my initial start in data. Awesome. So how does someone go from working in the food industry? If I recall correctly, you didn't go to culinary school, correct? How did you make that transition from there to working at a Facebook or a big technology company? What does that progression look like? It seems like a lot had to happen for you to be ready to go down that path. Yeah, I was laughing at the wisdom of my parents. You know, at 16 I decided I wanted to work in the food industry. I wanted to be in fine dining. And in Washington there's Running Start. So I went and got into the program; basically, you can do college early. And I was like, okay, I'm going to do the culinary program. And my parents were like, yeah, that's cool, but maybe also get your business AA. So at the same time as I was getting my culinary degree, I got my business AA.
And so from about 16 to 20, I worked in the culinary field, and I saw my salary go from like $10 an hour to $15 an hour within like four years. And I was like, well, mathematically, this is not going to work out with the lifestyle I want. So... Four years, though. It depends if it's linear or if it's going to keep going up at a 50% rate. So yeah, it would depend on how you look at it. But yeah, I was looking at people above me who were maybe 10 years older than me, and they were making about the same as I was. So I was like, okay, if I'm going to be 30 and making this much, the lifestyle of that doesn't equate. Right? I think that's how I like to think about life. It's like, what do I want to do with my life? And it's like, okay, I will be working 60 hours a week and making at most 40 to $50,000 a year, which is plenty of money, but you're working that much. And if that's your career pinnacle, it's going to be really hard if you want a family and things like that. So I think that's what made me go back to school. And I did two more years starting around 20, so I finished at 22, so I wasn't that far behind. Right. If anything, I was probably right on point for a lot of people, and yeah, I wrapped up around 22, 23. And then I got my first job at a hospital doing data analysis. Awesome. And so, you know, it's funny. I had a very similar experience, not with food, but after college, I tried to play golf professionally. Right. And there's this huge opportunity cost that you have to face. It's like, okay, how long does it take me to start earning money or to be successful at this? And if I don't become supremely successful at this, what's my earning potential? What are my career opportunities?
And let's say I played golf for five years and went down that path. Honestly, the majority of my career options would be selling insurance or going into real estate or doing a profession like that. And those are great professions, but they weren't exactly that interesting to me. Right. I was like, oh, if I try this and I don't think the upside is probabilistically likely, I have to look at it in terms of expected value, and maybe not just expected value of earnings, but also expected value of happiness. Right. If I work in the profession that I work in now, I can play golf on the weekends. I can still enjoy the sport. I can go play in amateur tournaments. I can do this and that, but the amount of money and the quality of life that I experience is significantly higher doing this than if I had tried to go down the golf track. And so I think it's beautiful. You can still enjoy cooking, right? You can still enjoy fine dining, maybe more from a consumer end. But a lot of people think it's this or nothing. And for better or for worse, outside of playing in professional golf tournaments, I've been able to have all of the golf experiences that I ever would've wanted to have, and more, through the work that I do now. And so that's an exciting thing: through data, through some of these other professions, there are so many opportunities available to you. So going back to your story about working in the hospital, how does that transition, or where do you get the seed planted that you want to work at a larger tech company? And also, what was the nature of that work? Was it data science, data analytics, data engineering? What does that look like?
Yeah, I mean, so I was working at a hospital, basically on a financial analyst team, so everyone else was working with Excel. But before I jumped on, there was a more senior data analyst, if you want to call them that, who was super technical. They were more of a programmer, honestly; they just kind of came into this role. And they were already working on a data warehouse and doing all this other dashboard building and website building and all these projects. And so I came on as an intern initially to help that other person. Which of course, as I think has happened to plenty of people, that person left like two weeks later. So everything was like, all right, you're the only technical person on this team, so you get to learn all of this stuff. And so yeah, I think I always kind of wanted to work at a FAANG. I think most people do, right? You want to work at one of the big tech companies, whether it's Airbnb, Spotify, anything. There's always going to be a point of pride that you got there, at the very least. Right? You got through the interview. So I think if you ask why I did that, it's like, well, who doesn't want to work at a larger tech company? Just to see what it's like, right? You want to see, okay, what do they do? Is it different? Is it the same? Is it better? Is it worse? So you can understand that. But yeah. So after working at that hospital, I eventually started working at a healthcare startup, because I already had a general idea of some healthcare data. And so this healthcare startup thought, you know, why not try out this kid? So I started doing work at a healthcare startup.
And that's where I really like to say I cut my teeth in terms of learning more about data engineering and system design and things of that nature. Because when I started working at this healthcare startup, initially it was more of an operational role where I was just loading data. They had some commands you'd run, and you'd just make sure that it loaded correctly into the database and things of that nature. But eventually I was like, we're doing the same commands over and over again. Maybe we can automate this. So I built this website that was completely overkill. There was no need to do what I did, but I did it, and it was like, oh yeah, you can just press this button now and it'll find the file and run it automatically. And I showed that off to my director of engineering at the time. And she, I think in her wisdom, was like, okay, maybe this is overkill, but clearly you can develop and do a lot more than just operationally manage these kinds of file loads. So then she let me do a lot more development, and that, I think, spurred on a lot more interest in that area as well. And then eventually I was trying to interview around, after about two and a half years. Because I was like, okay, now I'd like to take another shot at your Amazons, your Facebooks, and whatnot. And lo and behold, Facebook did end up giving me an interview. I actually went through two different ones and passed one, didn't pass the other, but that's, I think, how it goes for interviews. And yeah, then they offered me a job. So that was where I ended up. That's awesome. I think that's a really important point, and it's true of a lot of people I've talked to who do work at the FAANG companies.
And, you know, one of my friends applied to LinkedIn four times before he got his first job there. Right. It's not uncommon to repeatedly apply. A long time ago, I applied to Facebook and I got rejected. And I decided not to apply again, but there are plenty of opportunities, and recruiters reach out to me from Facebook still. Right. It's not like, just because of one rejection, you miss the opportunity to work with a lot of these companies, unless you do something super racist in an interview and they won't ever call you back. Right. But something else I wanted to drill into is the difference between data engineering, or the data infrastructure, at a startup versus one of these big companies, and some of the things that you learn there. So in a video I made recently, which I believe you've seen, on why data engineering is so hot right now, one of the ideas is that a lot of these big tech companies have built really good data infrastructure. They would not be able to succeed at what they do without the data infrastructure that they built. Their entire businesses are built almost primarily on data. Right. Whereas in a startup, a lot of companies either have the idea first and are then having to build out the infrastructure, or they are building the infrastructure out with the future in mind, but you're still at a very different place. Right. You have to build the infrastructure rather than work in this ecosystem that's already very well curated for the problem you're trying to solve. Was there some dichotomy there? What new skills did you have to learn working at a big company versus a startup? What were those two experiences like, and how did they differ?
Yeah, I mean, so working at the startup that I worked at, we definitely were using, I'd say, a very bare-bones approach to data engineering. Right. And in a way, that's something I almost agree with. Like you mentioned earlier, some startups look at all the new tools and they want to use all that. But oftentimes they fail to ask questions like, do I need all that? So in my case, for this startup, we were honestly using pretty simplistic ways to get data processed. We knew what our problem was. We really only processed data about once a month; we had a report that was due every month that we were selling. And there were tons of companies that we could do it for, but that was the one thing. So it's very batch focused. It doesn't have a huge amount of dependencies if you run it in three chunks on its own. And so that led us to not need a lot of fancy tools, right? We didn't need Airflow. We didn't need all these extra layers, because it's really just these SQL scripts that need to get run, an initial load script, and then that's about it. And then we just used SQL Server. So it was pretty straightforward in that regard. It wasn't super complex. But I think that's kind of normal. And we also built out two products while I was there that required a lot of systems thinking, thinking about dependencies a little more deliberately. So thinking about what needs to run before what, and things of that nature.
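The idea of "what needs to run before what" in a small batch job can be sketched without an orchestrator. This is a minimal illustration, not the startup's actual setup: the step names are hypothetical, and Python's standard-library `graphlib` stands in for whatever mechanism was really used to order the SQL scripts.

```python
from graphlib import TopologicalSorter

# Hypothetical monthly batch: each step maps to the set of steps
# that must finish before it can start.
dependencies = {
    "initial_load": set(),
    "clean_claims": {"initial_load"},
    "clean_members": {"initial_load"},
    "monthly_report": {"clean_claims", "clean_members"},
}


def run_order(deps):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
for step in order:
    # A real job would execute the step's SQL script here.
    print(f"running {step}")
```

For a handful of monthly scripts, an ordering like this may be all the "orchestration" that is needed, which is the point being made about not reaching for heavier tools too early.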
And designing the system for the different use cases that we wanted to provide at the end with this data product, because it was much more of a data product company than it was anything else. We had some end products that we were developing. Whereas Facebook, you know, they have a mature data ecosystem. They have a lot of this stuff ready to go, everything from how you're version controlling your code and your pipelines to the pipelines themselves. Right? They're using DataSwarm, which is similar to Airflow, and every other layer is already there, right? It already exists. You're really just coming in and working within the system they've already developed. So some people, I think, don't necessarily like the fact that it's already in that state. But for me, I think it makes the most sense, right? Having a fully matured system, and not having to do a lot of the data infra that's around that system, because that's so time consuming, is how you're going to move fast when it comes to getting to data insights quickly. Right? If you're having to spend two weeks just setting up your pipeline infrastructure and not even getting to the actual pipeline, people aren't going to see any insights for a long time. So I think that's one of the differences: they're already matured. They already have systems in place. There are always little improvements to be made, but overall I think they have a system that works. And I think you brought that up earlier. They built it forever ago, right? They're the people that built Presto and Hive, because they realized, we need something more than what we had for options. And it's similar to Google and BigQuery, or Amazon and Redshift. They knew they needed better data infrastructure, so they built it.
So they've just built a lot of that stuff already internally. And it might be a lot less fun for some people, who might want to start using some of the fun stuff that exists outside, but it can also make a lot of sense if you understand, from a business operations standpoint, hey, we need to start delivering data as fast as possible. We don't need to spend tons of time building up all this. This episode of Ken's Nearest Neighbors is brought to you by Z by HP, HP's high-compute, workstation-grade line of products and solutions. Z is specifically made for high-performance data science solutions. And I personally use the ZBook Studio and the Z4 Workstation. I really love that the Z line comes standard with Linux and can also be configured with the data science software stack. With the software stack, you can get right into the work of doing data science on day one without the overhead of having to completely reconfigure your new machine. Now back to our show. Yeah. I mean, that makes a ton of sense to me. Something that always comes up in my mind, and I think I talk about this quite a bit within the broader data science domain, is what are some of the specific use cases of data engineering, or what a data engineer does versus what a data architect does? And I think historically, even within the data manipulation or database administration domain, there was a lot more clarity around what these roles did. And now that, quote unquote, data engineer has been introduced as a term, that has sort of added confusion to the mix. I might be wrong about that, but can you explain to me the different roles within the data engineering and data manipulation sphere? Yeah. I mean, I think a lot of these roles have morphed and changed over the last few decades. Right?
Because before, I think you had more distinguishable pieces at some point, where it was like, this is an ETL developer, this is a data architect, these are the different kinds of roles we have. Yeah, yeah, yeah. And they all played very distinct roles, but I think we've seen some morphing, depending on the company you work for. In some cases, your data engineer might be all of it, right? Your data engineer will be the one who's modeling and architecting the data as well as building the pipeline, because you don't have very defined lines. And in other cases it might be very well distinguished, where it's like, hey, you've got your ETL developers. You've got your data architect, who plans out from a high level how we're going to manage all this data and what it's going to look like. And then the ETL developer is just going to push that data to match what they're modeling. Because, right, they're trying to set up the data to match all the use cases that people are going to ask questions on, because there are tons of use cases that come up. And if you don't manage your data correctly, sometimes you can't join across the different tables, because there needed to be a relationship but you didn't set it up. I like to give this example: imagine you're doing some sort of data warehouse. You have your fact table for order line items. You've got your menu, and then you've got your restaurant, as dimension tables. But if I were to ask you the question, okay, well, what items aren't ordered? You can't answer it with those three tables if there isn't a key that tells you which menu item is connected to which restaurant and the only place they're connected is that fact table.
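The menu example can be made concrete with a small sketch. Everything here is an assumption for illustration: the table and column names are hypothetical, and SQLite stands in for whatever warehouse is actually in use. The added `bridge_restaurant_menu` table records which restaurant offers which item, which is the relationship the fact table alone cannot provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_restaurant (restaurant_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_menu_item (item_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_order_line (restaurant_id INTEGER, item_id INTEGER, quantity INTEGER);
-- Hypothetical bridge table: which restaurant offers which menu item.
CREATE TABLE bridge_restaurant_menu (restaurant_id INTEGER, item_id INTEGER);

INSERT INTO dim_restaurant VALUES (1, 'Downtown'), (2, 'Uptown');
INSERT INTO dim_menu_item VALUES (10, 'Soup'), (11, 'Salad'), (12, 'Steak');
INSERT INTO bridge_restaurant_menu VALUES (1, 10), (1, 11), (2, 11), (2, 12);
INSERT INTO fact_order_line VALUES (1, 10, 2), (2, 11, 1);
""")

# Start from what is offered (the bridge), then keep rows with no matching order line.
never_ordered = conn.execute("""
SELECT r.name, m.name
FROM bridge_restaurant_menu b
JOIN dim_restaurant r ON r.restaurant_id = b.restaurant_id
JOIN dim_menu_item m ON m.item_id = b.item_id
LEFT JOIN fact_order_line f
  ON f.restaurant_id = b.restaurant_id AND f.item_id = b.item_id
WHERE f.item_id IS NULL
ORDER BY r.name, m.name
""").fetchall()

print(never_ordered)
```

Without the bridge table, the only rows linking items to restaurants are order lines, so an item that was never ordered simply does not appear anywhere the query can see.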
You can only tell me what has been ordered, and you can't say what hasn't, because you don't know where those items go if you don't have a key across, or some sort of bridge table, or something that creates a relationship one way or the other. It just causes that kind of issue. And so I think that's the way a lot of data architects have been used: how are we going to plan out how that data is managed so that we can cover all of our use cases? How are we going to track slowly changing dimensions, things that change over time? Because that's another issue if it's not tracked. It's like, okay, I want a report that tells me the number of users per city that we manage. And if you only have an update statement, right, like, okay, user A is now in New York, but they were in, whatever, Cincinnati last year, you don't know that they were in Cincinnati last year, because you just updated that table to say they're now in New York. You've lost information, and now you can't properly report. So managing things like that, I think, is more on that data architect, data modeler role. And then on the other side, for the ETL developer, it's piping that data into those tables correctly. But I think a lot of that has been merged sometimes into data engineer, depending. I think at larger, more traditional companies, they still have those things a little more separated. But in some companies they just want it to move faster. So they're like, well, if we have this in one person, they can manage the whole pipeline, and then we don't have to spend as much time doing as much big architecture. And I think that's another thing: a lot of people are switching away from doing a data warehouse as one central place and maybe doing some more decentralization there as well.
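The Cincinnati-to-New-York example describes what is often called a Type 2 slowly changing dimension: instead of overwriting a row, the old row is closed and a new one is appended. The sketch below is an illustration under assumptions, not a description of any particular company's model; the table name, column names, and dates are all hypothetical, and SQLite is used only to keep it self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# valid_to is NULL for the row that is currently in effect.
conn.execute("""
CREATE TABLE dim_user_city (
    user_id TEXT, city TEXT, valid_from TEXT, valid_to TEXT
)""")


def set_city(user_id, city, as_of):
    """Close the user's current row (if any) and append a new one,
    rather than updating the city in place and losing history."""
    conn.execute(
        "UPDATE dim_user_city SET valid_to = ? WHERE user_id = ? AND valid_to IS NULL",
        (as_of, user_id),
    )
    conn.execute(
        "INSERT INTO dim_user_city VALUES (?, ?, ?, NULL)",
        (user_id, city, as_of),
    )


def city_on(user_id, day):
    """Look up where the user was on a given ISO date (YYYY-MM-DD)."""
    row = conn.execute(
        """SELECT city FROM dim_user_city
           WHERE user_id = ? AND valid_from <= ?
             AND (valid_to IS NULL OR valid_to > ?)""",
        (user_id, day, day),
    ).fetchone()
    return row[0] if row else None


set_city("A", "Cincinnati", "2021-01-01")
set_city("A", "New York", "2022-03-01")

print(city_on("A", "2021-06-15"))
print(city_on("A", "2022-06-15"))
```

With validity ranges kept on each row, a users-per-city report for last year and one for this year can both be answered from the same table.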
It just depends on how you've set up your company as well. Yeah. I mean, I think that at least for me, when I was first learning about any of the data engineering type work, I was really astounded by how much nuance there is. And just like with data science, how much it differs by company and tools used, and the strategy that you're approaching it with, and the amount of ad hoc tasks versus structure and planning around them. Something I would ask is, do you think data engineering can be done in an ad hoc way? Or is it something that really lends itself to planning and strategy? I mean, some of the ETL stuff, probably yes, but at a broader level, do you need structure on this, or can some companies get away with, hey, we're using a NoSQL database for this, it's going to be relatively unstructured, and we'll add structure later, maybe? I mean, I think you're always going to need to take the time to plan. It's just unavoidable to some degree, at least for building those core layers. I always like to think of data like this, right? You pull from your core systems and create some core layer of data that's like, this is your truth, so to speak. No one ever fully gets to a perfect source of truth, but it's that destination of trying to constantly get to a source of truth. So you treat this as a core layer, the closest thing you can get to a source of truth. It's the finest grain you can get to. And then from there, people can build on top of that. I think that's the place that takes the most planning. I think with the layers on top of that, maybe you can start doing less and less planning as you go along, right? You've got ad hoc requests that come in, and maybe you just build that query for somebody.
Or, you know, you're just trying to see if something's worth building. But I think for that core layer, there's an unavoidable amount of planning. Especially as companies become more complex with their data sources, right? More data sources, which are even further integrated than they've ever been. Right. Your Workday talks to your Salesforce, which talks to all of these other things, and that causes the risk of a lot of issues in terms of pulling the same information from multiple sources that is not synced at the correct times. And you've got people reporting on the same field, but from different sources. And thus, when you report on that, well, maybe this wasn't synced, and you're slightly off. So the number of people that are under a person is wrong because you were off by a day from when you pulled. So I think that's why there's always planning involved. Because, well, where's this data coming from? What's going to be the source of truth for this field? Is this the actual source of truth for this field, or is this getting pulled from a different table? So I think a lot of that will always take a little more time. And yeah, you can apply, as much as people hate it, Agile approaches to it, to try to iterate faster so that you're not taking as long to get to the data that you need right now. But I think over a long period of time, you're going to constantly be refactoring it. So it's going to take time one way or the other. Either you're going to be taking time refactoring it because you need to constantly change it over time, or you're spending a ton of time doing analysis upfront to see if this matches all the use cases. Fascinating. So I guess if we're thinking about ad hoc tasks, or we're thinking about the systems that we're putting into place, how do you think data engineering should interface with the data analysts or data scientists?
What does the handoff look like in an effective organization? And that's probably an impossible question to answer in one solid way, but do you have a preferred method, or what do you think makes the most sense from what you've seen? Yeah. I mean, like you said, there's no solid answer here, mostly because it's going to depend on your team's skillsets, right? To say, oh, I want my analysts to write queries, is assuming that all your analysts can write queries, which is true in some cases at some companies and just not true at other companies. I think this is a gap we continue to try to jump over, right? That's the whole idea: they have Power BI, but not only Power BI, also Power Query, right? Like in Excel. If you've ever used Power Query, that's kind of the thing: can we create a GUI version of joining tables so that an analyst doesn't have to write a query to pull it into Excel? Right. That's their goal there: how do we remove the data engineer, or someone who's a little more technical, from this picture so that they can get to that data quickly? I think obviously in a perfect world, it'd be nice if data engineers could focus more on just creating those core layers of data, maybe a few analytical layers as well, and some key KPIs, and then ad hoc queries were always taken care of by analysts. That would be, I imagine, a better world where we can all focus. Obviously now you might've heard the term analytics engineer, and that's a layer in between even then. The data engineers might be focusing more on the core data layers, and after that the analytics engineer might come in and create some more of the aggregate tables, build some of the more complex dimensions, or something that requires a little more logic development.
That way they can be more specialized, but that's if you have the team members. And oftentimes I find that data engineers tend to split themselves anyway into a natural grouping of people that like doing more coding, and then people that like doing maybe a little more SQL and creating a lot more of that business logic. So, like you said, there's not a perfect mix. I think often what works is if you can create that core layer with data engineers and then figure out what the gap is from there, right? If you need your data engineers to develop the queries themselves, because you don't have analysts that are capable of writing SQL, then that's what's going to happen. If you have analysts that can write SQL, they can do a lot more of the ad hoc work, maybe even take some attempts at some more complex business logic. But at the end of the day, I think it generally will still fall to either an analytics engineer or a data engineer to solidify it and productionize it and then work from there. I think at the end of the day, analysts and data engineers are just different in mentality about what they're trying to do. Data engineers, I think, try to build things that are going to be here for a long time. Analysts often just want to answer questions. And answering questions quickly sometimes isn't possible if you keep having to do version control and all these other extra steps. And you're just like, I just have this ad hoc analysis. I don't want to spend three days just to write one query because I have to do version control every time. I'm just trying to answer this quick question for my boss that we're never going to use again.
You know, it's funny, I remember quite a few times where I would just write my own queries and then one of the data engineers would be like, why'd you use this column? Why'd you use that column? These aren't useful. And I'm like, why'd you put it in the database if we shouldn't be using it? And so, I mean, this was not a common occurrence where we'd have those issues, but, you know, part of me was seeing an inefficiency in how I would work. It's like, okay, maybe it's better in some organizations if data engineers just do it, if the infrastructure we're supposed to be querying isn't clean enough for us to be using it. And then in other places, where data engineers are focusing really heavily on building the proper infrastructure, yeah, data scientists and data analysts should be doing the majority of the querying, if it's light stuff or dashboarding or whatever it might be. But, you know, that's something that I've seen in a couple of organizations now: I don't want to piss off engineers by wasting their time, because I'm perfectly capable of writing a query. But at the same time, there've been circumstances where, when I've written the query, it was not correct. Or it was not correct not because of the SQL, but because of the infrastructure we're using and some of the semantics there. And so I think that's where a good manager can make a huge difference, or a project manager or whatever it might be. Or that analytics engineer that you described, that sounds like a perfect fit for there. I wish I'd had one of those a couple of years back. Yeah, no. And it's always this gap, right? Like even the underlying data changes so quickly that, you know, data engineers might not have time to be like, oh, we should remove this column.
But there are consequences to doing that. So it's like, sometimes we're like, maybe we'll do it later. And obviously I think long-term, the goal is to always kind of try to remove things, and everything in that core layer should only be accurate information that is usable. But, you know, I think sometimes between time constraints and everything, that doesn't always get done. And I think that's a constant problem, you know, everywhere. There's never enough people, especially when it comes to data engineers, and there's always more questions that the business asks. So yeah, trying to figure out a way to manage that is, I think, what people are trying to do with projects like DataHub, where it's kind of a place you can start searching for columns and things like that, or tables that might exist, and get a lot of metadata. Which then maybe you can put something into that metadata store that says, oh, this is a deprecated field. At the very least, you could do that. That's something we had at Facebook, where we could mark a field as deprecated. So maybe you didn't want to remove it because, you know, removing it affects maybe 30 other pipelines. But you can put this deprecated thing on it, and then if you've got a really fancy SQL linter, it'll yell at you and be like, oh, actually this field's deprecated, don't even use it. But that's, you know, not everyone's got that. The company I was working at was more on the startup side, and the schemas and all that stuff, a lot of the metadata associated with that was not well built out. And so I was like, all right, you know, that's a place where additional layers of structure, I think, could create a lot of value. You know, I want to change gears a little bit.
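The deprecated-field idea described above (flag a column in a metadata store, then have a linter warn when a query uses it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling used at Facebook or in DataHub: the `DEPRECATED_COLUMNS` mapping, the table and column names, and the `find_deprecated_columns` helper are all hypothetical, and the check is a naive whole-word match rather than a real SQL parser.

```python
import re

# Hypothetical metadata store: table name -> columns flagged as deprecated.
DEPRECATED_COLUMNS = {
    "orders": {"legacy_status", "old_region_code"},
}


def find_deprecated_columns(sql: str) -> list[str]:
    """Return one warning per deprecated column referenced in a query.

    Naive approach: a column is flagged only when both its table name and
    the column name appear as whole words somewhere in the query text.
    """
    words = set(re.findall(r"[a-z_][a-z0-9_]*", sql.lower()))
    warnings = []
    for table, columns in sorted(DEPRECATED_COLUMNS.items()):
        if table not in words:
            continue
        for column in sorted(columns):
            if column in words:
                warnings.append(f"{table}.{column} is deprecated; avoid using it")
    return warnings


if __name__ == "__main__":
    query = "SELECT order_id, legacy_status FROM orders WHERE amount > 10"
    for warning in find_deprecated_columns(query):
        print(warning)
```

The point of the design is the one made in the conversation: the column stays in place so downstream pipelines keep working, while anyone writing a new query gets told not to rely on it.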
You know, something that I think is pretty fascinating about you is, you know, you were able to land one of these big tech jobs. You know, now you've left that tech job and have been working in consulting. You also do quite a lot of content creation, blogging, YouTube videos, things along those lines. I'd love to learn where that sort of started and what lessons you learned from that. And if I recall correctly, your blogging, your content creation journey started still with food. Good memory, you remember. Yes. I had a blog. I don't know why I did it. Had free time, right? Like you get out of college and you're like, I have time. I don't have to do homework anymore, so let's go do something. So yeah, I put together like a website there and started writing about food that I was eating that I enjoyed. And yeah, I think I still wanted to be a little more attached to food. Cause, you know, I did commit a decent amount of my life at that point to food. So it was like, oh man, I'm kind of sad that I'm leaving this, but yeah, it was one way to kind of stay connected. So yeah, that's where blogging started. As far as consulting goes, honestly, I had someone who reached out to me after we'd worked together, and they wanted some help on a project. It was converting an Access database to a SQL Server database so they could do some analytics for a company. I don't remember why I came up with Seattle Data Guy, but I did. And then another friend asked something very similar, like, hey, can you help us out on a project as well? And they were very into kind of doing social media analytics, and that was kind of their thing. So basically, with that, I was already blogging on my site, then I learned about Medium. So I started writing on Medium.
So I started just putting out tons of content there and have been putting out content for a little bit. And I think this year I started to notice that I kind of missed the boat on YouTube. I was like, oh man, I should have started back in like April, apparently, because there seems to have been an interesting influx around that time. I think I recall you noted that as well. Right? Like somewhere in that time we were all home. We had the free time and a lot of us were really trying to learn. Yeah. We wanted to figure it out. Everyone wants to learn from the side of watching, but also creators had time as well. Right. Like suddenly you're home and it's like, oh, I have like 30 minutes, I can put together a video, like, okay. Yeah, I can do that. And you know, I don't have this traffic that I have to go through, so I can put together a video. So yeah. There's an interesting influx of data and programming YouTubers that came around, I think, in that wave. You know, I think you've had a few of them on here, like Tina Huang and a few of the others. But yeah, I think they all started somewhere in that range, if I recall. So that was something where, I think this year in June, I was like, oh man, I should have started way long ago, but I was like, all right, let's finally start this. Let's just try this. And I just started filming. Yeah. Using my computer camera, in the corner of my kitchen. And I was like, hey, I got a decent amount of views. Let's try improving this a little bit. Still have not bought a 4K camera, but you know, my phone is currently working fine. So yeah, that's the content side. Technically your phone is a 4K camera. Oh, I see. I wouldn't even know, because this is not my area of concern. Though, if you need any camera recommendations, I've got you. Perfect. Perfect. Awesome.
And so, you know, it's interesting to see how that's evolved. Obviously I'll link all of your content sources in the description below. So anyone interested, please check those out. What is the nature of a lot of the content you're producing? I know you have a lot of information about data engineering in particular. Can you give us sort of like a SparkNotes of, you know, maybe some of the highlight videos that you'd recommend people start with, just to get started understanding you as well as data engineering? Yeah. So I think, you know, a lot of my content currently is definitely around the intro to data engineering. Like, what is it, you know, do you want to become one? And how do you become one, those kinds of concepts. So that's been a lot of my focus early on. So one popular video is, you know, why you shouldn't become a data scientist but become a data engineer, which mostly just pointed out why someone might prefer data engineering. It's not to say one or the other is better, but merely why sometimes people do switch, right? And I've seen people switch both ways, right? Like some people start as data engineers and are like, no, I like data science, and some people go the other way. Right. Like, I think, who was it that you had on recently that did that? Yes, yes. So she went data engineer to data science, and then, you know, I think you've talked to Joe Reis as well, and he's gone data science to data engineer, or data infrastructure. And so, you know, I think people try to figure out where they fit. And if you've got some programming and some knowledge of stats, you can kind of toggle between both for a little bit until you find the one you enjoy. It just depends on what you prefer doing. So I think that was kind of a decent video. It's got decent views. Not too many downvotes, as much as you can tell.
No one can tell anymore. Well, no, we can tell. You guys hurt our feelings with every downvote. No, actually, I encourage it, because downvotes are counted the same in the YouTube algorithm. Oh really? Which is weird enough. So upvotes and downvotes are both counted as engagement, and the YouTube algorithm is designed to optimize for engagement and watch time. So it's kind of this weird paradoxical thing where stuff that gets, like, super downvoted is still getting shown a lot because, like... Engagement, engagement. Exactly. Yeah. I don't know if you saw this guy. I don't remember what his YouTube channel was. He's like, oh, I hacked the YouTube algorithm, and his video has 5 million views. And basically what he did was he sped up his video. It's only 30 seconds or 25 seconds, and he sped it up to like four times. And then he tells you, before you start, to put it at 0.25 times as you run it, so it runs at normal speed. And so YouTube doesn't understand that you've still only watched a hundred percent of the video. It thinks you've watched 400% of the video if you watch the whole thing. And so basically he showed how, by doing that, he got 4 million views. All those other videos don't have that many, but it was very, very interesting in terms of watch time. It's like, if you can figure out a way to game that thing, in theory, I guess you can get tons of views. Well, that's the thing. If people watch a video multiple times, you bump watch time, and I'm experimenting with some stuff like that. I'd love to do something at the end of each of my videos that's sort of like a scavenger hunt, or like, did you notice? And so people will go back through, and it's fun for them, right?
It's not, you know, if they don't want to, they don't have to, but, you know, to me, there's unique ways that you can optimize for an algorithm without detracting from the experience of the individual viewers. And obviously with his, you switch the toggle, you know, to slow it down and it works perfectly. But I also think that it's a little bit disappointing that the YouTube algorithm optimizes in the way that it does. I mean, it's definitely not optimized for wellbeing. It's optimized so people will just go down rabbit holes and watch as much content as possible. And like, I am subject to it. Right. We have to make content that we think will be interesting. And you know, that's a trade-off I think you have to make when you focus on one platform in particular, and you know, that's one of the reasons why, you know, LinkedIn, I like these other platforms as well, is that I can tell different stories based on the different platforms that we're a part of. Yeah, go ahead. No, go ahead. Okay. We've been on this marketing kind of speak a little bit, so happy to switch. I would love to hear a little bit more about the consulting and, you know, obviously it's going well enough that you felt comfortable leaving your job at Facebook. What has that journey been like for you, and what are some of the opportunities, but also the challenges, with going sort of on your own like that? Yeah, I mean, so, you know, I've had a consulting company for a long time, I think like four or five years. And I would take many little projects here and there, put out calls. And I think this year there were just a lot of things that came to a head where it was like, you know, I just started putting out more content, started, I think, synergizing more platforms, as you just referenced with LinkedIn. So like trying to really get everything to go across everything.
Because right now it's like, if one of my pieces of content blows up, I see it impact almost everything. Right? Like I had a YouTube video blow up. Well, now my website's visited more. Now my newsletter, you know, I get a hundred new signups for my newsletter that day. So, you know, it's all kind of flywheeling at this point from a content standpoint, because again, I've been building it for the last five years on different platforms and just trying to make it for each platform, right? Like that's always the hard part, making each piece of content for each platform. But yeah. So that started the flywheel. Suddenly I forgot where I was going. In terms of consulting. Yeah. So from the content, that started to flywheel, which also led to more engagements consulting-wise. So I think that's also changed. Like I've gotten a lot more requests for proposals. And then the other big thing that I changed this year is how I would deal with projects. Like before, I think I would be very open and kind of take almost any project. It's like, oh, it's $500, a thousand dollars a month, which, it's great to have an extra thousand dollars a month, but then when you look at your Facebook salary, you're like, wait, I would need like 40, you know, a ridiculous number of these projects to match a similar salary. Not to mention I'm not going to have health insurance and all of these other issues. And I can't manage 20 to 40 projects, you know, in my brain, even if it's only like three hours per project. You know, you take up a certain amount of memory per project, regardless. It's just a default. So yeah, I think switching that up as well and making sure I'm focusing on projects that are more impactful for a business, and marketing more towards that, even with my content, and trying to hit those kinds of concepts has been very helpful.
So that's kind of been the switch. It's like, okay, how do I reposition myself to make sure I'm being very impactful at a business, so that way I can charge accordingly, right? Like I want to provide the value, but in order to provide the value, you have to kind of market towards the problem you're trying to solve. So I think that also helped. Yeah, so there's a lot of opportunity, especially in the data engineering space, right? Data science. I know everyone says data engineering is hot right now. And I think in some ways it is, right? Like we're seeing billions of dollars go into funding for data engineering systems like Fivetran and, you know, billions of dollars of stock valuation and all of that. But I think data engineering is just not as enticing or as exciting, right? Like, it is kind of more of the plumbing. We're just doing the infrastructure. And for some people, I don't think it's as sexy. But that means there's a ton of opportunity for solving this problem. And if you've been around a lot and worked with different tools, worked with different solutions, you kind of get how things work from a high level, and not just from like, you know, oh, someone told me to do some solution, so I put it in, but you can kind of come in and be like, I'm going to implement this whole thing for you, right? I'm not just gonna implement one piece. I'm going to do all of it. There's a lot of value you can add. And that's really what I've been trying to focus on. And honestly, if you want to talk about skills you learn at Facebook, that's the mentality you start developing. It's like, what project can I do that's the most impactful and that requires the least amount of my time? Right? Cause it's like, every project, I have to make sure that I look really good at the end.
And if I'm doing like 10 projects, but all of them are kind of baby impacts, it's like, that one person did one project and that one project got them a promotion, and these 10 projects didn't even get me a promotion. So you just start thinking about that and like, hey, what really drives value to the business? So, yeah. In this case, it's a lot more clear cut. It's like, what drives revenue to your pocket? Even more clear. Yeah. I mean, yeah. And, you know, I'm trying to remember, was it Van who was talking about the different ways data scientists can charge for what they do consulting-wise? And it's like, yeah, there's definitely a lot. If you're good at consulting, you can charge a decent amount, because what people are expecting is not a pair of hands, at least not only a pair of hands, but someone who comes with the experience and the knowledge to do a lot more, right? Like a normal worker, you come in, you spend the first two weeks getting up to speed. Maybe you do a few tasks after that, and it's a very slow process. But oftentimes when they hire a consultant, it's like, hey, week one, you've got to start already almost delivering something. It doesn't have to be the actual thing, but start showing how you're gonna deliver value from day one. And I think it's a very different experience, and that just takes time to figure out how to do. How is it managing projects by yourself? Or have you hired anyone? Like what does that process look like? I mean, I think for me, technically I work in consulting also. And one of the biggest challenges is that the bigger, more high-profile clients that we bring on, the more people we need to bring on, because I simply cannot handle all the work.
And you need to find people with the specific skillset, and, you know, especially when you're small, you have to find people that are willing to work potentially without benefits or those types of things. And are you limited by just being, you know, by yourself? Like how do you make that step to hiring people, bringing people on? You know, maybe you haven't thought that far ahead. Maybe you have, but I'm interested in that progression kind of selfishly as well. Yeah, of course. So I did experiment with hiring kind of like an intern earlier this year. And data engineers are definitely hard to hire for in an intern role, just because that skill is not something taught in school at all. Right. At least data science has some semblance of courses, I think, even in schools, right? There's master's degrees in data science these days. So at least they have some idea of the tools they're gonna use. So for example, I hired someone who was more of a data science background. And then I remember I asked them to do a pipeline and I provided way too much ambiguity, maybe. They came around and gave me a Jupyter notebook. And I was like, you know what? That's on me. This is on me. I should have seen this coming, you know, asking a data scientist to build a pipeline like this, and that's what I would have done. It was like, oh, here it is. You know, here's the pipeline. It's like, okay. But yeah, so I'm still figuring out how to do that best. I do have people that I work with that, you know, are more on the contracting side of work, because you do tend to need to find people that are already pretty skilled. I think, you know, obviously everyone wants to hire a more skilled person, but I think particularly with data engineering, it is hard to get people to have the right skillset.
You know, I think Joe, Joe Reis, I think he's kind of setting up more of a system to get people to that right level. But yeah, you kind of have to set up a process to teach people, especially if you wanna bring them in from an interning level or even a beginner junior level to a more advanced level. It's like, you kind of have to be responsible for that, because there's not a lot of context on how to become a data engineer. It's really just piecing together things and then relearning things because you learned it wrong, because it was from some blog article from 20 years ago, and now we're doing everything completely differently. So yeah, it is definitely a challenge because, you know, I think the best thing you can do is develop a process. That's honestly what I tell people: Facebook and any big tech company, they have a process to create really good programmers, data engineers, data scientists, right? Like they've developed that. Right. They know how to drive the right incentives so that people become that. Whereas at most companies that hire a data scientist, if there's only two of you, there's no process around best practices. Right. The one before you was some PhD who maybe has some experience, but maybe not that much specifically in data science. And so there's a lack of best practices. And so I think that's an interesting next step for me, trying to create some sort of program to onboard someone and immediately get them to the point that I need them. Well, I mean, at least for me, I really enjoy those types of struggles, or enjoy that process of building. I mean, it's a headache, but it's a lot of fun building out infrastructure for like my YouTube channel and the community and editors and those types of things. Right. You know, I'll be fairly direct.
I'm interested in your thought process for leaving a big FAANG company to do consulting or to do this other approach. You know, what are the main motivators for that? I mean, I would imagine financially you'd probably have to be making at least similar to make that transition. But is it the freedom that you enjoy? Is it the idea that you have control, or more control, over your own destiny? What are the things that really appeal to you to make that next step and transition away? I mean, I guess you can always go back, right? Like there's nothing that really prevents that, but I'm just genuinely interested in kind of your process on that. Yeah, no, I think some of them you've hit on the head. I think one, I can always go back. I think that was the biggest fear. It's like, ah, I almost feel like they make interviews so hard so we don't want to leave. Right? Like, you had to go through this process, you don't want to go through it again. But, you know what? I'll figure that out when that happens. I think that's part of it. I think another part is, yeah, financially, in theory, this could or could not be more lucrative. Right. Like when you balance it all out, it might just end up being the same at the end of the day. I've definitely seen people who become very high-level engineers at Facebook, and there's articles about them making a million dollars a year. So it's hard to beat that consulting-wise, right? Like you'd have to do tons of hours. But I feel like there's a very small group of people that do that. Well, that was like an E7, and I think technically being a tech lead was like E5 or E6, so imagine like several levels above that to make that salary. So I think most people probably cap out at that. So I'm like, okay, I think it's possible for me to make that salary consulting.
I think the other thing is I really do want to make more content. I think that's going to be a huge driver. I think that's gonna be something a little more scalable. And I think that's my biggest goal. I think my biggest goal long-term is figuring out how to become more scalable as a person. Whether it's content, whether it's product, whether it's, you know, consulting and having more people working with me. I think that's my long-term goal, both from a standpoint of, whatever you want to call it, financial freedom, and just in terms of being impactful. It's like, at a certain point, my skillset as a person hits a wall, and I need to figure out how to get beyond that. And that's either hiring people or building something that is scalable. Content, you know, YouTube, I think is one of those cheat codes where it's like, yeah, you want something scalable, make a decent YouTube channel. Like you can impact so many people with 30 minutes of work. Maybe more than 30 minutes, but you get my point. It's like your eight-minute video that you spent, you know, an hour or two prepping for, maybe eight hours if it's really well researched. And then you paid someone else to hopefully edit it, if you're me, cause I do not have time. But yeah, and then it reaches 2,000 people. And that's really cool. Or maybe if you do a good video, a hundred thousand people or a million people. And yeah, I think that's kind of that next step, right? It's like, okay, how do I just become scalable from an impact standpoint? Cause that's what drives value to people. And that's interesting to me. So... Yeah, I've had very similar conversations with myself. You know, I taught a university course this past year. Right. And I sit there and I teach three hours of content once a week to 30 students, right? Those same 30 students over the course of a whole semester.
If I make a YouTube video, if I make an online course, I can reach, what, you know, 50,000, a hundred thousand people with literally less effort, because I don't have to go through and teach it. I can teach at my own pace, do whatever. And I think that's a really important thing, to say, hey, how do I get more from the hours that I put in? Or how do I have control over the hours that I've put in as well? So I think that's a really good note to end on. What are you working on now? How can people get in contact with you? How can they find out more about you? Please let everyone know. Yeah. What I'm working on now. I mean, obviously content. I do want to put together some stuff that's a little more like helping people think mentally about data engineering. Like not just tool-based. I think some people approach learning data engineering from a tool-based approach. I think I do have some stuff that I'd like to put out videos on that's a little more high level. It's like, okay, when you develop a data pipeline, what are we thinking about? Right. Regardless of tools, agnostic of that, how do we really think about it? What are kind of the key points? So that's kind of one thing. Consulting obviously is another thing, and figuring out how to continue to grow that is the other big thing for me internally. Yeah, I think those are kind of the key areas. I'd like to read more, that's something I'm hoping to do more this year. I didn't get to read much last year. So I feel a little bit drained sometimes. Sometimes, you know, you need to drink from a well and not just pour out information. You know, you start getting stale. So you need to kind of refresh yourself.
So yeah, I think that that's something I'd really like to do this year as well, but yeah, if anyone wants to contact me you know, you can find me under the Seattle Data Guy on most platforms. Ping me, LinkedIn me, message me whatever on whatever platform and yeah. Thanks. Thanks so much for all your time. Yeah, of course. You too. I'll make sure to have all of those links below in the description as well until next time. Yep. Thank you.