Build, Repeat. (A Paces Podcast)

Energy Data Science with Kyle Baranko - E123

May 15, 2023 Paces Episode 123

In this episode, James McWalter interviews Kyle Baranko, a data scientist at Paces, about the intersection of energy and data. They discuss:

  • How renewable energy companies are using data to optimize projects and the challenges of working with legacy energy systems. 
  • The types of data that renewable energy project developers care about, including interconnection, geophysical and environmental constraints, permittability, and financial incentives.
  • The challenges of data collection, data maturity, and modeling maturity in the energy industry.


Paces helps developers find and evaluate the sites most suitable for renewable development. Interested in a call with James, CEO @ Paces?

This transcript has been lightly edited.

James McWalter: Today we're speaking with Kyle Baranko, data scientist at Paces. Welcome to the podcast, Kyle.

Kyle: Thanks James, it's a pleasure to be here.

James McWalter: Could you tell us a little bit about your background?

Kyle: Sure. I've been working at the intersection of energy and data for several years. My first job out of school was with an agency that specialized in clean tech. Through that work, I got exposure to a lot of data projects, specifically, machine learning applications at the grid edge. I became interested in data science and have been working in a technical capacity since then.

James McWalter: What did the energy-focused startups have in common?

Kyle: They were all working on data solutions at the grid edge. They were taking advantage of the transition from a legacy analog, centralized one-way system to a more flexible paradigm.

James McWalter: Why is the data picture becoming more complicated as part of the energy transition?

Kyle: There are a lot of challenges with normalizing and comparing data across different markets and regions. ISOs and utilities have different ways of reporting data and different systems that providers have to integrate with, which makes it difficult to scale technical solutions. Additionally, running a transmission system and a distribution network is high stakes and dangerous, and until recently the data quality hasn't been good enough to really understand what's happening where.

James McWalter: How can the software technology community better interact with these entities?

Kyle: The talent base of the data industry, and the people who have been working in tech for the last ten to twenty years, are interested in working on these types of problems now. However, it's important to have a sense of humility about how complicated it really is to transition this analog system, which is ultimately based on electricity, which is physics. You have to reconcile the digital world with the physical world. The best thing we can do as a data community is approach it with that humility: we have a lot of expertise, but we also have to understand how complicated it is to run a transmission system and a distribution network.

James McWalter: What are the big buckets of data that renewable project developers specifically care about?

Kyle: Interconnection, geophysical and environmental constraints, permittability, and financial incentives and specific adders that may be available.
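
To make those buckets concrete, here is a minimal sketch of how a single site-screening record might be structured. This is a hypothetical illustration, not the actual Paces data model; every field name is an assumption.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class SiteCandidate:
        """Hypothetical screening record covering the four buckets discussed above."""
        # Interconnection: where the grid is and what connecting might cost
        distance_to_substation_km: float
        interconnection_cost_estimate_usd: Optional[float]
        # Geophysical and environmental constraints
        irradiance_kwh_per_m2_day: float
        slope_pct: float
        in_floodplain: bool
        # Permittability: local zoning and permitting posture
        jurisdiction: str
        existing_projects_in_jurisdiction: int
        setback_requirement_m: Optional[float]
        # Financial incentives and adders
        brownfield_adder_eligible: bool
        local_tariff_usd_per_mwh: Optional[float]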

James McWalter: Where are the real challenges for data collection, data maturity, and modeling maturity?

Kyle: You can think about it as a two-by-two matrix: simple versus complex models on one axis, and high versus low data maturity on the other. The highest-value data science products come from pairing simple models with low data maturity, or complex models with high data maturity. Of the four major buckets, interconnection has comparatively high data maturity from the developer's perspective, though the underlying cost and queue data still comes from disjointed sources. Permittability is more qualitative, in that people have hypotheses about what makes a workable project. Geophysical and environmental data is fairly mature, and on the financial and revenue side, market price data is generally good, though it can be a mixed bag depending on the use case.
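
As a toy illustration of that two-by-two framing (the quadrant descriptions below paraphrase this conversation and are not a formal taxonomy):

    def product_value(model: str, data_maturity: str) -> str:
        """Toy mapping of the two-by-two matrix: model is "simple" or "complex",
        data_maturity is "low" or "high"."""
        sweet_spots = {
            ("simple", "low"): "high value: careful data collection and transferable heuristics",
            ("complex", "high"): "high value: sophisticated algorithms on digitally native data",
        }
        return sweet_spots.get((model, data_maturity),
                               "lower value: model sophistication mismatched with data maturity")

    print(product_value("simple", "low"))    # the energy-transition quadrant
    print(product_value("complex", "high"))  # the ChatGPT / big-tech quadrant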

James McWalter: Zoning and permitting is super interesting. Can you speak to some of the guidance that developers have around what makes a good place to build?

Kyle: Developers have hypotheses about how communities may react to a proposed project or where the optimal place to develop is. Some townships are friendlier than others, but these are largely hypotheses that have yet to be backed empirically.
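
A minimal sketch of how one such hypothesis could be tested once permit outcomes are recorded. The dataset, column names, and values below are invented for illustration; they are not Paces data.

    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical records: one row per past permit application, with the outcome
    # and the number of projects already built in that jurisdiction.
    df = pd.DataFrame({
        "existing_projects": [0, 1, 2, 2, 3, 5, 6, 8, 1, 0, 4, 7],
        "approved":          [0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0],
    })

    # Simple logistic regression: does prior build-out predict approval odds?
    X = sm.add_constant(df[["existing_projects"]])
    result = sm.Logit(df["approved"], X).fit(disp=False)
    print(result.params)  # a negative coefficient would be consistent with saturation fatigue

The "Goldilocks" shape described later in the conversation is non-monotonic, so a real test would need a more flexible specification, but the workflow is the point: write the hypothesis down, collect outcomes, and check it.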

RAW DATA

James McWalter           00:03

Hello. Today we're speaking with Kyle Baranko, data scientist at Paces. Welcome to the podcast, Kyle.

Kyle    00:08

Thanks James, it's a pleasure to be here.

James McWalter           00:10

Kyle, to start, could you tell us a little bit about your background?

Kyle    00:15

Sure. So, I've been working at the intersection of energy and data for several years now. I originally got my start in the energy industry through marketing, actually. So my first job out of school was working for an agency that specialized really in everything under the clean tech umbrella. But through that work I got exposure to a lot of the data projects my clients were working on. So uh, specifically like machine learning applications at the grid edge and getting a sense for how to leverage all of this data within the energy industry to really just translate those insights into more value and kind of harmonize this complex dance between renewable energy resources and demand. So, through that work, I just decided data science was something that was too interesting to just kind of be on the outskirts of and dove full speed ahead. So I've been working in a technical capacity since then.

James McWalter           01:06

And so those first couple of, uh, energy focused startups, um, I guess, what do they have in common across the couple of companies? Um, before Paces, I'd say they were.

Kyle    01:17

All working on data solutions at the grid edge. So kind of taking advantage of that transition from a legacy analog, centralized one way system, where you've got these big bulk power plants generating electricity according to a fixed schedule and just kind of operating with demand as this fixed side of the function, demand is not flexible, to new technologies where you're putting batteries and smart thermostats, uh, into homes and businesses and industrial sites and using the demand side of the equation to optimize based on cost. So it was really about unlocking the demand side of the grid.

James McWalter           01:55

And why is the data picture, ah, becoming more complicated, uh, as part of the energy transition?

Kyle    02:04

That's a lot, um, I think overall there's a lot of challenges between normalizing and just kind of making these comparisons of, uh, data across different markets in different regions. So ISOs and utilities all have different ways of reporting data and different systems that providers have to integrate with. So there's a lot of challenges and scalability of making sure a technical solution works from utility to utility, from ISO to ISO. Because a lot of the times it's almost like doing business in a different country, or based on just how they regulate the industry, or just the different history of the institutional processes that they've set up to manage that data and how they make it available.

James McWalter           02:48

And within those organizations, because we've dealt with some of those organizations, you've dealt with them in previous roles as well. Uh, how can we, I guess, as a software technology community, better interact with these entities? So, as you mentioned, these are very old entities, these are fairly risk averse entities, and they're also typically not the most, uh, technologically sophisticated entities. Um, and there's this challenge where, both from a culture and a needs point of view, you have people kind of often talking across purposes between these large historical entities and new fast moving companies. Not just software companies like us, but also the renewable developers themselves. Um, how do you think we can, I guess, better divide or better, uh, bridge that gap?

Kyle    03:37

So, I think, at a holistic level, the talent base of the data industry, and a lot of people who have been working in tech for the last ten to 20 years, mostly in media and communications and consumer tech, there's a lot of interest in working in these types of problems now, which is incredibly exciting. And I think what we've seen with a lot of the grid is that it's this big analog system that has been around for a long time. So they've seen each successive wave of digitization that's occurred since they adopted computers, these big mainframe computers, but it's not like a digitally native industry. And I think from the opposite perspective, with the most innovative data companies, they're digitally native. They're like Netflix, right? It's completely in the cloud. And that's where a lot of the data science talent has really come from. So I think naturally, we're already seeing the shift of focus back towards looking at the utility and grid data infrastructure from a fresh lens, and really leveraging a lot of the expertise that's built up over the last 20 years, um, into the grid again. And so I think what we can do as a data community is really just understand that it's a two way street. I think the fact that a lot of the data science talent is coming from a digitally native, this kind of cloud native environment where, yes, we're working for an ecommerce and a lot of applications where the stakes are lower. Just have a sense of humility about how complicated it really is to transition this analog system that is really based on electricity, which is physics. You have to kind of reconcile the digital world with the physical world. And a lot of applications that we're seeing now are really struggling with that. Uh, as we begin to integrate hardware with software for these data applications, you're kind of seeing the same thing with autonomous vehicles. We're still waiting for, um, a pure template to emerge where we can bring everything we've learned into building up, uh, the really rebuilding from a hardware perspective as well. So I just say the best thing we can do from the data community is have a sense of humility. And yes, we have a lot of expertise to bring, and a lot of ideas about how to transition the grid to, um, a more digitized, flexible paradigm. But at the same time, we also have to have this understanding of how really complicated it is to run a transmission system and to run a distribution network. These are things that are very high stakes, very dangerous. And, um, up to this point, we just haven't had the data quality to, uh, really understand what's happening where. And you can't really do that until, um, you work together to understand where your gaps are and what you need to install.

James McWalter           06:34

Yeah, one way to think about it is, for a software company like us or bigger entities, your worst case scenario from an outage is people can't do their work right, which is still very bad. And people should have all the kind of processes internally, and, uh, data security and cybersecurity and all these kind of things to prevent those kind of issues ever happening. The worst case scenario for a grid operator is no electricity in millions of people's homes. And you have all of the knock on effects of that. Uh, you're hitting percentages off annual GDP for a given location if that goes on for more than a few minutes. And so the trade offs or the balance between risk, uh, reward, is just very skewed, um, across how both of these types of entities kind of think about the world. And so yeah, so it's like we probably, as the more open to risk, move faster side of the equation, have to probably go further to meet past the middle and actually go even further, uh, to work with those entities who are, um, understandably more risk averse.

Kyle    07:40

Right, exactly. Move fast, break things, can work in social media, but it's a little bit more complicated when you get into working with, uh, physical systems.

James McWalter           07:47

Absolutely. And so we're talking a lot about kind of this general idea of data, data, uh, at the edge, a little bit more specificity. But what are the big buckets of data that renewable project developers specifically care about? Make that concrete for us.

Kyle    08:01

Sure, I think so. You can break it down into some major variables, such as interconnection: everything related to where is the grid, what are the attributes of the grid that, uh, relate to where I want to start my project and move forward with integrating into this complex system that is the electricity network? I would also say another big bucket is the geophysical or environmental constraints. So, from a renewable energy perspective, you want a site where the sun actually shines. You're not on all south facing slopes. You want to maximize the potential of that renewable energy resource. And you also want to abide by low construction costs and not ruin any pristine environments, um, where it may be optimal to site projects as well. So you've got the big environmental component, um, you also have the permittability side. And this is what's super interesting from a social science perspective, but really understanding the nature of the communities in which you're trying to build a project. Because a lot of times the actual local governments have control over zoning, and they have specific opinions about where they want to build, how they want their community, what they want it to look like, really. So you need to have an assessment of what types of permits are required in order to build a project in a specific area, and what is, like, the flexibility or specificity or the restrictions around those permits, too, like, what are your setbacks? A big flat field can actually look really nice on initial inspection, but the setback requirements are pretty onerous, or it's pristine, protected farmland. That makes it a bit more complicated, and you have to have a more thorough conversation. And then I'd say the last one is related to, um, if we're talking more about sticks, this is more about the carrots. So this is the financial incentives and specific adders that may be available where communities do want to redevelop a given area, potentially, like, have tax incentives to build on brownfields, or tariffs are particularly high in a given area. Um, the financial component or the revenue side of the equation is, of course, very important as well.

James McWalter           09:56

And from, I guess, data collection, data maturity, modeling maturity point of view, where are the real kind of challenges? I think in the past, you've talked about, uh, internally about this kind of two by two matrix for data. Can you speak to that and how it applies to those different buckets of data?

Kyle    10:13

Sure. So I'd say that for the two by two matrix, you can think about, like, model sophistication on one spectrum, where you have simple versus complex models and you have data maturity, on the other scale, you have high data maturity or low data maturity. And so for the data maturity perspective, you can think about that as high data maturity is like digitally native data. You've been working in the land of APIs, in the cloud for a long time, so you can do machine learning at scale. Low data maturity is kind of what we're experiencing with a lot of this transition from an analog grid system to a digital grid, where, for the first time, a lot of recorded data is now finally becoming available, but the stakes are higher, and you really need to navigate that landscape a bit more carefully. So the highest value data science products, um, in my opinion, are really with simple models and low data maturity, or complex models and high data maturity. So complex models and high data maturity are kind of what you see with, like, ChatGPT, where you have a digitally native ecosystem and everyone has access to the same amount of data. So the real competitive differentiator is thinking through what is the most sophisticated algorithm that I can create and how is that going to unlock the most value for my customers. Like a lot of what you see at the big tech companies right now. I'd say another opportunity for value, and this is where the energy industry can really shine as we begin to record a lot of the new data that was previously unavailable and work together with utilities and communities to digitize previously non digitized systems, is the simple model and low data maturity scale where much more care has to be put into how you collect and model and structure the data rather than the sophistication of the algorithm, necessarily. And so, as this applies to the four major buckets that I outlined, I would say interconnection actually has fairly high data maturity, um, from the developer's perspective compared to maybe some of the other buckets like permittability, where a lot of the times the good hosting capacity maps actually do a fairly solid job of directing developers to the right places to site their projects and attempt to connect to the grid. But you can still think about it from a low data maturity scale because you have all of the interconnection cost data, the queue information and, uh, these other dimensions that really paint the full picture of what it will cost in order to connect to the grid in a given area and assess where is the optimal place to connect to the grid. They're coming from a lot of disjointed sources, even within the same territory and utility perspective. And it's really, really hard to normalize and synthesize that and get a complete picture of the state of the network in a given area. It's even harder to do that when you're making these apples to apples comparisons or trying to assess strategic trade offs between one region or another. So I'd say for these four buckets that we've outlined, the interconnection one is a bit complicated. Permittability is also fairly low from a data quality perspective, and you really need simple transferable models. So this is one where a lot of developers or people have hypotheses about what makes a workable project. And I'd say they're very qualitative in that they have intuitions about how communities may react to a proposed project or where the optimal place to develop is.
Some townships are more friendly than others, but these are a lot of kind of hypotheses that have yet to be backed empirically. Um, that's going to be really challenging, but also high opportunity to finally test those hypotheses and collect the necessary data to do so. And then just finally touching on the geophysical and environmental aspects, as well as the financial and political, um, the revenue side of the equation, I'd say geophysical and environmental actually also have high data quality, where you have a lot of, like, satellite imagery. In the US, the government does a great job of doing these very sophisticated surveys of the topography of a given area and the, uh, flooding studies and mapping out floodplains and stuff. So I'd say from a geophysical perspective, the, the data maturity is fairly high and you see a lot of, like, complex, you know, satellite imagery, machine learning applications on some of the, some of the data that's being collected in that space. And then finally, from a financial perspective, again, I think for the energy industry, the markets side of, uh, the industry is actually fairly advanced from a data quality perspective, because in order to transact in an energy system, you kind of have this digital record, which is like the market price. And so there's actually been some good analytics that are making these, uh, trade off decisions between looking at a given market or not based on price. And you can actually get into some good analytics and adders um, but it's a bit more complicated because in some cases for developers, like community solar, you're trying to parse legislation and these big documents to come up with a price signal that you kind of have to assess whether to go into a given area. Um, so it kind of depends on the specific use case that determines the data maturity in that arena. Um, but I'd say it's a mixed bag and generally you can't get easily available price information. Um, on the revenue side of the.

James McWalter           15:19

Equation, that zoning and permitting piece, um, is super interesting, we've heard, and obviously we're not going to share any true IP from any of our developer customers. But, um, it has been also interesting for me talking to different folks about some of the heuristics that they've developed around. Is a given community going to be pro or anti large scale solar? Um, and so can you kind of speak to some of the kind of interesting different perspectives and sometimes contradictory perspectives of developers as they've talked to different communities and some of the guidance that they've had around, uh, what makes a good place to build?

Kyle    15:54

Sure. So I think there are two main axes along which, uh, community sentiment determines favorability for sustainable infrastructure development. It's, um, time and space. And so a lot of the hypotheses that developers have been mentioning to us fall along the timing hypothesis. There's this kind of idea that based on your knowledge of a given area, you can identify the sweet spots of permitting within specific communities based on the attributes and demographics of that community and existing generation. So if you look at what I like to call the Goldilocks hypothesis, where a lot of developers will say, we really like areas with one to two projects in the given jurisdiction, because that shows that there's a clear path to permittability. Like you can build solar there, and there may be an existing process within the community to approve that project. But you're not getting into the five, six, seven or eight zone where you're having so much solar built up in an area where people are a little tapped out. They're tired of looking at solar panels, and they're more willing to whip up sentiment. And you don't want to be the developer that enters that jurisdiction and tries to go through the permitting process. And you're the one that kind of sparks the changes in the legislation that make it harder to build your project. So timing the sweet spot within these communities is very important. I would also say, related to how the timing can change, the sentiment can change over time. The sentiment can also change based on proximity to nearby townships and nearby communities as well. So if you look at a given area where, and this has worked both ways, for a community that is pro solar and maybe, like, welcoming renewable energy development, a neighboring community can see a couple of successful community solar projects in one area and say, hey, we actually want to facilitate similar types of development. It's built on a landfill. It's really providing tangible benefits to our community. We want to get out ahead and begin to adopt solar friendly legislation and make it explicitly so. And you can also run into the opposite scenario where maybe a community is becoming more saturated with lots of solar. And depending on the surface of connection between these two AHJs, you are looking at maybe how quickly the idea that, uh, a community is becoming saturated and getting sick of solar can spread. Another community nearby may want to get out ahead of it and adopt similar legislation to prohibit it and make sure that they don't fall on the same development path. And that could really be a function of how tightly connected those communities are. How many school districts do they share? Is there a lot of overlap in commuting patterns between the two? Are the people in these communities talking? Are they similar? Are they dissimilar? So there's a lot you can do in kind of like tracking the state of the authority having jurisdiction over time, and how they relate to one another through space, to try to understand how these cultural aspects that dictate legislation may disseminate, uh, between the two.
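
The Goldilocks idea above can be written down as a simple, transferable scoring rule and then checked against outcomes as permitting data accumulates. The sketch below is a hypothetical heuristic, not a validated model: the thresholds come from the numbers mentioned in this conversation, and the neighbor_moratorium flag is an invented stand-in for the spatial-diffusion effect.

    def goldilocks_score(existing_projects: int, neighbor_moratorium: bool) -> float:
        """Hypothetical permittability heuristic, for illustration only."""
        if existing_projects in (1, 2):
            score = 1.0   # a proven permitting path, not yet saturated
        elif existing_projects == 0:
            score = 0.5   # no precedent either way
        else:
            # fatigue sets in as build-out climbs toward the 5-8 zone
            score = max(0.0, 1.0 - 0.15 * (existing_projects - 2))
        if neighbor_moratorium:
            score *= 0.5  # restrictive legislation may diffuse from nearby AHJs
        return score

    print(goldilocks_score(existing_projects=2, neighbor_moratorium=False))  # 1.0
    print(goldilocks_score(existing_projects=7, neighbor_moratorium=True))   # 0.125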

James McWalter           18:52

Yeah, I think one of the really interesting things about this is part of my kind of influence on this coming from, I, uh, was in the finance space for many, many years, and I was working with, um, traders who were building very large models to try to predict the price of different stocks. Right. And one of the real things that being in that world taught me is often they wouldn't know, uh, they would throw lots of data about and try to find some predictability, but they would very rarely have like a truly causal explanation. Right. Um, it's like, okay, uh, this stock is correlated with something out there in the world. But why is it correlated? Well, it's actually less important to a trader because they're just seeing it's correlated. And so if this leading indicator goes up and makes the stock go up, once I see the leading indicator go up, I'm going to buy more of that stock, sell it when it's high, all that kind of thing. I think what's really interesting about the space around permittability and working off the kind of the experience and the heuristics that the developers have developed is that when I've talked to them about some of these kind of similar elements, um, and we talk, uh, at Paces about trying to figure out ways to aggregate this, there's a lot of, like, oh, Paces is trying to build maybe some sort of qualitative score. And it definitely makes sense, right, when you're kind of thinking through it's, like, is this just an amalgamation of a lot of qualitative information from developers themselves? But I guess one of the ways I think about it is we actually don't know what the correlations really are. Right. Um, a developer or a set of developers who have a specific theory of the case for why a given jurisdiction is going to be easier to permit may, um, be right, but they might be right for the wrong reasons. Right. There might be some other correlate of data. Right. So a classic example is on the demographic side, what's the average income of the community, is, uh, the project near a specific type of school district? There's various things like that, that the developer, um, might intuit through other kind of qualitative information, but they don't actually, because it's a very hard data problem, they don't have access to the underlying, uh, maybe causal, uh, information or the high correlate piece of information. And so how do you think about, um, that mix? How to kind of manage, um, things where what looks like the actual reason may not be the reason, and there might be a discoverable process to find an underlying deeper reason.

Kyle    21:15

Sure. So I'd say, yeah, like the stock examples. Like, uh, that's one of those classic cases where for data science and modeling problems, you just have your straightforward input output map, you have the target variable of what you're looking to predict, and you have a lot of related data that you know is correlated in some way, and you're just basically fitting the data to the curve. Um, I think when you're operating in landscapes with lower data quality, like you mentioned, you have to begin to assess causal models and try to paint a more detailed picture of how the process you are trying to model actually works. And that's where we're really hoping to get into testing those hypotheses, I would say. So for the interconnection landscape, um, a lot of the most innovative modeling approaches have really become these combinations of machine learning approaches and purely supervised learning and fitting the curve to the data versus also understanding the power flow. And doing actual power flow and production cost simulations to kind of, like, marry this detailed causal model and, uh, this picture of how we think the world will work with also the macro picture of learning from data at a massive scale. So as we enter this specific modeling problem, I would say testing is kind of this complex dance where, yes, it's very qualitative, but for the first time you can create this kind of two way street where the developer or whoever is operating at a very detailed level in a given community can begin to do the pattern recognition a bit. And then you can create this situation where you use data to test that pattern, you test that hypothesis that the developer or whoever's operating in the area puts forth. And you can do that at a rapid scale if you're collecting strategically the right data to test those hypotheses and begin to paint this picture of what is correlated, what is not correlated, and why or why not? Is that surprising? So I think product development really here looks almost more symbiotic between the person using the application versus the actual data, rather than the data like learning and providing a recommendation. It's just much more of a two way street. Much like one of my favorite products, um, on the market from a data science perspective is Spotify, because Spotify doesn't have an exact picture of what your music taste is, but it does provide really intelligent recommendations and it learns really quickly when you provide feedback to the Spotify system. And it's just a perfect example: your music taste may change over time, new music gets released, but the system is evolving and sucking in as much data as it can about the community. The developer is also operating in the community, and you're really just leveraging the strengths of each: empiricism at a massive scale from the data system, uh, side, as well as the pattern recognition, the hypotheses, the creativity and intuition, um, from the developer, the user side.
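
A minimal sketch of the "marry the causal model with learning from data" pattern described here: a physics-style baseline supplies the structure, and a simple learned correction absorbs what the baseline misses. The baseline formula and the synthetic data are invented for illustration, not drawn from any real interconnection study.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def physics_baseline(load_mw: np.ndarray) -> np.ndarray:
        # Stand-in for a power-flow / production-cost style cost estimate.
        return 50_000 + 1_200 * load_mw

    rng = np.random.default_rng(0)
    load = rng.uniform(1, 20, size=200)
    # Synthetic "observed" costs that run roughly 10% above the baseline.
    observed = physics_baseline(load) * rng.normal(1.10, 0.05, size=200)

    # Learn a correction to the baseline from the observed outcomes.
    corrector = LinearRegression().fit(load.reshape(-1, 1), observed - physics_baseline(load))

    def hybrid_estimate(load_mw: float) -> float:
        return float(physics_baseline(np.array([load_mw]))[0]
                     + corrector.predict(np.array([[load_mw]]))[0])

    print(round(hybrid_estimate(10.0)))  # baseline plus learned correction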

James McWalter           24:08

Absolutely. And that is exactly, exactly, um, what we're kind of using to influence some of the kind of products, uh, and hopefully eventually product direction of, uh, paces. Um, this has been great, Kyle. Thank you so much.

Kyle    24:21

Thanks James. Been a pleasure.