IAH Groundwater's Podcast

Jef Caers presentation - Large-scale non-stationary geostatistical modelling in surface and subsurface hydrology

February 08, 2022 IAH Groundwater

Keynote lecture sound recording from IAH Brussels congress in 2021. Our thanks go to Jef Caers and Stanford University for giving us permission to use this presentation.

Professor Jef Caers presentation: Large-scale non-stationary geostatistical modelling in surface and subsurface hydrology.

Marijke: Good to see that all of you are back inside and didn't stay outside on a sunny terrace in Brussels because, actually, it is the weather to be outside. I'm very happy to see that you're all here, back for the keynote of Professor Jef Caers. I'm very happy and proud that Jef is with us. Professor Jef Caers is professor at Stanford University but he actually has Belgian roots. He was born and raised in the Campine area, I think, in Flanders.

He got a degree in mining engineering and a PhD in engineering from the Katholieke Universiteit Leuven, not so far from here, but now he's a professor of Geological Sciences at Stanford University, and also director of the, sorry, Stanford Centre for Earth Resources Forecasting. His research interests are in decision making for the exploration and exploitation of Earth resources, as you will see. He has also written four books on geostatistics and uncertainty, and he was awarded the Krumbein Medal from the International Association for Mathematical Geosciences for his career achievements. Today he is our first keynote lecturer of this conference. We are very happy to have you here, Jef.

Jef Caers: Thank you, Marijke. As the slides are going to come up soon, I can, of course, not hold back from reminiscing a little bit about growing up as a young boy in Tongeren, which is also a good beer, in Belgium. At that time we didn't have cell phones, we had black and white television, and we had two channels, BRT 1 and BRT 2. They would start at six o'clock in the evening and end at 11:00. Things have changed, and that's also something that I want to talk about; particularly, climate change has become a big issue of our time.

If you live in California, which is what I do, we are dealing significantly with climate change, and my mom is always worried about the droughts and the fires. I've now lived half my life in Belgium and half my life in California. I can say that over the last 5 or 10 years, I've seen a dramatic change happening, of course, both in the long-term drought situation in California and in the forest fires, which have been devastating. Particularly last year, I decided to purchase an industrial-style air purifier for my home, simply because it was impossible to breathe inside. This is where we are today, and we have challenges.

Before talking about that, I, of course, would like to recognize all my students and collaborators who have worked a lot in this field.

Climate change brings a number of challenges. Number one is we have to quantify it, and the second one is we have to address it, and I will talk about two such topics. The first is predicting sea-level rise, which is, of course, a big problem, certainly if you live in the Low Countries, but also if you live in the Bay Area. Sea-level rise can be a big issue, particularly when we're looking at the melting of glaciers. I'll present a study on the Thwaites Glacier, which is also called the Doomsday Glacier. Doomsday, meaning that it's a glacier that sits in West Antarctica.

It's like a cork in front of a bottle of wine, that if that cork moves out, then sea level will rise by about four meters, so this is important to predict of what's going to happen with this.

The second, which is also in the topic of this conference, is around water resources. In that case, I also have a case study related to finding water resources in Australia. Again, we're looking at something that is very large scale. In Australia, they are making significant efforts to collect data, in particular geophysical data, for mapping not just minerals, which, of course, are an important economic resource, but also water. As you know, Australia also has to deal with significant drought.

What this requires, then, is understanding variation at the continent scale. Having started my career early on with geostatistics, often you do geostatistics on a small area. Now, the task is to do geostatistics on the scale of a continent, and so the question is how to do that? This is a computational problem, but it's not mainly a computational problem. It's mostly a methodological problem. In particular, a problem of non-stationarity. That means that the variations you're going to observe do not follow any of the conditions that are often required for doing geostatistics, which is stationarity.

Non-stationarity is something that simply means variation of statistical properties in space and/or time. Those variations can be in many properties. The mean can vary, the variance, the variogram, which is the traditional way of measuring spatial correlation, but the problem is more fundamental than that.

Let's do geostatistics 101. What is the basic model for geostatistics that's been used since Matheron? It's the random function model, where he is basically saying that we can express a signal as the sum of a mean that is unknown but deterministic, so not stochastic, and a residual that is stationary, has zero mean, and is stochastic.
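Matheron's decomposition can be written Z(x) = m(x) + R(x). A minimal Python sketch of that idea (a toy 1-D illustration with synthetic numbers, not the speaker's code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Matheron's random function model: Z(x) = m(x) + R(x),
# a deterministic but unknown mean plus a stationary,
# zero-mean, stochastic residual.
x = np.linspace(0.0, 10.0, 200)

m = 2.0 + 0.5 * x                 # deterministic trend (the mean)
R = rng.normal(0.0, 1.0, x.size)  # stationary zero-mean residual
Z = m + R                         # one realization of the random function

print(round(R.mean(), 2))  # close to 0 by construction
```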

Why do we do it this way, and why do we interpolate data that way? There's two reasons for that. The first reason is that we don't want to make interpolation such that you see where the data sticks out. That means artifacts. That's going to be something that we will have to pay attention to when interpolating large amounts of data. The second is, of course, that we want to do this statistically. Statistics means that we'll need to gather information and we need to make estimates of statistical properties. That means requiring some assumption that these properties are representative for that particular parameter we're estimating.

That's a classical paradigm in statistics. That can only be done if you have stationarity. We see this problem, or this conundrum, that we have of estimating something statistically while we know things are not stationary. One of the many methods that were developed early on is called universal kriging. In universal kriging, you have a data set here that you want to interpolate, and you see clearly in that data set there's a trend. The way geostatistics handles that is in two steps. The first step is estimation of the trend. Typically, we would, say, have some parametric model, and in the parametric model you fit the function and you do what we call a least-squares fit.

A least-squares fit does not interpolate the data. It just minimizes variance. For example, here you have a simple linear trend; you then subtract that mean and calculate the residual. Because now we know, or at least we hope, that our data is stationary after we have de-trended it, we can do a variogram estimation. Once we have the trend and the variogram, there's a theory that can tell you how to merge those two together and perform kriging, which, as you see, interpolates the data perfectly. The problem is bigger than that, though. There's a lot of discussion of what the mean function is, what the variogram function is.
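The two universal-kriging steps described here, a least-squares trend fit followed by de-trending, can be sketched as follows (synthetic data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1 of universal kriging: least-squares fit of a parametric trend.
# A least-squares fit minimizes variance; it does NOT interpolate the data.
x = np.sort(rng.uniform(0.0, 10.0, 50))
z = 1.0 + 0.8 * x + rng.normal(0.0, 0.5, x.size)  # data with a linear trend

slope, intercept = np.polyfit(x, z, 1)  # simple linear trend model
trend = intercept + slope * x

# Step 2: subtract the trend; the residual is (hopefully) stationary,
# so a variogram can be estimated from it and used for kriging.
residual = z - trend
print(round(residual.mean(), 6))  # essentially zero: least squares centers the residuals
```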

If the variogram function varies in space and time, the problem is much more dramatic than that, certainly when you go to larger scales. The first point, of course, is that we heard this morning a talk about conceptual models. Our own view is that spatial non-stationarity lies in the conceptual model itself, certainly when you go over very large areas. Mountains vary differently from valleys. Mountains are rugged, they vary a lot; valleys are flat. Also, when we do much of our physical and statistical modelling, we typically have a hierarchy, and this is very important. We start from a conceptual model, and you see this is similar to what people who do physics do.

We start from a conceptual model, we prioritize, we think about the physics, what's there, then we introduce parameters of the physics, certain parameters like permeability and porosity, and then we can do simulations with that and create a 3D visualization or representation, which is called a spatial model. The same is the case in geostatistics. There's no difference there. For the conceptual model, we often assume Gaussianity. Then we have parameters; often we then have to deal with a mean and a variogram, and then we create multiple models, and those are called our spatial models. What I'll present, then, is two things.

The one is how we handle conceptual model uncertainty over very large areas. I will apply it to interpolating the bedrock of Antarctica. Then the second is how to do nonparametric large-scale trend analysis. One thing that we have come across as a problem, moving our geostatistics from the small scale to the larger scale, is finding that there are not that many trend analysis methods out there that are very good.

We had to design our own that can be applied at that large scale. Here is that again, going back to the Pine Island and Thwaites Glaciers, and now you see, maybe in more detail, all those squiggly lines on there. I took out a very small area near Pine Island Glacier, and those are radar flight lines. Radar imaging can penetrate the ice and therefore you can measure topography. You could say that's a small area. I'd like to give a little bit of perspective. That's the area, okay? You want some more perspective? That's the area we're trying to interpolate. Here we see the area of the Thwaites Glacier and Pine Island Glacier that has been mapped by this radar imaging data.

There is an area, or subarea, of geostatistics that some of you know and some of you don't, called multiple point geostatistics. It's a very elegant idea. It's an idea where we say, well, and this is very common, I have some information about how things should look. I have some data over there and I want to apply it over here. I get that question a lot. I come back to that question many times. You have something elsewhere. You've already done that. It's a lot of dense data. Now you want to apply that elsewhere. That's a common problem, this problem of learning or training and applying it elsewhere. And that problem is much more than a machine learning problem.

Why? Because the first question is: is it applicable and transferable? That's the more fundamental question. That's not the question that is being addressed in multiple point geostatistics. In multiple point geostatistics we simply say: here's a nice area, and I'll show where that comes from, that I believe is relevant or representative for that small area over there. There are algorithms developed that can borrow that information way beyond the variogram and create realizations, models that look like that and are constrained to the data perfectly. You're mapping, imaging the flight-line data.

Now, of course, that's an elegant idea and there are many algorithms developed, but, of course, over the years people came to realize, when you go to the real world and come outside of academia: where do I get this training image? Where does it come from? Is there only one training image? Maybe there are many. Then, of course, the fundamental question: is this representative for something? Then how do I do that over the entire continent? Obviously, you can't just say let's find a training image for Antarctica. Maybe you can go to Mars or something, but for the entire Antarctic that seems like a big problem.

The other thing, of course, is that even if you had a training image, or a number of training images, you would not know to what extent they are representative for your area. In that sense, we now come to this important dichotomy of the Bayesian hierarchical model: there is an uncertainty that has to do with your conceptual model. That's here, the probability of the training image given D, and D is just flight-line data. Then there is the other uncertainty, which we call spatial uncertainty. That is, if you have a training image and you have flight-line data, then you can create models. Of course, this decomposition is very important.

In fact, the first part, the part on the right-hand side, is probably the most important one. This is a very intriguing question because it's a question that has intrigued me recently; even if you solve inverse problems of any kind, this problem occurs regularly. We saw this morning a couple of talks around, well, I don't really know what the physics says, and so on. There's uncertainty on the physics, and so there's a conceptual uncertainty. What's interesting is that even if the data is linearly related to your model, such as flight-line data, it's actually not linearly related to your conceptual model. Even if you have a linear inverse problem, you in fact have a non-linear inverse problem.

That is a problem that needs to be solved, this non-linear part. Where do we get training images? We know for one that the Arctic, and this has been studied geologically, is similar to the Antarctic in terms of the deglaciation it has undergone. Now, we all know that the Arctic has become ice-free over the summer, or at least has less ice. We can go to the Arctic and we can collect topography information there. That's number one. The second is that there are areas in the Antarctic that are outcropping. You see here on the right-hand side that there is exposed Antarctic bedrock topography.

Therefore, we can go to the Arctic and create a whole bunch of nice images that could be representative for areas in the Antarctic. You end up a priori with a set that you collect. In this case, we collected 166 of those. A priori, you have, let's say, a 1/166 probability that a particular training image out of the set is representative for a particular area in the Antarctic. What we want to do, however, of course, is to model this probability of a training image in each of those four areas where I want to assign the conceptual model. A priori, I'm going to say that for all of these four areas, there's a 1/166 probability of this training image being present.

I use a Bayesian approach. Then, of course, I need to update with the flight-line information that is available. Now, there are a number of problems here. First of all, there are many training images that could potentially be representative. Secondly, we need to do this in a spatially coherent fashion. I cannot just define a training image for area A2 independently of area A3. I have to account for spatial correlation that exists over large areas. These are a number of things that need to be done. However, at the end, what I'd like is what we call a posterior probability of the training image, given the flight-line data. Here you see, it goes from 0 to 166.

For area A, we see, for example, that training images 18 and 114 are very representative. That is what I'd like to get to. Now, I'll talk a little bit about how we get there.

When you're dealing with complicated things like objects, and we'll talk a lot about objects today because our world, in essence, consists of geometry. How do we represent geometry mathematically? That's an incredible challenge because geometries are complex and high-dimensional. A training image is, in a sense, nothing more than a geometry. If I have a training image on the left side and a training image on the right side, I have two objects and I'd like to know how similar those objects are. A measure of similarity.

Why is that important? If this training image would be representative for an area, then it is very likely that this other training image is also representative for that area. Knowing what that similarity is, is key to solving the problem of probability. One way to do that is to extract a bunch of patterns from this training image and other patterns from the other training image, and then define a difference. The one at the bottom here, which is called the Hausdorff distance, is a difference between two sets. It's a mathematical concept where, if you have two sets, a set of puzzle pieces here and a set of puzzle pieces there, it can tell you what the difference between those sets of puzzle pieces is, using this equation that is based on a minimax criterion.
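The Hausdorff distance between two sets A and B is the max-min construction H(A, B) = max(max_a min_b d(a, b), max_b min_a d(a, b)). A small self-contained sketch, with toy point sets standing in for extracted patterns:

```python
import numpy as np

def hausdorff(A, B):
    """Hausdorff distance between two point sets A and B (n x d arrays):
    the worst-case distance from a point in one set to the nearest
    point of the other set, taken in both directions."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # pairwise distances
    return max(D.min(axis=1).max(), D.min(axis=0).max())

# Toy "pattern sets" standing in for patterns from two training images.
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 1.0]])
print(hausdorff(A, B))  # -> 1.0
```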


I don't want to go into the detail of that. If you have a distance, then you have what we call in mathematics a metric space. A metric space is nothing more than saying I have something that measures differences, but I don't know the origin of the coordinate system. We don't want the origin of the coordinate system, because if you're in high dimensions, then the origin of the coordinate system is very annoying. We'll get rid of that. The only thing we know is how far things are apart. However, there are statistical techniques such as multidimensional scaling that can map things from a metric space into a Cartesian space of any dimension. Here we created such a map, and this is now a map of training images.
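That mapping can be sketched with classical multidimensional scaling, here applied to a hypothetical 4x4 distance matrix standing in for distances between training images (the actual distances in the talk come from the Hausdorff construction above):

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical multidimensional scaling: embed a distance matrix D
    into k Cartesian dimensions, preserving distances as well as possible."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered squared distances
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]         # top-k eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Hypothetical distances: images 0,1 are similar, 2,3 are similar,
# and the two groups are far apart.
D = np.array([[0., 1., 4., 4.],
              [1., 0., 4., 4.],
              [4., 4., 0., 1.],
              [4., 4., 1., 0.]])
X = classical_mds(D, k=2)
Drec = np.linalg.norm(X[:, None] - X[None, :], axis=2)
print(Drec[0, 1] < Drec[0, 2])  # -> True: similar images stay close in the map
```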

Any two points that are close, as you notice, are similar. We have now provided ourselves a nice beautiful mathematical coherent space in which we can represent high-dimensional objects. Then we come to the Bayesian problem, what is the probability of the conceptual model in all four areas, given the data in all four areas? Then I will, of course, do that for the entire Antarctica. The first thing is to, what we call, assume conditional independence, which is not that bad of an assumption here. You can, basically, say that the data here makes those random variables independent. We write it like that.

That's not a big problem, but it is still a big problem to solve. We're going to keep cutting it up and define it as a probability aggregation problem. A probability aggregation problem is nothing more than saying that if I have an unknown, and I have four data, then I can split that up into an equation that aggregates each individual component, so the probability, for example, of the training image in area Ai given the data in area Aj. How does that work? Well, it's actually quite simple. What we can do is sum, or make products. Probabilities are annoying things because they're between zero and one. To get rid of this annoying property of probabilities, you can make what we call a log ratio.

A log ratio: if you take a probability and you make the log of an odds ratio, that R term is between minus infinity and plus infinity. That's nice because then I can do summing and whatever I want to do. I can now create an equation where I simply sum up all these terms and aggregate. The target for me is to find R_i, which is my probability there at the top, as a weighted combination of these aggregations. Here, this weight is quite important. Why? Because that weight is going to account for the fact that there is correlation in the data.
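One common form of such weighted log-ratio aggregation (a sketch in the spirit of the tau/nu aggregation models used in geostatistics; the exact weighting in the talk may differ, and all numbers below are hypothetical):

```python
import numpy as np

def aggregate(prior, conditionals, weights):
    """Weighted log-ratio (log-odds) aggregation of several conditional
    probabilities P(TI | D_j) into one posterior P(TI | D_1, ..., D_n).
    The weights account for redundancy (correlation) between data sources."""
    logit = lambda p: np.log(p / (1.0 - p))
    r = logit(prior) + sum(w * (logit(p) - logit(prior))
                           for p, w in zip(conditionals, weights))
    return 1.0 / (1.0 + np.exp(-r))  # back-transform to a probability

prior = 1.0 / 166.0      # a priori: each of 166 training images equally likely
conds = [0.10, 0.08]     # hypothetical P(TI | flight-line data) from two areas
post = aggregate(prior, conds, weights=[1.0, 0.5])  # partial weight: correlated data
print(0.0 < post < 1.0)  # -> True
```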

For each data, I can estimate the probability of a training image in Area A, given the flight line data in Area A, probability for Area A given flight line in Area 2, et cetera, I can estimate those. Why can I do that? Because I already have these metric spaces within which I have represented my training images. I can estimate these probabilities very fast. Then what I need to do, however, is weigh these all together into that term over there. That weighing is quite important. It calls for knowing what is the correlation between these plots here. That correlation is due to the correlation in the flight line data. The flight line data in Area A1, and the flight line data in Area A2, they are correlated.

How do you express correlation between two big data sets, from Area 1 and Area 2? It's not just one value. It's two big data sets. Well, to do that, we need to extend the notion of the variogram. The notion of the variogram is something that we've used for many years and decades in geostatistics; it tells you what the correlation is between any two points in space that are separated by some distance H. Well, it's simply the average squared difference. Now, you have not two values, you have whole blocks of values. One way to do that is using this trick, where you use a kernel density smoother around this data set.
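The classic experimental variogram mentioned here, γ(h) = ½ E[(Z(x) − Z(x+h))²], estimated from data pairs, can be sketched as follows (a 1-D toy with synthetic data; the kernel-density extension for blocks of values is not shown):

```python
import numpy as np

rng = np.random.default_rng(2)

def empirical_variogram(x, z, lags, tol):
    """Classic experimental variogram: half the average squared
    difference between all pairs of values separated by roughly h."""
    d = np.abs(x[:, None] - x[None, :])        # pairwise separations
    g2 = 0.5 * (z[:, None] - z[None, :]) ** 2  # half squared differences
    gamma = []
    for h in lags:
        mask = np.abs(d - h) <= tol
        np.fill_diagonal(mask, False)
        gamma.append(g2[mask].mean())
    return np.array(gamma)

x = np.sort(rng.uniform(0.0, 100.0, 300))
z = np.sin(x / 10.0) + rng.normal(0.0, 0.1, x.size)
gamma = empirical_variogram(x, z, lags=[2.0, 10.0], tol=1.0)
print(gamma[0] < gamma[1])  # -> True: nearby values are more similar
```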

If you would like to hear more about that, you can read about it in this paper written by a postdoc of mine, Frankie Fletcher. Once we have that, we are done. We can say for each area, I can now draw from this thing here a conceptual model. I can create the top row. Now I have assigned to each area a conceptual model. However, there is uncertainty. I can do that again and again and again. You see that I can get realizations of conceptual models for each area. Once I have that for each area, then I run standard geostatistical algorithms that create realizations of topography based on the conceptual model of each area.

You see, there are two levels of uncertainty: the one is in the drawing of the conceptual model, the second is in the drawing of the spatial model given the conceptual model. That is done by a technique called direct sampling, which was developed in Neuchâtel. Now, we need to go to the big scale. We do the same thing that we did for the four areas, but we go over the entire rest of Antarctica. To do that, we're going to divide the area into these blocks. Now, I can talk a little bit about how this division goes. This division is not a statistical division. We're not trying to find areas with constant statistical properties, because we know they don't exist anyway.

We're going to find areas with constant flight-line density, because the flight lines are going to drive the non-stationarity of the model and the realizations. Here we see that in areas where the flight lines are more dense, near the mouth of the glacier, you're creating blocks that are much more narrow. Here at the bottom we see, just as before, two realizations of training images or conceptual models drawn for each area. Then you have realizations of the topography of West Antarctica on a 500-meter grid.

Here we see that a little bit more in detail. As you see, there are a couple of artifacts along the flight lines. This may have to do with the fact that there is some error in the flight-line data itself. Overall, I think the nice thing about this technique is there's little to do. There are very few tuning parameters. You don't have to estimate a variogram and a mean and go through all these procedures. It's mostly focused on the real problem, which is the conceptual uncertainty problem.

This slide, unfortunately, didn't come out very well. [chuckles] I saw that earlier. What I want to show, of course, is what we do with this. There are two things we can do.

Number one is to predict the geology under the ice, because there's a direct correlation between the geology, which is sediment type, either soft sediment or bedrock, which is very important for the sliding of ice, and the topography. Here, actually, is something that we did in that regard. The other thing you can do is to simulate subglacial hydrology by simply applying Shreve's equation and a water-routing model. Here we see that we have multiple realizations of how water moves under the ice in Antarctica. That's very important because what often happens is that under the ice you get subglacial lakes. They're very active and dynamic lakes. They get filled, and they drain.
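Shreve's hydraulic potential is φ = ρ_w·g·z_b + ρ_i·g·(z_s − z_b), with bed elevation z_b and ice-surface elevation z_s; subglacial water flows down the gradient of φ. A minimal 1-D sketch with made-up geometry (the routing in the talk works in 2-D over the topography realizations):

```python
import numpy as np

# Shreve's hydraulic potential for subglacial water:
#   phi = rho_w * g * z_b + rho_i * g * (z_s - z_b)
# so routing depends on BOTH bed topography and the ice surface.
RHO_W, RHO_I, G = 1000.0, 917.0, 9.81   # kg/m^3, kg/m^3, m/s^2

x = np.linspace(0.0, 50e3, 101)         # 50 km transect, 500 m spacing
z_b = 100.0 * np.sin(x / 5e3)           # a made-up bed topography realization
z_s = 1500.0 - 0.01 * x                 # gently sloping ice surface

phi = RHO_W * G * z_b + RHO_I * G * (z_s - z_b)
flow_dir = -np.sign(np.gradient(phi, x))  # +1 means water flows toward +x
print(np.all(flow_dir == 1.0))  # -> True: here the surface slope dominates
```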

What that does is it has a big effect on the movement of ice, and that is very well known. If you think again of Belgium and its rivers, it is as if we didn't know where the rivers are at the scale of Belgium. That's the problem that's there, and that's going to be very important. The next step is to translate all that into ice sheet modeling, and then, of course, seeing what the impact is on sea-level change. The last topic I will talk about is the mapping of subsurface water resources in Australia.

In Australia here, we are in the Musgrave Province, which consists of what are called paleovalley structures. Australia has not been much affected by any glaciation or deglaciation. These ancient river networks are very well preserved under the regolith. What Australians have been doing, and this is not just motivated by water, it's also motivated by minerals, is to take airborne surveys. The way airborne surveys are taken, again, you see the size of the areas over which this is done, and this is very typical. I also work a lot in mineral exploration. What is done a lot is that you take surveys along very sparse flight lines, and then in some areas you take surveys along very dense flight lines. We all do that: survey areas broadly, and then concentrate in some areas. This is what it looks like.

People who are from Denmark will find this a little bit familiar. They have buried valleys. Here we have paleovalleys that need to be mapped. What we're interested in here is not estimating perfectly where the surface is, but just estimating the trend, which is being guided by our geophysical data. As I mentioned, we had to come up with some new techniques around trend surface analysis because, in fact, creating a statistical model of a surface is not that simple, particularly when your surface has complex geometry. Sometimes you see surfaces that are folding and faulting.

How do you do statistics with surfaces? Well, one way around that we have developed is called level set functions and stochastic motion. Imagine you have a surface and imagine the surface is flat. How can you represent such a surface mathematically? One simple way is to go one dimension up and, instead of modeling the surface, you model the distance to the surface. It is a signed distance. That means when you're above the surface the distance is positive, and when you're below the surface the distance is negative. What that does, in this case, is that from 1D you can go to 2D. In fact, you can go from an irregular grid to a regular grid. If you're in 2D, you can go to 3D. If you're in 3D, you go to 4D, et cetera.
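A minimal sketch of this signed-distance (level set) representation, using the vertical signed distance to a toy 1-D surface, represented one dimension up on a regular 2-D grid:

```python
import numpy as np

x = np.linspace(0.0, 10.0, 50)   # horizontal grid
z = np.linspace(-3.0, 3.0, 60)   # vertical axis (the extra dimension)
X, Zgrid = np.meshgrid(x, z)

surface = np.sin(X)              # elevation of a toy surface at each x
# Signed (vertical) distance to the surface: positive above, negative below.
# (A true level-set function would use Euclidean distance; vertical distance
# is the simplest signed variant for this illustration.)
phi = Zgrid - surface

# The surface itself is recovered as the zero level set of phi.
above = phi > 0
print(round(above.mean(), 2))  # roughly half the grid sits above the surface
```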

The other thing you can do is perturb such a function or surface using a stochastic perturbation, which we call stochastic motion. Stochastic motion is nothing more than pulling at this surface, any surface, in such a way that you perturb the surface; it's like a force, but you have to be careful. Why? You want to perturb in such a way that you don't diverge and create something that blows up and goes away into infinity. You want to have a perturbation that has nice properties. Such a property in statistics is called detailed balance. You want to create a perturbation, as is shown here for a circle, and you want to be able to go back to a circle.

That's not that easy. Imagine you have a ball and you deform it, and now you have to undeform it back to a ball. This is what this stochastic motion allows you to do. That is very handy. Why is that handy? Because then we can solve inverse problems, because inverse problems are nothing more than making perturbations and maximizing or adhering to some likelihood function. Now, we can start with this technique to impart interesting properties onto our surface that could be data driven. Here's an example of that. Imagine this is the truth, and you have some data, and I just let things go. There is no parameterization of any kind, right?
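A toy sketch of such a reversible perturbation: smooth random bumps proposed symmetrically and accepted with the Metropolis rule, which satisfies detailed balance so the chain does not run away (hypothetical data; this is a simple stand-in for the actual stochastic-motion algorithm, not its implementation):

```python
import numpy as np

rng = np.random.default_rng(3)

def smooth_bump(x, center, width):
    # A smooth, localized "pull" on the surface.
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def log_likelihood(surface, data_idx, data_val, sigma=0.1):
    # How well the surface honours the (hypothetical) observations.
    r = surface[data_idx] - data_val
    return -0.5 * np.sum((r / sigma) ** 2)

x = np.linspace(0.0, 10.0, 100)
surface = np.zeros_like(x)
data_idx = np.array([10, 50, 90])
data_val = np.array([0.5, -0.3, 0.2])  # hypothetical observations to honour

ll = log_likelihood(surface, data_idx, data_val)
for _ in range(2000):
    bump = rng.normal(0.0, 0.1) * smooth_bump(x, rng.uniform(0, 10), 1.0)
    cand = surface + bump              # symmetric proposal: -bump equally likely
    cand_ll = log_likelihood(cand, data_idx, data_val)
    if np.log(rng.uniform()) < cand_ll - ll:  # Metropolis acceptance
        surface, ll = cand, cand_ll
print(round(ll, 1))
```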

You just perturb the surface and you get a surface that matches the observations, and at the end, you see it's wobbling. That's uncertainty. It keeps on wobbling. The reason why this works is because of this detailed balance. You can then apply that to-- Unfortunately, this movie, I don't think that's going to work. Ah, yes. Beautiful. You can then apply that to the buried valleys and start to match your flight-line data. In this case, our purpose is to minimize the within-class variance of conductivity, because we have a big conductivity contrast between bedrock and regolith. You see that? You're matching your surface, but at the end it starts bouncing up and down. That is uncertainty.

Then you can start saying, okay, that sounds great. What can I do with that? Well, one of the things we can do with that is to answer a very common question: what if I have data that is very dense but covers a small area, and other data that covers a larger area but is much coarser? Can I use the one for the other? I get this question a lot. Can I use this for that, please? Do you have some magical geostatistical trick to do that? Well, the way you would approach this problem is to first take your geophysical data, run the trend analysis that I just showed, subtract the trend from your data, and you end up with that at the end. Now, that at the end is at a 10-meter grid size. You see, you go very narrow from the beginning.

Why do I want to use that? Because I want to be able to use that information to go from this very coarse 250-meter grid to a 10-meter grid. You say, why do you want to be on a 10-meter grid? Well, actually, it's very important. It's important to be able to make comparisons with other data, because we often take our geophysical data, which is on a 500-meter grid, and then compare it with a borehole that is this big. If you do that, you will overestimate correlation. If you, however, downscale to that size and account for the uncertainty, you get a much more realistic comprehension of what the correlation is between these various data sets. That's why this downscaling is so important.
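A toy 1-D sketch of the downscaling idea: interpolate the coarse trend onto the fine grid, then add back fine-scale residual variability learned from a densely surveyed analogue area (all numbers hypothetical; the actual workflow uses a training image for the residual, not white noise):

```python
import numpy as np

rng = np.random.default_rng(4)

x_fine = np.arange(0.0, 1000.0, 10.0)     # 10 m grid
x_coarse = np.arange(0.0, 1001.0, 250.0)  # 250 m grid

trend_fine = 50.0 + 0.02 * x_fine
z_coarse = 50.0 + 0.02 * x_coarse         # coarse survey only resolves the trend

# Interpolate the coarse trend onto the fine grid...
trend_from_coarse = np.interp(x_fine, x_coarse, z_coarse)
# ...and add back fine-scale residual variability, with a standard deviation
# "learned" from a hypothetical densely surveyed analogue area.
resid_std = 0.8
z_downscaled = trend_from_coarse + rng.normal(0.0, resid_std, x_fine.size)
print(z_downscaled.shape)  # -> (100,)
```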

The way it's done is taking your flight-line data and subtracting a trend. Then you estimate the residual using that training image. You now have something you can compare; then you add back the trend and you get your down-scaled [unintelligible 00:35:50]. Okay, that's all I got. Good news: last week we submitted a paper to Geoscientific Model Development, and this paper was accepted to be put online. This is an open journal, and I think tomorrow or the day after you can go and look up this paper, and you can comment on it as you wish, or ask us questions as you wish.

I'd love to have questions, of course, now and maybe after the session, but if you want to think about it a little bit more, then go to the journal and find that paper. You can also, of course, send me an email. There's also a database that we have put online with the subglacial topography training images. Thank you for your attention.

[applause]

Marijke: Thank you, Jef, for a very interesting talk. Are there any questions for professor Jef Caers? There are some fixed microphones in the room. They don't move. If you have a question, you can walk--

[Questions taken]

Marijke: Okay, any other questions? The room is quite dark. Sorry if I don't always see if someone wants to ask a question. If not, I propose that we stop here because the next session starts in seven minutes from now. You need quite some time to go up and down. Thank you, Jef. I'm sure if you have additional questions for Jef, you can find him later today during the coffee break drinking [laughs] enjoying his coffee. Then, the sessions will continue. There is a session here in this room, and as you know the others are upstairs. Thank you.

[applause]

[00:38:58] [END OF AUDIO]