Quantitude

S4E04 Partial Least Squares: Straight Outta Uppsala

October 04, 2022 Greg Hancock & Patrick Curran Season 4 Episode 4

In this week's episode Greg and Patrick talk about partial least squares, a technique that resembles structural equation modeling but with a lot of flexibility, including but not limited to its ability to accommodate both reflective and formative constructs. Along the way they also mention dark RedBubble, Sheep’s Kin, snorting SEM, the Coors Light beer bong, Hotelling's ghost, Larry the Cable Guy, the Marvel Metaverse, the Evil Eye, Badluck Schleprock, fika, Abba, and pieces left on the table. 

Stay in contact with Quantitude!

Greg:

Hi everybody. My name is Greg Hancock, and along with my formative Mode B friend Patrick Curran, we make up Quantitude, a podcast dedicated to all things quantitative, ranging from the irrelevant to the completely irrelevant. In today's episode, we talk about partial least squares, a technique that resembles structural equation modeling but with a lot of flexibility, including but not limited to its ability to accommodate both reflective and formative constructs. Along the way we also mention dark RedBubble, Sheep's Kin, snorting SEM, the Coors Light beer bong, Hotelling's ghost, Larry the Cable Guy, the Marvel Metaverse, the Evil Eye, Badluck Schleprock, fika, Abba, and pieces left on the table. We hope you enjoy today's episode. Have you happened to check RedBubble and see all the cool stuff that we have? I have not. There's so much cool stuff that we have; I just thought I should take this opportunity to let you know. Obviously, you know, we have stickers and notebooks and all of that. I would like to thank the people out there, not just for posting your really cool pictures of you with the merch, but, as we mentioned in the tag, all of the proceeds go to help donorschoose.org. And what that means is that the stuff that folks out there have bought has helped dozens and dozens of classrooms through things like calculators for math classes, hands-on materials for teaching statistics, a whole variety of things. And that's all due to the folks who are listening out there. So thank you very much to everybody.

Patrick:

I may have misunderstood something. I thought we were doing this whole podcast to become rich. I've been waiting for three years for the check from you, and I've just been trying to be patient. I don't want to be a whiner about it. Did I misunderstand how this works?

Greg:

I think we can solve this if you just Google academia. Sorry.

Patrick:

But thank you, everybody. And it is so fun. My kid drew a couple of things that are up there in the corner and whatnot.

Greg:

Now, there are some things up there that your daughter didn't draw, and that my daughter didn't draw, and that I didn't draw, and that none of us drew.

Patrick:

I'm sorry. Is there like a dark RedBubble that we're part of that I don't know about yet?

Greg:

Here's bootleg merch. Can you not? There's more fake Quantitude stuff than there is real Quantitude stuff on there.

Patrick:

This is a real question. I mean, there's no money involved; we have a market share of negative $1,000 a year. I mean, it's a real question: why on earth? Like bots, are they just trying to copy stuff?

Greg:

I don't know the answer. I was wondering if it was bots also. But there's a bunch of stuff that says Quantitude on it in different fonts, and then there's a ton of stuff that says Jiffy on it, which Jiffy, by the way, is extremely happy about.

Patrick:

Hi guys. Hi, Jiffy.

Greg:

Jiffy, is that your side hustle? No, I just think it's very tastefully done; I bought five. All right, little buddy. Thanks. Bye, guys. Have a great episode. I will make clear to folks out there that you should be able to get whatever you like, but it's only the authentic Quantitude pod merch that actually makes it back to donorschoose.org. And how do you know if it's authentic? It has the name of who posted it beneath it, and it says by QuantitudePod or something. Anyway, I thought that was kind of interesting that we have fake stuff.

Patrick:

I am fascinated by that. I don't mean to like belabor this, but I totally get making a fake account and selling things with like a Nike swoosh or like a Ferrari logo. We're nothing. We own nothing. I am perplexed by this.

Greg:

Maybe this is actually one of the best indicators of where the economy is currently. I remember when I had my first car. What was your first car?

Patrick:

My dad gave me his 1969 Volkswagen Beetle.

Greg:

The one that you raced in our control episode? Yeah, right. I had a 1976 Mercury Capri II; it was a hatchback. When I got it used, my mom got me these seat covers, and she was so happy to do that. So she got me these, and she was going, they're sheepskin, they're sheepskin! Like, you know, this was some big deal, and I'm like, okay, Mom, thanks. And so I put them on my car and they were very comfortable. And I looked at the label, and it said Sheep's Kin. So I never had the heart to tell her that it wasn't actual sheepskin, but I thought that was like the best fake name ever: Sheep's Kin.

Patrick:

It's so funny you say that, because just a couple of weeks ago... I try to do my own car repairs when I can, and with YouTube now and things that you can order online, within reason I can do a fair amount of my own car repair. I needed to get this particular part. For the record, I searched online and I found genuine Honda replacement parts, because I didn't want off-brand, right? I wanted it actually made by Honda. And I was about to check out, and it kind of struck me that it was half the price of what I'd found in other places. And so I jumped to another tab and I searched genuine Honda parts. The company name is Genuine. It was Genuine Honda replacement parts, right? And I thought that was brilliant. I almost wanted to buy it just because I so admired that it was a Sheep's Kin.

Greg:

Yeah, exactly. Well, so our topic for today is maybe in this spirit, I don't know; it remains to be seen whether what we're going to talk about is sheepskin or Sheep's Kin. What we're going to talk about today is something called partial least squares. Originally, when you were up visiting me over the summer and we were talking about topics that we were going to cover, we had said, oh yeah, we can go through these different estimators, like ordinary least squares and maximum likelihood and two-stage least squares and partial least squares, and it was just sort of uttered in the same breath.

Patrick:

It was gonna be in one episode. Okay, well, that's a whole other thing.

Greg:

But it turns out partial least squares is not actually an estimator. Did you know this?

Patrick:

I do now.

Greg:

Yeah. So partial least squares, which we're going to unpack a little bit here. The funny thing is that there's a whole segment of the world that is just tied to partial least squares, like it is their jam. Since the early 2000s, if you go to the management information systems literature, or other fields like chemometrics, this is it, right? We did a PLS, we did a PLS, we did a PLS. In our world, being primarily a social science world, this doesn't cross our plate a whole lot.

Patrick:

Exactly. And the reflection of that, at least to me, is for you and me, our day jobs are kind of centered around structural equation modeling. Yes, I've been in the game for 25 years, and I thought it was a method of estimation. I really did. And that is not it at all.

Greg:

That's right. And in fact, you mentioned structural equation modeling. You and I, we didn't just drink the structural equation modeling Kool-Aid. We went to Studio 54 and snorted structural equation modeling off the bathroom counter and came out rubbing our noses. So we're deep in.

Patrick:

Nope, you did. I Coors Light beer bonged it. That's only half a joke, which is: that's all there was, right? It wasn't like I could pick from this or this or this or this when you and I came up through the system. Bollen's book came out in '89, and I used that book in my first SEM class in 1990. Right? This was the only game in town when I came up through the system. Now, ironically, the foundation for partial least squares was developed years earlier; it had not permeated, at least, the instructional system that I was part of. And what I find fascinating is, I think that's in part because there are literally continental preferences over this. Much of the work and the applications are dominated in Europe.

Greg:

That's absolutely true. Structural equation modeling and this thing called partial least squares that we're gonna start getting into: both of those weren't just born in Europe. They weren't even just born in the same country, which is Sweden. They were born at the same university, Uppsala University, coming out of a guy named Herman Wold. And Herman Wold was the advisor of... who, do you remember?

Patrick:

Okay, I remember, but you're gonna mock me for saying his name. So I'm just gonna say Carl, and then you can say his last name, because Tove taught you how to say it, yes?

Tove:

That's right. Hey, this is Tove Larsson, Greg's Swedish coach and occasional quantitative linguistics consultant.

Greg:

So his first name isn't even Carl. Come on, have you... throw me a bone. All right, fine, fine. His American name is Carl. His stripper name is Carl. His actual name is Karl Gustav, K.G., and his last name is Jöreskog.

Unknown:

Yeah, the pronunciation is slightly off on that one.

Greg:

So Karl Jöreskog, as we call him in the US, was the student of Herman Wold, and Karl Jöreskog is, you know, the father of the structural equation modeling that you and I practice. That really had its seeds planted back in confirmatory factor analysis in the mid-to-late '60s, where structure was imposed upon that latent variable system, and then it started carrying off into the '70s. And it didn't just carry off on a theoretical level, and I think this is really, really important: it carried off with software too. It's great when there's the mathematics of these kinds of methods, but if you can't put it in people's hands, then there's a problem. There was LISREL, and there were punch cards that went with LISREL, and mainframes and all of that kind of thing. But when we think about how structural equation modeling got a head start, the pieces were in place, not just in terms of the mathematics and the theoretical foundations, but actually the software to be able to do it. And I think that's a critical thing. But you know, after structural equation modeling started taking off, Herman Wold, sort of watching his academic child go off and do great things, sort of scratched his head and said, oh boy, KG, there's a lot of assumptions wrapped up in your structural equation modeling thing, in your LISREL thing, and some of those things make me uneasy. And we should probably just rattle off some of the assumptions that exist. You're gonna pop quiz me on this? I don't mean it to be a pop quiz, but let's go: 30 seconds on the assumptions within the SEM.

Patrick:

What my understanding is of what Wold's reaction was, was kind of twofold. One is, it goes back to something that we've talked about on prior episodes, which is: one of the key advantages of the SEM is that it's a priori, and one of the key limitations of the SEM is that it's a priori. So we have exploratory methods, we have EFA, we have eigenvalues and eigenvectors and rotation and all of these things. And then if you've had that in a class, people say, yeah, yeah, but you're letting the data drive your decisions. Don't saturate your factor loading matrix; structure it in a way that's consistent with theory. And that's a huge advantage. I mean, massive, huge, huge, massive advantage of confirmatory factor analysis and the SEM in general. Except when it's not. Some paraphrasing of Wold is: there are often situations where we are data rich but theory poor, and we need a methodology that allows us to do SEM-like things, but without being so shackled to a strict a priori parameterization of our model before we ever begin.

Greg:

So you'd said that there were two primary things. The first one, then, has to do with feeling locked into this very confirmatory, very a priori structured kind of endeavor. What's the other one that you had rolling around?

Patrick:

Some of the asymptotic regularity conditions for maximum likelihood estimation. What he seemed to be more concerned about was a rather strong assumption about multivariate normality, and that you have a sufficiently large sample size where these asymptotic conditions of consistency, unbiasedness, efficiency, and asymptotically normal sampling distributions come online. Maybe those don't come online, given the characteristics of the data that we have in our sample.

Greg:

That's absolutely right. And so his thinking was, could we do something that is, and this is where I really want to be careful with my language, could we do something that is an approximation to that? This is where some people's hackles will get raised. Not PLS people necessarily, but SEM people, right? You and I had a whole episode on principal components analysis, and how, like, one of your first disclosures after the Spider Pig theme...

Patrick:

Spider Pig. Spider Pig.

Greg:

After those disclosures, I mean, the whole episode really was about principal components analysis, what it is, and that it is not a factor model. And some people just completely get their underwear in a bunch over that, for sure. And there's a whole segment of literature from the '80s and early '90s where people were just butting heads over: is it a latent variable model? Is it not a latent variable model? Can it be used to estimate latent variable models? But at its core, principal components analysis is just a composite model, and it's not the only way that we have to form composites. And so Wold was sort of wondering, can we bring to bear some of the ideas of compositing, rather than the latent variable methods that we have, and use it as an approximation to a system that otherwise looks a lot like a structural equation model?

Patrick:

And in that spirit, you know whose ghost I saw on my back deck as I was reading some of this material? Harold Hotelling. Ah, nice. Remember, he's local; he's in Chapel Hill. He was at UNC back in the '30s.

Greg:

But not buried in your backyard.

Patrick:

Never mind. Hotelling, who was the original developer of principal components analysis, never addressed factorial rotation, because he did not care about the numerical values of the weights from a substantive standpoint. He had a very practical motivation, which is: he had a large amount of data and he wanted to reduce it to a smaller amount of data. And whatever those optimal weights were, which came out of eigenvalues and eigenvectors, that allowed him to get the composites, get a cup of coffee, say hi to Marge out front, and then go do whatever he was going to do with those composites. Obviously, this is not principal components in PLS, but it has that spirit of Hotelling, which is: we have a problem to solve. Existing methods work really, really well if we meet those assumptions; very often we don't meet those assumptions, and this is a really nice alternative that allows us to do some things that we wouldn't otherwise be able to do.

Greg:

Imagine that we had what you and I would think of as a structural equation model with some latent variables. And we don't need to dig too deep into labeling them yet, but let's imagine that we had two exogenous factors, or constructs, or call them whatever you want, and let's say one endogenous that depends on both of those. So in our head, right, we're asking you to visualize, use your mind's eye: both of those have paths coming into the dependent or endogenous latent variable, and let's just say that those two exogenous factors covary. For you and I, practical people, maybe worried about some of the things that we're starting to allude to, we might say: why don't you just put a score in there for that first factor, why don't you just put a score in there for that second factor, just put a score in there for that third factor, and go do yourself a regression? I mean, why not, right?

Patrick:

Exactly. It's a git-'er-done kind of thing.

Greg:

Larry the Cable Guy here.

Patrick:

But there's an element to that. And I actually find you and me quintessentially practical. Oh, nice. We're trying to achieve a goal; we've got an endgame. We know what the high road is. What I mean by that is, if we are able to use a multiple indicator latent factor, if that latent factor is properly defined, if we have adequate sample size, if we have an adequate distribution, if the model is properly specified, that's kind of the gold standard. But as the PLS people talk about, with the data-rich, theory-poor settings and violations of these asymptotic conditions, we need a way to achieve what our goal is, and this is a promising alternative for doing that. Yeah, we've got an endgame, right? It's all Marvel Universe. This is like the SEM's Endgame.

Greg:

So is this some other part of the metaverse? Multiverse? Metaverse? Which is it?

Patrick:

You don't know?

Greg:

I just remember that Dr. Strange was like trying to patch some stuff up and it just got kind of weird.

Patrick:

What the hell was wrong with you?

Greg:

It was so long, I had to pee. So I left, came back, and it's like, what?

Patrick:

Hey, could we get back on task here?

Greg:

I'm sorry, what were we talking about? All right, so let me actually label some of these hypothetical factors. Now, let's imagine that we had two exogenous factors, and one of them was exposure to discrimination. And by exposure to discrimination, I might ask, let's say kids: to what extent have you felt that you have been discriminated against, we'll say on the basis of race or ethnicity, on the playground, in your classroom, walking home from school, when you are outside of school, in the social media that you encounter, in the sports that you play outside of school, et cetera. So imagine we had this construct that is exposure to discrimination. In the world where you and I live, the structural equation modeling world, that operates with a very specific structure associated with our factors, right? The latent variable has a structure that it inherited from the confirmatory factor world, where we assume that the latent variable actually causes its indicators. But in this case, I think a pretty good argument could be made that your exposure to discrimination is like a bucket, and each one of these kinds of experiences that you have fills your bucket up a little bit more, or fills your bucket up a little bit more, meaning that the arrows might actually go from the variables into this construct rather than the other way. And if that's the case, if we would agree that something like that is a reasonable description of what's going on in the relationship between the variables and the construct, SEM gets really uneasy about that.

Patrick:

That reminds me of a good friend I had in grad school, a linguist, Lily. Lily is at the University of Washington in Seattle, your alma mater, Go Huskies. She and I were out for coffee and I was showing her a new latent factor that I was working on. It was uncontrollable stressful life events, and it was in children: a series of things that the child had no control over, but that happened to them. So their cat died, their grandmother ran away... wait, maybe it was the other way around. But there was this series of things. And she was really funny. She said, but is that factor like an evil eye? Like somehow somebody put this hex on you? And I was like, no, no, it's lambda psi lambda prime. It took me 10 years to figure out what she was talking about, and it's this issue that you're saying.

Greg:

I love the evil eye. I have referred to it with an incredibly old reference that maybe you won't even get. In the cartoon The Flintstones, there was a character named Badluck Schleprock. Oh, lousy me. And Badluck Schleprock, just anything bad happened to Badluck Schleprock. And so that construct, if you tried to frame it as latent, it would have to imply that everybody has a certain amount of Badluck Schleprock in them.

Patrick:

Quick, what is your wedding anniversary? It is June. But you remember Badluck Schleprock?

Greg:

My dad let me watch five hours on school days, so of course I remember that. But if you model it as a latent variable, you have got to own it.

Patrick:

Yeah. And we've talked on episodes before that if you draw a single-headed arrow, you go into the saloon, you order two fingers of whiskey, you throw it back, slam the glass on the bar top, and to everyone in the saloon you say: I believe my latent factor causes my indicator.

Movie Clip:

I'm Captain Augustus McCrae. This is Captain Woodrow F. Call. I'd like a shot of whiskey, and so would my companion. Besides whiskey, I think we'll require a little respect.

Patrick:

If you have a series of math items, it is not unrealistic to say you have an underlying math ability that in part determines the probability you're gonna get an item correct. I can sleep at night thinking that the reason you got a 92% on your test is because your underlying math ability caused those responses. But, as you say, you start thinking, wait a minute, there's some underlying latent propensity that causes you to be discriminated against in social media or on the playground? The SEM with a traditional multiple indicator latent variable is very poorly suited to deal with that, to the point that you have misstated your causal process.

Greg:

Exactly right. So now imagine that we have a model where we really are at odds, from a theoretical standpoint, with a construct being represented in that traditional latent framework. It takes a lot of trickery to be able to get the standard structural equation modeling framework to do that, and even when we can, there are all kinds of limitations on our ability to do that. That's true. Well, let's imagine now that someone has a system where the two exogenous constructs are things that, you know, they look at and say, there's no way that's latent in the traditional sense; it really seems to make much more sense that it is, as we call it, formative: the variables come together to form that particular construct. In the PLS world, the traditional latent constructs, where the factor itself is influencing its measured indicators, we typically call that a reflective system; in the PLS world, they call that Mode A. When the variables are coming into the construct, influencing the construct, they refer to that as a Mode B system. One of the beauties of the partial least squares modeling framework is that it doesn't care. I don't mean that you don't specify it, but it says: you get those however you want, and we will take it from here. And I have to say, I find that kind of attractive.
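(A quick editorial gloss on that distinction, using generic symbols that are not from the episode: in a reflective, Mode A block the construct is modeled as causing its indicators, while in a formative, Mode B block the indicators combine to form the construct.)

```latex
% Reflective / Mode A: each indicator x_i reflects the construct \eta
x_i = \lambda_i \eta + \varepsilon_i
% Formative / Mode B: the construct is formed as a weighted combination of its indicators
\eta = \sum_i w_i x_i + \zeta
```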

Patrick:

And I find it kind of attractive in the ghost of Hotelling. That is a very similar distinction between principal components analysis and the common factor model. And this is where people threw stuff at each other through the '70s and '80s, and even in the early '90s, right? Is principal components analysis a factor model, blah, blah, blah. But what you just described in the different modes really does distinguish PCA and the common factor model. And that is: the common factor is believed to have given rise to the set of items, and the reason we observed the correlations among the items in the way that we did is because they have a shared underlying cause. One might say a common cause; one might say a common factor underlies those. In principal components, we can very pragmatically think about reversing those arrows, and the items that we have induce a composite that we can compute directly; we don't have to estimate it as a factor score. So that notion of, do the arrows radiate out from some latent factor or construct, or are we optimally weighting the items that then induce a composite: that is principal components and common factor analysis.

Greg:

In the composite variable system that is PLS, it's not going to be principal components analysis, although there are techniques that try to merge principal components analysis into this; this is not that. So what I thought we would do is talk a little bit about how it works, not necessarily getting into all the weeds, but just generally how this works. And to start, I'm going to imagine that system that I described earlier, with two exogenous factors and one endogenous factor. And I'm using the word factor, honestly, a little bit uncomfortably. If I say the word latent variable, I have to say I'm a little bit uncomfortable; the PLS community might go, what's your problem? We still think of it as something that you didn't see. But the SEM that I snorted at Studio 54 is just so deep inside me that I have a hard time even referring to these as latent variables. But I'm the weird one, as far as the PLS community would consider.

Patrick:

Yeah, that's why they would consider you the weird one. What's the deal with that, Hancock?

Greg:

That's it. So let's imagine in that system that we have our two exogenous constructs, and both of those are Mode B. What that means to us, coming in from the outside, is that those are both formative systems, where the variables are coming in to form that particular construct. And then let's imagine that our outcome or dependent factor in this model is a traditional latent variable, a reflective system, Mode A as it would be called in this world, where the factor itself is influencing, is responsible for, the relations among the variables. As an overview of the way PLS operates: if we had to come up with something without the benefit of a whole lot of computational horsepower, we could just take the indicators, and honestly, whether it's reflective or formative, whether it's Mode A or Mode B, we could just take those indicators and get some sort of proxy score for each of those factors. And that is what happens in PLS: you start by getting a score to take the place of each of those factors, and it is a simple sum, typically, of the standardized variables that serve as indicators. So take all of your indicators, whether it's Mode A or Mode B, convert them into z-scores, sum them up, taking into account direction of things, of course, and you go, boom, I have at least a start, or proxy, for each of the three constructs that I care about.
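(A tiny sketch of that first step in Python; the function and variable names are ours, purely for illustration, not from any PLS package.)

```python
import numpy as np

def initial_proxy(X, signs=None):
    """Starting proxy for one construct: a simple sum of z-scored indicators.

    X     : (n, p) array of indicators for this construct (Mode A or Mode B)
    signs : optional +1/-1 per indicator, to take direction into account
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # convert each indicator to z-scores
    if signs is None:
        signs = np.ones(X.shape[1])
    proxy = Z @ signs                          # signed sum across the indicators
    return proxy / proxy.std()                 # rescale the proxy to unit variance
```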

Patrick:

And if you just stop there, that's a measured variable path analysis. For those of you who have taken an SEM class, very often the arc of the story that is told is: you do some review of multiple regression, and then you expand the multiple regression to a path analysis, where you have multiple dependent variables and you have mediation and you have chi-square tests and all of those things, and then you move to a multiple indicator latent factor. Well, what I like is that first step, of taking just a measured composite of the items on your construct, is a measured variable path analysis. But PLS adds this really clever iterative procedure. Yeah.

Greg:

So once we get these proxies for each of our three constructs, what do we do with them? Well, if that was the end of it, we would just go ahead and do the regression using the two exogenous composites and have them predict, in just a regular old ordinary least squares regression, that endogenous composite, and we would be done. But when we do that, we do get some estimates for the structural relations in that particular model. Once we have those structural relations in there, we now actually have a series of relations among all three of those constructs: between the two exogenous constructs we have some estimate of correlation, and between the dependent and the independent constructs we have some structural relations, some standardized beta weight kinds of things. What we can actually do in this system is get predicted scores for each of these, based on the structural relations that exist there. And so, in an iterative process, the information that we got from this first pass at estimating the structural connections allows us to get new estimates for those proxies.

Patrick:

So when we think about how we do business as usual, we're done, right? If we're doing ordinary least squares, or if we have a second dependent variable, that's a mediating model, and we're using maximum likelihood, that's it, right? What you walked through is: we take our items, we make a composite, we take the composites, we fit a model. But here what you're saying is, wait a minute, we've got these model-implied predicted values of the composite; could we use those in some way to update how we're computing the composite itself?

Greg:

Exactly right. And so once we do that, based on the estimated relations that we have among the constructs, and there are different ways of doing this, but it involves whatever a construct is attached to, whether it's another exogenous construct or an endogenous construct, you try to use the information from that model to get updated scores, just as you said. Once you do that, what happens to those updated scores? Well, you take them from that structural model, which in this world is referred to as the inner model, and then you carry them back out to the measurement model, or what here is called the outer model. And how you do that is going to depend on whether you have a construct that is Mode A, which is reflective, traditional, the way you and I think about things, or Mode B. If you have a Mode A system, then you can get predicted scores for each of those measured indicators by doing just a simple regression: I can use the proxy to predict indicator one, I can use the proxy to predict indicator two, et cetera; I can go through one by one and do that in that reflective, Mode A kind of system. When I have something that is Mode B, that is a formative kind of system, that's where all the variables are actually coming into the construct, that's just a multiple regression, because I now have updated scores for the construct and I have all of the scores for the indicator variables, so I can run a multiple regression. And now I have updated relations between the indicator variables and the particular construct. Once I have those, what I can actually do now is get new predicted values for that inner model. And there is this iterative process, and you alluded to that earlier, where we go inner model, outer model, inner model, outer model, until things stabilize. And when things converge satisfactorily, then we have our final estimates of those constructs, and we just do an OLS regression, boom, done and done, with the idea that we have created a system whose goal, in the end, is to try to explain variance. And that's different from what you and I are used to.
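(A heavily simplified sketch of that inner/outer loop in Python, using the so-called centroid inner weighting scheme. Everything here, including the function name, is our own illustration under those assumptions, not code from any PLS package; real implementations add other weighting schemes, sign handling, and convergence safeguards.)

```python
import numpy as np

def pls_path_sketch(blocks, modes, adjacency, max_iter=300, tol=1e-8):
    """Toy PLS path modeling loop (centroid inner weighting).

    blocks    : list of (n, p_k) indicator arrays, one per construct
    modes     : list of 'A' (reflective) or 'B' (formative), one per construct
    adjacency : (K, K) symmetric 0/1 numpy array; 1 where two constructs are connected
    Returns the (n, K) matrix of construct scores; path coefficients then come
    from ordinary least squares regressions among these scores.
    """
    Z = [(X - X.mean(0)) / X.std(0) for X in blocks]       # standardize every indicator
    scores = [z.sum(1) for z in Z]                         # initial proxies: simple sums
    scores = [s / s.std() for s in scores]

    for _ in range(max_iter):
        S_old = np.column_stack(scores)

        # Inner step: proxy for each construct = signed sum of its neighbors' scores
        R = np.corrcoef(S_old, rowvar=False)
        inner = []
        for k in range(len(scores)):
            w = np.sign(R[k]) * adjacency[k]               # centroid scheme
            t = S_old @ w
            inner.append(t / t.std())

        # Outer step: update outer weights block by block
        new_scores = []
        for z, t, mode in zip(Z, inner, modes):
            if mode == 'A':
                w = z.T @ t / len(t)                       # Mode A: indicator-by-proxy covariances
            else:
                w, *_ = np.linalg.lstsq(z, t, rcond=None)  # Mode B: multiple regression weights
            s = z @ w
            new_scores.append(s / s.std())

        S_new = np.column_stack(new_scores)
        scores = new_scores
        # Converged when old and new scores are essentially identical (up to sign)
        if np.max(1.0 - np.abs((S_old * S_new).mean(0))) < tol:
            break

    return np.column_stack(scores)
```

In the running two-exogenous, one-endogenous example, the adjacency matrix would connect each exogenous construct to the endogenous one, and the final structural estimates would come from an ordinary least squares regression of the third column of scores on the first two.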

Patrick:

That's right. And you start moving into a topic that's getting increasing appreciation in the field, often linked to machine learning, which is distinguishing prediction and explanation. We can have a prediction model, that is: look, we're going to roll up our sleeves and we're going to do whatever we need to do to maximize our ability to make y and y-hat as close together as possible. That is not as much what we might think of as an explanation, where we keep Karl Popper's corpse happy and we impose restrictions on the system and make those testable hypotheses. And that's all in model fit, chi-squares and RMSEA and things like that, which do not exist in PLS.

Greg:

Totally, absolutely right about fit. And I want to make sure that we're very, very clear that the system I described was just an example, where we had two exogenous constructs and one endogenous construct. That's an example that very much mirrors a regression model with two predictors and one outcome, but this extends to all different types of models. If you think about all of the crazy connections that we might have: we might have exogenous constructs going to an endogenous construct going to more endogenous constructs, we might have mediated kinds of systems. So draw the hairy structural model that you're accustomed to drawing, and then attach your indicator variables in a way that is consistent with the way you believe they should be attached for each of those constructs. Some of them might be Mode A, the traditional latent variable way we're accustomed to thinking about things, that reflective system; some of them might be Mode B, which is the formative system, where the variables actually pour into the construct. And PLS says: whatever, you get it, and then hand it to me and I will turn the crank.

Patrick:

So I think we've got a pretty good 30,000-foot, face-pressed-to-the-window-of-the-airplane view looking over the Grand Canyon. We've got our heads around Mode A, Mode B, inner, outer. So where do we go next?

Greg:

Well, nowhere yet, because it would be a Swedish tradition that at this point we have a fika. Do you know what a fika is?

Patrick:

I already went before we started recording, so no, I'm good. You took a fika, but I'm fine. Hey Tove, can you help us out on the whole fika thing?

Tove:

So I think it can mean slightly different things, but it tends to be a break when we drink coffee, eat cake or something, and just relax and talk to people.

Greg:

Thank you. So Patrick, we definitely need to have a fika. Have you been to Sweden, by the way?

Patrick:

I have. I love Scandinavia. I've been to Stockholm and I've been to Uppsala.

Greg:

I have never been to Sweden, I just want to say that. So if anybody out there wants to invite me to Sweden, it is on my list; there aren't very many more places on my list, but if you want to invite me to Sweden, DM me, I totally want to go. All right, this has been a lovely fika, and we're back. But I have no idea what your question was before our fika. What was your question, actually?

Patrick:

I really am gonna go to the bathroom.

Greg:

Okay, well, instead of our usual elevator music, we should probably play some ABBA.

Patrick:

Play "Big Red Car" again. I got a profanity-laced text from you a couple of days ago about how "Big Red Car" got stuck in your head.

Greg:

It was brutal. Don't even say "Big Red Car." Anyway, in fact, I'm going to insert some ABBA right here, just to clear that from my head.

Patrick:

Hi, everyone. All right, so I came back from my break, and Greg took a break as well, and he's not here. In post-processing, I'm going to put a little bit more of the "Big Red Car" here, just to mess with him.


Patrick:

Okay, we're both back. Let's pan back and think about some of the broad characteristics of where this can and can't be applied, and then how we might use this thoughtfully in just pursuing our science.

Greg:

We could break this up in terms of different aspects of the model and the modeling process. For example, we could talk about the characteristics of the data and what needs to be in place for PLS to be a viable option. We talked about, and this was one of Wold's concerns, we talk about SEM, because it's based on maximum likelihood or variations on maximum likelihood, as being a large-sample technique, without ever being able to actually define that. But PLS, in the end, is based on an iterative least squares kind of process, and so it puts fewer demands on you in terms of sample size. That's kind of a nice thing.

Patrick:

That is a nice thing. We've talked before about how so much of the field focuses on sample size in terms of power, which of course it should; you need to know whether to use a poop emoji or an eggplant emoji in your grant application. But much less attention is paid to model stability, and also to having a sense of whether you have a sufficiently large sample for those asymptotic properties to come online in the way that we think they do.

Greg:

Exactly. We already mentioned that it doesn't have the heavy distributional reliance on normality. But as you pointed out, we've come a long way since the 1960s and 1970s. You and I had a whole episode, I think, about non-normality, right? So in SEM, we've had ways to deal with this.

Patrick:

Yeah. And so it's interesting, because Wold was really worried about things like the normality assumption, the independence assumption, things like that. We actually have really good ways now to deal with that. We have robust standard errors, we have corrected test statistics, we have very well developed ways for handling ordinal items. It's not to say that PLS is not still advantageous, but ML isn't quite as handcuffed as it used to be.

Greg:

Totally agree with that. And in PLS, you know, formally it's tied to the same distributional assumptions that regression is, and we've talked about that quite a bit, especially back when we were doing OLS. But it also will use a bootstrapping technique to get the standard errors that it needs, which helps to work around some of the other issues, including dependence, right, to some extent.

Patrick:

As long as dependence is a nuisance variable. And this is what we argued about in the piece with McNeish on the unnecessary ubiquity of multilevel models. Dan is exactly right in everything that he said in there: there are very good ways of correcting for violations of independent residuals in a whole broad class of models, as long as you don't have to disaggregate effects, as long as there aren't within-group and between-group effects. Because then what a corrected standard error does, if you don't disaggregate, is it gives you a proper standard error for the incorrect effect. And so it's just another, stealing Bollen's line, piece of fine print: PLS is really well suited for addressing violations of independence by using the bootstrap, but it's still assuming that there's an overall effect that's of interest, and not an effect that needs to be disaggregated in some structural way.
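(A small illustration of that bootstrap idea in Python. In real PLS software the entire weighting algorithm is re-run within each resample; here a plain OLS fit on already-computed composite scores stands in for that final step, and the names are ours, for illustration only.)

```python
import numpy as np

def bootstrap_se(X, y, n_boot=2000, seed=1):
    """Bootstrap standard errors for regression coefficients by resampling rows."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])        # design matrix with an intercept
    boot_coefs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)              # draw n rows with replacement
        b, *_ = np.linalg.lstsq(Xd[idx], y[idx], rcond=None)
        boot_coefs.append(b)
    return np.std(boot_coefs, axis=0, ddof=1)    # empirical SE for each coefficient
```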

Greg:

I think that's not only a really important point, but a point that gets glossed over in the PLS stuff that I have seen.

Patrick:

And I think it gets equally glossed over in cluster-corrected SEM. Oh, you have kids within schools, and multiple schools? We'll just say cluster equals school and you're fine. So I don't think that's unique to PLS.

Greg:

Missing data. Now, I feel like in the structural equation modeling world we have kind of nailed missing data. This is a regression-based world, though. What do you tell your students when you teach regression? What do you tell them about missing data?

Patrick:

Well, in standard OLS, it's listwise deletion. Anybody who's missing anything? We'll see you. Thanks for coming out tonight; you don't have to go home, but you can't stay here. Yep. What I do say, though, is that you can move, even within the regression framework, to a maximum likelihood estimator and use full information maximum likelihood under certain assumptions, that is, missing at random; we have an episode on this, we won't get into the details of that. You can also use multiple imputation, and you can do that within a regression framework as well. That is, you build a model for missingness, impute values, estimate your regression model, do that 10 or 20 times, gather all the things together, and combine them using formulas that exist. What I almost always tell my students in my teaching, though, is that you should probably move to a full information maximum likelihood setting within an SEM framework and incorporate the partially missing data there.
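(For the curious, the "combine them using formulas that exist" step is usually Rubin's rules; here is a minimal sketch in Python, assuming you already have one estimate and one standard error from each imputed dataset. The function name is ours, for illustration.)

```python
import numpy as np

def pool_rubin(estimates, std_errors):
    """Pool a single coefficient across m imputed datasets using Rubin's rules."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    m = len(est)
    pooled = est.mean()                      # pooled point estimate
    within = (se ** 2).mean()                # average within-imputation variance
    between = est.var(ddof=1)                # between-imputation variance
    total_var = within + (1 + 1 / m) * between
    return pooled, np.sqrt(total_var)        # pooled estimate and its standard error
```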

Greg:

Well, then it wouldn't be PLS, would it? It would be, what, PML? I'm sorry. By definition, you can't do that in here. That's exactly right. But yeah, to the extent that I have found any information about it, it usually sort of revolves around: well, just don't get too much missing data, and, you know, try to keep it below 5%, and if you really have to, maybe throw some values in the holes, and it shouldn't matter. All right, you're good to go, Godspeed. And in other worlds that we inhabit, we would go: I'm not so sure about that.

Patrick:

Yep. Although what you just described is what SEM did for half a century.

Greg:

It's true.

Patrick:

A lot of this stuff is not unique to PLS, right? ML had a don't-ask-don't-tell policy on missing data and non-normal data and dependent data for a lot of decades, not just a lot of years, a lot of decades. But yes, in PLS in its current form, and we want to stress this because this is another area that is ripe for a lot of dissertations or postdoctoral research projects, to my understanding there is not a widely available method to handle missing data in the way that we would with maximum likelihood estimation.

Greg:

That is my understanding as well, for sure. PLS can get at model complexity in a variety of other ways, but not actually that way. You know, there are things that the original version of structural equation modeling has expanded into, things like higher-order constructs, or latent class kinds of things, or mediation; all of those are kinds of things that fit very, very nicely under the structural equation modeling umbrella. A lot of those things represent the methodological developments that have been going on in the last 10 years or so within the PLS community as well, trying to expand the types of models that you can address through PLS. So you can still have that wonderful benefit of having different types of variable systems, Mode A and Mode B, while having the structural models that are of interest, these inner models that really represent the theory that you care about.

Patrick:

And that's a really good point you're making: there are a lot of ongoing developments in PLS as we speak. So historically, and you indicated this earlier in the conversation, you just standardized everything. Yeah. And again, that's what we did in EFA for 100 years, right? You standardize your variables, you have the correlation matrix. That's exactly right. But what is the byproduct of that? What you have is means of zero and variances of one. Well, who cares? Nobody cares what means and variances are, unless you want to study change over time, unless you want to do a multiple group analysis in which you examine weak or strong or strict or partial invariance, unless you want to do an MNLFA-like thing. But I think a lot of people are saying, okay, we've really figured out the core parts of this. Now, how can we build this out in ways that generalize the applicability in practice?

Greg:

Yeah, it's a very exciting thing, right? The whole PLS methodological community is developing these methods, trying to get them incorporated into existing software, whether it's proprietary software or R-based packages. So there's a lot of cool stuff that's going on. And what I hope we're getting out of this conversation is that PLS isn't meant to be Sheep's Kin, although Wold really wanted it to be this thing that does some of the same stuff as SEM, but maybe under different restrictions. I think it complements SEM really, really nicely. I don't think that people necessarily should say, well, I'm doing SEM unless I can't, and then I'm gonna go over to PLS. I think it's nice to be able to think about: all right, what's the model? Let me get everything laid out on the table. What's the model in terms of the inner model, that structural portion that we care about, and the outer model, the measurement part that we care about? What characteristics do I have in my data? And then, what might be the appropriate technique to draw from? And even though we sort of joked about Sheep's Kin, I would like us to think about this as just another option that people have.

Patrick:

One thing I love about this is it's very clever in saying: look, let's take a deep breath and let's think about another way that we could approach this. So maximum likelihood has these ships that we believe are over the horizon, but we can't see them, and so, based upon the characteristics of our sample data, we want to get the best estimates possible. This is not, to reiterate an earlier point, a method of estimation; this is a fundamentally different way of approaching the entire modeling process. It is not without limitations in the kinds of structures that we can fit. There's a distinction in path models, whether it's a full SEM or a path analysis, between models that are called recursive and non-recursive. It is one of the few things in quant that is the opposite of what you would think it is, and that's how you remember it. Colloquially speaking, in a recursive model the influences move from left to right and there are no correlated residuals: there are no feedback loops, there are no bidirectional effects, and there are no correlated disturbances among your dependent variables. In a non-recursive model, you can have either or both: correlated disturbances and feedback loops. PLS is currently limited to recursive models. I don't feel like that's a death knell; I think a lot of our models are recursive, though not all of them. I think in cross-sectional data we don't often have feedback loops, but I gotta tell you, it's very common in my own kind of work to have correlated disturbances. Say I have several exogenous predictors and two mediators, two separate mediators, so that there are two specific indirect effects; it's very common to correlate the disturbances of those two mediators. And currently, we're not able to factor that into the PLS approach.

Greg:

And so that might be one of those branch points for you that says, I'd better keep it in the SEM world. Error covariances in your measurement model might be another, as might extreme cross-loadings, ones that are really non-ignorable; that's harder in this particular system. Or, honestly, just if you want an assessment of fit, right? The world here doesn't tend to emphasize global fit, because the whole goal is to maximize explained variance, not to explain the behavior of the variances and covariances as part of a larger system; it's driving at trying to explain R-squared for things that are endogenous within the part of your model that you actually care about. So if you say, and I think this is part of what Wold was getting at, you know, if you have a model that you have a strong theory about, but you really want to put it to the test, PLS isn't necessarily this global testing framework. It emphasizes more the estimation of things rather than the actual explanation, per se.

Patrick:

I really like the approach. I mean, I like conceptually what it's trying to do, and what it's conceding, right? We've talked before about how you have to concede certain battles in a war so that you can marshal resources in another part of what you want to achieve. I like all of that. One thing to keep in mind is that one of Wold's original motivations was to operate from more of an exploratory perspective: we don't always have a really strong a priori sense of what we're trying to do with the data, and, paraphrasing his term, we're often in a data-rich, theory-poor setting. I very much see that. But it's important to realize we're still locking things down in this model in a confirmatory kind of way. That is, PLS is not going to tell us the optimal number of constructs we need; we have to define the constructs and define the items that go with the constructs. PLS is not going to be kind of like a regularization method, where smaller parameters leave the model and bigger parameters stay in the model; we still have to state what the structure is that we are interested in among our constructs. And so we just have to keep in mind that this is not exploratory in an EFA kind of way, but it is less rigid, in a cause-indicator versus effect-indicator perspective, about what leads to that composite in your model.

Greg:

Yeah. And you know, the world that you and I tend to live in, that structural equation modeling world, is all about constraints: constraints in the measurement portion of the model, constraints in the structural portion. And we might have constraints in the structural portion of this model, but we're still only estimating relations in a very traditional OLS kind of way that doesn't necessarily feel those constraints, because things are so very partial in the way that we do things. One thing that Wold had said in many of his writings was the equivalent of, I don't know what it would be in Swedish, but, it'll all come out in the wash, whatever the Swedish version of that would be.

Tove:

So there's no literal translation. But in Swedish, we say something like just gonna last to say,

Greg:

As we get larger sample sizes, as we get more indicator variables, in the end, you latent variable structural equation modelers and us PLS people all just sort of come together and reach the same inferential conclusions. Which, he would argue, and I think maybe reasonably so, is the goal in the end: to try to understand what the relations are, or at least approximate them reasonably, even if not exactly.

Patrick:

It's all about how we can take the data that we have available to us and make a valid and reliable inference about the nature of the relations among the constructs, in a way that helps us understand something that we didn't know before. This is just another arrow in our quiver: we have a theoretical question, we have data available to us, and we want to make a probabilistically based inference about the nature of the relations among our measures. And this is just another way of approaching that problem.

Greg:

And for those of you who are steeped in a PLS tradition, maybe you will think about some of the comparative benefits of structural equation modeling, if you haven't thought about that before. And for those of us who are in this SEM kind of world, there is this other option out there, PLS, that might be tailored very, very well to our model, our purpose, our data, and I think we should consider that, as you said, another arrow in our quiver.

Patrick:

This is target-rich for dissertations and master's theses and grant applications: things like missing data, how would we extend this longitudinally, what are diagnostic measures that might be available? And then one that I'm really interested in myself, and we've alluded to this before when we were talking about two-stage least squares, is: might we, for a given model, have a principled way of estimating our model using full information maximum likelihood, using two-stage least squares, using partial least squares, and use those results jointly to try to triangulate on a set of relations that we have the greatest confidence in? If those three methods converge on a discussion section, outstanding. If they don't, well, then it's an intellectual goose to say we've got to better understand why these are different from one another, and in which of these do I have the greatest confidence. Because if you only rock back and forth and say maximum likelihood is consistent, efficient, unbiased, and asymptotically normally distributed, and that's all I'm going to do, you are blinding yourself to other insights into your data that might help you make a better data-informed decision about the nature of your constructs.

Greg:

I like that point very, very much. So, as per usual, you and I put a tremendous amount of planning into this episode, texting each other.

Patrick:

At 9:30 last night.

Greg:

11:30 last night. But you know, also, you and I are not experts in PLS. It's something that you and I are gaining familiarity with. And as you said, if someone out there is listening who has expertise in this, they'll say, oh, but they didn't talk about this, they didn't talk about that. Yeah, that's absolutely true. At the end of this episode, there are going to be a lot of pieces left over on the table.

Patrick:

Just like when you buy something from IKEA. Team first dish team Dorkin FIRFER.

Tove:

That's not a real word

Greg:

Good night, everybody.

Patrick:

Thank you, everybody. Take care. Bye bye. Thank you so much for listening. You can subscribe to Quantitude on Apple Podcasts, Spotify, or wherever you download your cacophonous noise to drown out midterm political ads. And please leave us a review. You can also follow us on Twitter, we're @quantitudepod, and check out our webpage at quantitudepod.org for past episodes, playlists, show notes, transcripts, and other cool stuff. Finally, you can get Quantitude-themed merch, the real stuff, not the fake, at redbubble.com, where all proceeds go to DonorsChoose to support low-income schools. You have been listening to Quantitude, the only official pumpkin spice podcast for fall. Quantitude has been brought to you by the Double Asteroid Redirection Test, in which NASA launched a rocket 7 million miles to intercept a lump of rock 500 feet across at 14,000 miles per hour with the sole intention of making the rest of us feel bad about our own contributions to science; by Quantitude's Tova Nadir 3000, a newly available download that allows everyone to have their very own personal Swedish interpreter; and by the House of Windsor, who proudly anoint Quantitude the royal podcast consort. This is most definitely not NPR.

Greg:

In fact, I'm going to insert some ABBA right here anyway, just to clear that from my head.

Patrick:

Hi, everyone. All right, so I came back from my break, and Greg took a break as well, and he's not here. And so, in post-processing, I'm going to put a little bit more of the "Big Red Car" here, just to mess with him.

Greg:

No, no, no!