The HPS Podcast - Conversations from History, Philosophy and Social Studies of Science

S5 E5 - Cristian Larroulet Philippi on Measurement in the Human Sciences

HPSUniMelb.org Season 5 Episode 5

This week, Thomas Spiteri is joined by Dr. Cristian Larroulet Philippi, who joins us at the University of Melbourne this year as the inaugural RW Seddon Fellow in the History and Philosophy of Science program. With a background in economics and a PhD in History and Philosophy of Science from the University of Cambridge, Larroulet Philippi was previously a Junior Research Fellow at Gonville and Caius College, Cambridge. His research explores the development and justification of quantitative concepts, the role of measurement in the human sciences, and the intersection of scientific objectivity and values.

In this episode, Larroulet Philippi:

  • Traces his path from economics into philosophy of science, and how encounters with psychometrics and measurement theory reshaped his research direction
  • Explains why measurement in the human sciences is perhaps more philosophically complex than in the physical sciences – highlighting issues of conceptual vagueness, causal complexity, and limited experimental control
  • Discusses the difficulties of treating concepts like intelligence or depression severity as measurable quantities, and what kinds of evidence and theory would be needed to justify this
  • Examines the risks of treating indices like depression or wellbeing scores as overly objective or precise in policy contexts, and why we need a clearer grasp of what such numbers are meant to represent
  • Reflects on why clearer thinking about measurement matters across philosophy, psychology, sociology, and policy — and on his efforts to build cross-disciplinary dialogue 

Thanks for listening to The HPS Podcast. You can find more about us on our website, Bluesky, Instagram and Facebook feeds.

This podcast would not be possible without the support of the School of Historical and Philosophical Studies at the University of Melbourne and the Hansen Little Public Humanities Grant scheme.

Music by ComaStudio.
Website HPS Podcast | hpsunimelb.org

Welcome to the HPS podcast, where we explore the history, philosophy, and social studies of science.

Today I'm joined by Dr Cristian Larroulet Philippi.

Cristian first trained as an economist before turning to philosophy. He then completed a master's in the Philosophy of the Social Sciences at the LSE and a PhD in History and Philosophy of Science at the University of Cambridge.

We're delighted to welcome him now to the University of Melbourne as the inaugural Seddon Fellow in the History and Philosophy of Science program here.

His research examines how quantitative concepts are developed and justified, and the challenges of measurement in the social and human sciences. In this episode, Cristian unpacks what it means to measure in these fields and why the task might be more philosophically complex than in the physical sciences.

We discuss the longstanding debate over whether attributes such as intelligence or depression can be meaningfully quantified, and the kinds of theory and evidence that would be needed to support such claims. Cristian also considers the political and ethical stakes of reducing human attributes to single measures and reflects on the role of wellbeing and depression indices in policy.

Finally, he points to the importance of clearer thinking about what numbers actually represent in the study of human life and why a greater dialogue between philosophers, social scientists, and practitioners should be encouraged if we are to make sense of the role measurement plays in shaping our understanding of society.

____________________

Thomas Spiteri: Cristian, thanks for joining the podcast today. 

Cristian Larroulet Philippi: Thanks, Thomas, for having me.

[00:01:49]

Thomas Spiteri: How did you find your way into the field of history and philosophy of science? 

Cristian Larroulet Philippi: I did my undergrad in economics, but from day one I was a bit disappointed with the lack of, I don't know, reflection on the methods and on the politics of the discipline.

Although I continued doing economics and did my master's and worked as an economist for some  years, I was from day one sort of taking courses in other areas: in sociology, anthropology, until I found philosophy of science, basically. I took a course on that and on cancer epistemology, and from then it was pretty clear that the kinds of questions that I was more interested in studying were happening there.

At some point, and after doing the master's and doing research as an economist, I did a master's in philosophy of the social sciences at the LSE and that really sort of settled things. Reading the work by Nancy Cartwright, Helen Longino, Philip Kitcher, Ian Hacking; that kind of work was where I felt more at home. 

[00:02:50]

Thomas Spiteri: You've spent a lot of time looking at measurement in the human sciences. Was there a particular problem or moment that pulled you in? 

Cristian Larroulet Philippi: I was actually going to write a PhD on something else. I applied to do a PhD in Cambridge to work on values in models in economics. I was very inspired by the work – I'm still very inspired by the work – by Elizabeth Anderson, actually.

But I had a pretty rough first term personally, and at the same time, just before moving to Cambridge, I took a course in the education department in Boulder. I was doing the coursework for the PhD in Philosophy in Boulder, and there was Derek Briggs, who ended up then writing a book; he was working on this book on the history of psychological measurement.

He was doing archival work and was a standard psychometrician, but he got interested in issues of measurement basically by hearing Joel Michell, the psychologist in Sydney, criticising quantitative measurement in psychology. There I saw a standard psychometrician, a very prominent psychometrician actually, in the educational sector in the US, really trying to grapple with conceptual questions, methodological questions about how we get to know whether what we are measuring is quantitative or not, when that makes sense or not. He even made us read RTM, the Representational Theory of Measurement, something very much out of his space, in a way.

In the last week, we even read David Sherry, where he thought he gave the best answer to Joel Michell. All this course was, in a way, philosophy of science done internally, you know, by a psychometrician, really trying to understand the issues – and that grabbed me completely. I think for philosophers of science, whenever they see actual scientists puzzle through something, it's very hard not to think, okay, there's a real thing here, and not just, like, abstract philosophy. 

From then on, I had ideas in that seminar actually. The term paper that I wrote is the paper on validity, that I ended up publishing. But many of the ideas that I was taking on, like between ordinal and interval, that kind of thing, came from thinking in that course.

Luckily, I was in Cambridge, where Anna Alexandrova, Jacob Stegenga – they both had work on measurement in the human sciences, and of course Hasok Chang. All the students with Hasok, some of them were working on measurements, so I was actually in the right place to change topics, so to speak. 

[00:05:14]

Thomas Spiteri: For listeners less familiar with this area, what does it mean to measure something in the human sciences? Is it more philosophically complicated than the physical sciences? 

Cristian Larroulet Philippi: I think arguably it is more philosophically complicated there, but let's start by acknowledging that there's many things that go under the name of measurement, right? You can think of classification. Whether someone is unemployed or not, you know, is sort of a binary thing, but that can lead into measurements of the rate of unemployment. So, all those things are part of, in a broad sense, the term measurement; but also, of course, there is the more paradigmatic quantitative measurement, which is the assignment of numbers to objects or persons, where those numbers are about amounts, and not only about order.

We can also distinguish between the measurement of more abstract concepts, so say temperature or perhaps depression severity, versus more everyday kinds of concepts like whether someone is unemployed, the rate of unemployment, or the length of a rod. We can always debate about the details, right, in which sense these are more abstract or not; but I think those distinctions are still useful. 

If we narrow the question to quantitative measurement, many philosophers and methodologists take as basic the idea that there's just one kind of quantitative measurement. So, if there is quantitative measurement in the human sciences, it better look just like the one that is going on in the physical sciences, this activity where we are assigning numbers, and those numbers actually are representing relationships that are there in the phenomenon. It's assigning numbers according to rules, as some people have defined it, but the assignment has to be justified.

The debate is, of course, what 'justified' means here. Of course, not any random rule would do, but people also lie on a spectrum from more demanding to less demanding. Now, even if we assume this picture where it is just one thing, where quantitative measurement is just one thing, it's still the case, I think, that quantitative measurement in the human sciences is more philosophically complicated, because it inherits all the puzzles that feed the literature on physical measurement – which is vast – and it adds the complications that many people think are unique to the human sciences.

You can think of, for example, large degrees, of course, of complexity, or the conceptual fuzziness that is prevalent in the human sciences, the value-ladenness that we associate with human measurement. Also, the constant changes that happen in the social and personal world, and the need to make do with very minimal degrees of experimental control. All these things perhaps are not, you know, unique. Even if you don't think they're unique to the human sciences, they're clearly more accentuated in the human sciences, and that makes quantitative measurement in the human sciences more philosophically problematic, I would say, or complex.

[00:08:14]

Thomas Spiteri: I wonder, to what extent you think we're justified in treating things like intelligence or depression as measurable quantities. What's at stake in assuming that they are?

Cristian Larroulet Philippi: I think it's a tricky question actually, because when are we justified in treating anything as a measurable quantity?

One easy answer could be, well, whenever we have lots of evidence and background theory supporting it, we can speak more about the kind of evidence and background theory required, but I don't think it's too controversial to say that we don't have the right kind of evidence and background theory to support the claim that intelligence and depression severity are measurable quantities. Say we don't have different ways of measuring intelligence or depression severity, all giving the same result, let alone the same result to high levels of precision, or across different environments, and so on.

We don't have theoretical equations that link amounts of intelligence to other quantitative variables, where those equations would be clarifying the meaning of these concepts and could be used to successfully predict phenomena and so on. It's very different from the picture that we can get by looking at, to use the standard case that is always given, the case of temperature. Now, the question is tricky because scientists have treated other concepts as measurable quantities well before having good evidence and theory for this. I think, for example, David Sherry in his 2011 paper on thermometry makes this case very nicely, but it generalises. Basically, scientists must assume some phenomenon to be quantifiable, so they must treat it as quantifiable, right, if they're going to arrive at the evidence and theory that would justify talking of the phenomenon as quantitative. We don't start the work already having that evidence and theory.

Any time that we have a recently introduced concept and we are trying to think of it quantitatively, it's a working hypothesis; it has to be seen as a working hypothesis. So, someone could say, 'Why can't we say the same thing for intelligence and depression, [in the way] that people deal with temperature?'. Say that we currently don't have the evidence and theories that would justify talk of them as measurable quantities, but we're still justified in treating them as such, for the purposes of doing research, of discovering whether in the end it would make sense to talk of them quantitatively. If it is a bet that physicists took, why can't psychologists take that bet? 

I don't think most human scientists think much about this, to be honest, and if pressed, they wouldn't necessarily give this forward-looking answer. But some methodologists do say this. I think most people would rather think, 'Oh, our measurements are fine, it's just that they're more noisy than those of the physicists', say. Historically, when people are pressed, or when they think about it, it's common to refer to the case of temperature. Von Neumann and Morgenstern, in the book on game theory in the thirties, when discussing whether it would make sense to quantify utility, refer to temperature. Cronbach and Meehl, in the famous 1955 paper on construct validity, also referred to the case of temperature as illustrating this path forward.

[00:11:28]

This is a common theme. Even if we grant that this response is in principle available, that it's fine, we'll take a bet, we'll think that if we try to quantify the phenomenon we'll actually succeed, say; I think even if we do that, we can still place constraints on this justification. For example, one thing is to say that we take a bet; another thing is to work it out, right? Working hypotheses make sense if they are worked out; that means that we put them to use. We try to make predictions at the quantitative level, or articulate theories that speak at the quantitative level. But if researchers are not doing that, then I don't see much of a justification for treating, say, intelligence or depression severity as quantitative, if that is not part of a long-term research program where it's an open question whether it would make sense in the end. I think things are even more complicated in the human sciences, in the sense that we might also think more of political constraints.

After all, we think of intelligence as a highly valued skill. To treat it as a one-dimensional kind of causal force lends support to, and historically has been linked with, very hierarchical views of society. Think of the work of Stephen Jay Gould on eugenics, but also Elizabeth Anderson, on how it is very easy to move from a one-dimensional conceptualisation of an attribute to a very hierarchical view of society, when that attribute is very valued.

I think in some social contexts it might be more sensible to pursue a more multidimensional or even qualitative representation of the attribute. The claim is not that because of political consequences we should never pursue quantitative research, but it is to acknowledge that a lot of idealisation and abstraction will be involved anyway in treating complex phenomena like intelligence or depression as one-dimensional quantities; it's sort of stretching things out there. We might do it for the sake of discovering things, but since it would never be straightforwardly obvious that this makes sense, I think the response that moral and political aspects are irrelevant because we are only caring about the truth just does not work here. There's a more complicated picture there. 

[00:13:45]

Thomas Spiteri: There's a longstanding philosophical divide, maybe at a bit of an impasse at the moment, between people who think human attributes can be meaningfully quantified and those who remain sceptical. Where do you stand in this debate, and importantly, what I want to get at is what it would take to shift your view.

Cristian Larroulet Philippi: There's many things that go under the name of measurement. Let's go back to that distinction between the rate of unemployment and what I would think of as more of a theoretical concept like intelligence, or reading comprehension, perhaps even utility or depression severity; things that are being postulated to explain something else, but it isn't obvious; they're not directly measurable. 

The rate of unemployment on anyone's view must be quantifiable, right? We just need to count the number of unemployed and divide by the number of people in the workforce. What is straightforward here is that this is quantitative, because it's just a ratio. But once we get into the details, of course it's not straightforward because we need to define what unemployed means, what is it to belong to the workforce, and that's why we get debates and the different ways of measuring unemployment.

For example, some measures would include discouraged workers as part of the unemployed, other ones would not; and this does make a difference actually, and there's good reasons on either side. But, if we put these issues aside, I think as long as unemployment is clearly defined, I don't think there's much to debate about, whether it can be meaningfully quantified – which doesn't mean, as I said, that there can't be a lot of debate on the details.
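As an editorial aside, the ratio described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the head counts are hypothetical, and the convention that discouraged workers, when included, are added to both the unemployed and the workforce is an assumption rather than something specified in the conversation.

```python
def unemployment_rate(unemployed, employed, discouraged=0, include_discouraged=False):
    """Return the unemployment rate as a fraction: unemployed / workforce.

    If include_discouraged is True, discouraged workers are counted both as
    unemployed and as part of the workforce (an assumed convention).
    """
    if include_discouraged:
        unemployed = unemployed + discouraged
    workforce = unemployed + employed
    return unemployed / workforce


# Hypothetical head counts, chosen only for illustration.
counts = {"unemployed": 50, "employed": 950, "discouraged": 10}

narrow = unemployment_rate(**counts)
broad = unemployment_rate(**counts, include_discouraged=True)

print(f"narrow definition: {narrow:.2%}")
print(f"broad definition:  {broad:.2%}")
```

Both results are plain ratios, so neither is in doubt as a quantity; the two figures differ only because the definition of "unemployed" differs, which is the point being made about debates over the details.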

The contrast here is with things like intelligence, which are in a way postulated to make sense of, for example, the fact that test scores tend to correlate. Originally, we can think of general intelligence as that thing that explains why all these tests are correlated; so it's meant to be a more theoretical concept that we introduced, just as someone would introduce temperature. The standard picture for seeing these concepts and for justifying the measurements is more or less to treat them like dispositional concepts, like causal forces. We don't directly observe them, but we measure them by measuring their effects. Just as we do with temperature: we're measuring the volume of mercury, and we are using that to infer amounts of temperature.

Now, if you ask me whether these kinds of concepts can be meaningfully quantified in this kind of way, I actually am doubtful. I'm open to change, if I see the right kind of evidence and theory development. But, as I pointed to before, the right kind of theory development would be quantitative theory that defines what amounts of intelligence are, clearly linked to empirical procedures, and we find convergent numbers in those empirical procedures. In the way that, more or less, Eran Tal has described the practice of measurement.

With Miguel Ohnesorge, in a joint project on measurement and seismology, we also try to give an account of what this could look like in the human sciences, where there's no experimental control. I'm doubtful, and perhaps I can't offer a brilliant argument to settle this. But what I can do is to suggest that at least for some of these concepts, on the face of things, it makes more sense to think of our practices of putting numbers to objects in a different way than the standard picture coming from physics, of this sort of causal force that we measure by measuring its effects.

[00:17:06]

This is in unpublished work, but hopefully it's gonna get out. There, I contrast this standard picture of causal forces, where the task is to make sure that the effect varies only when the cause varies: no other thing is making a difference to the volume of the mercury, so it can reliably indicate changes in temperature. I contrast that picture with a very different picture of what researchers might be doing when putting numbers to things, which is to value the states of affairs constituting the phenomenon. This is inspired by Dan Hausman's work on overall health. If you think of overall health as the efficiency of your various bodily parts and processes, then overall health is just multidimensional, because we just have various bodily parts and processes; it doesn't make sense to put one number to it.

You might be doing better in your lungs, your respiratory system. I might be doing better in my heart, basically the cardiovascular system. It makes no sense to say that I am healthier, or you are healthier; these are just different regards in which we can be doing better health-wise. In that sort of world, why do people still put one number to overall health is a question that Hausman asks; and why do everyday people answer surveys asking who's healthier? How are they able to answer that question if it doesn't make sense? 

What Hausman suggests is that when someone says that a patient with a heart dysfunction is less healthy than someone with a knee problem, what they're actually answering is what is more or less valuable: what is worse rather than what is less healthy, from their standpoint. We are able to put a number because we are changing the question to what is more or less valuable, rather than what is more or less healthy. The idea is that even though overall health is multidimensional and we can't put one number to it, if we are forced to put a number in a context where we're choosing, we may be able to choose between different health states. 

In that sense, we may be said to be valuing some states of affairs more than other ones. I think this picture of what might be going on when we put numbers to things is actually more compelling in some cases. Take depression severity. Depression severity surveys are basically combinations of symptoms, you can think of it that way, and it's very hard to claim that they are all sort of being moved by the same causal force. Also, they're sometimes criticised for giving too many points to less relevant symptoms, like insomnia, relative to more important symptoms like suicidal thoughts and behaviours.

If you think of that criticism, it doesn't make sense if the picture is a causal one, but it makes a lot of sense if the picture is the valuing one. If what those measures are doing is to put numbers to how bad it is to be in a given depression profile, then it makes sense to challenge a measurement instrument that gives too many points to insomnia, relative to suicidal thoughts and behaviours; because insomnia is just less significant when it comes to valuing the situation.
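As an editorial aside, the weighting point can be made concrete with a toy calculation. The sketch below is purely illustrative: the two symptom items, the ratings, and both weighting schemes are hypothetical and are not taken from any real depression instrument.

```python
# Hypothetical symptom ratings (0 = absent, 3 = severe) for two profiles.
profile_a = {"insomnia": 3, "suicidal_thoughts": 0}
profile_b = {"insomnia": 0, "suicidal_thoughts": 2}

# Two hypothetical scoring schemes: every item counts equally, or
# items are weighted by how bad they are judged to be.
equal_weights = {"insomnia": 1, "suicidal_thoughts": 1}
value_weights = {"insomnia": 1, "suicidal_thoughts": 4}


def severity_score(profile, weights):
    """Weighted sum of symptom ratings."""
    return sum(weights[item] * rating for item, rating in profile.items())


for label, weights in [("equal", equal_weights), ("value-based", value_weights)]:
    score_a = severity_score(profile_a, weights)
    score_b = severity_score(profile_b, weights)
    worse = "A" if score_a > score_b else "B"
    print(f"{label} weights: A={score_a}, B={score_b}, scored as worse: {worse}")
```

Under equal weights profile A receives the higher score, while under the value-based weights profile B does. On the valuing reading, choosing between such schemes is a substantive judgement about how bad each symptom is, which is why the criticism about insomnia has a target.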

So if you ask me whether some human concepts like depression severity or perhaps even reading comprehension, which would be more controversial, may be meaningfully quantified in this valuing sense – which is different from the physically-inspired causal sense – then I'm less doubtful. We still need to get clear on why we would do this; is it a good way of, say, basing policy and regulatory decisions? But at least I find this a promising change of perspective.

[00:20:42]

Thomas Spiteri: Obviously, measurement tools like depression and wellbeing indices are increasingly used in health policy, public policy. Can you maybe expand on some of the risks perhaps of treating those numbers as more objective or precise than they really are? 

Cristian Larroulet Philippi: First, perhaps one step before, there might be a need to use numbers just for bureaucratic reasons, for efficiency reasons, for time constraint reasons. I think one can think of Porter's work and other work in sociology and history of science as highlighting other reasons for using numbers. Porter highlights this more political reason, that politicians can blame the numbers: 'Not me', right? But I think JC Scott would highlight this more like efficiency constraint, or bureaucratic aspect.

I'm interested in articulating this alternative picture of what's going on, in part because I think unless we are clear on what the numbers are meant to be representing, it's very hard to have ground rules for how to criticise and engage with these numbers. You can say the numbers are not objective enough, or not precise enough, but relative to what? What's the point of these numbers anyway? What do we mean for them to be objective enough or precise enough? 

I think if we tell ourselves the story that depression surveys measure depression severity just like thermometers measure temperature, just with a little bit more noise, then it would make sense to think of depression scores as objective, as in stable across context; doesn't matter how we measure it, we get the same number, right? Of course, all that I've been saying here is that we can't tell that story with a straight face. 

But if the story is rather that those numbers are more in the spirit of the valuing approach, then we wouldn't even expect those numbers to be objective in the sense of context-independent, because how valuable states of affairs are depends on the context. Not only on the subject, but on their circumstances. Also, we don't expect them necessarily to be so precise because we ourselves don't have very fine-grained valuations, arguably. If you think our valuations are shaped by our biology, by our psychological needs, those need not be as fine-grained as the real line is.

I think having a clear sense of 'what are we doing with this', 'what are we meant to be doing with this number' sort of helps us put into perspective whether we expect them to be objective in this context-independent sense, whether we expect them to be precise, and what it means for them not to be that way.

I don't think this answers fully the question of what's the proper role of these numbers when it comes to public policy and how that depends on the objectivity and precision. But I think it points to ways of making progress. I think for me, progress here means going beyond a conversation between those who say, 'Oh, the numbers are fine, they're a bit noisy' and those who think, 'Oh, these numbers just do violence to the phenomena'. I think we need to think through more of the middle ground there.

[00:23:42]

Thomas Spiteri: What would you like to see in the dialogue between philosophers, psychologists, social scientists, when it comes to measurement? Do you think we need a kind of greater measurement literacy among researchers and policy makers?

Cristian Larroulet Philippi: I think that's a very good term: measurement literacy. I think clearly we need, funny to say, but yeah, we need philosophers. It's not straightforward what the meaning of these numbers is; this requires careful, like, conceptual thinking – which of course is not the exclusive privilege of philosophers, but that's what we do, basically.

Now at the same time, it's not only conceptual issues that are at stake here. It also matters how numbers are used in practice, what the real-world requirements of policy are; these things are contextual, they are historically contingent. So, we also need sociologists, historians of science to forge this broader kind of measurement literacy, as you nicely put it. We need more dialogue there also, with the practitioners. 

I think part of the reason why I very much like this topic is, I mean, I told you how I got into this: seeing psychologists engaging with philosophy, actually, and I've seen that continue. I've seen a lot of psychologists in the Netherlands, for example, engaging with this philosophical literature, and I'm keen on fostering that. 

I'll mention one event series that I'm starting here, which I took the idea, the name, the whole thing from Cambridge HPS – which is this thing called Coffee with Scientists, where we invite a scientist to talk about research problems and challenges in an open-minded way. They don't need to defend themselves; it's more about making constructive conversations about this. I think these kinds of spaces, where we bring researchers, scholars of science, philosophers, sociologists, and so on in a constructive way, I think are very much needed in general; but I think in measurement, it would be very good to have more of this. 

[00:25:43]

Thomas Spiteri: I agree. Before I let you go, I want to ask you, on measurement, at least, which questions are maybe keeping you up at night? 

Cristian Larroulet Philippi: Good that you qualify that it's only on measurement, because it is my children that keep me up at night! But I've been giving a speech here that is very traditional in the sense that I'm contrasting physics, temperature and psychology. One thing that I'm involved now with, as I mentioned, Miguel Ohnesorge, is broadening the case studies that we use for thinking through measurement. We are particularly studying the quantification of earthquake size in 20th century seismology, but also going to 19th century.

That's been fascinating. It's been very relevant also to put into perspective this idea that we always need high degrees of experimental control, for example. One thing that keeps me up at night, if you want to put it that way, is to move beyond the kind of toy examples of physical measurement: temperature, length, rigid rods; toy examples that are very highly precise, very theory-supported kinds of measurement, and that have dominated the literature – that's one thing.

The other thing is very much in the spirit of what I was saying: I'm very much taken by the question of what the measurand is; what it is that we are trying to measure, and the distinction between these sorts of causal forces and valuing. It's meant to be just a starter, actually, in this program of 'what is the measurand'. 

I also think there are other sorts of paradigms to think through, not only value, not only this disposition or causal force; I'm interested in working out whether probability might be another way of thinking through what the measurand is, because it's also a case where we are lumping things. 

There, my example is survival fitness: you can increase your survival fitness by changing the size of your legs, but also the features of your heart or your lungs, so it seems to be multidimensional on the face of things. But we still manage to bring all that into one number: how does that work, and whether that provides a different way of thinking about the measurand, what we are trying to measure, at least – that keeps me up at night, I think.

[00:28:00]

Thomas Spiteri: Where can people find your work?

Cristian Larroulet Philippi: I have my webpage here at the University of Melbourne, and as well on PhilPapers, I have all the papers there. If they're not accessible there, you can just open the CV and there will be a link that goes to an accessible version of the paper.

Thomas Spiteri: Thank you so much for your time today, this was a great discussion.

____________________

Thank you for listening to the HPS Podcast. If you're interested in the detail of today's conversation, you can access the transcript soon on our website at hpsunimelb.org. 

Stay connected with us on social media, including Bluesky, for updates, extras, and further discussion. And finally, this podcast would not be possible without the support of the School of Historical and Philosophical Studies at the University of Melbourne, and the Hansen Little Public Humanities Grant Scheme.

We look forward to welcoming you back next time. 

 

Transcribed 

Contributor: Christine Polowyj

Christine Polowyj is an undergraduate majoring in History and Philosophy of Science at the University of Melbourne. She is interested in examining how knowledge is undermined or obscured when appeals to logic are made in particular social and historical contexts.

