The science intersection

From Good Ideas to Better Outcomes: Testing Social Policy in the Real World

Rachel Melinek Season 6 Episode 9

In Part 2 of my conversation with Professor Michael Sanders and Julia Ellingwood, we look at what evidence can show us that good intentions, professional experience and common sense might miss.

We discuss why it is not enough to ask whether a policy or intervention works; we also need to know how much it works, for whom, and whether it offers value for money. Michael and Julia talk through examples including cash transfers for care leavers, education interventions, family group conferences, behavioural science and “nudge” approaches.

We also explore why measuring outcomes in social policy can be difficult, especially when those outcomes include wellbeing, safety, educational progress or housing stability. Julia explains why survey design, consent, power dynamics and validated measures all matter when researchers are working with people whose lives may already be under pressure.

Later in the episode, we ask how research can move beyond simply describing social problems. Why do some issues get studied again and again without changing? What makes evidence more likely to influence policy? And why do researchers need to stay involved after the paper is published?

We end by talking about what gives Michael and Julia hope: better practical guidance, more accessible tools for policymakers and researchers, and the people working to make evidence useful in the real world.

SPEAKER_02

Welcome to the Science Intersection. This is the second part of my conversation with Professor Michael Sanders and Julia Ellingwood about evidence-based policy and how we work out what actually improves people's lives. In part one, we talked about evidence, ideology, and citizens' assemblies. In this episode, we move into the practical side: what trials can show us that good intentions or common sense might miss, why it matters not just whether something works but how much it works, and how researchers measure difficult outcomes like well-being, safety, education, or housing stability. We also talk about behavioral science, ethical research with people who may already be under pressure, and why describing social problems is not enough if we do not also test what might help. We end by asking what gives Michael and Julia hope that evidence can still make a difference in the real world. So that point about citizens' assemblies being informed by evidence raises a wider question about the evidence itself. In real-world policy settings, what can trials and causal methods show us that good intentions, professional experience, or common sense might miss?

SPEAKER_03

I can kick it off. I feel like, Michael, you'll have a lot to say about this. So I think what trials give us the opportunity to do is to test a lot of our assumptions. So I might have a really strong idea of what works, particularly in classroom instruction, right? Because that's the area I'm from. But trials give us space to be surprised at the actual measured effect of something. You know, a trial gives us an effect size, and it gives us some sense of statistical certainty or uncertainty with respect to that. So it's not so much that our intuition is wrong, it's just that a randomized controlled trial can give us a much clearer, objective sense of the value of something, how much it moves the needle, and how certain we are in that direction. So if you have really strong professional experience in an area, we want to talk to you, and we want to develop interventions that are based on your own professional experience and intuition. But importantly, we need to go into the trial with the attitude that we might find a surprising result or no result. Actually, one of the more common outcomes of trials is a null result.
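Editor's illustration: the idea of an effect size paired with a measure of statistical uncertainty can be sketched in a few lines. This is a minimal sketch, not the speakers' method. It assumes a simple two-arm trial, uses a normal approximation for the 95% interval (a real analysis of samples this small would use a t-distribution and a pre-registered model), and the scores and the function name `difference_in_means` are invented for illustration.

```python
import math
from statistics import mean, stdev


def difference_in_means(treatment, control, z=1.96):
    """Return (effect, lower, upper): the raw difference in group means and
    an approximate 95% confidence interval (normal approximation)."""
    effect = mean(treatment) - mean(control)
    # Standard error of the difference between two independent group means.
    se = math.sqrt(
        stdev(treatment) ** 2 / len(treatment)
        + stdev(control) ** 2 / len(control)
    )
    return effect, effect - z * se, effect + z * se


# Hypothetical test scores, made up purely for illustration.
treatment_scores = [52, 61, 58, 49, 66, 57, 60, 54]
control_scores = [50, 59, 55, 51, 63, 56, 58, 53]

effect, lower, upper = difference_in_means(treatment_scores, control_scores)
print(f"Estimated effect: {effect:.2f} (95% CI {lower:.2f} to {upper:.2f})")
if lower <= 0 <= upper:
    # An interval that spans zero is the "null result" case described above.
    print("The interval includes zero, so this is consistent with a null result.")
```

With these made-up numbers the treatment group scores slightly higher on average, but the interval spans zero, which is the kind of null result Julia describes as one of the more common trial outcomes.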

SPEAKER_00

Yeah. So I think what I would add to that is that different people can have equally good intentions but come to completely different conclusions, right? An example of that might be the work that we've done around giving care leavers unconditional cash transfers, right? I, who was in favor of this as an idea, think that giving young people money with no strings attached is an efficient way of spending public money because it doesn't involve much administration. And I believe that it gives them agency and the ability to choose for themselves. And because I'm, like, slightly in favor of the existence of markets, I think they'll spend the money better on themselves than government will. People to my political left might say, well, you know, government knows best, we should get government to administer this money so that it's not badly used, etc. People to my right might say, oh, you can't give people money because they'll spend it all on drugs and they'll all die. I believe that you can, in good faith and with good intentions, hold all three of those positions, although not necessarily simultaneously. And so good intentions do not give us a way of deciding between those conclusions. And everybody involved in that discussion believes that they are exercising common sense. Right? So again, common sense: my mum would say there's nothing so uncommon as common sense, but I think everybody thinks they have common sense, and that doesn't actually help us decide between these things. That's the first thing. The second thing is there have been lots of well-intentioned interventions, and what I would draw attention to is Achievement for All, which is a program which was developed after decades of research by an Oxford professor that aimed to improve grades for young people in schools in England. It was very, very popular.
So schools paid a lot of money, millions of pounds, to have this professor and her colleagues come into their schools and improve them, right? Everybody here was acting in good faith. Everybody here had good intentions. What it turns out happens is it makes things worse, right? So it actively reduces grades. So good intentions, common sense and lots of research yield something that's not just ineffective but actually harmful. We wouldn't have known that without a trial. The final example, which I think goes back to a lot of what Julia was saying, was around family group conferences. So this is a way of working with families to try and keep children out of care. And a lot of academics believed that it worked, and they were correct that it worked. And by "it works", we mean it reduces entry to care. But it's not enough to say it reduces entry to care, because there's a big difference between a 1% reduction in entry to care and a 20% reduction in entry to care. And that difference is value for money, right? We need to know not just the direction that something moves an outcome in, but how much it moves it in that direction, so we can work out whether or not we should be spending tax on it. And that precision, which comes out of the statistical rigor that Julia was talking about, is absolutely crucial.
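Editor's illustration: the value-for-money point can be made concrete with some back-of-the-envelope arithmetic. This is a rough sketch under stated assumptions, not a figure from the research discussed. The cost per family and the baseline rate of entry to care are hypothetical, the 1% and 20% figures are treated as relative reductions (the conversation does not say whether they are relative or percentage-point changes), and `cost_per_entry_avoided` is an invented helper name.

```python
def cost_per_entry_avoided(cost_per_family, baseline_rate, relative_reduction):
    """Cost of avoiding one entry to care, given the intervention's cost per
    family, the baseline rate of entry, and the relative reduction in it."""
    entries_avoided_per_family = baseline_rate * relative_reduction
    return cost_per_family / entries_avoided_per_family


COST_PER_FAMILY = 2_000  # hypothetical cost of the intervention, in pounds
BASELINE_RATE = 0.10     # hypothetical: 1 in 10 families sees a child enter care

for reduction in (0.01, 0.20):
    cost = cost_per_entry_avoided(COST_PER_FAMILY, BASELINE_RATE, reduction)
    print(f"{reduction:.0%} reduction: about £{cost:,.0f} per entry to care avoided")
```

Whatever the real inputs are, the same intervention cost buys twenty times as many avoided entries at a 20% reduction as at a 1% reduction, which is why the direction of an effect alone cannot settle whether something is worth funding.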

SPEAKER_02

So that's interesting, because you're saying we need to know not just whether something works, but how much it moves the needle on the outcome we care about. In social policy, those outcomes might be well-being, educational progress, housing stability, safety or opportunity. What makes those kinds of outcomes difficult to measure?

SPEAKER_03

Which one? I think it's all simple. Oh gosh, it's all terrible. It's all good. No, data collection's hard, especially if you're trying to measure, well, okay. I guess I'm of two minds about it. One, it's really difficult to follow up with people. It's just hard to, you know, have them make the time to answer a bunch of survey questions. Survey fatigue is a real thing, and so you can't measure everything under the sun, because people have limited time and attention spans. Also, if you're trying to gather data from children, it makes it 10 times harder, right? Because how do you manage getting informed consent from a child? How do you ensure that the survey or whatever data collection method is going to be accessible to them? How do you ensure that they're not going to be pressured into answering in a certain way if they're with their teacher, their social worker, their parent? There are a lot of power dynamics that you need to navigate with that. And then there's the challenge of actually trying to measure stuff that is not measurable in a sort of Euclidean distance kind of way. We can measure how tall people are, but we can't really measure their own personal feelings of well-being, right? We can't put a ruler up to that and say, you have a well-being of seven out of ten, right? We have to rely on a bunch of self-reported measures, and we call these latent variables. And we do this all the time, right? We ask people about their political orientation: on a left-to-right scale, where do you orient yourself? We have all these kinds of tools to measure this, and we have different ways of talking about survey validity and measurement and things, but we are still kind of making it up to some extent. And you can have a very long conversation and a lot of interrogation over the measures that you choose when you're constructing a survey and how valid those measures are.
So there are loads of challenges, and there's lots of nerding out that you can engage in with any kind of data collection in the social sciences, but there are strategies that we have in general. I would recommend using validated survey measures whenever you can. I mentioned earlier looking at the Civil Service People Survey, and they use a lot of questions that are already well benchmarked across a bunch of different things. So, for example, the four well-being questions that are taken from the ONS allow comparability between the Civil Service People Survey and other surveys that use a similar measure. So we have strategies to try and get around this, but they are all imperfect and they all have trade-offs.

SPEAKER_01

The point about measurements being imperfect but still useful makes you think about behavioral science more broadly. Once you understand how people are actually behaving and where the barriers and decision points are, what role can behavioral science play in policy?

SPEAKER_00

Yeah, so I guess I would say the role of behavioral science is as a force multiplier, right? So the three traditional tools of government policy making, which we think of as being taxation, regulation, and information, get you most of the way towards most of your policy outcomes. And then behavioral science can help you get closer, right? So an example of that at a very straightforward level is around tax collection, where the UK collects almost all of its tax almost automatically, and then some people are late in paying their taxes. And so if you use behavioral science, you develop a nudge, and it turns out that if you say nine out of ten people have already paid their tax in your area, you're in the small minority of people who haven't paid, that increases the rate at which people pay on time by about five percentage points, right? So the behavioral science is helping us get that last inch or so. The same is true in more complicated policy areas, right? So we care a lot about young people going into higher education, particularly people from disadvantaged backgrounds. And almost all of the work there is done by schools, right? So schools spend 14 years teaching people, getting their grades up to where they need to be. And then if you look, it's like one of those games in an arcade where you put a coin in and the coins get sort of stacked up at the edge. And what you see is there are thousands and thousands of young people, about 30,000 of them in each cohort, who come from lower income backgrounds and get the grades to go to a really good university, but will go to a less selective university, in part because of some sort of measure of social distance, right? They think places like, for example, Oxford and Cambridge, indeed places like King's, aren't for me.
And so behavioral science can look at those places where you've got a bunch of people stacked up, like those coins in that arcade machine, really close to the edge, and it can help to tip them over the edge. And that, I think, is a really valuable thing that behavioral science can do to improve outcomes, whilst also remaining humble enough to say, actually, we've done the last little bit, and teachers, parents, social workers, everybody else in the child's life, and indeed the child themselves, put in a great deal of effort. They've gotten them up to that margin, and we're just helping them that last little bit across the road. Dilip Soman calls these last mile problems, but I think in most cases they're last inch problems rather than last mile problems.

SPEAKER_02

That makes you wonder how do you decide what success looks like in a policy trial? Once you're testing an intervention, how do you choose the outcome you're judging it against?

SPEAKER_00

So the cop-out answer is that that is a job for our political masters, right? So ministers were elected by the people to govern the country, and they get to decide what the thing we're trying to aim at is. In practice, there's pretty broad consensus about what we're trying to do. Like, nobody is in favor of fewer kids getting in school. You know, there are some people who argue that grades don't matter at all, but even they don't think the kids should actively be getting worse grades. So in general, there is consensus there. For a specific trial, we will develop a theory of change, which Julia is more qualified to talk about than I am, which just says, okay, for this intervention, this thing that we are about to do, what is the outcome that we are aiming at? And therefore we can assess it against that. So that's how we try and avoid ending up judging fish by their ability to ride bicycles. But Julia can give more colour on that.

SPEAKER_03

I mean, yeah, theories of change are useful for that, for kind of understanding the causal chain between the policy intervention and the outcome that you're looking to achieve. It can also be useful to lay out all those assumptions initially, because it's kind of part of the whole pre-registration process of setting up a trial. It's committing across your team, and to the scientific community and to the public, that these are the things that we're going to test, these are the methods we're going to use, this is the theory of why we think it will work, and then, as best you can, sticking to that protocol or that pre-registration as you move through your trial. And this is important to do because otherwise, you know, at the end, you can just make adjustments as you see fit to try and get the outcome that you're looking for, to get the statistically significant effect that you're hoping for, that you can report on, that journals want to see. This step helps researchers stay honest to what they're setting out to test. And it also just makes the findings that much more objective and applicable and trustworthy, you know, that we're not manipulating things, we're not inserting a ton of variables into the model to get an effect that we want. So I guess success is not so much that, hey, at the end of the day, you find a statistically significant effect and it's in the direction you expect. I guess it's more just, did you stick with what you said you were going to do? Or, I mean, did you roll it out to the best of your ability? And if you did that, then I would consider that a success, right? Because you're adding to the evidence base; your findings are valid, I guess.

SPEAKER_02

A lot of work involves people who may already be under pressure, young people, students, maybe people experiencing homelessness or people leaving care. Just how do you test interventions ethically on people?

SPEAKER_00

Carefully is the short answer. We have robust research ethics processes, and also this is something which, separate to that, we think deeply about. I think it's a bit of a red herring to say here is a group that is incredibly vulnerable, therefore we must be especially concerned about involving them in research. And the reason I think it's a red herring is that those people who are the most vulnerable are the people for whom it is most important for us to understand whether or not policy is serving them, right? So Caroline Criado Perez has the book Invisible Women, which looks at the extent to which women are missing from our randomized controlled trials and from our data and are under-examined. And that means you have a wide collection of interventions that we know work for homeostatic young men aged between 25 and 40, but we have no idea how they intersect with women's biology, which is different, or how they intersect, quite importantly, with being pregnant, which is, you know, a state of being that roughly half the population is going to go through at some point in time, and they may also be sick while doing that. I think that's a really important book. What I think is also really important to bear in mind is the books you couldn't write. So, for example, I couldn't write a book about whether or not interventions that are effective in education overall have a greater or a lesser effect, or indeed are harmful, for people who, for example, are on the autistic spectrum, right? I can't write that book because we don't even have the data to tell us how much information we're missing. Similarly, for racialized minorities, we just don't have the data recorded in most of our research to tell whether or not the interventions we're doing in social policy, in medicine, are having beneficial effects or detrimental effects. Similarly, young people in care, and so on and so on.
The people who we know systematically experience disadvantage, and the people who we know society is systematically not built to accommodate, are the people for whom it is most important to understand whether or not our interventions are working, because everything about the world is built in a way that doesn't help those people. And so it would be really weird if our interventions were not at the very least less effective for them than they are for, you know, the majoritarian group, which is not even a majoritarian group, right? Straight white guys between age 25 and 40 are not the majority of people, they're just the quote-unquote default for people, right? So it's extra important to include these groups in our research in order to actually do our jobs and serve the public writ large. Now, you also have the fact that these people, you know, have been experimented on medically in the past; as communities they are excluded; they have good reasons not to trust power, not to trust the establishment, not to trust government. And so we have an extra piece of work to do to regain and re-earn that trust. That is work that must be done and must be taken incredibly seriously. But the reason we have to do that is so that we can include these people in the RCTs, and so we don't end up with another year, another decade, another century of public policy harming those whom it most needs to help.

SPEAKER_03

If I could maybe offer three additional practical suggestions to answer that question. Firstly, make sure that you're involving the people that you are doing trials with as part of your trial design. So assemble a group of people who have lived experience in the area that you're interested in, and put together a community stakeholders group to help advise on the direction and the shape of the research. That can be very important, and can also help potentially reduce the chance of harm for people who are involved in your trial if they're vulnerable. So that's the first suggestion. The second suggestion is, for those who do participate, make sure you compensate them for their time. So in the form of vouchers or, you know, something that's going to be incentivizing for them to take part, and thank them for taking part in your research. And then thirdly, once you're done with the trial and you have these great results, don't just use that to burnish your career, right? It's not about you, the researcher, it's about making sure that if you find an important effect, you're socializing that, and that you're raising it with policymakers, and that you're making sure that that work is actually moving forward to improve lives for the people that you're looking to serve. And all three of those things require time and they require money. And so you're gonna have to make sure that that's properly budgeted when you're putting together your research plan. Like, don't skimp on those things. Those are very important pieces to make sure to plan for.

SPEAKER_02

There are certain groups of people where there has been a lot of research into the issues they face, but the research gets done every so often and nothing seems to shift. For example, I don't know, visually impaired people in employment, autistic people going into employment, the number of people who've been in care going into the prison system, certain minority groups having more experience with the justice system in general: these don't seem to have shifted. They've been researched, and the numbers don't seem to have changed all that much.

SPEAKER_00

Yeah, so the type of research which you're describing is inherently descriptive, right? It's a description of a problem.

SPEAKER_01

Yeah.

SPEAKER_00

And that is a really useful thing to have, but it doesn't actually tell us what we should do about it. And that's where the role of trials, and also the quasi-experimental approaches we talked about a little bit, comes in. And that's partly why we don't see any movement, right? It's because we don't have any evidence about what we should do about it. We rely on good intentions. What I can tell you thus far is that good intentions have not gotten us to a perfect world where everything's lovely, and so we need something else. And that's why I think it's particularly important to have experimental research targeted at these groups of people, to try and build an evidence base about what actually works. We do make progress. So we ran the first ever randomized controlled trial in children's services, starting in 2014 and running through 2016. Before that, we had had no research of that kind in that population at all in the UK. Since then, we've had more than 80 trials launched, nowhere near as many as we'd have in a medical field. And in fact, the number of randomized controlled trials conducted over the last 15 years to make Viagra-like products slightly better is way more than all of the research we've conducted to support young people in contact with children's social care. So I would say we have our priorities slightly askew, but we have seen many, many more of those studies. Those studies have started to have a real impact on policy, in the form of the government introducing legislation this year which will mandate family group conferences, an intervention that's been proved to be effective via RCT, for all young people at the edge of care. The government announced last week 8.4 million pounds to scale up an intervention called Lifelong Links that our research showed reduces homelessness among care leavers.
So we are starting to see the effect of this sustained investment and effort in randomized controlled trials and quasi-experiments in this space. This is slow going, and it requires researchers to be serious, not just about describing a problem, but about being a part of the solution. And, as Julia says, investing in doing something about it after you've published your paper, right? It's not just a case of publishing the paper and then you're done. Hooray, you get promoted. It's about sustained engagement with policymakers, with the people you're trying to help, with the local authorities, to make things actually happen. But that's gonna be a long old slog, and I believe we can get to a better place. It takes time and effort.

SPEAKER_02

And what do you think is behind the issues with policy implementation? Is it partly to do with the way the evidence is translated, how much notice people in the political system take of policies, ideology, funding? What do you think influences that?

SPEAKER_00

So certainly all of the above, but I think it's really important that researchers don't let themselves off the hook, right? It's very easy for me to say, well, you know, I published this great paper in the British Journal of Social Work. It's really government's fault that they haven't taken up the recommendations of that paper. You've got to expend serious shoe leather in working with policymakers. You've also got to recognize that, you know, I was in government during the coalition, which as it turns out looks like the high-water mark for political stability in this country, right? It's not just a case of saying, okay, well, Josh MacAlister is currently the children's minister at the time of writing, therefore I'm gonna write to Josh. You've got to be working with the civil servants around him, you've got to be working with everybody else, and you've got to accept the fact that your ministers are gonna change and you go back to square one, but you've got to keep on doing it, right? We choose to go to the moon and do the other things not because they are easy, but because they are hard. Like, we have to keep putting our shoulder to the wheel and not get dispirited by it, which is really difficult. But all of the other things you're talking about also make a difference. But academics disappearing into the landscape as soon as they get their paper published is a huge part of the problem, which will get me in trouble with a bunch of academics, but, you know, they didn't like me anyway.

SPEAKER_02

If you could improve the way that governments or public services use evidence, how would you do that?

SPEAKER_00

I think resilience is probably the main thing. So we have a lot of initiatives that get set up to do evidence-based policy, and everybody's very excited about them at the beginning, right? So we had a big, big investment, half a billion pounds, in improving adult numeracy, the Multiply programme, the results of which have just been published. And at the beginning, the plan was, by the end of these four years, we're gonna know what works for adult numeracy. And what's actually happened is we made an incremental step forward, right? And it's a good step forward. We've learned a lot of stuff that we didn't know before, but people expected us to have solved the problem by now, and so they sort of drift away and it loses attention because we haven't solved it. So people need to be more resilient. Policymakers, academics, researchers of all flavors need to be more resilient to the fact that this is a long-term problem, and they also need to engage in more long-term investment. So the setting up of the Education Endowment Foundation, which Michael Gove did in 2012, funded them for 12 years with a very substantial endowment, which allowed them to make decisions about what the best thing to do was in order to build the evidence base. Funding similar organizations on a one-year or a two-year basis does not allow them to make that kind of long-term investment. And that leads to a lower quality of evidence generation. So we need sustained attention and sustained investment.

SPEAKER_03

I just think that's so challenging in the fractured media environment that we're in, right? I'm trying to think of some of my heroes, advocates for evidence-based anything, like Carl Sagan and the Cosmos TV series that was on in the 80s. We don't have a media environment that looks like that anymore, where one particularly charismatic but also really well-informed, science-driven person can get in front of a wide audience of people and change hearts and minds, right? Short of that, what are our options? It's a hard question to answer.

SPEAKER_00

Yeah, I completely agree with you. I think there's a whole bunch of policy areas where we don't need to do that. So, you know, to go back to my pet policy area of the year, hedgerow policy, almost nobody cares about it. We don't need the public to be bought in on a new way of managing hedgerows. We just need a change in the law, right? So most policy problems don't get that much attention. And working tirelessly in those to try and make a bit of a difference, I think you can make a difference in those if you're resilient, right? So take the example of reducing homelessness among care leavers: that is never gonna be front page news in any newspaper, but we can make a difference using evidence.

SPEAKER_03

So we just need to get into the hedgerow subreddit.

SPEAKER_00

Yeah.

SPEAKER_03

That's how we can do it.

unknown

Yeah.

SPEAKER_00

That's my entire policy agenda.

SPEAKER_02

And what gives you both hope that things can change?

SPEAKER_03

I think we're doing a fairly good job of documenting our successes. And I think in doing so, you're giving people more options to kind of come to you, and to come to evidence-based policy with something that's going to draw them in. I also think, you know, something that Michael and I have worked on is just creating more practical guidance that's going to be accessible and reachable to policymakers and civil servants who are implementing new programs. So last year we published a book called Designing and Delivering Randomized Trials on Social Policy, a bit of a mouthful. It's an easy-to-access kind of starter kit for anybody who wants to get started in this space, and we've been thinking about ways we socialize that book, how we make it more accessible. We developed a card game off the back of that, which we haven't really fully leveraged up to this point. But we're not the only ones, right? Government has come out with very high quality and thorough guidance on how to generate valid evidence. So I think the more that we make that accessible, and the more that we make it relevant for these different contexts, hopefully people will start to see, okay, I don't have to just leave this to chance. I don't have to go through this without a roadmap. There are resources you can draw on. I do think that, in generative AI times, that makes the challenge a bit more challenging, but yeah, maybe that bubble will pop soon.

SPEAKER_00

So I guess my main source of hope is people. So I've been doing this for 15 years-ish now, and I feel like we were kind of stumbling and mumbling our way through the first few years of that process, and we made a great many mistakes, and we probably weren't the right people to be doing it in lots of ways. I see the generation of people after me, so people like Julia and many of our colleagues, and they're so much smarter and better than we were, and they are doing amazing things, and those things are going to make the world a better place. And so the human element of this, the fact that there are so many wonderful people who are doing really important work day after day, is what gives me a lot of hope.

SPEAKER_02

Thank you again to Professor Michael Sanders and Julia Ellingwood for joining me for this two-part conversation. If you enjoyed the episode, please like, comment, subscribe or follow. It really helps more people find the show. And if you'd like to support the podcast more directly, you can make a small donation through my Ko-fi link in the episode description. I'm trying to make the Science Intersection sustainable, and every donation helps with the time, hosting, editing, and promotion that goes into making these episodes. Thanks so much for listening, and I hope you'll be joining me again next time.