Connecting Society: How everyday data can shape our lives

Bonus episode: All your data questions, unwrapped

ADR UK (Administrative Data Research UK) Season 2 Episode 1

Time flies - and yes, we have returned with another festive episode. After a hiatus, hosts Shayda and Mark return for a one-off Christmas bonus ahead of Connecting Society series two launching in 2026.

This episode flips the format and hands things over to you. Shayda and Mark answer listener questions and some of the most common queries that come up during ADR UK’s public engagement work, from favourite administrative datasets to data security, skills, and how the public can get involved in research.

A festive, long-overdue catch up, a behind-the-scenes look at administrative data, and a warm lead-in to series two.

We'd love to hear from you! If you'd like your question answered in a future episode, send us an email or a voice note with your question to hub@adruk.org. 

If you're a member of the public looking to get involved in research, take a look at the ADR UK Working with the public page, or check out these opportunities with our friends: Join HDR UK Voices - HDR UK and Our Data, Our Say - DARE UK.

Connecting Society is presented by Shayda Kashef and Mark Green. Our producers are Eleanor Collard, Holly Greenland, Laura Mulvey and Shayda Kashef.

This podcast is brought to you by ADR UK (Administrative Data Research UK), a partnership transforming public sector data into research insights and policy evidence to improve lives. We are an investment by the Economic and Social Research Council, part of UK Research and Innovation.

Shayda (00:13)
Hello and welcome to Connecting Society, a podcast about how everyday data can shape our lives. I'm Shayda Kashef, Senior Public Engagement Manager for ADR UK, or Administrative Data Research UK.

Mark (00:28)
And I'm Mark Green, Professor of Health Geography at the University of Liverpool. We are your co-hosts and guides around the wonders of administrative data.

Shayda (00:38)
In this podcast, we are exploring all the different ways in which the information that is collected about our everyday lives, from our interactions with health services to our voting behaviours, police and crime reporting, educational achievements, and more, is used by researchers and policymakers to make better decisions to support society and make the world a better place.

Mark (01:01)
We're back, we're back, get in, get in. We're back for a quick festive episode, a bit like Christmas pudding. We're coming for a one-off for this year, just to tell you a bit more about administrative data, as if you couldn't get enough of it. Shayda, how have you been?

Shayda (01:17)
I've been good. I'm so excited to be back. Christmas has come early for our listeners. But yeah, I've been good. I'm really excited that we're gearing up for episode two. And I'm excited for Christmas too, I'll be honest. It'll be nice to have a break.

So, besides chit-chatting about Christmas, this episode we thought that we might take some questions from you. After we published our first season, Mark, you said you got a few questions from our listeners, right?

Mark (01:47)
Yeah, and it's great. People message me saying nice things as well, asking us questions. And we thought it might be quite good to bring some of those onto the podcast and respond to them that way. And also, I know Shayda and the ADR UK team have been doing some great public consultation work talking to members of the public, and some of the same questions crop up. We felt a frequently asked questions episode was timely.

Shayda (02:15)
Yeah, definitely. And if a question that you've got doesn't get covered in this episode, then you can email us. If you go on the ADR UK website, there's a tab at the top on how to get involved. And when you click on it, there's another page on working with the public where you can email us your question. So Mark, I'm going to hand it over to you.

Throw us a question.

Mark (02:43)
Sure. So Adrian — thanks, Adrian — they got in contact and they asked us, what's your favourite AdminMazing dataset? I like that, there's a little bit of a callback to episode one there. So Shayda, what's your favourite?

Shayda (02:56)
Nice, he's like a true fan.

Okay, so I love all my children equally, but I definitely have a favourite, and it's probably the Ministry of Justice linked to Department for Education dataset. I'm just so fascinated by this dataset. So it covers education records.

It's de-identified, of course, and it's linked to police and crime records as well. There's just been so much interesting research that has come out of this looking at the experiences of children in care. And they are one of the most vulnerable groups in society, so learning more about their experiences — and sort of interventions that can help support them in living lives where they can flourish — has been really interesting and inspiring.

Mark, over to you. What's your favourite AdminMazing dataset?

Mark (04:03)
Well, I obviously, as a very boring person, was going to go with mortality records, because they're not the most cheerful of things, but they have a kind of special place in my heart because that's where I kind of cut my teeth as a researcher — where I first got my hands on admin data and kind of developed a lot of my skills.

I also think the cause of death data that we get, the death registration data that we use through the Office for National Statistics, tells us so much about society. Cause of death tells us something about an individual in terms of their life, the opportunities and maybe some of their experiences through their lives. Many different types of causes of death are socially and geographically patterned. So again, this tells us something about why certain communities end up dying of different things at different ages than others.

So whilst it's a bit depressing and a little bit morbid — and I definitely get told off at home for bringing up morbid conversations about mortality data — it's by far the thing that kind of excites me out of all of those things. Apologies to other datasets that I do also use.

Shayda (05:17)
All right. Well, I'll throw a question at you now. So what's the most surprising thing you've learnt from or about administrative data?

Mark (05:28)
Okay, okay, right, yep, okay. So not mortality-related, but it will be health-related, because that's what I spend a lot of my time doing. So a couple of years ago, I was involved in a project that was using electronic health records. So specifically, this was using data from hospital admissions. So when people go to hospital, that is recorded on a computer system, and those data are de-identified, so you don't have anything personal about the individual. You get the reasons behind why they're hospitalised and maybe a little bit of information about age group or sex.

And we were going through this kind of phase of cleaning the data. Part of that included cleaning some of the reasons behind why people were being hospitalised, and this was covering the whole of Cheshire and Merseyside, where I'm based.

And it's really interesting because, for me, when you start to look at the reasons some people get hospitalised, and look at some of the more obscure reasons why people get hospitalised, you start to pick up some interesting things. So in Cheshire and Merseyside as a whole, more people end up being admitted to hospital for skiing accidents or having been bitten by a spider than have been shot or stabbed.

And I didn't realise this was going on. There were so many people hurting themselves skiing, and it persists through the year as well. We still get people attending A&E or being in hospital for skiing accidents in the summer. And I just think, I don't know what they're doing, whether they're skiing down hills or something. I really don't know what people are doing. But I just thought it was all surprising, and it made me think that this is the sort of pop science book I should be writing: the weird and wonderful ways in which people hurt themselves.

Shayda (07:29)
You mentioned something around cleaning the data. Can you explain to us what that means?

Mark (07:33)
So cleaning the datasets is kind of an informal term for basically when you get hold of an administrative dataset, it can be quite messy, unstructured, and not ready for analysis. Cleaning, in this sense, means taking these data in their raw format and improving them, refining them, and essentially getting them into the format that's ready for what you need for your analysis.

Shayda (08:00)
Okay, so what I think I'm understanding is that when the data is originally received, you can't just analyse it straight away. It's not tidy. I've heard that there can be duplications. So you're essentially tidying: you have to do some housekeeping before you can actually start analysing the data. Is that right?

Mark (08:31)
Yeah, yeah, yeah. And it can be as simple as, you know, I'm really interested in the health of people aged 16 to 24, so I just need to extract those records from the data. Or it could be as complicated as trying to count the number of times a person has visited their GP over a time period, for example.
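
As a rough illustration of the cleaning steps discussed here (removing duplicated records, keeping only people aged 16 to 24, and counting GP visits per person), the following is a minimal sketch in Python. The records, field names (`person_id`, `age`, `visit_date`) and helper functions are all made up for illustration and are not taken from any real ADR UK dataset or workflow; real administrative data would only be handled inside a secure research environment.

```python
from collections import Counter

# Hypothetical, made-up records: one row per GP visit, with no personal details.
visits = [
    {"person_id": "p1", "age": 17, "visit_date": "2024-01-05"},
    {"person_id": "p1", "age": 17, "visit_date": "2024-01-05"},  # exact duplicate row
    {"person_id": "p1", "age": 17, "visit_date": "2024-03-12"},
    {"person_id": "p2", "age": 23, "visit_date": "2024-02-20"},
    {"person_id": "p3", "age": 41, "visit_date": "2024-02-21"},
]


def drop_duplicates(rows):
    """Keep the first copy of each identical row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["person_id"], row["age"], row["visit_date"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def keep_age_range(rows, low=16, high=24):
    """Extract only the records for people aged low to high, inclusive."""
    return [row for row in rows if low <= row["age"] <= high]


def count_visits(rows):
    """Count how many visit records each person has."""
    return Counter(row["person_id"] for row in rows)


cleaned = keep_age_range(drop_duplicates(visits))
visit_counts = count_visits(cleaned)
print(dict(visit_counts))
```

In this toy example, the duplicated row for `p1` is dropped and `p3` falls outside the age range, so only `p1` and `p2` are counted. Real cleaning is usually messier, since duplicates are often near-matches rather than identical rows.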

Shayda (08:58)
Wow, okay. A lot of work.

But kind of on that, actually, we have another question. So say if you're a beginner and you want to start using admin data, how do you even start using it? Do you need to be good at coding? Do you need to know Excel? Do you have to be good at maths? Give us the details.

Mark (09:20)
Well, I think it depends on what you want to achieve. So you don't need to be a coding wizard, for sure. It certainly helps, though, and particularly if you want to do more advanced things, you're going to have to know how to handle computer code and do some statistical analyses. But sometimes these datasets come in an Excel spreadsheet format. It can just be a case of summarising and producing graphs, or creating average statistics, looking at means or percentages of people within certain categories.

So there are still things you can do even if you're feeling less confident with your coding or maths background. And I think it's the sort of thing that once you've had a bit of experience playing around with it, it's kind of exciting — you can see more opportunities there.

The other side of this is if you don't feel confident, it's time to think about who you can work with and really building collaborations with people who maybe have more experience, and thinking about team science. Not just focusing too much on yourself, because you'll have your own skills that you bring to a project.

Shayda (10:43)
Yeah, I like that. I like the idea of bringing together people with slightly different skills and strengths and doing a project together. I think that is a really great way to approach research.

All right, I'm going to hand it over to you to ask a question now.

Mark (11:05)
Cool, thanks. We had an email from Cath, who put down that they're aged 38 and three quarters. We don't need to know your age if you do email in — thanks, Cath. They wrote in to ask us why data on young people are always presented as the age group 16 to 24. They also noted that their background is within education and that they're really interested in people's transitions out of education. Why do we always present the data within this age group? I mean, Shayda, do you have an idea?

Shayda (11:42)
I don't know. And actually, it just makes me think of when I turned a certain age and moved into a different bracket when I was scrolling for my year of birth — like, oh god. But no, I don't know the answer to that. Can you enlighten us, Mark?

Mark (12:02)
I mean, you're not that far off with your response. So traditionally — and at least historically — people would leave compulsory education at the age of 16 in the UK, as they might in other countries as well. So 16 to 24 represents this key transition point in becoming an adult.

And demographers — the cheerful bunch they are — look at the clustering of behaviour: moving into and out of education, first jobs, leaving the family home, similar mobility patterns, migration, moving around. They end up bunching people together because they see them as quite similar.

But we do need to think carefully about this, because a 16-year-old is very different to a 24-year-old. One might be doing A-levels and staying in full-time education; the other is probably entering the workplace and may have been working for a few years already.

Demographers like to categorise people using five-year or ten-year age bands — these arbitrary things — but the purpose is to reflect something relatable. So that would be like transition to adulthood, 16 to 24, something we all go through, something that's kind of relatable. 10-year age bands, quite relatable because we often talk about we're in our 30s, in our 20s, that's how we kind of relate to our life and our transitions through our life.

Shayda (13:46)
Thanks, that's very interesting. It's good to know. Makes sense.

Mark (13:50)
Okay, moving on. Another one we got in: are researchers learning personal information about me specifically? Shayda, take it away.

Shayda (14:03)
Well, rest assured that no one is spying on you — definitely not with the data we're talking about for research and statistics purposes. First of all, it's illegal. It's illegal to look yourself up. Right, Mark — you're nodding.

So for instance, I was talking about the education data linked to crime records. Mark was educated in the UK; I was educated in the UK for a period of time, so both of us probably fall into education records. But Mark’s the data scientist here, not me. But if Mark tried to find themself, first of all, it would be like finding a needle in a haystack, and secondly, it would be illegal.

The second thing is that the data is de-identified, which means personal information that could connect you back to an individual is removed before any researcher sees the data. And when you link two datasets together — for instance, education and crime — before anything can be released into the real world, whether that's a paper on your analysis, a PowerPoint slide, or an infographic, it has to be checked for re-identification. I believe it goes through people and machines to make sure there is no re-identification. Mark, jump in if I've left anything out.

Mark (15:58)
No, that's spot on. A lot of the data I and other researchers have access to make it very hard to identify people, because you have such limited information. Anything personal or sensitive is scrubbed out. And like Shayda said, they're held in these secure places, which adds an extra layer: it's really hard to get anything out unless it meets certain standards. There are a lot of things put in place to really prevent the misuse of these data.

Shayda (16:30)
So, you mentioned something that's really important, Mark, that the data is also analysed in these safe settings or through a secure connection. So earlier, when you mentioned that the data might be in an Excel spreadsheet, you know, I think the mind automatically goes to like the Excel that you just have on your desktop, that kind of thing.

But what you're referencing there is that the format is an Excel, but it's only accessed under these very specific secure conditions. It's not like you can open up your Excel file on your desktop, any old desktop that's connected to the internet, for instance, that's not permitted, right?

Mark (17:13)
Yeah, yeah, yeah, yeah. Absolutely.

Shayda (17:17)
So, what happens if there's a data breach?

Mark (17:21)
Well, I guess this is where there are laws in place for dealing with these things. So, if a data breach is detected, then there is a legal requirement for that to be reported. There'll be an investigation and, depending on that, legal action might be brought. So there are very stringent rules in place, rightly so, that if researchers misbehave or do things inappropriately, they are essentially punished. And this could range from banning a university from accessing these data to prison time and a fine.

Okay, moving away from security, we've got one more question. So how can I get involved or have a say in how my data is used? So Shayda, I mean, there's so much that goes on at ADR UK.

Shayda (18:09)
Yeah, so that's a great question. There's loads of ways to get involved. There's something called public panels, which are groups of general members of the public, or public contributors, who are members of the public that regularly engage in research. They meet regularly to advise on research. Typically, you need to wait for the panel to undergo recruitment. But a more common way of being involved is through that public contributor route.

Everyone has lived experience. We've all been children, we've all experienced school, maybe we have a particular type of life experience where we live in a particular type of area, or maybe we have a particular type of health condition, or we are of a certain gender or identity of some kind. And that gives us a really unique perspective into what it's like living life from that lens. And so that information is really valuable because when researchers are analysing data that's related to that certain type of group, it's really important that they capture those experiences sensitively and accurately.

And sometimes the data can't tell us everything. Sometimes the data will be missing information about a certain type of experience. And also, sometimes the way that we describe people and groups and circumstances might not be the best possible or culturally sensitive way.

So, for all of those reasons, engaging with research via your lived experience can be extremely valuable. So if you want to be involved in how our data can be used for research, you can sign up and be a public contributor. And there's loads of ways to do that. Information about that will be available in the show notes, including some existing schemes from our friends over at Health Data Research UK or DARE UK, which is another programme, and even ADR UK. So yeah, please check out the show notes if you want to get involved.

Mark (20:34)
Yeah, I mean, that's great. And I think also, we had a wonderful episode in season one that kind of covered the benefits of that as well. Worth checking out that episode if you missed it or just for a refresher.

Shayda (20:47)
That's true. Actually, yeah, we had an episode towards the end of the season on care experienced young people getting involved in research. And I think it was one of the earlier episodes we had with a member of a public panel as well. So it was really interesting learning about research from her perspective too. Yeah, check them out. And if you're convinced, drop us a line.

We're getting the signal to wrap it up. So that's it for today's episode. Thanks everyone for tuning in. It's been really nice to be on the mic with you again, Mark. I'm really looking forward to season two.

Mark (21:29)
Yeah, I can't wait for more interesting conversations about administrative data.

Shayda (21:37)
And beyond. Well, thank you for listening everybody and we wish you a joyful winter break.

Mark (21:43)
Until next time, stay curious about how your everyday data might shape society.