Cream City Calculation
Three friends talking about data and how it impacts our lives and the lives of others.
Who Manipulated Who? How Data Manipulation Can Skew Your Reality.
Sal Fadel, Colleen Hayes, and Frankie Chalupsky discuss how data can be manipulated or skewed to show results in a biased way. They discuss the following:
-Data poisoning and how it can be used for good and evil
-Purposeful data manipulation in recent news
-Accidental data manipulation, how it can be handled, and the impacts it may have on a company or an industry
Thank you to Continuus Technologies for sponsoring us and allowing us to use their space and equipment!
Data Pulse Sources:
Tesla Raising Pay for AI Engineers To Counter Poaching, Musk Says (investopedia.com)
Aldi Is Overhauling Its Checkout Process with the Help of AI (foodandwine.com)
LINGO-2: Driving with Natural Language - Wayve
Colorado Bill Aims to Protect Consumer Brain Data - The New York Times (nytimes.com)
Other Sources:
How Misleading Statistics Works | Whatagraph
Jeffrey Epstein's Island Visitors Exposed by Data Broker (wired.com): https://www.wired.com/story/jeffrey-epstein-island-visitors-data-broker-leak/
How Novartis’ data manipulation case is a cautionary tale for transparency | Pharma Manufacturing
Big Data Algorithms Are Manipulating Us All (wired.com): https://www.wired.com/2016/10/big-data-algorithms-manipulating-us/
Harvard professor who studies dishonesty is accused of tampering with data : NPR
Stanford head to resign after data manipulation probe (bbc.com)
Synthetic Data Is a Dangerous Teacher (wired.com): https://www.wired.com/story/synthetic-data-is-a-dangerous-teacher/
As Generative AI Takes Off, Researchers Warn of Data Poisoning - WSJ
Welcome to the Cream City Calculations podcast. We're three colleagues and friends who love data and love to talk about how data is impacting our lives. I'm Colleen. I'm Frankie. And I'm Sal. Today's episode is "Who's Manipulating Who? How Data Manipulation Can Skew Your Reality."

This week we're starting a new segment called the Data Pulse, your quick-hit source for this month's most impactful data news. There have been a couple of really interesting things in the news lately. Elon Musk is boosting pay for Tesla's AI engineers to counter poaching by OpenAI. There's another one about how your brain waves are for sale, and we're going to talk a little more about that. Yeah, that one really stood out to me. I've got some thoughts. Absolutely. Also, Aldi is getting rid of its scanners, cashiers, and checkout lines with the help of AI. And finally, Wayve's LINGO-2 is using natural language to help automate driving. We'll add links to all of these articles in our show notes if you'd like to read more.

But let's talk about the brain wave article. Yeah, this one is really interesting. The governor of Colorado signed a bill, the first of its kind in the United States, aimed at protecting your brain waves. It's designed to protect the biological and neural data that some of these app companies are collecting while you're using their apps. Yeah, the item in question here is wearable tech: think of a headband that scans your brain waves as you do things like swipe through a dating app, for example. The company is collecting all of this information, trying to get to the root of how you really feel about things without asking you. It's using your brain waves to infer what you prefer. How do you feel about that? I think it's... no, you go first, I can tell you have thoughts. I think it's really crazy, because one of the concerns I had, and obviously I go straight to concerns, maybe that's just the type of person I am, is what if they start selling this information and start tracking political leaders or CEOs or people of high influence? Yeah. They start to see, oh, this person had bad sleep prior to a really important meeting, or they're really stressed about a large decision, so now you can play the market on whether they're going to make a specific type of decision or not. I think selling this data is going to be the biggest driver, and this is information that should not just be shared with everyone. Yeah, and the article cites an example, as with any of this wearable tech, of being able to infer things about your health and your mental health based on the patterns they see in the data.

My initial thought is, why would you do this? It just seems too invasive to me. Overall, I feel like it's crossing a line; I don't want this tech on my body. It's bad enough having to worry about what happens to your data when you self-report something. Although I guess it's maybe not that much more of an extension beyond your Apple Watch and all the heart monitoring it does. But one of the interesting things included in this act is that it gives consumers the right to access, delete, and correct their data. That's really huge when you think about these companies, like Fitbit, which we talked about last time, and Apple Watches, because we all have one.
I mean, they have all this information about me and my body, and I can't even see most of that data. I can't see the data they've collected; I don't have access to it. So if I wanted to do my own statistical analysis, I couldn't. Yeah. And just to be clear: this is only in Colorado. The bill only applies there; federally, this is not covered by any health care law. But yeah, I really would not want my data shared with anybody.

Yeah, I feel like it's a good start, but maybe not enough. My next thought is, why wouldn't this extend to other health information that isn't from brain waves? Why wouldn't data I enter in, say, a period tracker app, or other health-related information, be covered as well? And from there I want to ask: so what? How are they going to enforce this? I actually went and read what the Colorado legislature wrote, and they state that the law empowers the attorney general and district attorneys to access and evaluate a company's data protection assessments, to impose penalties where violations occur, and to prevent future violations. But what does that mean in practice? Do they get a fine? What happens if they don't comply? And what are the thresholds? How do they measure whether you've done enough to keep your data from being hacked? Yeah, absolutely. As we go through some of the stories later in this podcast, you'll notice there are unique ways of hacking data in some of these areas, so I think a lot is going to come out of this.

So with today's topic being data manipulation, we wanted to dive into what that includes. Sometimes it's as simple as a data error: if you're working with data and you make a mistake, that can be considered data manipulation. Other forms of manipulation are things like using a bad sample, a misleading graph, selective data display, or omitting the baseline. One of the main distinctions to think through with data manipulation is whether it was done purposefully or accidentally, right? One thing I see out there is maybe you just have a bad statistician, or it's not even a statistician: it's a journalist putting the information together who has no idea how to actually do statistics. Really understanding what a z-score is, for example, matters, and without it a lot of the statistics and visualizations in these articles can be really misleading. Then you actually look at the sample size behind the data and it's five people, and you're like, that's not a true representation of the population. Yeah. And we always talk about unintentional bias, right? Bad sampling is a really good example of how you could end up with misleading data. If you only target certain people with your survey, you're only going to get results from that portion of the population. So if your intention is to present data on, let's say, America as a whole, but you only survey people who live in the Midwest, your results are going to be skewed toward people who live in the Midwest. Another example would be using an illegitimate model that only explains a small amount of the data.
You could easily just neglect that R-squared value and say, oh yeah, this model is great. Yeah. I think, too, it's really easy to skew data in the visualization layer; you touched on that, Sal. If you don't have a zero axis, and you start your axis at a million and only go to 1.2 million, you can really easily make two numbers look very different when, percentage-wise, they're not that different at all. Yeah. One of the main things that came up in an article we all read was the flatline look of global warming. You can make it look like it's not a problem when it's really going up half a degree, because if your scale runs from negative 10 to 110, that half degree barely moves. So you have to make sure you're applying the right scale to show it. But you can also use scale the other way, when you want a gap to look bigger than it is.

Another thing I thought was really important: correlation does not imply causation. The data can look like it's telling you one thing, but there may be something completely unrelated to the factors you're looking at that affected those numbers. The example they gave was lifespan versus diet type. It seemed to imply that people who consumed fewer carbohydrates had shorter lifespans, when in reality it may have had more to do with the quality of the food those people were eating.

So, all of us work in data. How do you make sure you're not putting misleading statistics or misleading facts out there? What are some of the ways you do it? Yeah, we have a pretty consistent validation check on my team. Whoever's doing the project, we always have somebody else take a look and review it with another set of eyes. You can miss something really easily, or something you didn't think was an issue, and then: oh yeah, you're right, that does look misleading, or I should use a different measure there. Peer review, I think, is the number one thing we rely on to make sure that what we're putting out there is clear, concise, and as truthful as possible. And a self review, too: be aware of what your biases are, and when you're going through something you put together, be really thoughtful about what you did and look at it as if it weren't yours. Over time I've formulated my own process, and I stick to a consistent way of checking things when I make updates: a thorough analysis of what I did, checked against source data if I have it, to make sure it aligns. Yeah, for me, I often have to sit on it for a bit. I'll build something out, go do something completely different, then come back to it and see where my eye goes. If I see things that really contrast, I double-check those contrasts and ask, why is this like this? And then I double-check it with experts, people on my team who know the data better, or at least know the business side of it, and ask: hey, does this check out against what you would expect? That helps me self-check all the time. Yeah, that's a really good point. I do that too.
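As an editorial aside, here is a minimal sketch of the truncated-axis trick discussed above: the same two numbers, roughly 20 percent apart, plotted once with the axis starting at a million and once with a zero baseline. The figures are invented for illustration, not taken from any article in the show notes.

```python
import matplotlib.pyplot as plt

values = [1_000_000, 1_200_000]   # only 20% apart
labels = ["Last year", "This year"]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Truncated axis: starting at 1M makes a 20% gap look like a multi-fold jump
ax1.bar(labels, values)
ax1.set_ylim(1_000_000, 1_250_000)
ax1.set_title("Truncated axis (misleading)")

# Zero baseline: the bars show the true proportion between the two numbers
ax2.bar(labels, values)
ax2.set_ylim(0, 1_250_000)
ax2.set_title("Zero baseline (honest)")

plt.tight_layout()
plt.show()
```

Run side by side, the left panel exaggerates the difference while the right panel shows it honestly; the same trick in reverse (an overly wide scale) is what flattens the global warming chart mentioned above.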
And I think the reason we wanted to bring this topic up is that not everybody does this. So when you're looking at articles and thinking through visualizations, you should be a bit of a healthy skeptic, right? I'm not saying you should always be questioning everything. So question everything! That's exactly what she said. Yeah. You have to build trust with whoever's working with your data, or if you're the one working with the data, you have to build trust with your business users or whoever's looking at your information. But be very aware when you're looking at visualizations published in a public journal or somewhere similar. Sometimes they can be skewed in a certain way, and I've seen information set up with an axis that was misleading. I would say, be very aware any time you see a data visualization where they don't label the axes; it almost seems like they're trying to hide something. If they're not showing that this is zero or this is 1 million, showing what the starting and ending points are, they're probably trying to hide something. They're telling a story; they're trying to get you to think a certain way. Just be cautious about that. With that said, I think the same goes for facts in written form, right? If they're not backing a claim up with some account of how they arrived at it, you should be cautious about it, especially as we move further into the political realm with the election coming up. I think it's going to be more and more important to actually look through the details. Yeah, where did that data come from? What's the source? If they're not providing information about the source, you should probably find something else.

That's great, and it leads us into a couple of other topics we wanted to cover today. I think the main one, or at least a really good one, is data poisoning. It's actually a really fun and creative term, but it's also extremely scary. I really liked some of the examples the article gave about how you would poison data. Do you want to explain what data poisoning is? Yeah. The article is "As Generative AI Takes Off, Researchers Warn of Data Poisoning." These models are trained on open data from across the internet: they're trained on Wikipedia and a lot more, because they have to gather a lot in order to be effective, and every day they're asking for more. So this data is being used for training, but who's verifying that the training data is accurate? And if you don't want it to be accurate, if you want these models to display different information, can you manipulate it? Is that the hacking, the data poisoning in this case, that you can inject into these models? And should there be regulation and governance around all of this so we don't end up with data poisoning? Yeah. The article makes it seem like that would be really difficult to do. Like, how would you regulate every single site on the internet?
They gave this example that people could go out and purchase old domains, stale websites that haven't been used, and insert images or content that is completely wrong or misleading, which could affect these AI models when they train. They could basically say the sky is green, and as the AI model goes through and does its thing, it picks up that wrong or misleading information. And they call out that it really wouldn't take a lot of this to skew the results: for as low as 60 bucks, you could buy a few domains and really affect a model. Isn't that crazy when you think about it? Sixty dollars to skew some data with false information, and make "Sal is awesome" appear everywhere. If you start seeing ads and things online that just say "Sal is awesome," you've been manipulated. Just kidding, Sal. Hack the world.

Some interesting examples of how this could pertain to you: one they mentioned was that you could put out false information about a public figure, triggering something you're passionate about and making it seem like that person is passionate about the same thing. I think we might have touched on that last time. But you could also put out false information that's negative about them, so not just good things that match what you feel, but bad things as well. The other one that was interesting was if you were asking a chatbot about tax documents or something. Oh yeah, that was scary. The poisoned model could use that information and then email those documents, or send them to a particular address. So now we have to be wary that even if we're using a good website from a good company, it could still be poisoned, and you could still end up sending your personal information to the wrong person. Yeah, actually, let's dive into that. One of the examples was that people would put tax documentation into these chatbots, and the poisoned chatbot would include a URL in its response, saying, hey, if you want more resources on this, click this link. You'd then enter all your information at that link, which was actually a hacker collecting it.

Yeah. It makes me think there's got to be some way to govern that data. But your chatbot is essentially using internet resources to compile all of this, so you're basically trying to boil the ocean, to comb through everything. Someone's wrong on the internet! I just pictured that meme: I can't go to sleep, somebody's wrong on the internet. You would just be Sisyphus, pushing that rock uphill over and over again. And I promise, I hope this podcast doesn't come off as just making you a skeptic; we're just opening your eyes to some of the reasons for skepticism. I will say, though, the same article had a really interesting flip side: you can use the same ideology to protect artists' content, for example, by inserting things.
If you're an artist, let's say you're posting pictures of your artwork online, a song or an image you created, whatever, and you don't want somebody else to take it. You can protect your own content by inserting a sort of invisible code into it. It doesn't change the image itself to your eye, but it changes it enough that AI doesn't see it as the thing it is. So let's say you're a designer of purses, and you have images of everything you've created in your online store. You could essentially insert things into those images that make AI think it's not a purse, it's a toaster. That way, when somebody tries to have AI generate images of purses, they're not stealing your content; they're not stealing your intellectual property. Yeah. That software is called Nightshade, if you're interested and want to look it up. This is one of the most amazing uses of data poisoning: a positive spin on something that was such a negative thing. I love that they took that and made it into something good.

That kind of leads us into what happens as these models get trained on synthetic data, data that is itself created by AI, and the dangers around that. Do you guys have any concerns about that? So "Synthetic Data Is a Dangerous Teacher" is one of the articles we read, and we wanted to talk about it because there's this idea that you can train another model, like another ChatGPT, off of the original ChatGPT model's output. That can get really dangerous, because as this new model progresses, using data that came from another model, what comes out of it? Will it be the same as the original ChatGPT model? No. But how will it differ? Could it get worse? Could it get better? I'm not really sure. I feel like it would just degrade over time; that's my gut reaction. With anything, if you take a copy of a copy, you're losing something in that translation, something in moving from point A to point B, and as much as you try to make it the same as the original, it never completely is.

Yeah. I guess what scares me is that history is only made up of the documents we recorded. Could that narrative, that historical story, change because these models are telling a different story, or just re-amplifying points that maybe shouldn't be amplified? Yeah. The article actually cites a 131 percent increase in misinformation in news articles as synthetic training data increases. That misinformation goes into articles, decisions get made on it, and you can see how it evolves from there; then that becomes the history these models get retrained on. And the thing that starts off as a small little variant: by the time you're four or five models in, it's a huge difference. And do you think we're ever going to hit a point where we've put this massive amount of information into these models, but we're running out of more information to train them to be more human? That, I think, is where you can get into the synthetic area as well.
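As an editorial aside, the copy-of-a-copy degradation described above can be sketched in a few lines. This is a toy illustration of the general idea, not the experiment from the article: fit a trivial "model" (a mean and standard deviation) to data, generate synthetic data from the fit, refit on the synthetic data, and repeat across generations.

```python
import numpy as np

rng = np.random.default_rng(42)

# "Real" data: a wide distribution (mean 0, std 10)
data = rng.normal(loc=0.0, scale=10.0, size=200)

for generation in range(1, 6):
    # Fit the toy model to whatever data this generation sees
    mu, sigma = data.mean(), data.std()
    print(f"gen {generation}: mean={mu:+.2f}, std={sigma:.2f}")
    # Train the next generation ONLY on synthetic samples from the fit,
    # so each generation inherits the previous one's sampling error
    data = rng.normal(loc=mu, scale=sigma, size=200)
```

Each generation's estimates drift a little, and that drift compounds, because no generation ever sees the original data again. The same mechanism, at a vastly larger scale, is the worry with models trained on other models' output.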
Yeah, that's a really great question. One thing that could get really tricky with it is thinking about how much negativity there is over time; think about United States history. We're feeding negativity into our models, right? And what if a model doesn't understand that over time the United States has changed? Then it's utilizing poor information, negative stereotypes, racial slurs, hateful speech, things like that, and continuing to use that information thinking it's the right information to use.

So, speaking of using bad data, let's talk about the Harvard and Stanford professors who are being held accountable for data manipulation. Yeah. I think the takeaway there is that we think of Stanford and Harvard as these elite institutions that anybody would strive to be part of, but they're no less susceptible to these practices than any of us. There are two different articles on this, and we'll add them to the show notes. Francesca Gino, a prominent professor at Harvard Business School who was actually known for researching dishonesty and unethical behavior, submitted reports that allegedly contained falsified information. A variety of people came forward recognizing it as false, and I don't know how that one ended up: whether she did it purposefully or didn't know it was falsified. Yeah, I think they're still investigating some of these stories. Overall, though, some of these articles call out that the researchers didn't necessarily purposefully misrepresent the information; rather, some of their testing didn't meet the standards we'd hope to hold results to. It can be easy sometimes to cut corners and think it's not that big of a deal.

Yeah, I think it gets back to the point we made earlier: who's making it? Maybe they're not a seasoned statistician, so they don't know how to produce the statistics correctly. When they're publishing, they should be proofing it with other people; this is why research goes through peer review. And these probably went through peer review and still slipped through. And I think, too, there's probably something to be said here: if you're somebody at Harvard known for writing these kinds of articles, how does it feel to be the person asked to peer review that and think, God, now I'm the one who has to call them out? Can you imagine: oh my God, I've got to say she's wrong. That might be really intimidating.

Yeah, and the Stanford one was interesting as well, because there it seemed like there may have been purposeful manipulation: a neurological research report contained photoshopped and manipulated data, which is a little bit important, just a little, and he refused to correct the mistakes. That, to me, screams that it was purposeful, and I think that's terrible. He ended up having to resign, and the board removed him. Just something to be aware of and thoughtful about: it's not always purposeful; sometimes it's just an accident, a mistake made with data.
And I think, too, it goes to show you shouldn't be too proud: in peer review, people are going to find stuff. It's not that you're intentionally trying to cut a corner or mislead somebody; we all get tired, we make mistakes, we're human beings. If you send something to a colleague for peer review, don't let your feelings be hurt if they find something or take issue with it. I wouldn't know; I've never made a mistake before. I've made a hundred of them, so it's okay, I'll make up for yours.

Kind of around that, talking about Harvard and Stanford: those schools are ranked among the best in the country, right? Yeah, probably the world; you could probably argue the entire world. So how is that even built? Where does the perception that they're the top schools in the country, or the world, come from? That's a really good question. Are we just self-fulfilling, believing the things we've said a hundred times? And I'm sorry, but I have to bring it up: it's that whole story of who said Trump was a billionaire? Trump did, right? And it just self-perpetuated over and over; nobody ever saw his taxes. Forbes said he was a millionaire or a billionaire, and then other articles cited that. Are we just cycling through the same information and reusing it? And is it biased? Yeah, absolutely.

Another thing in that article I found interesting: they talked about how getting into those schools is so competitive that a lot of them have algorithms set up, along the lines of: if a student is in activity X and activity Y, and their grades or SAT score hit some mark, then admit them. Yeah. Students are just elements in a model, part of a decision tree. Yeah, it's crazy, because it really dehumanizes the whole process of getting into a school like that. But is that bad? In that situation, would you want it all to be black and white? Is it to your benefit as a woman or as a minority to be seen as just one of the numbers? Or what if part of their algorithm is something you don't even know, some internal bias, like: this is what our typical student looks like? Or maybe they're optimizing for something like: we want to maximize the amount of money a student makes after college. What's that salary? And men, on average, are going to make more money, so are they favoring that? Because then they can publish their statistics and say our recent graduates earn X dollars per year, which, again, is a self-fulfilling thing that just keeps churning: more people want to attend that university because it's promising these great jobs after graduation.

The article is "Big Data Algorithms Are Manipulating Us All"; we'll put it in the show notes. It talks about how deans and chancellors are actually changing how they operate because they want to be ranked higher than other schools, rather than looking at the quality of the education. And when you know what you want, you can turn the algorithm into what you want, right? You can manipulate it and make it produce the result you want. Yeah. And another example of that:
when kids are trying to get into these prestigious schools, there is such a thing as a consultant who will try to help you get in, and you'll pay $20,000, even $25,000, for that consultant. They're working off the predictive algorithm: take NYU, for example, and try to understand how NYU decides when a student gets in and when a student does not. That consultant has been able to work out what the algorithm is and then help people get their kids into colleges like that. Because when you think about it, that algorithm is like a simple algebra problem. You've got all these buckets, all these variables to fill in, like you were saying: your grades are X, your extracurriculars are Y, your SAT or ACT score is Q, whatever. You fill in all the variables, and I'm sure that consultant has an idea that certain variables are weighted more heavily than others. If they're any good, they could tell you: here's the variable you need to work on. You need more extracurriculars, more volunteer hours, whatever gets you the biggest bang for your buck: what can you change in your high school career that will have the biggest effect on your ability to get into some of these colleges?

I was actually watching some Instagram reels where they go through kids' applications, whether they got accepted or not, and look through their resumes, or whatever you'd call it for a kid. They'd say, this person started their own company, did all these things, made $20 billion, and yet didn't get into Harvard, and you're like, how is that possible? And there are all those stories, right? So-and-so dropped out of college after a semester but went on to be a really successful entrepreneur. College acceptance definitely does not equal success in life. It can lead to it, I think, but not for everybody.

So another type of algorithm we might all have experienced at some point in our lives is a personality test. Have you guys had to take one of those for a job interview? I have. Yes. Not for a job interview, but I've taken one here at Continuus; we did something like that. I forget what the test was called. The DISC assessment? Yes, and it was more about your communication styles and your leadership styles, things like that. Yeah. But some companies, and I've had to do this twice, I can't believe you guys have never done this, make you take a personality assessment before they'll even give you an interview. Wow. What do they get out of it? Do you fit their culture, is that what they're trying to determine, as a cheap shortcut? I think they're trying to weed out people who are dishonest. If you remember some of the questions, they ask you the same question in slightly different ways to see if you answer consistently. I don't always answer those questions the same, because I'll be thinking, how did I answer that last time? That's exactly it: oh, this is familiar, and I know they do this, but I can't remember what I said last time.
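A quick editorial aside, circling back to the "simple algebra problem" described above: an admissions model of that shape is just a weighted sum with a cutoff. The weights, features, and threshold below are invented purely for illustration; no school's real model is being reproduced here.

```python
# Hypothetical admissions score: a weighted sum of normalized (0-1) inputs.
# All weights and the cutoff are made up for illustration only.
WEIGHTS = {
    "gpa": 0.40,             # grades are X
    "test_score": 0.30,      # SAT/ACT is Q, scaled to 0-1
    "extracurriculars": 0.20, # extracurriculars are Y
    "volunteer_hours": 0.10,
}
CUTOFF = 0.75

def admissions_score(applicant: dict) -> float:
    """Weighted sum of the applicant's normalized features."""
    return sum(WEIGHTS[k] * applicant[k] for k in WEIGHTS)

applicant = {"gpa": 0.95, "test_score": 0.85,
             "extracurriculars": 0.50, "volunteer_hours": 0.30}

score = admissions_score(applicant)
print(f"score={score:.3f}, admit={score >= CUTOFF}")
# A consultant who knows the weights can see that the heavily weighted
# features (gpa, test_score) move the score most per unit of effort,
# which is exactly the "biggest bang for your buck" advice above.
```

If the weights are hidden, applicants can't reason about them; if a consultant reverse-engineers them, their clients get an edge nobody else has. That aside, back to the personality tests.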
But I think it is looking for that culture fit, that personality fit, or maybe they're looking for a way to diversify the personalities on the team to build a better working relationship. I'm not sure exactly how people are using them. I was just going to say, could you imagine being the hiring manager and getting this data back about Frankie? Okay, great, I have the answers to her questions, but how do you make any sense of it? How do you know if that person is a good fit? I feel like you'd have to meet somebody in person to know that. I agree. And here's where it gets trickier: you don't understand the algorithm behind that personality test, and that's where this all ties together. They're hiring people based on these results, and they don't understand what the test is actually doing; nobody really knows how any of these personality tests work. And how much weight do you think employers are putting on them? I can tell you, I didn't get a second interview. I want you to take it here so we actually know! For one of my personality tests it worked out great, and for the other one I didn't get an interview. So I think for the next podcast we should all take a Cosmo quiz or something.

It almost becomes a self-fulfilling prophecy in that way, too. Do they hire only a specific type of person, banking off this test? They say, these people are successful because they're successful here, but the only people there are that type of person. Yeah. It's just really interesting that they're making these decisions, and there could be discrimination in these hiring processes, right? If you're constantly looking for one type of person, you could really get yourself in trouble by using those personality tests and ruling people out based on who they are. A tricky thing, but I saw that in the article and thought it was really interesting, because I've been that person before. Yeah.

The other thing I thought was interesting is that they mentioned a book written in 1954 by Darrell Huff called How to Lie with Statistics. Oh yeah, and I was mind-blown that in 1954 they were already talking about this. It seems like such a recent topic; we just call it data now. It seemed so recent to me, and I was like, whoa, this was a thing back in the 1950s, which is crazy.

Another part of the article talked about insurance companies: they used to use your driving history, and now they're using these algorithms, a full driving profile or whatever they want to call it. What your GPA is, what your credit score is, and all of that becomes a reflection of how much you should be paying for insurance. And I thought, okay, can we hack that? Could we look at that algorithm and tweak it a little in a way that benefits us, so I can save $50 a month? Yeah, that's an interesting one. What kinds of things are insurance companies using to look at us, and what are they measuring us against? I worked in insurance once, so I do know that all of those things are regulated by the state: what they rate you on has to be approved by the state. So, we live in Wisconsin:
if you're writing policies in Wisconsin, those rates have to be approved by the state of Wisconsin, so there are certain things they actually can't rate you on. But do they have to be transparent about what that model is made of? Yep, that all has to be approved by the state. But do they have to communicate it to the consumer? I don't believe they do. So they have an agreement with the state, and that's all well and fine, but who knows what they're using? If you don't know what they're using to judge you, how do you affect it? But I guess that's the point, right? They don't want you to be able to manipulate that data. It's an interesting little dilemma, though: don't you think you might adjust your behavior if you did know? If they said, we're looking at X, Y, and Z, you'd think, ooh, I'm not so great at Y; I didn't even know that was a thing they cared about. Maybe you'd be a more careful driver, or pay your bills differently, I don't know. If it's your credit score, I'm like, oh man, whatever.

Yeah, and there are certain things that are just naturally used across most states for rates, and credit score, for example, I think is consistently one of them. Which is a major thing, because credit score is completely a made-up number. It's a model, a calculation based on a bunch of different variables, that is then being used inside another calculation. And it's completely made up to begin with, because if you have no debt, you have no credit score, and it looks like you have bad credit, which is crazy. That doesn't make sense. Sorry, I didn't mean to go off; as you can see, Colleen feels strongly about this. Luckily, I have just enough debt to have a very good credit score. But when you think about the concept, it's a relatively new thing and a completely made-up number, and, like Sal said, they're using it in other models, where it could skew other data, and maybe that's not appropriate, or morally or ethically right.

And credit score is something they use when you apply for jobs, too. So to your point about the personality tests: they look at your credit score when you apply for jobs, and it's like, that's why I need the job, so I can have money, pay my bills, and get a better credit score. It's just another example of using and reusing this data over and over again when the initial set of data may be skewed. Yeah, or biased in a certain way; we don't know when it was created or how it's changed over time. Absolutely. I think I've seen in the past that credit score is actually biased with respect to race. I'm sure it is. So really look at that and think through: is this actually an ideal thing to base hiring on, or to use to charge someone more or less for insurance, or for jobs? And back to the example of the school applications: the ACT and standardized testing have been shown to be skewed toward certain races. Yes. So if that's part of the model you're building all of this on, then your admissions program is also skewed. Plus it really does benefit
the wealthy, in that respect: people who can take advantage of those algorithms. That $25,000 consultant is just one small element of it, but if you know an ACT or SAT score is going to be heavily weighted, and you can pay for training and classes and tutors that are specifically meant to teach you how to take that test, you're going to have such a leg up on people who can't afford that. I think that's an unfair weight on a lot of people. But on the other side, I don't have a solution for sorting applicants; I'm not an educator. It's sad that after so many years we still don't have better suggestions for some of these things. I think of myself as a kid: when I was in high school, I was working multiple jobs. I didn't have time to sit and study for my ACT, and there are plenty of people in that boat: very hardworking people who definitely deserve a chance and could make something of the opportunity of going to college. But if you don't have the means, you're stuck; you're just treading water. Yeah, absolutely.

So let's cycle back to data manipulation; sorry, we got a little off topic. We are very passionate about what we talk about. But I want to bring us back. One article we read was about pharma data manipulation. There's a company, Novartis, is that how we say it? Yep. Novartis was accused by the FDA of manipulating data involving a $2.1 million gene therapy; its unit, AveXis, if I'm saying that right, failed to report the manipulation for their gene therapy treatment. Yeah. This gets a little scary, when they start to mislead or misrepresent clinical data, and how they're handling that data, in order to get approved. There are stats out there that for a drug to get approved by the FDA, it costs pharma companies around half a billion dollars to get there. So the amount of money at risk for Novartis, in this case, if they don't get approved, is enormous. And where there's higher risk and high dollar value, I think it often opens the door to: hey, can we skew or mislead the data to better represent it? You don't want to get five miles down the road and get results pointing to the fact that maybe you should start over, or make tweaks to your product, and have to scrap all the money you've already spent getting that far. So when you think it through, it's so easy to just mislead with the data, right? Yeah. So how do we put governance or guardrails around this so it doesn't happen? And one big question: why does it cost that much to get there in the first place? Are you talking internal or external governance? Oh, both. True governance at the federal level, but also internally: making sure the people regulating a company internally aren't also being paid by it, because of the shareholders. This is where we have issues across companies as a whole. You can think back to Enron, for example: how people are incentivized is how they're going to act. That's going to shape your psychology no matter who you are, and that's something that's really tricky.
And I think that's why, even when we have had government regulations around things like that, and government agencies meant to represent a different point of view and be paid by a different party, we've still seen areas where it's been an issue. I think back to whistleblowers and situations like Boeing, which had, and is still having, such a huge issue; the problems there seem never-ending. I know there was a government agency available to them at the time, but they were in on it. Yeah. It gets so tricky: what are people incentivized by, where are they going to get paid, and where is that money coming from? It's almost a question of who's watching the watchers. Yeah. Honestly, it's all about transparency, right? For sure. If people knew how you were being paid, I think you'd be less likely to take the risk of exposing yourself, for most people anyway. Yeah. And especially in the pharma world, or anywhere in health care, you really have to have that regulation, because human lives are at risk. There are probably certain industries where it's a little more important than others, but where you draw that line, and how you decide what's important and what's not, is another question entirely.

So the falsified information in this particular case happened at the beginning, right? When they were working with the mice? Yeah. So because they falsified that information, what happened overall with the project? Did they completely shut it down, or did they let it keep going? I think it's still under investigation, so I guess we have to hold off a little. The FDA is still working through the process of investigating it, and I believe there are five U.S. senators who have also sent letters asking to look into it and critique them. So we're holding our judgment on that. But this is a broader topic: when a company has the ability to mislead, who is really responsible for them, who's governing that, and what's the outcome from there? How do we make sure people know? Obviously there are lawsuits if someone gets hurt, but it shouldn't have to get to that point; we should be catching it before then. The only thing I can think of is that it hurts to get hit in your wallet, right? Some of these organizations have billions and billions of dollars, so if you wanted to keep something like this from happening next time, a big fine is probably the only thing you could impose that would really affect their behavior. Yeah, I was just wondering if they would completely shut down this particular study. I don't know; I guess it depends on what the potential profits would be. And I did see that there was a risk; somebody was asking, now that they've done this, does it put gene therapy as a whole at risk? Are they setting the entire practice back? Again, we don't know enough about this study to say they manipulated it in a way that was dangerous or completely tampered with it, but I wonder if they could just redo that part of the study and release it again.
And then re-correct everything, right? And I think that's a big part of it. Fortunately, nobody was hurt in this study. But they did talk about what can happen when data is mishandled: patients can get hurt, and if patients are hurt, that could set the entire field back. And another issue they raised was that even if nobody is harmed as a result of the falsified data, it sets a bad precedent and makes the industry look bad as a whole.

Going from pharma, let's talk about the Colorado DNA manipulation. We briefly mentioned it last time, but we wanted to dive into it a little more now that there's more information available. In Colorado, one of the people working in their DNA testing lab, a woman who had been working there for 30 years, I believe her name is Yvonne Woods, was found to have tampered with DNA testing results, omitting results in 652 cases over the last 15 years of her career, and she was forced to retire. Yeah. That many cases over that many years is so substantial. And think about it: every one of those cases may now have to go back to court and get appealed, and most of them probably will be, costing taxpayers so much more money. It's unbelievable how much this could impact Colorado. So we had good news from Colorado and bad news from Colorado, right?

I do think, though, that there should have been something in place to catch this. How does one person skew results for that many years? And to your point about that window of time: they're actually going back beyond it, too. They started with the more recent cases, and now that they've gotten through those, they're going further backward. But really, how do you let something like that happen? That has to be negligence at some point, right? The entire organization didn't have anything in place to peer review this or catch it somehow, and it happened for 15 years, or at least that's what they can prove so far; it could have been 30. She was also in a management role, maybe even a director, which to me means she was overseeing other people on her team and might have been approving or signing off on other people's bad results. Because some of this manipulation comes from cutting corners and not doing the amount of vetting and validation you should be doing. So if she was training other people to do the same thing she was doing, this problem could be so much bigger. And we thought bad AI training data was scary; this is so much worse. Yeah. This is scary, and I couldn't even imagine being somebody impacted by it. Could you imagine having a loved one sent to prison because of DNA results she reported one way or the other? And Colleen, what if you were in prison for 15 years? What if I was in prison? Would you guys bring me Little Debbie snacks? Yeah, it's just crazy to think about, and really sad, honestly, how many people could have been impacted by this.

Speaking of prison: Jeff Epstein. Oh, what a segue. Yeah. You like that segue? That was great.
There was an article that recently came out about Jeff Epstein, or Jeffrey Epstein, or whatever his name is; sorry, I'm glad I don't know him well enough to be sure. The article came out in Wired: Jeffrey Epstein's island visitors were tracked by a data broker, Near Intelligence, which was tracking their cell phone locations via Instagram, Facebook, or Twitter. What they could see was every person who went to his island and back, and where they came from, so you had a full map of where these people were coming from and who they'd been with. That information was released to the public, legally or not, depending on how you look at it, and the public can now examine this information about the island. But it raises a larger question: who's collecting our data? These third-party data brokers: should we be concerned that they're selling our data and providing it to other parties we don't even know about? And the answer is yes.

And I think the important thing to look at here is that in this instance it's pretty hands-down: Jeffrey Epstein was not a good dude, I think we can all agree on that. But what if it comes down to somebody just thinking somebody else is harmful? They can track who had interactions with a person they think is bad, and that person maybe isn't bad, and now you're associated with them, and they can release all this information. In the article, they could tell where these people lived, because obviously your phone spends a lot of time at home. But what if they released your home address to the public on a whim, because some random person who works for one of these data brokers thinks somebody you were in contact with was bad? Now you have protesters outside your house. I think it's a really scary path to go down.

A little unrelated, but I'm thinking back to when my phone would send me alerts that I'd had a COVID contact. That's wild too, because it probably uses much the same technology, right? Exactly: it's looking at your location history and other people's location histories, and then matching reported cases. I guess in a certain way that's good; when it comes to COVID, you'd probably want to know if you'd had contact. I'm just trying to find a positive spin. Yeah, we need to be more positive. In the health care realm it is good, but you have both sides of it. On one hand: now they know where I am, that I went to this festival, Summerfest or whatever, that I'm putting myself in these situations, so maybe I get charged more because I'm putting myself into riskier situations. On the other hand: hey, you were exposed to COVID, so make sure you monitor it, which is a really good thing. Both ends have good and bad, but I want to be aware. I want to know what you're tracking and how you're using it. More transparency around the practice is probably all we can ask for. And twisting it again a little, because I like to do that:
what if this information was sold to companies you were interviewing with? I don't know what you all do in your free time, but I feel like that could get really tricky. "I vacation on Epstein's island." You might want to hire Sal; he's been to some sketchy places. Yeah, but seriously, I feel like that's too much personal information for a company to have about me. Even looking at our social media feels inappropriate for a company to do. That's your personal time, and your personal time shouldn't have any impact on whether you get a job or not.

I can think of situations from my real life, though, where it might have been justified. I'm going to go on a side tangent; cut it out if you feel like I went way off the rails. I worked for a long time at a law firm, I won't say which one, and there was a person who worked in the mail room. Years after I left, they found out this person was a white supremacist, that he was employed under a name that was not his real name, and they were able to connect him to social media accounts where he was sharing white supremacist material. Yeah. So at some point, do you think it's justified? You don't want somebody filled with hate working in your mail room, or wherever. At what point are your employees a representation of you as a company, such that you don't want certain types working for you? Obviously that's a very slippery slope. It may depend on the group: if it's categorized as a terrorist organization or a hate organization, there's maybe a certain point where you want some of that scrutiny. But where is the line, and who defines it? That's a really good point. If you could have seen my face when Colleen was telling that story: I was like, oh, I should never have said that. It was one of those things; somebody texted me and I was like, oh my God, I knew this person and had no idea.

But this gets back to a couple of points. And again, I am 100 percent not saying that the racist groups this guy joined are good. But think back in the day: Marxism, or the communist parties, right? The FBI and the communists. People would show up at these communist events, and they were allowed to; they're American citizens, they have the freedom to explore that, as long as it's not a terrorist or hate organization. But who's defining that? And what if you were just associated with a person there, and now you're targeted? What if I'm on some watch list now because I worked with this person? Exactly, or I talked to him, I went to dinner with him. You didn't know any of this; you just went to dinner, and now, based on your location data or your social media, you're tied to a person you didn't know was like that, to things he did in his private time, while you just went to a work dinner, and no one knows the difference. I think ultimately, down the line, my view is that you shouldn't be judged on what you do in your private life. Yeah.
I do think sometimes your private life can blur into your work life in a way that's detrimental to the business, but I don't know that we should have laws that allow companies to discriminate against you for beliefs they feel are not great. The example I gave is obviously an extreme one, but I wanted to throw it out there, to draw out some themes, because I think we'd all be in agreement. Yeah, I would not want to work with a person like that, for sure.

All right, I think that's a wrap on today's podcast. We've talked about a lot, and there's a lot to digest; we all need to think about our life choices and where we go from here. Yeah. The purpose of this discussion was really to help make you aware of how to look at data and what to look for, so you know whether the data you're seeing is accurately displayed. Check your sources, look at the axes, the titles, and the labels, and really dive in to make sure it's a good source. Yeah, absolutely. Make sure you're gut-checking: if you see a stat so outrageous that you think, how is that even possible?, look into it further. Don't just take it as truth; go do additional research and see if it's actually true, or if it's maybe just a bad statistician putting out information. Yeah. And I would say, be cautious of sources too. If you see an infographic or something in an article and they don't cite their source, be really wary of whatever it is they're telling you.

So if you loved today's episode, make sure you subscribe to stay up to date on other data topics. We don't have our next episode completely planned out yet, but we'll get it out, though Sal might not be with us. Yeah, Sal's got a baby on the way. I'm not dying! No, Sal will still be with us, but he may be taking a break for the next time we record. We may do something girls-only, still about data; not sure yet. So thank you again for listening to us and our rants. Until next time, let's keep calculating.