Debra Farber 00:00
Hello, I am Debra J. Farber. Welcome to The Shifting Privacy Left Podcast, where we talk about embedding privacy by design and default into the engineering function to prevent privacy harms to humans, and to prevent dystopia. Each week we'll bring you unique discussions with global privacy technologists and innovators working at the bleeding edge of privacy research and emerging technologies, standards, business models and ecosystems.
Debra Farber 00:27
On today's episode, we welcome Stephen Wilson, an accomplished innovator, researcher, analyst, and advisor in data protection, and one of "The World's Most Original Thinkers in Identity." Stephen has over 35 years of R&D experience in Australia and the United States, with 27 of those years dedicated to data protection and digital identity. Specifically, Stephen leads the digital safety and privacy efforts at Silicon Valley-based Constellation Research, where he covers data protection, digital identity, authentication technologies, blockchain and DLT technologies, and privacy engineering. Stephen is also the Managing Director of Lockstep Consulting where his work is centered at the heart of what we now call "verified information."
Debra Farber 01:13
In this episode, we discuss Stephen's assertion that privacy is about restraint; what you choose not to know about people; the trend of creating data sharing platforms to facilitate and scale global information value chains; how if data is like crude oil, then it requires safe handling, and why we should all treat data more like clean drinking water instead; the importance of data quality, data originality, and data lineage; Stephen's analysis of the growing market for "data protection-as-a-service," which includes data clean rooms, privacy APIs, and more; and lastly, why you don't need to own your own data to get good privacy outcomes. Enjoy the episode.
Debra Farber 02:03
Hello, Stephen, welcome.
Stephen Wilson 02:05
Hey, Debra, great to be with you. Good to see you again and thanks for having me.
Debra Farber 02:09
Oh, it's a pleasure. And just a note to my listeners that you're calling from down under in Australia, where you're based. And I'd love to, you know, hear a little bit about who you are and the work that you do at both Lockstep Consulting and Constellation Research.
Stephen Wilson 02:26
Fantastic. Thanks for that opportunity. So, I have congruent roles in those two organizations. Lockstep is my own business in Australia, and I've been doing that for 17 or 18 years. I got into what we call "digital identity" in 1995, as an R&D leader at a startup company doing PKI (public key infrastructure), which was all about what we thought at the time was identity, in things like smart cards, SIM cards and early technology like that. So that became a digital identity industry pretty quickly. We've been trying to figure out what you need to know about people, how to equip individuals to prove things about themselves online, and to allow businesses to know who they're dealing with - to prove those things that are being presented. And that all adds up to this thing, which we call identity. And I'm being a little bit cagey about the term "identity," because I think we'll unpack this, Debra, as we go. But I think that the term is being reimagined. So look, I started in PKI, as I said, smart cards and e-signatures, e-signature law, at the dawn of e-commerce. So this is in the late 1990s. And if I could tell you the quick story about how I got into privacy, in the early noughties.
Debra Farber 02:30
Please.
Stephen Wilson 02:31
Look, I found an absolute privacy mess at an organization that I was helping. The mess resulted from a total breakdown, not just in communication, but just of imagination between lawyers and engineers working on privacy. There was just...they were speaking completely different languages. And I recognize that, you know, identity is what do you need to know about people to be able to deal with them, and that's transactional, you know, we're talking about online identity, and it is a bit transactional. It's only a snapshot, you know, of true human identity. So I want to acknowledge that. But if online identity is what you need to know about people, then privacy is what do you not need to know about people? You know, how do you exercise restraint? How do you choose not to know things about people and still be able to do business with them? So that's sort of my game. I've been an R&D leader, a consultant, and an independent researcher now.
Debra Farber 04:45
It's unusual to see privacy experts have a role of R&D leader. So can you just describe a little bit about what that means in the context of the type of work that you're doing?
Stephen Wilson 04:59
You bet. Good question because privacy has been dominated, I think, by lawyers and legal thinking. And that's fine. I mean, I like lawyers. I think that it's really important, but I think that privacy to me is about...well, let me just say, fundamentally, I think that privacy is about restraint. It's about what you do not need to know about people, what you choose to not know about people, and still be able to deal with them. And that's sort of a funny orientation that brings privacy and security together because security is about "the need to know" and data minimization. And these are fundamentals about privacy, aren't they? Choose not to know things about people. So if you look at data privacy, and again, I think I want to pay respect to the bigger human rights issue in privacy and dignity and acknowledge that data privacy is like a slice of that. But it's an important slice because everything we do these days is online and digital and therefore about data.
Stephen Wilson 06:00
So data privacy is about information flows. It's about controlling information flow, minimizing information flow, being visible and transparent about how you use information, how you disclose information, and ultimately, how you destroy personal information when you finish with it. So in those terms, it is about informatics, and that's why I think some R&D and some information science and information practice is really important because if you want to map out information flows in the business and know about the covert and subtle side-effect information flows, as well as the obvious stuff, then I think it does become a bit of an engineering problem. Now, this is where the sort of engineering and the privacy law can come together in a really constructive way. So certainly, ever since I found that mess at the client 20 years ago, I've been working to try and bring engineers and IT professionals and lawyers and policy people to the same table. And that's my orientation. That's how I come at this problem.
Debra Farber 07:09
I love it. I love it because we're not really used to hearing people talk about privacy as restraint. We usually hear about privacy as agency, and I know we're going to talk about control later: privacy as control over information, privacy as being about respecting what people want to share. We typically don't hear about what you do not need to know about people, and I think that's a great segue to what we're going to talk about today, which is discussing trends regarding the sharing of data to facilitate and scale global information value chains. You've stated in the past, and in some of your publications, that there's a global push on multiple fronts to facilitate more sharing. What organizations are pushing for this and why?
Stephen Wilson 08:02
Let me say, everybody.
Debra Farber 08:06
That's fair.
Stephen Wilson 08:06
There are some terrible analogies or contested analogies about data being the new crude oil, etc. I actually kind of like that one because it speaks to the value of data and it speaks to the business of data, and I think that we should confront that sort of thing because a lot of the business of data, obviously, is terrible, and underhanded and asymmetrical and not doing people a service. And I get that. The thing about the crude oil analogy is that, you know, all that asymmetry and the robber barons taking advantage of people - that was the story of oil 100 years ago. So, it's interesting to see how we can try and do better on the next wave of business. I mean, what are we on now, Industrial Revolution number four or something? And let's look at the previous industrial revolutions and see if we can do this one better.
Debra Farber 08:56
Yeah, and to that point, in talking about that "data is crude oil" analogy, which at times, I think, has broken down as an analogy: if we're going to say that data is crude oil, then we also have to talk about the toxicity of crude oil if it spills or, you know, leaks, and the risks around it. I think a lot of the big data economy proponents had really championed "more data is better," because that's just more value that you can do something with in the future, without thinking about how we make sure that we have valuable data that isn't offset by the toxicity of the breaches, the spillage, the privacy harms caused to humans as a result.
Stephen Wilson 09:50
100%. 100%. There's a discussion there that we need to have about the quality of data as well as the quantity of data. So, I think the infonopolists (the people that are trying to get as much data about us as possible, and stockpiling it, and then mining it - data mining), it's all about quantity, and it's crude, and it's asymmetrical, and it's just unfair. We can, I think, start to align people around this because there should be a win-win in this. There should be a customer-centric business model or a user-centric business model that does deliver value for business and benefits and value for consumers. We need to start thinking about the quality of data, though.
Stephen Wilson 10:34
So there are some good tools for this. In security, we're familiar that security is normally thought of in three dimensions: confidentiality, integrity, and availability. And that's kind of a really sterile view of data security. It's important, but it misses out on things like, what was the user's buy-in to the data? So, can we add permissions to that? You know, we've got confidentiality, integrity, availability; can we add permissions or consent? Can we add things like jurisdiction, which is important in privacy? Can we add originality? I think the most important missing piece in all of this is originality. Like, the data about me that's sloshing around: did it really come from me? Is it Steve Wilson's data? And when data is presented, how are you sure that it's really me that's presenting the data and not either my delegate, which is cool, or an identity thief, which is way uncool? So, originality of data.
Stephen Wilson 11:31
Now, I'm thinking about these different facets of data because it's the way that we treat things like crude oil or, as a less contentious example, clean drinking water. I think that data needs to be safe, like drinking water. We need to be able to use data just like we use water from a tap, without thinking about it, because we know that there are people out there that look after the safety of this stuff. You know, under the covers, there should be data pipes that are safe and clean, to extend the metaphor, and predictable and well managed. And that's the sort of quite rich metaphor that I think is emerging. And when we can think about data in a really structured way and think about its qualities and its properties, then you can get some really good privacy outcomes because you get data minimization, you get consent built in, and you get transparency, because one of the really evil things is that people just don't know what's going on with their data. So, if we could be more transparent about that, then I think we're getting towards the maturity that we need in data protection.
Debra Farber 12:33
And that breeds trust, right? I mean, trust not so much in the system itself, and the underlying...but trust in the organization when you have the transparency and understand how the process is supposed to work and have some assurances. And so, to that point, you know, give us examples of organizations that are really pushing for data sharing.
Stephen Wilson 12:54
Yeah, well, we can start at the top. There's a really good publication that came out about a year ago from the World Bank, and it's called "Data for Better Lives." We could put a link in the show notes, I guess, Debra; it's a pretty easy thing to do. So "Data for Better Lives" was a big document from the World Bank that calls for a new contract, a new social contract for data. So it goes over some of the territory that you and I have just covered. How important is data to society and to individuals and to business, and how do we have a better social contract that enables data to be reused in a measured and controlled way, in a symmetrical way, to create economic and social value? I love that report. You know, it's real leadership.
Stephen Wilson 13:42
But I think a lot of listeners would also be familiar with Open Banking, which has been a big push for the last three or four years out of the UK and Australia in particular, as well as the U.S. and Brazil. Some of those leading emerging economies are really strong on open banking as a way of lubricating consumer involvement in financial affairs and financial products. So open banking is there, and look, under the covers, there's a whole lot of interesting protocols that will allow people's data to be portable from one financial provider to another. Now, there's some interesting kind of subtext. Open banking has a bit of a laissez-faire philosophy behind it that says that people will be able to get a better deal from service providers and banks if they use the data as a bit of a bargaining chip. And I think as soon as you do that, you are certainly buying into some asymmetries again in business. I mean, the thing about big businesses is that they are more powerful than the little person, and as soon as we put information into that mix and make information like a resource or an asset, I'm absolutely aware that we need to be careful of the asymmetries. Especially with information because, you know, data is complicated. Data science is complicated. I don't think many people really understand how data flows in the Internet. But you know, Facebook and Google, they understand. And these people make billions of dollars; like, the richest people in the world in history have made their billions from data. And it's kind of breathtaking.
Debra Farber 15:21
I guess that is true that the current richest people are probably the richest people in history. So at this point. I mean, it makes sense. I haven't thought in that context. Wow.
Stephen Wilson 15:29
And so, they're very, very clever. There are teams of thousands of people working on data mining, and you know they're not using data to solve cancer. They're using data to find clever ways of selling advertising to people and monetizing that knowledge of the consumer and profiting from it in eye-watering ways. So there are some asymmetries there that we need to be careful of, and I know that acutely. And look, a final thing to say about data is that there's a push for publicly-funded research data to be made available and shared more equitably. So, you know, when the taxpayer is really funding university research and public science research and clinical trials, there's a really nice push worldwide for governments to make the outputs of that publicly-funded research available and shareable. And I think that that's a really important sort of policy lever that we've got now that's driving a lot of this stuff.
Debra Farber 16:30
Thanks for that. So, those are some of the global pushes for data sharing. What do we mean by an information value chain given that we're talking about trends regarding the sharing of data to facilitate and scale global information value chains? And what is needed for stakeholders to trust them?
Stephen Wilson 16:51
Yeah, great question. So "information value chains" sounds a little bit sort of cliquey, but it's the sort of structure that's been forming under our noses for decades, actually. And there's lots of models. There are reports and social science institutes that publish sort of diagrams and copyrighted models for how to visualize this thing. It's simply the case that data moves from stage to stage through the economy, if you like. Data will originate somewhere. It'll be collected somewhere, and then it will be processed and analyzed and disseminated. You can think about the mass media supply chain, where reporters and now citizen reporters, you know, people with mobile phones in the street, are gathering facts and pushing them into aggregators and social media and big media, and obviously disrupting the whole journalism business model to a large extent. But that's an example of a value chain of data that's created and passed from hand to hand and value-added all the way along.
Stephen Wilson 18:00
Now, in things like medical data and population data, and things that are public assets, these things, of course, originate through surveys and censuses and research, and the data, again, goes from stage to stage in a pretty orderly way. It gets statistically analyzed. It gets interpreted. Data forms reports, and so you have that idea that data can get transformed into information and, in turn, it can get transformed into knowledge. That's a pretty old idea: data, information, knowledge. But what we're seeing in business and in government and in healthcare is a structured way in which data really starts to look like an asset or a resource; you know, it looks like a raw material that's being processed and value-added all the way along. And in this information age that we're currently in, the tools are getting more and more powerful. So there are analytics, there are syntheses of data that are utterly transforming data as it moves and making it more valuable and, I guess, in the wrong hands also more risky, because a lot of these algorithms that can work out health outcomes, for example, can also be used and weaponized against us to be working out things like your insurance risk as an individual or Lord knows what. So, the inferences - I think that that's the final step in these value chains - the inferences that are able to be drawn from data these days are what really, to me, characterizes the modern information supply chain.
Stephen Wilson 19:37
I don't think there's any other way to look at data and personal information other than the supply chain. Right or wrong, you know, data is business. Data is big business, and a lot of what energizes me at the moment, Debra, are new ways of being orderly and transparent about that, so that we get, you know, better outcomes and we bring this stuff out of the shadows. So much of this data processing has been done surreptitiously. I absolutely acknowledge the work of Shoshana Zuboff on surveillance capitalism - amazing work exposing how social media in particular is really sort of a covert operation to monetize data. And I'm working hard to bring a lot of that stuff out of the shadows, I guess, for better privacy outcomes.
Debra Farber 20:27
Amazing. I love the work that you're doing. How do we ensure that these information value chains become and remain orderly, fair and transparent?
Stephen Wilson 20:39
Well, part of that is actually a technology story or even an engineering story, and I don't want to get too geeky or too difficult about that. But there are some really cool tools now from cryptography and information management and verifiable credentials that allow people to be much more purposeful in the way that they hold information about themselves and how they control the release of that sort of information. Now, I've been using the term "infostructure" lately. It's, you know, it's infrastructure...
Debra Farber 21:12
Ooh, I like that.
Stephen Wilson 21:12
I wish it was my term. It's not. The term was invented about 20 years ago. But infostructure is about how do you treat information as a resource or a material and how do you protect it? So, what are the rules and the technologies and the business models for protecting information as if it was electricity or clean drinking water? So infostructure is part of the answer to your question. How do we make supply chains orderly? I think we adopt infostructure, and it's happening anyway. The other part of this, of course, is rules and regulation and just civilization. If data is so important, then we can't just have a wild west treatment of data or, back to the crude oil analogy, you can't have prospectors driving their equipment, you know, metaphorically through your private land and digging up oil behind your back or digging up data behind your back. Like, we need to make that orderly and civilized. So, there are some...
Debra Farber 22:13
and in front of my back. No, I'm just joking.
Stephen Wilson 22:15
Right. Yeah, do it in plain view, and do it with permission, and do it with sharing the profits. And I guess I just want to call out our data protection and data privacy laws worldwide. We've got really good principles-based privacy legislation in about 130 countries around the world, and it's finally coming to the U.S. There are really good state statutes now, principles-based statutes, and the second-generation law in California, the CPRA (the California Privacy Rights Act), and a number of other states are passing similar laws. And these are broad-based laws that seek to protect data. They seek to control the flow of data. They seek to make sure that people know what information has been collected about them. And this is sort of the modern approach, which is incomplete. There's some very good scholarly work that shows that privacy law is incomplete and that the whole concept of personal information should be extended. And I get that, but I also want to say that, I reckon, 70% or 80% of the terrible things that you see on the Internet can be moderated and brought to heel by the laws that we already have - the data protection laws that we have. The GDPR is a popular example, but by no means is that the only standard. There's, like I said, 130 countries around the world with similar laws, and they by and large function to try and rein in those excessive abuses of personal information. So, you know, what do we do about supply chains? We use infostructure to make them orderly and predictable, and we use, I think, the right measure of laws and regulations to make it all civilized.
Debra Farber 24:02
Thanks for that. So then, how can personal data be shared for use for these information value chains while remaining useful for analysis and maintaining the privacy of individuals? I know, that's a big question.
Stephen Wilson 24:19
Yeah, and I want to say something different rather than just repeat myself, because a lot of this does have to do with transparency and visibility of what's going on. Now, I think some people will actively participate in future. This may still take a generation or two. I see some consumers participating actively in their data and what happens to it, and in how they benefit from the processing of that data. I think that that's probably still an edge case. I think asking people to participate in their data to that extent is a really big ask.
Debra Farber 24:55
Why is that a big ask? What do you need to surmount in order to make that workable? Where's the challenge?
Stephen Wilson 25:01
It's a big ask because a lot of it's intangible. It's a bit like healthcare: you can expose people to their health records, and you can get them to participate in their health care, but a lot of it gets technical and hard and a bit scary for some people. So, a lot of privacy and a lot of data is managed on our behalf by professionals and regulators. Similar to car safety. You know, at some level, we absolutely participate in car safety because we're supposed to be safe drivers, and we're supposed to follow the rules. But the fact is that there are rules to follow, and so the community agrees that there's going to be rules for speed limits and car safety. There's going to be a whole lot of technical standards for car safety that nobody really understands. You know, I don't buy a car knowing what the strength of the windscreen glass is or the emissions or anything. I just know that somebody's looking after that. So, I think data and privacy is similar. You know, privacy is an outcome of nice, limited, safe, proportional handling of data, and a lot of that handling does get very technical, and I think it needs to be regulated on our behalf. So, that's what I mean by consumers participating.
Stephen Wilson 26:16
Meanwhile, huge amounts of data about us are flowing behind our backs. It's been created behind our backs. It's used and filed away and analyzed. And, by the way, a lot of that is good. You know, if you think about health care, if you think about going to hospital, and the gigabytes of data that are generated at every hospitalization, and all of that stuff about me, if I've had a surgery, my surgeon, the anesthetist, the doctors, my referring doctor, the nurses, the insurance companies...all of these people are sharing data about me, let's say, behind my back, but with my welfare absolutely top of mind. So, I just make that point to say that a huge amount of data flows, and we can't expect mom and pop to see that, understand it, control it, own it. It would be a nightmare, and I think a really civilized digital society has ways of making sure that data flows in an orderly and fair way on our behalf.
Debra Farber 27:17
That makes sense. And you know, you kind of stress that maybe "safety" is a better umbrella term than privacy: making an app safe, making an experience safe to use. I really like this distinction between privacy applying to an individual's rights and safety applying to the use of personal data. You know, is it safe to use without infringing on a privacy right? And I know that Australia has appointed an e-Safety Commissioner, Julie Inman Grant, and she's an independent regulator focused on online safety. In fact, at the eXtended Reality Safety Initiative (or XRSI), which is the topic of my very first episode of this podcast, and where I'm also the Program Lead for the Privacy and Safety Framework, we named our risk framework for the XR industry "The Privacy and Safety Framework" (or PSF), and we embed the requirements of the Commissioner's e-Safety Framework into it, because I kind of agree with you. You know, especially when we're talking about consumers, it kind of makes more sense because we're talking about, you know, moderation, psychological safety, whether or not someone is part of a vulnerable population, embedding diversity and inclusion requirements. There's a lot more to it than just "is the data that's attributed to a person secured," right? There's so much more that's involved. So, to that end, what is, in your opinion, the best regulatory approach to get practitioners to embed safety into their product and technology roadmaps?
Stephen Wilson 29:00
Debra, I love the work that you're doing with XRSI and I love the way that you've just called out the different sort of properties of data. It's fantastic. It's really, again, it's giving us new frameworks and standards to have the conversation about data and its importance. I think the XR industry, the extended reality industry, is...at the moment is dominated by thinking about augmented reality and Metaverse, and all of that's cool and very sexy. But I think there'll be a lot of mainstream extended reality as well, that is just...it's all about data sharing, isn't it? I mean, under the hood, this is all about a massive amount of data about people that needs to be used and shared in a safe way. So, to think about this as safety, I think, is a real breakthrough, and it brings a lot of people to the table.
Stephen Wilson 29:53
Julie Inman Grant is doing fantastic work in Australia as e-Safety Commissioner, and it's kind of nice to acknowledge, too, that she comes out of a social media background. She was very senior at Twitter and also at Microsoft, so she's got those industry chops, which is making her extremely effective and extremely credible. She's almost like a model regulator because she is working to get joint outcomes. You know, win-win is a cliche, but she's really working hard to get win-wins and succeeding. So it's the leading edge of this balance of regulations that we need. Regulation is a bit of a dirty word and we often overregulate, don't we, and then we find that we need to pull it back. And that's okay. I mean, that's sort of the story of the free market, I guess. We wind up with a regulatory equilibrium that is more or less right and also still allows for innovation.
Stephen Wilson 30:52
A lot of people in Silicon Valley worry that innovation will suffer at the hands of regulation because it sort of ties your hands, but I know plenty of people in the automotive industry, or aviation, who will say that, you know, these are probably the most regulated industries in the world, but there's no shortage of innovation in automobiles and aircraft. So, I'm very positive about regulation. I think it just speaks to civilization, and the right level of regulation is inevitable. It's super important that we do have a balanced view in privacy. Privacy-enhancing technologies, for example, have been with us for a long time - the idea that encryption and access control are sort of the center point of privacy in some people's minds. But it's such a limited and very sterile view of the world. I firmly believe that privacy is more about what you don't do with data than what you do do with data. And if privacy-by-design means being very purposeful about the stuff that you don't need to know, then that sort of puts technology in the corner, you know? Technology is important and I love this infostructure idea, but I do think that it's what you don't do with data, and therefore it's more about policy and standards for limiting the use of data. And that will give us a nice civilized outcome, I reckon.
Debra Farber 32:18
Yeah. And then those kind of serve as requirements for the lowest level of your infostructure as you're building from the bottom up, I think. Yeah, what are you not collecting? What does not go over the network? You know, if you're thinking in terms of restraint. I really like this idea and I'm going to play around more with it as I'm kind of threat modeling and thinking about privacy as a key aspect of product development. But let's turn our attention to the ways in which organizations are facilitating data sharing today and in the future. Can you describe for us some of the data sharing platforms and business models you've come across in your research?
Stephen Wilson 32:56
Yes. Well, at Constellation Research, we are developing a new category, which is probably going to turn out to be "data protection as a service." It is what it says on the box.
Debra Farber 33:06
That's cool.
Stephen Wilson 33:08
Thank you. It's a pretty broad category. There's a lot of ways into this and a lot of good component solutions out there. You know, there are new encryption algorithms, homomorphic encryption. So a quick detour, you know that we think about encryption of data at rest in databases, and we think about encryption in motion so that when data transfers over the web, you've got HTTPS and that sort of protocol. The new leading-edge idea is encryption in use. So, how do you protect data while it's actually being used and processed and analyzed, and keep it encrypted? Now, it's a bit of a mind blow, but there are some amazing algorithms, homomorphic encryption, and fully-homomorphic encryption. So what is that, "FHE" is the acronym to look out for. Now, it's very leading-edge, and I've got to say, I'm not an unqualified fan of fully-homomorphic because it's new, and it hasn't been totally shaken down yet, and there are compromises involved. So that's cool. We just need to be careful and keep studying it.
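To make "encryption in use" a little more concrete, here is a minimal sketch in Python. It is not fully-homomorphic encryption: it assumes the older and much simpler Paillier scheme, which is only additively homomorphic, and it uses toy key sizes that are far too small to be secure. The function names (`keygen`, `encrypt`, `add_encrypted`, `decrypt`) and the hard-coded primes are illustrative assumptions, not any vendor's API. The point is only that two encrypted numbers can be added by someone who never sees the plaintext.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy Paillier key pair. The default primes are illustrative and insecure."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid because g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under the public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random blinding factor in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the hidden plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


def decrypt(n, private_key, c):
    """Recover the plaintext using the private key (lam, mu)."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


# Usage: an untrusted party sums two values it cannot read.
n, private_key = keygen()
encrypted_total = add_encrypted(n, encrypt(n, 120), encrypt(n, 80))
print(decrypt(n, private_key, encrypted_total))  # prints 200
```

Fully-homomorphic schemes extend this idea to arbitrary computation on ciphertexts, which is where the cost and the compromises mentioned above come in.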
Stephen Wilson 34:13
There's also the concept of a data cleanroom. It's a little bit like a mergers and acquisitions deal room that lawyers would be familiar with. You know, you might take a whole lot of data about two companies that are coming together and you put it physically into a room with a lock on the door, and you would have a log of the lawyers and advisors and accountants that are allowed to work on a deal. And they've got 14 days' access to the data before it all gets shredded. And you know, that's literally how business works for real-world deals. Now, we can use technology to do that with data online: virtual cleanrooms on the Internet. And a number of vendors are providing these services now.
Stephen Wilson 34:57
Some vendors talk about privacy APIs. So, that is an API, a programming interface, where you could get a piece of software to make inquiries over the API into a database in a privacy-preserving way. So, the whole idea of zero-knowledge proof comes in here. So, if I want to know the health status of a bunch of people in a community, and if that data is in a cleanroom, then I could use a privacy-preserving data API to make inquiries about the state of the data without any knowledge of who the data applies to. So, that stuff is all coming onto the market, and I want to say that there are different...I don't want to use the word piecemeal, but they're certainly separate and independent pieces of this puzzle that are coming together under Data Protection as a Service. So much innovation in that area. It's great. So I have to be publishing some research in the new year out of Constellation Research on exactly that topic. Yeah.
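As a rough illustration of the privacy API idea described above, the sketch below answers a question about a community's health status without ever returning who the data applies to. Everything in it is hypothetical: the `HealthRecord` fields, the `PrivacyApi` class, and the `min_cohort` threshold are assumptions made for the example, not any vendor's interface. It also uses simple aggregation with a minimum group size rather than a zero-knowledge proof, which is a different and much heavier technique.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class HealthRecord:
    person_id: str       # stays inside the clean room; never returned by the API
    postcode: str
    has_condition: bool


class PrivacyApi:
    """Hypothetical query interface that only ever returns aggregate answers."""

    def __init__(self, records: Iterable[HealthRecord], min_cohort: int = 5):
        self._records = list(records)
        self._min_cohort = min_cohort

    def condition_rate(self, postcode: str) -> Optional[float]:
        """Share of people in a postcode with the condition, or None if the group is too small."""
        cohort = [r for r in self._records if r.postcode == postcode]
        if len(cohort) < self._min_cohort:
            # Refuse to answer: a tiny group could reveal individuals.
            return None
        return sum(r.has_condition for r in cohort) / len(cohort)


if __name__ == "__main__":
    records = [HealthRecord(f"p{i}", "2000", i % 2 == 0) for i in range(6)]
    records += [HealthRecord("p6", "3000", True), HealthRecord("p7", "3000", False)]
    api = PrivacyApi(records)
    print(api.condition_rate("2000"))  # aggregate rate for a large enough group
    print(api.condition_rate("3000"))  # None: only two people, so no answer
```

The design choice here is restraint at the interface: the caller can learn the state of the data, but the API has no method that returns a `person_id`.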
Debra Farber 36:00
Excellent. Well, you can be sure that I'll definitely be reading that research.
Stephen Wilson 36:04
And look, let's kick it around as peers. You know, we like to have a peer community of research and advisors and reviewers, and I'd love to share that. There you go.
Debra Farber 36:13
Excellent. I'm in. Yeah, sure thing. So, we discussed how safety differs from privacy, and you've mentioned a little before about what privacy means to you, in this perspective of restraint, but as it relates to web3 goals like decentralization and data ownership, how would you unpack that a little more for us? What does privacy mean in that context?
Stephen Wilson 36:37
Yeah, such a good question. And, you know, I'm going to tread gingerly in this area, Debra, because words are powerful.
Debra Farber 36:45
Indeed.
Stephen Wilson 36:46
We use metaphors to describe a lot of this stuff, and sometimes we don't even know that these things are metaphorical. My favorite example is digital identity. A lot of people feel that digital identities are like digitized identity; it's like my avatar, that is going to correspond to the real-world Stephen Wilson in some sort of version of me in cyberspace. And that's somebody's intuition. But most digital identity is actually much drier and much smaller than that. It's all about identifiers and customer reference numbers and stuff like that. And you can add all of that up just to synthesize a holistic identity of the person, but usually, what we're trying to do online is kind of transactional and more fine-grained. And that is good by the way. We've got to remember that keeping different identifiers for different tasks is privacy-preserving; it's good to keep all of your little activities siloed.
Debra Farber 37:45
Because then that way, you're able to, I guess, decide what information about yourself, attached to each of those identities, would be shared?
Stephen Wilson 37:56
Yep. And you can decouple that and you can withdraw consent. You know, withdrawing consent is a really hard problem. The more identifiers we have, the easier it is to say to a database, "Hey, I want all of my social media stuff. When I've been tweeting about my health, I'd like that all taken out now, please." And you do need to keep things sort of threaded separately. So, that brings us to decentralization. Like what do we really mean? What gets decentralized? I think decentralization is a pretty cool political slogan, and a lot of it comes out of the anger, that justifiable anger that we have about business and government excesses online. And it's easy to just blame government, and decentralized often means, you know, let's break down the establishment. Let's try and reform a fairer society that would be, you know, quote unquote "decentralized," insofar as it's not run by government anymore. And I sort of get that. I don't think it's going to be really easy, and I think that there's a middle ground.
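One way to picture the "different identifiers for different tasks" point is the sketch below, which derives a separate identifier per context from a secret only the user holds and then withdraws everything filed under one of them. It is a hedged illustration, not a scheme described in the episode: the function names (`context_identifier`, `withdraw`), the context labels, and the in-memory `store` are all assumptions for the example.

```python
import hashlib
import hmac


def context_identifier(user_secret: bytes, context: str) -> str:
    """Derive a stable identifier for one context; contexts cannot be linked without the secret."""
    digest = hmac.new(user_secret, context.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def withdraw(store: dict, identifier: str) -> int:
    """Remove every record filed under one identifier and report how many were removed."""
    return len(store.pop(identifier, []))


if __name__ == "__main__":
    secret = b"held only by the user (example value)"
    health_id = context_identifier(secret, "health-posts")
    shopping_id = context_identifier(secret, "shopping")

    # Hypothetical service-side store, keyed by the per-context identifier.
    store = {
        health_id: ["post about a diagnosis", "post about a clinic visit"],
        shopping_id: ["order history"],
    }

    removed = withdraw(store, health_id)
    print(removed, "health records removed;", len(store), "other context untouched")
```

Because the health activity is threaded under its own identifier, taking it all out does not require finding and untangling it from everything else the person has done.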
Debra Farber 39:01
What does that middle ground look like, at a high-level?
Stephen Wilson 39:03
Sure. At a high level, I think it's got to do with being really careful about what pieces of data really matter about me and being able to control and protect those pieces of data. So you know, there are facts and figures about me that are super important. A lot of them come from authorities. My health identifiers and my health records and my health status come from health professionals, and the authority for that information remains centralized. Like, I don't want to make up my own cardiology diagnosis. And certainly, if somebody wants to know my health, they can't trust Stephen Wilson to talk about my health. You know, if I change doctors, my new doctor needs to get a summary of my health status from another doctor. So you know, health information is centralized in terms of where it comes from and how you trust it, but it should be decentralized insofar as people should have more control and more say over how it gets ported from one provider to another. So, there's no simple sense of decentralization; it's like a many-splendored thing. And I think we're beginning to sort of break it down a bit.
Stephen Wilson 40:13
The other thing in web3, of course, is ownership, and, look, I'm not going to be too ginger about this actually. I might as well...
Debra Farber 40:19
Yeah, go for it.
Stephen Wilson 40:21
I might as well call it out. I think that ownership is a really cute metaphor. I like to think that I own my career and I own my health. So, I'm responsible for my health, and if I eat too much, then that's on me. So yeah, look, I own my health, but we all know that's a metaphor. So what does owning data mean? I think owning data can't be taken seriously and literally because data is sort of ephemeral. As I said before, most of the data about me is actually created behind my back. Good or bad, mostly good. So how do I own that? If an algorithm works out the risk of me having diabetes, and that algorithm is super clever and it's been developed and patented by some data scientists, they, in a commercial sense, have got every right to own the algorithm. And they might even think that they own the output of the algorithm, and I don't actually take a position on that. I don't want to argue one way or the other. I just want to say it's complicated. So ownership is not what it seems. The positive thing to say about ownership is that we don't need it. The really surprising thing is that you don't need to own data to still get really good privacy outcomes. So, I don't know whether we're gonna have time, but we could talk about some of the big cases out of Europe.
Debra Farber 41:39
Yeah, let's do it. First up, tell us about the Facebook Europe case regarding the collection and use of facial biometrics and the use of autotag technology without sufficient consent?
Stephen Wilson 41:50
Well, it's such a good story here about why principles-based privacy law is like a cyber superpower. You can get some very strong protections of people using laws that are 20 or 30 years old. So, what happened there is that the German regulator in about 2010 found that Facebook's use of facial recognition, biometric facial matching, and especially their use of tag suggestions, was breaking really well-established privacy law. This is pre-GDPR. And the legal principle was that facial recognition and facial matching takes an image and a reference image and delivers you an answer about, you know, is this Debra's face or not? And that answer is actually new information. And the matching has been done without your permission because it's all done by photos in the Facebook matrix. And the German regulators said, "Hey, the people whose photos are being matched have no say in that. They have not consented to that, and you've got brand new information being produced." The final clincher was just fascinating. Facebook, of course, had tag suggestions where on your feed a photo would come up, and it would have a helpful arrow saying, "Hey, we think that this is Steve Wilson. Can you please confirm if the tag is correct or not?" And people thought that was really cool, and it is kind of cool. You know, I've seen it; my kids loved it. It's sort of, you know, it's participatory. You're really getting involved with that Facebook conversation. What they did was that they were gamifying the training of the algorithms.
Stephen Wilson 43:32
So, training a facial biometric algorithm is hard work, you know. You've got to give it a lot of data. And when it's done in the lab, it's expensive, but Facebook crowdsourced and gamified the training of the algorithms. Now, I think it's sheer genius. I don't support it, but I acknowledge how clever that was. The regulators thought it was just beyond the pale, and they said, you just can't do that. So, the Germans ordered Facebook to stop doing tag suggestions and to destroy the biometric data. And Facebook, to be fair, did that without any protest, and they even went further: Facebook shut down tag suggestions worldwide for years and years and years, because they realized that it was just radioactive. It's a very difficult thing. So to summarize, it's an old privacy principle about consent, and also that information can be collected by an algorithm. It's not necessarily the stuff that you volunteer in a form. When the algorithm generates information, it's called collection by creation. So the algorithm creates information and therefore collects it, and the German regulators threw the book at Facebook, pardon the pun.
Debra Farber 44:45
Threw the book at Facebook. I love it. There's also the infamous Google Spain case, right, that originally granted the "right to be forgotten," also known as the "right to erasure." You know, what can we take away from that case? What lessons were learned?
Stephen Wilson 45:01
I've written a lot about this, Debra. I think it's an amazing case because it deals with the intuitions that we have about Google and about the web and about Google Search or web search. You know, this is not necessarily about Google. The case involved Google, but the outcome of the precedent is all about web search. And I think this is important because a lot of us have grown up with the web, and the www experience and what that means. Now, our intuition is that the web is like an enormous public library, and I think maybe that metaphor gets used, and web search is a way of indexing that colossal public record and putting it up on your screen or your browser. So, when Google Search was found to have invaded this guy's privacy, here's what the case was about in a nutshell. An individual in Spain objected to their name coming up in connection with, I think it was a crime or a misdemeanor; it was a bad act. And this person's sort of history was forever tarnished because every time you did a name search, the first thing that you would get, and the second and third thing you would get on Google Search, was bad news, and it was associating this guy with a bad event. Now, it doesn't matter whether the bad event was true or not. It doesn't matter if it was slander or libel. None of those things were the principle.
Stephen Wilson 46:27
The principle was that whatever it was, it was a fact, it was in the public domain, and Google was just putting it up on the screen. So that was the intuition. A lot of people who were surprised by the Google right to be forgotten answer said, "Look, it doesn't make sense to me because Google Search is just reporting facts from the public domain." I look at it very differently. I think it's an illusion that Google Search or web search presents facts. It presents a very carefully curated list of matches against what Google thinks you're interested in. And that's the really key point: Google's algorithms are trying to work out what you're interested in through a whole lot of other clues so that they can firstly provide, you know, really interesting answers to your search, and also so they can set up ads. So we've got to remember that this all is driven by adtech. The purpose of search is to serve up ads; and deeper than that, I think the sort of scientific purpose of the search engine is that it's an ongoing experiment in mind reading. And if you think I'm being a bit sort of fancy or decorative in my use of the language, think about the fact that your Google Search results are different on every computer. If you use your phone, or if you're on a web kiosk, or using an office computer, the web results are always different. And it's because in the different context, it's using different history and different signals to predict what you would be interested in. So they're trying to read your mind, and they want to do that because they want to sell you ads.
Stephen Wilson 48:06
Now, therefore, it's not the case that Google Search just provides objective facts from the public domain. It provides synthetic information. And it just looks like facts and figures, but it's actually not. The Google Search result is actually a little story that they have made up using their secret algorithms to try and address the need that you've expressed when you entered a search term. And that's what's really going on, and I think that the right to be forgotten case has exposed that because it said, "Look, what you've got here is complicated algorithms that are creating whole new stories, whole new pieces of personal information;" and under conventional law (and again this is not new law) the Google Spain result drew...was drawn on old statutes. And the statute simply said that if you produce personal information about people - fresh personal information - then you're responsible for the collection and use and lifecycle of that data. And I think it's fascinating that it exposes this idea that what looks like facts and figures, it looks like newspaper print, or it looks like a TV screen, doesn't it? But it's not just facts. It's synthesized facts, and we've got every right to want to have some control over that.
Debra Farber 49:22
That's an excellent point. I don't think I've ever heard anyone talk about search quite in that way - that it's like a synthesized story using secret algorithms.
Stephen Wilson 49:25
So clever. And look, I love it, and I couldn't live without it, but it's so clever. It looks like it's facts and figures, but it's really a very complicated story that they put together.
Debra Farber 49:43
That's a great point. And so, speaking of stories put together by big tech, and with Twitter's current takeover by Elon Musk, which I will call an implosion and which we won't discuss here because there's just not enough time, what are the expectations of privacy in a proverbial digital "town square"? For First Amendment purposes, I do not think Twitter is a "town square," but taking the concept that this is where people are going to have public conversations, what are the expectations of privacy in the digital town square that is the Internet, or on Twitter, however you want to bound the question?
Stephen Wilson 50:22
Absolutely. It exposes some of these other intuitions about what is privacy and what is secrecy and confidentiality, for example. So privacy is a broad topic and confidentiality is just one part of it. Our intuitions about the Internet and the town square are really very misleading. Our intuitions are bad and they're false because we did not grow up on the Internet. And it doesn't matter how good web3 and web5 get, we're still not living in the Internet. We're experiencing the Internet as if we're watching TV. I mean, let's be honest that the experience of the World Wide Web is very much like watching TV with 10 million channels.
Debra Farber 51:04
My browser tabs are all open, right? Unlike with TV, I can scale better in the browser by having so many open.
Stephen Wilson 51:15
106 open tabs, that idea. The funny thing about the town square is that it's actually a really cool secret place, you know, the old spy movie trope of meeting in the town square and having a discussion, because you're almost certainly not going to be overheard. Obviously, Twitter is not like that, and nothing on the Internet is, especially because Twitter looks like it's a community, but it's of course an experience that's curated and provided and served up by software run now by Elon Musk. And Twitter really only exists so that it can "instrument" us, as a verb. It instruments what we do. I mean, Facebook is an even better example. Facebook provides games and marketplaces and communities and dialogues in order for Facebook to watch what you do and to monetize that. So, there is absolutely no secrecy going on there.
Stephen Wilson 52:12
Now, can you still have privacy in that circumstance? Yes, you can. You can, ironically, be very, very private in a legal sense in these mediums if those mediums are following the regular law of being restrained in what they do with the data about you. Look, social media serving up ads, a lot of it's pretty good. And I'm not going to ban advertising in my perfect world. I'm just going to make it clearer to people that when you're on Facebook and you do things, and you express an interest in something, then you can expect to get an ad. Now, that's kind of obvious and I think a lot of people are now pretty down with that. They understand it. But there's a lot of much, much more subtle stuff going on. You know, the reason why Facebook does facial recognition so well - and they have spent billions of dollars in R&D and they've spent billions of dollars in buying companies - the reason why they do that is that facial recognition gives them a whole new instrument for watching what you do, because they've got, what is it, I think it's 10 billion photographs of people in the public domain, doing stuff. I'm not even a Facebook member, but they probably know where I have breakfast, and when I have breakfast, and who I have breakfast with, and so they know what I like even though I've never pressed like on anything in Facebook in my life. They know what I like, and that's the sort of thing that's very difficult to manage with user consent, especially when I'm not even a member.
Stephen Wilson 53:40
So, I think that this town square thing is going to be exposed and remodeled. And we're going to use different metaphors to really talk about what's going on, especially when we get into web3 and 5 and the metaverse. The metaverse is a lovely idea, and I think the work you're doing on trying to give it some structure and some standards is just top rate, Debra, but at this stage, the metaverse is like a three-dimensional TV screen in some cases that is being served up by platforms, and the motivation of those platforms is entirely commercial.
Debra Farber 54:15
Makes sense as to why there's a potential danger in the building of these systems if we don't do it right. We don't quite yet have the fully immersive experience that the, quote unquote, "metaverse" is sold to us as, so yeah, I'm absolutely delighted to have the opportunity to put some guardrails in place before we do. And I love your description of the need for transparency and clarification and the need to gain trust from users so they understand what's happening and they can trust what you do with their personal information. I know we could go on for hours and hours just geeking out on privacy and I absolutely plan to have you on for a future episode, but for now, I want to thank you so much, Stephen, for joining us today on Shifting Privacy Left. Do you have any calls to action? How can people contact you? What do you want to say to everyone before we close up today?
Stephen Wilson 55:15
Thanks for having me. I'm delighted. You can follow me. I'm @Steve_Lockstep (on Twitter). My website is lockstep.com.au. I want to say: stay positive. I think that there's a whole new clarity that's emerging around privacy and data and the information economy. I think that we're working at privacy engineering. You know, that's not just privacy for engineers. It's about engineering complicated requirements, and privacy is one of those requirements. And how do we strike the right balance with the work of the regulator in Australia and elsewhere? I want to call out Michelle Dennedy, another really powerful person in privacy engineering.
Debra Farber 56:02
Love her!
Stephen Wilson 56:04
We all do. It's amazing work to just make this stuff tangible and let people own it. We all own privacy as doers, and makers, and policy people, and lawyers, and engineers. We all own privacy, and now we can figure out why. In the past, it's just been the legal or the compliance responsibility, and we know that we've all got a role to play. So we can all own privacy, and I think that's reason to be cheerful.
Debra Farber 56:31
I agree. I'm optimistic about it as well. And it's why I named my podcast Shifting Privacy Left. I think the time is now when companies are finally realizing it's not just about a compliance paperchase, but how do you prevent the compliance problems - the privacy harms? How do you do the threat modeling? How do you, you know, build for trust? So, that's a really, really great point.
Debra Farber 56:55
Well, until next Tuesday, everyone, when we'll be back with engaging content from another great guest.
Stephen Wilson 57:01
Awesome. Thank you, Debra.
Debra Farber 57:05
Thanks for joining us this week on Shifting Privacy Left. Make sure to visit our website shiftingprivacyleft.com where you can subscribe to updates so you'll never miss a show. While you're at it, if you found this episode valuable, go ahead and share it with a friend. And if you're an engineer who cares passionately about privacy, check out Privado, the developer-friendly privacy platform and sponsor of this show. To learn more, go to Privado.ai. Be sure to tune in next Tuesday for a new episode. Bye for now.