From Startup to Exit

Gen AI Series: Ensuring Data Transparency for Gen AI models. A Conversation with Jai Jaisimha and Rob Eleveld, The Transparency Coalition

TiE Seattle Season 1 Episode 10


Artificial intelligence has the potential to be a powerful tool for human progress if properly trained, harnessed, and deployed. Policies must be created to protect personal privacy, nurture innovation, and foster prosperity. The Transparency Coalition (https://www.transparencycoalition.ai/), a non-profit, is focused on creating tools, systems, and policies that hold AI developers and deployers accountable for the legal and ethical creation and operation of AI systems. Listen to the two founders of the Transparency Coalition, Jai Jaisimha and Rob Eleveld, as they articulate the strategy they have deployed to hold developers like OpenAI, Microsoft, Google, and Meta accountable.

Jai Jaisimha is a Seattle-based technology entrepreneur with expertise in AI-enabled product development and data science. Jai has a Ph.D. in Electrical Engineering with a focus on AI/Image Processing/Information Retrieval from the University of Washington. Over a nearly 30-year career, Jai has founded or held leadership roles at four startups that leveraged AI/machine learning to build software applications. He was CTO at Hitch Works Inc., acquired by ServiceNow in 2022, which deployed AI/large language models (LLMs) to help employees and their employers take a skills-based approach to work and career progression. Jai was also CEO and founder at Appnique, which developed privacy-preserving advertising technology solutions using AI/LLMs. At Microsoft, AOL, and RealNetworks, Jai led pioneering initiatives in the digital distribution of music and video content that also ensured that user privacy was respected and the rights of creators and copyright holders were protected. He has also worked for Medio and Amazon.

Rob Eleveld hails from Grand Rapids, Michigan. After college, he spent five years on active duty as a submarine officer in the United States Navy, serving aboard the fast attack submarine USS Batfish. Across a 25-year career in tech software and data companies, Rob has been a four-time CEO of early-to-mid-stage companies, beginning in 2000 with Vykor Inc., which he co-founded and led. Over the past 10 years at Whitepages Inc., he served in executive roles and became CEO in 2016. Rob led the spinoff of the enterprise unit, rebranded as Ekata Inc. in 2019, which Mastercard acquired in 2021. Rob has a BA in Engineering Sciences from Dartmouth College and an MBA as well as an MS in Manufacturing Systems Engineering from Stanford University.

Brought to you by TiE Seattle
Hosts: Shirish Nadkarni and Gowri Shankar
Producers: Minee Verma and Eesha Jain
YouTube Channel: https://www.youtube.com/@fromstartuptoexitpodcast

SPEAKER_00:

Welcome to the Startup to Exit podcast, where we bring you world-class entrepreneurs and VCs to share their hard-earned success stories and secrets. This podcast is brought to you by TiE Seattle. TiE is a global nonprofit that focuses on fostering entrepreneurship. We encourage you to become a TiE member so you can gain access to these great programs. To become a member, please visit www.seattle.tie.org.

SPEAKER_03:

Welcome to another episode of From Startup to Exit. We have very exciting guests today on our generative AI series, and it's very important that we all understand what our guests are doing to make this real, I should say. But first and foremost, my name is Gowri Shankar. Thank you all for subscribing and supporting our podcast; it's now available on every platform that podcasts are available on. This podcast is brought to you by TiE Seattle, where we focus on fostering entrepreneurship. My co-host is Shirish Nadkarni, a serial entrepreneur and a serial author; I'll get to his books in a minute. TiE Seattle has been very instrumental in both of us getting to where we are as podcasters, so we thank them a lot. Shirish has written two books. The first, From Startup to Exit: we took the title and shamelessly named our podcast the same. And his second book, Winner Take All, about marketplaces, is also published now. It's my honor to present this generative AI podcast, and I'll hand it over to Shirish to introduce our guests. Shirish.

SPEAKER_04:

Thank you, Gowri. So welcome, Jai and Rob. Great to have you on our program.

SPEAKER_05:

Thank you. Thank you for having us. Happy to be here. Great.

SPEAKER_04:

So maybe we can start by having you guys tell us a little bit about yourselves. Jai, do you want to get started?

SPEAKER_05:

Sure. Yeah, I've known both Gowri and Shirish for a long time, and I've been in the Seattle ecosystem for well over three decades. I started my career in Seattle as a PhD student at the University of Washington. I took my first AI class in the late 80s and was a researcher in the AI space in the early 90s, and since then I've been both in and out of big and small companies. I started a few, and also worked at Microsoft, Amazon, and AOL along the way, as well as at RealNetworks. More recently I've been really interested in natural language modeling and how it applies to B2B applications. I've been involved in three different B2B ventures over the last 15 years or so that use natural language modeling or natural language processing, and what you might call AI, as part of their underlying technology. So really excited to be here. I've been out of the for-profit world for about a year now and am excited to be working with Rob on this venture.

SPEAKER_02:

Great, Rob. Hi, thanks a lot, Shirish and Gowri, for having Jai and me. I've got quite a bit in common with Jai, and we've known each other for 15 or more years socially. Quick background: I was an engineering major undergrad. I went into the Navy's nuclear power program and was a submarine officer for five years on active duty after college, then went back to grad school. I went into enterprise sales for a couple of years in the late 90s, and I've been a four-time CEO since then, all of small-to-medium tech companies, between eight and two hundred and fifty people. I've co-founded one, and I've been the number two at another. I've had ups and downs along the way. I ran one that I co-founded for seven years and shut it down; it just didn't work. At a third one, I was asked to come in by the board for a turnaround and didn't get it all the way turned around. So I've shut down two of them, stood in front of 20-some-odd employees each time, and laid everybody off. I've had some bumps along the way that probably many people listening to this podcast can relate to. The last one I ran for eight years; it was acquired by Mastercard, and I helped integrate it into Mastercard for two years. It was called Ekata, in identity verification and online fraud detection. Mastercard acquired it in mid-2021, during the pandemic, for $850 million. I rolled off of my two-year requirement last summer, and then Jai and I started working on what has become the Transparency Coalition.

SPEAKER_04:

Great, thank you both. Can you tell us more about how you guys got started on the Transparency Coalition? How did the idea come about, and what made you decide to start it?

SPEAKER_02:

Jai, why don't you take the lead here?

SPEAKER_05:

Sure. I think our main interest was actually in the rapidity and, from our perspective, the recklessness with which generative AI was being rolled out. We were looking at some of the early harms that people were starting to see, whether hallucinations or deepfakes. There are certainly a lot of efforts being made to regulate risk or regulate outputs, but we realized very quickly that inputs to the model, specifically training data, are a big piece of where all these harms are rooted. Confabulations, or hallucinations as they're sometimes known, as well as the ability to imitate people's likenesses in a deep and immersive fashion, are all rooted in how training data is ingested and used to train the models. So that's the motivation: training data appears to be at the root of many of the issues that people are concerned about; not all the issues, but many. And so we decided to focus on asking for increased transparency. The idea here is that if you're a model developer, we would like to make sure you are forced to empty your pockets about the sources of the training data you've used, as well as give people visibility into how you got that data. Was it public domain? Was it something you obtained opt-in permission for? Was it your own internally owned data? What were the data sources? How are they licensed? That then also allows people to potentially ask that their data be removed, or to verify whether their data is included. So that is the genesis of our focus in this area of training data transparency.

SPEAKER_04:

I understand that you guys literally had a fireside chat while you were camping when you came up with this idea and decided to work together. Is that right?

SPEAKER_05:

Rob, do you want to take that?

SPEAKER_04:

Yeah.

SPEAKER_02:

Well, you know, I had been stewing on this for a while, and Jai and I got to talking about it on an annual camping trip that we often participate in with our families. Jai shot me an email a week later and said, hey, let's sit down and talk about that some more. But yeah, I was concerned enough in maybe June of last summer to be noodling on it quite a bit, and I got talking about it with Jai literally around a campfire one night. A lot of topics come up around a campfire, as you might guess, but that was it.

I'll add a little bit more to Jai's context. One is that we are taking the approach that these technologies, which touch so much of our lives, need to have some guardrails and regulation in place. I've got three kids that are 18, 20, and 22. For anybody that's had kids in the last 15 years, this notion that elected officials and policymakers should sit on their hands and hope it goes well: that experiment with social media failed, in my opinion. We've got 41 attorneys general suing Meta right now. It's hard to get 41 attorneys general to agree on anything in this country right now, but boy, somehow Mark Zuckerberg and Meta did it. Then there's the testimony in Congress and many other things that all the listeners out here know about. So we feel like there needs to be policy in place. I was in payments for a while; that was the broad ecosystem that Ekata, the company I ran that Mastercard acquired, operated in. Financial services touches a lot of our lives in many ways, and it's regulated. Why? Because there are a lot of people that will do bad things without regulation, and some people commit a lot of fraud and need to go to jail. There's so much less regulation in tech relative to financial services, but it's touching our lives in many ways more than financial services. So that's the broad approach.

The last thing I'll say is that we decided to take a state-focused approach, not a federal approach. The states tend to move faster, they iterate more (especially California, but other states as well), and they trade notes. Once one state passes something, oftentimes a number of other state legislatures in the next session can do a copy-paste. Whether it's the mileage standards, stricter than the EPA's, that California put in place 20 to 25 years ago, or more recently the California Consumer Privacy Act, we now have 17 states with some level of privacy act, many modeled initially off of the California act. So we've focused first and foremost on states: Washington, where we reside, and California, in the 2024 legislative session. We are focused on getting to eight or nine states in 2025, with a mix of red and blue states involved, because it's a bipartisan issue.

SPEAKER_04:

That's interesting. So you think that at the federal level they are not moving as quickly? Because, having done startups myself, like you guys, I know it's a pain for companies to track regulation in different states and ensure that you're compliant with all of those requirements. Why not pursue regulation at a federal level?

SPEAKER_02:

I'll take a first pass here, Jai, and then you can add on. A couple of things. One is that, organically, and especially based on some relationships Jai's built, we've probably briefed 25 federal staffers, congressional staffers, at this point, most of them in the Senate, a few of them in the House. And there is some bipartisan work. From our ear to the ground, there are two things in D.C. right now that are getting bipartisan attention: one is just China broadly, and the second is AI. That said, as we all know, there's a lot of dysfunction in Washington, and it's really challenging to get things done. There is an AI bill that's been unveiled there that's got some bipartisan support in the Senate. It's very much a first-step kind of bill, putting in place some tracking and some standards and metrics development and so forth, as opposed to a lot of regulation and enforcement. Even that may be difficult to get across the line. For the first time, there's a bipartisan privacy bill, the American Privacy Rights Act, that has maybe a chance. But one of the reasons the APRA, the American Privacy Rights Act, is getting a push right now is exactly what you said, Shirish: if eight or ten states put laws in place, even the industry would rather have a federal standard than a patchwork of state laws. So the states making movement on things in the shorter term puts back pressure on the U.S. Congress to do something. That's part of the thought process here.

SPEAKER_04:

And I guess California is a leader in this kind of legislation, and a lot of states follow the lead of California. Yeah, exactly.

SPEAKER_05:

I think one of the things that's interesting about California is just the difference in the legislative sessions: how long they last, and how well staffed the legislators are. For example, in Washington we had a two-and-a-half-month legislative session, and most legislators only have one staffer. Whereas in California, legislators typically have multiple staffers, with tenure, who are well qualified to partner with their legislators. So they're able to take a more substantive look at things. In our experience, many of the California-based legislators also have a tech background: they're former entrepreneurs, or are connected by family to entrepreneurs, and so they have more than a surface understanding of the issues. That's something that's also super helpful.

SPEAKER_04:

Got it.


SPEAKER_04:

So let's talk about where all these models are getting their data. Can you talk a little bit more about what has been disclosed so far by, say, Facebook or OpenAI? Where are they getting their data from? In the case of OpenAI, are they primarily scraping the web, and in Facebook's case, are they taking user data? What can you tell us about where the data is coming from?

SPEAKER_05:

Facebook is a little bit different, at least partially, but one view that most, if not all, commercial model developers take, OpenAI and others in particular, is that if it's publicly available, it can be ingested for AI training. That's of course a very broad tent, because if someone were to take your personal data and put it up on a website without your permission, you might not have a say in it, but they would say, oh, well, it's publicly available, so I can ingest it. So there are multiple sources of content that people have proven exist; mostly it's not driven by disclosure from the model developers, but by researchers finding the content, or in some cases copyright owners being able to prove that their content was ingested.

There's publicly available web data, which includes images, video, and potentially personal information: content from pretty much any website you can think of. It includes, as we found out recently, scraped YouTube video and the audio from YouTube videos. And it includes books. There are databases originally created for research purposes: under copyright law, if you're doing research on, for example, natural language models and you want access to a large corpus of natural language data, it would be legal for you to scan a bunch of books and do that work. But if you're trying to build a commercial product, that is potentially not kosher. It's being decided in the courts, or potentially with a future revision of the law, but at the moment the practice of taking in copyrighted content such as books is also problematic. So it's personal information, web information of all sorts (image, video), and copyrighted content, including books. And the New York Times was able to show evidence of people circumventing paywalls to download content. So there are quite a few data collection practices.

Now, if you happen to have the world's largest social graph, which one of the model developers does, you also have access to that information: you, your connections, their habits, other information that you may post on social media. That information is also being ingested into models. In the case of Meta, they have kind of a two-fold approach. They have an internal model that's based on an open source platform. They do not, however, make the model weights or training data open source. All the information about a commercially deployed version of Meta's AI is its own: they release the model source code, but not the model weights or the training data.

SPEAKER_04:

Got it. And does Facebook, in their privacy policy or terms of service, explicitly ask for permission to use my posts and all that to train AI? Is there a provision for it? Have you looked at what exactly they ask permission for?

SPEAKER_05:

They do not ask. My general observation, from studying iterations of privacy policies (not just Facebook's, but also Google's and others'), is that the general approach is to revise the privacy policy quietly, without notifying people, or to notify you of a change where the change is typically such that it allows them to use data for purposes other than what you originally gave it to them for. The APRA, for example, includes a very important provision called data minimization, which asks that if you collect data for purpose X, you should only use it for purpose X. You can't then say, oh, sorry, I changed my mind, I'm going to use it for AI training as well. That's an example of something that is permissible today because it's not outlawed, but it certainly doesn't meet the spirit of what you and I signed up for when we created a Facebook account.
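To make the data-minimization idea concrete, here is a minimal sketch in Python of how purpose-tagged records could be gated at the point of use. Everything here (the Record class, the allowed_purposes field, the purpose strings) is a hypothetical illustration, not any real platform's API.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    # A piece of user data, tagged with the purposes the user consented to.
    value: str
    allowed_purposes: set = field(default_factory=set)

def use(record: Record, purpose: str) -> str:
    # Data minimization: data collected for purpose X may only be used for X.
    if purpose not in record.allowed_purposes:
        raise PermissionError(
            f"collected for {sorted(record.allowed_purposes)}; "
            f"using it for '{purpose}' would require fresh consent"
        )
    return record.value

post = Record("my vacation photos", allowed_purposes={"social_feed"})
print(use(post, "social_feed"))   # allowed: matches the original purpose

try:
    use(post, "ai_training")      # never consented to: rejected
except PermissionError as err:
    print("blocked:", err)
```

Quietly adding "ai_training" to every record's allowed purposes after the fact is exactly the privacy-policy revision pattern described above; a data-minimization rule takes that move off the table.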

SPEAKER_02:

I'll just add a couple of additional elements here. A recent Business Insider article indicated that, in the Authors Guild lawsuit against OpenAI, OpenAI destroyed two large training sets, basically destroying evidence: one was called Books1 and one was called Books2, with hundreds of thousands of copyrighted books in them. And just recently it was reported (the report I read was in the New York Times) that OpenAI, at the COO or president level, launched an audio-to-text converter for YouTube videos to feed into their model, against YouTube's terms of service, and Google decided not to pursue it, because they had decided they wanted to violate their own terms of service to use that data in their own training models. You can't make this up. So anybody who's ever posted a YouTube video thinking they were doing it for their own benefit, whether it was copyrighted, or their own content, or their own creativity, or whatever: it's gone. It's being loaded into models now. This is the type of thing that's going on.

The other thing that's important: we used to spend a lot of time on what's called personally identifiable information, which, Gowri, is your name, address, phone, email, that type of thing. That's about 1%, or a fraction of 1%, of what we're talking about in personal information right now. Because if you use Google Maps, Google Maps knows everywhere you've been, every six minutes. If you are getting cancer treatments and go to a cancer center, Google knows, and it's being loaded into an AI model. That is not protected under HIPAA. HIPAA is for hospitals and healthcare organizations; they're constrained on providing your healthcare data. But if you went to a mental institution for depression or whatever, or you're visiting your kids there, Google Maps knows. And you combine that with who your friends are on a social media app, or your contacts, and where you sleep at night, which is where a lot of your six-minute pings are spent, and there's no anonymization: very quickly, it's Gowri, and all of your health information is available, and so on and so forth. These are pretty problematic things relative to what anybody intended this data to be used for, and very little of the public knows it.

Our broad focus is that if you're using someone's copyright or IP, you should license it. I've got a lot of experience running businesses that license content and IP. Jai is an adjunct professor at the University of Washington. Go up the street from the electrical engineering department where he works to the law school, and there are tomes on copyright and IP; there's 150 years of business history there. Somehow it doesn't apply to Sam Altman. That's absurd. That's a joke.

SPEAKER_04:

But in some respects, isn't the cat out of the bag, in that OpenAI and Facebook and others have already trained their models, spending literally tens of millions to hundreds of millions of dollars to do it?

SPEAKER_05:

Really good question. I think one of the things to remember is that when people say, oh, I've already trained my model, that smells like a particular type of BS to me. Because the reality is, every time you improve your model, you train it from scratch; you can't just patch the old model into a new one. So every time they say they're working on ChatGPT-4o or ChatGPT 4.5, they have to start training from scratch. Every time the architecture is changed drastically, whether it's the context window (the number of words it analyzes), the parameter weights, or other fundamental changes, they have to take the data that they have and process it again. So they definitely have the opportunity to pay attention to this. Now, the cat may be out of the bag in terms of inflated expectations that they may have passed on to the Street, but the data is definitely not out of the bag.

In fact, I think yesterday the CEO of the Allen Institute for AI was quoted as saying that the biggest issue with AI is the speed at which it is being rolled out without proper consideration of safety and reliability. There's this whole mad rush: if you thought your children or my children were worried about FOMO, wait till you meet the CEOs of these big four companies. FOMO is the biggest thing driving it, and nobody wants to be second. So inflated promises are made about product expectations, and consumer safety and privacy be damned in the process. That is certainly a result of this rush to market. And because it's being rushed to market, you see frequent reports of issues. For example, when Google rolled out and then rolled back AI in search, that was an example of the technology not being as reliable as people make it out to be, and that's a company of Google's scale, with the amount of expertise it has.

There's a professor I sometimes collaborate with at the University of Washington, Emily Bender, and what she calls these models is stochastic parrots. It's like a parrot you've trained, except you don't know what it's going to do, whether it's going to swear at you or say, I love you; you have no idea. It just depends on the mood of the parrot. You could ask the parrot the same thing, and it'll say a different thing every time, and to some extent these models do behave in that fashion. That's something you have to keep in mind when you're thinking about this technology: how you roll it out, and how to use the technology as it is today.
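A toy way to see the stochastic parrot point: generation samples from a probability distribution over next words, so the same prompt can yield a different answer on every call. The sketch below is purely illustrative (the distribution is made up, and real LLMs compute theirs with a neural network over a huge vocabulary).

```python
import random

# Made-up next-word distribution for a single prompt. A real model would
# produce a distribution like this from a neural network; here we hardcode it.
NEXT_WORD_PROBS = {"love": 0.5, "hate": 0.3, "squawk": 0.2}

def parrot(prompt: str) -> str:
    # Sample the next word in proportion to its probability.
    words = list(NEXT_WORD_PROBS)
    weights = list(NEXT_WORD_PROBS.values())
    next_word = random.choices(words, weights=weights, k=1)[0]
    return f"{prompt} {next_word} you"

for _ in range(3):
    print(parrot("I"))  # same prompt, possibly a different reply each run
```

Sampling is also why asking a model the same question twice can produce two different answers, exactly the behavior described above.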

SPEAKER_04:

All right, over to you, Gowri.

SPEAKER_03:

Hey, thanks, Shirish. First of all, I want you gentlemen to be honest and tell me what libation you had at that camp, because this is so good.

SPEAKER_02:

We might have had a beer or two, Gowri.

SPEAKER_03:

So here you are, you went on a camping trip and then woke up and said, nah, let's solve world hunger. Nah, that's too small.

SPEAKER_01:

Go for it.

SPEAKER_03:

First, congrats on the launch. And why did you set it up as a not-for-profit, as opposed to a lobbying firm, which are all for-profit? What's the thesis behind the not-for-profit approach?

SPEAKER_02:

Yeah, let me take that one, and then Jai can add in. First of all, as I was wrapping up the two-year hitch at Mastercard, I had decided, having been a CEO four times, that, well, you never say never, but I really wasn't looking to get involved in another for-profit organization. I was really looking to try and give something back somehow. I believe in service, I'd done that early in my career in the Navy, and this just sort of came up.

What specifically came up, in the way we formed it, was that I initially looked for a board to join or someplace to help. I've got enough knowledge; we did a lot of machine learning, and as I think all your listeners know, 80% of what's actually deployed out there today is really machine learning AI, predictive AI based on scores and so forth. So I had enough knowledge to think about what's going on. And I looked around to help, and the challenge becomes clear very quickly. Jai and I now think about it as an AI policy marketplace, because all your listeners understand markets, and there's always a demand and a supply side of a market. If you think about the demand and supply side of the AI policy market, there's a lot of demand out there. There are copyright holders and IP holders, there are parents, there are a bunch of research organizations. Jai mentioned Emily Bender; he participates in RAISE at UW, which is responsible AI systems engineering, and there's one at Stanford, one at Cal, one at Columbia. Labor is part of the demand side. A lot of different organizations, people writing op-eds and so forth, that are concerned.

On the supply side, actually supplying AI policy or enforcing it, there are incredibly constrained time problems. Most legislators, as Jai said, in 40 out of our 50 state legislatures, are an elected official with one staffer who's basically a scheduler. They don't have time to chase this, and they are dealing with a really wide range of issues. Jai and I have been meeting with people in Olympia, here in Washington's capital, and they deal with really tangible issues that people run into every day; they don't have time to spend a huge amount of effort on a fast-moving target like AI. So in between the demand and supply side, there is this real gap of basic education of policymakers, whether they're staffers and elected officials, or attorneys general, or agency officials and so forth: just basic education of what AI is, how it works, what the inputs are. And that's a one-on-one, or one-on-a-couple-of-people, type of interaction, whether it's on Zoom or in person. You can't do it through Facebook posts or buying Google AdWords. The second thing is there's a real need for actual language that can go into bills. We're supporting a couple of different bills down in California right now. I testified; Jai's probably going to testify here shortly. Half the language in one of the bills we're supporting is really Jai's language, and we've refined that over time. We have time to focus on it; most of the state legislatures do not.
And so we started a nonprofit, and we actually started what's called a 501c4. I didn't even know the difference: a public charity is a 501c3, and you get a tax write-off to donate to it, while a 501c4 has the capability and flexibility to lobby, but there's no tax write-off. We now actually have a 501c4 and a 501c3 operating in parallel. Our educational activities and our marketing activities (if you look at our website at transparencycoalition.ai, we have our own staff writers; we're trying to call attention to certain publications and articles, and we're writing some of our own) can all be 501c3-focused work. But our actual involvement with state legislative folks, and with lobbyists or consultants in various states, is a 501c4 activity. So that's where the nonprofit came from, and what we're trying to get done that we saw as a gap in the market.

SPEAKER_05:

Just to add to that: typically, anybody with any sort of engineering experience or product experience who talks to these folks is trying to sell something, right? So the fact that we are a nonprofit, that we are not corporate funded, that we are not doing work for hire, usually improves our reception. We get meetings; when I was a startup CEO, I would have killed for how easy it sometimes is for us to get meetings, or easier than I would expect, because people are really hungry for impartial, no-BS advice on this topic. That certainly improves our effectiveness. Whereas if we were seen as pushing some specific agenda, that would not be the case.

SPEAKER_02:

Yeah, and let me mention one more thing in relation to your question, Gowri. We do think there is a place for for-profit organizations in what we're trying to get done, namely platforms that manage training data and make it audit-capable. Jai and I have operated in environments where SOC 2 is a requirement for cybersecurity. We believe there will be some kind of third-party audit mechanism that people and companies will want, to verify the efficacy of the training data and also that it was obtained in a legal manner. Those are the types of things where we see a for-profit opportunity. Jai spent some of his early years in music streaming and in digital rights management, and I think there's a big opportunity for that here; hopefully some of your listeners are working on something there.

SPEAKER_03:

So, since you're going state by state, there are going to be different speeds at which this operates; every jurisdiction operates differently. And the pace of the rollout that you referred to, Jai, affects the entire seven-billion-plus people of humanity, right? You can go to any country, any state, anywhere in the world, and there are some AI implications. As sophisticated as California may be, this could be an even more acute problem elsewhere. Have there been overtures from jurisdictions that are not US states, asking you to help them write, essentially, the AI constitution, if I can look at it that way? Because they may have the ability to do things that are otherwise not possible, since the cat is out of the bag in the US.

SPEAKER_05:

We pay attention, especially on the policy research front, to other like-minded organizations, especially in Europe, where a lot of this thinking is happening, and we work with some organizations. There's an organization called the Future of Privacy Forum, which is active globally, for example. That said (and it may have been you that told me this once, Gowri), more startups die of overeating than of starvation, and I think even nonprofits are not immune to that. So we have to be really focused, to demonstrate impact, to make sure that our donors feel like they're getting their money's worth. And where we can't do things ourselves, we have to either influence or partner. So that's our approach: influence. LinkedIn, honestly, is a powerful platform for influencing other people's thinking, because you can do it without actively engaging with them. That's why we produce a lot of content and reflect other people's content, so we're seen as a productive voice in this ecosystem. And through our link-ups, informal or formal, with other organizations, at least at this stage, that's how we expect to have influence more globally. Within the United States, the state-by-state strategy plus federal work is how we expect things to roll out. It won't happen overnight. I do expect this will be a two-to-three-year process of actually producing the right regulations and then figuring out a way to enforce them in a way that works for industry and works for humanity.

SPEAKER_03:

Right. So obviously patience is more than a virtue when you deal with government, right? Because there's going to be a change of administration, a change of the flavor of the day, whatever the case may be. Let me take a specific example. Say, in drug discovery, I think AI has an extremely significant role, compared to, say, writing poetry as good as the best poet's, which I don't care for. So through your education arm, how do you help so that it's not one-size-fits-all, where all training is treated as bad? Because if some researcher partnering with one of the large language models is able to cure cancer, or a group of them are able to cure cancer, it's worth the trouble, because they do have the data. So how are you guys separating the do-good AI from the AI that can be disastrous? How do you see it?

SPEAKER_05:

I think the answer is the verticalization of AI, or what some people call small language models: models that are more modular and focused on specific applications. Fundamentally, the ability to summarize Shakespeare and the ability to analyze protein chemistry don't need to exist in the same model. So you build a model that's purpose-built for the space. I'm also a mentor at CoMotion, and one of the things I'm exposed to frequently is AI startups from the Institute for Protein Design. These are using similar or the same technology, in some cases predictive AI, in some cases generative AI, but they understand their training data. In their case, they're focused on synthetic training data: they can't crawl the web to go look for protein chemistry. They have to figure out how to generate potential layouts, and they do that using synthetic data, and then they apply that to the drug discovery problem you're talking about. Using generative AI in that space is, I think, still an open research topic, but predictive AI is a much more well-understood space, and there are lots of highly successful companies that have already been created using AI, even in drug discovery. There have been companies in Seattle, starting in the late 90s, that used predictive AI in some aspects of drug discovery. So it's not a one-size-fits-all situation. Despite the fact that the same brand has been used (I'm sure I've pitched a few things to you, Gowri, or to Shirish, where I said, oh, this is AI, and what I meant was something very different, over the last 15-20 years), part of what's happening is confusion about what flavor of AI people are using, and how there are very beneficial and productive uses of AI already being developed, even using the underpinnings of generative AI, but not using the same fell-off-the-truck approach to acquiring training data.

SPEAKER_02:

And Gowri, just to add on to this a little bit, this is really what we're advocating for. In a real simple way, we're focused on training data and transparency around it: think of it as a nutrition label like you see on a cereal box. Is there personal information going into your model? Is there copyrighted information? Do you have a license to it? We're not trying to take any trade secrets; as I think most of your listeners know, the trade secrets are in processing the training data, curating it, and so forth, much more than in a high-level disclosure of whether you own it or have licensed it, whether personal information is going in there, and what your broad sources are. All of these big players have announced a few of their licensing deals, whether it's Reddit or so on, so they can't claim that all of their sources are trade secrets, because they've announced a number of them publicly.

The advantage of publishing broad strokes around your training data is, number one, you've got to think hard about whether you want personal information in there, or you should be curating it out, because you might be facing some lawsuits if you don't. Secondly, if training data is licensed and has a real cost, then what you get is organizations focused on small models and specific problems, like being super accurate at detecting stage four pancreatic cancer, which is the type of thing we want AI to do well. And you don't need to be able to write a Shakespeare play to do that.

Just as an example, the company I ran focused on detecting fraud on a global basis. The year we were acquired, we had mid-60s of revenue, 67, 68 million dollars, and we had 11 million dollars of data licensing cost. So besides headcount, data and IP licensing was by far the biggest part of our cost structure, 5x to 7x more than our AWS processing costs. Most companies that operate in AI license their training data and curate it heavily. And the other benefit of curation, and of realistic amounts of training data, is that when you've got a problem in the output, you can link it back to the inputs. In my business, every time our scores were skewed off in Southeast Asia or in Europe, it was because we didn't have as good a training data set there as we had in the US around fraud, so we went and worked with customers in those areas. When you suck in everything on the internet, and that model hallucinates in a chat room and tells someone to commit suicide, and they do, which has happened, and OpenAI says that's a hallucination: that's not a hallucination, it's a lack of accountability and loss of control of your training data, pure and simple. And there's no excuse for it, honestly. So that's why we're focused on transparency around training data, because it begins to narrow focus if you have some constraints around not sucking in everything that's ever been digitized to, you know, create the cyborg deity, basically.
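As a concrete illustration of the nutrition-label idea, a disclosure manifest might look something like the sketch below. The schema and every field name in it are invented for illustration; no such disclosure standard exists yet.

```python
import json

# Hypothetical training-data "nutrition label" for a model. All fields
# are invented for illustration; there is no standard schema today.
training_data_label = {
    "model": "example-model-v1",
    "sources": [
        {"name": "Licensed news archive", "license": "commercial", "share": 0.40},
        {"name": "Public-domain books", "license": "public_domain", "share": 0.35},
        {"name": "Synthetic data", "license": "internally_generated", "share": 0.25},
    ],
    "contains_personal_information": False,
    "contains_copyrighted_material": True,  # disclosed, with licenses listed above
    "opt_out_contact": "privacy@example.com",
}

print(json.dumps(training_data_label, indent=2))
```

Broad strokes like these (sources, licenses, rough shares, a personal-information flag) are the level of disclosure discussed above: enough for auditing and provenance questions, without revealing how the data is processed or curated.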

SPEAKER_03:

Rob, that's interesting, what you just said. All these organizations (I mean, they're very, very large companies) are very used to licensing everything. They have floors full of lawyers just doing this, right? So what happened that one morning they said it's okay to just suck it all in? Is it the FOMO, or did they see the rewards being so high that it was okay to pay the penalty later? What changed in the mindset? Because for the most part, I don't think any of these organizations or any of these entrepreneurs woke up one morning and said, we'll violate everything, when until yesterday they were doing it properly.

SPEAKER_02:

Yeah, so I don't know what the thinking was, but I can tell you what's been documented. Number one, at some point OpenAI just decided they were going to take a shortcut. And, you know, Jai mentioned that the CEO of the Allen Institute yesterday said big tech AI is quickly losing the public's trust because they're not telling anybody what they're taking into their training data. That decision was made, and for two or so years it was kind of under the hat, because no one knew what was going on. There are all sorts of crawlers out there to index data for search, so who knows whether the crawler on your website, or your book, or your digitized item, is indexing for search or sucking data in for training, until ChatGPT 3, or 3.5 or whatever it was, went live and people started seeing the output in a very tangible way. So there were two to three years of no one really knowing what was going on. And then you've got documented articles on internal discussions at Meta saying, well, we're going to violate copyright laws, but OpenAI's already doing it, so it's industry standard, let's just follow. And I don't think it's right. If you wrote a book, Shirish, and you are an author, I don't think that just because you'd like your book indexed for search, that necessarily means you want it reverse engineered so someone can write a book under your name sometime with 80% of the same content and call it their own. That's what I think is going on. And I'm extrapolating from a few data points; Jai and I don't have time to go research this. We're trying to be forward-looking. So, go ahead, Jai.

SPEAKER_05:

Sorry, there are whistleblower complaints like this that surfaced even just yesterday, on practices at various companies. So there are definitely people digging into it, and there are people of good conscience working in all these companies who don't like what's going on. But the profit motive sometimes drowns everything out.

SPEAKER_03:

So that opens up a whole can of worms, right? Are whistleblowers protected? I mean, there were two whistleblowers at Boeing, and they weren't even in the AI business. These are extremely, extremely powerful organizations today. Let's not pretend OpenAI is a startup; it's a Microsoft thing, in that they are the biggest investor, Anthropic is an AWS and Google company, and so on. The point is that the Transparency Coalition especially can benefit from whistleblowers saying what companies are or are not doing; your education can then be about why it should be legislated. Is there a way for whistleblowers to be confident that they will be protected, quote unquote? Yeah, we're obviously not experts in that statute or anything of that sort.

SPEAKER_05:

But luckily for us, there are people with consciences around. The former VP at Stability AI, a gentleman named Ed Newton-Rex, who we've met with several times: he quit his job because he didn't agree with the practices of data ingestion at his company. And there's someone at Microsoft, Shane Jones, who came out against some of this stuff. So there are brave people, and eventually we will all have to thank them for their service to humanity. But I think the important thing for us is that the things we talk about are grounded in our understanding of the problem. We surface these other messages, which may come at some human cost to the people speaking up, and they usually reinforce our own messages. And of course, recently there was also talk about clawback agreements at OpenAI, for example, where they could claw back equity from people who spoke out against the company. So every tool in the book is being used to keep people quiet. Obviously, we're all familiar with golden handcuffs; we love them when we have them. But at the same time, I think there are some things that are just too important.

SPEAKER_03:

And it's important that you mention the golden handcuffs. Let's just take Nvidia as an example, the least controversial of the lot at the moment; I'm not picking on them. They moved from 2 trillion dollars in total market value in February to 3 trillion now, and we're recording this in June of 2024. That's like swallowing up the bottom hundred countries; they could write a check and just buy the bottom hundred countries in the world. Is the reward so unprecedented for the few? It's not like everybody's enjoying it: Salesforce took a beating because they didn't have enough AI, while Nvidia is rewarded disproportionately. And the few are being rewarded so disproportionately, will we ever actually know what goes on inside these major models? Because of the golden handcuffs, I've not even imagined yet what such a reward would be. When you move a trillion dollars in a few months, is that even imaginable by average humans? Or even extraordinary humans?

SPEAKER_02:

I think our goal is to look out for the 99.9% of American citizens that aren't part of that, and they're really not represented at all. I don't know about Nvidia specifically, but I don't believe that Facebook or Google cares a minute about 99.9% of the population, not for a second. They care about themselves. And that's where elected officials and policymakers step in. There's precedent for this in the United States. The reason antitrust laws exist is that, in the 1870s and 1880s and 1890s, the railroads literally owned everything. I've read enough books: there wasn't a judge in the West that wasn't owned by Union Pacific, and there wasn't an elected official that wasn't owned, bought, and paid for. The whole reason there are ballot initiatives in the Western U.S. is that it was the only way to get around the Union Pacific stranglehold on every legislature and every judge. We also believe that the general population understands that big tech really doesn't care about them at all; it's just hard for them to know how to fight back against that.

The last thing I will say, though, is that one of the things our policy focus is around is maintaining innovation and entrepreneurship, and your listeners should know that. Because right now, if Jai and I started a company and found something really interesting to do in AI, it would necessarily require training data. And the way it goes right now, if we created something that looked like it had a good market, we would never get acquired. Our training data would be stolen, because there is no downside right now. I'm not joking, Gowri; you're laughing, but literally OpenAI or Google or Meta would take our training data, and there is nothing we could do about it, because there's no downside right now. They're taking anything that's digitized, behind paywalls, you name it. So an entrepreneur is literally just going to be run over and run out of business if they come up with a good idea in AI that has any type of large market usage right now, without transparency around training data and ownership and IP licensing requirements for training data.

SPEAKER_03:

So you touched on one thing, Rob: 99% of citizens care and want responsible AI, right? There's no question about it. So what can the average citizen do? You are educating legislators, but obviously the average citizen votes the legislators in, or does other things that will influence a legislator. Could you share how they can find information about the activities you're doing, and how they could participate in anything that could help you guys?

SPEAKER_02:

Yeah. Number one, we have a website. Our focus is direct work with policymakers and, a little more broadly stated, stakeholders in society, whether they're educators, or business executives outside of tech who potentially have a lot to lose, or anybody who's in the publishing business, broadly stated, whether it's written or musical or video. All of those types of folks are who we're focused on, so we aren't trying to take a broad citizen approach. There is the Center for Humane Technology; we're working shoulder to shoulder with them. They've got a much longer history, they put out The Social Dilemma on Netflix, which I think many folks saw, and they have more of a focus on citizen activation. But number one, citizens all should know a little bit more about their own privacy and how to protect it: opt out as much as possible of what's going on on your phone and so forth. Number two, though, like anything in a democracy like this country, write your elected officials at the municipal, state, and federal levels, and make your voice heard, because it's important, whether you're a parent, or a copyright holder, or just a concerned citizen, which Jai and I are. Those are some of the ways. At our size and footprint right now, we would love to have more active ways for people to get involved; we're just not far enough along. We only stood this up in October. That's true. Thank you.

SPEAKER_03:

So, Rob and Jai, it's been fascinating. I think we may have to have you guys back, because we are still at the beginning stages of your journey; once you have the first legislation through, we've got to have you as featured guests periodically so you can continue to educate our listeners on what to look for. Thank you for your time, and Shirish, back to you.

SPEAKER_04:

Great. Thank you, Rob and Jai. This has been a fascinating conversation. We wish you all the best in your legislative efforts and look forward to hearing from you at some future point.

SPEAKER_02:

Wonderful. Been a pleasure, thank you both.

SPEAKER_05:

Thank you so much for having us. Yep, cheers.

SPEAKER_03:

Thank you for listening to our podcast, From Startup to Exit, brought to you by TiE Seattle. Assisting in production today were Eesha Jain and Minee Verma. Please subscribe to our podcast and rate it wherever you listen. Hope you enjoyed it.