Genealogy of Cybersecurity - Startup Podcast

Ep 14. Privacy is in the Code: Relyance AI's Solution for DevOps Data Flows

September 25, 2023 | Paul Shomo / Relyance AI Founder Abhi Sharma | Season 1, Episode 14

Host Paul Shomo and Innovation Sandbox finalist Abhi Sharma, founder of Relyance AI, discuss privacy and compliance in a world where every company is a software company and DevOps code produces so many data flows carrying your private and regulated data. Abhi points out that a privacy solution must govern DevOps: “privacy is in the code.” 

Abhi discusses NLP, LLMs, OpenAI, and ChatGPT, and how Relyance AI’s intelligence understands privacy clauses in compliance documents, contracts, SLAs, etc., and, having shifted left into static code analysis, understands whether code is violating these privacy responsibilities. Paul and Abhi discuss how generative AI and NLP have sped up Relyance’s delivery of functionality. Paul presses Abhi on how they’ve built a product with so much functionality in such a short time, and Abhi has an interesting response as they discuss AI and the future of software development.

You can find Relyance AI at Relyance.ai, on Linkedin.com/company/relyanceai, or Twitter @relyanceai

Founder Abhi Sharma can be found on Linkedin.com/in/abhisharmab or Twitter @abhisharma_b.

Send feedback to host Paul Shomo on Twitter @ShomoBits or connect on LinkedIn.com/in/paulshomo.

It looks to me like you're doing a lot for a startup. You're doing code and API analysis, and it kind of seems like your solution might understand contracts, compliance, and policies.

Yeah, that's a great question. I think there are a couple of reasons for that. And one at the very top of the list is that we have hit a pretty interesting turning point in the world of AI and machine learning, just from an NLP point of view. And I would say it's accelerating extremely fast with just what we have seen in the last month with large language models. So I think that is definitely, talk about building on the shoulders of giants, 100%. That's number one up there.

The Genealogy of Cybersecurity is a new kind of podcast. Here we'll interview notable entrepreneurs, startup-advising CISOs, venture capitalists, and more. Our topic: the problems of cybersecurity, new attack surfaces, and innovation across the startup world. Welcome. I'm your cybersecurity analyst, Paul Shomo.

Yep. I am Abhi Sharma. I am the cofounder and co-CEO of Relyance AI. I've spent most of my career building tech. I'm originally a compiler and machine learning enthusiast, and I'd say a product builder. My journey is mostly a traditional one, from engineer to entrepreneur. I've been building startups, or products, all my life, and Relyance is definitely one of the most interesting, most cross-functional, and also fastest-growing projects that I've been a part of.

I love that background of going from engineering to founder. And I also want to congratulate you on making the Innovation Sandbox finalists. I mean, it's a great promotional platform and a great validation of what you've already done.

Thank you. I appreciate it.

So I definitely want to get into your product and company, but first I want to talk about how, you know, new malware families and vulnerabilities don't usually cause a new startup or new product category to arrive.
In my experience, startups typically come about to protect or govern a whole new attack surface, right? And we know almost all companies are developing code right now, and that code accesses, moves, and stores data. Can you help us understand this intersection of code and data as a privacy and security concern?

Yeah. Yeah, 100%. And to your point about what provides the perfect Petri dish for something like Relyance to exist, I'll step back a little bit. From a macro point of view, there are three major shifts going on in the world. One, tech in general is becoming heavily regulated. It's a combination of people building more code with more data, and that wasn't the case before. FinTech and healthcare tech used to be regulated, but now, with the advent of AI, I think there's this excitement, shock, and nervousness about tech in general. Second, in society there's also an elevated sense of privacy; the average individual cares about it. Ten years ago, when you used an app, you were so excited that you could communicate with somebody that you didn't really think about what was happening with your data. Today everybody's worried about their daughters being on TikTok and what happens to that data. We had a congressional hearing about it, a first of its kind, on whether to actually ban it here. And the third is that everybody wants to do machine learning. I spent most of my career, you know, working in machine learning before it was cool. And now you won't find a single company at RSA that isn't doing something with machine learning, right? And against that backdrop, you have some of the most historic, tectonic legal shifts around regulation.
I think the combination of all those directional vectors has generated the need for a systematic, thought-out solution for privacy compliance, from a regulatory point of view but also just as good data engineering, because customer trust is at its lowest point. And our thesis at Relyance was that whether you look at this from a data security posture management point of view, a data catalog point of view, or a pure regulatory privacy point of view, the foundational element at which data processing happens is the code. We were putting all these different bells and whistles around data, data usage, and data protection, and there are tons of cybersecurity products, but we seem to have forgotten that all data processing starts with code. Data is a side effect of code, not the other way around. At least not yet. So those were the conditions, and obviously an important factor was customer trust being at its lowest. So we took what I would call a contrarian approach, with a very interesting intersection of a privacy lawyer and a compiler nerd, which is my cofounder and me, meshing our brains together to flip the approach and strategy on how things are being built. We start from the source of truth and then build up to your requirements, your contracts, your policies, your regulatory requirements, and your attack surface, not only from a security standpoint but also from a trust standpoint. For example, typically when people talk about attack surface, they're like, okay, hey, Paul, have you left this port open? Or is there an open S3 bucket that's publicly available? But there's also a trust attack surface, which is becoming super important now: hey, Paul, did you use my email address for a purpose you did not collect that email address for?
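To make the "trust attack surface" question concrete, here is a minimal Python sketch of a purpose-limitation check. It assumes a hypothetical inventory of observed data flows, each tagged with the purpose the code uses the data for, and a hypothetical map of the purposes each data element was originally collected for. The names `DataFlow`, `purpose_violations`, and the example services are invented for illustration; this is not Relyance's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    """One hypothetical observed flow of a data element from a source to a sink."""
    data_element: str  # e.g. "email_address"
    source: str        # service or table where the data originates
    sink: str          # service, vendor, or table where it ends up
    purpose: str       # purpose the code uses the data for


def purpose_violations(flows, collected_purposes):
    """Return flows whose purpose is not among the purposes the data was collected for."""
    violations = []
    for flow in flows:
        # An element with no recorded collection purpose gets an empty set,
        # so every use of it is flagged (a conservative default).
        allowed = collected_purposes.get(flow.data_element, set())
        if flow.purpose not in allowed:
            violations.append(flow)
    return violations


collected = {"email_address": {"account_login", "transactional_email"}}
flows = [
    DataFlow("email_address", "signup_service", "auth_db", "account_login"),
    DataFlow("email_address", "auth_db", "ad_partner_export", "marketing"),
]

for v in purpose_violations(flows, collected):
    print(f"{v.data_element}: {v.source} -> {v.sink} used for '{v.purpose}'")
```

Only the second flow is flagged, because the email address was never collected for marketing. A real system would have to discover these flows and purposes from the code itself, which is the hard part the sketch assumes away.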
And that is also an attack surface from a PR, customer trust, and data usage standpoint. That's where things start to get very interesting, because code is changing a million miles an hour, every single day. You have to be attached to that surface so that every change in data processing gets mapped, and you can reason about it and control it. That's the general thesis of Relyance, and people have really resonated with that approach.

I think the data life cycle gets really stretched with code. I do want to ask for one clarification. The code that you focus on securing is what your customers are developing, right?

Correct. The way we say it at Relyance is that we help you match the speed of your privacy operations and data governance operations to the speed of your DevOps. And the code that we are talking about is the products, applications, ML pipelines, and services that our customers build, plus any other code they borrow or use, even publicly available pieces from the wider ecosystem. But yes, that's what we're doing.

You have this great term I see on your website: data flows. So code is accessing, moving, and sharing data, and you're describing this as data flows. Could you give us some examples of data flows?

Yeah. By data flows, what we essentially mean is what the industry also knows as lineage, which is the origination of data all the way from source to sink. And what happens, especially in the data and cybersecurity industry, is that anything around lineage has been mostly constrained to what we would call table-to-table lineage: I have six tables, and this is how they're related. I have these databases, and these are the data types in those systems.
When we talk about data flows, we uplevel the conversation to the point of view of your different product components, microservices, and monoliths flowing data across your organization for different purposes and intents. That could be for sales and marketing, or it could be for the applications you build. So the data flow concept is really the lineage of data through your application topology, but upleveled all the way to where you can reason about it from a business entity and product standpoint, because some regulatory obligations apply at that level. So that's kind of what we mean by data flow.

And just to give a high-level picture: say I'm, you know, a chief privacy officer, which I think is what a lot of companies call it, brand new at a Fortune 500 company. From a mile-high view, I'm drawing on the whiteboard in my office all the assets and data flows I need to govern. What does that mile-high view look like?

Yeah, visually, if you look at it top down, it looks like: what are my business entities at this company, and in which geolocations do I do business? Then, what products and services do I sell in each location? It could be the same product, but different data governance structures might apply when you sell it in the EU versus in North America or Brazil. And then the third layer of abstraction is: when I talk about those products and services, what databases, cloud warehouses, internal microservices, systems, and components stitch them together?
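The layers described here (business entity and geography, products and services, and the systems that stitch them together) can be pictured as a nested inventory that you zoom in and out of. The following is a minimal Python sketch, assuming a hypothetical flat inventory of (entity, region, product, system) rows; the company, product, and system names and the helper functions are invented for illustration and do not reflect any real product's data model.

```python
from collections import defaultdict

# Hypothetical inventory rows: (business_entity, region, product, system).
INVENTORY = [
    ("Acme Inc", "EU", "Payments App", "checkout-service"),
    ("Acme Inc", "EU", "Payments App", "eu-postgres"),
    ("Acme Inc", "North America", "Payments App", "checkout-service"),
    ("Acme Inc", "North America", "Payments App", "us-warehouse"),
    ("Acme Inc", "Brazil", "Payments App", "checkout-service"),
]


def build_topology(rows):
    """Nest flat rows as entity -> region -> product -> set of systems."""
    topo = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for entity, region, product, system in rows:
        topo[entity][region][product].add(system)
    return topo


def systems_for(topo, entity, region=None, product=None):
    """Zoom to any layer: None for region or product means 'all of them'."""
    found = set()
    # .get() avoids creating an empty entry for an unknown entity.
    for reg, products in topo.get(entity, {}).items():
        if region is not None and reg != region:
            continue
        for prod, systems in products.items():
            if product is not None and prod != product:
                continue
            found |= systems
    return found


topo = build_topology(INVENTORY)
print(sorted(systems_for(topo, "Acme Inc", region="EU")))  # EU view only
print(sorted(systems_for(topo, "Acme Inc")))               # mile-high view
```

Passing only the entity gives the mile-high view; adding a region or product narrows it by one layer, so the same product can map to different systems (and therefore different governance rules) in different regions.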
If I'm a privacy officer trying to govern the data flows within my organization, I need the ability to look at my data flow topology across all three of these layers of abstraction, and to go up and down those layers based on the question I'm trying to ask. I think that's the key difference. Then sometimes you want to slice and dice by department, by processing activity, or by risk, where you say, okay, what is my employee data flow, versus my marketing prospect data flow, versus my customer data flow? So there are the vertical layers of abstraction I described, from business entity down to microservices and databases, and then there are these horizontal slices, where you look at the same information but ask the question from a data subject, department, or data processing type view, if you will.

So I mentioned before that some startups arise because of a new attack surface, though I guess attack surface isn't always the best way to say it, because that implies directly defending against threats. It's a new thing you have to govern, so a regulatory framework would probably be more applicable to you. But I know another reason is that new implementation technologies, like AI, come along that allow you to build solutions that just couldn't have been accomplished in the past. Now, from the outside, it looks to me like you're doing a lot for a startup. I can tell you're doing code and API analysis and natural language processing, and it kind of seems like your solution might understand contracts, compliance, and policies. How are you able to do so much? Is that because of advances in AI? Could you help me wrap my mind around that?

Yeah, that's a great question. I think there are a couple of reasons for that. And one at the very top of the list is that we have hit a pretty interesting turning point in the world of AI and machine learning, just from an NLP point of view.
And I would say it's accelerating extremely fast with just what we've seen in the last month with large language models. So I think that is definitely, talk about building on the shoulders of giants, 100%. That's number one up there. The second point is a little more of an interesting mixture of founder expertise coming from different sides, because most startups are a reflection of the people who start them, and I think we have that going. So the second thing for us is that I spent a lot of time writing compilers, and we have a specific team that's very focused on this. So we were able to borrow the idea of treating privacy and data protection as a constraint solver problem. The constraints are the laws, your policies, and your data usage posture, and what you're solving for is the operational reality of how you're actually using the data, which is kind of how a compiler works, and it's not how a lot of people think about it. And then when it comes to contracts and policies, we have that specific