Genealogy of Cybersecurity - Startup Podcast

Ep 13. Zama on the Holy Grail of AI Privacy, Fully Homomorphic Encryption

September 12, 2023 Paul Shomo / Zama VP Benoit Chevallier-Mames Season 1 Episode 13

Innovation Sandbox finalist, cryptographer, and Zama VP Benoit Chevallier-Mames discusses Zama's efforts to bring fully homomorphic encryption (FHE) into commercial use: how FHE would allow application developers and customers to benefit from the insights obtained by sharing data with AI providers, like OpenAI or ChatGPT, but without exposing private data.

Benoit goes through some of the mathematical magic behind FHE, what ML approaches it enables, and some of its history. Benoit explains why fully homomorphic encryption has been such a performance challenge, and discusses Zama’s quantization approach. 

Finally, Benoit explains Zama's announced strategy: focus on securing blockchain smart contracts until cloud computing allows them to wield fully homomorphic encryption for the broader spectrum of AI use cases.

Zama can be found online at Zama.ai, on LinkedIn.com/company/zama-ai, or on Twitter @zama_fhe.

Benoit Chevallier-Mames can be found at Linkedin.com/in/benoitchevalliermames.

You can also watch this episode on using fully homomorphic encryption (FHE) to preserve privacy with OpenAI, ChatGPT on YouTube.

Send feedback to host Paul Shomo on Twitter @ShomoBits or connect on LinkedIn.com/in/paulshomo.

Well, that sounds like the Holy Grail: being able to let ChatGPT look at your private data without actually seeing the private secrets. Yes. So really, at Zama, we want to let people continue to use the great things about machine learning, but without having to pay the price of giving up their data. And this is doable. I mean, we already do it for a lot of machine learning models. And we know that it will be doable also for very complex models like LLMs, and we estimate that it's doable in the span of the next years. The Genealogy of Cybersecurity is a new kind of podcast. Here we'll interview notable entrepreneurs, startup-advising CISOs, venture capitalists, and more. Our topic: the problems of cybersecurity, new attack surfaces, and innovation across the startup world. Welcome. I'm your cybersecurity analyst, Paul Shomo. Benoit Chevallier-Mames is a cryptographer and current machine learning team lead at the startup Zama, which was a finalist at this year's Innovation Sandbox startup competition. I want to point out that cybersecurity is an international collaboration, and many of us work with people around the globe who speak with accents. Benoit has a pleasant French accent, but I want to fill you in on one word; there's a particular word here that, if you miss it, you might be confused about what we're talking about. It's just the acronym FHE, which stands for fully homomorphic encryption. FHE is actually the Holy Grail of encryption: it allows data to be used by third-party AI providers without giving up your private data. And Zama is of course working on their solution for fully homomorphic encryption, or FHE. When FHE is said very quickly with a French accent, it sounds like "effigy," but it's actually FHE. So I just want to make sure you were on the good receiving end of that and understood what we're talking about. Enjoy the interview. So, I've been working in the security industry for 20 years.
So I started to work in the smart card industry, in a company called Gemplus. There I was an engineer, but I also did a PhD in cryptography. I did that for 7 years. Then I went to Apple, where I was working on obfuscation and white-box cryptography, and it was amazing there. I spent 12 years at Apple, and now I have been at Zama for three years. Here I am the head of the machine learning team, so I guess we are going to discuss that a bit more in detail. Here at Zama, we are working on making open-source tools to make privacy protection easy, so users can protect their data. Sounds good. And congratulations on making Innovation Sandbox. You know, looking at the ten finalists in past years, the judges do a really good job of picking companies that become very successful in the future. So congratulations for making that list. Nice. Can I say a few words about it? Yeah, go ahead. Yes. So we have the pleasure to be very present at the next RSA. As you said, we are going to be among the ten finalists of the sandbox, which is great. Our CTO, who is also a famous cryptographer, will present the company. But we also have other talks: I am speaking with my colleague Jordan on the machine learning track, and we are going to explain how we can do privacy-preserving machine learning. That's one thing. And the other thing is that Marc, our chief scientist, is also presenting yet another paper, on the cryptographic track. So yeah, a lot of presentations coming at the same event. Very cool. For people going, you'll get a lot of interesting stuff from Zama.
So before we dive in, because you have this great background in cryptography and machine learning, before we go into homomorphic encryption, which I do want to talk about, I just want to understand from a very high level. I know Zama lets customers share business data with third parties while preserving privacy, and I know of course you do this with this kind of exotic encryption called homomorphic encryption. But just from a high level, I want to understand what kind of people Zama allows to share data. So my question is: do you have a platform for users to share data with outsiders? Or is the Zama platform for software developers to code applications that share data with third parties? So what we do currently is mainly oriented to developers. We are giving them some tools such that they can create applications where privacy is available by design, and it is a lot about open source. The developers can try our tools and make their own prototypes. Of course, we also give support; we have a lot of channels so that they can try things and ask questions, and we attempt to make the best products. In the end, we hope that this will make privacy-preserving applications directly available to the customers. One of the things we say often in the company is that we don't want the users to have to care about privacy; we just want it to be private by design. It's a very different world, where every company is becoming a software company, and now in cybersecurity, instead of securing IT, we're securing software developers, and we're securing data scientists and machine learning engineers. And so I guess my second question is: specifically, when your customers, the software developers, share data, the third party that you focus on sharing with is machine learning engineers at a third-party company, correct?
So one of the things which is very important for us is that, yes, we want the developers to make these privacy-preserving tools, but we know that they are not cryptographers. And we want to avoid that they make a mistake, or that the tools are too complicated for them to use. So for my specific case, the machine learning case, we are making a product which is called Concrete ML. Actually, the APIs of the tool we are making are very close to the APIs of the tools that our users, the data scientists, are already using. Just to give an example, we are very close to scikit-learn and to torch, which are, I mean, very standard frameworks. So it's almost the same thing; we just have a few more options to activate the FHE. In the end, this should make the adoption of these tools very easy for the developers. And we also avoid that they make mistakes, because, I mean, you know the news: if you don't know cryptography, it's very easy to make mistakes without realizing you are making a mistake before it's too late. Sure, absolutely. It's good to help the developers develop secure code from the start. I do just want to clarify, though: the only time your customers are sharing data that's undergone homomorphic encryption is for the purpose of sharing it with machine learning engineers and data scientists. They're not just, like, sharing files, ever, right? So what we do is that the developers take their models, so machine learning models, and convert these models into equivalent models which run directly over encrypted data. And then they can deploy, because it's one of the options in the tools we make: they can deploy these FHE models directly on the servers. And then their own users, instead of using the insecure machine learning models.
They can directly call the new models with data that they have encrypted with their own keys. The computation will be done in FHE (maybe we will explain a bit later how it works), and the result is encrypted at the end. It comes back to the client, which is the only one to know the secret key, so the only one able to get the final result in the clear. So I'm just trying to orient to the fact that you're sharing data with machine learning engineers and data scientists; that's who you target to share with. You're not just sharing with an HR person or an accountant. You're specifically sharing for people to do machine learning and data science on the data, correct? Yes. So yes, we allow doing data science on encrypted data. Before we dive into homomorphic encryption a little more, could you verify too: homomorphic encryption is one of the types of encryption that's considered still to be quantum-safe. Is that correct? Yes, it is. Yes, it is. Could you maybe explain, in 2023, what's the big deal with being quantum-safe? And why is it so important? So, cryptography has existed for a long time, and the rise of cryptography was maybe when RSA was invented. I would say it was invented at the end of the 70s, and thanks to RSA, it's possible to do digital signatures or, I mean, encryption safely on the Internet. But the problem with this kind of cryptography, what we would call classical cryptography, is that it's based on mathematical problems, like factorization or discrete logarithm, and these problems are known to be broken by quantum computers. So it means that when we are able to make quantum computers, and the research is making a lot of progress here, we will no longer be able to use these RSA or DSA algorithms.
So that's one of the important fields of research in cryptography: to find and analyze algorithms which are quantum-safe, which means that even when these computers become available, the cryptographic schemes will still be secure. And yes, the core problem underlying FHE is quantum-safe. So even though people don't have access to working quantum computing right now, I mean, I guess there's some debate. If that happens to be the case 5 years from now, everyone can decrypt the data you're encrypting now, unless you want to be an early adopter of some form of quantum-safe encryption, right? Yes. Things which are based on classical cryptography will be broken. But I don't expect quantum computers to be available in 5 years. What we may see is that they will be available to some, maybe governments, and so it will be broken by a government, not by everyone in the world. But still, yes, we know that it should come in the end. So we are preparing this transition. Besides the fact that homomorphic encryption is quantum-safe, the other thing that's very interesting about it is that you can share data and let a machine learning engineer or data scientist work on it without actually exposing the private data; somehow they do data science, give you back the answer, and the integrity of it all still holds up. Can you kind of help us understand that? Because that sounds kind of mathematically magical. So yes. When you say a machine learning engineer, actually, it's not a human who does that; it's an algorithm. But yes, the principle of FHE is the following: FHE is a scheme where, when you have encrypted values, you can add them without knowing the private key. Normally, with classical cryptography, once you have ciphertexts, so encrypted values, you can't manipulate them. I mean, it's encrypted. You can't do anything anymore but decrypt it.
With FHE, which stands for fully homomorphic encryption, you can still manipulate the encrypted values without knowing the clear, so the clear data, and without knowing the private key. You can add things. You can multiply things. And what is magical with the fully homomorphic scheme that we use at Zama is that there is an extra operation, which is called programmable bootstrapping, which is the equivalent of a table lookup. So here, when you have an encrypted value, you can apply the table of your choice on this encrypted value. And that's great, because it allows us to replace the activation functions, which are critical elements in machine learning algorithms, by tables. So we can replace virtually any machine learning model by its equivalent. Programmable bootstrapping is really a magical weapon for us. The only thing is that it's quite slow. So we are working hard on the math to make it as efficient as possible, to have fewer and fewer PBS, programmable bootstrappings. But we also know that with CPU or GPU, we will have a limit in terms of speed. So we are also working with hardware companies, and there are a lot of hardware companies which are interested in FHE, to make hardware accelerators. We expect by maybe 2025 to have the first very efficient hardware accelerators, and to be maybe 1,000 or 10,000 times faster. I'm sorry, FHE, fully homomorphic encryption, is what that stands for. And there's a few different kinds of homomorphic encryption, but fully homomorphic encryption is the best of them. Exactly. So a bit before, I mentioned classical cryptography. And when classical cryptography was invented, there also came some schemes which were called partially homomorphic encryption, which means that they were able to manipulate ciphertexts, but for a single kind of operation. So for example, you were able to add ciphertexts together, but not multiply them. And it was invented in the 90s, notably by one of the founders of Zama.
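That additive-only property can be illustrated with a toy Paillier-style scheme, where multiplying two ciphertexts adds the underlying plaintexts without ever touching the private key. This is a minimal sketch with insecure toy parameters, not the lattice-based scheme Zama actually uses:

```python
# Toy Paillier-style additively homomorphic encryption.
# Tiny parameters for illustration only: NOT secure.
import random
from math import gcd

p, q = 1789, 1931              # toy primes; real deployments use huge primes
n = p * q
n2 = n * n
g = n + 1                      # standard Paillier generator choice
phi = (p - 1) * (q - 1)
mu = pow(phi, -1, n)           # inverse of L(g^phi mod n^2) = phi (mod n)

def encrypt(m: int) -> int:
    r = random.randrange(2, n)
    while gcd(r, n) != 1:      # r must be invertible mod n
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    # L(x) = (x - 1) // n recovers the exponent of g = 1 + n
    return ((pow(c, phi, n2) - 1) // n) * mu % n

# Multiplying ciphertexts adds the plaintexts; no private key needed.
c = encrypt(20) * encrypt(22) % n2
print(decrypt(c))              # -> 42, computed on encrypted values
```

Whoever performs the ciphertext multiplication learns nothing about 20, 22, or 42; only the holder of the decryption key sees the result, which is the client/server split Benoit describes.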
It's a very well-known additive scheme, but fully homomorphic encryption, so for several operations, was believed to be very hard. For a long time, they were not able to find a scheme.
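The "activation function as a table lookup" idea behind programmable bootstrapping, described earlier, can be sketched in cleartext. This only shows the quantize-then-lookup side; in real FHE the lookup is applied to an encrypted value, and the 4-bit width and [-4, 4] range here are illustrative assumptions:

```python
# Cleartext analogue of replacing an activation function by a lookup table,
# as programmable bootstrapping does over encrypted values in FHE.

def make_table(f, lo, hi, n_bits=4):
    """Precompute f over 2**n_bits evenly spaced quantization steps."""
    steps = 2 ** n_bits
    scale = (hi - lo) / (steps - 1)
    return [f(lo + i * scale) for i in range(steps)], lo, scale

def apply_table(x, table, lo, scale):
    """Evaluate the tabulated function: quantize x, then one table access."""
    i = round((x - lo) / scale)           # quantize x to a table index
    i = max(0, min(len(table) - 1, i))    # clamp to the table range
    return table[i]

relu = lambda v: max(0.0, v)
table, lo, scale = make_table(relu, -4.0, 4.0)
print(apply_table(-1.3, table, lo, scale))  # -> 0.0 (ReLU of a negative input)
print(apply_table(2.1, table, lo, scale))   # ~1.87, the nearest 4-bit step
```

The quantization error (2.1 comes back as about 1.87) is the trade-off of the table approach, which is why the bit width matters so much for both accuracy and FHE performance.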