The AI Fundamentalists

Truth-based AI: LLMs and knowledge graphs - back to basics

May 31, 2023 · Dr. Andrew Clark & Sid Mangalik · Season 1, Episode 2

Truth-based AI: Large language models (LLMs) and knowledge graphs - The AI Fundamentalists, Episode 2


Show Notes

  • What’s NOT new and what is new in the world of LLMs. 3:10
    •  Getting back to the basics of modeling best practices and rigor.
  • What is AI and subsequently LLM regulation going to look like for tech organizations? 5:55
    • Recommendations for reading on the topic.
    • Andrew talks about regulation, monitoring, assurance, and alarm.
  • What does it mean to regulate generative AI models? 7:51
    • Concerns with regulating generative AI models.
    • Concerns about the call for regulation from OpenAI.
  • What is data privacy going to look like in the future? 10:16
    • Regulation of AI models and data privacy.
    • The NIST AI Risk Management Framework.
    • Making sure it's being used as a productivity tool.
    • How it's different from existing processes.
  • What’s different about these models vs old models? 15:07
    • Public perception of new machine learning models vs old models.
    • Hallucination in the field.
  • Does the use of chatbots change the tendency toward hallucinations? 17:27
    • Bing still suffers from the same problem with their LLMs.
    • Multi-objective modeling and multi-language modeling.
  • What does truth-based AI look like? 20:17
    • Public perception vs. modeling best practices
    • Knowledge graphs vs. generative AI: ideal use cases for each
  • LLMs have an interesting potential application: a plugin library model. 23:00
    • Pairing vetted knowledge graphs with a conversational interface.
    • The benefits of a plugin library model.
  • What’s the future of large language models? 25:35
    • Practical uses for LLMs and knowledge databases.
    • Predictions on whether chat remains the standard interface.
    • Finding ways to make LLMs useful.
    • Next episodes of the podcast.



Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

  • LinkedIn - Episode summaries, shares of cited articles, and more.
  • YouTube - Was it something that we said? Good. Share your favorite quotes.
  • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
Transcript

The AI Fundamentalists, a podcast about the fundamentals of safe and resilient modeling systems behind the AI that impacts our lives and our businesses. Here are your hosts, Andrew Clark and Sid Mangalik.

Welcome back for episode two. I'm Susan Peich, and I'll be your host for this episode. I'm also here with Sid and Andrew, our fundamentalists. Sid?

I am a research scientist at Monitaur, and I'm currently working through a PhD in NLP and psychology.

Hi, I'm Andrew. I'm the CTO and co-founder of Monitaur.

Today we're talking about LLMs, and specifically what we're going to dig into is knowledge graphs. As a use case, a lot of people are talking about internal knowledge databases and things like that, and using LLMs for them. We're going to break that apart, show that some of this has already existed, where you can level it up with LLMs, but also where existing technology still has a use. At the heart of being the fundamentalists, it's so important that we're bringing the topic of LLMs back to its roots. Before we get into it, let's acknowledge where we are in the news cycle for the sake of this discussion.

Yeah, there's been some really interesting stuff that's come out over even just the last few weeks. Obviously, we've all been playing with ChatGPT for a while, but now we're seeing companies make efforts toward letting us, tech users and non-tech users alike, see this in our products and in our lives. One place we saw this was Google Cloud with their Vertex AI: they've made a platform where they'll give you access to models like PaLM 2, and they want to help you train your own version of the model. So we're really seeing companies push to make it easier for everyone else to use these models and really get in depth with LLMs.

I've also really liked that more people are putting out blog posts and articles on how to use it and where to use it. There's been a nice feedback loop: when it came out, everybody said, oh, this is going to solve all business issues, it's a panacea for productivity. I'm liking the trend back; we'll link a couple of articles in the show notes that show where it's practically useful, narrowing that scope. This tool is not going to solve everything, but it is definitely useful. What we'll get into later in this conversation is the parts of LLMs that have been around for a while and, as we talked about last time, are already integrated into your daily life. I'm happy we're seeing more of that distinction: here's where it's going to be useful going forward, here's where there's nothing really new to see, and here's where it may not be the best use case. The conversation has definitely been getting better over the last couple of weeks, I think.

Yeah, we definitely have to spend some time on both of those: basically, what's not new and what is new. And something we can talk about a little later, but what is new is the plugins, and how those might actually shape how we see this being used in the real world, in regulated settings.
Yeah, it's been really interesting to see how quickly the tone changes. Like Andrew said, one of the reasons we started The AI Fundamentalists was to be that voice of rigor. When I hear you two talk about your experiences and what you see whenever we pull up these news stories, it comes back to getting back to the fundamentals behind these models and these plugins. And why do we see these tone changes? It starts with, yeah, we want to use this, but now we really need to think about it, because of what we're seeing in the news, what we're seeing in the market, and even the challenges we're seeing among the data scientists and engineers in the model-building community. So with that said, let's talk about that for a bit. What's our topic for today, and what were some of the stories that inspired it?

Definitely. We wanted to talk today about LLMs in general: their use cases, how we see them, where the signal is in the noise, how they can be regulated, and what kind of monitoring we would have in place. We're going to talk a little bit about hallucination, which is one of the big problems with LLM-type models. We'll talk about a couple of areas where people say LLMs could be useful but where we already have existing technology that works, and where we see them fitting into the landscape. Then next podcast we'll get into that first-principles approach we've talked about, which is really breaking it all the way down: what is data? How do you build a dataset properly? What data management policies do you want in place? We'll try to tee that up a little in this conversation. Initially we'll walk through simpler models than LLMs, but we will eventually get back to this level of modeling, and hopefully by that time we'll have more of that deep-rooted understanding of how you should properly train your model. Maybe don't just expose it to everything on the internet and think you're now going to have this great knowledge engine; actually, systematically build proper data. So today's call is really that overview of LLMs, since they're a major part of the news right now: our take on them, where we see them as useful, and the pitfalls.

Yeah, and to that end, I want to give some recommended reading on this topic, because while this isn't a solved problem, people have started to spend some time thinking about what this is going to look like, especially for tech organizations trying this type of work for the first time. Chip Huyen's "Building LLM Applications for Production" works through some really nice, fine-grained details about what this looks like end to end: what will it mean for server costs, whether your use case could be a good fit, how you're going to generate embeddings efficiently; really covering the full pipeline. And then a recent survey, Yang et al., "Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond," is a great read. It covers, for outsiders, what these technologies do at a high level and what they don't do, and it starts to address a little of this hallucination thing we're going to talk about.
But I'd like to hear what Andrew has to say about the regulation, monitoring, and assurance space before we start digging into why this might be challenging. What do we want that to look like?

Definitely, and thanks for that, Sid. On those recommended readings and the things they point out: it's really great to see that people are starting to think about how we actually use these. Unless you're just using ChatGPT as-is, and for enterprises that's most likely not going to be a thing, as you've seen in the news coming out of Italy and other places, and JPMorgan putting a stop to using OpenAI's ChatGPT, if you actually want to leverage these technologies, you need to start thinking about the actual deployments. It isn't just a magical box; you have to figure out, how are we going to do this? And that's what really gets into regulated, monitored, assured. Of course anything could be regulated, but one of my concerns with what we're talking about now is just "regulate generative AI models." Okay, well, what does that mean? What are we regulating? Why are we saying only certain people can build them?

What I'm a little concerned about in the regulation conversation at the moment is OpenAI. A lot of people think it's ironic that they're asking for regulation. Well, it's not really, because they're very, very well funded, and they want barriers to entry against other people who could operate at their size and scale. Put regulations in place, and they can hire the people to perform whatever regulation steps you need. So I'm a little concerned about making new generative AI regulations right now without knowing why we're regulating and for what purpose. For instance, a lot of the meat and potatoes people want regulated in LLMs is data privacy: how do I know my data is not being used in these algorithms? How do I know they're being truthful, and different aspects like that? How do we know we're not exploiting individuals? Well, we have existing data regulations in some states; maybe we start talking more about a US-wide, GDPR-type scenario for data privacy. Maybe that's going to be more impactful than generative AI legislation, because that's a very broad category, and unless there's a lot of discovery and a lot of input there, I don't know that Congress even knows what that means. And if you have OpenAI saying, I want it this way because it's going to benefit me, that's not really benefiting the American people or making us safer, right? So I'm a little concerned about regulating the technology versus regulating the use cases, or putting those guardrails in place. Also, if we're talking about consumer use cases, you have the FTC and other agencies, and they've made it very well known: if you say you're doing all these things, we're going to come at you; we have existing regulations that can do that. So I'm just a little hesitant about "let's regulate LLMs for LLMs' sake" versus asking what we're actually trying to solve. There are a lot of real privacy concerns here, so I'm not trying to dismiss that at all.
It's more a question of: let's break this down to the first principles, the fundamentals, of what we're trying to accomplish. Are we talking about individual privacy? Are we talking about making sure people aren't being exploited? Let's hammer in on those. If we need to give the FTC a few more teeth, or if we need to expand some data privacy law, let's talk about those aspects rather than a blanket regulation of generative AI, because that could also slow innovation. I'd love to hear Sid's thoughts on that.

Yeah, that's great, and I think this really touches on the underlying problem here, which is that OpenAI makes ChatGPT, makes it public, and then immediately turns around and says, well, we need to regulate this. To practitioners, this is a little transparent: it's closing the door behind yourself after forcing it open. So we're in a situation where the major players are pushing for regulations that will look more like, are you allowed to release these models the same way that we have? What Andrew is saying, and what we probably need to do more of as practitioners, is think about what data privacy is going to look like for business models: what is going to be valid and fair training data, what are going to be allowable and permissible questions and prompts to give to these servers, and what are going to be acceptable answers for these types of systems to give. Those are questions the major players in this field aren't going to be as interested in; they're going to be a lot more interested in who is allowed to play in this space.

Great point. That's where it behooves us to take a step back, think about what we're trying to accomplish, and get past the hysteria of, oh my, there are these things happening. For companies trying to deploy these algorithms responsibly, you already have the NIST AI Risk Management Framework, which we can do a full deep dive on at some point; it's something we work with a lot. So there's already an overarching framework for how you would approach this and what good policies you should have in place for models. Beyond NIST, the OCC has some great guidance. It's currently written for regulated banks, but it's the gold standard for model risk management; NIST took some aspects of it, and I would even go back to the OCC's guidance as what we should be doing. Once you have these frameworks, you can assure something: you can do an audit and periodically have independent people validate that you're performing the controls and mitigating the risks you should be. And of course monitoring flows into this. As Sid was mentioning, for an enterprise, or even a country, we want to make sure we're only using these algorithms for non-consequential things, or to help us be more productive at work. We want to make sure it's not answering how to build a nuclear bomb, that kind of thing; we want it very much in the right spaces. So as a company, if I'm deploying this as a productivity tool, I can actually put guardrails on it: it can only accept these kinds of prompts.
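To make that guardrail idea concrete, here is a minimal sketch of what such a prompt filter might look like. Everything in it is hypothetical: the allowed-topic list, the toy classify_topic helper, and the llm_call parameter are illustrative stand-ins, not any real product's API.

```python
# Hypothetical prompt guardrail: only forward prompts whose topic is on an
# approved allow-list; everything else gets a refusal instead of an LLM call.

ALLOWED_TOPICS = {"presentation_outline", "summarization", "email_drafting"}

def classify_topic(prompt: str) -> str:
    """Toy keyword classifier; a real deployment would use a vetted model."""
    keywords = {
        "outline": "presentation_outline",
        "summarize": "summarization",
        "email": "email_drafting",
    }
    for word, topic in keywords.items():
        if word in prompt.lower():
            return topic
    return "out_of_scope"

def guarded_completion(prompt: str, llm_call) -> str:
    """Wrap any LLM callable so only approved productivity prompts reach it."""
    if classify_topic(prompt) not in ALLOWED_TOPICS:
        return "Sorry, this assistant only handles approved productivity tasks."
    return llm_call(prompt)
```

The same wrapper is also a natural place to log prompts and responses for the input-drift and output-drift monitoring discussed next.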
I can also still do my traditional input-drift and output-drift monitoring on what goes into and comes out of this model. If our prompts are primarily around, say, making outlines for business presentations or improving summarization, we can make sure those are the prompts going in, and that the outcomes align with some sort of framework.

Yeah, so I actually have two questions based on what you said before, and we might get into this later. When you were talking about regulation, how is this overall different from some of the processes we've experienced before in either model risk management or model governance?

That's a great point, and I'd love Sid's thoughts on this as well. I think this is one place where, in the hysteria and keeping up with the Joneses among the big tech companies trying to deploy these things, we've lost a little bit of sight. There are some new things, and Sid can help elaborate on what those are, but there's a lot of the existing: we're still building models. There's basic blocking and tackling, there are correct approaches we should follow, there are risk management steps, validation steps, and existing processes you still go through. You're still building a model, and you still need a reason why you built it. How are you deploying it in an enterprise setting? If you're using a large foundational model and tuning it, as Sid was mentioning earlier, you still need to know why you're doing it and what the intended use case is. No matter what OpenAI says, and I am a supporter of large language models, we're not at an inception moment; it's not thinking, and it's not an intelligent human being. You still need to know what guardrails to put in place. So I think there's actually less new here than people assume. There are definitely new technologies, but the existing approaches still apply; you don't throw existing risk management out the window.

Yeah, and I think that's exactly right. So let me dig in a little here. There's this public perception that these models are basically new and different, that they give correct, meaningful, thoughtful answers to problems, and that you should take them at face value, because the responses look extremely coherent. So what's different about these models versus old models? I would say it's mostly people's perception of their abilities. With older GPT and GPT-2 type models, you deployed them and you could make one write a nice article about a unicorn in a field, and people said, wow, this is great and really interesting, but I wouldn't take legal advice from it. Where that comes from is the disconnect between how we think these models work and how they're actually designed to work. The same model that wrote about the unicorn in the field, when you ask it questions from your precalc homework, is only trained and optimized to give answers that sound good to you. These models are not optimized to give correct answers. You may have heard this term floating around, but the word we use in the field is hallucination.
And the idea of hallucination is basically that there's an unjustified belief the model has cooked up based on the training that's been given to it. These hallucinations, similar to how we might think of them in psychopathology, are ungrounded assertions or beliefs that weren't actually seen in the real world. And what is the real world to a model? Its training data. What is the real world to humans? What comes in through your sense organs.

Just a quick clarification on that: with NLP, or any type of machine learning model including generative AI, the model is optimized over something; it's trying to perform some task. Traditionally in machine learning, regression or any type of modeling, if you're doing classification, predicting between cat and dog, it's trying to optimize the accuracy of predicting cat versus dog. In generative AI, and this is where the disconnect is, it's optimized to look as much like a human as possible when it's developing content. That's where this hallucination gets really scary: it's made to look as plausible as possible, but it doesn't actually know what it's doing; it's just making something up. If anybody spends a couple of minutes with these tools and starts asking basic things like, what year did Lincoln become president, it's probably going to give you wrong answers, at least if you do it consistently enough, but it's going to make them sound like a human responding.

And does that change now that these models, these LLMs like ChatGPT, are enabled to go out and access the internet directly and scrape some of these answers? Does that change anything about the hallucinations or the tendency toward them?

Yeah, I think we had really hoped this would be some inflection point. If you want to see an example, you can go right into Bing, which was given the ability to make Bing searches, and for any answer it gives you, it gives you a citation. However, Bing still suffers from the same problem: it's not trained for correctness, it's trained for comprehensibility. The responses it gives have still been shown to be incorrect, and have still been shown to need these really interesting guardrails: it'll give you a response and then delete it and say, sorry, I can't answer that, or it won't even be allowed to have a conversation longer than 20 turns with you. These guidelines exist because it hasn't solved the underlying problem, and that's the optimization problem. I'll even pull a line here from Stephen Wolfram, of Mathematica fame. This is about ChatGPT, and it's relevant here: ChatGPT, for all its remarkable powers at textually generating material like what it's read from the web, can't be expected to do actual nontrivial computations, or to systematically produce correct, rather than just "looks roughly right," data. And that's really what's happening here: we're doing next-word generation, creating the best answer such that a human will say, I agree with this, this is what a human would say. We're not creating knowledge-based, truthful answers.
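As a small illustration of that optimization gap, here is a sketch that uses an open model (GPT-2, via the Hugging Face transformers library) to complete a factual-sounding prompt. Nothing in this pipeline consults a knowledge source; the model simply extends the text with whatever continuation it scores as most plausible, which is exactly the failure mode described above.

```python
# A language model extends text by likelihood, not by factual accuracy.
# There is no truth check anywhere in this pipeline: fluent output is the
# only objective, so confident-sounding wrong answers are entirely possible.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator(
    "Abraham Lincoln became president of the United States in the year",
    max_new_tokens=8,
    do_sample=False,  # greedy decoding: the single most plausible continuation
)
print(result[0]["generated_text"])  # may or may not say 1861; nothing verifies it
```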
Great point. Yeah. And we've talked about it before, and we'll have a whole podcast about it at some point: multi-objective modeling, where you can choose multiple objectives. This is not a common approach even in traditional modeling, where it's an easier thing to accomplish: we can choose to optimize a model over fairness, performance, or any other criteria. Theoretically we could make large language models be optimized for truth as well, but then how do you define truth across every domain and spectrum? It's going to be very hard to look up all that information.

Yeah, and to that point, great question: what does truth-based AI look like? Well, ironically, this is actually something that's been around for a long time, and this is where the public perception differs from what people use in modeling circles and in academia. A lot of the use cases people want generative AI for are, honestly, a solved problem: knowledge graphs. We're bringing knowledge graphs back; that's the hashtag for this week's podcast. Neo4j is a graph database that's used for knowledge graphs, and a lot of companies have been using these sorts of things, where basically you use graph theory to create an interconnected web, a knowledge base you could use for customer service or for detailed, complex systems. It's usually defined by subject matter experts. It's been used in law, medicine, finance, and education. It's grounded in reality; it's really the older, rule-based-systems AI. It fell out of favor for a while because of how much work it is to create, but the biggest reason is how you interact with it: you need to know a query language, like SQL.

There's a push right now to use LLMs to do knowledge-graph-type work on a knowledge database, but based on the conversation we're having about making truthfulness a second objective, the current LLMs, at least the ChatGPTs of the world, are only optimized to generate text that sounds like a human, and trying to overlay them on top of actual information is not going to work well. Where the magic could happen is with the existing knowledge graphs. The reason they fell out of favor is the difficulty of interacting with them. Well, you can use generative AI, or normal existing NLP techniques, to create that chat interface that converts what the customer or the individual interacting with it wants into a parameterized query that then hits the knowledge graph. You're actually getting truth from information, and you're just using the interactiveness of conversing to create queries that hit your knowledge graph, versus thinking you're going to train a foundational model on top of it.

Yeah, I think that's exactly right, and I think that's the future we're looking at: these LLMs basically have not proven themselves capable of being knowledge databases, and that's not what they're meant to do. They're basically meant to be great conversationalists. What we want to see in our models is robustness, fairness, repeatability, and an understanding of how they work. And that still looks like our old-style modeling techniques: our statistical tests, our linear regressions, our tree-based algorithms, because we can understand how those work, we can show any regulator what that is, and they can understand what it means.
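One way the pattern Andrew describes might look in code: the language model's only job is to map a user's question onto a small library of vetted, parameterized Cypher queries, and every fact in the answer comes from the curated graph. This is a hedged sketch against an assumed Neo4j setup; the connection details, the Statute schema, and the extract_intent stand-in are all hypothetical.

```python
# Hypothetical sketch: an LLM (or simpler NLP) translates a question into a
# parameterized query drawn from a vetted template library; the knowledge
# graph, not the language model, is the source of truth.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Fixed, expert-reviewed query templates; the model never writes raw Cypher.
TEMPLATES = {
    "statute_lookup": (
        "MATCH (s:Statute {jurisdiction: $jurisdiction, topic: $topic}) "
        "RETURN s.title AS title, s.summary AS summary"
    ),
}

def extract_intent(question: str) -> tuple[str, dict]:
    """Stand-in for the conversational layer that maps a question to a
    template name plus parameters; imagine an LLM call here."""
    return "statute_lookup", {"jurisdiction": "Ohio", "topic": "data privacy"}

def answer(question: str) -> list[dict]:
    template_name, params = extract_intent(question)
    with driver.session() as session:
        result = session.run(TEMPLATES[template_name], **params)
        # Every returned fact traces back to an expert-curated graph node.
        return [record.data() for record in result]
```

Because the templates are fixed and parameterized, the generated queries themselves can be logged and audited later, which is exactly the inspectability discussed next.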
So LLMs have a really interesting potential application, which is basically the plugin library model, where you bring your robust knowledge graph, which has been vetted by experts to give correct and factual answers, and instead of talking to it in SQL, you talk to it with ChatGPT. You talk to it the way you would talk to another human, and it generates queries for you, which you can then audit and inspect later. You get truthful and correct answers from expert databases using a standard conversational model.

And as we talked about earlier with the regulating, assuring, monitoring type solutions: now that we've taken it down from "potentially anything I could ever ask about, I need to verify is true and validate against," we've shrunk that validation space to monitoring and validating how I'm interacting with this knowledge graph. I can take a sample of, say, 25 queries; I can try to hammer it with queries I shouldn't be asking, anything you don't want it answering, and actually validate how the system responds by creating those guardrails. Say the knowledge graph is about legal advice. Disclaimer: don't use a knowledge graph with a chat layer on top for legal advice, but let's use that for the sake of example. If that's what I have my knowledge graph on, I can start hammering it: if I word things different ways, am I getting the correct advice? If I ask it about medical advice, does it give me an answer like, sorry, I'm only a law database? We can actually do that validation. We've shrunk the feature space, the number of things we need to validate, so we can actually test, validate, and assure that our conversational engine is working properly on top of that knowledge graph.

Oftentimes in technology we have these inflection points where a technology finally becomes useful, and we're having that now with LLMs. But that doesn't mean you skip steps: you don't go from horse-and-carriage buggies to rocket ships. There are steps in between, and we're at one of those steps. We have the Model T now, but there are still old-school techniques we need to use with it. This is where we see that next step being: it can help for customer service and aspects like that, but it doesn't solve everything.

Yeah, I think that's exactly right. This is still going to be a growing field, and we're just starting to see these integrations come in. But we've already seen really great success with things like LangChain, or the Wolfram plugins, where giving these LLMs a source of truth has really enabled them to do the right thing, much more so than just giving them access to the internet. Give them a tool that we know has been validated and only gives correct answers, and then we see them flourish and do exactly what we want them to do. That's a really powerful capability, and it could be new and exciting.
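Returning to the validation Andrew describes: because the system's scope is now bounded to one knowledge graph, you can hammer it with a fixed battery of out-of-scope prompts and check that every one is refused. A minimal sketch follows; the prompt list, refusal marker, and system_call interface are all hypothetical.

```python
# Hypothetical validation harness for a scoped chat-over-knowledge-graph
# system: every out-of-scope prompt should produce a refusal, not an answer.
OUT_OF_SCOPE_PROMPTS = [
    "What medication should I take for a headache?",
    "How should I invest my retirement savings?",
    "Reworded probe: my head hurts, what pills do I need?",
    # ...extend to a sampled battery, e.g. 25 prompts per off-topic category
]

REFUSAL_MARKER = "only a law database"  # assumed canned refusal text

def validate_refusals(system_call) -> list[tuple[str, str]]:
    """Return every (prompt, response) pair where the system failed to refuse."""
    failures = []
    for prompt in OUT_OF_SCOPE_PROMPTS:
        response = system_call(prompt)
        if REFUSAL_MARKER not in response.lower():
            failures.append((prompt, response))
    return failures  # an empty list means the guardrails held for this battery
```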
For sure. And one of the things you're bringing up is that there will be more and more practical uses for this. There will be more things that we learn, and more things that technology companies are going to learn, about where and how to apply these, and not only that, what the right interface and exchange is going to be between the human and the models. One question I have on that point: ChatGPT really got pushed out in front as a user experience, as a great chat tool; I can talk to it like I'm texting or messaging, and that has become the driver of all of these product experiences. But with what you just talked about, LLMs and knowledge databases, do you have any predictions on whether that might change, or has that standard been set and it's going to stay? What are your thoughts?

Yeah, I think this is still very nascent, very early stages, and I don't think I'm bold enough to say that this will definitely work out one way. But I would say we're very close to approaching a state where we can make a stronger general statement, and that general statement would look something like: large language models have not proved that they're ready to be used in high-stakes environments; environments with low risk tolerance, where we expect them to work exactly as you predict, or environments where we expect them to be directly monitored, assured, and validated. What we're seeing is that old-style modeling is still king in this space, and it really lends itself to us finding a way to make LLMs useful, since they are a great piece of technology, but not as the source of truth itself.

Exactly. I would even argue that the biggest thing ChatGPT did is not the technology innovation. It is definitely a step forward, but we've been working on large language models for a while now. The biggest differentiator is the user interface; everything previously was, you know, command-line programming. So it's going to be interesting to see how it unfolds. But definitely for now, where truth matters and risk tolerance is low, LLMs by themselves are not your best bet.

Interesting. Well, as always, another great discussion. What do we have coming up on our next episodes?

Definitely. So now that we've hit the popular LLMs, we're going to juxtapose the big topics going on against the original goal of the podcast, which is taking AI down to the fundamentals and working up from there. The next topics we're going to discuss are data, the why, what exactly AI is, and what safe modeling means, and we'll build up from there from first principles. But also, while we're on that journey, please provide as much feedback as possible. If we want to do another Q&A on LLMs, or some more detailed conversations on LLMs, or anything else that's top of mind, we'll definitely slot those in. Otherwise we'll be working through our agenda, trying to be that voice of, let's do it the hard way: let's actually break down each individual component. I think that will be a novel asset in the space.
Perfect. And like I said, as the voice of rigor, and being The AI Fundamentalists, there's a great community out there that really wants to understand this from the foundational level. So we're definitely looking for anybody with questions or topics about this that have been burning for a while. Andrew, Sid, as always, thank you for your candor on today's topic of LLMs and knowledge graphs. For all of our listeners, we appreciate any questions that you have and would like us to discuss on future episodes. Thank you. Thanks.
