The Macro AI Podcast
Welcome to "The Macro AI Podcast" - we are your guides through the transformative world of artificial intelligence.
In each episode, we'll explore how AI is reshaping the business landscape, from startups to Fortune 500 companies. Whether you're a seasoned executive, an entrepreneur, or just curious about how AI can supercharge your business, you'll discover actionable insights, hear from industry pioneers and service providers, and learn practical strategies to stay ahead of the curve.
RAG Revealed: Boosting AI with Real-Time Data
Ready to supercharge your business with AI that’s smarter, faster, and grounded in your data? In this episode of The Macro AI Podcast, hosts Gary and Scott unpack Retrieval-Augmented Generation (RAG)—a game-changing approach that combines real-time data retrieval with powerful AI generation. Perfect for business leaders, this episode dives into how RAG transforms industries, from manufacturing to insurance and legal, with practical examples like slashing customer service times and speeding up claims processing by 40%.
Why does RAG outshine standalone AI models? Scott explains how it taps into your proprietary info—like a retailer’s latest HR policies—delivers up-to-date answers, and cuts through AI “hallucinations” with secure, permission-based access. Gary breaks down the tech: prepping data in vector databases, retrieving context with embeddings, and generating spot-on responses. He also shares expert tips on measuring RAG’s performance—think “faithfulness” scores and retrieval tweaks—to ensure it delivers ROI.
But it’s not all smooth sailing. Scott highlights challenges like data quality, retrieval relevance, and security, offering solutions like role-based controls and encrypted databases. Looking ahead, RAG’s future sparkles with multimodal capabilities (audio, video, images), global language support, and personalization. Ready to start? Scott’s got your back with practical steps—start small, pilot with tools like LangChain, and scale smart.
Tune in to discover how RAG can make your business AI-ready—accurate, current, and competitive. Subscribe and join Gary and Scott every episode as they guide you through the AI era!
What is RAG - Marina Danilevsky, Senior Research Scientist @IBM
https://www.youtube.com/watch?v=T-D1OfcDW1M
Send a Text to the AI Guides on the show!
About your AI Guides
Gary Sloper
https://www.linkedin.com/in/gsloper/
Scott Bryan
https://www.linkedin.com/in/scottjbryan/
Macro AI Website:
https://www.macroaipodcast.com/
Macro AI LinkedIn Page:
https://www.linkedin.com/company/macro-ai-podcast/
Gary's Free AI Readiness Assessment:
https://macronetservices.com/events/the-comprehensive-guide-to-ai-readiness
Scott's Content & Blog
https://www.macronomics.ai/blog
00:00
Welcome to the Macro AI Podcast, where your expert guides Gary Sloper and Scott Bryan navigate the ever-evolving world of artificial intelligence. Step into the future with us as we uncover how AI is revolutionizing the global business landscape, from nimble startups to Fortune 500 giants. Whether you're a seasoned executive, an ambitious entrepreneur,
00:27
or simply eager to harness AI's potential, we've got you covered. Expect actionable insights, conversations with industry trailblazers and service providers, and proven strategies to keep you ahead in a world being shaped rapidly by innovation. Gary and Scott are here to decode the complexities of AI and to bring forward ideas that can transform cutting-edge technology into real-world business success.
00:57
So join us, let's explore, learn and lead together. Welcome to the Macro AI Podcast. I'm Gary Sloper, joined as always by my co-host, Scott Bryan. And we're here to help business leaders like you navigate the wild, exciting world of AI. Whether you're looking to lead in the AI era or getting into the nitty-gritty of cutting-edge tech, we've got you covered. That's right, Gary. And today we're going to dive into a topic that's really starting to
01:25
make more waves: it's retrieval-augmented generation, or RAG for short. It's an AI approach that's transforming how businesses leverage their data. And it's not just for the giant enterprises anymore; there are tools and services that are making it easier for smaller and mid-sized businesses as well. We're going to break it all down for you, starting with a clear overview of what RAG is, then weaving in some real-world examples, a comparison
01:54
to LLM-only approaches, technical details, how we measure its performance, the challenges you might face, where the tech is headed, and how you get started. So Scott, why don't you kick us off? What is your definition of RAG? All right, so picture this. Imagine your AI is like a super smart assistant who doesn't just rely on what it's been taught, but can instantly pull up the latest information from your company's playbook.
02:24
So that's RAG in a nutshell. It combines two powerful ideas: retrieval, grabbing relevant data from your documents or databases, and generation, which everybody's familiar with, using that data to craft accurate, context-rich answers. Right, so it's giving your AI a custom library it can flip through in real time instead of just guessing based on old training data. Exactly. Yep. So traditionally, AI models like the big LLMs that we've talked about
02:53
multiple times, they're static. They're stuck with what they learned during training, for the most part; there are now some new deep research products that are going to continue to evolve quickly. But RAG is dynamic. It pulls from your company's knowledge base, think of product manuals, customer tickets, internal wikis, SharePoint, and uses that to respond. So it's like upgrading your AI from a know-it-all to a know-it-now.
03:23
So you make an important point there around large language models becoming stuck or stagnant. They really can't grow and learn if you're not feeding the environment additional information. So once they're finished learning, they're not learning any further. You can provide more information at the prompt of the LLM; however, you quickly become limited by the context window size. As we know, the context window size at the prompt is the number of words or tokens
03:51
you can provide plus the number of words or tokens coming back in the response. In some environments that may look like a large character count, but it becomes a huge scaling issue. So if you're new to RAG, I highly recommend a great intro video on IBM's YouTube channel by Marina Danilevsky; she's a senior research scientist. I'll put the link in the show notes, but she does a great job of walking
04:18
beginners through the concept of RAG. We're obviously going to go a little bit deeper, but I'll post that as well. So Scott, RAG is practical, it's grounded, and it's about making AI work for your business, not just a generic playbook. So where are you seeing this in action today? Yeah, let's start with a manufacturing example, say a midsize firm making industrial equipment. So they've got stacks of product materials, years of support tickets.
04:47
And before RAG, their customer service team was really drowning in lookups. Now they've hooked a RAG system up to all that data. So in the customer service process, when a customer asks a question like, why is my machine overheating, the AI can use natural language processing to pull the exact troubleshooting steps from the manual, plus past solutions, and deliver a spot-on answer.
05:14
So as a result, the customer experience is improved: faster responses, happier customers. Right, I completely agree. Especially in that scenario, you have a traditional structured database in parallel with unstructured documents, like you mentioned. Support tickets also fall into that category. With RAG, you have a deep bench of dynamic data, broken into a consistent format, flowing into a
05:41
large language model by way of a vector database, which we'll talk about here shortly. This really allows for searchable, contextual information. So in my opinion, that's a win. What else have you seen in another vertical? Yeah, how about a regional insurance company? They use RAG to supercharge their claims processing. In that case, their system would dig into historical claims,
06:07
policy documents, and guidelines to give their claims processors real-time recommendations on different types of cases. I think I read that one study showed they were able to cut processing time by 40% while making their decisions much more accurate. So that's not just efficiency, it's also a competitive edge. Yeah, that makes sense. I bet legal firms are loving this too. I mean, the RAG model reduces the amount of time and cost of
06:37
training that LLM. Using a vector database with the generative AI environment can show where that source data originated, namely by citing it. So imagine a mid-size practice reviewing contracts: RAG could pull up the relevant case law or past agreements in seconds. Plus, if it cites its sources, it can tell actual case law apart from, say, Gary's blog, where I'm not a licensed attorney, I'm just ranting about a statute, and that could prevent misrepresentation in a case.
07:07
Yep, there have been a few of those in the news recently too. But that aside, here's one example: one legal firm started with RAG for contract reviews and then scaled it up to case law research and compliance checks, all without rebuilding their system. So it's scalable, affordable, and it keeps their lawyers focused on doing what they do best, which is winning cases, not researching and digging through files.
07:37
Right. These examples really scream practicality. We talked about some of the obvious limitations without RAG. In your opinion, why does it beat out a single large language model on its own? And probably in the data too, right, like what we see. Yeah, exactly. I think you were right on with some of your earlier points, but let's dig in a little further here, get a little more technical. So a standalone LLM,
08:06
like GPT-4 or Llama, they're obviously impressive. They're trained on massive public data sets and they can turn out answers that sound smart. But there's a catch: they're limited to what they learned when they were trained. Ask one about your company's latest HR policy update, and it's obviously not going to know; it doesn't have access to your company-specific data. So it's clueless, or worse, it could actually hallucinate and make something up.
08:36
Right. So it's sort of like a know-it-all who's stuck in the past and sometimes just bluffs. Yeah, and they don't bluff intentionally; it just comes out that way. So think about retail. Obviously we do a lot of work in retail. Imagine a midsize retailer that's rolling out a new remote work policy in January 2025. Without RAG, an LLM might guess it's about
09:07
flexible hours based on old public data, kind of a vague story, which would be total fiction. But with RAG, you can feed it your updated internal HR documents, say the exact policy PDF, and then it nails an answer that's up to date: it's a hybrid work model with specific guidelines for office days and the latest details on tech support. That'd be their first big win. So RAG would tap into their private data and not just
09:36
the public information that's out there. Right, and it keeps things current too. Exactly. LLMs are obviously starting to get more intelligent and searchable, but in general they don't get news updates; once they're trained, they're frozen in time. RAG, though, pulls from a vector database that you can refresh daily. A new policy tweak, an updated policy handbook: RAG will pick it up.
10:03
So while an LLM-only setup is still stuck in a training set from early 2024, you're up to date with RAG. Plus, it also cuts down on the hallucinations we talked about by grounding the answers in real retrieved facts, even citing the sources. So oftentimes, if you're live with a customer, you can see exactly where an answer came from and say, yep, it came from this policy document.
10:31
Right. So it's more like a trust booster. What about security? Can it handle sensitive things? Yes, certainly, and that's kind of a key attribute. An LLM alone can't tell who's asking or what they're allowed to see. It's a one-size-fits-all machine. With RAG, though, you can hook it to a system that checks permissions. So say a junior staffer asks about salary tiers.
11:00
They're not going to get anything unless they're cleared for it. For an HR manager with access, RAG will pull the right files. So you can put in fine-grained controls, perfect for confidential data like employee records or strategic plans. I think that's a good example. Right. So it's more accurate, up to date, and secure. To me, that's a no-brainer for any business with proprietary info. So how do we pull off all this magic? All right,
11:30
techies, here we go. RAG has three main pieces. First, you've got your data prep, and we talked about that in one of our earlier episodes. You've got your company documents, you know, PDFs, databases, whatever, and they're organized and indexed into a vector database, like we talked about a little earlier in this episode. Think of it like a super-searchable filing cabinet where every piece of info gets a unique numerical tag based on its meaning.
11:59
So it's not just keywords; it's smarter than that. Yeah, way smarter. It uses something called dense vector embeddings, really complex math that captures the context of your data. Then the next step, step two, is the retrieval. When a question comes in like, how do I fix this widget, the system searches the vector database, not just for keyword matches, but for stuff that's conceptually related to
12:28
fixing that widget. So it might grab a manual page, a past ticket, or multiple tickets, things that are right there and specific to what you're trying to accomplish. Yeah, and then the magic happens. Yep. Step three is the generation piece. An LLM, like GPT, takes that question and the retrieved info and then crafts the response. So the retrieval
12:57
gives it the facts, and the model makes it sound human. So it's like a brain and your custom library working together. That's very slick. Once it's up and running, how do you know if it's actually working well? Yeah, good point, Gary. I think this is kind of your wheelhouse, so why don't you take that one? How do businesses measure if their RAG system is delivering for them? Okay, I'll take this one for sure. I mean,
13:25
I love metrics, but not paralysis through analysis. Measuring RAG performance is key, because it's not just about setting it up; it's about making sure it's worth the investment. RAG's got a lot of moving parts, retrieval, generation, all that jazz, so you need to test it like a fine-tuned machine. So what's the goal in this scenario? Figure out if it's pulling the right info and giving answers that hit the mark in the response.
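To make the three pieces Gary and Scott describe concrete, here's a toy end-to-end sketch in Python. It's purely illustrative: simple keyword overlap stands in for the dense vector embeddings a real system would use, the sample chunks are made up, and the final prompt would be sent to an LLM rather than printed.

```python
# Toy stand-in for dense vector embeddings: a bag of keywords.
STOPWORDS = {"the", "is", "a", "my", "why", "of", "and", "to", "how", "do", "i"}

def keywords(text):
    return {w.strip(".,:?!") for w in text.lower().split()} - STOPWORDS

# Step 1: data prep -- chunk your documents and index them.
chunks = [
    "Overheating: check the coolant level and clean the intake filter.",
    "Error E42 means the conveyor belt tension is out of range.",
    "Warranty claims must include the unit serial number.",
]
index = [(chunk, keywords(chunk)) for chunk in chunks]

# Step 2: retrieval -- find the chunk most related to the question.
question = "Why is my machine overheating?"
q_words = keywords(question)
best_chunk, _ = max(index, key=lambda item: len(q_words & item[1]))

# Step 3: generation -- hand the retrieved context plus the question to
# an LLM so the answer is grounded in your own data, not its training set.
prompt = (
    "Answer using only this context:\n"
    f"{best_chunk}\n\nQuestion: {question}"
)
print(best_chunk)
```

In a production setup, steps 1 and 2 would run against a real vector database and embedding model, but the shape of the pipeline is the same.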
13:54
Yeah, so where do you start? What's the first thing you might take a look at? Start with the basics: test it against real-world questions your team or customers might ask. Say you're the manufacturing firm we mentioned before; throw in, how do I fix an overheated machine? You've already got a gold-standard answer in your manual. Check if RAG retrieves that exact chunk and turns it into a solid response. You're basically comparing what it should do, pull the right docs and nail the answer,
14:23
to what it actually does. So that'll tell you: is it retrieving the right information? Yeah, good example, that makes sense. But how do you judge if the answer is good and not just, you know, close enough? Well, that's where it gets fun. You can use metrics like faithfulness: does the answer stick to the retrieved info without going off script? There's even an approach where another AI, a judge LLM,
14:50
scores it for you. It reads the context and the response: yes, nine out of ten, this matches. Then look at relevance, did it grab the most useful docs, and overall quality, does it sound clear and helpful to the person asking? It's sort of like grading a student on accuracy, focus, and polish on their final paper. Yeah, that judge idea, that's a good analogy.
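A crude way to see the faithfulness idea without a judge LLM is to check how much of a generated answer is actually supported by the retrieved context. The context and answers below are made up, and production evaluations would use a judge model as described above, but the sketch shows the intuition:

```python
def faithfulness(answer: str, context: str) -> float:
    """Fraction of answer words that appear in the retrieved context.
    A crude proxy; real evaluations have a judge LLM score this."""
    ctx_words = set(context.lower().split())
    ans_words = answer.lower().split()
    supported = sum(1 for w in ans_words if w in ctx_words)
    return supported / len(ans_words)

context = "check the coolant level and clean the intake filter"
grounded = "check the coolant level and clean the filter"
off_script = "reboot the server and reinstall the drivers"

print(faithfulness(grounded, context))    # high: stays on script
print(faithfulness(off_script, context))  # low: likely hallucinated
```

A grounded answer scores near 1.0 while an off-script one scores low, which is exactly the signal a faithfulness metric is after.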
15:20
But isn't it tricky, since RAG has both retrieval and generation? How do you untangle what's working and what's not? Yes, you're spot on. That's why you break it down and evaluate retrieval separately from generation. If your system's pulling irrelevant data, like a recipe instead of a policy, your retrieval's off, and no fancy LLM can fix that. If it's grabbing the right documents but the answer's gibberish, your generation needs work.
15:50
Test each piece, in my opinion. Tweak the chunk sizes, try different models, until both are really firing on all cylinders. It's not one-size-fits-all; you've got to experiment and find what clicks for your business. That's why I do recommend a lighthouse customer if you have one; they may ask things that your team may not think to ask and test. Right. Yeah, so your team needs to be able to do constant tuning. Any pitfalls
16:19
you can think of to watch out for? Oh yeah. Don't just set it and forget it. RAG's performance can slip if your data changes, right, or if your users' needs change. So keep a test suite handy, think 50 questions, and run it regularly. If the score drops,
16:41
I would say dig in. Maybe your docs got messy or the model's drifting. It's active management at the end of the day, but that's how you keep it delivering a return on investment. Yeah, that sounds like a smart way to stay on top of it. But even with good performance, there have still got to be some hurdles, right?
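The recurring test suite Scott describes can start very simply: gold-standard questions paired with a phrase the retrieved context must contain. Everything below is illustrative, and `retrieve` is a placeholder for your real retrieval call:

```python
def retrieve(question: str) -> str:
    # Placeholder: a real implementation would query your vector database.
    manual = {
        "how do i fix an overheating machine":
            "Check the coolant level and clean the intake filter.",
        "what is error e42":
            "Error E42 means the conveyor belt tension is out of range.",
    }
    return manual.get(question.lower().rstrip("?"), "")

# Gold-standard questions with a phrase the retrieved context must contain;
# run this on a schedule and alert when the score drops.
test_suite = [
    ("How do I fix an overheating machine?", "coolant level"),
    ("What is error E42?", "belt tension"),
]

def run_suite() -> float:
    hits = sum(1 for q, gold in test_suite if gold in retrieve(q).lower())
    return hits / len(test_suite)

score = run_suite()
print(f"retrieval score: {score:.0%}")
```

Tracking this score over time is the "dig in when it drops" signal: a falling number points you at messy docs or drifting retrieval before customers notice.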
17:01
So with that question, let's just assume no tech is going to be perfect. One big challenge is data quality. If your documents are a mess, outdated, maybe incomplete or poorly organized, then RAG's going to struggle. It's like asking a librarian to find a book in a pile of chaos. So companies often need to audit and clean their data first. That makes sense. Garbage in, garbage out.
17:30
Yep, exactly, the famous phrase. Another hurdle is retrieval relevance. Sometimes the system grabs stuff that's off topic, like pulling a recipe when you ask about taxes. Researchers are working on sharper retrieval methods, to get a little technical here, like dense passage retrieval or self-reflection tricks, to fix some of that. But then there's cost and scale. Running retrieval and generation together can get pricey,
17:58
especially if your data is huge or you need real-time answers. Right, and security has to be a big one, especially for sensitive things like legal or medical data, right? Absolutely, that's huge. With RAG pulling from your internal data, you've got to worry about who sees what. Imagine a healthcare firm: patient records are gold, but they're also a privacy minefield. If the wrong info leaks, you're looking at lawsuits or worse.
18:27
Take a legal firm, where client confidentiality is obviously non-negotiable; a breach could completely sink their reputation immediately. Even in manufacturing, trade secrets in those manuals could give competitors an edge if they get out. Yeah. And I know we're going to have a future episode around clean data, but you also have to think about
18:56
even if you didn't have an external compromise of the data, just internally: has anybody updated that data? Has anybody gone and used it for their own test case and made modifications to it? You could have internal vulnerabilities. That's why governance is really important. So how do you lock it down? Yep. So smart businesses and
19:21
centers of excellence, AI centers of excellence, are putting security right at the top of the list and tackling it head-on. First, role-based access controls, RBAC for short. It's like giving each employee a key only to the rooms they need. A customer service rep sees the product FAQs, but not HR files. Second would be data filtering. You can set rules so sensitive stuff like Social Security numbers
19:50
client names will get masked or skipped during retrieval. Some are even using encryption on the vector database itself, so even if someone hacks in, it's just gibberish without the key. And for the paranoid, I mean, the cautious, there's on-premises deployment: the whole RAG system in-house, off the cloud, for total control. That's why we're seeing a bit of an uptick in inquiries about, you know, data location again.
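The two controls just described, role-based access and masking sensitive fields before retrieved text reaches the LLM, can be sketched in a few lines. The role names, collections, and data here are all hypothetical:

```python
import re

# Hypothetical role-to-collection permissions (RBAC).
ROLE_ACCESS = {
    "customer_service": {"product_faq"},
    "hr_manager": {"product_faq", "hr_records"},
}

# Rule for masking SSN-like strings before anything reaches the LLM.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def filter_chunks(role, chunks):
    """Keep only chunks the role may see, masking SSN-like strings."""
    allowed = ROLE_ACCESS.get(role, set())
    return [
        SSN_PATTERN.sub("[MASKED]", text)
        for collection, text in chunks
        if collection in allowed
    ]

chunks = [
    ("product_faq", "Reset the router by holding the button for 10 seconds."),
    ("hr_records", "Employee 4412, SSN 123-45-6789, salary tier B."),
]

print(filter_chunks("customer_service", chunks))  # FAQ only
print(filter_chunks("hr_manager", chunks))        # both, with the SSN masked
```

The key design point is that filtering happens at retrieval time, before generation, so the LLM never even sees data the asker isn't cleared for.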
20:19
Right, that's exactly right. We're seeing that hybrid cloud design come back with a charge. So that's reassuring. It's not just about building RAG; it's about building it the right way. So where's the tech headed, in your opinion? Good question. I think the future is bright. We're seeing RAG evolve beyond text. Think about multimodal RAG, where it handles audio, video, or even images.
20:49
Imagine a retailer, again, we'll go back to the retail example, using it to analyze customer call recordings alongside product specs to solve issues faster. Or a doctor who could pull up the latest research and patient scans in one go. Hmm, yeah, that'd be next level. So what about global reach? Yep, no-brainer: multilingual RAG.
21:15
That's picking up steam, supporting low-resource languages like Indic dialects so businesses can go truly global. And efficiency is getting a boost too, with techniques like hierarchical retrieval, where the system summarizes big data sets into bite-sized chunks. That's cutting costs and speeding things up. Okay, so I bet personalization is on the horizon as well. Yep, you nailed it. Future RAG could adapt to your preferences or context,
21:44
like tailoring answers based on your past queries. Plus, ethical tweaks are coming: reducing bias and boosting transparency, so you can trust what it spits out. Okay, from everything we just talked about, it sounds like RAG is becoming even more indispensable. So for our listeners itching to jump in, where do they start? Yeah, like we often recommend: start small.
22:14
Pick one pain point, like customer support, which is really an entry point for a lot of enterprises, or internal search, where better info access could really shine: retrieving the right content and leveraging the LLM to explain it. Step one would be to gather and clean your data, which we'll talk about in an upcoming episode. Product docs, FAQs, whatever's relevant: clean it up, make sure it's current and structured. Yeah, that makes sense. I mean, even
22:44
IT help desk organizations have, you know, a set library for the areas they support. That would be another easy place for them to get an internal win and reduce some of the cycle time on tickets. So we have all that; then what's the next step? Yeah, the next step is to pick a platform. A lot of enterprises are working with consultants here, because consultants can stay right on the cutting edge of everything that's available and how it might fit into your ecosystem,
23:14
or how you might be able to do things cost-effectively. But tools like LangChain, which is open source, or open-source options like FAISS, that's Facebook AI Similarity Search, make RAG setups easier than ever. You hook your data into a vector database, pair it with a solid language model, and test it out. Start with a pilot, see what works. And scale from there. Exactly.
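Under the hood, what a library like FAISS provides is fast nearest-neighbor search over embedding vectors. A pure-Python sketch of the brute-force "flat index" version of that idea, with tiny hand-made vectors standing in for real embedding-model output (labels and numbers are invented for illustration):

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class FlatIndex:
    """Brute-force nearest-neighbor index, the idea behind a flat
    vector index (real libraries add heavy optimization on top)."""
    def __init__(self):
        self.items = []  # (label, vector) pairs

    def add(self, label, vector):
        self.items.append((label, vector))

    def search(self, query, k=1):
        ranked = sorted(self.items, key=lambda it: l2_distance(query, it[1]))
        return [label for label, _ in ranked[:k]]

# Tiny hand-made vectors stand in for embedding-model output.
index = FlatIndex()
index.add("shipping policy", [0.9, 0.1, 0.0])
index.add("return policy",   [0.8, 0.3, 0.1])
index.add("machine manual",  [0.0, 0.2, 0.9])

# A query embedding close to the "manual" region of the space:
print(index.search([0.1, 0.2, 0.8], k=1))
```

A pilot swaps the hand-made vectors for real embeddings and this class for a production index, but the retrieval step it performs is exactly this.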
23:43
Yeah. Once your team understands it and knows how to maintain and optimize it, then you add more data sets and use cases. Again, maybe bring in a consultant where necessary; there are plenty of AI specialists out there who can get you rolling fast. Gary and I have access to lots of them, with different specializations and experience. The key is to experiment, measure results, and grow it organically. Yeah, I couldn't have said it better.
24:13
I mean, to put it in a non-technical way, it's practical, doable, and game-changing. To me, that's RAG in a nutshell. Yeah. So that's our deep dive into retrieval-augmented generation, folks. It's not just tech; it's a tool that makes your business smarter, faster, and ultimately more competitive. Couldn't agree more, Scott. Thanks for tuning in to the Macro AI Podcast. If you liked this, subscribe, give us a like, and share it with your network.
24:42
Also, let us know what AI topics you want next, and feel free to contact us at macroaipodcast.com. That's our new website, macroaipodcast.com. Until then, keep leading in the AI era. Yep, see you next time. Thank you. Thanks, Scott. See you soon.