Secured by Design - IAM & Cybersecurity Podcast

Great security solutions are designed from the ground up.
Secured by Design is a podcast where Santosh shares practical insights, frameworks, and perspectives on identity security and other aspects of cybersecurity.
Each episode breaks down complex concepts into actionable ideas for professionals protecting digital identities, designing secure systems, and leading security initiatives.
Because true security is built and not bolted on...

Mastering AI Security: Top 10 Risks and Mitigations for LLMs

May 10, 2026 • Season 1 • Episode 15

0:00 | 27:09

Summary
This episode explores the top 10 security risks associated with deploying large language models (LLMs) and AI systems. It provides practical insights and mitigation strategies to help organizations secure their AI implementations effectively.

Keywords
AI security, LLM risks, prompt injection, data leakage, supply chain security, poisoning, output handling, system prompt leakage, misinformation, resource exhaustion

Key topics
Prompt injection vulnerabilities
Sensitive data leakage in AI systems
Supply chain risks in AI deployment
Data and model poisoning techniques
Handling AI-generated outputs securely
Managing AI agent autonomy and permissions
System prompt leakage and its implications
Weaknesses in vector and embedding systems
Hallucinations and misinformation in AI
Resource exhaustion and denial of service in AI

Chapters
00:00 Introduction to AI Security Risks
04:55 Prompt Injection: The King of Vulnerabilities
11:48 Supply Chain Vulnerabilities in AI Systems
18:47 Improper Output Handling and Its Risks
24:59 Misinformation and Hallucination Problems

Resources
OWASP Top 10 for Large Language Models (https://owasp.org/www-project-top-10-for-large-language-model-applications/)

Let’s Stay Connected

📧 Email: santosh@getitrightsoln.co.uk

🔗 LinkedIn: linkedin.com/in/kssantosh

SPEAKER_00 0:18

AI is everywhere. Your customer support bot, your code assistant, your internal knowledge base, your HR chatbot that answers questions about parental leave. AI is woven into the fabric of modern business. And that's fantastic. But here's the uncomfortable truth. Most organizations deploying these systems are doing so without understanding the unique security risks they are carrying. And that's exactly what we are fixing today.
Welcome to Secured by Design, the podcast that puts security at the center of how technology and businesses are built. I'm Santosh. Each episode I break down what it really takes to secure every layer of your organization: identity governance, zero trust, AI, cloud, compliance, and beyond. No fluff, just the insights and perspectives you can actually use. Let's make security the blueprint, not the patch.
We're walking through the OWASP Top 10 for Large Language Model Applications, the gold standard list of LLM-specific security risks compiled by some of the sharpest minds in cybersecurity. 10 risks, real-world breaches, practical mitigations, and I promise we'll keep it conversational, not a lecture. Think of this episode as your security briefing before deploying anything with AI in it. Let's get to it.
Before we dive into the list, I want to take two minutes to zoom out. Because if you caught our previous episode on the OWASP Top 10 for agentic applications, some of what we are about to cover is going to feel familiar. And that's not a coincidence. There's a really important relationship between these two frameworks. Think of it this way: the LLM Top 10 is about the engine, the model itself, and the risks baked into how language models work. The Agentic Top 10 is about the car: what happens when you give that engine wheels and a steering wheel and let it drive itself around your infrastructure. The two lists overlap in meaningful ways, and I think it's worth calling those out explicitly. These two frameworks are complementary.
If you are building or securing an agentic application, you need both. The LLM risks tell you how to secure the model; the agentic risks tell you how to secure the system it operates in. If you haven't listened to that previous episode yet, I strongly recommend going back to it after this one. The pair of them together give you a far more complete picture of the AI security landscape than either does alone. Right, with that context set, let's get into the LLM Top 10 itself.
The top of the list is prompt injection. Prompt injection is the king of LLM vulnerabilities right now. If you only remember one risk from this episode, make it this one. It's been at the top of the OWASP LLM list since day one and it's not going anywhere. So here's the analogy. Imagine you hired a very capable but extremely literal personal assistant. You tell them, do whatever the notes on my desk say. Now, a bad actor sneaks in, replaces your notes with their own, and suddenly your assistant is booking flights to Timbuktu and forwarding your emails to a stranger. That's prompt injection. The assistant trusted the instructions without verifying who wrote them.
In technical terms, an attacker crafts malicious input that hijacks the model's behavior, overriding the system prompt or developer instructions. There are two flavors here: direct injection, where the user talks to the model directly and tries to manipulate it, and indirect injection, where the malicious content is hidden in the data the model reads, like a web page, a PDF or an email. Let's look at an incident from 2023 to explain this. Researchers demonstrated that Microsoft's Bing Chat could be manipulated via hidden instructions embedded in web pages, telling the AI to change its personality, reveal its system prompt, and attempt social engineering against users. This was indirect prompt injection in the wild.
Similar attacks have been demonstrated against AI coding assistants, where malicious comments in open source code repositories were designed to manipulate the AI into generating insecure code. To mitigate against this, firstly enforce strict privilege separation. The AI should not have access to sensitive actions unless absolutely necessary. Treat all external content as untrusted data. Never let it override system-level instructions. Implement human confirmation loops for high-risk actions like sending emails, executing code or accessing databases. Use prompt hardening. Design your system prompt to be resistant to overrides and instruct the model to be skeptical of instructions embedded in user content. Regularly red team your AI systems with injection attempts. Here's the bottom line: prompt injection is to AI what SQL injection was to web apps in the early 2000s. We are in the same era of discovery and pain. Build your defenses now.
Moving to number two, sensitive information disclosure. Think of this as the AI equivalent of accidentally CCing the entire company on an email you meant to send to just one person. Except the AI doesn't know it's doing it and the information might be deeply sensitive. LLMs can inadvertently leak confidential information in several ways. They might reveal details from their system prompt, they might surface training data, including private information that was scraped during training, or they might expose data passed in through retrieval augmented generation pipelines. The model doesn't distinguish between things I should share and things I should never share. You have to build that boundary yourself. To give you an example, Samsung made headlines in 2023 when employees pasted proprietary source code and meeting notes into ChatGPT, which then could potentially use the data for future model training. Samsung subsequently banned the use of generative AI tools internally.
This was a policy failure, but it illustrates how easily sensitive data can flow into and potentially out of AI systems. To mitigate, never train or fine-tune models on sensitive data unless you have strict data governance in place. Standardize what goes into your retrieval pipelines. Implement data classification before feeding documents to your AI. Apply output filtering. Detect and redact PII, credentials or classified information in responses before they reach the user. Implement need-to-know access controls on your RAG. Have a clear acceptable use policy for AI tools and actually enforce it. The adage of what goes in must come out has never been more relevant. If you wouldn't paste something on a public notice board, don't paste it into an LLM without understanding where it goes.
Number three, supply chain vulnerabilities. This one is near and dear to the hearts of anyone who's been following traditional software security, and now it's got an AI twist. When you deploy an AI system, you're not just trusting the model itself, you're trusting the pre-trained model weights from whoever published them. You're trusting the fine-tuning dataset. You're trusting third-party plugins and tools. You're trusting the vector database, the embedding model, the orchestration framework. That's a massive attack surface. And most of it is invisible to you. It's like ordering a ready-made meal. You trust the restaurant, but the chicken came from a supplier who got it from a farm which used feed from somewhere else. If any link in the chain is compromised, your meal is compromised, and you'd never know by looking at the plate. To illustrate this, in 2024, researchers discovered that the Hugging Face model repository, which is almost like a GitHub for AI models, had been used to distribute serialized Python pickle files containing malicious code. Thousands of models are downloaded daily with minimal scrutiny.
A compromised foundational model or embedding library could poison every downstream application built on top of it. To mitigate against this, vet your model providers. Use only reputable, auditable sources for foundational models. Verify model checksums and signatures before deployment. Maintain a software bill of materials for AI components just as you would for traditional software. Regularly audit third-party plugins and integrations. Apply the same due diligence to AI libraries as you would to open source code packages. The SolarWinds attack taught us about the dangers of blindly trusting the supply chain. AI is no different. Arguably it's worse, because the opacity of model weights makes tampering incredibly difficult to detect.
Number four is data and model poisoning. If supply chain vulnerabilities are about what you build on, poisoning is about corrupting the foundation itself. Poisoning attacks occur when an adversary tampers with the training data or fine-tuning data to embed malicious behaviors into the model. The model learns to behave normally in most cases, but exhibits malicious or biased behavior when triggered by specific inputs, a bit like a sleeper cell. You would never know until the trigger fires. Imagine hiring someone who seems perfect in every interview and reference check, but on day one they hand over all your client data to a competitor. That's a model backdoor. It passed all the tests; the trigger was hidden. To mitigate against this, rigorously audit and sanitize training data. Implement data provenance and lineage tracking. Use anomaly detection on model outputs to identify unexpected behavioral shifts. Test models against known adversarial prompts and edge cases before deployment. Limit access to fine-tuning pipelines. Treat them as critical infrastructure. The scariest thing about poisoning is the dwell time. A poisoned model might sit in production for months before the trigger is even activated. By then, the damage is done.
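Both the checksum verification from the supply chain section and the provenance tracking from the poisoning section come down to the same building block: record a digest for an artifact when you first trust it, then refuse to load anything that no longer matches. Below is a minimal Python sketch of that check. The function names and the idea of a separately stored "pinned" digest are assumptions made for illustration, not something described in the episode, and a digest only proves the file is unchanged, not that the original was safe.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large model weights need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """Return True only if the file's digest matches the pinned value.

    `expected_sha256` is assumed to come from a trusted record, such as the
    publisher's release notes or your own bill of materials, and not from
    the same location as the file itself.
    """
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_sha256.lower())
```

A deployment script could call `verify_artifact` on each model or dataset file and stop the rollout if it returns False.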
Halfway through, and we are into number five: improper output handling. This is where AI meets classic application security and it's a beautiful mess. LLMs generate text. That text often gets used: rendered in a browser, executed as code, passed to a database query, processed by another system. If you pass that output directly into downstream systems without validation or sanitization, you are essentially treating the AI's output as trusted input. And as any seasoned security professional knows, you should never trust input. Here's the kicker. The AI might generate perfectly valid malicious payloads. Ask it to write HTML for a web page and, under the right attack scenario, it might generate a script tag that exfiltrates cookies. Not because the AI is malicious, but because it's very good at generating realistic content, including content that happens to be an exploit. As a mitigation, never render AI output as raw HTML without sanitization. Treat LLM output as untrusted user input at all times. Validate, encode, and escape accordingly. Use sandboxed execution environments for any AI-generated code. Implement output schema validation. Ensure the AI's response conforms to expected formats. Apply content security policies on any interface that renders AI output. The irony here is rich. The AI generates the attack payload and your own application executes it. You've essentially outsourced your vulnerability to a chatbot. Don't let that happen.
Number six is excessive agency. This one is becoming increasingly relevant as we move towards agentic AI: AI that doesn't just talk but acts. Excessive agency occurs when an LLM-based agent is granted more permissions, access or autonomy than it needs to accomplish its task. The principle of least privilege applies just as much to AI as it does to human users and service accounts.
When an agent can read files, send emails, query databases, execute code and call external APIs all at once without oversight, you have a catastrophe waiting to happen. Think of it like giving a temp worker on their first day full admin access to every system in the company, plus a corporate credit card with no limit. Sure, they might be perfectly competent, but if something goes wrong, if they are tricked, if they make a mistake, if they are malicious, the blast radius is enormous. In 2024, an AI agent integrated into a customer service platform was manipulated via prompt injection to issue refunds it wasn't authorized to give. The agent had been granted broad API access for flexibility, and that flexibility became a liability the moment an attacker found the injection point. To mitigate, apply strict least-privilege access. AI agents should only have the permissions they need for their specific task. Require human-in-the-loop approval for high-impact actions: sending emails, financial transactions, data deletions, external API calls, etc. Design for minimal footprint. Your AI agent shouldn't need access to your HR database to answer customer service queries. Implement rate limiting on agent actions to prevent runaway automation. Log and audit all agent actions for post-incident review. Autonomy is the feature. Accountability is the safeguard. If your agent can do something irreversible, make sure a human has to authorize it first.
Number seven is system prompt leakage. This one surprises people because it sounds minor, and then you realize what a system prompt actually contains. The system prompt is a set of instructions developers give to an LLM before any user interaction. It might contain the persona definition, business logic, access credentials, information about internal systems or proprietary instructions that represent significant business value.
When users can extract this through clever prompting, they gain a roadmap to your entire AI architecture and potentially credentials, endpoint URLs, and behavior guardrails that can now be bypassed. It's a bit like giving every customer a copy of your staff manual, including the section on how to override the till and process refunds without a manager code. You meant for it to be internal, but if someone asks the right question, out it comes. To mitigate against this, never embed credentials, API keys or internal URLs directly in system prompts. Instruct the model explicitly not to reveal its system prompt, though that's not foolproof. Design your system so that the system prompt is not a single point of failure. Even if it leaks, it should not grant access to anything sensitive. Monitor for prompt extraction attempts in your logging. Use a secrets management system, not hard-coded values in prompts.
Number eight is vector and embedding weaknesses. This is a newer entry and it reflects the explosion of retrieval augmented generation systems, where AI is given access to a searchable knowledge base. In RAG systems, documents are converted into numerical vectors (embeddings), stored in a vector database, and retrieved at query time to give the AI relevant context. The attack surface here is threefold. The embedding model itself could be compromised, the vector database could be poisoned with malicious content, and the access controls on what gets retrieved could be insufficiently enforced, leading to data cross-contamination between users. As a mitigation, implement strict access controls on your vector database. Enforce user-level and role-level permissions on what can be retrieved. Validate and sanitize all documents before they are ingested into the vector store. Use metadata filtering to ensure users can only retrieve content they are authorized to see. Monitor for anomalous retrieval patterns, for example one user's query suddenly surfacing content from another user's namespace.
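The metadata filtering idea above can be sketched in a few lines of Python. Everything here is hypothetical: the `Chunk` fields, the `authorized_results` helper, and the assumption that similarity scores have already been computed by whatever vector store you use. In a real deployment the same filter would normally be pushed into the vector database query itself rather than applied afterwards in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    tenant: str               # namespace the source document belongs to
    allowed_roles: frozenset  # roles permitted to read this chunk
    score: float              # similarity score, assumed precomputed


def authorized_results(candidates, user_tenant, user_roles, top_k=3):
    """Drop anything the caller may not see, then rank what is left."""
    visible = [
        c for c in candidates
        if c.tenant == user_tenant and c.allowed_roles & set(user_roles)
    ]
    return sorted(visible, key=lambda c: c.score, reverse=True)[:top_k]


docs = [
    Chunk("HR salary bands", "acme", frozenset({"hr"}), 0.95),
    Chunk("Refund policy", "acme", frozenset({"support", "hr"}), 0.80),
    Chunk("Other tenant refund policy", "globex", frozenset({"support"}), 0.99),
]
# A support user in the "acme" tenant only ever sees the acme refund policy,
# even though two other chunks scored higher.
support_view = authorized_results(docs, "acme", ["support"])
```

Filtering before ranking matters: the highest-scoring chunk in this example belongs to another tenant, so ranking first and trimming to the top result would have leaked it.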
Treat your vector database as a sensitive data store, not just a search index.
And number nine, misinformation, also known as the hallucination problem. And it's a security risk, not just a quality problem. LLMs confidently generate plausible-sounding falsehoods. They don't know what they don't know. They will cite non-existent court cases, invent medical dosages, fabricate historical events, and produce security advisories for vulnerabilities that don't exist, all with the same confidence as accurate information. When these outputs are consumed without verification, the consequences can range from embarrassing to catastrophic. If your AI-powered chatbot tells a customer the wrong medication dosage, or advises a developer to use a deprecated cryptographic function, or tells a legal team that a certain contract clause is legally binding when it isn't, that's a security and liability risk. Garbage in, garbage out. But even reliable input can produce garbage output. In 2023, a US lawyer submitted a legal brief citing several court cases, all fabricated by ChatGPT. The cases did not exist. The lawyer faced sanctions and the story made global headlines. More dangerous from a security perspective, AI-generated advisories and documentation about non-existent CVEs have appeared in security forums, potentially leading practitioners to remediate phantom vulnerabilities while ignoring real ones. As a mitigation, ground your AI in verifiable sources using RAG. Controlled knowledge bases reduce hallucination significantly. Never deploy AI for high-stakes decisions without human review in the loop. Implement confidence scoring and surface uncertainties to users rather than presenting everything with equal assurance. Build fact-checking pipelines where AI output is cross-referenced against authoritative data sources before delivery. Educate your users on AI limitations. Trust but verify is not optional.
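One narrow slice of such a fact-checking pipeline, the phantom CVE case, can be sketched in Python. The function name is made up for illustration, and `known_cves` is a stand-in for whatever authoritative vulnerability feed you actually trust; the sketch only flags identifiers that are missing from that set and says nothing about whether the surrounding advice is correct.

```python
import re

# Matches identifiers of the form CVE-YYYY-NNNN (four or more digits at the end).
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b")


def unverified_cves(model_output: str, known_cves: set) -> list:
    """Return CVE IDs cited in the output that are absent from the trusted set.

    Order of first appearance is preserved and duplicates are reported once.
    """
    flagged = []
    for cve in CVE_PATTERN.findall(model_output):
        if cve not in known_cves and cve not in flagged:
            flagged.append(cve)
    return flagged


# Illustrative input: the second identifier is deliberately not in the set.
answer = "Patch CVE-2021-44228 first, then CVE-2099-99999 and CVE-2099-99999."
flagged = unverified_cves(answer, known_cves={"CVE-2021-44228"})
# flagged == ["CVE-2099-99999"]
```

Anything that comes back in `flagged` would go to a human reviewer or be stripped from the response before a practitioner acts on it.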
And the final risk on our list: unbounded consumption, sometimes called resource exhaustion. Think of this as a denial of service attack, reimagined for the AI era. LLM inference is expensive. Every token processed costs compute, memory, and money. When an AI system has no limits on how many requests a user can make, how large those requests can be, or how long the model is allowed to generate output, you have a financial and availability risk waiting to be exploited. Attackers can craft inputs that force the model into extremely long or recursive generation loops, or simply flood the system with requests until it becomes unavailable to legitimate users. It's the classic tragedy of the commons. Everyone can use the resource, so someone abuses it and now nobody can. Except in this case, the commons is your GPU cluster and someone's running up your AWS bill like it's a competition. Multiple AI API providers have reported abuse cases where bad actors, often using compromised API keys or free tier accounts, ran automated mass inference jobs to mine responses for sensitive information, train competing models or simply exhaust rate limits. The cost of a successful LLM denial of service attack on a poorly configured deployment can run into tens of thousands of pounds in compute cost within hours. To mitigate against this risk, implement rate limiting at every layer: per user, per session, per API key. Set maximum input token limits and maximum output token limits. Monitor spend in real time and set hard budget caps with automatic alerts. Detect and block unusual usage patterns. One user querying a thousand times in an hour is a signal, not an anomaly to ignore. Require authentication for all LLM endpoints. Never expose raw AI inference to unauthenticated users.
Alright, 10 risks. Let's bring it home. If you're building with AI, deploying AI or responsible for defending systems that use AI, the OWASP LLM Top 10 is your checklist. Not a nice-to-have, a checklist.
Here are the 10 risks at a glance. Prompt injection, where attackers hijack the AI's instructions. Treat all inputs as untrusted. Sensitive information disclosure, where the AI leaks what it should never share. Control your data pipeline. Supply chain vulnerabilities, where you inherit risks from every model, plugin and library you use. Know your chain. Data and model poisoning, where corrupted training data can create backdoored models. Audit your foundations. Improper output handling, where AI-generated content can be weaponized downstream. Sanitize everything. Don't leave it unguarded. Vector and embedding weaknesses, where RAG systems need access controls as much as databases do. Misinformation, where confident AI output is not necessarily correct. Verify before you trust. Unbounded consumption, where unlimited access means unlimited liability. Rate limit everything.
Here's my parting thought. AI is not inherently insecure, but it's insecure by default, in the same way that an unlocked door is not inherently a security incident. It only becomes one when someone walks through it. The threat actors are paying attention. The question is, are you? Security is not a feature you bolt on after the fact. It's a discipline you build from day one. The OWASP community has done the hard work of identifying exactly where the dangers are. Now it's on us to act on it.
Thanks for listening to Secured by Design. If today's episode gave you something to think about, subscribe and follow on your favorite streaming platform for more discussions on identity and cybersecurity. Please do consider taking the time to rate the show. I would really appreciate that. It's one of the best ways for you to support this podcast and help it grow by providing feedback. Please feel free to share it with your team. You can also connect with me on LinkedIn for updates and new episode releases. Until next time, stay secure, stay resilient, and stay secure by design.