QuranLM's Podcast

S1 E2: Faithful Code

QuranLM Season 1 Episode 2

Link: https://quranlm.com/

Mission & Core Tension

Welcome back to the QuranLM podcast, where we pursue the mission of "Bridging Divine Wisdom with Modern Intelligence." Our focus in this episode shifts to the critical tension between a probabilistic machine's confident tone and its capacity for complete factual error—especially when the subject is the Holy Quran.

We explicitly intend to probe the limits of Large Language Models (LLMs) and determine what it would take to make them truly trustworthy for sacred texts and Islamic scholarship.

Methodology: Stress Testing Modern Intelligence

We explore the text not by seeking structural patterns, but by stress-testing advanced AI models against the precision and complexity required by Divine Revelation. This process aims to uncover the fragility of linguistic models and the ethical imperative of safety.

Each episode tackles a deep problem in AI application, analyzing diverse linguistic, legal, and theological challenges. Dive into discussions on profound concepts, such as:

Part 1: The Language & Logic Breakpoint (Where AI Breaks)

  • Classical Arabic Nuance: We examine how Classical Arabic preserves morphological nuance that most models fail to capture. We detail how even small changes, like removing diacritics, fundamentally alter meaning and degrade performance.
  • Stress Testing Inheritance Law: We push frontier LLMs to their limit using complex, conditional reasoning within Islamic inheritance law. This process exposes brittle chains of logic and the danger of authoritative-sounding hallucinations that cite verses that do not exist.

Part 2: The Ethical Imperative of Safety (How to Fix It)

  • Ethical Alignment & Abstention: Using the Islam Trust benchmark, we explore the need for ethical alignment across Sunni schools of thought. We highlight abstention as a virtue—the model knowing when to say "I don't know"—and how current systems falter under ambiguity.
  • Grounded Retrieval (RAG): We demonstrate why Retrieval-Augmented Generation (RAG) is essential. By chunking at the verse level and constraining answers to verified passages, RAG “chains the model to the truth,” sharply reducing fabrication and invented doctrine.

Core Takeaway & Foundation

The core takeaway is simple and hard: LLMs must move from coherence to faithfulness. LLMs are pattern matchers, not authorities. They need curated data, ethical benchmarks, abstention policies, and grounded retrieval to serve scholarship instead of inventing doctrine.

We situate this work in centuries of human scholarship, drawing from historical context, manuscripts, and variant readings that continue to guide responsible system design for tools like QuranLM.

⚠️ Disclaimer: This discussion is not an endorsement of specific theological interpretations. We offer a critical, data-driven analysis for spiritual empowerment, safe research, and responsible technology development.

If this conversation resonates, follow the show, share it with a friend, and leave a review with your thoughts on where AI should draw the line.

Setting The Stakes

SPEAKER_02

Welcome to the deep dive. Our mission today is, well, it's highly specialized, and the stakes are incredibly high.

SPEAKER_00

They really are.

SPEAKER_02

We're looking at the intersection of cutting-edge AI, specifically large language models, and one of the world's most complex and sacred texts, the Holy Quran.

SPEAKER_00

We've got a fascinating stack of material here for you. We're bridging computer science benchmarks, deep linguistic studies of classical Arabic, and some really crucial research into ethical AI alignment.

SPEAKER_02

And the core question is: can a machine that works on probability really handle a text where there is absolutely no tolerance for error?

SPEAKER_00

Exactly, where the sources tell us even a single incorrect diacritic can entirely alter the meaning.

SPEAKER_02

That's the tension right there. So we're going to be diving into things like inheritance law, ethical guidance, and finding out what happens when these models hallucinate religious doctrine.

SPEAKER_00

There are some major aha moments in the research.

SPEAKER_02

Okay, let's unpack this. So before we even get to the AI, we have to talk about the language itself.

SPEAKER_00

Yes, you have to.

SPEAKER_02

The Quran is written in Classical Arabic, al-Arabiyyah al-Fusha. Now, for anyone not familiar, what makes this language such a huge challenge for modern NLP models?

SPEAKER_00

Well, it really comes down to its nuance and its structure. Classical Arabic is, um, profoundly complex. It preserves features that have mostly vanished from modern speech.

SPEAKER_02

Like the grammatical cases.

SPEAKER_00

Exactly. It has three full grammatical cases and declension, a system known as i'rab. For centuries, human scholars have relied on, you know, deep study to understand this.

SPEAKER_02

And that's the barrier for the machine, isn't it? It can't just absorb that context.

SPEAKER_00

Aaron Ross Powell It can't. The traditional grammar resources, they're not structured for a computer. They're written in prose.

SPEAKER_01

They assume you already know the rules.

SPEAKER_00

Right. They assume this deep intuitive understanding. For a computer, that means every word is potentially ambiguous. Is it the subject? The object? It just can't tell.

SPEAKER_02

That sounds like a complete non-starter for a computer scientist. So did they have to build specific tools just to handle this?

SPEAKER_00

They absolutely did. It took a massive effort. One of the key projects was the MASAQ dataset.

SPEAKER_02

Morphological and syntactical analysis for the Quran text.

SPEAKER_00

That's the one. Think of it as a kind of Rosetta Stone for Arabic NLP. It has over 131,000 morphological entries and 123,000 syntactic functions.

SPEAKER_02

All designed to translate that traditional i'rab into a format an AI can actually learn from.

SPEAKER_00

And it worked.

SPEAKER_02

And the results were pretty stunning, right? Once they had this structured data.

SPEAKER_00

They really were. When they tested parsing algorithms on MASAQ, one model, a random forest, hit 99.0% accuracy in predicting grammatical roles.

SPEAKER_02

So subject, object, things like that.

SPEAKER_00

Yes. It just shows you that if you feed the machine the right kind of structured data, it can learn the structure.
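To make that concrete, here is a minimal, hypothetical sketch of the kind of experiment described: a random forest trained to map morphological features to syntactic roles. The feature names and labels below are illustrative stand-ins, not the actual MASAQ schema.

```python
# Toy sketch: predict a word's syntactic role from morphological features.
# Features and labels are illustrative, not the real MASAQ annotation scheme.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

samples = [
    ({"pos": "noun", "case": "nominative", "definite": True}, "subject"),
    ({"pos": "noun", "case": "accusative", "definite": False}, "object"),
    ({"pos": "noun", "case": "genitive", "definite": True}, "possessor"),
    ({"pos": "noun", "case": "nominative", "definite": False}, "subject"),
    ({"pos": "noun", "case": "accusative", "definite": True}, "object"),
    # ... the real dataset has over 131,000 annotated entries
]
features, roles = zip(*samples)

vectorizer = DictVectorizer()          # one-hot encodes the categorical features
X = vectorizer.fit_transform(features)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, roles)

# Predict the role of an unseen nominative noun.
print(clf.predict(vectorizer.transform(
    [{"pos": "noun", "case": "nominative", "definite": True}])))
```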

SPEAKER_02

And there's a subtle point in the research about diacritics that I think is so important.

SPEAKER_00

The tiny marks on the letters.

SPEAKER_02

Right. What happens when you take them away?

SPEAKER_00

Well, the studies show that removing those diacritics consistently dropped model performance by about two to three percentage points.

SPEAKER_01

Which might not sound like a lot, but in a domain like this?

SPEAKER_00

It's a huge deal. It proves they aren't optional. They are absolutely essential for the nuanced meaning of the text.
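As an illustration of how such an ablation can be run, here is a minimal sketch that strips Arabic diacritics using their Unicode range. Feeding the stripped text to a model instead of the fully vocalized text is what produces the performance drop described above.

```python
import re

# Arabic short-vowel and related marks (harakat) occupy U+064B..U+0652
# (fathatan through sukun); U+0670 is the superscript (dagger) alif.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

def strip_diacritics(text: str) -> str:
    """Reduce fully vocalized text to its bare consonantal skeleton."""
    return DIACRITICS.sub("", text)

# "qul" (say) with vowel marks; the bare skeleton alone is ambiguous.
print(strip_diacritics("قُلْ"))  # -> "قل"
```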

LLMs On Inheritance Law

SPEAKER_02

Okay, so we've seen that very specialized models can do well with custom data. But what about the big general purpose LLMs? How do they perform on a really high-stakes task like Islamic inheritance law?

SPEAKER_00

This is where you see a huge performance gap open up.

SPEAKER_02

And these tests were zero shot, right?

SPEAKER_00

Yes, zero shot with Arabic prompts. Just to clarify for you, that means the model is just using its general knowledge. No special training for this task.

SPEAKER_02

Got it. So what did the numbers look like?

SPEAKER_00

Some of the advanced Western models, like o3 and Gemini, did okay. They were in the low 90s.

SPEAKER_02

But the Arabic focused models.

SPEAKER_00

They struggled. Models like Fanar and ALLaM scored below 50% accuracy overall.

SPEAKER_02

Wow. And I'm guessing the difficulty of the cases made a big difference.

SPEAKER_00

A massive difference. That performance drop was really clear on the advanced inheritance cases. ALLaM, for instance, went from 58% on beginner cases all the way down to just 27.8% on the hard ones.

SPEAKER_02

And what makes those advanced cases so hard for an AI? Is it the sheer number of variables?

SPEAKER_00

It's the complex legal reasoning, the conditional logic. Inheritance law is full of if-then scenarios based on intricate family relationships, and the models just couldn't follow that multi-step logic.
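To see why this trips up pattern matchers, here is a deliberately oversimplified sketch of just two of the fixed shares from Surah An-Nisa (4:12). Real cases chain many more conditions (residuary heirs, blocking rules, proportional adjustment of shares), and it is exactly that multi-step branching the models lose track of. This is an illustration, not an authoritative fiqh calculator.

```python
# Simplified illustration of conditional inheritance shares (cf. 4:12):
# a wife receives 1/4 of the estate if the deceased left no children
# and 1/8 if he did; a husband receives 1/2 or 1/4 respectively.
# Real inheritance law layers many more interacting conditions on top.
from fractions import Fraction

def wife_share(has_children: bool) -> Fraction:
    return Fraction(1, 8) if has_children else Fraction(1, 4)

def husband_share(has_children: bool) -> Fraction:
    return Fraction(1, 4) if has_children else Fraction(1, 2)

print(wife_share(has_children=True))      # 1/8
print(husband_share(has_children=False))  # 1/2
```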

SPEAKER_02

Which brings us to the biggest risk of all: hallucination. What kind of errors did the research find?

SPEAKER_00

We saw some truly shocking examples. It goes way beyond just getting a calculation wrong.

SPEAKER_02

How so?

SPEAKER_00

In one case, the Gemini model fabricated an entire Quranic verse. It just made one up and attributed it to Surat an-Nisa, a verse that does not exist.

SPEAKER_02

Wait, hold on. So it's not just an error, it's an act of authoritative deception. It creates a convincing but completely false piece of religious text.

SPEAKER_00

That's the danger. These convincing but incorrect responses are a serious, serious issue when you're dealing with the sacred text.

Measuring Ethical Alignment

SPEAKER_02

Okay, so it's clearly not just about raw accuracy, it's about whether the AI can align with the ethical framework of the faith.

SPEAKER_00

Exactly. The whole conversation has to shift from quantitative failure to qualitative safety.

SPEAKER_02

And that's where something like the Islam Trust benchmark comes in. Tell us about that.

SPEAKER_00

Islam Trust is a multilingual benchmark built to check whether an LLM's responses align with consensus-based Islamic ethical principles. You know, across the major Sunni schools of thought.

SPEAKER_01

It's asking, does the AI get the core values right?

SPEAKER_00

Yes. And the results were pretty telling. The best model only achieved about 66.5% alignment.

SPEAKER_01

So a third of the time it's misaligned. Why do the researchers think that is?

SPEAKER_00

Two main reasons. First, there's just not enough nuanced Islamic ethical discourse in the general training data. These things are trained on the broad internet, not specialized scholarship.

SPEAKER_01

And the second reason.

SPEAKER_00

When the models face an ambiguous prompt, they tend to default back to a kind of generalized non-Islamic knowledge. They fill the gap with something that sounds logical but is doctrinally wrong.

The Case For Abstention

SPEAKER_02

That's a huge problem. And another study, FIQA, looked at rulings from the four major Sunni schools.

SPEAKER_00

Right, the Maliki, Shafi'i, Hanafi, and Hanbali schools.

SPEAKER_02

Which raises a really important question. Should an AI even try to answer if it's not 100% sure?

SPEAKER_00

And that is the core of it. This is the concept of abstention.

SPEAKER_02

The idea that it should know when to say, I don't know.

SPEAKER_00

Precisely, like a human expert. And in one test, it was fascinating. GPT-4o had the highest raw accuracy, but other models, like Gemini and Fanar, were much better at abstaining.

SPEAKER_02

So they were better at identifying the questions they couldn't answer reliably?

Retrieval Grounding To Prevent Errors

SPEAKER_00

Exactly. And that's a critical safety feature. You want a model that knows its own limits, especially here.
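A minimal sketch of what such an abstention policy might look like in code, assuming some confidence-scoring mechanism exists. `answer_with_confidence` is a hypothetical stand-in for whatever a real system would use (log-probabilities, self-consistency voting, a verifier model).

```python
from typing import Callable, Tuple

def guarded_answer(
    question: str,
    # Hypothetical scorer: returns (answer, confidence in [0, 1]).
    answer_with_confidence: Callable[[str], Tuple[str, float]],
    threshold: float = 0.8,
) -> str:
    """Answer only when confidence clears the threshold; otherwise abstain."""
    answer, confidence = answer_with_confidence(question)
    if confidence < threshold:
        return "I don't know. Please consult a qualified scholar."
    return answer
```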

SPEAKER_02

And interestingly, even though the source questions were in Arabic, all the models did worse when they had to reason in Arabic compared to English.

SPEAKER_00

Yes, that just highlights that the linguistic challenge is still there. They default to their more robust English reasoning capabilities, and then things get lost in translation.

SPEAKER_02

So we have these incredibly powerful models that are also prone to spectacular failure. How do researchers lock them onto the truth? The answer seems to be retrieval-augmented generation.

SPEAKER_00

Yes.

SPEAKER_02

It forces the LLM to ground its answers in a verified knowledge base, right? To stop it from just inventing things.

SPEAKER_00

That's it exactly. And we have a really clear example of the RAG pipeline they used for Quranic question answering.

SPEAKER_02

Okay, let's break that pipeline down for everyone step by step. What's the first step?

SPEAKER_00

It starts with chunking. You have to divide the text into manageable units. For the Quran, the most logical unit is the verse, the ayah.

SPEAKER_02

And they do that by splitting the text at a specific Arabic symbol, right?

SPEAKER_00

Yes, the symbol that marks the end of a verse. Chunking by verse is critical because each ayah is a meaningful, self-contained unit of revelation.
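A minimal sketch of that chunking step, assuming the source text marks verse boundaries with the Arabic end-of-ayah sign (U+06DD) followed by the verse number; real corpora vary in how they encode this.

```python
import re

END_OF_AYAH = "\u06dd"  # ۝ ARABIC END OF AYAH

def chunk_by_verse(surah_text: str) -> list[str]:
    """Split a surah into ayah-sized chunks at the end-of-verse marker.

    Assumes the marker is optionally followed by the verse number
    (\\d matches Arabic-Indic digits in Python's Unicode regex mode).
    """
    parts = re.split(rf"{END_OF_AYAH}\s*\d*\s*", surah_text)
    return [p.strip() for p in parts if p.strip()]
```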

SPEAKER_02

So once you have all these individual verses, what's next?

SPEAKER_00

Next is chunk embedding. Each verse is transformed into a high-dimensional vector. Think of it as a numerical fingerprint or a coordinate in a vast mathematical space.

SPEAKER_02

A 1536-dimensional vector in some cases. And this captures the semantic meaning.

SPEAKER_00

It captures the idea, not just the keywords. So when your query comes in, it's also embedded, and the system can find the verses that are conceptually closest using semantic search.
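Here is a small sketch of that retrieval step using cosine similarity over precomputed vectors. The embedding model itself is assumed; 1536 dimensions matches, for example, some widely used commercial embedding models.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec: np.ndarray, verse_vecs: list[np.ndarray],
             verses: list[str], k: int = 5) -> list[str]:
    """Return the k verses whose embeddings sit closest to the query."""
    scores = [cosine_similarity(query_vec, v) for v in verse_vecs]
    top = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
    return [verses[i] for i in top]
```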

SPEAKER_02

And then the final step is the LLM.

SPEAKER_00

Yes, the LLM performs the final refinement. It generates the answer. But, and this is the crucial part, it is only allowed to use the verified verses that were retrieved.

SPEAKER_02

It's chained to the truth.

SPEAKER_00

It cannot go outside that context. So it's physically prevented from making up verses or fabricating doctrine.
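A minimal sketch of that constraint in practice: the grounding is enforced at the prompt level by confining the model to the retrieved verses and instructing it to abstain otherwise. The wording is illustrative, not the exact template of any system discussed here.

```python
def build_grounded_prompt(question: str, retrieved_verses: list[str]) -> str:
    """Assemble a prompt that confines the model to the retrieved verses."""
    context = "\n".join(f"[{i + 1}] {v}" for i, v in enumerate(retrieved_verses))
    return (
        "Answer using ONLY the verses below, citing their bracketed numbers.\n"
        "If the verses do not contain the answer, reply exactly: I don't know.\n\n"
        f"Verses:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```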

SPEAKER_02

And we saw the same principle in the E-Men framework for other texts, like Sahih al-Bukhari. Grounding the output just dramatically reduces hallucinations.

Manuscripts, Variants, And Context

SPEAKER_00

It's the only responsible way to use these models in such a sensitive context.

SPEAKER_02

You know, it's just fascinating to see computer science grappling with a text that scholars have analyzed for over a thousand years.

SPEAKER_00

And the human-led scholarly projects are still so vital, like the Corpus Coranicum.

SPEAKER_02

Right, a massive digital research project. What was its main goal?

SPEAKER_00

Its goal was to document the Quran's entire history and transmission. They cataloged early manuscripts through the Manuscripta Coranica project, and even used carbon dating on more than 40 ancient documents.

SPEAKER_02

What other kinds of information were they collecting?

SPEAKER_00

They documented the Variae Lectiones Coranicae, the variant readings that developed because early Arabic script often had few or no diacritical marks.

SPEAKER_02

The same marks that trip up the AI.

SPEAKER_00

The very same. And crucially, the project places the Quran in its historical context: the world of the Byzantine and Persian empires, early Christianity, and Rabbinic Judaism.

SPEAKER_02

Before we wrap up, let's touch on the structure of the text itself. The data showed this really interesting diversity in chapter length.

SPEAKER_00

It did. The Quran has 114 chapters, or surahs, and they vary wildly. Surah 2, Al-Baqarah, is the longest. It's huge, with 286 verses.

SPEAKER_02

Almost 7,000 words.

SPEAKER_00

Right. And then you have Surah 108, Al-Kawthar, which is one of the shortest at only about 10 words.

SPEAKER_02

And that reflects a dual purpose, doesn't it?

SPEAKER_00

It does. It shows a dual approach. You have the long chapters for detailed narratives and legal discussions, and then you have these short, powerful chapters for concise theological reminders. And AI has to be able to handle both.

SPEAKER_02

So this deep dive has really shown us this intense, necessary struggle.

SPEAKER_00

I think so.

SPEAKER_02

It's the push to make the power of modern AI accountable to the precision and the ethics of a sacred text. The challenge isn't just generating coherent text, it's about generating faithful text.

SPEAKER_00

And I think the core lesson is that in these high-stakes domains, you have to respect the limits of your tools. LLMs are probabilistic machines, they're pattern matchers, they are not knowledge-grounded reasoners.

SPEAKER_02

So you need those RAG frameworks, you need those ethical benchmarks.

SPEAKER_00

You need them to make sure these models stay as tools for scholarship, not become sources of synthetic doctrine.

From Coherence To Faithfulness

SPEAKER_02

It all comes back to this concept of trust and responsibility that humans carry. And that's our final thought for you to explore. If humanity was given this trust, this responsibility to be stewards, what does it mean when we delegate interpretation of that very responsibility to a machine?

SPEAKER_00

Especially a machine that has to be constantly engineered and reminded to be faithful and to know when it should just say, I don't know. Who really holds that trust? The responsibility has to fall back on the human scholar who curates the data that guides the machine. We hope this has sparked your curiosity and encouraged you to explore this intersection further.
