#83 Who’s Minding the Metadata? Why Data Quality Matters in GenAI (Quality Time With Paolo)

DataTopics Unplugged: All Things Data, AI & Tech

Apr 11, 2025
DataTopics

Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. DataTopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society.

Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style!

In this episode, host Murilo is joined by returning guest Paolo, Data Management Team Lead at dataroots, for a deep dive into the often-overlooked but rapidly evolving domain of unstructured data quality. Tune in for a field guide to navigating documents, images, and embeddings without losing your sanity.

What we unpack:

  • Data management basics: Metadata, ownership, and why Excel isn’t everything.
  • Structured vs unstructured data: How the wild west of PDFs, images, and audio is redefining quality.
  • Data quality challenges for LLMs: From apples and pears to rogue chatbots with “legally binding” hallucinations.
  • Practical checks for document hygiene: Versioning, ownership, embedding similarity, and tagging strategies.
  • Retrieval-Augmented Generation (RAG): When ChatGPT meets your HR policies and things get weird.
  • Monitoring and governance: Building systems that flag rot before your chatbot gives out 2017 vacation rules.
  • Tooling and gaps: Where open source is doing well—and where we’re still duct-taping workflows.
  • Real-world inspirations: A look at how QuantumBlack (McKinsey) is tackling similar issues with their AI for DQ framework.
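
To make the "practical checks for document hygiene" item a bit more concrete, here is a minimal sketch of what such checks could look like before documents reach a RAG index. It is not taken from the episode: the `Doc` fields, the one-year staleness window, and the 0.95 similarity threshold are all illustrative assumptions, and the embeddings are toy two-dimensional vectors rather than output from a real model.

```python
from dataclasses import dataclass
from datetime import date
from math import sqrt
from typing import Optional


@dataclass
class Doc:
    # Hypothetical metadata fields; a real catalog will have its own schema.
    doc_id: str
    owner: Optional[str]
    last_reviewed: date
    embedding: list[float]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hygiene_report(docs, today, max_age_days=365, dup_threshold=0.95):
    """Return (doc_id, issue) pairs for ownership, staleness, and near-duplicates."""
    issues = []
    for d in docs:
        if not d.owner:
            issues.append((d.doc_id, "missing owner"))
        if (today - d.last_reviewed).days > max_age_days:
            issues.append((d.doc_id, "stale"))
    # Pairwise comparison is fine for a sketch; large corpora need an index.
    for i, a in enumerate(docs):
        for b in docs[i + 1:]:
            if cosine(a.embedding, b.embedding) >= dup_threshold:
                issues.append((a.doc_id, f"near-duplicate of {b.doc_id}"))
    return issues


docs = [
    Doc("hr-2017", None, date(2017, 1, 1), [1.0, 0.0]),
    Doc("hr-2024", "hr-team", date(2024, 6, 1), [0.99, 0.05]),
    Doc("it-policy", "it-team", date(2024, 9, 1), [0.0, 1.0]),
]
report = hygiene_report(docs, today=date(2025, 4, 11))
for doc_id, issue in report:
    print(doc_id, "->", issue)
```

In this toy data, the 2017 HR document is flagged three times: it has no owner, it has not been reviewed within the assumed window, and it sits very close to the 2024 version in embedding space, which is the kind of overlap that lets a chatbot retrieve outdated vacation rules.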