
The Data Exchange with Ben Lorica
A series of informal conversations with thought leaders, researchers, practitioners, and writers on a wide range of topics in technology, science, and of course big data, data science, artificial intelligence, and related applications. Anchored by Ben Lorica (@BigData), the Data Exchange also features a roundup of the most important stories from the worlds of data, machine learning and AI. Detailed show notes for each episode can be found on https://thedataexchange.media/ The Data Exchange podcast is a production of Gradient Flow [https://gradientflow.com/].
The Data Exchange with Ben Lorica
Unlocking Unstructured Data with LLMs
•
Ben Lorica
Shreya Shankar is a PhD student at UC Berkeley in the EECS department. This episode explores how Large Language Models (LLMs) are revolutionizing the processing of unstructured enterprise data like text documents and PDFs. It introduces DocETL, a framework using a MapReduce approach with LLMs for semantic extraction, thematic analysis, and summarization at scale.
Subscribe to the Gradient Flow Newsletter 📩 https://gradientflow.substack.com/
Subscribe: Apple · Spotify · Overcast · Pocket Casts · AntennaPod · Podcast Addict · Amazon · RSS.
Detailed show notes - with links to many references - can be found on The Data Exchange web site.