Vantage ViewPoint

Rethinking PDF Accessibility with AI + Human Expertise

Vantage ViewPoint Season 1 Episode 4

In this episode of Vantage ViewPoint, we explore how AI and human expertise can work together to create scalable, high-quality accessible PDFs. Host Sampath sits down with Greg Suprock, AI accessibility workflow expert at Apex CoVantage, to discuss where AI delivers the most value, where it falls short, and why a hybrid approach is the key to sustainable, efficient document accessibility.
From alt text generation to remediation of scanned documents, discover how Apex’s human-in-the-loop model balances speed, accuracy, and compliance. Whether you're in higher ed, publishing, or government, this conversation offers valuable insight into the future of digital accessibility.

Welcome to Vantage ViewPoint, your source for insightful discussions on document accessibility, technology, and more. Join us as we dive deep into ensuring inclusivity through innovation. Brought to you by Apex CoVantage and documenta11y.

Welcome to another episode of Vantage ViewPoint. Today we are diving into a critical conversation at the intersection of accessibility and AI: how artificial intelligence and human expertise can work hand in hand to create better, more efficient PDF accessibility outcomes. Joining me today is Greg, an expert in AI-powered accessibility workflows here at Apex CoVantage. Hello, Greg. Welcome back to the show.

Thanks, Sampath. Very happy to be here. This is a topic that's gaining a lot of traction. There's truly potential in how we can blend technology and human insight to make digital content truly accessible. So where would you like to start, Sampath?

Okay, let's start from the top, shall we? Where do you think AI brings the most value when it comes to making PDFs accessible?

That's a good question, Sampath. AI works best when you're dealing with high volumes of content; when the documents are structurally consistent, such as forms, templates, or publisher-standard formats; when you want to generate alt text for images quickly; and, ideally, when the PDFs are born digital, not scanned, so they're clean and machine readable right from the start. In those situations, the power of AI can really speed things up while maintaining a decent level of accuracy.

That does sound powerful, Greg, but I believe there's also a cost to getting these AI systems off the ground, right?

Absolutely correct. The cost to develop and implement AI tools can be significant. That's why it's generally best handled by a software provider, a SaaS platform provider, or an experienced vendor. For most organizations, it wouldn't make sense to build from scratch, especially if you're not dealing with enormous or ongoing document volumes.

That's great, Greg. Now, I believe, with the cost of implementation, and considering that, you know, AI is really, really new to the scene.
But considering the advantages and the leaps and bounds at which it's growing, I'm sure there must also be a lot of limitations. Where does AI tend to fall short in this space?

There are a few common challenges associated with implementing an AI solution. One of them is the low-volume scenario. In such cases, there's not enough data to train models effectively. You can get very strange results when you train on a very small quantity of data, because you don't account for the variance in the source. The training input also really matters: the more aligned it is with the actual use cases, the better the outcomes from using an AI solution when you're in a live production scenario. The other rub is that highly variable content can create problems or be misinterpreted, and that can throw the AI results off. As for alt text generation, that's an improving area for AI. It's done a very good job of recognizing lots of images, but it can be inconsistent, and sometimes it produces problems that require human review. The responses that are generated can seem very mechanical in certain cases. In other instances, more data is needed, particularly when you're dealing with STEM content, where you may have to look at the context surrounding an image to get an appropriate alt text response. And when you're working with page images or scanned PDFs, you end up relying on OCR. That's what I mentioned about born-digital PDFs earlier. When you have scanned objects and you're doing the OCR extraction, there's another potential layer of inaccuracy introduced, which could throw off an AI result. So it becomes necessary to look at options that put people in the process, to look at it from a hybrid perspective rather than a software solution alone.

Now that we have spoken about the limitations, Greg,
I believe that the sweet spot seems to be a hybrid approach: AI where it makes sense, and human oversight where it's needed, so that the models can be trained better and the deployed solutions can be better optimized for any sort of problems that we encounter during document accessibility remediation.

Exactly. Correct. At Apex we use a best-of-both-worlds kind of approach. It's a hybrid workflow, as you described. We start by making sure that we do a complete source analysis, that we're choosing the proper tools to do the work, and that we have the right AI models in place. We then use AI within the PDF workflow to create an intermediate accessible PDF file. What that means is that the file has been processed by the AI: it's been reviewed, objects have been identified, and they've been appropriately tagged to the extent that the AI is able to handle the content. Then we run a series of automated checks. We can use things like the Adobe Accessibility Checker or the PAC checker for PDF. Human analysts then step in, look at the issues reported by the checking tools, and ensure that the tagging has been updated as necessary so that the document will pass all criteria. Additionally, they check that any alt text that has been generated is accurate and contextually appropriate. One of the things that we do to improve our process is to consider the human-prepared document to be a gold standard document at that juncture. We can take that gold standard document and the source, and feed those back into our AI models. We do this periodically so that we're improving the AI training as we go forward. So as we do work, we have the potential to expand the quantity of training material that has been used with the AI model.
And the objective is to ensure that when we're generating the tagged output from our system, we keep incrementally improving it over time, so that less human touch is needed. It turns out that's an iterative, collaborative process between the production folks and our software developers to balance speed, cost, efficiency, and quality, and to get to a good end result: an efficient workflow toward accessible PDFs.

Thank you, Greg, for explaining how AI and human oversight work together. Now, for all our listeners tuning into this episode, if you have any high-volume, large-scale PDF remediation needs, you can contact Apex CoVantage. Now, Greg, looking ahead, how do you see the space evolving?

It's a good question, Sampath. I see a future where AI is going to handle more and more of the heavy lifting associated with document remediation, and human experts will continue to act as quality controllers and also trainers. It's an interaction between the folks who are reviewing the output files and the folks who are doing development work with the AI. The more high-quality data that we can feed into the system from these hybrid workflows, the more intelligent the AI will be, and the more accurate it will be when it's delivering results. Ultimately, we're moving toward highly scalable, high-quality document accessibility results, delivered quickly and without compromising in any way the human experience on the delivery side, when the remediated PDF is made available.

Thanks, Greg, for that. Now, before we wrap this episode up, what is your one key takeaway for all our listeners?

Probably the biggest key takeaway I would give is that PDF accessibility, because of the variation in the different types of sources out there, isn't a one-size-fits-all process.
So you can't take one solution and then apply it to every scenario. AI is a huge enabler, but it's only effective when it's used strategically and combined with skilled human reviewers to ensure that you get the appropriate end result. That's how we make accessibility at Apex efficient, sustainable, and effective.

That's great, Greg. Thank you so much. I think this concludes the episode for today. And for our listeners: if you're exploring scalable solutions for accessible PDFs, definitely check out how Apex CoVantage blends AI and human expertise to deliver these better outcomes. Hey, Greg, thank you so much for being here with us today.

You're welcome, Sampath. Thank you for having me on.

Thank you.
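
The hybrid workflow Greg describes (AI tagging, automated checks, human review, and feeding reviewed files back as training material) could be sketched roughly as follows. This is only an illustration under stated assumptions, not Apex's implementation: `PdfJob`, `human_review`, and `run_hybrid_workflow` are hypothetical names, the checker issues are made-up strings, and the human step is reduced to a stub.

```python
from dataclasses import dataclass, field


@dataclass
class PdfJob:
    """Hypothetical record for one AI-tagged (intermediate) PDF."""
    name: str
    # Issues reported by an automated checker; the strings used here are invented.
    checker_issues: list = field(default_factory=list)
    # Set to True once a human analyst has verified the generated alt text.
    alt_text_verified: bool = False


def human_review(job: PdfJob) -> PdfJob:
    """Stub for the analyst step: resolve reported issues and verify alt text."""
    job.checker_issues.clear()
    job.alt_text_verified = True
    return job


def run_hybrid_workflow(jobs, gold_standard):
    """Send each job through human review and collect gold-standard documents."""
    for job in jobs:
        reviewed = human_review(job)
        # Only documents with no open issues and verified alt text are kept
        # as gold-standard material for later retraining.
        if not reviewed.checker_issues and reviewed.alt_text_verified:
            gold_standard.append(reviewed.name)
    return gold_standard


jobs = [
    PdfJob("form.pdf"),
    PdfJob("scanned_report.pdf", checker_issues=["missing alt text", "untagged table"]),
]
print(run_hybrid_workflow(jobs, []))  # prints ['form.pdf', 'scanned_report.pdf']
```

In a real pipeline the gold-standard list would hold the reviewed files together with their sources, and retraining would happen periodically rather than per document, as described in the conversation.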