Test Case Scenario

Join us every other week on "Test Case Scenario," presented by Sauce Labs, where our expert panel dives into the exciting and ever-changing landscape of technology, pop culture, and business. Host Jason Baum, Director of Community at Sauce Labs, will lead the discussion with our esteemed recurring panelists: Marcus Merrell, VP of Technology Strategy; Nikolay Advolodkin, Senior Developer Advocate; and Evelyn Coleman, Manager of Implementation Engineering. Get ready to uncover the impact of continuous testing in this thrilling exploration of the tech world!

Is AI Living Up to the Hype? A Retrospective Look from Industry Experts

January 22, 2025 • Sauce Labs

AI has made waves in testing, but how much has really changed?

In this episode of Test Case Scenario, Jason Baum, Evelyn Coleman, and Marcus Merrell take a critical look at the progress of AI in software testing. They discuss the real capabilities of tools like ChatGPT and Copilot, what’s improved over time, and what still leaves testers frustrated.

You’ll also hear insights into how AI tools are shaping software development workflows, why cost and sustainability are becoming bigger concerns, and what testers can do to separate AI hype from practical solutions.

Join us as we discuss:
(00:00) Introduction

(02:01) Evelyn’s approach to testing New Year’s resolutions

(03:47) Revisiting ChatGPT: AI’s evolution and current limitations

(05:43) The realities of AI costs and challenges in adoption

(08:11) Improvements in AI-generated content and usability

(10:53) Rediscovering AI tools for personal and professional use

(16:38) The future of AI in daily life and automation potential

We’d love to hear from you! Share your thoughts in the comments below or at community-hub@saucelabs.com.

SUBSCRIBE and visit us at https://saucelabs.com/community to dig into the power of testing in software development.

▶ Sauce YouTube channel: / saucelabs

💡 LinkedIn: / sauce-labs

🐦 X: / saucelabs

Jason Baum [00:00:00]:

This is Test Case Scenario with me, your host, Jason Baum. This podcast is the definitive hub for knowledge and stories in the software testing and development communities. If you're new to the channel, hit the subscribe button, and let's dive straight into the episode. Hey, everybody. Welcome back to another episode of Test Case Scenario. I'm your host, Jason Baum, and with me, as always, Marcus Merrell and Evelyn Coleman. Thanks, everybody. Thanks for coming back.

Jason Baum [00:00:36]:

You know, the end of January, so anticlimactic. We just came off of, like, arguably one of the most enjoyable times of the year. And then what do you get? In many areas of the country, in the United States, at least, cold weather.

Evelyn Coleman [00:00:54]:

Oh, and my birthday. And then Valentine's Day.

Jason Baum [00:00:58]:

And my birthday. Alright, fine. Prove me wrong. That's okay. This is great. This is why we all work into the fiscal quarter.

Marcus Merrell [00:01:02]:

I mean, we. Yeah.

Jason Baum [00:01:03]:

We get to close some deals. Okay, cool. Fair enough.

Evelyn Coleman [00:01:08]:

People finally start leaving the gym.

Jason Baum [00:01:11]:

Yeah, all those commitments that they made. Now that we're, like, three weeks in, we're going to say screw it. I had a gym that I went to for the longest time. It was a Bally's. Do you remember Bally's? Directly across the street from the Bally's... The Bally's had, like, all the windows, like those giant windows, and all the elliptical machines and all that stuff looked right outside directly at a McDonald's.

Evelyn Coleman [00:01:37]:

That would. That would motivate me. That would.

Jason Baum [00:01:40]:

Yeah.

Evelyn Coleman [00:01:41]:

The trick, I believe, to New Year's resolutions is that you test them two months leading up to New Year. So November 1st is your first test run. It's usually a bit of a mulligan. You're like, oh, crap, I forgot it was November 1st. Okay, I gotta get on that then. December 1st is your actual test run that you make sure you've got all the logistics figured out.

Evelyn Coleman [00:02:01]:

And you know it's not gonna work because you got the holidays and things coming up. So there's no pressure. You just have to try out your New Year's resolution for a couple weeks and get all the flaws out of it. Make sure you have all the equipment that you need, the books that you wanted to read, and then Jan first is your actual production version of New Year's. So I believe in testing even in New Year's resolutions.

Jason Baum [00:02:27]:

What about Christmas week? Is that like a mulligan, too? Are you allowed to, like, skip that?

Evelyn Coleman [00:02:33]:

Yeah, I think you skip it, but I think that's what relieves the pressure on your test run, is that, you know, it's not going to go well. So now you have like what do I do on a month that I know isn't going to be great? Because you're going to have months like that throughout the year. You might as well test it out in your harshest conditions.

Jason Baum [00:02:50]:

I always like to do a check in. Usually haven't accomplished too much. I feel like this is going to be the year. Walking was mine. I believe it was cold today. Still went for the walk. Okay, well, let's get into the topic for today. I want to say what has it been? Gosh, almost. Yeah. What is it? Oh, two. Two years and change since ChatGPT came out and so much has changed and yet so little has changed. I guess you could argue in this, in that amount of time and thought it would be good to sort of explore the topic of AI and not necessarily like, I don't know, not like everybody else is kind of doing. I feel like this is more. It's not skeptical. Marcus, you have a talk on AI. What do you call your talk these days?

Marcus Merrell [00:03:47]:

Just, like, "An AI Skeptic's Guide to the SDLC." It's like a, you know, it's... I mean, it's skeptical, but it's also...

Jason Baum [00:03:55]:

Yours is not skeptical. That's why I asked.

Marcus Merrell [00:03:58]:

Yeah, I mean, it really is about understanding what it really is capable of, you know, right now. Not the promise. I feel like most of the people who are breathless and talk about it endlessly being the way of the future, they're not using the current version every day. They're talking about some imagined future where we have that quantum chip that Google came out with that's actually in place. Like, we're not. What can we do with it right now? And if we use it right now, what are the things we have to keep in mind using it right now, like hallucinations, non-deterministic results, and the fact that the token costs we're paying right now are all subsidized and are going to go up a lot once these companies decide they want a return on their investment. Like, you've got to deal with those realities.

Evelyn Coleman [00:04:37]:

A 2024 retro skeptic, if you might say, of AI. Like, looking back on your talks, looking back on everything that's happened the past year, the past couple of years, with AI and how it's been involved in testing: if people took your advice, is there any advice you would have changed during your talks?

Marcus Merrell [00:05:01]:

I would say that my usage of AI has not changed. The tooling improvements that I think could be made, in terms of getting it sprinkled more throughout the IDE, code working, so like those things... The results you get from the models are marginally better, but the ergonomics around it... When I use ChatGPT, I still have to copy and paste, I still have to do all this stuff. When I use Copilot, it's autocompleting. It's doing a pretty good job. It's doing the same job it was before. Like, nothing in my opinion has changed except for the tone around what people say it can do, which I either haven't needed or just haven't been able to believe.

Marcus Merrell [00:05:43]:

Would I change any advice? No. I think people have, you know, heard some of what I've said. They've asked me to come and talk to their team specifically about some stuff, because there's some companies that are embedding AI in their systems and they're not at all thinking about the fact that their margins might be destroyed once OpenAI decides they want to make a profit, which they've now said that they want to do. You know, they're not going to be a nonprofit anymore. So I think these are just things to keep in mind. I mean, we all understood when Uber went from being super cheap, 15 years ago you could get across town for five bucks, and now it's just as much as a cab. Right now it's like, do I want to deal with the mechanics of the cab, where I have to pay at the end of the ride, or do I want everything to be bottled up and I pay for it sort of separately? Because they're the same now. The same experience with AI: it's not just me as an individual paying more for a product. It's, I'm choosing to embed this in my tooling and I'm calculating ROI and margins for my business based on costs today, when those costs are going to skyrocket once these folks decide that they actually want to turn a profit.

Marcus Merrell [00:06:46]:

So it's a little different. What I'm calling it, from Ed Zitron, is the subprime AI crisis.

Jason Baum [00:06:53]:

People have known this is on the horizon, right? But that's getting closer. I guess the reality of it is getting closer.

Marcus Merrell [00:07:00]:

Well, I mean, as long as OpenAI continues to raise billions in funding, it's going to be pushed off more and more. Chickens are not coming home to roost, as far as I can tell.

Evelyn Coleman [00:07:10]:

Do you feel that the reason the ergonomics of AI on things like ChatGPT or Copilot have not changed is that... I feel like this year's emphasis was very much on visual tools, so visual and videos in AI, and also on customer service apps, and not so much, as in the early years, where it was "let's make this thing write code for us." Do you think some of that has halted the progress for testers and software writers?

Marcus Merrell [00:07:45]:

Yeah, I think so. I mean, also the fact that I think that the stuff that I want isn't going to add to more profit, more revenue, more usage of the system, it's going to cost them more money to produce those features for me. Yet every time I use the product, they're going to lose money. So I think they're probably investing in products that will actually get them a higher return. Like, I think video and stuff like that's probably a, a much safer bet for a company to put together right now, so I don't.

Jason Baum [00:08:11]:

Does it also do it better? Is that why they're investing a little more? Like, I know written content is an area that AI is good at. Like, it's improving on... like, the first iterations of it were really bad, questionable, and yet still interesting enough, obviously, that we were all interested in it at the time. But I think we're seeing an evolution in the content that AI is able to produce today over what it was in '22 when it came out.

Marcus Merrell [00:08:44]:

Yeah, that seems safe to say. I mean, I've read some rather ominous quotes from OpenAI people that basically say that GPT-5, they think it's going to cost billions to train. They're worried already that they don't have enough data to do the training that they want to do, and they're only now able to say it's going to be marginally better than 4.0. So I think maybe we're... And even Marc Benioff said that we think we've reached an upper limit to the LLMs as they are today, which means they're not going to stop development, we're not going to stop progress. It's just that we're going to do other kinds of models that splinter off of the main trunk of the centralized model that people are using right now and go more into autonomous and RAG models and agents, all sorts of stuff like that. You're going to hear the word "agentic" a lot this year.

Jason Baum [00:09:31]:

Evelyn, you know, for a while there, when the rest of us were getting into ChatGPT and we made you get into it... Have you been using it as much lately? Are you still using it, or what's your progress been with it?

Evelyn Coleman [00:09:46]:

In my roles, I, I find that personal emails are really the best way to get responses. They're more sincere. I do actually mean the things that I write. I'm not just writing to get, you know, to check in on, on customers and things. So it's not super, it wasn't super helpful for me. It just didn't feel personal. However, since you ask, I did go back and start using it a little bit and it has changed so much. The responses feel more human, if that's something that can be trained for.

Evelyn Coleman [00:10:22]:

It was just an incredible improvement in terms of the quality of the output, and it makes me wonder, like, if you were an early adopter but you stuck with it, are you in this sort of frog-in-boiling-water scenario where you're so close to it that you can't see the improvements or how mind-blowing it is, and so you're kind of like, "Ah, it's all right, it's kind of reached its limit." As for someone like me who's dabbled, I think I'm definitely going to use it a lot more now.

Jason Baum [00:10:53]:

It's interesting to hear you say that. I think some of us are just in it all the time or playing with it, you know, playing with different versions of it and other AI tools. I just asked ChatGPT as we were talking to write you a heartfelt note on your performance as a host of Test Case Scenario, and this is what it wrote: "Dear Evelyn, I just wanted to take a moment to thank you for all the amazing work you do on the Test Case Scenario podcast. Your thoughtful insights, engaging conversations, and ability to make complex topics relatable truly set the show apart. Your passion and authenticity shine through in every episode, and it's clear how much you care about both the content and the community. The podcast has been such a valuable resource and a source of inspiration. Thanks so much for sharing your voice and knowledge. Wishing you continued success. I look forward to hearing more. With gratitude, ChatGPT."

Evelyn Coleman [00:11:49]:

Aw, that's such a nice birthday present.

Jason Baum [00:11:50]:

Doesn't that feel so heartfelt?

Evelyn Coleman [00:11:53]:

It's actually heartfelt. Yeah, I remember. I think we used to do it on the podcast. We like test out different ones so it inferred based on models that you wanted something cheesy. You didn't have to write that it needed to be cheesy or that it needed to be positive.

Marcus Merrell [00:12:10]:

Send me the prompt in chat.

Jason Baum [00:12:12]:

Yeah, I'll send you the prompt.

Marcus Merrell [00:12:15]:

So 3.5 is no longer available in the web UI. I had to use the API to get to 3.5, but I'll just put it.

Jason Baum [00:12:21]:

Oh, wow. Really?

Marcus Merrell [00:12:22]:

I hope this message finds you well. I wanted to take a moment to extend my sincere gratitude and appreciation for the fantastic work you were doing with the Test Case Scenario podcast. Each episode not only deepens my understanding of complex topics, but also ignites enthusiasm with its insightful and engaging content. The way you break down intricate subjects into digestible and relatable discussions is commendable. It is clear that a lot of effort goes into research and production, and it truly pays off, making each episode a valuable learning resource. Cool. There's a little bit more, but, like, it's fairly similar, but yours is more concise.

Evelyn Coleman [00:12:52]:

I feel better. And I do feel like this one guessed a little bit more like the. I don't know what the compliments would be.

Jason Baum [00:13:01]:

It felt more personal. The 4o felt personal. I only did that because you said, like, you want to write personal things, and it's only, you know... And so my challenge was going to be like, all right, can it write personal? Can it make something feel personal? It felt personal.

Evelyn Coleman [00:13:19]:

Oh, I remember what I used it for. I'm making a custom Magic the Gathering card for my partner's 9th anniversary, and I don't know enough about Magic the Gathering, so asking it to help me with the little description. And it did such a good job.

Jason Baum [00:13:37]:

That's a good use case for AI, right?

Evelyn Coleman [00:13:40]:

Anniversary gifts.

Jason Baum [00:13:41]:

Yes. Anniversary, personal anniversary. Well, no, for things you have no idea what it is, is my point. I think searching... It's gotten really good at searching and parsing that information into something digestible. Like, you can ask it to scrub a site and then tell you what it's saying in, like, two paragraphs, and you have a higher working knowledge of the subject matter.

Jason Baum [00:14:09]:

You can have it analyze data really, really well, which I think, Marcus, is something you touch on Right. In your presentations, right. That's its sweet spot right now.

Marcus Merrell [00:14:20]:

Yeah. It is good at telling you something about something that has already happened. It's not super good at telling you how to do something in the future that's never been done.

Jason Baum [00:14:28]:

Yeah. That and being a travel agent. Great travel agent. It is a wonderful, wonderful travel agent.

Evelyn Coleman [00:14:34]:

This actually brings up an idea of, like, the future of it. I know it's a retro skeptic episode, but it's making me wonder. Obviously these AI models are learning off of the Internet and training and things, and they're also learning off of us, like our interactions with them. But one place that I don't feel like we've explored is, I use mine and, let's say, a family member uses theirs: if it could, like, connect the insights, you know, in a way to, like, help manage a home together or to help with, like, I don't know, like to make things even more personal.

Marcus Merrell [00:15:12]:

To accumulate knowledge about.

Evelyn Coleman [00:15:14]:

Well, maybe it's creepy when I think about it too much, but I feel like there is, there's good ways or bad ways to use that.

Jason Baum [00:15:20]:

On the next episode of Test Case Scenario, we're going to go into all the bad ways to use AI, all the creepy ways to use it. Now, now our iPhones have it right now. Now iPhone has.

Marcus Merrell [00:15:35]:

I haven't used it even once.

Jason Baum [00:15:37]:

I haven't used it either, but now I'm gonna, I'm gonna start asking it some questions.

Marcus Merrell [00:15:41]:

Maybe this is me being old. I'm so thrilled that Alexa actually understands what I'm saying. That to me, that's good enough. Like AI, we're fine.

Marcus Merrell [00:15:50]:

If I can say Alexa, who's singing this song? Or. The main thing I've done recently is I've hooked up all of my fish tank aquarium equipment to voice commands so I can use both hands and say, all right, now turn on the water valve. All right, now turn on the CO2, that kind of thing. So I can, you can do all that stuff. And so I'm like, I need more of that, more of these things when I've got both hands occupied. Things in the kitchen, turn on the oven, that kind of thing. That's, that's kind of what I'm, what I'm headed for right now.

Evelyn Coleman [00:16:15]:

To be able to talk back to my GPS would be really good. Like hey, that's not going to work.

Marcus Merrell [00:16:20]:

Or like, or that that road's actually closed or something.

Jason Baum [00:16:23]:

We've got an AI and automation to essentially get to the point where we could be Knight Rider, right? We could like call up, we could like KITT and have our self driving car come to our house, pick us up, take us to the. We've. We've essentially created either the Batmobile or KITT.

Marcus Merrell [00:16:38]:

Do you ever see the movie, The Graduate?

Jason Baum [00:16:40]:

Yeah.

Marcus Merrell [00:16:40]:

The guy who plays Dustin Hoffman's father was the voice of KITT. So I can't watch that movie now without Daniels. I don't remember his first name. Something Daniels. Oh gosh. George Daniel, Mike, Michael Daniels. I don't know.

Evelyn Coleman [00:16:53]:

See we can have ChatGPT listening in all the time and it could just tell us instead of us struggling.

Marcus Merrell [00:16:58]:

Yeah.

Jason Baum [00:16:59]:

I just find that I end up asking it ages of people, how tall is.

Marcus Merrell [00:17:04]:

How tall is something.

Jason Baum [00:17:05]:

I think the older I get, I just want to know how old everyone else is. Do you find yourself. I'm watching a show. All I care about is how old this person is. I don't know why.

Marcus Merrell [00:17:14]:

I'm actually more at the age where I'm like, wait, wait, how old was this person when they accomplished that amazing thing that I haven't even come close to.

Jason Baum [00:17:21]:

Yeah. Anyway. Yeah. The automation of the home has, like, come so far. Like, last month, I had all the Christmas lights on Alexa. So basically all I had to do was say, "Alexa, Merry Christmas." And then I have my, you know, the Chevy Chase...

Marcus Merrell [00:17:40]:

Yeah.

Jason Baum [00:17:41]:

Moment.

Evelyn Coleman [00:17:42]:

Good episode, right? Like the automation of holidays. Like, holiday magic used to be one of your parents staying up and forcing everybody to put lights up and, like, forcing the dramas and all that stuff, depending on the holiday. And now I feel like I, like, export a lot of those tasks to automation or online shopping and, like, that'd be a good idea. Like the AI of holidays.

Marcus Merrell [00:18:12]:

Yeah. And a drone delivered those lights.

Evelyn Coleman [00:18:14]:

Yeah, and put them up.

Jason Baum [00:18:16]:

That's right. All right, this is a good point. I feel like, to wrap up today's episode. Thank you. So this was a fun one. We should. We should revisit the topic, you know, in another few months because I feel like, you know, it's something that is constantly evolving, maybe not to the pace of some people's, you know, hopes and dreams. Certainly we're not, you know, VCs exploring the topic.

Jason Baum [00:18:35]:

I think they're very, very excited still, but, you know, we could put on our skeptics hats and sort of look at it objectively as to where it is today and what it does well and what it doesn't. Thank you so much, Evelyn. Thank you so much, Marcus. And thank you, our listeners, for giving us your time, which is so precious. And we will see you on our next episode of Test Case Scenario. Thank you for joining us on Test Case Scenario. Share your thoughts in the comments.

Jason Baum [00:19:14]:

We'll make sure to respond to each and every single one. Don't forget to subscribe and hit that notification bell to keep in touch. If you missed our last episode, it's popping up on your screen right now. Go click it. Until next time on Test Case Scenario.
