
Cloud Native Testing Podcast
The Cloud Native Testing Podcast, sponsored by Testkube, brings you insights from engineers navigating testing in cloud-native environments.
Hosted by Ole Lensmar, it explores test automation, CI/CD, Kubernetes, shifting left, scaling right, and reliability at scale through conversations with testing and cloud native experts.
Learn more about Testkube at http://testkube.io
Beyond Pipelines: Designing a Testing Culture That Actually Works in Cloud Native with Joseph Karp
In this episode, Ole Lensmar talks with Joe Karp, DevOps Architect at UWM, about what it really takes to build a testing culture inside a modern, cloud-native organization. Joe shares how his team intentionally designed clear ownership between developers, QA engineers, and platform teams—and why empowering devs to own their testing is key to scaling quality across hundreds of services.
From infrastructure-level chaos testing to critical-path monitoring in production, Joe walks through the tooling, strategies, and cultural shifts that enabled UWM to test across multi-cloud, multi-cluster environments. They discuss why off-the-shelf tools often fall short, how internal platforms should be treated like products, and what it means to build resilience into both systems and teams.
--
This podcast is proudly sponsored by Testkube, the cloud-native, vendor-agnostic test execution and orchestration platform that enables teams to run any type of test automation directly within their Kubernetes infrastructure. Learn more at www.testkube.io
Ole Lensmar (00:47)
Hello and welcome to today's episode. I am super happy to be joined by Joe Karp, who is a DevOps architect at UWM. Joe, welcome to the show.
Joseph Karp (00:56)
Well, thanks for having me. I'm ready and excited to talk through the questions you have for me today.
Ole Lensmar (01:03)
I mean, the topic of the podcast is cloud native testing, and I'm guessing, or at least hoping, that you're involved with that, so it should be a fruitful discussion. What's your take, and where are you in that journey?
Joseph Karp (01:09)
Yeah, so my particular position is one most people would probably recognize as a platform architect. I live in the middle, between the two sides you find at most traditional places: you have your operations teams and your app dev teams, and I sit in that space in between...
You're not going to find an operations guy really thinking through, “How do I test this change I'm about to push out?” So even when I build my own tooling, it's like—how do we test this thing? A lot of the tooling out there isn’t really geared toward infrastructure testing, so you have to come up with creative ways to do it. That’s where chaos testing comes in. It’s essentially infrastructure-level testing, even though you can use it for application-level testing too.
When you start thinking cloud native—and we started moving into having Kubernetes clusters in different regions and in different cloud providers—we had to invent things to simulate failure scenarios. Like, what happens if Azure goes down? Testing that at the infrastructure level is totally different from testing it at the app level. At the app level, you might be thinking active-active or active-passive. At the infrastructure level, you're asking: do I switch DNS? How do I fail over? You are the magic that makes that happen.
As an app developer, you can rely on your infra or platform teams to make sure things are running somewhere else. But the infra team has to be the ones that make that happen behind the scenes.
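For readers who want something concrete, here is a minimal sketch of the kind of infrastructure-level chaos exercise Joe is describing, written with the official Kubernetes Python client. The namespace and label selector are hypothetical placeholders, and this is not UWM's tooling; it only illustrates the idea of deliberately breaking something and letting your health checks and failover logic prove themselves.

```python
# Illustrative sketch only (not UWM's tooling): delete a random pod matching a
# label selector to simulate an infrastructure-level failure, then watch whether
# the platform's failover and health checks cope.
import random

from kubernetes import client, config  # pip install kubernetes

NAMESPACE = "demo-apps"          # hypothetical namespace
LABEL_SELECTOR = "app=checkout"  # hypothetical workload label


def kill_random_pod() -> None:
    config.load_kube_config()  # or config.load_incluster_config() inside a cluster
    v1 = client.CoreV1Api()

    pods = v1.list_namespaced_pod(NAMESPACE, label_selector=LABEL_SELECTOR).items
    if not pods:
        print("No matching pods found; nothing to break.")
        return

    victim = random.choice(pods)
    print(f"Deleting pod {victim.metadata.name} to simulate a failure")
    v1.delete_namespaced_pod(victim.metadata.name, NAMESPACE)


if __name__ == "__main__":
    kill_random_pod()
```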
Ole Lensmar (02:49)
Yeah, that sounds both challenging and exciting. Were you able to use off-the-shelf tools for chaos testing, or did you need to build your own framework?
Joseph Karp (03:17)
Off-the-shelf tools really don't work once you get into specific company environments—especially slower-moving companies like mine, which is a financial institution. Greenfield? Great. But in our case, I need flexibility and extensibility. I need to build adapters and plugins into whatever orchestrator I’m using.
We have our own orchestrator that doesn’t directly integrate with anything—it just orchestrates. I built workers for it, and then chaos testing is built around that. So now we can run an exercise where we lift and shift all our applications from one Kubernetes cluster to another seamlessly.
It’s kind of my Netflix mentality—like their “Chaos Kong” idea, where they shut off entire regions. But I can’t just use Chaos Monkey or even Istio’s chaos tools. Tools like Litmus don’t really adapt to our needs. That’s why Testkube is great—it’s extensible. I can drop it in where I need it, build containers and custom executors, and test fine-grained infra details and app behavior.
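As a companion sketch, this is one way the cross-cluster lift-and-shift idea could look at its simplest, assuming two kubeconfig contexts and a stateless deployment. The context, namespace, and deployment names are made up; UWM's real version runs through their own orchestrator and Testkube executors rather than a script like this.

```python
# Illustrative sketch (not UWM's orchestrator): shift a stateless deployment from
# one Kubernetes cluster to another by scaling it up in the target cluster and
# then down in the source cluster. Context and resource names are hypothetical.
from kubernetes import client, config  # pip install kubernetes

SOURCE_CONTEXT = "aks-eastus"     # hypothetical kubeconfig contexts
TARGET_CONTEXT = "gke-us-east1"
NAMESPACE = "demo-apps"
DEPLOYMENT = "checkout"
REPLICAS = 3


def scale(context: str, replicas: int) -> None:
    api_client = config.new_client_from_config(context=context)
    apps = client.AppsV1Api(api_client)
    apps.patch_namespaced_deployment_scale(
        DEPLOYMENT, NAMESPACE, {"spec": {"replicas": replicas}}
    )
    print(f"{context}: {DEPLOYMENT} scaled to {replicas}")


if __name__ == "__main__":
    scale(TARGET_CONTEXT, REPLICAS)  # bring the app up in the target cluster first
    scale(SOURCE_CONTEXT, 0)         # then drain it from the source cluster
```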
Ole Lensmar (05:34)
Very cool. For people building applications on your infrastructure, do they need to do anything differently knowing their app might move between clusters?
Joseph Karp (05:54)
We try to provide a platform where developers choose the ideal region—like Azure if they rely on Azure APIs, or GCP if they use GCP services. But I can move that workload temporarily if I need to upgrade something. It’s transparent to them. They trust that I’ll keep their app running somewhere.
We collaborate closely to ensure the applications are as cloud-native as possible. But it’s not always perfect—you can’t always lift and shift everything. You have to know your choke points.
Sometimes I’ll pick up a service and move it, and it explodes because it doesn’t have access to a shared on-prem drive. Then I need to figure out how to expose that resource properly. Those are the challenges.
Cloud native sounds great, but in reality—especially in financial companies—there are still hard constraints. For example, our finance team wants things on a shared drive in a very specific place. So we have to build adapters that make those things portable and resilient. That way, when on-prem goes down and comes back up, everything can return to normal seamlessly.
It’s all about building bridges and adapters to bring legacy systems into a cloud-native world.
Ole Lensmar (07:53)
Fascinating. I imagine the level of dependency between infrastructure and applications varies. Some might be easy to move, others very tightly coupled—like with GPUs or storage dependencies?
Joseph Karp (08:06)
Absolutely. I love a stateless app. A pure stateless Kubernetes deployment is like the best-behaved child—you can pick it up and move it anywhere. I build workers for our orchestrator (which is similar to a Kafka consumer) that just pull tasks from a queue. They don’t expose endpoints or require inbound traffic. I can literally move them wherever I want and they resume their work independently.
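The worker pattern Joe describes maps closely to a plain consumer loop. Below is a minimal sketch using kafka-python as a stand-in; the topic, broker address, and group name are hypothetical, and UWM's workers talk to their own orchestrator rather than Kafka directly.

```python
# Illustrative sketch of the stateless worker pattern: no inbound endpoints, just
# a loop pulling tasks from a queue, so the process can be moved to any cluster
# and simply resume. Topic, broker, and group names are hypothetical.
import json

from kafka import KafkaConsumer  # pip install kafka-python


def handle(task: dict) -> None:
    # Do the actual unit of work here; keep all state in external systems
    # (database, object storage) so the worker itself stays disposable.
    print(f"processing task {task.get('id')}")


def main() -> None:
    consumer = KafkaConsumer(
        "work-items",                    # hypothetical topic
        bootstrap_servers="kafka:9092",  # hypothetical broker
        group_id="worker-pool",          # consumer group handles rebalancing
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        enable_auto_commit=True,
    )
    for message in consumer:  # blocks, pulling tasks as they arrive
        handle(message.value)


if __name__ == "__main__":
    main()
```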
But then you get into services with databases or persistence layers—it gets trickier. Kafka clusters, for example, can’t just be lifted and shifted. You have to set up replication across regions, and that brings in latency and eventual consistency issues.
Especially in a financial institution, consistency is critical. We can't rely on eventual consistency for anything mission-critical. So we often use synchronous API calls and active-passive database setups to ensure everything stays in sync, even when running in multiple clusters.
Ole Lensmar (09:37)
That makes sense. Let’s shift to high-level testing. When you run functional or load tests, do you move applications to special environments for that, or test them where they are?
Joseph Karp (10:00)
We usually run load tests off-hours in our lower environments. We have a production-like staging layer and an integration environment for developers. Integration is the first stop after a merge. Staging is where we do load and performance testing since it’s more stable and releases less frequently.
We’ve started building out hermetic environments, but that’s still in progress. Right now, we use Testkube's extensibility—we have runners deployed across four locations. After a deploy, our orchestrator tells Testkube to fork and run tests from each location so we get a good picture of performance across the globe.
That includes things like latency between GCP and Azure. It helps us identify networking issues—is the VPN behaving? Is there hairpinning through on-prem? How is each environment talking to each other? This kind of testing helps us ensure performance stays consistent no matter where a service runs.
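As a rough illustration, a probe run by each per-location runner might look something like the following. The endpoints and latency budget are placeholders; the point is simply that the same check, executed from different locations, surfaces networking differences.

```python
# Illustrative sketch of a per-location latency probe: time a handful of requests
# to each dependency and fail if the median drifts past a budget. Endpoints and
# thresholds are hypothetical.
import statistics
import sys
import time

import requests  # pip install requests

ENDPOINTS = {
    "gcp-api": "https://api-gcp.example.internal/healthz",
    "azure-api": "https://api-azure.example.internal/healthz",
}
SAMPLES = 5
BUDGET_SECONDS = 0.5


def measure(url: str) -> float:
    timings = []
    for _ in range(SAMPLES):
        start = time.perf_counter()
        requests.get(url, timeout=5).raise_for_status()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    failed = False
    for name, url in ENDPOINTS.items():
        median = measure(url)
        print(f"{name}: median {median * 1000:.0f} ms")
        if median > BUDGET_SECONDS:
            failed = True
    sys.exit(1 if failed else 0)  # non-zero exit marks the test run as failed
```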
Ole Lensmar (12:20)
Not a good thing. You mentioned there's an integration environment, a staging environment… is promotion between them automated? Do you run a test suite and then promote, or is it manual?
Joseph Karp (12:35)
So with all the microservices we have, we've basically created a self-deploy process. Every time a developer lands a commit on the develop branch, it produces a new version. They can then choose to deploy that version to whatever environment they want. It automatically goes to integration. From there, they can promote it to staging or production whenever they’d like, depending on what their PO wants or the release schedule.
The dev teams currently have full control, but we are moving toward a fully automated CD process. It'll start in integration, run tests and contract testing, then move on to the next environment and run load tests, etc. That’s in development now. We basically took our one deployment workflow, copied it three times, and linked them together.
So they'll have full CD as an option later this year. We want to move to automated prod deploys, but we need mature testing practices first. You don’t want to go straight to production until your test suites are reliable and in place.
Ole Lensmar (13:40)
Yeah, definitely. In that case, you need things to be really well-tested. One thing we’ve talked about in previous episodes is testing in production. Some folks say you shouldn’t do it, but still do it anyway. Not as the only type of testing, but maybe running integration tests in prod on a schedule just to catch anything that slips through. Is that something you’ve done or considered?
Joseph Karp (14:27)
We actually had Google in-house helping us adopt SRE practices. They've helped guide us on how to do production testing. The idea is to identify your critical paths and make sure they’re always being tested. Think of the status pages from companies like Harness or other vendors—those are often testing bots checking whether a user can log in and move through a flow.
We’ve been identifying critical user journeys and making sure certain actions work and certain pages are always up. If a deployment breaks something, we catch it immediately. Those checks aren’t tied to deployment; they’re always running, so we can set metrics, build SLOs, and eventually SLAs.
So if something breaks, we can say with confidence when we’ll be back up. We’re starting to set up Testkube to ping specific endpoints and run synthetic user flows. The idea is that if something fails at 3 a.m., someone gets alerted.
You don’t want to run a full load test in production. That would be madness. But you do want to run checks on critical paths regularly so you're not waiting for a user ticket to discover something broke.
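To make the idea tangible, here is a minimal sketch of a synthetic critical-path check: a scripted login followed by a key page load, exiting non-zero so whatever scheduler or alerting hook runs it can page someone. The URLs, field names, and credentials are placeholders, not a real system.

```python
# Illustrative sketch of a synthetic critical-user-journey check. All URLs and
# fields are hypothetical placeholders.
import os
import sys

import requests  # pip install requests

BASE_URL = "https://app.example.internal"  # hypothetical application URL


def check_login_journey() -> None:
    session = requests.Session()

    # Step 1: the user can log in.
    resp = session.post(
        f"{BASE_URL}/api/login",
        json={
            "username": "synthetic-check",
            "password": os.environ["SYNTHETIC_PASSWORD"],
        },
        timeout=10,
    )
    resp.raise_for_status()

    # Step 2: the page the critical journey depends on actually renders.
    resp = session.get(f"{BASE_URL}/dashboard", timeout=10)
    resp.raise_for_status()


if __name__ == "__main__":
    try:
        check_login_journey()
        print("critical path OK")
    except Exception as exc:  # any failure should page, not pass silently
        print(f"critical path FAILED: {exc}")
        sys.exit(1)
```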
Ole Lensmar (16:28)
Yeah, totally agree. You’ve got to be pragmatic. I guess the question is, where do you draw the line between monitoring and testing in production? A lot of monitoring tools let you do user flows or basic synthetic transactions.
So who on your team is responsible for building and maintaining those kinds of tests? Is it the app team, QA, ops? How does that work for you?
Joseph Karp (17:21)
It depends on what part of the stack we’re talking about. There are some shared gray areas. Unit tests, integration tests, and load tests at the application level are handled by the app dev team. Those are concerns specific to their services. I wouldn’t know what a “critical user journey” is for their app—I see it as a black box I need to deploy.
Of course, I end up knowing more about some of the apps because when things go wrong, I’m usually involved. But we do have both QAs and QEs. The QEs are responsible for building automated tests that run constantly. The QAs might do smoke testing for things we haven’t automated yet.
If it’s a single app concern and it’s at the app layer, it’s the dev team’s job to test it. For infra changes—say I have a change control and need to upgrade something—I need to know which apps are on that infrastructure. I reach out to the teams and ask, “What can I run to verify your app is fine afterward?” They’ll point me to the right test suites. I also have basic tests like DNS checks or endpoint reachability on my side.
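The basic checks Joe keeps on his side could be as simple as the following sketch: confirm each hostname still resolves and its endpoint still answers after an infrastructure change. The hostnames and URLs here are placeholders.

```python
# Illustrative sketch of post-change infrastructure checks: DNS resolution plus
# endpoint reachability. Hostnames and URLs are hypothetical.
import socket
import sys

import requests  # pip install requests

CHECKS = [
    ("service-a.example.internal", "https://service-a.example.internal/healthz"),
    ("service-b.example.internal", "https://service-b.example.internal/healthz"),
]

if __name__ == "__main__":
    ok = True
    for hostname, url in CHECKS:
        try:
            addresses = {info[4][0] for info in socket.getaddrinfo(hostname, 443)}
            print(f"{hostname} resolves to {sorted(addresses)}")
            requests.get(url, timeout=5).raise_for_status()
            print(f"{url} reachable")
        except Exception as exc:
            ok = False
            print(f"{hostname}: FAILED ({exc})")
    sys.exit(0 if ok else 1)
```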
Ole Lensmar (18:49)
Right. We’ve seen with other Testkube users that load testing often ends up falling to ops instead of devs. But it sounds like in your case, it still sits with the app team?
Joseph Karp (19:25)
Yes. That’s because we’ve built it that way. Each team has an IaC folder with controls to scale their application. So load testing is on them—they define how many instances their service runs, whether it scales based on CPU, and so on. We even gave them the ability to use KEDA for more advanced scaling.
They can configure it to scale off a Kafka queue or any other metric. My job is to make sure the cluster can handle the growth. I can do that because I have visibility into everything deployed there. If all services spike at once, I just need to make sure the cluster has the capacity to handle it.
But I can’t know what a good load test looks like for every app. What’s acceptable for one team might be completely wrong for another. Document uploads and processing are long-running tasks. But a calc engine? That needs to be fast. If I asked an infra person to load test that, they’d have no idea what “good” looks like.
So we push that responsibility to the teams and give them the tools to tune and validate their services properly.
Ole Lensmar (20:52)
Makes a lot of sense—give them the responsibility and the levers to tune performance to their needs. It sounds like you’ve come a long way in your journey toward building a mature cloud native testing approach.
Ole Lensmar (21:34)
Something that comes up a lot in testing is culture. Has it been easy to get everyone on board with how you’re doing things, or has it been more of a long process?
Joseph Karp (21:58)
It’s definitely been a long process. I’ve been at UWM for five years. When I joined, we didn’t even have a Kubernetes cluster. It took two years just to get our first one up. And every step forward required proving value—again and again.
One of the biggest turning points was building a real internal platform. We gave dev teams a minimal viable product to build microservices on and then kept adding capabilities over time. It wasn’t all upfront. We said, “Here’s how to tune things. Here’s how to automate tests.” We added Testkube. We kept building on top of it.
We treated the platform like a product. That’s something KubeCon really emphasizes. We got a solid skeleton in place, automated the deployment of services, and gave devs essentially a vending machine for microservices. When they want something new, I just need to add it.
Now we’re up to 400+ microservices in about three years. Devs are migrating older services over all the time. But none of this happened overnight. Like I said, it’s been a five-year journey, and we still have plenty of room to grow.
Ole Lensmar (24:02)
It’s such a fast-moving space too. How do you stay current? Is that something the business cares about, or is it more of an engineering-driven initiative?
Joseph Karp (24:21)
It’s a growing concern. We brought in tools like Snyk and Sysdig to scan for vulnerabilities and keep dependencies up to date. And Kubernetes itself moves quickly—cloud vendors upgrade on a six-month cycle.
We try to give engineers the space to keep up. For example, Wednesday mornings are blocked off for “lab time.” Engineers can research, explore new tools, and stay current. That helps a lot.
We also go to events like KubeCon. I tell devs, “Get out of the house.” Some folks have only worked here—they’ve never seen other infrastructure or talked to other engineers. Go to a JavaScript conference, anything. Just see how the rest of the world works.
We also do internal shadowing—one per month. Engineers shadow another team to learn how they work. It helps broaden perspectives and spark ideas.
We have a strong culture of self-improvement. It helps us keep up with the ecosystem. And I actually have time built into my day to stay current. That’s huge.
I used to be a front-end developer, and that world moves way faster than infrastructure. Tooling disappears overnight. So in comparison, this feels slower. But I’m still involved in frontend—we’re currently building a new micro frontend framework.
Ole Lensmar (26:28)
Yeah, frontend moves at breakneck speed. So what’s the backbone of your current delivery setup? Are you using GitOps, Argo, something else? What’s tying it all together?
Joseph Karp (26:41)
It’s a mix. We’ve decoupled CI from CD because we still have some legacy infrastructure—traditional VMs, older services. We use Jenkins for CI. I built a big shared library in Jenkins that loads the right pipeline for the type of service. That way, the teams responsible for each can maintain and test pipelines independently.
I can pull a service into a test environment, modify a pipeline, test it out, and only merge once it’s working. CI is separate. CD depends on the type of pipeline.
Most teams go through Harness. But infra teams—like those deploying HashiCorp Vault or other base services—go through Argo. Those deployments are more complex and need more control.
Infra teams also tend to deal with more complex Helm charts. Devs aren’t really modifying those. We just expose levers for scaling or resource limits. But for infra, they get full Helm control via Argo.
Ole Lensmar (28:51)
And how does CI trigger CD? Is it event-based, or artifact-based?
Joseph Karp (29:07)
We’re using Orkes Conductor, which is built on Netflix’s open-source Conductor project. When Jenkins is done, it triggers a workflow in Orkes, which then coordinates the next steps. Or in some cases, Jenkins just pushes the artifact to Artifactory. Then the dev can go to our portal and choose what version to deploy.
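For a sense of what that handoff can look like, here is a hedged sketch of a CI job starting a Conductor workflow over Conductor's standard start-workflow REST endpoint. The base URL, workflow name, input fields, and auth header are placeholders, and an Orkes-hosted instance normally needs an access token as well; this is not UWM's actual integration.

```python
# Illustrative sketch of CI handing off to Conductor: POST the build's
# coordinates to the start-workflow endpoint. Base URL, workflow name, input
# fields, and auth are placeholders.
import os

import requests  # pip install requests

CONDUCTOR_URL = "https://conductor.example.internal/api"  # hypothetical
WORKFLOW_NAME = "deploy-microservice"                     # hypothetical workflow


def trigger_deploy(service: str, version: str) -> str:
    resp = requests.post(
        f"{CONDUCTOR_URL}/workflow/{WORKFLOW_NAME}",
        json={"service": service, "version": version, "environment": "integration"},
        headers={"X-Authorization": os.environ.get("CONDUCTOR_TOKEN", "")},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.text  # Conductor returns the new workflow's ID


if __name__ == "__main__":
    workflow_id = trigger_deploy("checkout", "1.4.2")
    print(f"started workflow {workflow_id}")
```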
So it’s flexible. Infra pipelines are connected and flow automatically. App devs get more control and can roll back if needed. But eventually, we’ll move to full CD, where everything is automated and orchestrated through our tooling.
Honestly, I’m very close to replacing Jenkins entirely. I’ve almost got all the pieces I need.
Ole Lensmar (30:05)
Very close—nice. So when it comes to CI testing, do app teams decide what to test? Or do you have centralized frameworks or guidelines?
Joseph Karp (30:37)
We do have a few frameworks. In CI, devs run unit and integration tests. They also use Testcontainers. The goal is to keep build times down. Some integration tests take a while, especially when you’re spinning up services with Testcontainers.
We’re exploring ways to run those in parallel or as side jobs to avoid blocking pipelines. For now, devs just test their service or a partially spun-up setup with Testcontainers.
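For anyone unfamiliar with the pattern, here is a small Testcontainers-style integration test, shown with the Python port of Testcontainers and a throwaway Postgres container. The table and values are invented; UWM's teams use Testcontainers in their own language stacks, so treat this purely as a flavor of the approach.

```python
# Illustrative flavor of a Testcontainers-based integration test: spin up a real
# Postgres in Docker for the duration of the test, then throw it away.
import sqlalchemy  # pip install sqlalchemy psycopg2-binary
from testcontainers.postgres import PostgresContainer  # pip install testcontainers


def test_can_round_trip_a_row():
    with PostgresContainer("postgres:16") as postgres:
        engine = sqlalchemy.create_engine(postgres.get_connection_url())
        with engine.begin() as conn:
            conn.execute(sqlalchemy.text("CREATE TABLE loans (id int, amount numeric)"))
            conn.execute(sqlalchemy.text("INSERT INTO loans VALUES (1, 250000)"))
            amount = conn.execute(
                sqlalchemy.text("SELECT amount FROM loans WHERE id = 1")
            ).scalar_one()
        assert amount == 250000
```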
On the DevOps side, we can override pipeline code, point it to a different branch, and test changes independently. That’s harder to test with conventional CI tools, especially when bash scripts or CLIs are involved.
So we treat it as a prototype service. We run it like a real one, but it’s never exposed live. It just runs through a full simulated pipeline to verify everything behaves correctly.
Ole Lensmar (32:27)
Awesome. Joe, this has been fantastic. Thanks for sharing your insights. What you’ve built is super impressive—mature, intentional, and really well thought out. I hope we can keep collaborating and learning from you, and hopefully Testkube can keep contributing to your success.
Joseph Karp (32:57)
It was a pleasure.
Ole Lensmar (32:58)
Thank you. Bye-bye.