The Embedded Frontier

#019 – Modernizing Embedded Systems: Step #3 – Adopt DevOps

This podcast episode explores step three of modernizing embedded software development: adopting DevOps practices to solve the critical problem of late and over-budget project delivery in embedded systems. Host Jacob Beningo discusses the four core DevOps principles, presents a compelling case study of the Ariane 5 rocket failure that cost $500 million, and provides practical guidance for implementing CI/CD pipelines to improve software quality and delivery speed.

Key Takeaways:

• Only 35% of embedded development teams deliver projects on time, with most running 3-6 months late
• DevOps focuses on incremental value delivery, improved collaboration, automation, and continuous improvement
• The Ariane 5 rocket explosion ($500 million loss) could have been prevented with proper integration testing and CI/CD practices
• Start DevOps implementation with automated builds using containers to create unified development environments
• Enforce code quality and coding standards such as MISRA C/C++ automatically within CI/CD pipelines
• Implement regression testing to catch bugs early when they're easier and less costly to fix
• Use metrics analysis to automatically identify tight coupling and potential bug locations in code
• Artifact management ensures traceability and ability to deliver specific software versions to customers
• Deployment automation should at minimum enable automatic hardware testing, even if not direct customer deployment
• DevOps creates a value feedback loop between companies and customers through observability and telemetry

================================================================================
VIDEO TRANSCRIPT WITH TIMESTAMPS
================================================================================

[00:00:00.000 --> 00:00:06.280]
In the last several podcasts, we've been talking about modernizing embedded software.

[00:00:06.280 --> 00:00:09.200]
How do we get to the frontier of embedded systems?

[00:00:09.200 --> 00:00:11.040]
We do it through modernization, right?

[00:00:11.040 --> 00:00:14.359]
And adopting best practices and new techniques.

[00:00:14.359 --> 00:00:18.960]
Now today, what we are going to do is we are going to talk about step number three, which

[00:00:18.960 --> 00:00:22.080]
is adopting DevOps.

[00:00:22.080 --> 00:00:28.359]
DevOps is actually an important piece and step in modernizing your embedded systems development.

[00:00:28.359 --> 00:00:36.159]
The reason for DevOps, at least in my opinion, is that it is there to help us solve a very

[00:00:36.159 --> 00:00:38.079]
important problem.

[00:00:38.079 --> 00:00:47.480]
That problem really, I think, relates to the fact that a lot of us embedded systems developers,

[00:00:47.480 --> 00:00:52.120]
we deliver our products, our projects late.

[00:00:52.120 --> 00:00:55.280]
Not just late, but often over budget.

[00:00:55.280 --> 00:01:00.800]
As it turns out, only about 35% of embedded teams deliver on time.

[00:01:00.800 --> 00:01:04.280]
Now that's pretty abysmal.

[00:01:04.280 --> 00:01:10.400]
I remember one time when I was an undergraduate and I took quantum physics, and I got a 9%

[00:01:10.400 --> 00:01:13.640]
on my first quantum physics exam.

[00:01:13.640 --> 00:01:14.640]
Not terribly happy.

[00:01:14.640 --> 00:01:16.640]
That's a pretty low percentage.

[00:01:16.640 --> 00:01:22.099]
Now as it turned out, that ended up being a B. That was the second highest grade in the

[00:01:22.099 --> 00:01:27.219]
class, and the A person got 10% correct.

[00:01:27.219 --> 00:01:29.659]
Now, why am I telling you this story?

[00:01:29.659 --> 00:01:35.259]
I'm telling you the story because that is not the type of curve we want to be working

[00:01:35.259 --> 00:01:40.780]
on, or basing our work off of, when we are developing products.

[00:01:40.780 --> 00:01:45.599]
Just because only 35% of teams are making it, doesn't mean that we grade on a curve and

[00:01:45.599 --> 00:01:47.299]
say, well, that's okay because we're always late.

[00:01:47.299 --> 00:01:48.299]
We're always over budget.

[00:01:48.299 --> 00:01:49.299]
Who cares?

[00:01:49.299 --> 00:01:50.299]
It's just a reality.

[00:01:50.299 --> 00:01:52.340]
We're just graded based on the curve.

[00:01:52.340 --> 00:01:53.819]
That's not what we're doing here.

[00:01:53.819 --> 00:01:58.539]
As it turns out, more and more teams are actually delivering later and later.

[00:01:58.539 --> 00:02:03.219]
On average, most teams are three to six months late in delivering their embedded products.

[00:02:03.219 --> 00:02:05.620]
All right, so how do we fix this?

[00:02:05.620 --> 00:02:07.579]
Certainly, modernization helps, right?

[00:02:07.579 --> 00:02:12.500]
But specifically today, we're framing this from the perspective of DevOps.

[00:02:12.500 --> 00:02:16.019]
DevOps is actually all about improving efficiency.

[00:02:16.019 --> 00:02:19.219]
It's about helping us get there as efficiently as possible.

[00:02:19.219 --> 00:02:22.419]
It's about speeding up the way that we deliver software.

[00:02:22.419 --> 00:02:26.939]
It's about improving the quality of the software that we do actually deliver.

[00:02:26.939 --> 00:02:32.939]
So DevOps overall, this third step is about trying to get you to implement DevOps.

[00:02:32.939 --> 00:02:34.939]
Now, there's lots of ways that you can do this.

[00:02:34.939 --> 00:02:38.539]
Oftentimes, I'm still seeing that most teams are just trying to get to the point where

[00:02:38.539 --> 00:02:42.419]
they're building automatically.

[00:02:42.419 --> 00:02:43.419]
That's okay.

[00:02:43.419 --> 00:02:44.939]
That's kind of your first step.

[00:02:45.340 --> 00:02:49.819]
Get in the door and make sure you can build automatically when you commit your code.

[00:02:49.819 --> 00:02:53.859]
But there's lots of other things that we can do that we'll talk about in today's webinar.

[00:02:53.859 --> 00:02:54.859]
Okay.

[00:02:54.859 --> 00:03:00.219]
Now, when we think about DevOps overall, I think it's important for us to realize that

[00:03:00.219 --> 00:03:05.740]
there's actually four principles that are actually going to guide all of your DevOps processes.

[00:03:05.740 --> 00:03:12.659]
The first principle is that it's all about focusing on providing incremental value to the users

[00:03:12.659 --> 00:03:15.659]
or your customers in small and frequent iterations.

[00:03:15.659 --> 00:03:20.659]
Now, right after that first principle, you might be saying, Jacob, this is crazy because

[00:03:20.659 --> 00:03:22.740]
I'm an embedded software developer.

[00:03:22.740 --> 00:03:26.139]
You can't ship a single feature to the customer.

[00:03:26.139 --> 00:03:27.539]
You've got to have the drivers.

[00:03:27.539 --> 00:03:32.019]
You've got to have all of the middleware and all the pieces together as a whole to actually

[00:03:32.019 --> 00:03:34.099]
ship a whole product.

[00:03:34.099 --> 00:03:35.740]
And the answer is yes.

[00:03:35.740 --> 00:03:40.819]
But who is the user, your customer, what value are you providing?

[00:03:40.819 --> 00:03:44.780]
Oftentimes this can be incremental value that you're providing just to your company, even

[00:03:44.780 --> 00:03:47.419]
if you haven't shipped it to your customer yet.

[00:03:47.419 --> 00:03:51.579]
Other times you could think as a developer that my customer is actually my boss, or it's

[00:03:51.579 --> 00:03:52.579]
management.

[00:03:52.579 --> 00:03:53.819]
It's the company itself.

[00:03:53.819 --> 00:03:57.099]
And all you're trying to do is make sure that throughout your sprints, every two to four

[00:03:57.099 --> 00:04:02.780]
weeks that you are delivering new features and value at the end of each of those.

[00:04:02.780 --> 00:04:04.780]
Okay.

[00:04:04.780 --> 00:04:07.979]
The second principle is that we need to improve collaboration and communication between

[00:04:07.979 --> 00:04:10.699]
development and operational teams.

[00:04:10.699 --> 00:04:17.099]
So what you're going to find here is that it's not just about us embedded software folks

[00:04:17.099 --> 00:04:18.099]
delivering the product.

[00:04:18.099 --> 00:04:21.019]
It's about improving collaboration and communication.

[00:04:21.019 --> 00:04:27.659]
Now in some cases, this is counterintuitive to a lot of what comes out of like best practices

[00:04:27.659 --> 00:04:28.659]
for big tech.

[00:04:28.659 --> 00:04:36.819]
Jeff Bezos, for example, has his two-pizza team idea, where teams shouldn't be larger than

[00:04:36.819 --> 00:04:39.980]
what two pizzas can feed.

[00:04:39.980 --> 00:04:43.500]
And the more time we spend collaborating and communicating, the less time we actually

[00:04:43.500 --> 00:04:44.740]
spend delivering.

[00:04:44.740 --> 00:04:48.220]
You want your teams to also be independent and making decisions.

[00:04:48.220 --> 00:04:54.700]
But at the same time, DevOps is designed to help improve communication across teams,

[00:04:54.700 --> 00:04:59.220]
specifically between quality and the development teams.

[00:04:59.220 --> 00:05:04.300]
We're trying to move testing from something very manual and labor-intensive to something that's more

[00:05:04.300 --> 00:05:09.259]
automated and doesn't require as much effort.

[00:05:09.259 --> 00:05:12.459]
So the third principle is that we want to automate as much of the software development

[00:05:12.459 --> 00:05:15.219]
life cycle as possible.

[00:05:15.219 --> 00:05:18.060]
Manual testing seems great in the beginning.

[00:05:18.060 --> 00:05:19.939]
It's very easy to go test your features.

[00:05:19.939 --> 00:05:24.579]
But once you have 100 features on the device, making sure you didn't break previous features,

[00:05:24.579 --> 00:05:27.459]
man, can that be a nightmare.

[00:05:27.459 --> 00:05:31.620]
There's not enough time or enough hours in the day to be able to go back and test all those

[00:05:31.620 --> 00:05:32.620]
features.

[00:05:32.620 --> 00:05:34.539]
So what do we do?

[00:05:34.539 --> 00:05:37.980]
We just cross our fingers and hope that they still work even though we've added a bunch

[00:05:37.980 --> 00:05:40.020]
of stuff.

[00:05:40.020 --> 00:05:44.180]
And you might think that's not affecting other features, but oftentimes, that's just

[00:05:44.180 --> 00:05:45.340]
a hope.

[00:05:45.340 --> 00:05:49.580]
And so the idea here with automation is that we can automate our tests through unit testing,

[00:05:49.580 --> 00:05:54.060]
system integration, through simulation testing, through hardware-in-the-loop testing, all of

[00:05:54.060 --> 00:05:57.939]
this so that it's automated as much as possible so that we can ensure that the quality of the

[00:05:57.939 --> 00:06:05.819]
software continues to grow higher and higher as we add new features into our projects.

[00:06:05.819 --> 00:06:10.099]
And the fourth principle of DevOps, which for embedded folks is very controversial,

[00:06:10.099 --> 00:06:14.379]
I think, is that we want to continuously improve the software product through delivery.

[00:06:14.379 --> 00:06:15.379]
Okay?

[00:06:15.379 --> 00:06:20.779]
The delivery side of things, you don't necessarily need to provide firmware updates on a daily

[00:06:20.779 --> 00:06:26.980]
basis to your internet-connected microwave or your internet-connected stove, right?

[00:06:26.980 --> 00:06:31.779]
Connectivity can be great, but a lot of times, if my basic use cases are being covered,

[00:06:31.779 --> 00:06:35.379]
there's no need to continuously deliver new firmware.

[00:06:35.379 --> 00:06:36.379]
Okay?

[00:06:36.379 --> 00:06:37.379]
In fact, sometimes that can be detrimental.

[00:06:37.379 --> 00:06:42.019]
You can break things that work, or you can interrupt the user's behavior with the device, being

[00:06:42.019 --> 00:06:45.980]
able to use the device, which just ends up being kind of a big pain.

[00:06:45.980 --> 00:06:46.980]
Okay?

[00:06:46.980 --> 00:06:50.219]
So, continuous improvement, I think, is really important.

[00:06:50.219 --> 00:06:56.379]
Continuous delivery, that oftentimes is not as important for many firmware products.

[00:06:56.379 --> 00:07:00.339]
Not all, but the majority of the ones that I encounter in the field.

[00:07:00.339 --> 00:07:01.339]
Okay?

[00:07:01.739 --> 00:07:06.500]
When we think about DevOps, those principles put together, you can really think about DevOps

[00:07:06.500 --> 00:07:12.339]
as encompassing: bringing quality assurance and testing together, bringing operations together,

[00:07:12.339 --> 00:07:17.699]
and also bringing developers together early on continuously through the development cycle.

[00:07:17.699 --> 00:07:18.699]
These things used to be siloed.

[00:07:18.699 --> 00:07:21.459]
They used to be almost like waterfall, right?

[00:07:21.459 --> 00:07:26.019]
You would develop your code, it would go to the quality assurance folks, and then once

[00:07:26.019 --> 00:07:30.539]
it passed that gate, then it would go to operations to be deployed into the field, right?

[00:07:30.540 --> 00:07:34.980]
Well, now we're saying no, no, no, we can do many cycles with this, and we can go much

[00:07:34.980 --> 00:07:40.220]
faster and develop higher quality code, get better feedback throughout this whole process.

[00:07:40.220 --> 00:07:41.220]
Okay?

[00:07:41.220 --> 00:07:47.900]
Now, I often look at DevOps in general as just a way of repurposing what the Agile

[00:07:47.900 --> 00:07:49.860]
values actually were, right?

[00:07:49.860 --> 00:07:54.900]
Those four principles, if you look at them, they're very similar to saying that we want

[00:07:54.899 --> 00:08:00.899]
to focus on individuals and interactions over processes and tools.

[00:08:00.899 --> 00:08:03.899]
We want working software over comprehensive documentation.

[00:08:03.899 --> 00:08:08.179]
We want customer collaboration over contract negotiation.

[00:08:08.179 --> 00:08:12.859]
We want to respond to change over following a plan, okay?

[00:08:12.859 --> 00:08:17.500]
That's not saying processes and tools, comprehensive documentation, contract negotiation, following

[00:08:17.500 --> 00:08:20.739]
a plan aren't important parts of the development cycle.

[00:08:20.739 --> 00:08:23.019]
It just means that's not where our focus should be.

[00:08:23.019 --> 00:08:29.139]
And DevOps is really about us trying to tighten that cycle up and delivering faster firmware

[00:08:29.139 --> 00:08:30.859]
to our customers, okay?

[00:08:30.859 --> 00:08:34.460]
Or delivering faster, higher quality software.

[00:08:34.460 --> 00:08:40.939]
Now when we think about DevOps principles in action, there's a great diagram from, I

[00:08:40.939 --> 00:08:45.480]
think I found it on like Amazon's website when they were talking about what DevOps was

[00:08:45.480 --> 00:08:47.259]
and testing and that sort of thing.

[00:08:47.259 --> 00:08:51.059]
But I've modified it to really kind of show off what I think is important for us as

[00:08:51.059 --> 00:08:52.739]
embedded software developers.

[00:08:52.739 --> 00:08:59.339]
Ultimately, what we end up with is two actors in DevOps overall, okay?

[00:08:59.339 --> 00:09:03.219]
The company we work for and we have our customer, okay?

[00:09:03.219 --> 00:09:08.379]
The company itself wants to deliver features and value to our customer.

[00:09:08.379 --> 00:09:14.099]
Our customer has a problem, and our product is solving it and providing a solution to that

[00:09:14.099 --> 00:09:15.419]
problem.

[00:09:15.419 --> 00:09:19.979]
Now that solution ends up being in a form of something like a product with software features

[00:09:19.980 --> 00:09:24.060]
that allow them to do whatever it needs to be, whether it's controlling an engine, whether

[00:09:24.060 --> 00:09:29.820]
it's running a ventilator, whether it's driving a car, you know, allowing someone to listen

[00:09:29.820 --> 00:09:34.740]
to the radio in their car, whatever the features happen to be, okay?

[00:09:34.740 --> 00:09:40.539]
When we think about how to create a DevOps pipeline, the steps and the sequences that

[00:09:40.539 --> 00:09:47.139]
are necessary to develop the software, test it and actually deliver it to the end customer,

[00:09:47.139 --> 00:09:52.580]
it starts to look like a delivery pipeline, okay?

[00:09:52.580 --> 00:09:56.419]
And so it ends up being essentially if you think in your mind of a message queue almost

[00:09:56.419 --> 00:10:01.259]
between like a company and a customer moving from left to right, we end up with this delivery

[00:10:01.259 --> 00:10:06.659]
pipeline that needs to be able to automate our builds, it needs to be able to test our

[00:10:06.659 --> 00:10:11.700]
software and it needs to be able to deploy it automatically, okay?
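
To make the build, test, deploy sequence concrete, here is a minimal sketch in Python of a pipeline that runs its stages in order and stops at the first failure. The `Stage` type, the `run_pipeline` function, and the stage bodies are hypothetical placeholders for illustration; a real setup would call your build system, test runner, and deployment tooling instead of the lambdas shown here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    """One step in the delivery pipeline (hypothetical type for illustration)."""
    name: str
    run: Callable[[], bool]  # returns True when the stage succeeds


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that passed, so a failed test stage
    means the deploy stage never runs.
    """
    passed = []
    for stage in stages:
        if not stage.run():
            break
        passed.append(stage.name)
    return passed


if __name__ == "__main__":
    # Placeholder stage bodies; swap in real build/test/deploy calls.
    pipeline = [
        Stage("build", lambda: True),
        Stage("test", lambda: False),   # simulate a failing test run
        Stage("deploy", lambda: True),
    ]
    print("stages passed:", run_pipeline(pipeline))
```

The point of the ordering is that nothing gets deployed unless the build and the tests succeeded first.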

[00:10:11.700 --> 00:10:16.740]
When I first started to develop embedded systems 20 years ago, delivery always involved us

[00:10:16.740 --> 00:10:20.580]
putting some firmware image up on a website somewhere where the customers could go download

[00:10:20.580 --> 00:10:25.539]
it, get special tools, go through this process that hopefully updated the firmware and then

[00:10:25.539 --> 00:10:28.340]
they'd have the latest and greatest firmware on their devices.

[00:10:28.340 --> 00:10:33.419]
Today with the internet and being constantly connected, that process is completely streamlined.

[00:10:33.419 --> 00:10:37.980]
All we do is push the latest firmware image up to the cloud and then allow our

[00:10:37.980 --> 00:10:43.940]
OTA processes to take effect and distribute that firmware update to the various groups and

[00:10:43.940 --> 00:10:46.060]
devices that we have out in the field, okay?

[00:10:46.059 --> 00:10:49.939]
So that's much more automated than it used to be.

[00:10:49.939 --> 00:10:56.179]
The idea is that hopefully at that point, it's this automated delivery pipeline, the customer

[00:10:56.179 --> 00:11:01.179]
gets the latest features and products and they're able to use and enjoy and get greater

[00:11:01.179 --> 00:11:03.979]
and greater value out of the product.

[00:11:03.979 --> 00:11:10.899]
Now thinking about this strictly from a company to customer standpoint, there's a problem.

[00:11:10.899 --> 00:11:17.579]
The problem is that this is a one directional process, right?

[00:11:17.579 --> 00:11:21.860]
I'm delivering to the customer, but if I want to continue to provide value to the customer,

[00:11:21.860 --> 00:11:24.699]
I want to improve their experience with the product.

[00:11:24.699 --> 00:11:27.419]
I need this to be bidirectional.

[00:11:27.419 --> 00:11:30.699]
I need a feedback pipeline.

[00:11:30.699 --> 00:11:36.100]
This is where you're seeing a lot with observability today in embedded systems.

[00:11:36.100 --> 00:11:40.579]
You see things like Percepio's DevAlert, for example, that monitors the behavior of

[00:11:40.580 --> 00:11:48.020]
a system in the field and allows the company to get data back, telemetry back, from

[00:11:48.020 --> 00:11:52.860]
the devices in the field to understand not just how the customer is using the product,

[00:11:52.860 --> 00:11:57.020]
but to understand what features they're using, to understand the performance of the device,

[00:11:57.020 --> 00:12:01.500]
and to try to also figure out what might be missing that could make the life of the user

[00:12:01.500 --> 00:12:03.660]
better, okay?

[00:12:03.660 --> 00:12:07.940]
And so we look at that feedback pipeline as a way to monitor how the system is behaving.

[00:12:07.940 --> 00:12:12.180]
It generates reports, which then feed back to the company and provide new features, new

[00:12:12.180 --> 00:12:18.700]
ideas, new development processes that maybe we use to then build, test and deploy even

[00:12:18.700 --> 00:12:20.020]
better firmware, right?

[00:12:20.020 --> 00:12:22.500]
And better product features to the customer.
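
As a rough illustration of the reporting side of that feedback pipeline, the sketch below summarizes telemetry records into feature usage counts and average latency per feature. The record schema (`feature`, `latency_ms`) and the `summarize_telemetry` function are assumptions made up for this example; they are not the format of any particular observability product.

```python
from collections import Counter
from statistics import mean


def summarize_telemetry(records):
    """Summarize device telemetry into a small usage report.

    `records` is a list of dicts with hypothetical keys:
      'feature'    - name of the feature the user exercised
      'latency_ms' - how long the device took to respond
    Returns (usage_counts, average_latency_by_feature).
    """
    usage = Counter(r["feature"] for r in records)
    latency = {
        feature: mean(r["latency_ms"] for r in records if r["feature"] == feature)
        for feature in usage
    }
    return usage, latency


if __name__ == "__main__":
    # Made-up sample data standing in for telemetry from devices in the field.
    sample = [
        {"feature": "preheat", "latency_ms": 120.0},
        {"feature": "preheat", "latency_ms": 80.0},
        {"feature": "timer", "latency_ms": 15.0},
    ]
    usage, latency = summarize_telemetry(sample)
    print("most used features:", usage.most_common())
    print("average latency by feature:", latency)
```

A report like this is one way the company side can see which features customers actually use before deciding what to build next.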

[00:12:22.500 --> 00:12:28.100]
Now this whole thing, like I mentioned, it closes in and provides observability to our

[00:12:28.100 --> 00:12:30.900]
company to understand how the customer is using it.

[00:12:30.900 --> 00:12:35.260]
This creates what I call a value feedback loop, okay?

[00:12:35.259 --> 00:12:38.580]
We have a feedback loop where the company provides value to the customer.

[00:12:38.580 --> 00:12:44.500]
The customer provides feedback through observability to the company, which then allows us to create

[00:12:44.500 --> 00:12:49.220]
even better value for the customer, which we then deliver, which then this loop continues

[00:12:49.220 --> 00:12:54.899]
over and over in a feedback loop that allows us to provide a lot of value to the customer,

[00:12:54.899 --> 00:13:00.299]
allows us to continuously deliver, continuously deploy, continuously integrate our software,

[00:13:00.299 --> 00:13:05.500]
making it so that we can get a minimum viable product into the field and then continue

[00:13:05.500 --> 00:13:10.779]
to update it and upgrade it and improve the experience for the customer, okay?

[00:13:10.779 --> 00:13:15.179]
We don't have to go and ship 100 billion features to the customer all at once.

[00:13:15.179 --> 00:13:19.459]
We can start with a minimum viable product, verify that the customer enjoys the product,

[00:13:19.459 --> 00:13:22.459]
uses it, wants it and then build on it from there.

[00:13:22.459 --> 00:13:27.299]
I've seen many companies over my career that try to deliver everything and the kitchen

[00:13:27.299 --> 00:13:29.299]
sink on their first go.

[00:13:29.299 --> 00:13:32.979]
And then they get in the field and they discover the customer doesn't even want any of this.

[00:13:32.979 --> 00:13:36.579]
We completely went in the wrong direction and just wasted two years of development, cost,

[00:13:36.579 --> 00:13:39.740]
money, et cetera, and now we're in trouble, right?

[00:13:39.740 --> 00:13:41.219]
You don't want to get in that situation.

[00:13:41.219 --> 00:13:46.699]
And that's where DevOps, this modernization, you know, it can help you from that perspective

[00:13:46.699 --> 00:13:49.539]
so that you understand what your customer wants and needs.

[00:13:49.539 --> 00:13:54.979]
But on top of that, it also allows you to improve the quality of your software, deliver

[00:13:54.980 --> 00:14:01.300]
faster and keep the quality very high and the bugs to a minimum, all right?

[00:14:01.300 --> 00:14:04.220]
Okay, so this is great, you know, right?

[00:14:04.220 --> 00:14:06.740]
Everyone's been talking about DevOps for a while.

[00:14:06.740 --> 00:14:10.420]
How does this actually apply to embedded developers?

[00:14:10.420 --> 00:14:13.340]
You might be saying, well, yeah, I can build on my own, whatever.

[00:14:13.340 --> 00:14:16.899]
This whole idea of unit testing, you know, one of the big pieces of this delivery pipeline

[00:14:16.899 --> 00:14:20.500]
is having unit tests, hardware-in-the-loop testing that's all automated.

[00:14:20.500 --> 00:14:21.899]
And you might say, well, that's too expensive.

[00:14:21.899 --> 00:14:23.220]
It's not worth it.

[00:14:23.940 --> 00:14:27.940]
You know, has anyone ever really had a problem where the continuous integration pipeline

[00:14:27.940 --> 00:14:32.500]
caught the problem or could have actually prevented the issues in the first place, right?

[00:14:32.500 --> 00:14:36.620]
Well, I'm going to share one of my favorite case studies with you.

[00:14:36.620 --> 00:14:40.019]
I spent, you know, I get an opportunity to work on a lot of different software,

[00:14:40.019 --> 00:14:42.540]
a lot of different embedded products and a lot of different industries.

[00:14:42.540 --> 00:14:48.300]
Some of my favorite industries to work in are medical devices and space systems, along with defense.

[00:14:48.300 --> 00:14:51.420]
Okay, I really like to work on things, both, I like to work on a lot of different things.

[00:14:51.419 --> 00:14:52.419]
I'll leave that at that.

[00:14:52.419 --> 00:15:03.500]
But several years ago, they were developing the Ariane 5 rocket.

[00:15:03.500 --> 00:15:09.579]
Okay, and this particular rocket costs about $500 million to launch.

[00:15:09.579 --> 00:15:18.779]
All right, and on its very first launch, about 37 seconds into the launch, the rocket exploded.

[00:15:18.779 --> 00:15:20.099]
Okay.

[00:15:20.100 --> 00:15:23.340]
There goes $500 million, right, up in fireworks.

[00:15:23.340 --> 00:15:28.100]
Now, when we look at that, man, that seems like a really big problem, but, you know,

[00:15:28.100 --> 00:15:33.259]
space is hard, rockets explode, you know, whatever, things happen, right?

[00:15:33.259 --> 00:15:38.860]
Except for the fact, when you start to look at the root cause here, the root cause of the

[00:15:38.860 --> 00:15:46.220]
loss of the vehicle goes back to the fact that they were using a proven navigation system

[00:15:46.220 --> 00:15:48.379]
from the Ariane 4.

[00:15:48.379 --> 00:15:52.059]
They used code that had already been proven to work on previous flights.

[00:15:52.059 --> 00:15:56.659]
And they just said, well, let's take this, we'll put it on the new processor and within

[00:15:56.659 --> 00:16:00.939]
the new system, and we don't need to test it because we already know that it works, right?

[00:16:00.939 --> 00:16:02.740]
Well, wrong.

[00:16:02.740 --> 00:16:09.659]
What ended up happening was that the Ariane 4 code on the Ariane 5, even though it was proven

[00:16:09.659 --> 00:16:13.100]
code, the flight profiles were different.

[00:16:13.100 --> 00:16:15.980]
What ended up happening was there was an integer overflow.

[00:16:15.980 --> 00:16:22.019]
So there was a 64-bit floating point value that was converted to a 16-bit signed integer.

[00:16:22.019 --> 00:16:23.019]
Okay?

[00:16:23.019 --> 00:16:27.220]
So the value was larger than expected due to a difference in the flight trajectories between

[00:16:27.220 --> 00:16:30.940]
the way that they flew the Ariane 4 and the Ariane 5.
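
To show why narrowing a 64-bit floating point value into a 16-bit signed integer is risky, here is a small Python sketch. It is not the flight code; the function names are hypothetical, and the wraparound in `to_int16_unchecked` is only one possible outcome of an unguarded conversion (the exact behavior depends on the language and platform). The checked version rejects anything outside the 16-bit range instead of silently producing a wrong number.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unchecked(value: float) -> int:
    """Truncate toward zero and wrap into 16 bits.

    Illustrative only: this models an unguarded narrowing conversion
    that silently produces a wrong value when the input is too large.
    """
    raw = int(value) & 0xFFFF
    return raw - 0x10000 if raw & 0x8000 else raw


def to_int16_checked(value: float) -> int:
    """Convert only when the value fits; otherwise raise OverflowError."""
    if not (INT16_MIN <= value <= INT16_MAX):
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return int(value)


if __name__ == "__main__":
    in_range = 1234.9
    too_large = 40000.0  # made-up value, larger than INT16_MAX

    print(to_int16_unchecked(in_range))   # fits, so the result is sensible
    print(to_int16_unchecked(too_large))  # wraps to a negative number
    try:
        to_int16_checked(too_large)
    except OverflowError as exc:
        print("rejected:", exc)
```

Either outcome, a silently wrong value or an unhandled error, is a problem in flight; the point is that the out-of-range case has to be found and handled on purpose before launch.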

[00:16:30.940 --> 00:16:35.420]
Now this is, of course, a problem, right?

[00:16:35.420 --> 00:16:36.980]
But you might say, well, that's not a big deal.

[00:16:36.980 --> 00:16:39.019]
These spacecraft, these rockets,

[00:16:39.019 --> 00:16:42.060]
they typically have backup computers on them, right?

[00:16:42.060 --> 00:16:44.659]
You have two different computers for safety-critical systems, right?

[00:16:44.819 --> 00:16:49.339]
So you have two different computers that are supposed to be checking each other's work

[00:16:49.339 --> 00:16:53.059]
and make sure that they arrive at the correct answers.

[00:16:53.059 --> 00:17:01.339]
Well, as it turned out, the backup computer had the exact same identical code as the primary

[00:17:01.339 --> 00:17:02.339]
computer.

[00:17:02.339 --> 00:17:03.339]
Okay?

[00:17:03.339 --> 00:17:09.659]
And so at that point, because it was identical, it failed in the exact same way.

[00:17:09.659 --> 00:17:11.460]
And so I mean, that's certainly another lesson, right?

[00:17:11.460 --> 00:17:15.740]
If you have a safety critical system, have two different teams write the code, right?

[00:17:15.740 --> 00:17:19.660]
But that's not really the lesson we're trying to discuss today.

[00:17:19.660 --> 00:17:23.700]
But kind of bringing it back, what was the problem here?

[00:17:23.700 --> 00:17:28.180]
We had working code, we ended up with overflows.

[00:17:28.180 --> 00:17:30.420]
How could we have prevented this?

[00:17:30.420 --> 00:17:36.019]
Well, there was no integration testing of reused components in the new context for the Ariane

[00:17:36.019 --> 00:17:37.660]
5 rocket.

[00:17:37.660 --> 00:17:42.500]
And so had they gone through and done additional system testing, had they leveraged at least

[00:17:42.500 --> 00:17:46.740]
continuous integration, maybe not continuous deployment, but at least continuous integration,

[00:17:46.740 --> 00:17:52.259]
they should have been able to verify that, oh, there's a problem here.

[00:17:52.259 --> 00:17:54.540]
But they didn't do that.

[00:17:54.540 --> 00:17:57.460]
They didn't have those processes in place.

[00:17:57.460 --> 00:18:02.460]
It certainly occurred prior to the existence of CI/CD in its modern

[00:18:02.460 --> 00:18:03.460]
form.

[00:18:04.180 --> 00:18:11.579]
But ultimately, modern CI/CD with tests would have tested components, would have done integration

[00:18:11.579 --> 00:18:18.779]
testing, all automatically in the actual test environment that the spacecraft would have

[00:18:18.779 --> 00:18:19.980]
experienced.

[00:18:19.980 --> 00:18:23.340]
And those test cases would most likely have exposed the fact

[00:18:23.340 --> 00:18:28.579]
that the 64-bit floating point value was being converted to a 16-bit signed integer, and

[00:18:28.579 --> 00:18:32.740]
that it was overflowing and causing problems with the spacecraft.
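
Here is a hedged sketch of what such an automated check could look like: a reused conversion routine is exercised across the input range of the old context and then the new one, and every input that overflows is reported. The profile ranges below are made-up numbers for illustration, not real flight data, and `find_overflows` and `to_int16_checked` are hypothetical names.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(value: float) -> int:
    """Reused component under test: narrow a float to a 16-bit signed integer."""
    if not (INT16_MIN <= value <= INT16_MAX):
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return int(value)


def find_overflows(convert, samples):
    """Run `convert` over every sample and collect the inputs that overflow."""
    failures = []
    for sample in samples:
        try:
            convert(sample)
        except OverflowError:
            failures.append(sample)
    return failures


# Hypothetical input envelopes: the legacy context stays within 16 bits,
# the new context produces larger values.
LEGACY_PROFILE = [float(v) for v in range(0, 30001, 1000)]
NEW_PROFILE = [float(v) for v in range(0, 50001, 1000)]

if __name__ == "__main__":
    print("legacy overflows:", find_overflows(to_int16_checked, LEGACY_PROFILE))
    print("new overflows:", find_overflows(to_int16_checked, NEW_PROFILE))
```

Run on every commit, a check like this turns "the code is proven" into "the code is proven for this set of inputs", which is exactly the distinction that gets lost when a component is reused in a new context.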

[00:18:33.740 --> 00:18:35.700]
All of that could have been automated.

[00:18:35.700 --> 00:18:38.620]
One of the reasons why they didn't go through this level of system testing was because

[00:18:38.620 --> 00:18:40.819]
they were doing everything manually.

[00:18:40.819 --> 00:18:44.539]
And they said, oh, well, we don't have the money, the budget, and the time to go through

[00:18:44.539 --> 00:18:46.460]
and do this testing, right?

[00:18:46.460 --> 00:18:48.140]
Famous last words.

[00:18:48.140 --> 00:18:50.019]
We don't have the time and budget.

[00:18:50.019 --> 00:18:55.779]
Well, they at least had enough time and budget to blow up a $500 million rocket and have

[00:18:55.779 --> 00:18:57.259]
to redo it all over again.

[00:18:57.259 --> 00:19:01.339]
Not to mention whatever the payload may have been on the rocket in the first place.

[00:19:02.339 --> 00:19:08.299]
Automated testing would likely have caught the edge cases across various scenarios.

[00:19:08.299 --> 00:19:13.139]
And so the lesson really to learn here is that integration testing can't be skipped,

[00:19:13.139 --> 00:19:15.539]
even for proven code.

[00:19:15.539 --> 00:19:20.259]
And so as embedded software developers, I really think that makes us step back and say,

[00:19:20.259 --> 00:19:23.419]
OK, I can't do this manual process anymore.

[00:19:23.419 --> 00:19:28.379]
And certainly you might not have $500 million on the device that you're working on,

[00:19:28.380 --> 00:19:31.780]
but you certainly have users who want to make sure that the code works the way it's

[00:19:31.780 --> 00:19:36.260]
supposed to, that it doesn't cause them an inconvenience, and that it doesn't damage your brand because they

[00:19:36.260 --> 00:19:40.780]
talk poorly about how bad the system is that you've developed or how late it was delivered.

[00:19:40.780 --> 00:19:42.580]
You don't want those types of things.

[00:19:42.580 --> 00:19:43.580]
OK.

[00:19:43.580 --> 00:19:49.380]
And so this modernization of DevOps, you don't necessarily have to do it all at once.

[00:19:49.380 --> 00:19:56.140]
What I'm suggesting is that you go through a phased process when you implement DevOps.

[00:19:56.140 --> 00:19:57.140]
OK.

[00:19:57.140 --> 00:20:03.900]
Maybe the first thing to do is to actually identify what are your main goals for DevOps, right?

[00:20:03.900 --> 00:20:06.740]
Certainly you want to be able to build your code.

[00:20:06.740 --> 00:20:10.220]
You might want to be able to perform unit tests automatically.

[00:20:10.220 --> 00:20:12.580]
You might want to be able to simulate a test.

[00:20:12.580 --> 00:20:13.580]
Maybe not.

[00:20:13.580 --> 00:20:16.860]
You might want to be able to do hardware-in-the-loop testing automatically.

[00:20:16.860 --> 00:20:20.100]
And then you may or may not want to be able to push automatically to your customers'

[00:20:20.100 --> 00:20:21.100]
devices.

[00:20:21.100 --> 00:20:22.100]
OK.

[00:20:22.100 --> 00:20:25.380]
Every team's goals are going to be slightly different from the others.

[00:20:25.380 --> 00:20:30.740]
So my recommendation for you is that you at least start by saying, here's what my goals

[00:20:30.740 --> 00:20:31.740]
are.

[00:20:31.740 --> 00:20:36.340]
Because what you identify as the goals for your CI/CD and for your DevOps processes

[00:20:36.340 --> 00:20:41.260]
is going to really dictate what you do to design your CI/CD solution.
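
One way to make that goal-setting concrete is to write the goals down and derive the pipeline stages from them. The sketch below is only an illustration: the stage names and the `plan_pipeline` helper are hypothetical, chosen to mirror the goals listed above (build, unit test, simulation, hardware-in-the-loop, deployment).

```python
# Hypothetical stage names, ordered from cheapest to most involved.
STAGE_ORDER = ["build", "unit_test", "simulation", "hardware_in_loop", "deploy"]


def plan_pipeline(goals):
    """Return the stages to implement, in pipeline order, for the chosen goals."""
    unknown = set(goals) - set(STAGE_ORDER)
    if unknown:
        raise ValueError(f"unknown goals: {sorted(unknown)}")
    # The build stage is always included because every later stage depends on it.
    selected = set(goals) | {"build"}
    return [stage for stage in STAGE_ORDER if stage in selected]


if __name__ == "__main__":
    # A team that wants unit tests and hardware-in-the-loop, but no simulation.
    print(plan_pipeline({"unit_test", "hardware_in_loop"}))
```

A team with different goals passes a different set and gets a different, but still ordered, list of stages to design.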

[00:20:41.260 --> 00:20:46.540]
Now the one thing I would say you should probably look at is at least automating your builds.

[00:20:46.540 --> 00:20:47.540]
OK.

[00:20:47.540 --> 00:20:48.540]
What does that buy you?

[00:20:48.540 --> 00:20:51.980]
Well, automated builds make sure that you can consistently build your code.

[00:20:51.980 --> 00:20:52.980]
All right.

[00:20:53.620 --> 00:20:57.460]
It helps to reduce manual processes.

[00:20:57.460 --> 00:20:58.539]
Right.

[00:20:58.539 --> 00:21:03.259]
And you can actually have automated builds.

[00:21:03.259 --> 00:21:09.059]
Oftentimes with our DevOps pipelines, we will use containers to create an environment that

[00:21:09.059 --> 00:21:10.059]
we can reuse.

[00:21:10.059 --> 00:21:14.460]
Now, those containers, whether it's Docker or Podman or whatever container technology

[00:21:14.460 --> 00:21:19.740]
you want to use, that often creates a unified development environment.

[00:21:19.740 --> 00:21:23.099]
So when you get a new developer on board, they don't have to install a whole bunch of tools

[00:21:23.099 --> 00:21:27.500]
and libraries and all that fun stuff for three days so that they can get up and running.

[00:21:27.500 --> 00:21:29.259]
You hand them the container.

[00:21:29.259 --> 00:21:33.980]
And now they have a development environment that's ready to rock and roll.

[00:21:33.980 --> 00:21:36.700]
So a container right there can save time.

[00:21:36.700 --> 00:21:43.059]
It can eliminate the issue of "it works fine on my machine."

[00:21:43.059 --> 00:21:47.660]
I can't tell you over the course of my career, how many times early on we had issues where

[00:21:47.660 --> 00:21:51.259]
it's working fine for me, but you give it to your colleague and they're getting

[00:21:51.259 --> 00:21:52.259]
hard faults.

[00:21:52.259 --> 00:21:53.259]
Well, why is that?

[00:21:53.259 --> 00:22:01.500]
Oh, they have a compiler version that's 0.0.1 different than the one that we're currently

[00:22:01.500 --> 00:22:02.980]
using, right?

[00:22:02.980 --> 00:22:04.980]
Or some other goofy thing.

[00:22:04.980 --> 00:22:09.019]
Oh, they installed some other library or their system's configured in a way that it affects

[00:22:09.019 --> 00:22:11.940]
the build system and things are just breaking.

[00:22:11.940 --> 00:22:13.860]
It gets rid of all of that.

[00:22:13.859 --> 00:22:18.819]
So using containers makes it so that developers are all working in the exact same environment.

[00:22:18.819 --> 00:22:20.339]
Or at least the same build environment.

[00:22:20.339 --> 00:22:24.539]
You can all be using VS Code or whatever it is that you want to use for your interface

[00:22:24.539 --> 00:22:25.539]
to develop.

[00:22:25.539 --> 00:22:30.500]
Your workflow can be totally different, but the key is your build environment is the same.

[00:22:30.500 --> 00:22:32.539]
Now that doesn't just help developers.

[00:22:32.539 --> 00:22:38.699]
That helps us move to CI/CD because in the build process, we can use those containers to

[00:22:38.699 --> 00:22:42.179]
verify that we can build the software successfully.

[00:22:42.180 --> 00:22:46.100]
So at least make sure that automated builds are in place.

[00:22:46.100 --> 00:22:51.100]
Super low hanging fruit, especially with today's technology.
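
The compiler-version mismatch described above is something a pipeline can check on every build. Here is a minimal sketch, assuming the build captures the compiler's version banner as text; the pinned version and the example banner string are made up for illustration.

```python
import re

# Assumed pin for illustration; a real project would pin its own toolchain version.
PINNED_VERSION = (13, 2, 1)


def parse_version(banner):
    """Extract the first x.y.z version number from a compiler banner string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("no version number found in banner")
    return tuple(int(part) for part in match.groups())


def toolchain_matches(banner, pinned=PINNED_VERSION):
    """Return True only if the reported version equals the pinned one exactly."""
    return parse_version(banner) == pinned


if __name__ == "__main__":
    # Example banner text; in a pipeline this would be captured from the
    # compiler's version output inside the build container.
    banner = "example-gcc (Example Toolchain) 13.2.1 20231009"
    print(toolchain_matches(banner))
```

Inside a container the check should always pass, which is the point: a failure means someone is building outside the shared environment.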

[00:22:51.100 --> 00:22:55.740]
From there, figure out how frequently you want to integrate your system.

[00:22:55.740 --> 00:23:00.220]
The more often you integrate, the earlier your error detection is going to be.

[00:23:00.220 --> 00:23:04.700]
The sooner you catch a bug, the easier it is and the less costly it is, and even less

[00:23:04.700 --> 00:23:07.940]
time-consuming to fix the bug.

[00:23:07.940 --> 00:23:12.460]
So figure out what that Git strategy looks like.

[00:23:12.460 --> 00:23:19.259]
What is the rate at which you're going to integrate your code and test it?

[00:23:19.259 --> 00:23:24.820]
Other things I like to do, I like to enforce code quality or at least code standards within

[00:23:24.820 --> 00:23:25.980]
my pipelines.

[00:23:25.980 --> 00:23:30.059]
So oftentimes I will make sure I can build the code.

[00:23:30.059 --> 00:23:36.580]
I will analyze it to ensure that it is adhering to the coding style that everyone is supposed

[00:23:36.580 --> 00:23:37.580]
to be using.

[00:23:37.579 --> 00:23:44.500]
And that it is also meeting coding standards like maybe MISRA C or MISRA C++, maybe

[00:23:44.500 --> 00:23:52.500]
CERT, or if we're using C++, some of the C++ style guides like Google's or some of

[00:23:52.500 --> 00:23:58.139]
the ones that are put out by the C++ folks.

[00:23:58.139 --> 00:24:03.059]
So you can enforce your coding standards and styles within your pipelines.
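
As a rough illustration of what a style gate in a pipeline does, the sketch below checks source text against three toy rules and returns a non-zero exit code on any violation. The rules and the 100-character limit are assumptions for the example; they are not MISRA or CERT rules, which need a dedicated static analysis tool.

```python
MAX_LINE_LENGTH = 100  # assumed limit for illustration


def check_style(source):
    """Return a list of (line_number, message) violations for one source file."""
    violations = []
    for number, line in enumerate(source.splitlines(), start=1):
        if "\t" in line:
            violations.append((number, "tab character"))
        if len(line) > MAX_LINE_LENGTH:
            violations.append((number, "line too long"))
        if line != line.rstrip():
            violations.append((number, "trailing whitespace"))
    return violations


def gate(sources):
    """Return a process exit code: 0 if every file is clean, 1 otherwise."""
    failed = False
    for name, text in sources.items():
        for number, message in check_style(text):
            print(f"{name}:{number}: {message}")
            failed = True
    return 1 if failed else 0
```

A pipeline step would call `gate` with the changed files and fail the job when it returns 1, so style problems never reach a human reviewer.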

[00:24:03.059 --> 00:24:06.299]
A lot of pipelines today are starting to adopt AI.

[00:24:06.299 --> 00:24:10.579]
So that means that when things are found wrong, you could automatically use

[00:24:10.579 --> 00:24:13.740]
AI to help fix them.

[00:24:13.740 --> 00:24:18.299]
You could have it fix styles, you could have it add documentation, all of those types

[00:24:18.299 --> 00:24:19.299]
of things.

[00:24:19.299 --> 00:24:24.339]
But the ultimate idea here is that you should be improving the code quality of your system.

[00:24:24.339 --> 00:24:25.339]
All right.

[00:24:25.339 --> 00:24:27.539]
And we can perform those checks automatically.

[00:24:27.539 --> 00:24:32.259]
It used to be we'd sit down in a room and spend hours doing code reviews, right?

[00:24:32.740 --> 00:24:39.779]
Well, you can leverage your pipeline to perform automatic analyses, to do metrics analysis.

[00:24:39.779 --> 00:24:42.099]
I can't tell you how many times I've gone into a team.

[00:24:42.099 --> 00:24:45.579]
And one of the first things I do is I perform a metrics analysis of their software

[00:24:45.579 --> 00:24:46.740]
codebase.

[00:24:46.740 --> 00:24:52.180]
And the metrics that I get out of that tell me where the tight coupling is in their software,

[00:24:52.180 --> 00:24:55.339]
where the bugs are most likely to be, and so on and so forth.

[00:24:55.339 --> 00:24:57.500]
I don't even have to look at their code.

[00:24:57.500 --> 00:24:59.940]
I can just pull some metrics.

[00:24:59.940 --> 00:25:03.580]
And I analyze the metrics and then that points me to the areas of their code that I should

[00:25:03.580 --> 00:25:05.940]
actually take a look at.

[00:25:05.940 --> 00:25:10.420]
You can have this automatically done for you using a CI/CD pipeline.

[00:25:10.420 --> 00:25:15.100]
And it will improve the overall quality of your software.
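
To give a feel for how metrics can point at tight coupling without reading the code, here is a minimal sketch that counts local `#include` dependencies. It assumes coupling is approximated by fan-out (how many local headers a file includes) and fan-in (how many files include a header); real metrics tools measure far more than this, and the `coupling_metrics` helper is hypothetical.

```python
import re
from collections import defaultdict

# Matches local includes such as: #include "uart.h" (system <...> includes are ignored).
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def coupling_metrics(sources):
    """Map each file name to its fan-out and fan-in over local includes."""
    fan_out = {name: set(INCLUDE_RE.findall(text)) for name, text in sources.items()}
    fan_in = defaultdict(set)
    for name, headers in fan_out.items():
        for header in headers:
            fan_in[header].add(name)
    names = set(sources) | set(fan_in)
    return {
        name: {
            "fan_out": len(fan_out.get(name, ())),
            "fan_in": len(fan_in.get(name, ())),
        }
        for name in sorted(names)
    }
```

Files with unusually high fan-in or fan-out are the ones worth opening first.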

[00:25:15.100 --> 00:25:18.580]
Another thing, regression testing.

[00:25:18.580 --> 00:25:23.180]
I can't tell you how many times again where I've added a new feature to the product, only

[00:25:23.180 --> 00:25:27.820]
to discover I broke a different existing feature.

[00:25:27.819 --> 00:25:32.419]
It happened more in the past than it has recently, but it still happens.

[00:25:32.419 --> 00:25:36.819]
But if you have regression tests that run automatically in your pipeline, it'll catch

[00:25:36.819 --> 00:25:37.819]
that.

[00:25:37.819 --> 00:25:38.819]
All right.

[00:25:38.819 --> 00:25:42.539]
And then you can catch it right away, say, oh man, what happened?

[00:25:42.539 --> 00:25:44.099]
Oh yeah, this is what's going on here.

[00:25:44.099 --> 00:25:46.899]
Let me fix that real quick. Game over.

[00:25:46.899 --> 00:25:49.299]
Versus someone finding the bug three months from now.

[00:25:49.299 --> 00:25:52.419]
And you have no clue what just changed in that code.

[00:25:52.419 --> 00:25:53.779]
What might have caused that bug?

[00:25:53.779 --> 00:25:54.779]
What was added?

[00:25:54.779 --> 00:25:55.779]
What was removed?

[00:25:55.779 --> 00:25:56.779]
I don't know.
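
A regression test of the kind described here can be as simple as replaying recorded inputs and comparing against outputs from a known-good build. The sketch below assumes a small checksum routine as the existing feature; `checksum8`, the golden values, and `run_regression` are illustrative names, not part of any particular framework.

```python
def checksum8(data):
    """Two's-complement 8-bit checksum: byte sum plus checksum is 0 modulo 256."""
    return (-sum(data)) & 0xFF


# Golden values recorded from a known-good version (worked out by hand here).
GOLDEN = [
    (b"", 0x00),
    (b"\x01\x02\x03", 0xFA),
    (b"\xff", 0x01),
]


def run_regression(function, golden):
    """Return the cases where the function no longer matches its recorded output."""
    failures = []
    for data, expected in golden:
        actual = function(data)
        if actual != expected:
            failures.append((data, expected, actual))
    return failures


if __name__ == "__main__":
    # An empty list means no existing behavior was broken.
    print(run_regression(checksum8, GOLDEN))
```

Run on every commit, a non-empty failure list points at the change that broke the feature while that change is still fresh.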

[00:25:56.819 --> 00:26:22.539]
[unintelligible] The next piece is artifact management.

[00:26:22.539 --> 00:26:26.700]
You know, ELF files, binaries, whatever it is you're producing, whatever the artifact output is,

[00:26:26.700 --> 00:26:29.819]
we want to be able to store that somewhere in a safe and traceable manner.

[00:26:30.460 --> 00:26:35.740]
It might not be that in every case you need to have really strict traceability requirements

[00:26:36.139 --> 00:26:40.460]
to be able to prove what code and what version of software was being run on a system, okay?

[00:26:40.740 --> 00:26:42.579]
Not every

[00:26:42.579 --> 00:26:44.579]
embedded system is heavily regulated,

[00:26:44.659 --> 00:26:45.619]
but

[00:26:45.619 --> 00:26:47.619]
it's still useful for yourself internally

[00:26:48.179 --> 00:26:53.619]
to be able to manage those artifacts in a way that you can easily track and get back to a specific state if you need to.

[00:26:54.019 --> 00:26:56.019]
There have been times where I delivered

[00:26:56.019 --> 00:27:01.659]
code to a customer, and then they came back four months later asking for a specific version,

[00:27:01.659 --> 00:27:06.779]
because there was some feature that we had added and they wanted to go back and not have that feature for some other customer.

[00:27:07.299 --> 00:27:09.699]
If I didn't have good artifact management, I

[00:27:10.419 --> 00:27:16.500]
wouldn't be able to go back and give them the right code, right? So artifact management, I think, is really important.
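
One lightweight way to get that traceability is to store a small manifest next to every build output. This is a sketch under assumptions: the field names, the `build_manifest` and `verify_artifact` helpers, and the example version and commit strings are all hypothetical, and a real pipeline would take the commit identifier from its version control system.

```python
import hashlib
import json


def build_manifest(name, payload, version, commit):
    """Describe one build artifact so it can be traced back to its source."""
    return {
        "name": name,
        "version": version,
        "commit": commit,
        "size_bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


def verify_artifact(manifest, payload):
    """Return True if the payload is byte-for-byte the artifact the manifest describes."""
    return hashlib.sha256(payload).hexdigest() == manifest["sha256"]


if __name__ == "__main__":
    # Placeholder bytes stand in for a real firmware image.
    image = b"placeholder firmware image"
    manifest = build_manifest("firmware.bin", image, "1.4.0", "abc1234")
    print(json.dumps(manifest, indent=2, sort_keys=True))
```

When a customer asks for the build they received four months ago, the manifest gives the version and commit, and the hash confirms the stored file is the one that shipped.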

[00:27:17.299 --> 00:27:21.539]
And then the piece that I think is pretty optional is deployment automation.

[00:27:22.259 --> 00:27:26.140]
We don't necessarily need to deploy automatically to our end customers.

[00:27:26.140 --> 00:27:31.859]
I would argue that a highly, well, I shouldn't even say highly sophisticated. I would argue that a

[00:27:32.579 --> 00:27:36.700]
well-developed and well-defined DevOps process

[00:27:37.619 --> 00:27:44.460]
would at least deploy so that you can automatically test your software on the hardware, okay?

[00:27:45.460 --> 00:27:52.740]
At a minimum. From there, if it goes to the customer, cool. You know, at least have it tested, go through a release process, and then

[00:27:52.740 --> 00:27:57.819]
in some way it finds its way to the end customer. But have some type of deployment automation built into your pipeline.

[00:27:58.779 --> 00:28:04.299]
Okay, now this of course creates CI and CD options for us.

[00:28:05.660 --> 00:28:07.660]
From there,

[00:28:07.660 --> 00:28:13.340]
you know, you just have to decide what you need in your CI/CD pipelines and your DevOps processes

[00:28:13.579 --> 00:28:16.059]
and actually go through and implement them, okay?

[00:28:17.099 --> 00:28:19.099]
Now, for today,

[00:28:19.259 --> 00:28:25.819]
that's about as deep into DevOps as I'm going to get. I hope this at least gives you some ideas of what you can do in this third step

[00:28:26.220 --> 00:28:28.699]
as you try to modernize the way that you develop your firmware.

[00:28:29.339 --> 00:28:32.220]
Now, some of this stuff, again, we're looking at from a high level.

[00:28:33.179 --> 00:28:35.179]
But if you need help

[00:28:35.179 --> 00:28:39.419]
implementing it or getting ideas or doing the implementation, actually getting it done,

[00:28:39.820 --> 00:28:43.500]
feel free to reach out to me at jgbap.com. I do a lot of consulting and training.

[00:28:44.220 --> 00:28:47.660]
I even do design reviews, so if you design something but you're not sure if it will work,

[00:28:48.380 --> 00:28:50.380]
you know, give me a call. I'm happy to help you.

[00:28:50.860 --> 00:28:55.740]
You know, it's what I'm here for. I'm trying to help you develop faster, smarter firmware.

[00:28:56.220 --> 00:29:03.259]
And the way that I see that happening for a lot of teams today in the industry is through modernizing the way that they develop their embedded systems.

[00:29:03.740 --> 00:29:09.019]
So all right, thanks for your time and attention today. Looking forward, next time

[00:29:09.820 --> 00:29:15.660]
we are going to be talking in a little bit more detail about CI/CD. I'll have a special guest

[00:29:16.460 --> 00:29:19.900]
that will talk to us about their experiences so that you can get

[00:29:20.460 --> 00:29:26.300]
some other additional ideas outside my own experiences of developing embedded systems using DevOps.

[00:29:26.940 --> 00:29:28.940]
Until then