The Embedded Frontier
The Embedded Frontier, hosted by embedded systems expert Jacob Beningo, is a cutting-edge podcast dedicated to exploring the rapidly evolving world of embedded software and embedded system trends. Each episode delves into the latest technological advancements, industry standards, and innovative strategies that are shaping the future of embedded systems. Jacob Beningo, with his deep industry knowledge and experience, guides listeners through complex topics, making them accessible for both seasoned developers and newcomers alike.
This podcast serves as an educational platform, offering insights, interviews, and discussions with leading experts and innovators in the field. Listeners can expect to gain valuable knowledge on how to modernize their embedded software, implement best practices, and stay ahead in this dynamic and critical sector of technology. Whether you're an embedded software developer, a systems engineer, or simply a tech enthusiast, "The Embedded Frontier" is your go-to source for staying updated and inspired in the world of embedded systems. Join Jacob Beningo as he navigates the intricate and fascinating landscape of embedded technologies, providing a unique blend of technical expertise, industry updates, and practical advice.
#019 – Modernizing Embedded Systems: Step #3 – Adopt DevOps
Key Takeaways:
• Only 35% of embedded development teams deliver projects on time, with most running 3-6 months late
• DevOps focuses on incremental value delivery, improved collaboration, automation, and continuous improvement
• The Ariane 5 rocket explosion ($500 million loss) could have been prevented with proper integration testing and CI/CD practices
• Start DevOps implementation with automated builds using containers to create unified development environments
• Enforce code quality automatically within CI/CD pipelines by checking against coding standards such as MISRA C/C++
• Implement regression testing to catch bugs early when they're easier and less costly to fix
• Use metrics analysis to automatically identify tight coupling and potential bug locations in code
• Artifact management ensures traceability and ability to deliver specific software versions to customers
• Deployment automation should at minimum enable automatic hardware testing, even if not direct customer deployment
• DevOps creates a value feedback loop between companies and customers through observability and telemetry
================================================================================
VIDEO TRANSCRIPT WITH TIMESTAMPS
================================================================================
[00:00:00.000 --> 00:00:06.280]
In the last several podcasts, we've been talking about modernizing embedded software.
[00:00:06.280 --> 00:00:09.200]
How do we get to the frontier of embedded systems?
[00:00:09.200 --> 00:00:11.040]
We do it through modernization, right?
[00:00:11.040 --> 00:00:14.359]
And adopting best practices and new techniques.
[00:00:14.359 --> 00:00:18.960]
Now today, what we are going to do is we are going to talk about step number three, which
[00:00:18.960 --> 00:00:22.080]
is adopting DevOps.
[00:00:22.080 --> 00:00:28.359]
DevOps is actually an important piece and step in modernizing your embedded systems development.
[00:00:28.359 --> 00:00:36.159]
The reason for DevOps, at least in my opinion, is that it is there to help us solve a very
[00:00:36.159 --> 00:00:38.079]
important problem.
[00:00:38.079 --> 00:00:47.480]
That problem really, I think, relates to the fact that a lot of us embedded systems developers,
[00:00:47.480 --> 00:00:52.120]
we deliver our products, our projects late.
[00:00:52.120 --> 00:00:55.280]
Not just late, but often over budget.
[00:00:55.280 --> 00:01:00.800]
As it turns out, only about 35% of embedded teams deliver on time.
[00:01:00.800 --> 00:01:04.280]
Now that's pretty abysmal.
[00:01:04.280 --> 00:01:10.400]
I remember one time when I was an undergraduate and I took quantum physics, and I got a 9%
[00:01:10.400 --> 00:01:13.640]
on my first quantum physics exam.
[00:01:13.640 --> 00:01:14.640]
Not terribly happy.
[00:01:14.640 --> 00:01:16.640]
That's a pretty low percentage.
[00:01:16.640 --> 00:01:22.099]
Now as it turned out, that ended up being a B. That was the second highest grade in the
[00:01:22.099 --> 00:01:27.219]
class, and the A person got 10% correct.
[00:01:27.219 --> 00:01:29.659]
Now, why am I telling you this story?
[00:01:29.659 --> 00:01:35.259]
I'm telling you the story because that is not the type of curve we want to be working
[00:01:35.259 --> 00:01:40.780]
on, or basing our work off of, when we are developing products.
[00:01:40.780 --> 00:01:45.599]
Just because only 35% of teams are making it, doesn't mean that we grade on a curve and
[00:01:45.599 --> 00:01:47.299]
say, well, that's okay because we're always late.
[00:01:47.299 --> 00:01:48.299]
We're always over budget.
[00:01:48.299 --> 00:01:49.299]
Who cares?
[00:01:49.299 --> 00:01:50.299]
It's just a reality.
[00:01:50.299 --> 00:01:52.340]
We're just graded on the curve.
[00:01:52.340 --> 00:01:53.819]
That's not what we're doing here.
[00:01:53.819 --> 00:01:58.539]
As it turns out, more and more teams are actually delivering later and later.
[00:01:58.539 --> 00:02:03.219]
On average, most teams are three to six months late in delivering their embedded products.
[00:02:03.219 --> 00:02:05.620]
All right, so how do we fix this?
[00:02:05.620 --> 00:02:07.579]
Certainly, modernization helps, right?
[00:02:07.579 --> 00:02:12.500]
But specifically today, we're framing this from the perspective of DevOps.
[00:02:12.500 --> 00:02:16.019]
DevOps is actually all about improving efficiency.
[00:02:16.019 --> 00:02:19.219]
It's about helping us get there as efficiently as possible.
[00:02:19.219 --> 00:02:22.419]
It's about speeding up the way that we deliver software.
[00:02:22.419 --> 00:02:26.939]
It's about improving the quality of the software that we do actually deliver.
[00:02:26.939 --> 00:02:32.939]
So DevOps overall, this third step is about trying to get you to implement DevOps.
[00:02:32.939 --> 00:02:34.939]
Now, there's lots of ways that you can do this.
[00:02:34.939 --> 00:02:38.539]
Oftentimes, I'm still seeing that most teams are just trying to get to the point where
[00:02:38.539 --> 00:02:42.419]
they're building automatically.
[00:02:42.419 --> 00:02:43.419]
That's okay.
[00:02:43.419 --> 00:02:44.939]
That's kind of your first step.
[00:02:45.340 --> 00:02:49.819]
Get in the door and make sure you can build automatically when you commit your code.
[00:02:49.819 --> 00:02:53.859]
But there's lots of other things that we can do that we'll talk about in today's webinar.
[00:02:53.859 --> 00:02:54.859]
Okay.
[00:02:54.859 --> 00:03:00.219]
Now, when we think about DevOps overall, I think it's important for us to realize that
[00:03:00.219 --> 00:03:05.740]
there's actually four principles that are actually going to guide all of your DevOps processes.
[00:03:05.740 --> 00:03:12.659]
The first principle is that it's all about focusing on providing incremental value to the users
[00:03:12.659 --> 00:03:15.659]
or your customers in small and frequent iterations.
[00:03:15.659 --> 00:03:20.659]
Now, right after that first principle, you might be saying, Jacob, this is crazy because
[00:03:20.659 --> 00:03:22.740]
I'm an embedded software developer.
[00:03:22.740 --> 00:03:26.139]
You can't ship a single feature to the customer.
[00:03:26.139 --> 00:03:27.539]
You've got to have the drivers.
[00:03:27.539 --> 00:03:32.019]
You've got to have all of the middleware and all the pieces together as a whole to actually
[00:03:32.019 --> 00:03:34.099]
ship a whole product.
[00:03:34.099 --> 00:03:35.740]
And the answer is yes.
[00:03:35.740 --> 00:03:40.819]
But who is the user, your customer, what value are you providing?
[00:03:40.819 --> 00:03:44.780]
Oftentimes this can be incremental value that you're providing just to your company, even
[00:03:44.780 --> 00:03:47.419]
if you haven't shipped it to your customer yet.
[00:03:47.419 --> 00:03:51.579]
Other times you could think as a developer that my customer is actually my boss, or it's
[00:03:51.579 --> 00:03:52.579]
management.
[00:03:52.579 --> 00:03:53.819]
It's the company itself.
[00:03:53.819 --> 00:03:57.099]
And all you're trying to do is make sure that throughout your sprints, every two to four
[00:03:57.099 --> 00:04:02.780]
weeks that you are delivering new features and value at the end of each of those.
[00:04:02.780 --> 00:04:04.780]
Okay.
[00:04:04.780 --> 00:04:07.979]
The second principle is that we need to improve collaboration and communication between
[00:04:07.979 --> 00:04:10.699]
development and operational teams.
[00:04:10.699 --> 00:04:17.099]
So what you're going to find here is that it's not just about us embedded software folks
[00:04:17.099 --> 00:04:18.099]
or delivering the product.
[00:04:18.099 --> 00:04:21.019]
It's about improving collaboration and communication.
[00:04:21.019 --> 00:04:27.659]
Now in some cases, this is counterintuitive to a lot of what comes out of like best practices
[00:04:27.659 --> 00:04:28.659]
for big tech.
[00:04:28.659 --> 00:04:36.819]
Jeff Bezos, for example, has his two pizza team idea where teams shouldn't be larger than
[00:04:36.819 --> 00:04:39.980]
what two pizzas can feed.
[00:04:39.980 --> 00:04:43.500]
And the more time we spend collaborating and communicating, the less time we actually
[00:04:43.500 --> 00:04:44.740]
spend delivering.
[00:04:44.740 --> 00:04:48.220]
You want your teams to also be independent and making decisions.
[00:04:48.220 --> 00:04:54.700]
But at the same time, DevOps is designed to help improve communication across teams,
[00:04:54.700 --> 00:04:59.220]
specifically between quality and the development teams.
[00:04:59.220 --> 00:05:04.300]
We're trying to move testing from being very manual and labor-intensive to something that's more
[00:05:04.300 --> 00:05:09.259]
automated and it doesn't require as much effort.
[00:05:09.259 --> 00:05:12.459]
So the third principle is that we want to automate as much of the software development
[00:05:12.459 --> 00:05:15.219]
life cycle as possible.
[00:05:15.219 --> 00:05:18.060]
Manual testing seems great in the beginning.
[00:05:18.060 --> 00:05:19.939]
It's very easy to go test your features.
[00:05:19.939 --> 00:05:24.579]
But once you have 100 features in the device, making sure you didn't break previous features,
[00:05:24.579 --> 00:05:27.459]
man, can that be a nightmare.
[00:05:27.459 --> 00:05:31.620]
There's not enough time or enough hours in the day to be able to go back and test all those
[00:05:31.620 --> 00:05:32.620]
features.
[00:05:32.620 --> 00:05:34.539]
So what do we do?
[00:05:34.539 --> 00:05:37.980]
We just cross our fingers and hope that they still work even though we've added a bunch
[00:05:37.980 --> 00:05:40.020]
of stuff.
[00:05:40.020 --> 00:05:44.180]
And you might think that's not affecting other features, but oftentimes, that's just
[00:05:44.180 --> 00:05:45.340]
a hope.
[00:05:45.340 --> 00:05:49.580]
And so the idea here with automation is that we can automate our tests through unit testing,
[00:05:49.580 --> 00:05:54.060]
system integration, through simulation testing, through hardware-in-the-loop testing, all of
[00:05:54.060 --> 00:05:57.939]
this so that it's automated as much as possible so that we can ensure that the quality of the
[00:05:57.939 --> 00:06:05.819]
software continues to grow higher and higher as we add new features into our projects.
[00:06:05.819 --> 00:06:10.099]
And the fourth principle of DevOps, which for embedded folks is very controversial,
[00:06:10.099 --> 00:06:14.379]
I think, is that we want to continuously improve the software product through delivery.
[00:06:14.379 --> 00:06:15.379]
Okay?
[00:06:15.379 --> 00:06:20.779]
The delivery side of things, you don't necessarily need to provide firmware updates on a daily
[00:06:20.779 --> 00:06:26.980]
basis to your internet-connected microwave or your internet-connected stove, right?
[00:06:26.980 --> 00:06:31.779]
Connectivity can be great, but a lot of times, if my basic use cases are being covered,
[00:06:31.779 --> 00:06:35.379]
there's no need to continuously deliver new firmware.
[00:06:35.379 --> 00:06:36.379]
Okay?
[00:06:36.379 --> 00:06:37.379]
In fact, sometimes that can be detrimental.
[00:06:37.379 --> 00:06:42.019]
You can break things that work, or you can interrupt the user's behavior with the device, being
[00:06:42.019 --> 00:06:45.980]
able to use the device, which just ends up being kind of a big pain.
[00:06:45.980 --> 00:06:46.980]
Okay?
[00:06:46.980 --> 00:06:50.219]
So, continuous improvement, I think, is really important.
[00:06:50.219 --> 00:06:56.379]
Continuous delivery, that oftentimes is not as important for many firmware products.
[00:06:56.379 --> 00:07:00.339]
Not all, but the majority of the ones that I encounter in the field.
[00:07:00.339 --> 00:07:01.339]
Okay?
[00:07:01.739 --> 00:07:06.500]
When we think about DevOps, those principles put together, you can really think about DevOps
[00:07:06.500 --> 00:07:12.339]
as encompassing bringing quality assurance and testing together, bringing operations together,
[00:07:12.339 --> 00:07:17.699]
and also bringing developers together early on continuously through the development cycle.
[00:07:17.699 --> 00:07:18.699]
These things used to be siloed.
[00:07:18.699 --> 00:07:21.459]
They used to be almost like waterfall, right?
[00:07:21.459 --> 00:07:26.019]
You would develop your code, it would go to the quality assurance folks, and then once
[00:07:26.019 --> 00:07:30.539]
it passed that gate, then it would go to operations to be deployed into the field, right?
[00:07:30.540 --> 00:07:34.980]
Well, now we're saying no, no, no, we can do many cycles with this, and we can go much
[00:07:34.980 --> 00:07:40.220]
faster and develop higher quality code, get better feedback throughout this whole process.
[00:07:40.220 --> 00:07:41.220]
Okay?
[00:07:41.220 --> 00:07:47.900]
Now, I often look at DevOps in general as just a way of repurposing what the Agile
[00:07:47.900 --> 00:07:49.860]
values actually were, right?
[00:07:49.860 --> 00:07:54.900]
Those four principles, if you look at them, they're very similar to saying that we want
[00:07:54.899 --> 00:08:00.899]
to focus on individuals and interactions over processes and tools.
[00:08:00.899 --> 00:08:03.899]
We want working software over comprehensive documentation.
[00:08:03.899 --> 00:08:08.179]
We want customer collaboration over contract negotiation.
[00:08:08.179 --> 00:08:12.859]
We want to respond to change over following a plan, okay?
[00:08:12.859 --> 00:08:17.500]
That's not saying processes and tools, comprehensive documentation, contract negotiation, following
[00:08:17.500 --> 00:08:20.739]
a plan aren't important parts of the development cycle.
[00:08:20.739 --> 00:08:23.019]
It just means that's not where our focus should be.
[00:08:23.019 --> 00:08:29.139]
And DevOps is really about us trying to tighten that cycle up and delivering faster firmware
[00:08:29.139 --> 00:08:30.859]
to our customers, okay?
[00:08:30.859 --> 00:08:34.460]
Or delivering faster, higher quality software.
[00:08:34.460 --> 00:08:40.939]
Now when we think about DevOps principles in action, there's a great diagram from, I
[00:08:40.939 --> 00:08:45.480]
think I found it on like Amazon's website when they were talking about what DevOps was
[00:08:45.480 --> 00:08:47.259]
and testing and that sort of thing.
[00:08:47.259 --> 00:08:51.059]
But I've modified it to really kind of show off what I think is important for us as
[00:08:51.059 --> 00:08:52.739]
embedded software developers.
[00:08:52.739 --> 00:08:59.339]
Ultimately, what we end up with is two actors in DevOps overall, okay?
[00:08:59.339 --> 00:09:03.219]
The company we work for and we have our customer, okay?
[00:09:03.219 --> 00:09:08.379]
The company itself wants to deliver features and value to our customer.
[00:09:08.379 --> 00:09:14.099]
Our customer has a problem, and our product is solving it and providing a solution to that
[00:09:14.099 --> 00:09:15.419]
problem.
[00:09:15.419 --> 00:09:19.979]
Now that solution ends up being in the form of something like a product with software features
[00:09:19.980 --> 00:09:24.060]
that allow them to do whatever it needs to be, whether it's controlling an engine, whether
[00:09:24.060 --> 00:09:29.820]
it's running a ventilator, whether it's driving a car, you know, allowing someone to listen
[00:09:29.820 --> 00:09:34.740]
to the radio in their car, whatever the features happen to be, okay?
[00:09:34.740 --> 00:09:40.539]
When we think about how to create a DevOps pipeline, the steps and the sequences that
[00:09:40.539 --> 00:09:47.139]
are necessary to develop the software, test it and actually deliver it to the end customer,
[00:09:47.139 --> 00:09:52.580]
it starts to look like a delivery pipeline, okay?
[00:09:52.580 --> 00:09:56.419]
And so it ends up being essentially if you think in your mind of a message queue almost
[00:09:56.419 --> 00:10:01.259]
between like a company and a customer moving from left to right, we end up with this delivery
[00:10:01.259 --> 00:10:06.659]
pipeline that needs to be able to automate our builds, it needs to be able to test our
[00:10:06.659 --> 00:10:11.700]
software and it needs to be able to deploy it automatically, okay?
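[Editor's sketch] The build, test, and deploy sequence described here can be modeled as an ordered list of stages that stops at the first failure. This is a minimal Python sketch, not the speaker's pipeline: run_pipeline and the three stage functions are hypothetical names, and each stage simply returns True in place of invoking a real toolchain, test runner, or OTA service.

```python
from typing import Callable, List, Tuple

# A stage is a name plus an action that reports success (True) or failure (False).
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns one log line per stage that was actually executed.
    """
    log = []
    for name, action in stages:
        ok = action()
        log.append(f"{name}: {'pass' if ok else 'FAIL'}")
        if not ok:
            break  # later stages must not run on a broken build or failed test
    return log


# Hypothetical placeholders: a real pipeline would call the compiler,
# the test runner, and the deployment tooling here.
def build_firmware() -> bool:
    return True


def run_tests() -> bool:
    return True


def deploy_image() -> bool:
    return True


PIPELINE: List[Stage] = [
    ("build", build_firmware),
    ("test", run_tests),
    ("deploy", deploy_image),
]
```

Calling run_pipeline(PIPELINE) runs all three stages in order. If the test stage returned False, the deploy stage would never be reached, which is the property that keeps untested firmware away from customers.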
[00:10:11.700 --> 00:10:16.740]
When I first started to develop embedded systems 20 years ago, delivery always involved us
[00:10:16.740 --> 00:10:20.580]
putting some firmware image up on a website somewhere where the customers could go download
[00:10:20.580 --> 00:10:25.539]
it, get special tools, go through this process that hopefully updated the firmware and then
[00:10:25.539 --> 00:10:28.340]
they'd have the latest and greatest firmware on their devices.
[00:10:28.340 --> 00:10:33.419]
Today with the internet and being constantly connected, that process is completely streamlined.
[00:10:33.419 --> 00:10:37.980]
All we do is push the latest firmware image up to the cloud and then allow our
[00:10:37.980 --> 00:10:43.940]
OTA processes to take effect and distribute that firmware update to the various groups and
[00:10:43.940 --> 00:10:46.060]
devices that we have out in the field, okay?
[00:10:46.059 --> 00:10:49.939]
So that's much more automated than it used to be.
[00:10:49.939 --> 00:10:56.179]
The idea is that hopefully at that point, it's this automated delivery pipeline, the customer
[00:10:56.179 --> 00:11:01.179]
gets the latest features and products and they're able to use and enjoy and get greater
[00:11:01.179 --> 00:11:03.979]
and greater value out of the product.
[00:11:03.979 --> 00:11:10.899]
Now thinking about this strictly from a company to customer standpoint, there's a problem.
[00:11:10.899 --> 00:11:17.579]
The problem is that this is a one directional process, right?
[00:11:17.579 --> 00:11:21.860]
I'm delivering to the customer, but if I want to continue to provide value to the customer,
[00:11:21.860 --> 00:11:24.699]
I want to improve their experience with the product.
[00:11:24.699 --> 00:11:27.419]
I need this to be bidirectional.
[00:11:27.419 --> 00:11:30.699]
I need a feedback pipeline.
[00:11:30.699 --> 00:11:36.100]
This is where you're seeing a lot with observability today in embedded systems.
[00:11:36.100 --> 00:11:40.579]
You see things like Percepio's DevAlert, for example, that monitors the behavior of
[00:11:40.580 --> 00:11:48.020]
a system in the field and allows the company to get data back, telemetry back, from
[00:11:48.020 --> 00:11:52.860]
the devices in the field to understand not just how the customer is using the product,
[00:11:52.860 --> 00:11:57.020]
but to understand what features they're using, to understand the performance of the device,
[00:11:57.020 --> 00:12:01.500]
and to try to also figure out what might be missing that could make the life of the user
[00:12:01.500 --> 00:12:03.660]
better, okay?
[00:12:03.660 --> 00:12:07.940]
And so we look at that feedback pipeline as a way to monitor how the system is behaving.
[00:12:07.940 --> 00:12:12.180]
It generates reports, which then feed back to the company and provide new features, new
[00:12:12.180 --> 00:12:18.700]
ideas, new development processes that maybe we use to then build, test and deploy even
[00:12:18.700 --> 00:12:20.020]
better firmware, right?
[00:12:20.020 --> 00:12:22.500]
And better product features to the customer.
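[Editor's sketch] One way to picture the reporting half of this feedback pipeline is a small aggregation over telemetry events. The Python sketch below is an assumption-laden illustration: the event format (dictionaries with "device" and "feature" keys), the function name usage_report, and the feature names are all made up for the example and do not reflect any particular observability product.

```python
from collections import Counter
from typing import Dict, Iterable, List


def usage_report(events: Iterable[Dict[str, str]], known_features: List[str]) -> dict:
    """Summarize which features devices in the field actually use.

    events: hypothetical telemetry records, each with a "feature" key.
    known_features: every feature the firmware ships with.
    """
    # Count only events that refer to a feature we know about.
    counts = Counter(e["feature"] for e in events if e["feature"] in known_features)
    # Features that never appear in telemetry are candidates for review.
    unused = sorted(set(known_features) - set(counts))
    return {"counts": dict(counts), "unused": unused}


# Made-up sample data, for illustration only.
SAMPLE_EVENTS = [
    {"device": "a", "feature": "timer"},
    {"device": "b", "feature": "timer"},
    {"device": "a", "feature": "defrost"},
]
SAMPLE_FEATURES = ["timer", "defrost", "wifi_setup"]
```

A report like this tells the team both what customers rely on and what they never touch, which is the input the speaker describes feeding back into the next round of build, test, and deploy.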
[00:12:22.500 --> 00:12:28.100]
Now this whole thing, like I mentioned, it closes in and provides observability to our
[00:12:28.100 --> 00:12:30.900]
company to understand how the customer is using it.
[00:12:30.900 --> 00:12:35.260]
This creates what I call a value feedback loop, okay?
[00:12:35.259 --> 00:12:38.580]
We have a feedback loop where the company provides value to the customer.
[00:12:38.580 --> 00:12:44.500]
The customer provides feedback through observability to the company, which then allows us to create
[00:12:44.500 --> 00:12:49.220]
even better value for the customer, which we then deliver, which then this loop continues
[00:12:49.220 --> 00:12:54.899]
over and over in a feedback loop that allows us to provide a lot of value to the customer,
[00:12:54.899 --> 00:13:00.299]
allows us to continuously deliver, continuously deploy, continuously integrate our software,
[00:13:00.299 --> 00:13:05.500]
making it so that we can get a minimum viable product into the field and then continue
[00:13:05.500 --> 00:13:10.779]
to update it and upgrade it and improve the experience for the customer, okay?
[00:13:10.779 --> 00:13:15.179]
We don't have to go and ship 100 billion features to the customer all at once.
[00:13:15.179 --> 00:13:19.459]
We can start with a minimum viable product, verify that the customer enjoys the product,
[00:13:19.459 --> 00:13:22.459]
uses it, wants it and then build on it from there.
[00:13:22.459 --> 00:13:27.299]
I've seen many companies over my career that try to deliver everything and the kitchen
[00:13:27.299 --> 00:13:29.299]
sink on their first go.
[00:13:29.299 --> 00:13:32.979]
And then they get in the field and they discover the customer doesn't even want any of this.
[00:13:32.979 --> 00:13:36.579]
We completely went in the wrong direction and just wasted two years of development, cost,
[00:13:36.579 --> 00:13:39.740]
money, et cetera, and now we're in trouble, right?
[00:13:39.740 --> 00:13:41.219]
You don't want to get in that situation.
[00:13:41.219 --> 00:13:46.699]
And that's where DevOps, this modernization, you know, it can help you from that perspective
[00:13:46.699 --> 00:13:49.539]
so that you understand what your customer wants and needs.
[00:13:49.539 --> 00:13:54.979]
But on top of that, it also allows you to improve the quality of your software, deliver
[00:13:54.980 --> 00:14:01.300]
faster and keep the quality very high and the bugs to a minimum, all right?
[00:14:01.300 --> 00:14:04.220]
Okay, so this is great, you know, right?
[00:14:04.220 --> 00:14:06.740]
Everyone's been talking about DevOps for a while.
[00:14:06.740 --> 00:14:10.420]
How does this actually apply to embedded developers?
[00:14:10.420 --> 00:14:13.340]
You might be saying, well, yeah, I can build on my own, whatever.
[00:14:13.340 --> 00:14:16.899]
This whole idea of unit testing, you know, one of the big pieces of this delivery pipeline
[00:14:16.899 --> 00:14:20.500]
is having unit tests, hardware-in-the-loop testing that's all automated.
[00:14:20.500 --> 00:14:21.899]
And you might say, well, that's too expensive.
[00:14:21.899 --> 00:14:23.220]
It's not worth it.
[00:14:23.940 --> 00:14:27.940]
You know, has anyone ever really had a problem where the continuous integration pipeline
[00:14:27.940 --> 00:14:32.500]
caught the problem or could have actually prevented the issues in the first place, right?
[00:14:32.500 --> 00:14:36.620]
Well, I'm going to share one of my favorite case studies with you.
[00:14:36.620 --> 00:14:40.019]
I spent, you know, I get an opportunity to work on a lot of different software,
[00:14:40.019 --> 00:14:42.540]
a lot of different embedded products and a lot of different industries.
[00:14:42.540 --> 00:14:48.300]
One of my favorite industries to work in is medical devices and space systems, along with defense.
[00:14:48.300 --> 00:14:51.420]
Okay, I really like to work on things, both, I like to work on a lot of different things.
[00:14:51.419 --> 00:14:52.419]
I'll leave that at that.
[00:14:52.419 --> 00:15:03.500]
But several years ago, they were developing the Ariane 5 rocket.
[00:15:03.500 --> 00:15:09.579]
Okay, and this particular rocket costs about $500 million to launch.
[00:15:09.579 --> 00:15:18.779]
All right, and on its very first launch, about 37 seconds into the launch, the rocket exploded.
[00:15:18.779 --> 00:15:20.099]
Okay.
[00:15:20.100 --> 00:15:23.340]
There goes $500 million, right, up in fireworks.
[00:15:23.340 --> 00:15:28.100]
Now, when we look at that, man, that seems like a really big problem, but, you know,
[00:15:28.100 --> 00:15:33.259]
space is hard, rockets explode, you know, whatever, things happen, right?
[00:15:33.259 --> 00:15:38.860]
Except for the fact, when you start to look at the root cause here, the root cause of the
[00:15:38.860 --> 00:15:46.220]
loss of the vehicle goes back to the fact that they were using a proven navigation system
[00:15:46.220 --> 00:15:48.379]
from the Ariane 4.
[00:15:48.379 --> 00:15:52.059]
They used the code that they had already proven on previous flights, worked.
[00:15:52.059 --> 00:15:56.659]
And they just said, well, let's take this, we'll put it on the new processor and within
[00:15:56.659 --> 00:16:00.939]
the new system, and we don't need to test it because we already know that it works, right?
[00:16:00.939 --> 00:16:02.740]
Well, wrong.
[00:16:02.740 --> 00:16:09.659]
What ended up happening was that the Ariane 4 code on the Ariane 5, even though it was proven
[00:16:09.659 --> 00:16:13.100]
code, the flight profiles were different.
[00:16:13.100 --> 00:16:15.980]
What ended up happening was there was an integer overflow.
[00:16:15.980 --> 00:16:22.019]
So there was a 64-bit floating point value that was converted to a 16-bit signed integer.
[00:16:22.019 --> 00:16:23.019]
Okay?
[00:16:23.019 --> 00:16:27.220]
So the value was larger than expected due to a difference in the flight trajectories between
[00:16:27.220 --> 00:16:30.940]
the way that they flew the Ariane 4 and the Ariane 5.
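[Editor's sketch] The conversion described here, a 64-bit floating-point value narrowed to a 16-bit signed integer, can be modeled in a few lines of Python. This is only an illustration: the flight software was not written in Python, the function names are hypothetical, and the silent two's-complement wraparound in the unchecked version is an assumed failure mode chosen to show why a range check matters, not a claim about how the real system behaved.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def to_int16_unchecked(value: float) -> int:
    """Model a narrowing conversion that keeps only the low 16 bits.

    Out-of-range inputs wrap around silently (assumed behavior for illustration).
    """
    raw = int(value) & 0xFFFF  # truncate toward zero, then keep 16 bits
    return raw - 0x10000 if raw >= 0x8000 else raw  # reinterpret as signed


def to_int16_checked(value: float) -> int:
    """Narrow to 16 bits, but refuse values that do not fit."""
    truncated = int(value)
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return truncated
```

With an in-range input such as 1234.0, both functions agree. With 40000.0, the unchecked version quietly returns a negative number, while the checked version raises OverflowError so the problem surfaces in a test instead of in flight.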
[00:16:30.940 --> 00:16:35.420]
Now this is, of course, a problem, right?
[00:16:35.420 --> 00:16:36.980]
But you might say, well, that's not a big deal.
[00:16:36.980 --> 00:16:39.019]
These spacecraft, these rockets,
[00:16:39.019 --> 00:16:42.060]
They typically have backup computers on them, right?
[00:16:42.060 --> 00:16:44.659]
You have two different computers for safety-critical systems, right?
[00:16:44.819 --> 00:16:49.339]
So you have two different computers that are supposed to be checking each other's work
[00:16:49.339 --> 00:16:53.059]
and make sure that they arrive at the correct answers.
[00:16:53.059 --> 00:17:01.339]
Well, as it turned out, the backup computer had the exact same identical code as the primary
[00:17:01.339 --> 00:17:02.339]
computer.
[00:17:02.339 --> 00:17:03.339]
Okay?
[00:17:03.339 --> 00:17:09.659]
And so at that point, because it was identical, it failed in the exact same way.
[00:17:09.659 --> 00:17:11.460]
And so I mean, that's certainly another lesson, right?
[00:17:11.460 --> 00:17:15.740]
If you have a safety critical system, have two different teams write the code, right?
[00:17:15.740 --> 00:17:19.660]
For one, that's not really the lesson we're trying to discuss today.
[00:17:19.660 --> 00:17:23.700]
But kind of bringing it back, what was the problem here?
[00:17:23.700 --> 00:17:28.180]
We had working code, we ended up with overflows.
[00:17:28.180 --> 00:17:30.420]
How could we have prevented this?
[00:17:30.420 --> 00:17:36.019]
Well, there was no integration testing of reused components in the new context for the Ariane
[00:17:36.019 --> 00:17:37.660]
5 rocket.
[00:17:37.660 --> 00:17:42.500]
And so had they gone through and done additional system testing, had they leveraged at least
[00:17:42.500 --> 00:17:46.740]
continuous integration, maybe not continuous deployment, but at least continuous integration,
[00:17:46.740 --> 00:17:52.259]
they should have been able to verify that, oh, there's a problem here.
[00:17:52.259 --> 00:17:54.540]
But they didn't do that.
[00:17:54.540 --> 00:17:57.460]
They didn't have those processes in place.
[00:17:57.460 --> 00:18:02.460]
This certainly occurred, you know, happened prior to the existence of CI/CD in its modern
[00:18:02.460 --> 00:18:03.460]
form.
[00:18:04.180 --> 00:18:11.579]
But ultimately, modern CI/CD with tests would have tested components, would have done integration
[00:18:11.579 --> 00:18:18.779]
testing, all automatically in the actual test environment that the spacecraft would have
[00:18:18.779 --> 00:18:19.980]
experienced.
[00:18:19.980 --> 00:18:23.340]
And those test cases would most likely have exposed the fact
[00:18:23.340 --> 00:18:28.579]
that the 64-bit floating point value was being converted to a 16-bit signed integer, and
[00:18:28.579 --> 00:18:32.740]
that it was overflowing and causing problems with the spacecraft.
[00:18:33.740 --> 00:18:35.700]
All of that could have been automated.
[00:18:35.700 --> 00:18:38.620]
One of the reasons why they didn't go through this level of system testing was because
[00:18:38.620 --> 00:18:40.819]
they were doing everything manually.
[00:18:40.819 --> 00:18:44.539]
And they said, oh, well, we don't have the money and budget and the time to go through
[00:18:44.539 --> 00:18:46.460]
and do this testing, right?
[00:18:46.460 --> 00:18:48.140]
Famous last words.
[00:18:48.140 --> 00:18:50.019]
We don't have the time and budget.
[00:18:50.019 --> 00:18:55.779]
Well, they at least had enough time and budget to blow up a $500 million rocket and have
[00:18:55.779 --> 00:18:57.259]
to redo it all over again.
[00:18:57.259 --> 00:19:01.339]
Not to mention whatever the payload may have been on the rocket in the first place.
[00:19:02.339 --> 00:19:08.299]
Automated testing would likely have caught the edge cases across various scenarios.
[00:19:08.299 --> 00:19:13.139]
And so the lesson really to learn here is that integration testing can't be skipped,
[00:19:13.139 --> 00:19:15.539]
even for proven code.
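[Editor's sketch] The lesson that proven code still needs integration testing in its new context can be expressed as an automated check: run the reused component against the input range of the new system, not just the old one. The Python sketch below uses hypothetical names (pack_int16, failing_inputs) and made-up numeric profiles. It does not reproduce real flight data.

```python
from typing import Callable, Iterable, List

INT16_MIN = -32768
INT16_MAX = 32767


def pack_int16(value: float) -> int:
    """Hypothetical reused component: stores a reading in a signed 16-bit field."""
    truncated = int(value)
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return truncated


def failing_inputs(component: Callable[[float], int], profile: Iterable[float]) -> List[float]:
    """Run the component over every sample in a profile and collect overflows."""
    failures = []
    for sample in profile:
        try:
            component(sample)
        except OverflowError:
            failures.append(sample)
    return failures


# Made-up value ranges, for illustration only.
OLD_PROFILE = [0.0, 10_000.0, 30_000.0]            # range the code was proven on
NEW_PROFILE = [0.0, 10_000.0, 30_000.0, 45_000.0]  # new system reaches larger values
```

Here the component passes every sample in OLD_PROFILE, which is why it looked proven, yet failing_inputs(pack_int16, NEW_PROFILE) reports the 45,000 sample. Running such a sweep on every commit is the kind of check a continuous integration pipeline makes cheap.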
[00:19:15.539 --> 00:19:20.259]
And so as embedded software developers, I really think that makes us step back and say,
[00:19:20.259 --> 00:19:23.419]
OK, I can't do this manual process anymore.
[00:19:23.419 --> 00:19:28.379]
And certainly you might not have $500 million riding on the device that you're working on,
[00:19:28.380 --> 00:19:31.780]
but you certainly have users who want to make sure that the code works the way it's
[00:19:31.780 --> 00:19:36.260]
supposed to, that it doesn't cause them an inconvenience, and that it doesn't damage your brand because they
[00:19:36.260 --> 00:19:40.780]
talk poorly about how bad the system is that you've developed or how late it was delivered.
[00:19:40.780 --> 00:19:42.580]
You don't want those types of things.
[00:19:42.580 --> 00:19:43.580]
OK.
[00:19:43.580 --> 00:19:49.380]
And so this modernization of DevOps, you don't necessarily have to do it all at once.
[00:19:49.380 --> 00:19:56.140]
What I'm suggesting is that you go through a phased process when you implement DevOps.
[00:19:56.140 --> 00:19:57.140]
OK.
[00:19:57.140 --> 00:20:03.900]
Maybe the first thing to do is to actually identify what are your main goals for DevOps, right?
[00:20:03.900 --> 00:20:06.740]
Certainly you want to be able to build your code.
[00:20:06.740 --> 00:20:10.220]
You might want to be able to perform unit tests automatically.
[00:20:10.220 --> 00:20:12.580]
You might want to be able to simulate a test.
[00:20:12.580 --> 00:20:13.580]
Maybe not.
[00:20:13.580 --> 00:20:16.860]
You might want to be able to do hardware-in-the-loop testing automatically.
[00:20:16.860 --> 00:20:20.100]
And then you may or may not want to be able to push automatically to your customers'
[00:20:20.100 --> 00:20:21.100]
devices.
[00:20:21.100 --> 00:20:22.100]
OK.
[00:20:22.100 --> 00:20:25.380]
Every team's goals are going to be slightly different from the others.
[00:20:25.380 --> 00:20:30.740]
So my recommendation for you is that you at least start by saying, here's what my goals
[00:20:30.740 --> 00:20:31.740]
are.
[00:20:31.740 --> 00:20:36.340]
Because what you identify as your goals for your CI/CD and for your DevOps processes
[00:20:36.340 --> 00:20:41.260]
is going to really dictate what you do to design your CI/CD solution.
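One way to picture how goals drive the pipeline design is the rough sketch below. The goal flags and stage names are made up for illustration and do not come from any particular CI/CD tool; the point is only that the stages fall out of the goals the team writes down.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PipelineGoals:
    """Hypothetical goal flags a team might agree on before designing CI/CD."""
    build: bool = True
    unit_tests: bool = False
    simulation: bool = False
    hardware_in_loop: bool = False
    deploy_to_customers: bool = False


def plan_stages(goals: PipelineGoals) -> List[str]:
    """Return the ordered pipeline stages implied by the selected goals."""
    ordered = [
        ("build", goals.build),
        ("unit-test", goals.unit_tests),
        ("simulate", goals.simulation),
        ("hil-test", goals.hardware_in_loop),
        ("deploy", goals.deploy_to_customers),
    ]
    return [name for name, wanted in ordered if wanted]


if __name__ == "__main__":
    # A team that wants builds, unit tests, and hardware-in-the-loop testing.
    print(plan_stages(PipelineGoals(unit_tests=True, hardware_in_loop=True)))
```

A team that only cares about consistent builds ends up with a single stage, which matches the idea of starting small and phasing the rest in later.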
[00:20:41.260 --> 00:20:46.540]
Now the one thing I would say you should probably look at is at least automating your builds.
[00:20:46.540 --> 00:20:47.540]
OK.
[00:20:47.540 --> 00:20:48.540]
What does that buy you?
[00:20:48.540 --> 00:20:51.980]
Well, automated builds make sure that you can consistently build your code.
[00:20:51.980 --> 00:20:52.980]
All right.
[00:20:53.620 --> 00:20:57.460]
It helps to reduce manual processes.
[00:20:57.460 --> 00:20:58.539]
Right.
[00:20:58.539 --> 00:21:03.259]
And it can actually have automated builds.
[00:21:03.259 --> 00:21:09.059]
Oftentimes with our DevOps pipelines, we will use containers to create an environment that
[00:21:09.059 --> 00:21:10.059]
we can reuse.
[00:21:10.059 --> 00:21:14.460]
Now, those containers, whether it's Docker or Podman or whatever container technology
[00:21:14.460 --> 00:21:19.740]
you want to use, that often creates a unified development environment.
[00:21:19.740 --> 00:21:23.099]
So when you get a new developer on board, they don't have to install a whole bunch of tools
[00:21:23.099 --> 00:21:27.500]
and libraries and all that fun stuff for three days so that they can get up and running.
[00:21:27.500 --> 00:21:29.259]
You hand them the container.
[00:21:29.259 --> 00:21:33.980]
And now they have a development environment that's ready to rock and roll.
[00:21:33.980 --> 00:21:36.700]
So a container right there can save time.
[00:21:36.700 --> 00:21:43.059]
It can eliminate the issue of "it works fine on my machine."
[00:21:43.059 --> 00:21:47.660]
I can't tell you over the course of my career, how many times early on we had issues where
[00:21:47.660 --> 00:21:51.259]
it's working fine for me, but you give it to your colleague and they're getting
[00:21:51.259 --> 00:21:52.259]
hard faults.
[00:21:52.259 --> 00:21:53.259]
Well, why is that?
[00:21:53.259 --> 00:22:01.500]
Oh, they have a compiler version that's 0.0.1 different than the one that we're currently
[00:22:01.500 --> 00:22:02.980]
using, right?
[00:22:02.980 --> 00:22:04.980]
Or some other goofy thing.
[00:22:04.980 --> 00:22:09.019]
Oh, they installed some other library or their system's configured in a way that affects
[00:22:09.019 --> 00:22:11.940]
the build system and things are just breaking.
[00:22:11.940 --> 00:22:13.860]
It gets rid of all of that.
[00:22:13.859 --> 00:22:18.819]
So using containers makes it so that developers are all working in the exact same environment.
[00:22:18.819 --> 00:22:20.339]
Or at least the same build environment.
[00:22:20.339 --> 00:22:24.539]
You can all be using VS Code or whatever it is that you want to use for your interface
[00:22:24.539 --> 00:22:25.539]
to develop.
[00:22:25.539 --> 00:22:30.500]
Your workflow can be totally different, but the key is your build environment is the same.
[00:22:30.500 --> 00:22:32.539]
Now that doesn't just help developers.
[00:22:32.539 --> 00:22:38.699]
That helps us move to CI/CD because in the build process, we can use those containers to
[00:22:38.699 --> 00:22:42.179]
verify that we can build the software successfully.
[00:22:42.180 --> 00:22:46.100]
So at least make sure that automated builds are in place.
[00:22:46.100 --> 00:22:51.100]
Super low-hanging fruit, especially with today's technology.
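To make the compiler-version problem concrete, here is a small sketch of a check a build script could run before compiling. The pinned version number and the sample banner string are assumptions for illustration; in practice the banner would come from running the compiler with its version flag inside or outside the container.

```python
import re
from typing import Optional

# Hypothetical pinned version; it would normally match what the container ships.
PINNED_VERSION = "13.2.1"


def parse_version(banner: str) -> Optional[str]:
    """Pull the first x.y.z version number out of a compiler version banner."""
    match = re.search(r"\b(\d+\.\d+\.\d+)\b", banner)
    return match.group(1) if match else None


def toolchain_matches(banner: str, pinned: str = PINNED_VERSION) -> bool:
    """True only when the reported compiler version equals the pinned one."""
    return parse_version(banner) == pinned


if __name__ == "__main__":
    # Illustrative banner text, not captured from a real toolchain.
    banner = "arm-none-eabi-gcc (Example Toolchain) 13.2.1 20231009"
    if not toolchain_matches(banner):
        raise SystemExit("Toolchain version differs from the pinned version")
    print("Toolchain OK:", parse_version(banner))
```

With a container, a check like this becomes mostly a safety net, because everyone builds with the same image in the first place.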
[00:22:51.100 --> 00:22:55.740]
From there, figure out how frequently you want to integrate your system.
[00:22:55.740 --> 00:23:00.220]
The more often you integrate, the earlier your error detection is going to be.
[00:23:00.220 --> 00:23:04.700]
The sooner you catch a bug, the easier, the less costly, and even the less
[00:23:04.700 --> 00:23:07.940]
time-consuming it is to fix the bug.
[00:23:07.940 --> 00:23:12.460]
So figure out, what does that Git strategy look like?
[00:23:12.460 --> 00:23:19.259]
What is the rate at which you're going to integrate your code and test it?
[00:23:19.259 --> 00:23:24.820]
Other things I like to do, I like to enforce code quality or at least code standards within
[00:23:24.820 --> 00:23:25.980]
my pipelines.
[00:23:25.980 --> 00:23:30.059]
So oftentimes I will make sure I can build the code.
[00:23:30.059 --> 00:23:36.580]
I will analyze it to ensure that it is adhering to the coding style that everyone is supposed
[00:23:36.580 --> 00:23:37.580]
to be using.
[00:23:37.579 --> 00:23:44.500]
And that it is also meeting coding standards like maybe MISRA C or MISRA C++, maybe
[00:23:44.500 --> 00:23:52.500]
CERT, or if we're using C++, some of the C++ style guides like Google's or some of
[00:23:52.500 --> 00:23:58.139]
the ones that are put out by the C++ folks.
[00:23:58.139 --> 00:24:03.059]
So you can enforce your coding standards and styles within your pipelines.
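As a rough sketch of what enforcing standards in a pipeline can look like, the gate below takes findings that a style checker or static analyzer has already produced and decides whether the pipeline should pass. The `Finding` fields, the severity labels, and the advisory limit are all assumptions for illustration; real tools report findings in their own formats.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Finding:
    tool: str      # e.g. a style checker or a MISRA C analyzer
    rule: str      # rule identifier as reported by that tool
    severity: str  # "required" or "advisory" (hypothetical labels)


def quality_gate(findings: Iterable[Finding], max_advisory: int = 10) -> Tuple[bool, str]:
    """Fail on any required-rule violation or on too many advisory findings."""
    findings = list(findings)
    required = [f for f in findings if f.severity == "required"]
    advisory = [f for f in findings if f.severity == "advisory"]
    if required:
        return False, f"{len(required)} required-rule violation(s)"
    if len(advisory) > max_advisory:
        return False, f"{len(advisory)} advisory findings exceed limit of {max_advisory}"
    return True, "ok"


if __name__ == "__main__":
    report = [Finding("style", "line-length", "advisory")]
    passed, reason = quality_gate(report)
    print(passed, reason)
```

The pipeline would exit non-zero when the gate fails, so a merge is blocked until the violations are fixed or formally deviated.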
[00:24:03.059 --> 00:24:06.299]
A lot of pipelines today are starting to adopt AI.
[00:24:06.299 --> 00:24:10.579]
So that means that for things that are found wrong, you could automatically use
[00:24:10.579 --> 00:24:13.740]
AI to help fix them.
[00:24:13.740 --> 00:24:18.299]
You could have it fix styles, you could have it add documentation, all of those types
[00:24:18.299 --> 00:24:19.299]
of things.
[00:24:19.299 --> 00:24:24.339]
But the ultimate idea here is that you should be improving the code quality of your system.
[00:24:24.339 --> 00:24:25.339]
All right.
[00:24:25.339 --> 00:24:27.539]
And we can perform those checks automatically.
[00:24:27.539 --> 00:24:32.259]
It used to be we'd sit down in a room and spend hours doing code reviews, right?
[00:24:32.740 --> 00:24:39.779]
Well, you can leverage your pipeline to perform automatic analyses, to do metrics analysis.
[00:24:39.779 --> 00:24:42.099]
I can't tell you how many times I've gone into a team.
[00:24:42.099 --> 00:24:45.579]
And one of the first things I do is I perform metrics analysis of their software code
[00:24:45.579 --> 00:24:46.740]
base.
[00:24:46.740 --> 00:24:52.180]
And the metrics that I get out of that tell me where the tight coupling is in their software,
[00:24:52.180 --> 00:24:55.339]
where the bugs are most likely to be, and so on and so forth.
[00:24:55.339 --> 00:24:57.500]
I don't even have to look at their code.
[00:24:57.500 --> 00:24:59.940]
I can just pull some metrics.
[00:24:59.940 --> 00:25:03.580]
And I analyze the metrics and then that points me to the areas of their code that I should
[00:25:03.580 --> 00:25:05.940]
actually take a look at.
[00:25:05.940 --> 00:25:10.420]
You can have this automatically done for you using a CI/CD pipeline.
[00:25:10.420 --> 00:25:15.100]
And it will improve the overall quality of your software.
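To give a feel for the kind of metric a pipeline can pull without anyone reading the code, here is a deliberately crude sketch that estimates cyclomatic complexity for a snippet of C by counting decision points. This is an approximation for illustration only: it ignores string literals and the preprocessor, and a real metrics tool would parse the code properly.

```python
import re

# Decision points counted by this rough estimate.
_DECISIONS = re.compile(r"\b(?:if|for|while|case)\b|&&|\|\||\?")


def strip_comments(c_source: str) -> str:
    """Remove /* ... */ and // comments so keywords inside them are not counted."""
    no_block = re.sub(r"/\*.*?\*/", "", c_source, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", no_block)


def approx_complexity(c_source: str) -> int:
    """Return 1 plus the number of decision points found in the source text."""
    return 1 + len(_DECISIONS.findall(strip_comments(c_source)))


SAMPLE = """
int clamp(int x, int lo, int hi) {
    if (x < lo) return lo;   // if below range
    if (x > hi) return hi;
    return x;
}
"""

if __name__ == "__main__":
    print("approximate complexity:", approx_complexity(SAMPLE))
```

Run per function across a code base, numbers like this point to the functions that deserve a closer look first.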
[00:25:15.100 --> 00:25:18.580]
Another thing, regression testing.
[00:25:18.580 --> 00:25:23.180]
I can't tell you how many times again where I've added a new feature to the product, only
[00:25:23.180 --> 00:25:27.820]
to discover I broke a different existing feature.
[00:25:27.819 --> 00:25:32.419]
It's a little bit more in the past than more recently, but it still happens.
[00:25:32.419 --> 00:25:36.819]
But if you have regression tests that run automatically in your pipeline, it'll catch
[00:25:36.819 --> 00:25:37.819]
that.
[00:25:37.819 --> 00:25:38.819]
All right.
[00:25:38.819 --> 00:25:42.539]
And then you can catch it right away, say, oh man, what happened?
[00:25:42.539 --> 00:25:44.099]
Oh yeah, this is what's going on here.
[00:25:44.099 --> 00:25:46.899]
Let me fix that real quick. Game over.
[00:25:46.899 --> 00:25:49.299]
Versus someone finding the bug three months from now.
[00:25:49.299 --> 00:25:52.419]
And you have no clue what just changed in that code.
[00:25:52.419 --> 00:25:53.779]
What might have caused that bug?
[00:25:53.779 --> 00:25:54.779]
What was added?
[00:25:54.779 --> 00:25:55.779]
What was removed?
[00:25:55.779 --> 00:25:56.779]
I don't know.
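A minimal sketch of the regression idea: keep a baseline of known-good outputs per feature and have the pipeline compare every new build against it. The feature names and values below are invented for illustration; in a real project the outputs would come from running the test suite on the host, in simulation, or on hardware.

```python
from typing import Dict, List

_MISSING = object()  # sentinel for a test case that no longer produces output


def find_regressions(baseline: Dict[str, object], current: Dict[str, object]) -> List[str]:
    """Return the names of baseline cases whose output changed or disappeared."""
    return sorted(
        name
        for name, expected in baseline.items()
        if current.get(name, _MISSING) != expected
    )


if __name__ == "__main__":
    # Hypothetical results: adding a new feature changed an existing one.
    baseline = {"adc_scaling": 1023, "uart_echo": "ok"}
    current = {"adc_scaling": 1023, "uart_echo": "timeout", "new_feature": "ok"}
    broken = find_regressions(baseline, current)
    if broken:
        print("Regressions detected:", broken)
```

Because the check runs on every integration, the list of suspects is the handful of changes since the last green run, not three months of history.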
[00:25:56.819 --> 00:26:02.779]
[unintelligible]
[00:26:22.539 --> 00:26:26.700]
You know, ELF files, binaries, whatever it is you're producing, whatever the artifact output is.
[00:26:26.700 --> 00:26:29.819]
We want to be able to store that somewhere in a safe and traceable manner
[00:26:30.460 --> 00:26:35.740]
It might not be that in every case you need to have really strict tracing requirements or traceability
[00:26:36.139 --> 00:26:40.460]
to be able to prove what code and what version of software was being run on a system, okay?
[00:26:40.740 --> 00:26:42.579]
not every
[00:26:42.579 --> 00:26:44.579]
Embedded system is heavily regulated
[00:26:44.659 --> 00:26:45.619]
but
[00:26:45.619 --> 00:26:47.619]
It's still useful for yourself internally
[00:26:48.179 --> 00:26:53.619]
To be able to manage those artifacts in a way that you can easily track and get back to a specific state if you need to
[00:26:54.019 --> 00:26:56.019]
There's been times where I delivered
[00:26:56.019 --> 00:27:01.659]
code to a customer, and then they came back four months later asking for a specific version,
[00:27:01.659 --> 00:27:06.779]
because there was some feature that we had added and they wanted to go back and not have that feature for some other customer.
[00:27:07.299 --> 00:27:09.699]
If I didn't have good artifact management, I
[00:27:10.419 --> 00:27:16.500]
wouldn't be able to go back and give them the right code, right? So artifact management, I think, is really important.
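One simple way to make an artifact traceable is to store a small manifest next to it that records the version, the source commit, and a hash of the binary. The sketch below assumes a JSON manifest with field names chosen for illustration; the artifact name, version, and commit shown are placeholders.

```python
import hashlib
import json


def build_manifest(artifact: bytes, name: str, version: str, commit: str) -> str:
    """Return a JSON manifest tying an artifact's SHA-256 to its version and commit."""
    record = {
        "artifact": name,
        "version": version,
        "commit": commit,
        "sha256": hashlib.sha256(artifact).hexdigest(),
    }
    return json.dumps(record, indent=2, sort_keys=True)


def matches_manifest(artifact: bytes, manifest_json: str) -> bool:
    """Check that a binary on hand is the one the manifest describes."""
    expected = json.loads(manifest_json)["sha256"]
    return hashlib.sha256(artifact).hexdigest() == expected


if __name__ == "__main__":
    firmware = b"\x7fELF placeholder bytes"  # stand-in for a real build output
    manifest = build_manifest(firmware, "app.elf", "1.4.0", "abc1234")
    print(manifest)
    print("verified:", matches_manifest(firmware, manifest))
```

When a customer asks for a specific version months later, the manifest says exactly which commit to rebuild from and lets you confirm the stored binary has not changed.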
[00:27:17.299 --> 00:27:21.539]
And then the piece that I think is pretty optional is deployment automation
[00:27:22.259 --> 00:27:26.140]
We don't necessarily need to deploy automatically to our end customers
[00:27:26.140 --> 00:27:31.859]
I would argue that a highly... I shouldn't even say highly sophisticated. I would argue that a
[00:27:32.579 --> 00:27:36.700]
well-developed and defined DevOps process
[00:27:37.619 --> 00:27:44.460]
would at least deploy so that you can automatically test your software on the hardware, okay?
[00:27:45.460 --> 00:27:52.740]
At a minimum. From there, if it goes to the customer, cool. You know, at least have it tested, go through a release process, and then
[00:27:52.740 --> 00:27:57.819]
in some way it finds its way to the end customer. But have some type of deployment automation built into your pipeline.
[00:27:58.779 --> 00:28:04.299]
Okay, now this of course creates CI and CD options for us.
[00:28:05.660 --> 00:28:07.660]
From there
[00:28:07.660 --> 00:28:13.340]
You know, you just have to decide what you need in your CI/CD pipelines and in your DevOps processes,
[00:28:13.579 --> 00:28:16.059]
and actually go through and implement them, okay?
[00:28:17.099 --> 00:28:19.099]
Now for today
[00:28:19.259 --> 00:28:25.819]
That's about as deep into DevOps as I'm going to get. I hope this at least gives you some ideas of what you can do in this third step
[00:28:26.220 --> 00:28:28.699]
as you try to modernize the way that you develop your firmware.
[00:28:29.339 --> 00:28:32.220]
Now, again, with some of this stuff we're looking at it from a high level.
[00:28:33.179 --> 00:28:35.179]
But if you need help
[00:28:35.179 --> 00:28:39.419]
implementing it or getting ideas or actually getting the implementation done,
[00:28:39.820 --> 00:28:43.500]
feel free to reach out to me at jgbap.com. I do a lot of consulting and training.
[00:28:44.220 --> 00:28:47.660]
I even do design reviews, so if you design something but you're not sure if it will work,
[00:28:48.380 --> 00:28:50.380]
you know, give me a call. I'm happy to help you.
[00:28:50.860 --> 00:28:55.740]
You know, it's what I'm here for. I'm trying to help you develop faster, smarter firmware
[00:28:56.220 --> 00:29:03.259]
And the way that I see that happening for a lot of teams today in the industry is through modernizing the way that they develop their embedded systems
[00:29:03.740 --> 00:29:09.019]
So all right, thanks for your time and attention today, and look forward to next time.
[00:29:09.820 --> 00:29:15.660]
We are going to be talking in a little bit more detail about CI/CD. I'll have a special guest
[00:29:16.460 --> 00:29:19.900]
who will talk to us about their experiences so that you can get
[00:29:20.460 --> 00:29:26.300]
some additional ideas outside my own experiences of developing embedded systems using DevOps.
[00:29:26.940 --> 00:29:28.940]
Until then