Streaming Audio: Apache Kafka® & Real-Time Data

Automating DevOps for Apache Kafka and Confluent ft. Pere Urbón-Bayes

October 07, 2021 Confluent, original creators of Apache Kafka® Season 1 Episode 180

Autonomy is key to building a sustainable and motivated team, and this core principle also applies to DevOps. Building self-serve Apache Kafka® and Confluent Platform deployments requires a streamlined process with unrestricted tools: a centralized processing tool that allows teams in large or mid-sized organizations to automate infrastructure changes while ensuring shared standards are met. With more than 15 years of engineering and technology consulting experience, Pere Urbón-Bayes (Senior Solution Architect, Professional Services, Confluent) built an open source solution, JulieOps, to enable a self-serve Kafka platform as a service with data governance.

JulieOps is one of the first solutions available to realize self-service for Kafka and Confluent with automation. Development, operations, and security teams often face hurdles when deploying Kafka. How can a user request the topics that they need for their applications? How can the operations team ensure compliance and role-based access controls? How can schemas be standardized and structured across environments? Manual processes can be cumbersome with long cycle times. Automation reduces unnecessary interactions and shortens processing time, enabling teams to be more agile and autonomous in solving problems at a localized team level.

Similar to Terraform, JulieOps is declarative. It's a centralized agent that uses the GitOps philosophy, focusing on a developer-centric experience with tools that developers are already familiar with, to provide abstractions for each product persona. All changes are documented and approved within the change management process to streamline deployments with timely and effective audits, as well as to ensure security and compliance across environments.

The implementation of a central software agent, such as JulieOps, helps you automate the management of topics, configuration, access controls, Confluent Schema Registry, and more within Kafka. It's multi-tenant out of the box and supports on-premises clusters and the cloud with CI/CD practices.

Tim and Pere also discuss the steps necessary to build a self-service Kafka with an automatic Jenkins process that will empower development teams to be autonomous.


Tim Berglund:
Large organizations, and even not-so-large ones, need to automate infrastructure change. That's kind of a fact of life. On the other hand, you don't want those tools to be too restrictive on development teams. You'd like those teams to have as much autonomy as possible because that's usually when they get their best work done. I talked today to my colleague Pere Urbon-Bayes about a pretty cool open source tool he's built and some community he's built around the tool that tries to find the sweet spot between these two extremes.

Tim Berglund:
Before we get to that conversation, a word from our sponsor, and that sponsor is Confluent Developer. That's developer.confluent.io, a website that we think has everything you'll need to get started using Kafka and Confluent. There are video tutorials and executable tutorials where you actually type code out and it works and it runs. You can get started using Confluent Cloud. There's a library of event-driven architecture patterns, kind of a new thing that people have found super useful. You really should check it out. And if you do any of those exercises, tutorials, or examples and sign up on Confluent Cloud, you can use the code PODCAST100 to get an extra $100 of free credit as you get started. Now let's get to the show.

Tim Berglund:
Hello and welcome to another episode of Streaming Audio. I'm your host, Tim Berglund. And I'm joined here today by my friend and coworker Pere Urbon-Bayes. Pere is a solutions architect here at Confluent and lives in Berlin, Germany. Pere, welcome to the show.

Pere Urbon-Bayes:
Thank you, Tim. It's a pleasure to be here with you today.

Tim Berglund:
Good to have you here. Now, you're originally from Barcelona, right?

Pere Urbon-Bayes:
Yep. Born and raised in Barcelona. Actually in a small town nearby, but nobody knows. So, I was born and raised in Barcelona.

Tim Berglund:
I say I live in Denver because nobody really knows where Littleton is. Nobody really needs to, you know, it's okay unless you're from here. But awesome, I've been to Barcelona twice, I think. And that needs to not be the lifetime maximum for me. There definitely needs to be more exposure to Barcelona.

Pere Urbon-Bayes:
It's a beautiful city and everybody asks me, what the hell are you doing in Berlin when you are from Barcelona?

Tim Berglund:
I didn't want to say that like on-air, you know, I didn't want to throw shade at Berlin or ask you an embarrassing question. It does kind of seem like you could live in Barcelona and you don't, but let's assume that there are good reasons for it.

Pere Urbon-Bayes:
Yeah, good reasons.

Tim Berglund:
Let them be your reasons and yours alone.

Pere Urbon-Bayes:
Sorry for this broken sense of humor, Tim.

Tim Berglund:
Oh no, it's good. Let's get into what we want to talk about today, which, I mean, I think your thoughts on the subject are really subtle. When I kind of hear you talk, I hear you talking about deployment automation and things, which is a very tactical, kind of simple approach. And I think your goal is really something more subtle than that. So tell me about, you're a solution architect, which means... Actually, before we get into the meat, what do you do? Make sure everybody knows what a solution architect is.

Pere Urbon-Bayes:
What do I do? That's a very good question. I work mostly with customers of Confluent, but before that, I was a freelancer and a consultant. So I used to work for my own customers, and my line of work is to help people deploy Kafka and write applications and make sure that what they are building is successful. And I try to do this not with one specific technology or one technology process in mind, but by bringing agility to the process. So making sure that the organization is agile and is able to adapt and respond quickly to change, because change is what happens all the time in software processes.

Tim Berglund:
Cool. So I want to frame the problem, and I think I understand the problem you're trying to solve. So in your work with customers as an independent and now as a solution architect, helping people who buy Confluent Cloud or Confluent Platform, helping them build stuff with it and figure out how to use it, what do you see happen in larger organizations when individual development teams have to go through a lot of processes to make things happen? Because anytime I talk to a big customer, there's like, well, if you want a topic, you go fill out a form. And you don't use the producer API, you use our wrapper and it's got all this stuff, and there's a lot of control. It seems there are reasons for that. I mean, I wouldn't want that job, but it happens. But how do you see that play out? That idea of where decisions get made? Do they get made in a centralized place, do they get made as a team? What goes well there and what doesn't go well there?

Pere Urbon-Bayes:
There is always a bias toward keeping decisions under control. However, the way I try to encourage people to work and the way we try to encourage projects to be deployed and organized is with the idea of autonomy in mind. As a team, as a developer, what do I need and how do I need it? Do I need a topic? Do I need a configuration? Do I need to deploy a connector or run a ksql query? I can solve the problems myself, because if I can solve the problem for myself, I don't have to wait for more cycles before someone else is able to solve the problem for me. And in that way, teams, especially agile teams, are more autonomous in the sense that they can control what they can solve and they know their project.

Pere Urbon-Bayes:
So this plays out in different technology implementations. However, the most important thing, the core functionality of this, is to allow the teams to solve their own problems. That does not mean no control. It means you are able to submit a request for resources, aka topics and configurations, as I said, described in an artifact that you can create and deploy by yourself. And then there is an approval process that can be more strict or less strict in order to give you a green light, go ahead, or no, you have to make changes because what you are requesting doesn't fulfill the policy.

Tim Berglund:
Got it, got it. This kind of reminds me of a couple of things. First is, this is an old idea now, but the agile, vertical slice idea of a team that isn't some sort of infrastructure team or data team or front-end team, but they've got all those pieces and they deliver a slice of application vertically through the architectural layer cake that does something valuable that a customer or user can see. It sounds like you're talking about that. It also reminds me of... I never know what order episodes are going to air in. So we don't always have that totally planned when we're recording. But last week, so it's June 22nd, 2021, when you and I are talking right now, last week, I talked to Zhamak Dehghani, who's the thinker behind data mesh. And she was kind of explaining that same sort of thing. Like this stuff needs to be localized to a team, and it's not a new idea. I mean, we've been saying this for a long time, as long as there's been agile-

Pere Urbon-Bayes:
Tim, so the ideas are changing, but you are sticking with the principles from a long time ago.

Tim Berglund:
And I feel like what you're doing here is you're taking those and you're kind of calling us to say, "Hey, let's do those. Let's be faithful to those principles that we kind of all agree on."

Tim Berglund:
So anyway, with the principles and the ideas on the table, walk us through what you've seen built, what you've encouraged people to build. And I kind of feel like we're talking specifically about Confluent and Kafka and Confluent Cloud, Confluent Platform infrastructure. People building stuff out of Kafka in whatever form, the tension between central control and local autonomy. Flesh that out, talk to the developer or the architect in the big organization to give him or her an idea of how this works.

Pere Urbon-Bayes:
So I think that's a very interesting question, Tim. And it's actually one conversation that I have with every other customer I work with. The most important thing is you will get a central team, a central platform team that manages Kafka or Confluent Platform in general. And they are responsible for providing a service. How do you request such a service? You can have different interfaces. It's very common, and I've seen it over the years, that people send an email: please give me a topic with this configuration. And then a poor human has to not forget the email and do the right thing. You know, usually nothing would happen.

Tim Berglund:
If I was on the other side of that email, doing the admin, that development effort would grind to a halt. I think, as my close coworkers know, Ale, Victoria, this message is for you. Anyway, go on.

Pere Urbon-Bayes:
But it happens, I've seen it. Other processes, other traditions that happen in such cases: please open a generic ticket. Someone creates a generic ticket, it has to go through planning, then you solve your problem two weeks later. That's not agile. And in the end, this kind of operation is very simple. Creating a topic is less than a minute's task, but doing it in an orderly way, like what is the structure of the namespace, what is the resulting configuration, how many topics or how many partitions can you have? All these are nuances that can be, for example, standardized by a platform team. You can only, oops, sorry. I apologize for this. Sorry.

Tim Berglund:
It is. It is the background sound of podcasts. It's either Slack or a dog or dishes. Dishes is the big one for me, because that sound comes through the microphone. These things happen. We live in houses. We have computers. Don't worry at all.

Pere Urbon-Bayes:
But what I was trying to mention is you're going to have different interfaces. But then these processes are something boring. No operator likes to do this. It's not exciting. You prefer to be focused on resiliency planning, on chaos testing. That's more fun as an operator, and it gives the company more value than you just creating topics.

Tim Berglund:
That's a good point. So like, kind of from my standpoint, if you're an operator who likes to give conference talks, which one are you going to get a talk out of? I created a topic or here's my next-gen resiliency plan. Okay.

Pere Urbon-Bayes:
And as a developer, as an application developer, I need something, and I know what I need. I need the topic, and maybe I don't need to go into a specific configuration, so I can have a label. I can say, I need the topic category gold or I need the topic category silver. And you know the applications you are deploying too. You know you have a Kafka Streams application that reads from these topics and writes to these other topics.
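To make the label idea concrete, here is a minimal Python sketch of how a platform team might map a category such as gold or silver onto a concrete topic configuration. Everything in it is an assumption for illustration: the `PLANS` catalogue, its values, and the `resolve_topic` helper are hypothetical and are not taken from JulieOps or from the episode.

```python
from typing import Optional

# Hypothetical catalogue maintained by the platform team.
# The category names come from the conversation; the values are invented.
PLANS = {
    "gold": {"partitions": 12, "replication_factor": 3, "retention_ms": 604_800_000},
    "silver": {"partitions": 6, "replication_factor": 3, "retention_ms": 86_400_000},
}


def resolve_topic(name: str, plan: str, overrides: Optional[dict] = None) -> dict:
    """Expand a plan label into a full topic definition.

    Developers only name the plan; the platform team owns what it means.
    """
    if plan not in PLANS:
        raise ValueError(f"unknown plan {plan!r}; expected one of {sorted(PLANS)}")
    config = dict(PLANS[plan])       # copy so the catalogue is never mutated
    config.update(overrides or {})   # optional per-topic adjustments
    return {"name": name, "config": config}
```

A developer would then ask for `resolve_topic("orders", "gold")` without needing to know the settings behind the label.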

Pere Urbon-Bayes:
All this is in your experience and knowledge. So you can change a file or you can submit a request through a webpage, yeah? That tells the operator or the deployment steward: this is what I need, is this okay? In most organizations, at the very beginning of these processes, this was done by a human. So the human gets the confidence to learn and to know what they can deploy, et cetera. But in the mid to long term, solutions like the one that I created, JulieOps, or others that are now available for doing this job, allow you to have validations. So you as a human don't have to validate everything, but you can say, I'd like all my topics' names to be camel case, just to invent something. Or I'd like partitions to always be a minimum of three.

Tim Berglund:
Sure. Or always prime numbers or something.
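The kind of automated validation Pere describes could look roughly like the following Python sketch. It checks the two example rules from the conversation, camel-case names and at least three partitions. The `TopicRequest` type, the regular expression, and the `validate` function are hypothetical names chosen for this sketch; this is not how JulieOps itself implements validations.

```python
import re
from dataclasses import dataclass

# Assumed policy, taken from the two examples in the conversation.
CAMEL_CASE = re.compile(r"^[a-z]+(?:[A-Z][a-z0-9]*)*$")
MIN_PARTITIONS = 3


@dataclass(frozen=True)
class TopicRequest:
    name: str
    partitions: int


def validate(request: TopicRequest) -> list:
    """Return a list of policy violations; an empty list means the request passes."""
    errors = []
    if not CAMEL_CASE.match(request.name):
        errors.append(f"topic name {request.name!r} is not camel case")
    if request.partitions < MIN_PARTITIONS:
        errors.append(
            f"topic {request.name!r} has {request.partitions} partitions; "
            f"minimum is {MIN_PARTITIONS}"
        )
    return errors
```

Returning all violations at once, rather than failing on the first, lets the requester fix a change in a single round trip.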

Pere Urbon-Bayes:
Yeah, and then as soon as you know you can do requests, you have abstractions that serve you. The request goes to the deployment steward or the operator, and he or she says, all right, deploy. Then this is just the magic of an admin agent, a custom piece of software that is able to read what is in the target cluster and what you want, calculate the difference, and apply the changes. As an operator, I have one less admin task to bother with. I can focus on the really good things: scalability, resiliency, and all the things that bring joy and fun to work. As a developer, I solve my own problems. So I've done [inaudible 00:13:43] and I have joy and fun at work. There is one book that I want to mention that I really, really liked. It's by Daniel H. Pink and it's called Drive. It defines the three things that motivate people: autonomy, mastery, and purpose. So we have mastery. The purpose comes from the project, and the autonomy is brought by tools like what we are talking about today.

Tim Berglund:
Yes. Tell us about JulieOps a little bit. And by the way, we should get a link to Drive in the show notes. It's a great book, I can't recommend it enough, just a set of ideas you ought to have in your head and think about. And also clearly we'll have a link to JulieOps. So tell us about that.

Pere Urbon-Bayes:
Sorry. I had to mention the book because I'm a book nerd.

Tim Berglund:
Hey, no. We don't mention enough books on this podcast, in my opinion. So I'm glad that came up.

Pere Urbon-Bayes:
JulieOps is an open source project, and we will have the link on the podcast for sure. It started from our work in the PS team with customers. It started very slowly with me here in Germany, but we have many contributors internally from Confluent and external to Confluent. And it's a tool that allows you to describe, as YAML, what the world looks like with your topics, your configurations, et cetera. And it then applies the changes into Kafka, but not only Kafka. It does topics, configurations, [inaudible 00:15:20], role-based access control, and it does Schema Registry. It does connectors and ksql requests.

Tim Berglund:
So it's integrated at the library level with admin client and so forth.

Pere Urbon-Bayes:
Yes, admin client, custom HTTP client, and it has the idea of state. It's like, do you know Terraform, Tim?

Tim Berglund:
Yeah.

Pere Urbon-Bayes:
It could be like a custom Terraform to do these things.

Tim Berglund:
Got it. It's not Terraform, but for people who don't know Terraform, explain what you mean by that.

Pere Urbon-Bayes:
Yeah, Terraform is a tool to deploy or request machines in a cloud environment, and not only cloud but VMware as well, and to say, I want the machine that way, please give it to me. I don't know how you do it, but please give it to me.

Tim Berglund:
Right.

Pere Urbon-Bayes:
And it can do what it needs, an update for example, to take the configuration and make it happen.

Tim Berglund:
You have a declarative rather than an imperative DSL.

Pere Urbon-Bayes:
Sorry. I forgot about this. It's like Terraform, that's declarative. JulieOps is as well, declarative. You explain what you want, and it's the responsibility of the tool to decide how to get it.

Tim Berglund:
And does JulieOps then go look at the cluster and see what it currently is and make deltas or...

Pere Urbon-Bayes:
Yeah, that's correct.

Tim Berglund:
Just like Terraform.

Pere Urbon-Bayes:
It's like Terraform, but we started like this, it could as well have been a Terraform provider.
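The "look at the cluster, compare, make deltas" step that Tim and Pere agree on can be sketched in a few lines of Python. This is only an illustration of the declarative idea under stated assumptions: topics are modeled as plain dictionaries keyed by name, and `Plan` and `plan_changes` are hypothetical names, not JulieOps or Terraform APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    create: dict = field(default_factory=dict)   # topics missing from the cluster
    update: dict = field(default_factory=dict)   # topics whose config differs
    delete: list = field(default_factory=list)   # topics no longer described


def plan_changes(desired: dict, current: dict) -> Plan:
    """Compare the described state with the observed state and return the delta."""
    plan = Plan()
    for name, config in desired.items():
        if name not in current:
            plan.create[name] = config
        elif current[name] != config:
            plan.update[name] = config
    plan.delete = sorted(name for name in current if name not in desired)
    return plan
```

Whether the `delete` list is ever acted on is a policy decision; many teams would want removals to be opt-in rather than automatic.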

Tim Berglund:
Nice. Okay. So what's the interface like? The interface then is a DSL?

Pere Urbon-Bayes:
A YAML file. You write a YAML file, and it's a CLI. It's usually deployed with a Jenkins process. So from Git, you get the hook to Jenkins, and Jenkins gets the YAML from Git and applies the changes, but it has also been deployed with OpenShift, with an OpenShift operator. In the end, it's basically a tool that was born to help. It started with one customer, started growing, and nowadays it's used in quite a lot of places.

Tim Berglund:
Well, YAML file. Bruno Borges, call your office. You know Bruno?

Pere Urbon-Bayes:
I know Bruno.

Tim Berglund:
He's a huge supporter and advocate of YAML and is always speaking out about how more things should be done in YAML. And I think we, in the community, we all appreciate his strong advocacy in favor of YAML.

Pere Urbon-Bayes:
I have to pick one [crosstalk 00:17:48] It's everywhere as well.

Tim Berglund:
It's not like it's controversial, especially if you're trying to live in the Kubernetes world, then that's the format that everybody's doing.

Pere Urbon-Bayes:
HCL, for example, that's the one that Terraform uses, but YAML was the easiest one to pick up, and especially for people who are not that into some things, you know, it's more standard. I know what you mean, Tim. It's not always complex.

Tim Berglund:
You know, I hate looking at JSON. I mean, we all hate looking at XML. You could do it as some custom Ruby DSL, nobody's ever tried that with deployment automation tools before that's gone well, so.

Pere Urbon-Bayes:
It was doing a very good job.

Tim Berglund:
It actually did fill an important role, but having an imperative language in there, it's just, Pere, we're not capable of resisting that kind of temptation. You know, you just don't do that. You're trying to quit smoking and you keep a carton of cigarettes on your desk. You're going to smoke them, you know, it's going to happen. And if you have an imperative language in your DSL, you're going to do things that you shouldn't.

Pere Urbon-Bayes:
Like migrations. You can approach the same thing with migrations or with a declarative language. I prefer declarative because, as a developer, the biggest reason why you even start with this is giving autonomy to developers.

Tim Berglund:
Right.

Pere Urbon-Bayes:
And as a developer I don't care how it is implemented underneath, I care about what I want. And I'm a former Ruby developer. I did a lot of Rails migrations in my past and didn't grow to like them at all because of what can go wrong when you get 200 or even 500 migrations in a Ruby on Rails project. So I like a way that's descriptive, so that as a developer I know what I want and how to describe it. And then someone else does the magic.

Tim Berglund:
Yeah, and there is strictly speaking a function that transforms that into the side effects in the world that you want.

Pere Urbon-Bayes:
Into the vendor.

Tim Berglund:
And that's just easier. I think it's a consensus that most of us have converged on now, that that's an easier way to reason about the world, and when it comes to operations, we would like an easier reasoning process. [crosstalk 00:20:14]

Pere Urbon-Bayes:
It's about empathy. About trying to make the life of developers easier.

Tim Berglund:
Yeah, so maybe cap it all off, walk us through a scenario of things working well. So here's JulieOps, that's a potential part of the solution. If you're doing that or if you're not in a big company, what does my life look like as a developer? Am I making infrastructure changes by pull request? Is there a web form? Like we want there to be automation, but how does that... put that all together. Because I see the pieces, but I don't yet see what it feels like to live in that world.

Pere Urbon-Bayes:
The most common way is, as a developer, I have a GitHub repository, or even inside my own GitHub repository for my application, I have a directory where one or multiple files, YAML files, sorry for that, describe the world. They describe the project that you are working on, describe the topics, describe the ACLs, the schemas, et cetera. You'll do a pull request to signal these changes.

Pere Urbon-Bayes:
Then someone in charge will do the review and the approval. This review could be more strict or less strict depending on the environment. Why the pull request is important is because many, many organizations are PCI compliant, so they need the change management process. If you build something like this without change management, it's not a good process. So whatever else it brings them, they need the change management because of auditing, and they need to say, okay, I've deployed this type of action, this person or this process has given their approval, it was at that time, et cetera. And a pull request gives that to you. So the steward approves the pull request, the pull request gets applied, the changes in these files get merged to master, and GitHub or Bitbucket or whatever the Git server is will send a webhook.

Pere Urbon-Bayes:
So it will send a small HTTP request to Jenkins, for example, or any other CI/CD system. The CI/CD system will clone the Git repository, read the file, calculate the delta, and apply the changes in the target systems.
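The deployment step at the end of that flow, with the audit trail Pere says PCI-compliant organizations need, might be sketched as below. This is a hedged illustration in Python, not the actual Jenkins job or JulieOps code: `run_deployment`, `AuditRecord`, and the injected `apply_change` callback are hypothetical, and the desired and current states are assumed to be already parsed into dictionaries.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class AuditRecord:
    commit: str        # the merged commit that triggered the run
    approved_by: str   # who approved the pull request
    applied_at: str    # UTC timestamp in ISO 8601 format
    changes: list      # names of the topics that were actually changed


def run_deployment(
    commit: str,
    approved_by: str,
    desired: dict,
    current: dict,
    apply_change: Callable[[str, dict], None],
) -> AuditRecord:
    """Apply only the topics that differ, and record who approved what and when."""
    if not approved_by:
        raise PermissionError("change was not approved; refusing to deploy")
    changes = []
    for topic, config in sorted(desired.items()):
        if current.get(topic) != config:   # missing or different in the cluster
            apply_change(topic, config)
            changes.append(topic)
    applied_at = datetime.now(timezone.utc).isoformat()
    return AuditRecord(commit, approved_by, applied_at, changes)
```

Passing `apply_change` in as a parameter keeps the sketch free of any real cluster client and makes the step easy to exercise in a dry run.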

Tim Berglund:
Okay. That seems pretty good. Yeah. So you still have governance, because you need governance for production stuff. And you've also got, that's now an infrastructure as code story. There's a declarative file that's controlled in the git repository and automation to make it happen. It doesn't seem that hard. And you've helped customers do this and see it work. Nice. Nice.

Pere Urbon-Bayes:
And it's kind of interesting when you start an open source project like this one and suddenly you find that someone from Australia is actually doing pull requests. Like, where are you coming from? Thank you!

Tim Berglund:
That's a nice feeling, isn't it? Of course, now you have to work on the pull request and it's not free, but it's nice that people are using it and contributing.

Pere Urbon-Bayes:
I've got a couple of people that don't work for Confluent, from Norway, yeah? They are really, really good people. They are deploying JulieOps with one of their customers. They are very happy doing multiple pull requests. And the only shame I have is that I don't have enough time to help them. But it's just an amazing feeling to see our community growing and to see people finding, basically, the tool helpful.

Tim Berglund:
You're no stranger to open source, so not at all surprising that this would be a success.

Pere Urbon-Bayes:
Not at all. But I like this, the possibility to help.

Tim Berglund:
My guest today has been Pere Urbon-Bayes. Pere, thanks for being a part of Streaming Audio.

Pere Urbon-Bayes:
Thank you for making it possible, Tim. I appreciate it.

Tim Berglund:
And there you have it. Thanks for listening to this episode. Now, some important details before you go. Streaming Audio is brought to you by Confluent Developer. That's developer.confluent.io, a website dedicated to helping you learn Kafka, Confluent, and everything in the broader event streaming ecosystem. We've got free video courses, a library of event-driven architecture and design patterns, executable tutorials covering ksqlDB, Kafka Streams, and core Kafka APIs. There's even an index of episodes of this podcast. So if you take a course on Confluent Developer, you'll have the chance to use Confluent Cloud. When you sign up, use the code PODCAST100 to get an extra $100 of free Confluent Cloud usage. Anyway, as always, I hope this podcast was helpful to you.

Tim Berglund:
If you want to discuss it or ask a question, you can always reach out to me @tlberglund on Twitter. That's T-L-B-E-R-G-L-U-N-D. Or you can leave a comment on the YouTube video if you're watching and not just listening, or reach out in our community Slack or forum. Both are linked in the show notes. And while you're at it, please subscribe to our YouTube channel and to this podcast, wherever fine podcasts are sold. And if you subscribe through Apple Podcasts, be sure to leave us a review there. That helps other people discover us, which we think is a good thing. So thanks for your support and we'll see you next time.