Streaming Audio: Apache Kafka® & Real-Time Data

Confluent Platform 7.0: New Features + Updates

November 09, 2021 · Season 1, Episode 185 · Confluent, original creators of Apache Kafka®

Confluent Platform 7.0 has launched and includes Apache Kafka® 3.0, plus new features introduced by KIP-630 (Kafka Raft Snapshot), KIP-745 (Connect API to restart connector and tasks), and KIP-695 (further improved Kafka Streams timestamp synchronization). Reporting from Dubai, Tim Berglund (Senior Director, Developer Advocacy, Confluent) provides a summary of new features, updates, and improvements in the 7.0 release, including the ability to create a real-time bridge from on-premises environments to the cloud with Cluster Linking.
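
To make KIP-745 concrete: the Connect REST API's restart endpoint now accepts includeTasks and onlyFailed query parameters, so one call can restart a connector along with just its failed tasks. Below is a minimal sketch in Java, assuming a Connect worker at localhost:8083 and a hypothetical connector named jdbc-sink.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestartConnector {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();

        // KIP-745: restart the connector and only its failed tasks in a single call.
        // Worker address and connector name are placeholders for this sketch.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors/jdbc-sink/restart"
                        + "?includeTasks=true&onlyFailed=true"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();

        HttpResponse<String> response =
                http.send(request, HttpResponse.BodyHandlers.ofString());
        // A 2xx response means the restart was accepted; the body summarizes
        // the connector and task states that were targeted.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Without the new query parameters, the endpoint behaves as it did before, restarting only the connector instance itself.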

Cluster Linking allows you to create a single cluster link between multiple environments from Confluent Platform to Confluent Cloud, which is available on public clouds like AWS, Google Cloud, and Microsoft Azure, removing the need for numerous point-to-point connections. Consumers reading from a topic in one environment can read from the same topic in a different environment without risk of reprocessing or missing critical messages. This gives operators the flexibility to make changes to topic replication smoothly, byte for byte, without data loss. Additionally, Cluster Linking eliminates any need to deploy MirrorMaker 2 for replication management while ensuring offsets are preserved.

Furthermore, the release of Confluent for Kubernetes 2.2 allows you to build your own private cloud Kafka service. It completes the declarative API by adding cloud-native management of connectors, schemas, and cluster links to reduce the operational burden and manual processes so that you can instead focus on high-level declarations. Confluent for Kubernetes 2.2 also enhances elastic scaling through the Shrink API.

Building on KIP-500, the ongoing effort to remove ZooKeeper from Apache Kafka, Confluent Platform 7.0 introduces KRaft in preview to make it easier to monitor and scale Kafka clusters to millions of partitions. There are also several ksqlDB enhancements in this release, including foreign-key table joins and support for two new data types, DATE and TIME, to account for time values that aren't TIMESTAMP. This results in consistent data ingestion from the source without having to convert data types.


Tim Berglund:
Hi, I'm Tim Berglund with Confluent, here in the desert of the United Arab Emirates just outside Dubai, to tell you all about Confluent Platform 7.0.

Tim Berglund:
You know, the big theme in 7.0 is hybrid cloud. This is the thing that is becoming more and more common. It's a little bit of an obvious thing to say that people are migrating to the cloud. That's very well known, but according to a recent Forrester survey, about 55% of organizations are doing hybrid cloud deployments. And there's a lot of reasons for this, right?

Tim Berglund:
Your brand new event-driven microservices application, that's a greenfield design. You know, that thing's pretty easy to migrate to the cloud, or you probably built it to run in the cloud in the first place, but there are a lot of legacy applications that you don't just pick up and put in a cloud somewhere. Running those things, they may stay on-prem really for the foreseeable future. So getting the data in your enterprise that lives in those legacy applications connected to the data that's in your newer cloud-native things, that's a real problem. That's something we're trying to address with 7.0. 

Tim Berglund:
Bridging legacy on-prem things and cloud things, and even modernizing parts of those legacy applications and migrating to the cloud, is absolutely a mission-critical activity these days, but it's kind of hard. There are a few things that get in the way. We often find these are situations where point-to-point integration technologies get entrenched. And as we know, those are one point to one point. They tend to proliferate into the classical spaghetti mess of lots of point-to-point integrations. And that's hard to scale. That's hard to maintain. It's just not a pleasant way to live.

Tim Berglund:
There can be batch processes that get data from the on-prem system into the cloud. And that's slow data, you're behind. This is, again, not a good way for these systems to operate, and not how we aspire for our cloud transitions to work out.

Tim Berglund:
So what you're looking for really is something that's, of course, highly available, consistent, secure, and real-time, not an old-school batch integration technology, but something that's gonna let us operate our legacy on-prem things and our cloud-native future as one system that behaves like one system, where the availability of the data is the way it would be if we had built it as one system.

Tim Berglund:
And the big news in 7.0 is the general availability of Cluster Linking. This is a feature that's been in preview. It's GA now, and it's here. So that's going to enable the kinds of things I've been talking about. And also, here's another thing: self-service access to data. And this is just a trend; it's really just a Kafka thing. Once data is in a topic and that topic is accessible, subject to data governance concerns, other developers, other business units, other applications can get at that data in a self-serve way, not something that is overly centrally controlled. And this allows new application functionality, new business units, new ventures really to kind of emerge, growing up around that data. That's the dynamic we want.

Tim Berglund:
And of course, you want that from an operational, from a business perspective, as something that is not so expensive to operate. So we'd like a low TCO and that's something that Cluster Linking and Confluent Platform 7.0 are gonna make a little bit easier. 

Tim Berglund:
So generally available Cluster Linking is the big news here, but we've got other features to talk about. And as usual, they fit into these three buckets: Everywhere, Cloud-Native, and Complete. Let's take those one at a time.

Tim Berglund:
Under Everywhere is Cluster Linking. So let's dive into that in a little bit more detail. I've been talking about bridge to cloud as kind of the primary use case for Cluster Linking, but don't forget also cluster migrations. If I've got data in one cluster and I simply need to move it to another, this is a fine way to do that. This is a much better way to do that, frankly. Because you've got a feature operating at the broker level that's making byte-for-byte copies of messages and topics, it's just a bulletproof way to do migrations.

Tim Berglund:
We've also got source-initiated links now. So you don't have to initiate the link on the destination. Sometimes networking considerations mean it might be harder to poke a hole in a firewall here rather than there, and you don't want to be constrained to only be able to initiate from one side. So we've got source-initiated links to give you options there for networking. And those are always nice to have.

Tim Berglund:
The way this would have worked, of course, in the old days is you'd have your on-prem data and some kind of batch process that's going to do some nightly bulk extract and some transformations, and then load it into the new system. You know, the old-school ETL sort of thing. The way we would rather have things work now, with Cluster Linking, is that data is extracted from systems in real time through Confluent and is available to self-service consumers, that is, new applications that can read that data and do what they want with it; it's simply available there in Confluent for them to consume.

Tim Berglund:
Now, with Confluent Replicator, we had offset preservation. Offset preservation is absolutely essential for disaster recovery use cases, or really any hope of failing over an application, in some sense, from one cluster to another; you have to have offsets preserved. Replicator is based on Kafka Connect, so there's a little bit of extra infrastructure there, and then sort of a post-processing step where offsets are updated.

Tim Berglund:
Here now, with this slightly better way of doing it at the broker level, offsets are natively preserved in that replication process from the beginning, in that byte-for-byte copy of the messages from source to destination topic. So it's a little bit smoother and should be a lot easier to operate. A lot easier to operate, of course, means lower cost to operate. So the business will be happy with this, and if you're the architect designing it or the operator running it, you're gonna be a lot happier with the simplified architecture.
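
To make that concrete, here's a minimal sketch of what preserved offsets buy you: a plain Kafka consumer that fails over to the destination cluster just by changing its bootstrap servers, while keeping the same group id, and picks up from its committed position because the mirrored topic carries the same offsets. The addresses, topic, and group id below are placeholders, not anything specific to Confluent Platform.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class FailoverConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The only change on failover: point at the destination cluster instead of the source.
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "destination-cluster:9092");
        // Same group id as before, so the preserved committed offsets apply to this group.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-service");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // The mirror topic has the same name and byte-for-byte contents, including
            // offsets, so the group resumes where it left off on the source cluster.
            consumer.subscribe(List.of("orders"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            records.forEach(r -> System.out.printf("offset %d: %s%n", r.offset(), r.value()));
        }
    }
}
```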

Tim Berglund:
In the Cloud-Native bucket, we've got updates to Confluent for Kubernetes and updates to KRaft. Let's talk about both of those. Confluent for Kubernetes, of course, was part of the previous Confluent Platform release in the spring, but it now offers API-level support for connectors, schemas, and cluster links. We're making such a big deal about Cluster Linking; of course, that needs to be supported so you can define your cluster links in YAML the way you've always wanted to. And of course, now this can be deployed in an automatic way in your Kubernetes cluster if you're using Confluent for Kubernetes. And this is certified to run on VMware Tanzu and Red Hat OpenShift. It should work in any CNCF-conformant Kubernetes distribution, but Tanzu and OpenShift are certified, it's labeled on the tin, so those should work great.

Tim Berglund:
KRaft is kind of the emerging name for the net effect of the work going on under KIP-500. KIP-500 is a massive effort in Apache Kafka to remove ZooKeeper. So ZooKeeper, if you don't know, is the separate distributed system that maintains the metadata, you know, what topics consist of what partitions, where the partitions are located, where the leader replica is, all that kind of stuff, and more. That's traditionally been stored in ZooKeeper. No one loves operating ZooKeeper. It's served faithfully and been a wonderful part of the system, but it's a separate distributed system that you have to run. And Kafka is pulling that functionality into itself with its own Raft implementation, which is called KRaft.

Tim Berglund:
So you've got Apache Kafka 3.0 functionality, the current state of KRaft as of AK 3.0, now in Confluent Platform 7.0. This is still a developer preview feature. If you want to know more about it, you should check out the Kafka 2.8 and Kafka 3.0 release videos that are out there on YouTube; they should not be hard to find. I dig into the current state of those things a little bit more in those videos. It's developer preview, not ready for production yet, but if you've got the bandwidth, you really should be looking at this because it's gonna be an important part of how you operate Kafka. It will be a substantial improvement to the way you operate Kafka. So check it out. You'll be able to scale to larger numbers of partitions, you'll be able to fail over nodes faster, all kinds of great things will be going on. So definitely something to check out.

Tim Berglund:
Now under the Complete bucket, we've got reduced infrastructure mode for Control Center, and, as always, updates to ksqlDB. In the previous release of Confluent Platform, we introduced Health+. That's a feature that allows your on-prem cluster to send metrics securely to Confluent in the cloud, where we can do monitoring and intelligent alerting and all kinds of things based on our experience operating Confluent Cloud. We're sort of bringing that to you now as a feature of Confluent Platform: the monitoring and alerting we have to do to operate our cloud service, you get to kind of participate in the technology behind that in your cluster.

Tim Berglund:
Well, reduced infrastructure mode for C3 allows you to disable monitoring in Control Center and offload it to a cloud-based service in Health+. So if you want to do your monitoring without operating local infrastructure to support it, you're now able to do so. That should be an improvement to your life. A little bit of infrastructure you don't have to manage. So if you're running your cluster on-prem, then monitoring can become a cloud service. 

Tim Berglund:
And in ksqlDB, we've got 0.21. Remember, ksqlDB operates as an independent project. It's got its own website, ksqldb.io, something that I strongly recommend you check out. You should look at release 0.21 because that's what you've got in Confluent Platform 7.0. So a couple of high points here. There are type system updates; there are always type system updates. You've got improvements in the date and time types, which are very important types in any SQL. The other one, and this one kind of snuck up on me, is foreign-key joins. It's funny, 'cause the joins that ksqlDB's been doing have always been primary-key joins, 1:1 joins. Now I can do a 1:N join on a foreign key. So that should be a powerful thing and should unlock some new use cases and some new capabilities for you to check out.
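
For example, here's a minimal sketch using the ksqlDB Java client to create a foreign-key joined table. The server address, the ORDERS and CUSTOMERS tables, and their columns are hypothetical, purely to show the shape of a 1:N join on a non-key column (and a column using the newer DATE type).

```java
import io.confluent.ksql.api.client.Client;
import io.confluent.ksql.api.client.ClientOptions;

public class ForeignKeyJoinExample {
    public static void main(String[] args) throws Exception {
        // Assumes a ksqlDB server reachable at localhost:8088.
        ClientOptions options = ClientOptions.create()
                .setHost("localhost")
                .setPort(8088);
        Client client = Client.create(options);

        // ORDERS.CUSTOMER_ID is a foreign key into CUSTOMERS (primary key ID),
        // so this is a 1:N foreign-key table join rather than a primary-key join.
        // ORDER_DATE is assumed to be a column of the new DATE type.
        String foreignKeyJoin =
              "CREATE TABLE ORDERS_ENRICHED AS "
            + "  SELECT O.ORDER_ID, O.ORDER_DATE, C.NAME "
            + "  FROM ORDERS O "
            + "  JOIN CUSTOMERS C ON O.CUSTOMER_ID = C.ID;";

        client.executeStatement(foreignKeyJoin).get();
        client.close();
    }
}
```

The join condition is on ORDERS.CUSTOMER_ID, which is not the primary key of ORDERS; that's exactly the case the earlier 1:1 primary-key table joins couldn't express.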

Tim Berglund:
There's more to say about ksqlDB than that. Again, I want to direct you to that website and the 0.21 release if you want to find more details. So with that, stay in touch. These are the key links that you need. If you have questions, of course, if you're a customer you've got existing support mechanisms. If you're not, or if you just want to know more, there's forum.confluent.io. If you have questions about any of these features or others, you can post there and somebody will get back to you in fairly short order; it's a great place to get help and to interact with other people. You can always reach out in Confluent Community Slack, lots of channels for you. And as always, I look forward to hearing what you build.

Tim Berglund:
And there you have it. Thanks for listening to this episode. Now, some important details before you go. Streaming Audio is brought to you by Confluent Developer, that's developer.confluent.io, a website dedicated to helping you learn Kafka, Confluent, and everything in the broader event streaming ecosystem. We've got free video courses, a library of event-driven architecture and design patterns, and executable tutorials covering ksqlDB, Kafka Streams, and the core Kafka APIs. There's even an index of episodes of this podcast. And if you take a course on Confluent Developer, you'll have the chance to use Confluent Cloud. When you sign up, use the code PODCAST100 to get an extra $100 of free Confluent Cloud usage. Anyway, as always, I hope this podcast was helpful to you.

Tim Berglund:
If you want to discuss it or ask a question, you can always reach out to me @tlberglund on Twitter. That's T-L-B-E-R-G-L-U-N-D. Or you can leave a comment on the YouTube video if you're watching and not just listening, or reach out in our community Slack or forum. Both are linked in the show notes. And while you're at it, please subscribe to our YouTube channel and to this podcast, wherever fine podcasts are sold. And if you subscribe through Apple Podcasts, be sure to leave us a review there. That helps other people discover us, which we think is a good thing. So thanks for your support and we'll see you next time.