Processing real-time event streams enables countless use cases big and small. With a day job designing and building highly available distributed data systems, Simon Aubury (Principal Data Engineer, Thoughtworks) believes stream-processing thinking can be applied to any stream of events. 

In this episode, Simon shares his Confluent Hackathon ’22 winning project—a wildlife monitoring system to observe population trends over time using a Raspberry Pi, along with Apache Kafka®, Kafka Connect, ksqlDB, TensorFlow Lite, and Kibana. He used the system to count animals in his Australian backyard and perform trend analysis on the results. Simon also shares ideas on how you can use these same technologies to help with other real-world challenges.

Open-source object detection models for TensorFlow, which are appropriately collected into "model zoos," meant that Simon didn't have to build his own object identification as part of the project, which would have made it untenable. Instead, he was able to use the open-source models, which are essentially neural nets pretrained on relevant data sets (in his case, backyard animals).

Simon's system, which consists of around 200 lines of code, employs a Kafka producer running a while loop, which connects to a camera feed using a Python library. For each frame brought down, object masking is applied in order to crop and reduce pixel density, and then the frame is compared to the models mentioned above. A Python dictionary containing probable found objects is sent to a Kafka broker for processing; the images themselves aren't sent. (Note that Simon's system is also capable of alerting if a specific, rare animal is detected.) 
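
The producer step described above can be sketched roughly as follows. This is a minimal illustration, not Simon's code: the `(label, score)` detection tuples, the `MIN_SCORE` threshold, the `camera` field, and the `send` callable are all hypothetical stand-ins. In a real deployment, `send` would wrap a Kafka producer's publish call, and the detections would come from the object detection model.

```python
import json
import time
from collections import Counter

MIN_SCORE = 0.5  # hypothetical confidence cut-off, not taken from the project


def summarize_frame(detections, camera_id, timestamp=None):
    """Turn raw (label, score) detections for one frame into a small dict.

    Only the counts per label are kept; the image itself is never included.
    """
    counts = Counter(label for label, score in detections if score >= MIN_SCORE)
    return {
        "camera": camera_id,
        "ts": timestamp if timestamp is not None else time.time(),
        "objects": dict(counts),
    }


def publish(event, send):
    """Serialize the event as JSON bytes and hand it to `send`.

    `send` is a placeholder for whatever publishes bytes to a Kafka topic.
    """
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    send(payload)
    return payload


if __name__ == "__main__":
    # Made-up detections for a single frame.
    frame = [("bird", 0.91), ("bird", 0.72), ("cat", 0.31)]
    event = summarize_frame(frame, camera_id="backyard-1", timestamp=0)
    publish(event, send=print)
```

Keeping the payload to a small dictionary of counts is what lets a low-powered device publish every frame's result without shipping image data over the network.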

On the broker, Simon uses ksqlDB and windowing to smooth the data in case the frames were inconsistent for some reason (it may look back over thirty seconds, for example, and find the highest number of animals per type). Finally, the data is sent to a Kibana dashboard for analysis, through a Kafka Connect sink connector. 
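
The smoothing idea can be illustrated in plain Python. ksqlDB expresses this declaratively as a windowed aggregation; the sketch below only mirrors the logic, assuming fixed, non-overlapping 30-second windows and a hypothetical `(timestamp_seconds, animal, count)` event shape that is not taken from the project.

```python
from collections import defaultdict

WINDOW_SECONDS = 30  # look-back period mentioned in the text


def max_per_window(events, window_seconds=WINDOW_SECONDS):
    """Group events into fixed windows and keep the highest count per animal.

    `events` is an iterable of (timestamp_seconds, animal, count) tuples.
    Returns a dict mapping (window_start, animal) to the maximum count seen.
    """
    result = defaultdict(int)
    for ts, animal, count in events:
        # Align each timestamp to the start of its window.
        window_start = int(ts // window_seconds) * window_seconds
        key = (window_start, animal)
        result[key] = max(result[key], count)
    return dict(result)


if __name__ == "__main__":
    # Made-up readings, including one frame where the model missed the birds.
    readings = [
        (1, "bird", 2),
        (12, "bird", 0),
        (25, "bird", 3),
        (31, "bird", 1),  # falls into the next 30-second window
    ]
    print(max_per_window(readings))
```

Taking the maximum within each window means a single frame with a missed detection does not drag the reported count down to zero.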

Simon’s system is an extremely low-cost system that can simulate the behaviors of more expensive, proprietary systems. And the concepts can easily be applied to many other use cases. For example, you could use it to estimate traffic at a shopping mall to gauge optimal opening hours, or you could use it to monitor the queue at a coffee shop, counting both queued patrons as well as impatient patrons who decide to leave because the queue is too long.


In this week's podcast, we are talking to a streaming data expert from Thoughtworks. You might know him: it's Simon Aubury, and he has recently won the Confluent Hackathon with a really great project. It's a fun use of machine learning, some field hardware, and Apache Kafka. He has been using Google's TensorFlow to do some AI stuff to track the animals of the Australian Outback. Technically it's the suburbs of Sydney, but it's outback enough for me.

I thought we'd get him on because I was on the judging panel for the Hackathon, and this project really hit that sweet spot between doing something fun and cool and playful. But then you step back for a second and you can easily see some real-world use cases that he's addressing with really just a few hundred lines of code. It's really great, and we're going to link to the project in the show notes. But before we get started, this podcast is brought to you by Confluent Developer, which is our education site for Kafka. We'll get to more about that at the end, but for now I'm your host, Kris Jenkins. This is Streaming Audio. Let's get into it.

Joining me today is Simon Aubury. Simon, welcome to the show.

Hi Kris. It's an absolute pleasure to join you today. Thanks for having me on as a guest.

It's great to have you. Last time we met, we were in person. I was on the other side of the planet with you in Sydney. We'll have to make do with a wired link this time.

Yes. But I do appreciate that you made the long voyage out to Australia. It was so good to see you in person.

It's always great to meet people in person, and I used to live in Sydney, so it was nice to be back, if only for a while.

I'm sure it's changed a bit. Anyway, it's definitely great to be able to join you on Streaming Audio today.

It's a pleasure. Let's say in Sydney you work for Thoughtworks as a principal data engineer. What is a principal data engineer?

I think I'd describe my job as doing cool things with data, if I was to explain.

Isn't that the whole of computer science?

Absolutely. But I think if I was trying to do this as an elevator pitch, I'd describe it as doing interesting things with highly available distributed data systems. And that typically gets me in the door with clients, and they might be in the worlds of finance or transport or healthcare or insurance. But it's definitely an interesting overlap of interesting business problems and the opportunity to play with really cool technology such as Apache Kafka. And I also love to mix it up and play with some concepts that come out of data mesh, and the concepts of data mesh and a data streaming platform seem to be a nice sort of overlap. It's an interesting place to play.

It sounds like a sweet gig to be honest.

Yeah, for sure.

The idea of play is why you're joining us this week because we recently held a Hackathon and you are our glorious crowned winner.

It was a fabulous opportunity to have some of the typical constraints of day-to-day project life removed and start thinking about, left unshackled, what cool things could you build? And I think the challenge out of the Confluent Hackathon could be boiled down to: do something cool, something that interests you, with Apache Kafka. And it's great to have the opportunity to play with cool tech.

Always. Tell people what you built and then we're going to go through how you built it.

Absolutely. I think the short summary would be that this is a wildlife monitoring system. Like any good Hackathon project, it mixes technology with a real-world problem, and I essentially wanted to solve for how you could identify animals in the world, count them, work out where they were, and maybe draw some insights around population trends and animal movements. And it's particularly good to be able to use something like Apache Kafka and streaming technologies, and to have an excuse to use a Raspberry Pi.

It's all you need for a Hackathon project.

And you live in the best place in the world for wildlife tracking because doesn't Australia have more things that can kill you than any other country?

That might be quite a stretch, but it was definitely a bit of an advantage of having an Australian back garden. When I literally put a Raspberry Pi in the back garden with a camera to see what animals could wander across, there really were local birds, local wildlife, and some stray cats and dogs that happened to wander past the camera. I did have the advantage of having a large collection of wildlife in our back garden to play with.

Diverse flora and fauna.

That's what we're talking about? Okay.

Exactly. Yeah. And this is quite accessible, and for a deployment such as to a Raspberry Pi, having rapid libraries in Python makes it quite accessible. Actually for this project, I think to be completely accurate, I used a cut-down version of TensorFlow called TensorFlow Lite, which is a slimmed-down model evaluation library optimized for battery-powered devices that you actually want to deploy into a back garden or a field somewhere.

It's a power thing rather than you're just trying to save money.

Exactly. But when I want to multiply my real estate, I've now got the capability of deploying hundreds of these to hundreds of back gardens.

All you need is to become a property tycoon and you're away.

I know. But I've got the dream and the vision. It's just execution between here and there.

If anyone would like to fund Simon, please contact us directly at the podcast.

How's that actually coded?

I chose to use ksqlDB and coded it as a windowing function, which, again, in the context of a Hackathon, is a quick way of achieving an outcome.

Thanks very much.
