TLP - The Digital Forensics Podcast

Episode 14 - AI and the future of log analysis, bug detection, forensics and AI ethical considerations with Jonathan Thompson

Clint Marsden Season 1 Episode 14


In this episode of Traffic Light Protocol, Clint Marsden is joined by Jonathan Thompson, a developer and AI enthusiast currently studying at Macquarie University.

Together, they dive into how artificial intelligence (AI) is transforming the cybersecurity landscape and discuss Jon’s insights into AI’s potential applications in digital forensics, incident response, and everyday IT operations.

The conversation touches on ethical considerations, potential job impacts, and how AI can be harnessed to streamline tasks like log analysis, bug detection, and threat identification.


Daniel Kahneman - Thinking Fast and Slow
https://amzn.to/47Cpfjo

The Pyramid of Pain by David J. Bianco: http://detect-respond.blogspot.com/2013/03/the-pyramid-of-pain.html



(0:00 - 0:53)
Well, John, thank you for coming on. It's been a long time coming. I know we've been speaking about this. 

You've got a bit of a break in between semesters at uni right now. You're obviously studying AI and that's going to be what I want to just chat about tonight and have a bit of an AI convo and see how we can kind of line it up with cyber and just talk about where the future's going and maybe we'll get into some ethical discussions about AI, the future of what AI could do for us from a forensic standpoint, working a bit quicker because, as you know, when an incident drops in the heat of the moment, it's pretty important to respond to that as quickly as possible. Yeah, mate, thank you for having me on the podcast.

(0:54 - 3:25)
It's exciting to be here. And yeah, I think what you've just said sums it up very well. So I work in IT in the development side and I'm currently studying AI. 

I'm at Macquarie Uni doing a Bachelor of IT major in artificial intelligence. Yeah, when I was just getting started in IT, I was sort of tossing up, do I want to pursue security or development because both were very interesting to me when I sort of went the development route. I've always sort of had a bit of an interest on the security side as well. 

And yeah, I think there's definitely space for AI development in the security sphere. So who knows, maybe that's where I might be sitting in the end of my degree. So AI in development is that, I mean, you've just started your degree, you're not talking about AI taking our jobs already or is that what you mean? Look, I hope not. 

I think, yeah, there's always going to be space for humans in the workforce, of course. There's some things that just need the human touch, but I mean, AI can definitely work as a tool to help to make things more efficient and maybe pick up some of the things that a human error might have not been so good for. So yeah, I mean, there's definitely space for AI, I think, in just about every industry. 

It's just to different degrees, but it's not going to take all of our jobs. You mentioned it before, when you were telling me that story of the programmer who was teaching his daughter how to code when she was eight years old. Yeah.

What was the thing? It'll tell you there's a missing semicolon on line 28 and the daughter says, why doesn't it just put the semicolon in? Exactly, exactly. And look, I thought it was a very good question. Like if you read an error message, sometimes you get one that's just so straightforward and you sort of think, come on computer, have a bit of common sense. 

You can see what I'm talking about. And who knows, maybe there's space for AI in the background of programming to fix up all those bugs. I mean, just looking at ChatGPT, sometimes if I'm debugging, I might just copy a script, paste it in ChatGPT and say, where's the bug? And then rather than banging my head on the wall for however long to find it, I've got an answer straight away.

(3:26 - 6:03)
Sometimes it likes to rub it in a little bit and say, by the way, you can improve your code over here. Are you serious? Yeah, occasionally, yeah. I mean, it's a very effective tool for just little snippets with programming. 

I have found myself with having issues with it in the past where it said, oh, just use this function from this library. And I've read the code that it has given me and thought, this is perfect. This is exactly what I want. 

Gone to implement it, getting an error. And you start googling and reading forums and all this sort of stuff, only to find about half an hour later that the library it suggested doesn't actually exist. So those ones are a little bit annoying. 

But for all the time that it saves you versus the time that it costs you, I think we're coming out on top. And I think it'll only get better as time goes forward. I mean, that one was with ChatGPT 3.

And already I'm seeing big steps forward to, what is it now, I think, we're on 4o. How do you pronounce it? 4-o or 4, I don't know.

There was a 4o. Yeah. Something that's coming out that's meant to be thinking for itself.

And one of the professors at the institution that I work at commented on the post by someone else. I think it was their marketing promotional thing saying, with this new version, ChatGPT will pause and think about the answer before responding to you. And they were making a reference to a book by Daniel someone. 

Sorry, I can't remember their surname. The book title is Thinking Fast and Slow. Daniel Kahneman, I think it is. 

Is that right? Let's have a look. Let's have a squiz. Yeah, Daniel Kahneman. 

K-A-H-N-E-M-A-N. I will put that in the notes. And that book is incredible. 

It shows you the two types of thinking, system one thinking and system two thinking. And I don't want to digress into it. But the professor basically said this is offensive to the work of Danny Kahneman because ChatGPT is not thinking. 

It is just following something else. I can only paraphrase him because I don't recall the exact words that they used. But you can see where they're trying to go.

(6:03 - 7:57)
And it's nice that it's a little bit more considered. But yeah, I've had the same problems as you from using PowerShell. I had a task where I was just reviewing the emails that were sent from a group mailbox. 

And I wanted to get a listing of the subjects of those emails because I'd read, say, 15 of them and wanted to note them down, tick them off. And as part of a review task to check on how we talk with our clients, our internal customers. And so I'm like, look, I don't want to have to manually save every single email, drop it into a folder, then pull a DIR, dump it out to a text file, strip the formatting just to get the subject line of the email. 

Off to ChatGPT. Sure enough, go in, spits out all this PowerShell stuff that I've never even heard of before. Okay, it might be right.

No, it doesn't even exist or it's deprecated. And so, yeah, probably by this stage, I would have been better off just doing it the manual way. But with all things, I think there's probably, I mean, you and me, with you doing AI as your unit of study, your degree, we're looking for ways to do things better and to be leveraging the technology to help our business or personal lives. 

I mean, is that what got you started in going for AI? I think one of the things that got me really interested in AI, it was a video that I was watching on YouTube. Who was it that published this one? Sorry. I feel like it might have been Mark Rober that published a video.

(7:57 - 13:35)
And Mark Rober, he's got some really cool stuff. He's an engineer. And he's got a pretty impressive resume. 

He used to work for NASA. He worked for Apple. And now he's a major YouTube celebrity with like 40 million views on every post that he makes. 

He does all the fake parcels that have fart spray and bombs. Yes, that's him. That's him. 

He's pretty cool. Yeah, I think one time his package was stolen and he's for years now been perfecting a revenge on the parcel thief. But he's got a bunch of really cool stuff. 

Like I remember one study he did, he bought 50 wallets and put a few bucks in each of them or something and sent them all around the country and asked his subscribers to just go and drop them somewhere in public. And then he just did a study to see how many people returned the wallet and how many returned it with cash in it. And basically, was there any variance in different areas of the country? And anyway, the one that piqued my interest for the AI, he was looking at a baseball pitcher and catcher and the hand signs that they were doing, and how that would relate to the type of pitch that the pitcher threw.

And he asked a friend of his who was an AI developer to try and crack the code. And it took him like two minutes to decipher this whole language. And I thought, wow, I'm looking at this and just seeing random gibberish sign language, but the computer can work out the answer to a problem straight away. 

And then you start thinking, what else can the computer work out? Normally, you have to be able to calculate a solution and you can write a program that comes to the answer. But with this AI, you can just feed it a problem and the computer works it out on its own. That's brilliant. 

And so I guess I went from there and started learning about different types of AI and what different possibilities are. And yeah, I guess that sort of birthed my interest and it's just sort of grown since then. And I've written a few little machine learning scripts and I've had varying degrees of success with different projects, nothing major. 

And I've sort of gone, okay, I'm getting to a point where my self-learning is slowing down. So let me go enroll at uni and start learning through the formal education system and push my own learning on the side and see where it lands me at the end of the degree. I think it's so exciting. 

I'm really inspired by that. And it's such a great way where it all kind of started just from a YouTube guy. Yeah. 

Yeah. It's pretty cool. Actually, if I can sort of elaborate, and tell me if I'm talking too long on YouTube, but there was another video I watched. Who posted this one?

I'll have to look it up so that we can, I guess, put it in the credits. But it was somebody made a program that played Pokemon and all it was doing was more or less putting in random buttons and then seeing what would happen. And they'd launched like thousands and thousands and thousands of games and just watched the way that it got better each time it played. 

And the other thing I found interesting was looking at the failures. Like one of the things in, it was the old Pokemon red and blue series. And one thing that they wanted to encourage was exploring. 

And so whenever an input was pressed, the more parts of the screen that changed, the bigger the reward for exploring. And one thing that was kind of a bug, but kind of almost nice in its own sort of artistic way, when the character in Pokemon walked to a river, the river had a constant current going up, down, up, down, up, down with the water. And so the AI learned if it just stood there and did nothing, then it would be getting these rewards because the content on the screen is shifting. 

The value on each pixel is changing and it would be getting rewards as though it was doing well. So the way a human would interpret what the character did was it just walked to the edge of the water and just sat and watched the water for infinity if it could. Obviously, it's not the output you want.

Everybody wants the max level Charizard, but it was just cool to watch the way the AI interpreted its commands and I guess its targets and then found different ways to achieve those. Similar one that I watched, sorry, I didn't watch this one, this one I heard about, it was somebody who made an AI bot to play Tetris. And one of the conditions was don't let the blocks hit the roof or the top of the screen. 

And it found that one of the inputs was P for pause. And if it hit pause, then the height of the blocks would never increase. And so it got to the point where you would run the bot and it would just literally immediately pause. 

And that was it. And I mean, hey, it's achieved its goal of not letting the blocks hit the top of the screen. This is GIGO, G-I-G-O, right? We learned this in year 10 computing: garbage in, garbage out.
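[Note: a minimal sketch of the kind of pixel-difference "exploration" reward described above, and why a stationary animation defeats it. Frame sizes and values are invented for illustration.]

```python
import numpy as np

def exploration_reward(prev_frame: np.ndarray, curr_frame: np.ndarray) -> float:
    """Naive proxy for exploration: the fraction of pixels that changed."""
    return float(np.mean(prev_frame != curr_frame))

# A looping 'river' animation alternates between two frames...
frame_a = np.zeros((144, 160), dtype=np.uint8)
frame_b = frame_a.copy()
frame_b[60:80, 0:40] = 255  # the animated water tiles

# ...so an agent standing still beside it keeps collecting reward.
print(exploration_reward(frame_a, frame_b))  # ~0.035 every step, for free
```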

(13:36 - 17:17)
Exactly, exactly. It's like, well, hang on, is it really that you don't want the blocks to hit the top? Or are you trying to clear as many rows as you can? And I think it sort of ties quite well into programming in general, because whatever you write in your script, the computer is just going to do exactly as you tell it. So you've got to make sure you're giving it good commands to get the output that you're chasing.

And if you don't get the output you're chasing, you have to know how to interpret what it is giving you. So I guess you put in the human side of things into the data and you work out the resolution of whatever problem you're facing. So coming back to the types of AI, with the Pokemon example of Ash, or if it's still Ash, Pokemon, if he's next to the water and he's getting those rewards, what type of AI is implemented in that scenario? Because I know about generative AI, and then you've got machine learning. 

So this one was talking about neural networks, which is, to be honest, probably the bit that interests me the most, and probably the bit that I have the most yet to learn about. But effectively, with a neural network, you've got, if you can imagine a chart with just dots all over the chart, lined up in vertical columns, and each of these dots represents either an input or an output, or both. On the far right side, you've got your final output, which in the Pokemon example would be like press left, press right, press up, press A and B at the same time, whatever the output is. 

On the far left, you've got your very first inputs, which is basically what can I see at a very, very basic level, what's on the screen in front of me. And then in between, you've got all these varying nodes, I guess, to calculate, okay, I've got this input where there's a fence to my right, that means I probably can't walk straight through the fence, and that might influence that you're a little bit less likely to press right, and so on, with different levels of weights applied to each of the nodes, and different, I guess, degrees of relevance applied to each of them. And as you pass through all your calculations, it gets to a point where on the far right end of this field, or the very output of the bot, you've got a likelihood between zero and one of how likely an output is to be correct. 

That'll give you your output from the neural net. That whole definition I've just given, that's from my very early education understanding level, so I mean there's probably a whole lot more to it, but that's, I guess, a high level of how neural nets work. And to me, I think they're the most interesting, because they seem to be the best for calculating and problem solving in that regard. 
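[Note: a toy illustration of the structure Jon describes, columns of weighted nodes turning inputs into per-action confidences between zero and one. The weights here are random, i.e. untrained; training is the process of tuning them.]

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # squashes any value into (0, 1)

# What the agent 'sees', e.g. [fence-to-the-right, water-ahead, grass-below]
inputs = np.array([1.0, 0.0, 0.5])

rng = np.random.default_rng(0)
w_hidden = rng.normal(size=(3, 4))  # 3 inputs -> 4 hidden nodes
w_output = rng.normal(size=(4, 2))  # 4 hidden nodes -> 2 outputs

hidden = sigmoid(inputs @ w_hidden)   # weighted sums, squashed
outputs = sigmoid(hidden @ w_output)  # e.g. [press-left, press-right]
print(outputs)  # one confidence per possible action, each between 0 and 1
```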

It's not, I mean, I guess all AI can solve problems, but it just depends on the problem you're bringing it. Like if your problem is not knowing what to write on a birthday card, you can ask ChatGPT or a generative AI, and that can generate some text for you, or an image, or a sound, or whatever. And generative AI is really cool for doing that.

(17:18 - 18:43)
But then if you were to ask it, you know, hey which crypto should I buy, or you know, which something along those lines, it's not so much what generative AI is built for, whereas a neural network that can calculate all these things, you know, give you a good estimate. That's one important factor, it's an estimate. It's never an answer, it's always an estimate with some sort of confidence level between zero and one. 

It's never going to be a hundred percent. At least, never exactly one. Yeah, yeah, well that's it, that's it.

And I mean that's why, you know, you'll get cases like, you know, could a brain surgeon be replaced by a robot with AI? It's like, well, you can build a robot and put a scalpel in its hand and, you know, say have at it. But, you know, generally speaking, you probably don't want to allow for guesses all the way through brain surgery. That's right, that's a little bit of a high risk scenario. 

A little bit. If we were to talk about how AI can fit into forensics, like digital forensics and incident response. And, you know, from our chats in the past about cyber and forensics, and that's something that I've been doing for a while, I'm really curious about how I can maximise AI in my day-to-day work.

(18:45 - 22:41)
And it's the kind of thing that I can't really plan. I kind of am doing proactive work in between waiting for an incident to drop. But if I keep it high level, what can AI do for us looking at large data sets, large amounts of log files or finding the root cause of an attack, if we could kind of feed it just huge amounts of data and there are ways we can kind of get a bit more out of it than what's available to us now. 

I mean, there's, I've seen some people using AI agents and there's talk of bots, where they kind of run through these iterative processes to keep refining the data. And I just wonder if you can talk a little bit about that as well. Yeah, 100%. 

So, I think the way that I approach bringing AI into forensics and cybersecurity, I'd start off looking at the really, really small jobs that are really simple and repetitive and start looking at how those can be automated. And also what you would actually look for as a human, as an abnormality that would sort of prick your interest and start training AI on data like that. So, for example, if you're looking through log files of which IP addresses a device is communicating with, those log files can grow really, really fast. 

And as a human, you're seeing a whole bunch of numbers come up and you might recognize some of the IP addresses, but a lot of the time you're not going to know off the top of your head where this communication is coming from or going to. I think if you pass a log file like that through an AI program, it could go through and analyze all the various IP addresses and all of the flags and all of the responses and basically just all the output in the log file. And it could highlight basically what you would train it to look for in terms of maybe you're aware of a malicious IP that's launched an attack on you before. 

It can go to find any IP that's related to that one. And things that a human just wouldn't be able to do. Look, you can read 10,000 lines of code and sure, you can hit control F and try and find a specific IP. 

But if you could train AI on billions of lines of log files and say, these are the kinds that I want to hear about. Say, for example, a pattern that you might not be able to find normally like for a brute force attack or something similar. AI could pick those out for you a lot more efficiently than a human could. 
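[Note: a hedged sketch of one such pattern a script could surface automatically, a sliding-window count of failed logins per source IP. The event format and threshold are illustrative, not from any particular product.]

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical parsed auth events: (timestamp, source_ip, success)
events = [
    (datetime(2024, 5, 1, 2, 0, 1), "203.0.113.7", False),
    (datetime(2024, 5, 1, 2, 0, 2), "203.0.113.7", False),
    # ... thousands more lines ...
]

WINDOW = timedelta(minutes=5)
THRESHOLD = 20  # failed attempts within the window worth flagging

def flag_brute_force(events):
    recent_failures = defaultdict(list)
    for ts, ip, success in sorted(events):
        if success:
            continue
        # keep only this IP's failures that fall inside the sliding window
        recent_failures[ip] = [t for t in recent_failures[ip] if ts - t <= WINDOW]
        recent_failures[ip].append(ts)
        if len(recent_failures[ip]) == THRESHOLD:
            yield ip, ts  # fires once as the threshold is crossed

for ip, ts in flag_brute_force(events):
    print(f"possible brute force from {ip} around {ts}")
```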

Once you get it to the point where it's finding those real red flags, you can have that running on autopilot while you keep on working on the productive work that you're doing to the side. So you don't even have to go and check. You can just wait for the AI to bring real flags to you. 

That's the main thing I would focus on, especially in early days. It's just those very small, tedious tasks and getting it to do them at a very high level at a very rapid pace. And then have it come and poke you with an email or something that says, hey, I found something that you should check out. 

So often we get emails saying, this login was blocked from somebody who tried to access your network from such and such country where you know it shouldn't exist or things like that. And you get so many emails, you almost get flooded with warnings like that. And a lot of the time it's reasonably safe to ignore them.

(22:41 - 23:41)
You get a lot of false flags. If you could have some way to only pull out the, I guess, the non-false flags, the real issues, it would allow you to focus a lot more quickly if there is an actual incident taking place. Yeah.

So false positives, I think, is what you mean. That's a huge problem. And that's what generates just such large volumes of log data. 

And then we rely on using your security operations center and you have a SIEM. And then starting off with this funnel approach, I guess, you get reports where you go, well, this month we've, as an example, we've ingested 1 billion logs. And you might get this report from a vendor. 

We've ingested a billion logs for you. And then we've reduced that to 10 incident reports for your team to actually investigate and do a bit of a deep dive. And that's where the value is.

(23:42 - 25:26)
And I think also what you're touching on is log enrichment. So enriching the logs by taking IPs and then adding additional data like geo. So understanding what country it's coming from.
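[Note: a minimal sketch of that enrichment step using MaxMind's geoip2 library. It assumes you've installed geoip2 and downloaded their free GeoLite2 database; the addresses shown are documentation IPs, so a real lookup would need routable ones.]

```python
import geoip2.database
from geoip2.errors import AddressNotFoundError

def enrich(ip, reader, seen_before):
    """Attach country/city plus 'have we seen this IP before?' context."""
    try:
        record = reader.city(ip)
        country, city = record.country.name, record.city.name
    except AddressNotFoundError:
        country, city = None, None  # private/documentation ranges won't resolve
    return {"ip": ip, "country": country, "city": city,
            "previously_seen": ip in seen_before}

with geoip2.database.Reader("GeoLite2-City.mmdb") as reader:
    known_bad = {"198.51.100.23"}  # e.g. loaded from past incident reports
    print(enrich("203.0.113.7", reader, known_bad))
```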

Have you seen these IPs in the past? Kind of training it on data: take these post-incident reports that we've been writing, read about them, look at the TTPs, the tactics, techniques, and procedures of different threat groups, different threat actors, and then sweep our environment looking for evidence of compromise and indicators of compromise. I mean, these days we've kind of gone from "let's stop the breaches" to "we've probably already been breached". At least in a threat hunting kind of mindset, we assume breach.

And then we go out looking for evidence of that taking place. Like with threat hunting, you build a hypothesis. This is where I think using those agents to automate those simple tasks is going to be massive because there's probably, if you follow like one of the regular threat hunting frameworks, I'm going to say there's probably about 150 techniques. 

And then you've got to break that down. And that's at least 150 separate hunts that need to take place in the whole organization. I'm just thinking, like in terms of training these systems, I know the GPUs are massive.

(25:27 - 27:47)
They are the focus for training AI platforms. And I've actually had a go at making my own locally hosted AI using Ollama. And yeah, it was hard to get going.

And then when I finally built it, I tried to upload probably, I don't know, a two megabyte text file and then ask questions of it. And I didn't actually get the results that I was hoping for. It just didn't have that breadth of flexibility that something like ChatGPT had. 
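[Note: for anyone attempting a similar local setup, a minimal sketch of querying a locally hosted model through Ollama's REST API. It assumes Ollama is running on its default port and that the named model has already been pulled.]

```python
import json
import urllib.request

def ask_local_llm(prompt: str, model: str = "llama3") -> str:
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # Ollama's default endpoint
        data=body.encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ask_local_llm("Summarise anything suspicious in this log excerpt:\n..."))
```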

Obviously it's an offline LLM, that's a large language model, right? But yeah, I don't know where I went wrong, but I would love to try and keep going with that. Imagine putting my visionary hat on for a moment, which I love to do. Could I use my local LLM to take those 150 techniques, which are kind of prescribed, it's all listed out.

Look for suspicious files in these locations. I could go on and on, but can we kind of give it that sample dataset and then get it to generate these use cases? What do you think is holding us back? When we type something into ChatGPT and you get that response and you're like, that's not what I asked for. And you're just like, you're not listening.

And then you, I'm sorry, I'll try again. And it keeps trying again, keeps giving you the wrong stuff. How do we get better results on this? One thing that I learned about with the very first script I was trying to write, took in a whole lot of data on crypto. 

And my idea was if I could calculate everything going on in the market and find a trade that's going to return me 1%, then just get it to do that on repeat. I'll be printing money. It's going to be great. 

And spoiler alert, I'm not printing money. Because that's very bad. And you're not the Federal Reserve.

(27:47 - 31:09)
That's it. So what happened? I'm just learning and more or less copying bits of script and pasting them into my program to see how I went. I wrote a bot that would log the current market position for every coin on a particular exchange. 

It was logging how many buy orders, how many sell orders, what's the price. Can I just ask for a sec? What's the log source here? Where are you getting this data? Is it from an API? Are you getting static text files? Yeah, it's from an API call or from a lot of API calls. It would make a couple of calls to the exchange for each coin every hour. 

I'm going to say on the hour. I think my script was taking like two minutes to run. So on the hour, give or take two minutes. 

And it was running for, I think I had about 48 hours worth of data. So multiply that 48 by every coin on the exchange by all the data points that I had. And I had a pretty huge matrix to train on. 

And I was only looking for what's going to make me a 1% profit if I hold a coin for an hour. Micro trading? Yes. Yeah, exactly. 

High frequency trading. Exactly. The phone should be ringing from Optiver.

They should be calling you if you're interested in that stuff. If I get it right, maybe better. So what I was thinking was if I get it to do one trade every hour for me, whether I'm asleep, awake, working, having fun, whatever, I'll have this autopilot trader going for me. 

So I trained it on all this data and I was particularly interested in the number of buy orders and sell orders and the prices and quantities that those were going at. Because I thought that's going to really drive the price short term. And that's where I'll win. 

And I was getting really, really, really strong predictions on what was going to be good or bad. And I was excited. I thought this is it. 

And I called up a friend of mine who... I don't know if I should say his name. I'll say his name. Don't say his name. 

Make up a name. Okay. His name's Ronald McDonald. 

Anyway, this friend of mine, he tutors AI. And he's kind of been my, I guess, go-to if I've just got a quick question. And I told him, hey, I've got my first AI script and there's no bugs and I'm getting the output and it's great. 

And his advice to me, he said, great, now go and break it. Deliberately mess up a portion of the data and see if it's still giving you a strong prediction. Oh, nice. 

Okay. I thought, okay, this is good. So I went and I think I took 20% of the coin values. 

And if they were low, I made them high. And if they were high, I made them low. And to, like, massively unrealistic degrees.

Yeah. So it was very, very wrong. Very obvious that it was wrong.

(31:09 - 32:45)
Extremely obvious. Yeah. Yeah. 

And I ran my script again, and I'm still getting really strong, you know, 0.99 or 0.98 level predictions. And I'm going like, well, that's not right. Confidence should drop by at least 20%, if not a lot more.

And I went back through my script and I found it was the stupidest thing. I'd put a typo on the line where I was choosing which column of data to predict. Instead of the price, I was trying to predict what time the next piece of data would come from.

Yeah. So I updated the script, made it so that I was actually trying to predict where the price would go, and suddenly accuracy dropped dramatically. And it was all just that one tiny little bug.

I mean, in the end, just for the sake of closure, I did get to the point where, whenever the market overall was going down, I was able to hold the same sort of dollar value in my portfolio. So if the price of a coin dropped by 10%, then the quantity that I was holding would go up by 10%, give or take. The problem came when the market started to go back up: I'd be doing bad trades and I'd be losing at the same rate that the market climbed.

So it was effectively like a savings account where the balance just sort of mildly fluctuated up and down, you know, while the market was moving rapidly. So, I mean, it was a fun project and I learned about error checking. You know, that one is back on the shelf gathering dust at the moment.
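[Note: the "go and break it" advice discussed next translates into a simple perturbation test: corrupt a slice of the inputs and confirm the model's quality actually degrades. A sketch on synthetic data with scikit-learn; if the corrupted score doesn't drop, you're probably predicting the wrong thing, exactly as happened here.]

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = X @ np.array([0.5, -1.0, 2.0, 0.0, 0.3]) + rng.normal(scale=0.1, size=1000)

model = LinearRegression().fit(X, y)
print("clean R^2:    ", model.score(X, y))

# Deliberately mess up 20% of one influential feature, as described above.
X_broken = X.copy()
idx = rng.choice(len(X), size=len(X) // 5, replace=False)
X_broken[idx, 2] *= -100  # massively, unrealistically wrong

print("corrupted R^2:", model.score(X_broken, y))  # should drop sharply
```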

(32:45 - 33:56)
Yeah, but it's all about learning and the lessons that you learned. I mean, the big takeaway was try and break it. Yeah, yeah, that was huge. 

Because I mean, had he not given me that advice, I'd have said, great, let's throw in the life savings and pick how many mansions we want to buy next weekend. Yeah, that was a huge bit of advice, break it and make sure that the output comes back broken. And if the output isn't broken, then something else is. 

So if you start thinking about, I don't know, bringing AI in, I mean, this kind of makes sense now, why Microsoft call their AI Copilot. How cool would it be to kind of talk to it like it's your mate who's sitting next to you and you go, look, this is what I'm doing. I'm trying to detect suspicious systems, you know, maybe even using a bit of statistical analysis, like frequency stacking, and go, I'm looking for these outliers.
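[Note: frequency stacking in its simplest form is just counting how often each value occurs across the fleet and reviewing the rarest ones. A minimal sketch with invented process names:]

```python
from collections import Counter

# Hypothetical inventory: (host, process name) pairs gathered fleet-wide.
observations = [
    ("host01", "svchost.exe"), ("host02", "svchost.exe"),
    ("host03", "svchost.exe"), ("host02", "lsass.exe"),
    ("host04", "svch0st.exe"),  # the outlier a stack should surface
]

counts = Counter(proc for _, proc in observations)

# Stack by frequency: the rarest values float to the top for review.
for proc, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print(f"{n:>4}  {proc}")
```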

(33:58 - 36:07)
And so I guess this is what comes down to the prompt, right? I'm a digital forensics professional. I want to do some stack analysis to try and detect, via threat hunting, previously undetected incidents. What do you reckon? What should we be looking for? And that kind of getting it to respond to you in a little bit more of a, like a less scripted way, because at the moment it's great.

It provides a lot of information. It goes, here's the top 20 Windows event IDs. And that's cool.

That's great. I can build that into my dashboard. Now I can build that into my pie chart in ELK, and I've got it. But kind of getting to the point of using AI to stop threats before they start by going, okay, if that's your objective, this is what you can look for.

I've identified in the logs these types of things, which could be the start of an attack. How do we kind of train the AI systems? And is this like, well, we have so many LLMs that exist because there are different use cases for each LLM and that's how we have to do it. My view on why we have so many LLMs is probably more about becoming the leader in the market or leader in the industry. 

I don't think it's so much designed around use cases on the scale of, you know, Microsoft Copilot and ChatGPT and, you know, the huge name ones, the, I guess, very general purpose ones. I think you would have a very purpose-driven LLM if you've got a specific problem and all your training data is around this specific problem and you're passing it data to give an output for that. I think a very important thing to focus on with LLMs is that it's not designed to calculate the correct answer to your question.

(36:08 - 36:45)
It's designed to give a response that would be in line with what a human might respond with. Or at least, yeah, I mean, that may vary a little with some LLMs, but in terms of the big ones that, you know, that everyone's looking at these days, I think there's like Leo and ChatGPT and all that stuff. If, just as an example, if you took that back in time a thousand years and let's just say hypothetically you've got a device with ChatGPT on it and it was trained on data from a thousand years ago and you said, what shape is the world? It's going to tell you the world is flat.

(36:46 - 47:39)
That's not because it's giving a correct answer, it's because it's giving the answer that a human would give, if that makes sense. I mean, another example I saw, this one was just pretty cool, you know, I thought, like, just a point of interest. I saw one where somebody gave ChatGPT a screen snip of a CAPTCHA and it said, can you read the letters in this CAPTCHA code? And then ChatGPT gave an explanation about how it was, you know, designed so that it would be difficult for anyone apart from a human to read.

You know, you couldn't write a bot that would, you know, read through the letters and numbers in the CAPTCHA, but if a human was trying to access an account or something, then they'd be able to, no problem, because a human would be able to read the letters, say ABC123. Yeah, it gave this whole spiel about how only a human can read the data, a computer can't, and then it gave exactly what the correct answer was at the end. And it's like, okay, well, that definitely is the kind of answer that a human would give, even down to the fact that it's incorrect.

Yeah. Have you seen Twitter's new CAPTCHA? No, I haven't. I was setting up some, I was setting up a social media account for the podcast, or for the blog website, the DFIR Insights page.

And the new CAPTCHA is actually quite punishing. There's about eight different steps, and it's all about picking from a grid of images. There's two rows and three columns, and they have different images, and they're all different rotations, and it's monochrome.

Oh, wow. Okay. And you've got to pick what fits, or pick the odd one out, and sometimes you've got to spin them around, and all this stuff. 

And if you get one wrong, you start again. No way. Oh, I'm just like, come on, it's been a long day. 

I just need to get this done. It's like, is my IP in a bad IP neighborhood or something? I don't know. But yeah, it's, I guess.

Man, I can imagine one day, you know, I don't know how distant in the future, we'll all have a piece of hardware. It'll plug into your, you know, USB L port, and it'll say, insert one drop of blood right here to prove you're a human. Oh, it'll be like a pinprick, you know, the testing your glucose level? Exactly, exactly. 

And it'll analyze to say, is this really blood? And, you know, then there'll be, of course, people who say, oh, that's, you know, the government is trying to harvest everybody's DNA. And there's going to be other people saying that, you know, it's just a cybersecurity thing. And then others are going to say, what if you drop in, you know, orange juice or water or, you know, something else. 

But yeah, I mean, that's the thing. When it comes to cybersecurity, I think it's just a constant balance of the more secure you make something, the harder it is for the user. The less secure you make something, you know, obviously the greater risk of attack. 

You know, it's that balancing game of how can you make it as easy for the user to do what they want to do as possible, but at the same time as secure so that there are, you know, their data is not at risk. It's a big balancing game that I think is, you know, pretty much going to go on forever. It's cat and mouse. 

The moment we are creating new detection rules, there are black hats and pen testers who are finding ways to bypass them. Exactly. But there's this really cool example. 

There's a gentleman named David J. Bianco, and he used to work for a company called Sqrrl, S-Q-R-R-L, and they actually got acquired by AWS. And Sqrrl was an organization that provided guidance and services, I believe, on threat hunting. Probably around, I want to say 2015 to 2018, they were kind of at their peak.

Forgive me if that's incorrect, but I believe that's the case before they got bought out. And they actually released some really cool PDFs on threat hunting methodologies. And that's actually how I got started.

I read one of their PDFs and the rest is history. But I actually forgot the point that I was going to make just then. That's really annoying. 

Oh, sorry, the Pyramid of Pain. Yeah. So David J. Bianco worked for Sqrrl, and he invented the Pyramid of Pain.

And what it is, it basically is a visual description of how difficult it is for attackers as they have to change their indicators of compromise. I doubt they would refer to them as indicators of compromise themselves. But you go through everything from hash values, which is like an MD5, to IP addresses, domain names, network and host artifacts, tools, and TTPs.

So those TTPs are what I was speaking about before the tactics, techniques and procedures, which are the way that they go about doing their work. But what I was thinking about is, it's easy for us to detect these things. Once they're burned, they're burned. 

Once a hash value is discovered, once a malware sample is released, it's hashed, you can then train your AI on it, go and hunt for all of these hashes, go and hunt for all these IPs, domain names, network artifacts, tools and TTPs. As you go up the Pyramid of Pain, you start off at trivial, then easy, simple, annoying, challenging, and TTPs are considered the toughest at the top. Because what it requires is for the threat actor to go and invest, as you know from developing and writing code to do those functions, for trading or whatever.

It takes a long time. You've got to debug it. You've got to write the code or you're writing the code, then you're debugging and then trying to fix it and all the rest of it. 

So over time, these techniques do get burned. If we look to the future, 10 years from now, is it going to be just like War of the Machines? You've got AI that are creating detection rules at such a rapid pace, and you've got AI attackers who are launching attacks, and it just becomes, well, who can be quicker? Exactly. Yeah, that's it.

AI, ultimately, it's a tool. Yeah, just looking at the, I guess, the question, for lack of a better word, where is AI taking security? It's a fantastic tool in the hands of a white hat, but that tool is just as fantastic in the hands of a black hat. Anything that we're able to access using AI in terms of unlocking speed and data processing and so forth, a black hat can do just the same with whatever they're trying to come up with. 

Ultimately, it's like everyone's just been put on the fast track. It's hard to argue it's a better tool in our hand than it is in theirs when it's the same tool. That being said, where the human element, I guess, comes into it is, like you said, writing the code to launch a brand new attack that no one's ever heard of before. 

There's people out there who are saying, all right, what kind of damage can we do to people's accounts, to people's private details, their finances and so on. I'm sure they're just as excited as we are about AI peeking through log files to find real threats. I guess the big advantage is that when a new threat does come in, if we're able to identify it quickly, that's where you can do a lot of damage mitigation. 

That's more or less automated a lot more now than it has been in the past and will continue to grow in the automation side of things, which should hopefully help the good guys. Help the blue teamers. That's it. 

Should we be looking, if I'm looking at better ways of doing things for threat detection, should I be looking at neural nets or generative AI, something else? The way that I imagine it, and just to preface this, I'm not going to say I'm right, but just the way that I imagine it is more like passing log files through a massive machine learning algorithm and then returning a score on each line. I'm going to make up some terms here. A result of a matrix that says, number one, how likely is it that this is an actual threat versus a user error? For example, a user putting in their wrong password a few times from within the building that they're working in is probably just someone forgot their password. 

That same user putting in their wrong password a few hundred times from the other side of the world, that's more likely to be something, I guess, malicious. If you pass all of those logs through a machine learning script that's going to give a score, how likely is it that this is a threat and how serious of a threat is this? And then you can sort those scores and say, give me all the scores greater than 0.9 or 0.8 or whatever you deem your threshold to be, I suppose. I guess the bigger your security team, the lower you can set that number. 

You can say, we're going to investigate anything that's a five out of 10 or higher. This is very much aligned to a tool that I'm using at the moment called Hayabusa. It goes through Windows event logs, and then, based on specific rule criteria, will rate them as critical, high, medium, and low.
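[Note: a hedged sketch of the score-and-threshold idea described here: a classifier assigns each parsed log line a threat probability, and only lines above the team's chosen cut-off get surfaced. Features, data, and threshold are all invented for illustration.]

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)

# Hypothetical features per log line: [failed_attempts, foreign_geo, odd_hour]
X_train = rng.integers(0, 2, size=(500, 3)).astype(float)
X_train[:, 0] *= rng.integers(1, 300, size=500)  # turn 0/1 into attempt counts
y_train = ((X_train[:, 0] > 100) & (X_train[:, 1] == 1)).astype(int)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

THRESHOLD = 0.9  # a bigger security team could afford to lower this
new_lines = np.array([[3.0, 0.0, 0.0],     # forgot-my-password noise
                      [250.0, 1.0, 1.0]])  # the one worth investigating
scores = clf.predict_proba(new_lines)[:, 1]
for features, score in zip(new_lines, scores):
    if score >= THRESHOLD:
        print(f"investigate: score={score:.2f}, features={features.tolist()}")
```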

(47:40 - 48:24)
Nice one. And it's actually not AI-based. So I can only imagine what could be augmented if that went to the next level. 

So that's good. But you're on the right track if that's what they're doing already and you didn't know about that tool then. Yeah. 

I mean, I've seen similar sort of tools at work. I mean, just to highlight, I'm not part of the security team. I'm in the development team. 

We sit next to each other and occasionally we'll cross over a bit with, hey, what are you working on? And show each other a bit of that sort of stuff. We need more of this. We need developers and cyber to be chatting more often and collaborate.

(48:25 - 48:55)
That's awesome. That's so good. Do you think that was strategic by management? I think it's partly organic and partly strategic. 

I mean, I should also say we're a smaller team. I think whenever we get somebody new come in, or if we have a visitor, they say, oh, wow, this is everybody. And they're shocked at how few of us there are. 

But we're a very experienced team. I've been working in the development side of things now for nine years, and I'm the new guy. Yeah, right.

(48:59 - 49:23)
So, I'm just backing up a little bit to what you said. I think it's very important for development and security to cross over. Because obviously when I'm developing something, if a user asks for a new function in the system, I'm focused on delivering the outcome that the user wants, getting the data that I need and doing whatever processing needs done with it and outputting it to wherever the user is asking it to go.

(49:24 - 49:31)
I can hear the cyber guys right now just going, oh, developers, this is a problem with developers. That's it. That's it.

(49:34 - 49:50)
You know, I think that's my job: to get the user the output they're after. Cyber guys would be the complete opposite. They'd say, no, no, stop everything, ideally, put everything offline, and we'll go back to pen and paper.

(49:51 - 49:54)
They're not like that. Come on. Okay, I'm exaggerating.

(49:54 - 50:00)
We just want to ring fence it. We just want to make sure it's wrapped up in cotton wool. And you can go on the big bad internet.

(50:01 - 50:30)
Oh, that's it. That's it. But I think when users come to talk to me about what they want, I've got to say the number one complaint that I typically get is, why is this blocked? Why do I have to use an MFA code there? Why does a password have to have all these characters? And you're trying to explain to them a little bit about how cybersecurity works and how these tools are very effective at what they do.

(50:31 - 51:00)
I think MFA is a big one at the moment because it's been around long enough that there's starting to be a few security risks or vulnerabilities around it. It's not a silver bullet anymore, which we used to think it was. Exactly. 

It's common enough that a lot of people know it's annoying. I have to get this email, or I have to get this text, or this app, or whatever. And they know how to complain about it.

(51:00 - 51:32)
But it's also old enough now that some of the guys with a malicious intent are working out ways to get that code and impersonate you. One thing I always stress to frontline users when they're asking about this stuff, I talk about recycling passwords being a serious thing. It's not just one of those things you hear about. 

You don't want to recycle passwords. Exactly. Yeah. 

Sorry. Yeah. Just to be clear, do not recycle passwords.

(51:33 - 51:49)
Throw them out. And I think a lot of people, they take that seriously when it comes to things that they know are sensitive, like your online banking passwords, for example. But then for things that they consider more everyday, there's nothing too secure there.

(51:50 - 54:06)
They tend to just use the same weak password for every other account. And I say to them, have you got a really secure email password? And they'll usually um and ah. And I'll say, so if somebody hacks into some account that you're holding and they get your password, and then they try that password on your email and they get into your email, what happens now? And a lot of the time they'll say, oh, there's nothing in my email too sensitive.

It's not that big of a deal, but I'd want to change it. And I say, so if in that moment they get into your inbox, then they go to your online banking and say, I forgot my password. Can you email me a link to reset it? And they're in your email. 

They get that link to reset your online banking password. And they go, oh yeah, I might go change my email password. And it's like this stuff that you hear about cybersecurity, I think a lot of us these days have heard solid security advice. 

And people don't realize how important it is to listen to, because there's all these scenarios that you don't think of. Once you realize why somebody's telling you, use a multi-factor authenticator or use strong passwords, you sort of go, oh yeah, I should use that security stuff. And maybe these guys doing the security work for me are actually protecting me pretty well. 

And it's probably not such a big deal that this website's blocked. The security guys are actually keeping us safe. That's right. 

It's a balance between, sometimes it's kind of like with the greatest of respect. It's kind of like parents and their children and having to say, my classic example my dad gives is, don't touch that, it's hot. And then they touch it and they burn themselves and they go, well, why did that happen? It's like, well, I explained to you that it was hot, but you continued to touch it. 

So you now are facing the consequence. Similar thing with the user awareness piece that we have to engage in from like a phishing training, user awareness training, annual cyber training that people have to do. Exactly.

(54:07 - 54:19)
Without that context. And we don't want to, as a cyber team, we don't want to just be saying, it's blocked, you can't have this, go away. It's better for people to understand the context.

(54:19 - 58:22)
We're all adults in organizations where these problems are being faced. So it's give the people the actual reason. If they want to know, some people don't care. 

They just, yeah, well, I want access and I need access for this. And well, cyber, you're just being difficult, but they don't have the full context. They don't see all the incidents. 

And what you were talking about before with the enablement of MFA, I'm actually doing a talk on this next month at work. It's hard to say if this is becoming more popular or if it's just that the algorithm is feeding me more content because I've already shown interest in it. But there's the rise of info stealers, or info stealing malware, where people's credentials are being stolen, and it happens in a couple of different ways.

So you've got drive-by downloads, you go to a website and you receive malware, which is kind of similar to a watering hole attack. If every day I go to the Sydney Morning Herald, then using malvertising, malicious advertising, a bad ad comes up and delivers malware. Or in the case that I'm going to be talking about next month, if you do a Google search looking for some software, what looks like the real domain pops up as an advertisement, but they've done a bit of a typosquat or they've malformed the URL so that it just has one letter that's changed.

But at a quick glance, it's in the middle of the word, the way the human brain works, you kind of glance over it, your brain stitches the pieces together and you go, yeah, that's exactly what I was searching for. You click on the link, download the file and once it's executed, within seconds, it actually runs a PowerShell script on your machine, steals all your session cookies, grabs some internet history, a whole bunch of other profile information, autocomplete data from your browser and dumps it off to a web server somewhere, then cleans up after itself and it makes it actually quite difficult to find. And that's how they're bypassing MFA. 

They're just actually stealing session cookies now. And some cookies don't expire for a long time for different websites. So it's good as a start, but certainly not the be all and end all. 

And I think this is where we're going to start getting into, I want to talk about AI ethics. I'm sure it's ethical, but I want to get your intellectual and academic angle on this one. Surely it's ethical for us to be using AI to look for quicker methods of detection and prevention, right? There's no ethical considerations that we need to be worried about using AI to do this.

As long as the data that we have to upload is sanitized, and we're not looking up, how can I protect Bridget Jones' personal information? Her data has just been stolen. We kind of have to anonymize it and use those use cases, right? The ethical problem is on the side of the attacker who is using AI to generate new methods of breaking into organizations to steal their data and steal session cookies. Well, it's a funny one, because my gut is agreeing with you, Clint, right now, saying yes, it is absolutely ethical; any tool that you can possibly get your hands on to protect the privacy and the data of your user or your client is worth using.

(58:22 - 1:00:23)
And that being said, when it comes to, I guess, just saying every tool helps, you want to make sure that you're using good quality tools. You don't want to be using some rubbish application that has its own security loopholes in it, so that if you install application ABC, now there's a backdoor for somebody to get into your system. There's that.

Yeah, for sure. Yeah. So if you were a small business and you do a Google search and you go, AI privacy protection, and... That's a brilliant angle, right? It's a different concept.

It's not just about using ChatGPT to be like, how can I protect my users? It's also buying off-the-shelf tools, COTS, commercial off the shelf. Yeah. Yeah.

Okay. It's a very unique angle, John. I knew you'd have one. 

I think with ethics, I think there's very, very, very rarely a black or white answer. I think it's almost always some degree of gray. A good example that I learned about was a parent who's trying to make sure that their child is safe online. 

They install a software to keep track of where their kid's signing up for accounts and what websites they're visiting, just as an example. They could look at history, but let's just say, as an example, they're using a program to track their kid's online activity to keep their kid safe. Which we know is the most rudimentary and you should really implement a proxy server at home if you want a modicum of protection and logging, but go home. 

Just to take a really, really simple case. That most... It's ethical here though, John. You can go all in, go deep dive.

(1:00:24 - 1:06:28)
Almost anyone would say what this parent is doing to protect their child is ethical. But then you could argue, hang on, is this a breach of privacy? Exactly. And when do you become... Do you need to reach the age of majority? I mean, is there ever a point where protection is less valuable than privacy, or more valued than privacy? Absolutely.

I think that's a difficult one to answer and that's why I tend to say just about everything is somewhere in the gray. It's rarely black or white. I think... Yeah, that's hard. 

In terms of, I guess, my real world job, just for a little bit of background there, I work in the medical industry and I see patient data, massive amounts of patient data at work. And I guess the number one focus in literally every single thing we do at my job is to achieve the best health outcome for the patient. That being said, if you were to come along and say the easiest way to get this result to the doctor is to hire a skywriter and write the results up in the sky. 

Yeah, sure. The doctor just has to look up and they've got that patient's results. But now you've got a little bit of a privacy breach on your hands because anyone else that looks up can see it. 

Yeah. Again, you come back to the balance of, okay, how do we avoid this privacy breach? How do we go to the absolute furthest degree possible to avoid the privacy breach? And for that, you could say, well, the doctor and the patient have to travel with four forms of photo ID to the laboratory, provide a DNA sample to prove that they are who they say they are, and only then will we release the results. And so now, great. 

We know the result only went to the right person, but it was not really a fast way to get the result there. And as a result, the patient health outcome has suffered. And so you obviously, to take it away from those two really extreme examples, you sort of come to a point of going, okay, well, what is reasonable in terms of getting a result to a doctor for the patient to be treated as quickly as you can, while not opening up unnecessary security risks and just finding means of communication between different systems that allow for that secure transmission of health information? I think just to go back to the question on the ethics of AI and security, I think AI is a tool. 

It's not necessarily a good thing or a bad thing, much like a knife. A knife in the hand of a chef is a fantastic thing. They can make me an awesome dinner, and I love that. 

Or the exact same knife in the hands of a maniac could cause a catastrophic outcome. So you can't take that and say the knife is a good thing or the knife is a bad thing. It's just the knife is a thing, and it's the actions of the tool handler that determine whether it's a good outcome or a bad outcome. 

And then I guess to loop that back to the parent looking after their kids' online safety, the action, it was committed with good intent, but does that make it a good action or not? Well, that's a bit of a difficult one to answer. Does the intention outweigh the action, or does the action outweigh the intention? The way I see it, you just have to issue a disclaimer, and this is an internet access policy, and if you want access to the internet, then you agree that you are under surveillance. Right? It seems silly, but just full disclosure. 

You've accessed my internet. I'm the parent. The cost of entry, I guess. 

It's the same at work. Everyone has corporate policies. You're on our system. 

We monitor everything. We can see everything you do. We might use SSL inspection. 

We might use other methods, whatever. Yeah, 100%. Transparency, it bridges a huge amount of gaps. 

You tell people, hey, this is what we're doing, and this is how, and this is why. And then you give the user the power to make the decision. I mean, we see it even with advertising these days. 

You'll sign up in the terms and conditions that will say, we're going to track what you're doing so that we can give you targeted advertising. Do you agree to this? Sure. You want the software, you agree, and then you install uBlock Origin, or Pi-hole on a Raspberry Pi, to block it at the DNS level.

To be honest, I actually like the targeted advertising. If, let's say a social media platform sees that I'm interested in a particular type of music, and a band that I like is coming to town, and they give me an ad, I'll be like, hey, sweet, I'm glad that I know about this concert now. I like ads that are like that. 

That's good, and it's relevant. Yeah, yeah, exactly. So, I mean, because of their transparency, saying, hey, we want to watch what you do and then give you ads that relate to it, I'm like, yeah, cool, have a look at what I do. 

A lot of, I guess, security comes from your own insignificance. No, we cannot go with security by obscurity. I'm not important.

Lies. Maybe this is me being a little bit careless. Well, mate, thanks again for coming on. 

I will let you go and have dinner. My pleasure, mate. Great to have a chat, and thanks for having me on the podcast.

(1:06:29 - 1:06:39)
It's been fun. Anytime. If anyone wants to reach out to talk to you about AI or development, is that something you'd be interested in? I'm bad at replying.

(1:06:41 - 1:07:11)
All right, well, that's good. You're just a super celebrity now, so we won't link to your LinkedIn. Yeah, that's great. 

I mean, feel free to link my LinkedIn if anyone wants to have a chat. Apologies in advance if I'm slow to reply. I tend to be pretty busy with work and study, but I will do my best to check my messages. 

Fantastic. All right, thanks again, John. It's great having you on. 

All the best. Take care. Yeah, look after yourself, and I'll speak to you soon. 

Take care, man. Bye.
