Red-teaming AI with CounterFit
It’s an all-out offensive on today’s episode while we talk about how the best defense is a good offense. But before we plan our attack, we need to know our vulnerabilities, and that’s where our guest comes in.
On this episode, hosts Nic Fillingham and Natalia Godyla are joined by Will Pearce, who discusses his role as AI Red Team Lead in the Azure Trustworthy ML Group and how he works to find weaknesses in security infrastructure to better develop ways to defend against attacks.
In This Episode You Will Learn:
- The three main functions of Counterfit
- Why the best defense is a good offense
- Why Will and his team aren’t worried about showing their hand by releasing this software as open source
Some Questions We Ask:
- What previously developed infrastructure was the Counterfit tool built upon?
- How does AI red teaming differ from traditional SecOps red teaming?
- How did the Counterfit project evolve from conception to release?
Will Pearce’s LinkedIn
AI security risk assessment using Counterfit
Nic Fillingham’s LinkedIn:
Natalia Godyla’s LinkedIn:
Microsoft Security Blog:
Security Unlocked: CISO Series with Bret Arsenault
[Full transcript can be found at https://aka.ms/SecurityUnlockedEp31]
Nic Fillingham: (00:08)
Hello and welcome to Security Unlocked, a new podcast from Microsoft, where we unlock insights from the latest in news and research from across Microsoft security engineering and operations teams. I'm Nic Fillingham.
Natalia Godyla: (00:20)
And I'm Natalia Godyla. In each episode, we'll discuss the latest stories from Microsoft security, deep dive into the newest threat intel, research and data science.
Nic Fillingham: (00:30)
And profile some of the fascinating people working on artificial intelligence in Microsoft security.
Natalia Godyla: (00:36)
And now let's unlock the pod.
Nic Fillingham: (00:41)
Hello listeners, and welcome to episode 31 of Security Unlocked. Natalia, hello to you. Welcome.
Natalia Godyla: (00:46)
Hello, Nic. Happy to be here. Uh, what do we have on the docket for today?
Nic Fillingham: (00:50)
Today we have Will Pearce joining us. Will Pearce is the AI red team lead inside the Azure Trustworthy Machine Learning Group. Eager listeners of the podcast might recognize Will's name from a couple of episodes back where we had Ram Shankar Siva Kumar come on the podcast and mention Will a few times. Will is here to talk to us today about a blog post that he co-authored with Ram Shankar Siva Kumar on May 3rd, discussing the announcement of a new AI security risk assessment tool called Counterfit. And this is a great conversation, a sort of fascinating project here, and his job is about trying to break into our AI systems and compromise them in order to sort of make them, make them safer, make them better. And so we're gonna say that word, we're gonna say this word red teaming quite a bit in the interview, and for those that may not be super familiar with the concept, we thought we might just sort of revisit it. Natalia, you've, you've got a good definition there, w- walk us through what does red teaming mean?
Natalia Godyla: (01:47)
And so red teaming originated in the military as a way to test strategies by posing as an external force. The US force would be the blue team, the defenders, and the red team would be someone that is trying to infiltrate the United States, and that same concept is now applied to security. So red teaming is that training exercise to determine where are the gaps in your security strategy.
Nic Fillingham: (02:11)
Right. And so in this context here, with regards to the Counterfit tool, Will just had a bunch of scripts that he had built himself just to sort of do his job. These are scripts he built for himself, and at some point Will talked about in the interview how he decided to pull them together into a toolkit and create a sort of an open source project that's now available up on GitHub, so that other AI red team folks, uh, really anyone who's out there trying to make AI systems more secure through red teaming can benefit from the work that Will's done. Natalia, some of the things that Counterfit can do, obviously we'll hear from Will in just a second, but what's your summary?
Natalia Godyla: (02:45)
I mean, there's so many different ways you can use this tool for offensive security. So you, you can pen test and red team AI systems using Counterfit, you can do vulnerability scanning, and you can also log for AI systems. So collect that telemetry to improve your understanding of the different failure modes in AI systems.
Nic Fillingham: (03:07)
Well, this is a great conversation with Will Pearce. I think you'll enjoy it. On with the pod.
Natalia Godyla: (03:11)
On with the pod. Today, we are joined by Will Pearce, an AI red team lead from the Azure Trustworthy ML Group to talk about a blog post called AI Security Risk Assessment Using Counterfit. Welcome to the show Will.
Will Pearce: (03:29)
Thank you. Thanks for having me.
Natalia Godyla: (03:31)
Awesome. Yeah. We're really excited to talk about Counterfit, and I think it'd be great to start with a little bit of an intro. So could you share who you are, what your day-to-day is at Microsoft?
Will Pearce: (03:40)
Yeah. Yeah. As you mentioned, Will Pearce, I'm the red team lead for the Azure Trustworthy Machine Learning team. My day to day is attacking machine learning inside Microsoft. So building tools, doing research and going after machine learning models wherever they live inside Microsoft.
Natalia Godyla: (03:59)
And Counterfit is a tool that helps with that, correct? Could you share what Counterfit is?
Will Pearce: (04:05)
Yep. Yeah. So Counterfit is a command line application that helps me automate these assessments. So there was sort of a lot of data processing that can go into them, and it was taking a lot of time, and so I sort of built this command line application to take care of it. I come from the ops world, so traditional red teaming, you know, where you kind of hack networks. And so sort of the command line interface, that malware interface is what I was used to, but in the machine learning world, a lot of the tools are libraries, so they're not really readily available for you to automate things. And so I just kind of married the two together into a tool that basically wraps existing frameworks.
Nic Fillingham: (04:47)
Will, I'd love to step back just to speak to you. So you are the AI red team lead, tell us about AI red teaming or AI ML red teaming, how does that differ from sort of traditional SecOps red teaming?
Will Pearce: (05:00)
In a lot of ways it doesn't. Machine learning is a new sort of attack surface that is coming up, like as businesses integrate machine learning into all kinds of things, the security of machine learning hasn't really been paid attention to. But you know, machine learning is part of a larger system, it's still an information asset, the model files still exist on a server. They're put into websites, all the normal stuff. And so a lot of those skills transfer, you know, one-to-one, the difference being having that, that knowledge of how machine learning algorithms work, how you can bend them, how you can alter your inputs to get the outputs that you want, and a lot of it, a lot of the attacks are really just kind of engineering to get to that point.
Nic Fillingham: (05:46)
And the types of specialists that you have on an AI red team versus again, a sort of, sort of more, more generalist, uh, SecOps red team. Do you have data scientists and do you have other statisticians and other folks that maybe have a different set of skills?
Will Pearce: (06:01)
Yep, absolutely. So we have a couple of members on the team that are extremely experienced data scientists and ML engineers. So basically blending of those skillsets, you know, where I don't have that formal background, but I do understand how sort of attacks work and, you know, how to run an op. They understand how the algorithm works at a, a very deep level, and so we, we have a lot of fun going back and forth brainstorming ideas.
Natalia Godyla: (06:32)
So bringing this back to the Counterfit project, how did the Counterfit project evolve? As I understand it, it started as a group of attack scripts, and, and now it's an automated tool. So what did that process of evolution look like?
Will Pearce: (06:49)
So earlier I mentioned all these things are libraries and-
Natalia Godyla: (06:53)
Will Pearce: (06:53)
... you know, I've been at Microsoft for nine months-ish. And coming from that ops role, it just wasn't scalable. So to write a script for every attack that you wanted to do-
Natalia Godyla: (07:04)
Will Pearce: (07:05)
... isn't scalable. So the first thing, just natural to want that tool, that malware type interface was to build, was to wrap these into a single tool that you could run any attack script that you wanted in, in an automated fashion. That was that, it was, it was just a need for an automated tool for my own purposes and it kind of evolved into this. Truth be told, I didn't necessarily think it was gonna be as popular as it was.
Natalia Godyla: (07:29)
Will Pearce: (07:30)
Yeah. I wrote it because I needed it, not because, you know, we wanted to release it, but it has kind of taken on a life of its own at this point where, you know, I don't do more bug fixes than I do attacks, but I could see in the not too distant future we would need a dev to like take care of the day-to-day maintenance of it, or, you know, build in whatever features we wanted for it.
Nic Fillingham: (07:55)
And did nothing exist here in this space, Will, was there, was there nothing that allowed for the automation of, of the work that you were doing and that's why you sort of built it, or did something exist, but the modifications that would have been necessary to meet your needs would have been sort of too laborious?
Will Pearce: (08:10)
I shouldn't say nothing existed 'cause I don't... There was nothing that, you know, for example, data types, right? Like you have texts, images, NumPy, or, or arrays of numbers, things like that. A lot of the tools only focus on one of those data types or two let's say, right? But there's a wide variety of models at Microsoft that I need to test. And so having something that can do text, audio, image, any arbitrary data type is extremely valuable, and that was sort of the first step. It was just having a need, I didn't wanna use five different tools, you know, I wanted to use one, and so that was kind of the, the driver for me to build it.
Nic Fillingham: (08:53)
And I noticed, uh, Will it's been published through GitHub. So is the intent here for it to be a true sort of community initiative, community project and, and have contributors and, and sort of a, a vibrant community?
Will Pearce: (09:05)
Yeah, absolutely. Yeah, that's the plan. Ram will tell you I'm not the best data scientist, so this is the blending of offensive security and machine learning, right? And data science. And so there are just conventions in the data science world that I'm not familiar with, similarly, there are conventions in the offensive security world that data scientists aren't familiar with. So moving forward, Counterfit becomes a Metasploit of sorts for these machine learning algorithms, where people feel welcomed to submit new research, um, and to really become a platform for the conversation between machine learners and security people to evolve, start to understand each other and what matters to the other.
Natalia Godyla: (09:51)
And are you also continuously updating the tool, so as you learn more adversarial attacks against AI, will you be feeding that into the product, and what does that process look like?
Will Pearce: (10:04)
Yeah, yeah, absolutely. So it exists on algorithms, right?
Natalia Godyla: (10:09)
Will Pearce: (10:09)
Uh, attack algorithms. So an algorithm basically iterates on an input in a particular way, right? And that's how it, you kind of create that output that you want. So there's that piece, is just creating new algorithms that will do whatever we think is useful for the particular task. But there's also things like a web interface that would be extremely nice for some users or, you know, just some niceties that aren't built in yet. It's still somewhat difficult to look at the results of a scan or the samples of the scan. And so, so some of those things still need to be built in, but yeah, that's kind of the plan is to build any, you know, someone could submit a feature request tomorrow and we would probably build it the next day just because we're excited to see what people do with it and what they care about with it.
Nic Fillingham: (11:05)
So Will, if we could jump forward into, I think the three core functions or the three use cases of this tool as they're sort of listed out in the blog here for those that have read the blog post. So the first one is listed out as penetration testing and red teaming AI systems, and the, the tool here is preloaded with published attack algorithms, which can be used to, to test out evading and, and stealing AI models. We've had a bunch of your colleagues, uh, and peers on the podcast before, so we've learned a little bit on the podcast here about adversarial ML. We know that it's sort of a new frontier, we know that the vast majority of organizations out there don't have anything in place to protect their AI systems. Can you tell us a bit about this first scenario here? So evading and stealing AI models, what does that sort of look like in a hypothetical sense or in the real world, and then how do we use this tool to sort of test against it?
Will Pearce: (11:59)
Let me go backwards a little bit in your questions.
Nic Fillingham: (12:01)
Will Pearce: (12:02)
So you mentioned that organizations don't have the tools to protect these systems.
Nic Fillingham: (12:08)
Will Pearce: (12:08)
That's only partly true, only because machine learning, the model itself is a very small part of that whole system, but there's a very mature information security presence around principles of least privilege, setting up servers, deploying endpoints. Like we know exactly, there are very mature security processes that can already be attached to these things, the difference is because machine learning people aren't clued in to this, the security apparatus at a higher level, they're not aware that these things exist, right? So you're looking at ML engineers who are responsible for deploying an endpoint to, uh, you know, let's say a public site, but they're not aware that maybe the way they're deploying it, you know, they, they put secrets in the code or, or whatever. And that's kind of what this is about, is it is about marrying of traditional information security principles and this new technology, machine learning.
Will Pearce: (13:07)
So in terms of evading a model, I mean, what that looks like is basically you have a model that is responsible for taking input and making a decision based on that input. So the classic example is images, but, you know, if you think about an authentication system, you know, where it uses your face, you know, Windows Hello, maybe there is a different face that would also work on it. So evading a model is basically just giving an input such that you get the output that you want. So in the traditional information security sense, it would be like bypassing a malware classifier, bypassing a spam filter, so that's how you get your phishing.
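The idea of evasion described above, giving an input such that you get the output you want, can be sketched with a toy example. This is only an illustration and not Counterfit's code or any published attack algorithm: the keyword-weighted filter, its weights, the threshold, and the substitution table are all invented, and the attacker is assumed to be able to query the score.

```python
# Toy stand-in for a deployed spam filter. The weights and threshold are
# invented for illustration; a real target would be a remote model.
SPAM_WEIGHTS = {"free": 4, "winner": 5, "prize": 3, "urgent": 2}
THRESHOLD = 5

def spam_score(text):
    """Sum the weights of known spam words in the text."""
    return sum(SPAM_WEIGHTS.get(word, 0) for word in text.lower().split())

def is_spam(text):
    return spam_score(text) >= THRESHOLD

# Hypothetical synonym table the attacker is willing to use.
SUBSTITUTIONS = {
    "free": "complimentary",
    "winner": "recipient",
    "prize": "gift",
    "urgent": "timely",
}

def evade(text, max_queries=20):
    """Greedy evasion: try each single-word swap, keep the one that lowers
    the score the most, and repeat until the filter stops flagging the text
    or the query budget runs out."""
    words = text.split()
    queries = 0
    while is_spam(" ".join(words)) and queries < max_queries:
        current = spam_score(" ".join(words))
        best = None
        for i, word in enumerate(words):
            replacement = SUBSTITUTIONS.get(word.lower())
            if replacement is None:
                continue
            candidate = words[:i] + [replacement] + words[i + 1:]
            score = spam_score(" ".join(candidate))
            queries += 1
            if score < current and (best is None or score < best[0]):
                best = (score, candidate)
        if best is None:
            break  # no remaining swap helps
        words = best[1]
    return " ".join(words)

original = "urgent winner claim your free prize"
evaded = evade(original)
print(is_spam(original), is_spam(evaded))
```

The loop is the same feedback pattern the attack algorithms rely on: change the input, observe the output, keep what moves the result in the desired direction.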
Will Pearce: (13:43)
Stealing is, it's basically turning machine learning on its head. So it's just reflecting the model back at itself. So all you do is you send in, you grab a dataset from online, there's a ton of them, for example, like an email data set. So let's say you're a spam filter. I did some research like before I got to Microsoft, it was a spam filter. In their email headers, they leaked their spam scores. So you'd send an email and you'd get one back, and in the headers it would be like 900.
Nic Fillingham: (14:12)
Will Pearce: (14:13)
I recall it's interesting. And it was in every email. So what we did is we grabbed a big data set of emails, like the Enron data set, and we just sent every single email, every single Enron email through this spam filter, and we collected the email we had already. And then for each email, we just collected the score, right? And then we just trained a local model to mimic the spam filter, and using that, we were able to sort of reverse that spam filter and figure out what words the model thought were bad and what words the model thought were good.
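The workflow described above (send a corpus through the target, record the leaked score for each email, then build something local that mimics it) can be sketched roughly as follows. This is a simplified illustration under stated assumptions: `target_spam_score` is an invented stand-in for the remote filter, the five-email corpus is made up, and per-word score averages are used as a crude substitute for the trained local model mentioned in the conversation.

```python
from collections import defaultdict

def target_spam_score(email):
    """Invented stand-in for the remote filter that leaks a score in its
    headers. The attacker never sees these hidden weights."""
    hidden = {"lottery": 300, "wire": 200, "meeting": -100}
    return 500 + sum(hidden.get(word, 0) for word in set(email.lower().split()))

def steal_word_scores(corpus, query):
    """Query the target once per email, then estimate each word's influence
    as (mean score of emails containing the word) - (mean score overall)."""
    scores = [query(email) for email in corpus]
    overall = sum(scores) / len(scores)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for email, score in zip(corpus, scores):
        for word in set(email.lower().split()):
            totals[word] += score
            counts[word] += 1
    return {word: totals[word] / counts[word] - overall for word in totals}

# Made-up corpus standing in for a public email data set.
corpus = [
    "lottery wire transfer today",
    "meeting agenda for today",
    "lottery results for today",
    "wire the meeting notes",
    "lunch plans for today",
]

influence = steal_word_scores(corpus, target_spam_score)
for word in sorted(influence, key=influence.get, reverse=True):
    print(f"{word:10s} {influence[word]:+.1f}")
```

With so few samples the estimates are noisy, and words that happen to co-occur with genuinely "bad" words pick up some of their score, which is one reason a large corpus matters for this kind of attack.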
Will Pearce: (14:46)
And so Counterfit kind of automates that process. It gives you a framework in which you can put all that code into one place and then run that attack. The code we wrote for that particular attack, it was in like, you know, 15 different files, it was several different services. It wasn't pretty, or repeatable necessarily. And so Counterfit allows you to sort of aggregate all of the weird code that you might need and allow you to interface some target model with any number of algorithmic attacks, including, you know, model stealing.
Nic Fillingham: (15:22)
So I, I might've got this wrong Will, but, so if the goal is to stop adversaries from potentially stealing your model using this technique here where you, you'd basically grab a dataset, throw it at a, at a model, monitor the output and then go train your own model to mimic that. How does Counterfit help protect against that, or how does Counter- what kind of information or data does Co- Counterfit output to help you in that, in stopping model stealing?
Will Pearce: (15:49)
Um, (laughs) it, it doesn't.
Nic Fillingham: (15:51)
Will Pearce: (15:52)
Counterfit is an offensive security tool. (laughs)
Nic Fillingham: (15:55)
Will Pearce: (15:56)
So the primary piece being offense drives defense.
Nic Fillingham: (16:00)
Will Pearce: (16:01)
So using this tool in that particular way, you can then test, right? In any number of scenarios, before you deploy a model, you can scan it and you, after you deploy a model, you can scan it, but you start to develop benchmarks. So in traditional information security, when you have a vulnerability scan, right? You scan the entire network, you get your list of critical, high, medium, low vulnerabilities. You then go start checking, you know, patching, check it, and then you re-scan the next month. This is a similar function.
Natalia Godyla: (16:34)
So we talked through two of the use cases here, the pen testing and red teaming, and then you just touched on vulnerability scanning. Can you provide a little bit more color on how you intend security professionals to use it for logging, what's the, the purpose, the driver behind that use case?
Will Pearce: (16:54)
Yeah. So logging... (laughs) Going back to security foundations, currently machine learning, a lot of them don't log-
Natalia Godyla: (17:00)
Will Pearce: (17:02)
... or they, they don't explicitly log for the purpose of security. So they'll log telemetry data, they'll log usage data, but that doesn't feed any higher level security processes. So the Counterfit has logging built in where it will track every input and every output, just as you would, you would put a l- a logging mechanism behind a model where you would track every input and every output. So we've built it in here so organizations can get some form of logging during an attack, right? So they could then turn those logs into some sort of detection pipeline, some sort of ability to detect a particular attack, but ideally organizations would log, right? They're gonna be logging anyway. And so I think it, in a lot of ways, it's just about getting machine learning people to start thinking about these security motions in a consistent way. So if you're gonna collect logs, do it in a way that's repeatable (laughs) and consistent and gives you the information that you need to, to do whatever you need to do, whether it's, you know, telemetry data or usage data or w- whatever it is.
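The "track every input and every output" idea can be sketched as a thin wrapper around a prediction function. This is a minimal illustration and not Counterfit's logging implementation: the `with_logging` helper, the record fields, and the toy model are all assumptions made for the example.

```python
import io
import json
import time

def with_logging(predict, log_stream):
    """Wrap a prediction function so each call appends one JSON line
    containing a timestamp, the input, and the output."""
    def wrapped(x):
        y = predict(x)
        record = {"ts": time.time(), "input": x, "output": y}
        log_stream.write(json.dumps(record) + "\n")
        return y
    return wrapped

def toy_model(text):
    """Invented placeholder for a deployed model."""
    return "spam" if "prize" in text.lower() else "ham"

log = io.StringIO()  # an open file or a log shipper would sit here instead
model = with_logging(toy_model, log)

model("Claim your prize")
model("Lunch at noon?")

records = [json.loads(line) for line in log.getvalue().splitlines()]
print(len(records), records[0]["output"], records[1]["output"])
```

Writing one structured record per query keeps the logs repeatable and consistent, so the same records could later feed a detection pipeline, for example one that looks for bursts of near-duplicate inputs.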
Nic Fillingham: (18:11)
You know, you talked about a, a goal for Counterfit to sort of fit the nature of a Metasploit, and being, uh, a popular and, and powerful red teaming tool. What efforts are being made, or what's being done to ensure that this doesn't end up being an actual breach toolkit for adversaries? How do you toe that line of making a, a powerful tool for red teams who are ultimately trying to do good, and actually, you know, making it easier for adversaries to go out there and evade or steal models?
Will Pearce: (18:39)
I don't have a good answer for you. Well, I mean, in a lot of ways, you know, offense drives defense, right? So we think adversaries are gonna be doing this anyway. So in this way, if we can get a tool into people that make it easier for everybody (laughs) including adversaries, you know, we would hope that organizations would start putting mitigations in place for these things. If they see an uptick in attacks, they should do something about it, if they don't, then great, it's obviously not on the radar of attackers. And I would say currently it is not really on the radar of attackers.
Nic Fillingham: (19:19)
Well, not until this podcast comes out.
Will Pearce: (19:21)
Yeah, yeah. Exactly.
Natalia Godyla: (19:21)
Will Pearce: (19:22)
And so we're, yeah, I think we're maybe a little ahead of schedule just in terms of what this tool represents, and we might've missed the mark completely, right? Like we might be, we don't know if attackers are gonna go this route of attacking machine learning. There are certainly new attacks every year that come out, so the trend is up, but I think widespread abuse has yet to be seen, which I guess is the whole point here is to get ahead of that.
Nic Fillingham: (19:51)
Well, let me just recap to make sure I, I sort of understand this. So as someone red teaming and penetration testing AI machine learning systems, you had a lot of disparate scripts, a lot of disparate tools, a lot of disparate processes, you needed to bring them all together into a, into a single pane of glass, to use an overused, uh, analogy. So you created it first and foremost for you, then you realized it would be a powerful tool for, for others out there that are, that are trying to protect AI machine learning systems through red teaming, through, as you say, offense drives defense. Can you share any examples of how the, the tool, either the, the work that you've done in protecting ML systems at Microsoft or with customers or other projects, do you have any stories you can tell of how this tool has been used out in the wild and, and some of the things that it's done to help find vulnerabilities, help patch gaps? Yeah, what are some of the positive stories or positive outcomes?
Will Pearce: (20:42)
Yeah. I mean, in the wild, I don't think so. You know, it's like when I go back-
Nic Fillingham: (20:46)
Will Pearce: (20:46)
... to talk to my, my like traditional red team peers, for them, machine learning is still a meme in a lot of ways. So it's like they only hear about it in terms of, you know, they're only being sold it, right? Like they only see it in an EDR and it's like, okay, well, we've seen this story a million times. Like two years ago, it was application white listing. So it's gonna take, I think a little bit to get on board, but there are a couple of use cases. There's one we did with the expense fraud where you would take a receipt and you would change a digit to be more, right? So you would spend 20 bucks, you get a receipt for 20 bucks, but you'd change the two to a three, then you would net $10.
Will Pearce: (21:25)
There, in a lot of systems, there's still like a human in the loop, so a lot of engines will have like a rule that says, if this is below 90% confidence, send it to a human, otherwise just trust the machine learning algorithm. There's a number of different NLP models that we've gone through, uh, with this where you can, you know, make algorithms say racist things or impolite things, and you can basically force it to do that.
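The human-in-the-loop rule mentioned above (below 90% confidence goes to a person, otherwise the model's answer is trusted) can be written down in a few lines. This is a sketch only: the function name, return labels, and example values are invented, and the 0.90 threshold is taken from the figure quoted in the conversation.

```python
REVIEW_THRESHOLD = 0.90  # the "below 90% confidence" rule from the discussion

def route(prediction, confidence, threshold=REVIEW_THRESHOLD):
    """Send low-confidence predictions to a human reviewer and accept
    the rest automatically."""
    if confidence < threshold:
        return ("human_review", prediction)
    return ("auto_accept", prediction)

# A receipt total read by a hypothetical extraction model.
print(route("$30.00", 0.97))  # accepted without a person looking at it
print(route("$30.00", 0.62))  # queued for manual review
```

Under a rule like this, an altered input presumably only pays off for an attacker if the model still reads it with confidence at or above the threshold, since anything lower ends up in front of a person.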
Nic Fillingham: (21:56)
NLP is, uh, natural language processing?
Will Pearce: (21:58)
Mm-hmm (affirmative). Yeah. It's also neu- neuro linguistic programming-
Nic Fillingham: (22:03)
Will Pearce: (22:03)
... and I, I think it's natural language processing. (laughs)
Nic Fillingham: (22:04)
But it's, it's sort of, it's sort of the processing of written or spoken word?
Will Pearce: (22:08)
Yup. Yeah, exactly. So have you, I'm sure you might've heard of GPT-3, OpenAI.
Nic Fillingham: (22:11)
Yes, we have.
Will Pearce: (22:15)
Yeah. So there's, there's a couple things there with the, like that dataset for example. They pulled everything from the internet, right? And it's like as much public data as they possibly could, but it's like, just because it was public doesn't mean it should have been public. So there's a number, an amount of PII that you can pull out of GPT-3 that, you know, organizations might not be aware exists inside the model. A lot of models like will memorize training data, and so, you know, when you deploy like an NLP model to an endpoint and you don't realize this, if that model has PII in it, you know, you're kind of exposing it to whoever has access to that endpoint. And that's, that's a new challenge for sure.
Will Pearce: (23:02)
It also, you know, if you have PII saved in your model, like it's easy to say a database has PII, this falls within a particular compliance boundary, but when you say, this model has PII, where does that fall? Does it fall inside of that same compliance boundary? Security would say yes, but a lot of machine learning data scientists, they're not there yet. And so, you know, you might have a model that is deployed that is backed by this NLP system where you can pull PII from, and Counterfit kind of helps automate this and helps me, you know, play and tweak and, you know, figure out what I need to send to the model to get the output that I want.
Natalia Godyla: (23:45)
How do you coordinate with teams inside Microsoft to build a feedback loop? I'm, I'm assuming you're, as you said, tweaking along the way, and with your findings, you've discovered vulnerabilities or opportunities to evolve the way that we're handling our AI systems. How do you work with teams to better the process?
Will Pearce: (24:08)
Yeah. It's report writing. (laughs)
Natalia Godyla: (24:11)
Will Pearce: (24:12)
So sometimes we reach out, you know, there's a particular service we wanna go after, maybe it has a high impact, a high value to us, you know, maybe there's something that we, we wanna do 'cause we think for style points, so, you know, we wanna go after that. So we'll reach out and we'll contact PLC as like, hey, we're, as the trustworthy machine learning team we wanna attack your model, we'll give you a report. Other times we'd go into the Azure website and I just look at all the products that exist and I just provision them into my, into our own tenant and attack them from there, and then write the report and send it over.
Will Pearce: (24:50)
So it usually depends. If it's a production system, I usually provision it if I can, and go after it that way. If it's not quite there yet, or it's, you know, a high impact use case, you know, for example, the PII one that we just talked about, we'll work directly with the team and kind of set up an official project. We have like rules of engagement, you know, there's a cadence, and in the end it's a report that basically states what we did, recommendations that we have, and a kind of a, a pat on the back and-
Natalia Godyla: (25:23)
Will Pearce: (25:24)
... good luck, not good luck, but, you know, reach out if you need anything kind of thing. And I would say, yeah, it's been positive. I think it's really difficult to show impact. So in a traditional information security sense, getting domain admin, you know, it's an easy way to show impact. Dumping a database full of PII, you know, it's an easy way to show impact, but, you know, when you, uh, change an image to make a dog look like a cat, and then you're like, okay, see, this is possible? Like it's a harder sell and it doesn't quite hit home. So, you know, a lot of the work done is really just trying to show impact and give teams just an easy way to see the risks that exist-
Natalia Godyla: (26:11)
Will Pearce: (26:12)
... without having to, not dumb it down, but without having to resort to toy examples.
Nic Fillingham: (26:19)
So are there folks out there, Will, listening to this podcast hearing about the Counterfit tool who may not think of themselves as sort of the target audience for this, you know, protecting AI and ML systems is, is obviously still very niche and red teaming AI and ML systems, it sounds like even more so. Can you talk to us about some of the types of data scientists, security ops folks, what are some of the roles out there of people that should be taking a look at Counterfit and sort of thinking about the AI systems that might be in use in their organizations that need to be pen tested, vulnerability tested, logged, et cetera, et cetera, who, who needs to use this tool that maybe doesn't realize they need to use this tool?
Will Pearce: (26:58)
You know, really anybody using machine learning. But Microsoft has a mature information security program, a lot of places don't. So what this tool doesn't give is like, there's no model inventory, there's no tracking of assets. There's, there's none of th- those foundational security things that are, that would normally be in place, right? Like how do you know what to vulnerability scan? In a traditional environment you can either scan, right? You can just scan every internal IP address possible, you know, or you can pull it out of an asset inventory, right? Organizations for their models don't even have asset inventories yet. If there is a machine learning person who is wondering, you know, what is possible, you know, with this model, like what can I get it to do? Like those are the kinds of people, and it's just bringing it into their own process, their own machine learning development life cycle, and saying at the end of this, I'm gonna scan and see, see what's there.
Will Pearce: (27:53)
Or maybe they're the ones responsible for deploying models to a public endpoint, and they were like, you know what? Let's see what this thing kicks out, right? Let's, let's, let's see what Counterfit comes up with. We'll just point Counterfit at it, and if something falls out, like we'll deal with it then. But I don't know, from the security side, anytime you mention machine learning to security people, they, math, like they just don't wanna talk to you 'cause they assume machine learning means math.
Nic Fillingham: (28:19)
Will Pearce: (28:20)
And in a lot of ways-
Nic Fillingham: (28:20)
Will Pearce: (28:21)
... it does.
Natalia Godyla: (28:21)
Will Pearce: (28:21)
Yeah. And I, to be fair, I was maybe one of those people in the beginning, but I have always enjoyed like numbers and data and things like that. So this is kind of a, in some ways a dream, right? For me, because that's the things that I'm interested in. But I would say if there is an interest in data and numbers and watching what comes out, like it is a rabbit hole that just doesn't end, right? Like you can think of, I mean, in, in all the ways like attacks are, are just like this, like attackers need feedback, right? To, to be successful, and machine learning model is the same way. It's like you input data, you get output, and then you in the middle, there's some inference, there's some like black box that you have to like wonder what happens.
Will Pearce: (29:08)
And so I think in a lot of ways, security people already think that way. So for Counterfit, like if you have a product that you wanna bypass, if you have a spam filter you wanna bypass, like figure out how these, these algorithms that, you know, researchers built that you can use in your ops, and you'll find that fortunately, all the math is done for you and, and all you have to do is get your data in the right format and just let the math take care of itself.
Nic Fillingham: (29:39)
I wonder if you should make up some t-shirts or some stickers that say like, you know, just Counterfit it. Like should we verb-
Natalia Godyla: (29:45)
Nic Fillingham: (29:45)
... should we verb that now and then like put it all over the Black Hat conference and RSA and-
Will Pearce: (29:50)
Nic Fillingham: (29:51)
... get all the, get all the SecOps folks out there just, uh, just point Counterfit at it and see what happens.
Will Pearce: (29:56)
Yeah. Well, it's funny. So the spam filter attack that I mentioned earlier, the reason it's called Counterfit is because it is a, like a model stealing piece. So I think in some libraries like to fit a model is the term.
Natalia Godyla: (30:11)
Will Pearce: (30:12)
So it's like to Counterfit is to steal it.
Nic Fillingham: (30:15)
Very clever. I think you're, you're neck and neck with CyberBattleSim for-
Natalia Godyla: (30:19)
Nic Fillingham: (30:19)
... coolest, uh, ML tool name, uh, to come out, in, of, of Microsoft. Will Pearce, thank you so much for joining us on Security Unlocked today. Before we wrap, before we let you go, tell us where our listeners can go to learn more about this project and/or potentially follow you on the inter webs.
Will Pearce: (30:36)
You can go to, to get the tool, go to github.com/azure/counterfit, and I highly recommend the Wiki, and Docker and/or Ubuntu, or if you're brave, you can install it on Windows. And I am on Twitter @Moohacks, which is...
Nic Fillingham: (30:57)
Moohacks as in M-O-O or M-U? What's Moohacks?
Will Pearce: (30:59)
Uh, M-O-O... I can't remember if I have the underscore, on my Git I have Moohacks.
Nic Fillingham: (31:06)
All right. What will we find if we follow you on Twitter, or is that an NSFW question?
Will Pearce: (31:11)
No, it's mostly, uh, machine learning things... Well, it's a good mix I think. Machine learning and, uh, cybersecurity research that I like.
Nic Fillingham: (31:20)
Sounds good. All right. Well, Will Pearce once again, thanks for being on Security Unlocked.
Will Pearce: (31:23)
Yeah. Thank you very much.
Natalia Godyla: (31:25)
Well, we had a great time unlocking insights into security from research to artificial intelligence. Keep an eye out for our next episode.
Nic Fillingham: (31:36)
And don't forget to tweet us @msftsecurity, or email us at email@example.com with topics you'd like to hear on a future episode. Until then, stay safe.
Natalia Godyla: (31:47)