April 1, 2024 saw Microsoft Copilot for Security reach general availability (GA). It is a generative AI solution that integrates with Defender XDR, Entra, Purview, and Intune. Just over a month later, it’s time to write down some thoughts.

In cybersecurity, we face the challenge of scarce resources — time, finances, attention, will — to identify, protect against, and respond to threats and vulnerabilities.

There’s an old joke. One economist asks another, “How’s your wife?”. The other economist replies, “Compared to what?”

To properly answer the question “How’s Copilot for Security?”, we need to think similarly: “How’s Copilot for Security compared to the alternatives that consume similar resources to achieve similar ends?”

This article is an attempt to get you thinking about that question.

First, I’ll explain costs because, as touched on earlier, everything goes back to cost.

Then, I’ll run through my experience of using Copilot for Security so you can see how it performs against tasks you may attempt. My area of focus is mostly pre-incident security: architecture, gap analysis, and so on. Copilot for Security is marketed as a solution for “end-to-end scenarios such as incident response, threat hunting, intelligence gathering, and posture management” [ref], so this use case is in scope, albeit not one that occupies most of the material I’ve seen online.

Part of the feedback I’ve heard is that generative AI is all about how good your prompting skills are, and Mona Ghadiri has contributed thoughts on how to get better at this.

Finally, concluding thoughts — current state, how you should approach a purchasing decision, and thoughts for the future of Copilot — will wrap up this article.

Table of Contents

Costing Copilot for Security

Copilot for Security (sometimes referred to as just Copilot) is not licensed; rather, it is billed as a resource. That’s because Copilot consumes a new type of Azure resource: the Security Compute Unit (SCU). SCUs are an abstraction of the compute resource required to power Copilot. At the time of release and writing, one SCU costs 4USD/hour.

Figure 1 – Security Compute Units in the Azure portal

Unlike how you may deallocate an Azure virtual machine to reduce cost, there is currently no native way to deprovision SCUs when not in use.  You would have to delete the Azure resource instead, which means recreating it when required again. In one scenario I did this, it took a long time from recreating the compute capacity to Copilot allowing me to run prompts again.

Microsoft recommend customers have three SCUs. Were these to run 24×7, it would cost over 100,000USD/year.
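The arithmetic behind that figure is simple: one SCU at 4USD/hour is 4 × 24 × 365 ≈ 35,040USD/year, so the recommended three SCUs running continuously come to roughly 105,120USD/year.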

But how do I know how many SCUs I need running?

This isn’t easy to predict initially. There is no mapping of number of prompts to SCU consumption because even similar AI prompts may not reliably use the same level of compute resource. After some use, you can leverage the Copilot usage monitoring page to roughly anticipate how many SCUs you consume, and therefore the capacity you need to provision and the associated costs.

Figure 2 – Usage monitoring in Copilot for Security

When considering costs for Copilot, we must consider everything included, beyond the obvious generative AI. Purchasing SCUs also unlocks Microsoft Defender Threat Intelligence (MDTI). MDTI is targeted at SOCs and MSSPs — which may tell you about the target audience for Copilot — and is a paid threat intelligence offering that sits above the threat analytics feature licensed with Microsoft Defender for Endpoint (MDE) Plan 2. MDTI provides data regarding attack groups and tools, indicators of compromise, and integration of these with Microsoft Defender XDR and Sentinel (and now, Copilot).

Hands on experience with Copilot for Security

Let’s run through a few things I threw at Copilot for Security to see how it performed. If you’ve followed me on X/Twitter, you may have seen the spoilers for this.

When you first start Copilot at securitycopilot.microsoft.com, you choose or create an Azure resource group for the SCU resource, as seen in Figure 1. Copilot provisions with two RBAC roles: contributors and owners. By default, Global Administrators and Security Administrators (Entra roles) are owners, and everyone is a contributor. This means everyone can use Copilot provided they already have sufficient permission elsewhere. For example, it won’t supersede a standard user’s permissions; you’ll still need permission in Defender XDR, Purview, and so on.

Still, you may want to reduce the scope from everyone. You cannot throttle Copilot per user, so while determining your average SCU requirements, or at least during your pilot phase, I would recommend controlling access to the contributor role with an Entra group.

Figure 3 – Copilot role-based access control

The figures displayed so far are all from securitycopilot.microsoft.com, also known as the standalone experience. This is a UI dedicated to prompting against the data that Copilot can query. There are also embedded experiences within the traditional portals of Entra, Defender XDR, Purview, and Intune. The embedded experience is similar to Copilot as you’d experience it in Microsoft 365 apps and Edge: pop-ups and panes over the pages you’re used to.

In the standalone experience, you’re greeted by Daily tips. The first one I saw reminded me of generative AI’s propensity to hallucinate (politely described as fabrication in most AI services): “Fact-check, fact-check, fact-check. To catch Copilot’s fabrications, try probing the references it cites. Even those might be made up. Quotes? Same. Dates? Yep.”

Figure 4 – Daily tip in Copilot for Security

It’s a valid and important tip, but also subtly droll. Have you ever been quoted over 100,000USD/year for another cybersecurity resource (tool or employee), where the very first thing you’re told is not to trust the validity of its output?

Daily tip acknowledged, the first actual prompt I test Copilot with against my tenant is: “In Entra ID, how many of my users have added a FIDO2 security key authentication method?”

Figure 5 – Copilot for Security’s prompt dialogue

The prompt wasn’t clear enough, I’m told, so I try another that leans on Defender XDR and Intune: “Can you let me know all the devices used by Ruairidh Campbell in Intune and/or Defender, and any vulnerabilities those devices have based on Defender Vulnerability Management data?”

Figure 6 – Testing Copilot for Security to query Defender and Intune inventory

I’m told there are six associated devices, but the response then goes on to show only four of them. There is no information about the vulnerabilities.
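For comparison, the underlying data is reachable directly in advanced hunting. A minimal sketch of the kind of query I was hoping Copilot would reason over, assuming Defender Vulnerability Management data is flowing and using a hypothetical account name, might look like this:

    // Devices the user has recently logged on to (the account name is hypothetical)
    let userDevices = DeviceInfo
        | where Timestamp > ago(30d)
        | where LoggedOnUsers has "rcampbell"
        | distinct DeviceId, DeviceName;
    // Known vulnerabilities on those devices, per Defender Vulnerability Management
    DeviceTvmSoftwareVulnerabilities
    | join kind=inner (userDevices) on DeviceId
    | project DeviceName, CveId, VulnerabilitySeverityLevel, SoftwareName, SoftwareVersion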

Let’s see if a third test will improve our initial impressions. Maybe if it won’t acknowledge Defender Vulnerability Management, it will acknowledge the MDTI we’re now getting: “Based on Microsoft Defender Threat Intelligence, what are the main risks you perceive for my tenant, and what would be the best use of my time for defending against those main risks?”

Figure 7 – Copilot for Security suggests malvertising is the main risk in a tenant

Copilot suggests malvertising is the biggest risk I face despite, as it goes on to explain, my tenant having zero misconfigured, vulnerable, or impacted assets and devices. It would be a downright bizarre call for a human to make, and even on the machine-driven data available, there are no metrics suggesting Copilot should assume it is the biggest risk.

Thinking back to the daily tip, I ask it to justify its answer, and try to lead the witness: “Why did you suggest malvertising, when I have zero impacted, misconfigured, or vulnerable devices? Are there any others, such as maybe AITM or token theft that may be more important or likely?”

Figure 8 – Copilot for Security identifying tenant threats

Copilot this time suggests the Fortinet FortiClient is a potential threat: there are three misconfigured devices. Interestingly, I do not have FortiClient installed on any devices, so while the misconfiguration may be related, it is unlikely to be a high-priority item, at least as it relates to this CVE.

In the next prompt, I ask Copilot to write me a KQL query to identify requests to the domain http://twitter.com in Defender logs. It uses the UrlClickEvents table and provides a button to Go hunt in Microsoft 365 Defender (the old name for Microsoft Defender XDR).

Figure 9 – Copilot for Security generating KQL

Fair enough. One of the things you need to be when prompting any AI system is as specific as possible. So, in the absence of a specific type of request, it assumed I only care about Defender for Office 365 Safe Links clicks. Ideally, it would have assumed all types and also given me endpoint visits. So, I ask it for those.

Figure 10 – Copilot for Security generating more KQL

Unfortunately, that doesn’t return any results because it didn’t trim the http:// prefix; doing this manually made it work. So, it provided a good start.
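For reference, a working version of what I ended up with looked roughly like the sketch below (a minimal example assuming the standard advanced hunting tables; the time range and projected columns are purely illustrative):

    // Safe Links clicks on the domain (the assumption Copilot made)
    UrlClickEvents
    | where Timestamp > ago(30d)
    | where Url has "twitter.com"
    | project Timestamp, AccountUpn, Url, ActionType

    // Endpoint network connections to the same domain, with the scheme trimmed
    DeviceNetworkEvents
    | where Timestamp > ago(30d)
    | where RemoteUrl has "twitter.com"
    | project Timestamp, DeviceName, RemoteUrl, InitiatingProcessFileName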

If you ask the free Microsoft Copilot (formerly under the Bing branding) the same thing, you get a different KQL result, one that only projects the most relevant columns.

Figure 11 – KQL generation in the free Microsoft Copilot

When I returned to Copilot some time later, I asked some more detection and response questions. When asked how to identify SharpHound activity in MDE, it provided a query that likely wouldn’t achieve the intended objective.
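For context, the kind of query I was hoping for leans on behaviour rather than a single indicator. A rough sketch, assuming SharpHound’s session and host enumeration shows up as one process making SMB connections to an unusually large number of hosts in a short window (the threshold is arbitrary and would need tuning for your environment), might be:

    DeviceNetworkEvents
    | where Timestamp > ago(1d)
    | where RemotePort == 445
    | summarize DistinctTargets = dcount(RemoteIP) by DeviceName, InitiatingProcessFileName, InitiatingProcessAccountName, bin(Timestamp, 10m)
    | where DistinctTargets > 50
    | order by DistinctTargets desc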

Figure 12 – Copilot for Security suggesting a SharpHound activity query

One of the highly marketed features of Copilot for Security is its ability to translate scripts to natural language. This would be great, as it can take a long time to understand exactly what a script, obfuscated or just complicated, is doing.  In my experience, Copilot for Security continually timed out with a PowerShell example I commonly use. After four or five attempts, I gave up.

Figure 13 – Copilot for Security analyses an obfuscated script

Because I’m interested in value for money, I return to the free Microsoft Copilot and copy and paste the same script. It immediately responds with some good info, albeit not quite hitting the nail on the head. Still, this was free and gave me something where the paid Copilot for Security gave me nothing.

Figure 14 – Microsoft Copilot translating an obfuscated script to natural language

I ask if it can help me hunt for any user activity of any kind that is outside the UK and on a non-compliant device. I’m not a KQL expert, but I don’t think its suggestion will get me far.
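What I had in mind was something closer to the sketch below. It assumes the AADSignInEventsBeta table is available in advanced hunting in your tenant and treats sign-in activity as a proxy for “activity of any kind”; the field names and the GB country code are worth verifying against your own schema:

    AADSignInEventsBeta
    | where Timestamp > ago(7d)
    | where Country != "GB"
    | where IsCompliant == false
    | project Timestamp, AccountUpn, Application, Country, City, IPAddress, DeviceName, IsCompliant
    | order by Timestamp desc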

Figure 15 – Copilot for Security generating KQL to hunt

A heavily marketed area I did have success with is the natural language summarisation of Microsoft Defender XDR incidents. The summaries were easy to read and accurate.

Figure 16 – Copilot incident summary in Microsoft Defender XDR

Right, you’ve seen my results. Maybe some of the less ideal results came down to poor prompting, so one thing that comes to mind is: how can a SOC optimise its prompts to get the best out of Copilot? For that, Mona Ghadiri has contributed the next section of this post.

Mona’s prompt recommendations

T9 texting vs keyboard texting

I have seen a lot of classes on how to learn prompting or prompt engineering, and I decided that instead of taking a class, I would take notes as I learned and share them with you. My first thought, honestly, was that prompt creation feels like T9 texting, charged per letter like we were back in 2004. However, it’s not like that at all. The opportunity is that the closer we stay to a formula, the more predictable costs become and the easier it is to judge what makes a good versus a bad prompt. We don’t have to be as reductive as we were with T9 texting at all!

Be Formulaic!

Microsoft suggests starting with this kind of formulaic approach to prompt engineering.

What I realized as I was learning is that this lends itself very well to variable-based thinking, just like we do in code. What if, instead of trying to build out multiple prompts from scratch, we treated each prompt like security as code and built it with variables rather than hard-coding values in the prompts themselves, and, as we wanted more, created child versions or added additional variables or exclusions as we saw fit? (There’s a sketch of what this could look like after the list below.)

This solved a few problems for us:

  • Versioning/continuous improvement was way easier
  • Applying the same prompt to different user spaces was much easier
  • Engineers could grab the latest prompt they needed instead of hand-developing new prompts, which carried an unpredictable cost per user.
  • I couldn’t get to a cost per prompt quite yet, but within a month or so I should have much better predictive metrics around usage.
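To make that concrete, here is a purely hypothetical example of a variable-based prompt template and one instance of it (the variable names and values are illustrative only, not a Microsoft feature):

    Template: Summarise all {Severity} alerts involving {UserOrDeviceGroup} over the last {TimeRange}, and list any related {DataSource} findings with recommended next steps.
    Instance: Summarise all high-severity alerts involving the finance device group over the last 7 days, and list any related Defender Vulnerability Management findings with recommended next steps.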

There are other things I learned are worth considering when building a prompt/prompt library:

  • There is a trade-off between complexity and processing time, and customers want to be able to quantify that somehow
  • Most of the Microsoft out-of-the-box promptbooks are between 4 and 7 prompts. These generally execute in a timely fashion, but that depends on the SCUs you have provisioned. When you only use out-of-the-box prompts, cost estimation is easier, but the quality of the answer is less customizable
  • Defining and refining prompts in real time, when you need them, is quite difficult and can be costly; it’s better to define 2–4 custom ones and then roll them out for general usage, like a development sprint
  • Sticking to a methodical, formula-based prompt workflow meant we could better predict costs.

Concluding thoughts

Security isn’t easy, and hiring experienced personnel isn’t cheap.  AI — across vendors — is promoted as a way of reducing costs and overhead because, otherwise, there’s no point.  In the case of Copilot for Security, we can therefore only assess whether it is worth it based on the cost and efficiency improvements it delivers.

The current suggested model, which expects at a minimum 1 SCU to run 24/7, sets the cost benchmark at around 35,000USD/year.  Based on the recommended 3 SCUs, it comes to nearer 100,000USD.  Those are high benchmarks and may be challenging to justify, at least compared to other cyber defence spend.  You might be able to afford a new member of staff (a fully grown human being!) or invest in a project to harden your environment, like Application Control or tiering or ongoing posture management.  There’s a lot you could do with six figures.

But it doesn’t have to cost you 100,000+ USD.  Thanks to its Azure resource-based nature, you can experiment at low cost to start: you could delete the SCU resource in Azure entirely and recreate it when needed.  This manual on-demand model makes more sense: when a requirement for Copilot arises, that’s when we spend, likely saving a lot of money.  An issue I had with this was that when adding additional SCUs after hitting the limits, I had delays of 15+ minutes before I could run prompts again. If you are using Copilot for time-critical scenarios, like incident response, that could get stressful.  There is no native way of performing on-demand provisioning and deprovisioning (down to or up from 0), but community members have developed solutions.

There are at least two questions we must consider with any product or service purchasing decision, including Copilot:

  1. What alternatives — proven, ideally — may better improve my intended outcomes at similar cost?
  2. If a vendor, provider, or reseller evangelises a solution, how specifically are they using it or suggesting it can help, and what real-world evidence have they provided for those specifics?

You know your environment, resources, budget, skills, and requirements better than any generic recommendation can account for.  You need to make up your own mind about whether Copilot for Security is right for you, now, given its strengths and weaknesses. But please apply the above two questions at every point of your thought process.

Long term, I’m a Copilot for Security optimist.  When (and I’m assuming it’s when) it can be provisioned and billed on-demand and starts to reason over wider data in the Microsoft Graph, I think it’ll be a game changer.  Alternatively, it could be licensed like Copilot for Microsoft 365 so that costs are predictable. Gap analysis, joining dots, attack path identification: these are all challenging across Entra, Intune, Defender, and Purview.  As the run-through demonstrated, it’s not quite there yet with identifying gaps (the example of users without a registered security key failed), but when we get to that point, you could get a huge return on your investment compared to the time and cost of scripting or manually assessing.  And that — proactive identification and minimisation of weakness — is what moves your security needle.