Hunting down Sandworm and Wizard Spider: How ESET fared in the ATT&CK® Evaluation

The results of the latest round of the MITRE Engenuity ATT&CK® Evaluations are out. This time the evaluation was modeled against the Wizard Spider cybercrime and the Sandworm nation-state groups. ESET is a pioneer of research on Sandworm, with some of the most significant discoveries made about this threat group under our belt – but more on that later.

First, let’s take a brief look at the evaluation methodology and, most importantly, how ESET Inspect (formerly ESET Enterprise Inspector) fared.

Evaluation overview

As stressed by MITRE Engenuity, “these evaluations are not a competitive analysis” and thus “there are no winners”. The evaluations are a professionally executed, transparent, and objective snapshot of an endpoint threat detection tool’s capabilities to detect some of the adversarial behaviors demonstrated by the threat groups selected as the theme of the current round. At the same time, there are certain areas – crucial in real-world defense – that are out of the scope of the evaluations.

Some key parameters that the evaluations do not consider include performance and resource requirements, alerting strategy, noisiness (alert fatigue – any product could obtain a very high score on most of these results by producing alerts on every action recorded in the test environment), integration with endpoint security software, and ease of use.

The reasoning behind this is that organizations, security operations center (SOC) teams, and security engineers all have different levels of maturity and different regulations to comply with, along with a host of other sector-, company-, and site-specific needs. Hence, not all the metrics given in the ATT&CK Evaluations have the same level of importance to each evaluator.

To analyze the evaluation results properly, it’s important to understand the methodology and a few key terms. The detection scenarios consisted of 19 steps (10 for Wizard Spider and 9 for Sandworm) spanning a spectrum of tactics listed in the ATT&CK framework, from initial access to lateral movement, collection, exfiltration, and so on. These steps are then broken down to a more granular level: a total of 109 sub-steps. The version of ESET Inspect that supports Linux machines had not yet been released at the time of the evaluation, so the Linux-related steps and sub-steps were out of scope for ESET. That means 15 steps and 90 sub-steps were evaluated in ESET’s case. The MITRE Engenuity team recorded the responses and level of visibility at each sub-step for each participating solution.

The results were then combined into various metrics, essentially based on the solution’s capability to see the behaviors of the emulated attack (Telemetry category) or to provide more detailed analytical data (General, Tactic, and Technique categories). For more details, read MITRE Engenuity’s documentation on detection categories.

ESET’s evaluation results

The results are publicly available here.

Out of the 15 applicable steps in the detection evaluation, ESET Inspect detected all steps (100%). Figures 2 and 3 illustrate the different types of detection per step.

Figure 2. Wizard Spider scenario: detection types per step

Figure 3. Sandworm scenario: detection types per step

Breaking the attack emulation down to a more granular level, out of the 90 applicable sub-steps in the emulation, ESET Inspect detected 75 sub-steps (83%). Figures 4 and 5 illustrate the different types of detection per sub-step.

Figure 4. Wizard Spider scenario: detection types per sub-step

Figure 5. Sandworm scenario: detection types per sub-step

As the results indicate, ESET Inspect provides defenders with excellent visibility into the attacker’s actions on the compromised system throughout all attack stages.

A key metric for SOC analysts trying to understand what’s happening in their environment is analytics: additional context such as why the attacker executed a specific action on the system. ESET Inspect provided this extra information for 69 of the 75 detected sub-steps (92%).

Note that ESET did not participate in the Linux part of the evaluation as the new version of ESET Inspect with Linux support was publicly launched only on March 30, 2022, completing our coverage of all major platforms alongside Windows and macOS.

Linux detections aside (note that ESET’s ecosystem does provide endpoint protection for Linux – but this was outside the scope of this evaluation), ESET Inspect did not identify 15 out of the 90 sub-steps.
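The figures quoted so far can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in this article; the variable and function names are our own, chosen for illustration, and percentages are rounded to the nearest integer.

```python
# Counts quoted in the article for ESET Inspect (Linux steps excluded).
TOTAL_STEPS = 19
TOTAL_SUBSTEPS = 109
APPLICABLE_STEPS = 15
APPLICABLE_SUBSTEPS = 90
DETECTED_STEPS = 15
DETECTED_SUBSTEPS = 75
ANALYTIC_SUBSTEPS = 69


def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


summary = {
    "linux_steps_out_of_scope": TOTAL_STEPS - APPLICABLE_STEPS,
    "linux_substeps_out_of_scope": TOTAL_SUBSTEPS - APPLICABLE_SUBSTEPS,
    "step_coverage_pct": percent(DETECTED_STEPS, APPLICABLE_STEPS),
    "substep_coverage_pct": percent(DETECTED_SUBSTEPS, APPLICABLE_SUBSTEPS),
    "analytic_share_of_detected_pct": percent(ANALYTIC_SUBSTEPS, DETECTED_SUBSTEPS),
    "missed_substeps": APPLICABLE_SUBSTEPS - DETECTED_SUBSTEPS,
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that the analytics percentage is taken relative to the 75 detected sub-steps, not the 90 applicable ones.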

Nearly all of those “misses” are due to ESET Inspect not monitoring certain API calls. API monitoring is a tricky business due to an unfavorable signal-to-noise ratio. Considering the enormous number of API calls present in a system, monitoring all of them is neither feasible nor desirable as this would be an enormous hog on resources.

To provide an example, one of the missed sub-steps (10.A.3) pertains to detecting the CreateToolhelp32Snapshot API call, which is commonly used in legitimate applications for process enumeration. This sub-step precedes an attempt to inject malicious code into a process. ESET Inspect takes the more efficient strategy of detecting this process injection, putting focus on the less frequent and more suspicious action.
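The signal-to-noise argument can be made concrete with a rough calculation. The sketch below compares how many alerts each strategy would raise and what share of them would be malicious if every occurrence triggered an alert. All counts are hypothetical values picked for illustration; they are not measurements from the evaluation or from ESET Inspect, and the `Signal` class is our own construct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str
    benign_per_day: int     # hypothetical count of legitimate occurrences
    malicious_per_day: int  # hypothetical count of attack-related occurrences

    def alerts_per_day(self) -> int:
        """Alerts raised if every occurrence of the signal triggers one."""
        return self.benign_per_day + self.malicious_per_day

    def precision(self) -> float:
        """Share of those alerts that are actually malicious."""
        total = self.alerts_per_day()
        return self.malicious_per_day / total if total else 0.0


# Illustrative numbers only (assumptions, not data from the evaluation).
enumeration = Signal("process enumeration API call", 1_000_000, 5)
injection = Signal("process injection", 20, 5)

for s in (enumeration, injection):
    print(f"{s.name}: {s.alerts_per_day()} alerts/day, precision {s.precision():.4%}")
```

Under these assumed numbers both signals catch the same five malicious events, but the rarer signal does so with a tiny fraction of the alert volume.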

This is in no way trying to say that API monitoring doesn’t have its place on a defender’s checklist – ESET is continuously evaluating scenarios where it makes sense to stay vigilant and add detection capabilities – yet in some cases this provides very little additional benefit at a very high cost.

The key principle when designing an effective extended detection and response (XDR) solution, and this applies to endpoint security software as well, is balance. In theory, it’s easy to create a solution that achieves 100% detections: simply detect everything. Of course, such a solution would be next to useless, which is precisely why traditional endpoint protection tests have always included a metric for false positives; a true comparative test cannot be done without testing for them. This is also the reason why security analytics platforms typically suffer from a high false positive rate and have heavy resource demands. Such platforms will have to contend with XDR solutions to remain relevant.

Yes, the situation is a bit different with XDR compared to endpoint protection (because you can monitor or detect without alerting), but the principles still apply: too many detections create too much noise, leading to alert fatigue. This increases the workload for SOC analysts, who have to sift through a large number of detections, and leads to the exact opposite of the desired effect: it distracts them from genuine high-severity alerts. In addition to the increased human workload, too many lower-importance detections also increase costs due to higher performance and data storage requirements.

The essential role of a good XDR solution is not necessarily to alert the analysts to every single procedure carried out during an attack (or sub-step in the ATT&CK Evaluation). Rather, it should alert them that an attack took place (or is ongoing) and afterward support the investigation by providing the capability to navigate transparently through detailed and logically structured evidence of what happened in the environment and when. This is a functionality that we continue to put great emphasis on in developing ESET Inspect.
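The idea of alerting once and keeping the remaining events as evidence can be sketched in a few lines. This is a simplified illustration under our own assumptions, not a description of how ESET Inspect works internally: the `Detection` type, the three-level severity scale, and the `build_incidents` function are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    host: str
    timestamp: int   # seconds since the start of observation (hypothetical)
    severity: int    # 1 = informational, 2 = warning, 3 = threat (hypothetical scale)
    description: str


def build_incidents(detections, alert_severity=3):
    """Group detections per host and alert once per host whose worst
    detection reaches alert_severity."""
    by_host = defaultdict(list)
    for d in detections:
        by_host[d.host].append(d)

    incidents = {}
    for host, items in by_host.items():
        if max(d.severity for d in items) >= alert_severity:
            # Lower-severity events are kept as chronological evidence
            # instead of being raised as separate alerts.
            incidents[host] = sorted(items, key=lambda d: d.timestamp)
    return incidents


# Made-up sample data for illustration.
detections = [
    Detection("ws-01", 30, 1, "process enumeration"),
    Detection("ws-01", 10, 2, "suspicious script launched"),
    Detection("ws-01", 45, 3, "process injection"),
    Detection("ws-02", 5, 1, "process enumeration"),
]

incidents = build_incidents(detections)
for host, timeline in incidents.items():
    print(f"ALERT on {host}: {len(timeline)} related events")
    for d in timeline:
        print(f"  t={d.timestamp:>3}s severity={d.severity} {d.description}")
```

In this sample, the host with only an informational event raises no alert, while the other host produces a single alert backed by a three-event timeline.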

Figure 6 shows the detection of sub-step 19.A.6 – an attempt to spread NotPetya from an infected machine via WMI. Figure 7 depicts the attempt detected from the other side – the targeted machine.

In addition to alerting the SOC analysts to the malicious activity, additional contextual information is provided, including detailed command line parameters executed by the adversary, and the execution chain and process tree – highlighting further related events that were suspicious or clearly malicious.

An explanation of the observed behavior is provided, along with a link to the MITRE ATT&CK knowledge base, and the typical causes for this type of behavior, both malicious and benign. This is especially helpful in ambiguous cases where potentially dangerous events are used for legitimate purposes due to the organization’s specific internal processes, which are for the SOC analyst to investigate and distinguish.

Recommended actions are also provided, as well as tools to mitigate the threat by actions such as terminating the process or isolating the host, which may be done within ESET Inspect.

Wizard Spider and Sandworm

ESET has done extensive research on both groups that were the inspiration for this evaluation round.

Wizard Spider is a financially motivated group behind TrickBot and the infamous Emotet malware, which has often led to the deployment of ransomware such as Ryuk. These have been some of the most active botnets and thus in the sights of ESET’s automated Botnet Tracking service for some time. We have been closely monitoring the development of these threats and our systems have uncovered (and blocked) new variants, configurations, and command and control domains even before they were used in an attack against potential victims.

Our effective defense against Emotet apparently caused significant frustration to Wizard Spider. In addition to implementing protection for users of our security solutions, our extensive threat intelligence on this group helped in the coordinated TrickBot takedown effort that crippled its activity for several months.

Sandworm is an advanced persistent threat (APT) group notorious for its high-impact attacks against Ukraine and other countries. The US Department of Justice and the UK National Cyber Security Centre attributed the group to the Russian GRU. ESET’s research has played a pivotal role in uncovering the group’s activities.

In a Virus Bulletin 2014 talk, ESET first identified the work of Sandworm as that of the BlackEnergy subgroup, and pointed out the references to Dune in its code that inspired the group’s name. We have tracked this group’s activity from its inception (around the time of the beginning of the war in Donbas and the occupation of Crimea in 2014), later uncovering both of its attacks against the Ukrainian power grid: the BlackEnergy-facilitated attack in 2015 and the Industroyer attack in 2016.

Most pertinent to this evaluation round were our discoveries around NotPetya in 2017, as the second detection scenario specifically emulates this faux ransomware. ESET researchers were the ones who linked NotPetya to TeleBots, a subgroup of Sandworm, and uncovered patient zero of what eventually became the most costly cyberattack in history: the supply-chain compromise of M.E.Doc.

For a more detailed overview and timeline of Sandworm’s attacks in Ukraine and around the world, and for information on the recent cyberattacks we detected around the 2022 Russian military invasion of Ukraine, refer to our blogpost, podcast and webinar.


We are happy to see that the rigorous MITRE ATT&CK Evaluation demonstrated the qualities of our XDR-enabling technology and validated the vision and roadmap we have for ESET Inspect looking forward.

It’s important to keep in mind that the development of a high quality XDR solution cannot be a static undertaking. As adversary groups change and improve their techniques, so must XDR and endpoint protection platforms keep pace to continue protecting organizations from real-world threats.

And that’s exactly the case with ESET Inspect: it is not a solution whose development is disconnected from active threat research. On the contrary, the same experts who track the world’s most dangerous APT groups and cybercriminals also ensure that ESET Inspect’s rules are effective and capable of detecting malicious activity on targeted systems.

The threat intelligence that is a product of our research is used to improve our security solutions and is also offered to customers as part of our premium threat intelligence offering in the form of private reports and data feeds covering technical, tactical and strategic threat intelligence. This is on top of our publicly available research, for example, on the subject of the Ukraine crisis.

ESET Inspect is just one of the components in our comprehensive cybersecurity portfolio, designed to deliver reliable protection against cyberattacks. ESET Inspect is an integral part of ESET’s multi-layered security ecosystem, which includes strong endpoint security, cloud-based protection, machine learning-based detection technologies, and ESET LiveGrid® telemetry coming from a user base of tens of millions of endpoints (which, among other benefits, allows ESET Inspect to factor into its decisions the reputation of binaries and processes).

ESET believes this unified approach to delivering security solutions is absolutely crucial, because while it’s important to have great visibility into an attack executed in your network, it is much more important to be able to spot and recognize it among a myriad of events, or even better, to prevent it from happening at all.

For more information about ESET’s participation in the ATT&CK Evaluations, visit our page here.