The blue-team challenge

Ask anyone who has worked with a security operations center (SOC) and they will tell you that noisy detections (false positives) are one of its biggest challenges. Many companies have tried to solve this problem, but virtually all attempts have come up short. This article proposes a better solution using artificial intelligence (AI) and machine learning (ML), one that remains easy to understand.

To understand the challenge facing blue teams, the defenders charged with identifying and responding to attacks, it helps to recognize that almost any detection or indicator falls into one of two buckets: signature-based or anomaly-based.

Signature-based detections

Signature-based detections take forms like these:

  • Look for a running process named “mimikatz.exe”
  • Look for 50 failed logins in less than 60 minutes

Signature-based detections are trivial for attackers to circumvent in most cases. Using the two examples above, an attacker might rename a malicious mimikatz.exe executable to notepad.exe to avoid detection. Similarly, an attacker who generates only 30 failed logins per hour stays under the radar because the alerting threshold is 50.
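The two signatures and their evasions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `LoginWindow` type, and the choice to alert at exactly 50 failures are hypothetical and are not taken from any particular SOC product.

```python
from dataclasses import dataclass

# Hypothetical signature definitions, mirroring the two examples above.
BLOCKED_PROCESS_NAMES = {"mimikatz.exe"}
FAILED_LOGIN_THRESHOLD = 50  # failures per 60-minute window (assumed inclusive)


@dataclass
class LoginWindow:
    """Failed-login count for one user in one 60-minute window."""
    user: str
    failed_logins: int


def process_name_alert(process_name: str) -> bool:
    """Alert only when the process name exactly matches a known-bad name."""
    return process_name.lower() in BLOCKED_PROCESS_NAMES


def failed_login_alert(window: LoginWindow) -> bool:
    """Alert only when the failure count reaches the fixed threshold."""
    return window.failed_logins >= FAILED_LOGIN_THRESHOLD


# The signatures fire on the textbook cases...
print(process_name_alert("mimikatz.exe"))             # True
print(failed_login_alert(LoginWindow("alice", 50)))   # True

# ...and miss the same activity once the attacker adapts.
print(process_name_alert("notepad.exe"))              # False (renamed binary)
print(failed_login_alert(LoginWindow("alice", 30)))   # False (throttled guessing)
```

Both checks are exact comparisons against a value the attacker can learn, which is why a rename or a slower attempt rate is enough to slip past them.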

The effectiveness of signature-based detections depends heavily on the breadth of coverage and on keeping secret what is being monitored. A non-technical analogy would be laying a field with tripwires and landmines; if the attacker knows the locations of your defenses, they can successfully navigate through them.

Anomaly-based detections

The second bucket is anomaly-based detections. Anomaly-based detections don’t rely on signatures but instead look for things that aren’t normal. Using the two examples above, anomaly detections might be something like:

  • Look for uncommon running process names
  • Look for statistically unusual volumes of failed logins

These anomaly detections are more difficult for attackers to circumvent but have challenges of their own. Specifically, just because something is anomalous doesn’t make it malicious.

For example, a quarterly backup can look statistically similar to data exfiltration. If a defender makes these anomaly detections too sensitive, they are bombarded with noise. If they set the thresholds too high, they risk missing attacks.
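One common way to express "statistically unusual" is a z-score against a historical baseline. The sketch below is a minimal illustration of that idea and of the sensitivity trade-off, not a description of any specific product. The z-score technique, the function names, and every number in it (daily outbound gigabytes for a single host) are made-up assumptions for demonstration.

```python
from statistics import mean, stdev


def z_score(history: list[float], observed: float) -> float:
    """How many sample standard deviations `observed` sits above the mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Flat baseline: any deviation at all is infinitely unusual.
        return 0.0 if observed == mu else float("inf")
    return (observed - mu) / sigma


def is_anomalous(history: list[float], observed: float,
                 threshold: float = 3.0) -> bool:
    """Flag values more than `threshold` standard deviations above normal."""
    return z_score(history, observed) > threshold


# Hypothetical baseline: outbound GB per day for one host over a week.
baseline = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3]

backup_day = 40.0   # benign quarterly backup (illustrative value)
exfil_day = 38.0    # data exfiltration (illustrative value)
busy_day = 1.5      # slightly busier than usual, also benign

# The detector cannot tell the backup from the exfiltration: both are flagged.
print(is_anomalous(baseline, backup_day), is_anomalous(baseline, exfil_day))

# Too sensitive: an ordinary busy day now raises an alert (noise).
print(is_anomalous(baseline, busy_day, threshold=2.0))

# Too lax: the exfiltration is no longer flagged (missed attack).
print(is_anomalous(baseline, exfil_day, threshold=250.0))
```

The threshold is the only dial available here, and no setting of it separates the backup from the exfiltration, because the volume alone says nothing about intent.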

Over the years, companies have tried to solve this problem by aggregating these indicators. Examples include:

  • A vendor that aggregates first-time events such as “the first time a user logged on from a foreign country,” “the first time a user set up a scheduled task,” and “the first time a user sent 1 GB of data.”
  • Assigning points to indicators and looking at those that accumulate the most points.
  • Mapping indicators to an industry standard (e.g., MITRE) and identifying actors that are exploiting multiple tactics/techniques.
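The point-based approach from the list above can be sketched as follows. The indicator names and point values are hypothetical, chosen only to echo the first-time events mentioned earlier; a real product would define its own indicators and weights.

```python
from collections import defaultdict

# Hypothetical point values per indicator (assumed for illustration).
INDICATOR_POINTS = {
    "first_foreign_logon": 30,
    "first_scheduled_task": 20,
    "first_1gb_transfer": 40,
}


def score_actors(events: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Sum points per actor and return actors ranked from highest score down.

    `events` is a list of (actor, indicator) pairs. Unknown indicators
    contribute zero points.
    """
    totals: dict[str, int] = defaultdict(int)
    for actor, indicator in events:
        totals[actor] += INDICATOR_POINTS.get(indicator, 0)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


events = [
    ("alice", "first_foreign_logon"),
    ("bob", "first_scheduled_task"),
    ("alice", "first_1gb_transfer"),
    ("alice", "first_scheduled_task"),
]

ranked = score_actors(events)
print(ranked)  # [('alice', 90), ('bob', 20)]
```

Analysts would then review the actors at the top of the ranking first. The weakness is that the point values are hand-assigned, so the ranking is only as good as the guesses behind those weights.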

But advances in computing have made a better way possible. Artificial intelligence and machine learning solutions are well within reach and less complicated than you might believe. To demonstrate this, we’ll pivot to an example that isn’t a cybersecurity problem.