What does it take to build a purple team program that scales beyond spreadsheets and one-off fire drills? In this episode of Ahead of the Breach, we sit down with Gary Lobermier, Lead Adversarial Security Engineer at Northwestern Mutual, OSCP-certified red teamer, and one of the more hands-on practitioners building automation-driven offensive security programs in financial services today.

Gary came up through the IT ranks as a network admin, sysadmin, and IT manager before pivoting hard into offensive security. That background turns out to matter. He brings a practitioner's skepticism to the question every security team eventually has to answer: is our detection stack actually working, or are we just hoping it is?

Purple Teaming at Scale Is an Engineering Problem

Most purple team efforts stall at the same place: someone runs a handful of Atomic Red Team tests, generates a report, patches a few detection gaps, and calls it done. Three months later, an infrastructure change invalidates half those results and nobody knows.

Gary's team at Northwestern Mutual decided to treat the problem like software. Instead of running MITRE ATT&CK techniques manually on an ad hoc basis, they built a custom automation platform to schedule, deploy, and track the execution of hundreds of techniques across Windows, macOS, Linux, and AWS EC2 instances — every single day.

The goal isn't coverage theater. It's continuous validation: when your environment changes, you need to know immediately whether your detections still hold.
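To make the continuous-validation idea concrete, here is a minimal sketch of a daily run. Everything in it is hypothetical — the class names, the stub executor, and the result format are illustrative assumptions, not Northwestern Mutual's actual platform.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical sketch of one daily validation pass. A real platform would
# execute techniques on live hosts and check the SIEM/EDR for alerts; here
# the `execute` callable stands in for both steps.

@dataclass
class TechniqueTest:
    technique_id: str   # MITRE ATT&CK ID, e.g. "T1059.001"
    platform: str       # "windows", "macos", "linux", or "aws"

def run_daily_validation(tests, execute):
    """Run every scheduled test and record whether a detection fired."""
    results = []
    for t in tests:
        detected = execute(t)  # True if the detection pipeline alerted
        results.append({
            "date": date.today().isoformat(),
            "technique": t.technique_id,
            "platform": t.platform,
            "detected": detected,
        })
    return results

# Usage: a stub executor in place of real technique execution + alert lookup.
tests = [TechniqueTest("T1059.001", "windows"), TechniqueTest("T1053.003", "linux")]
results = run_daily_validation(tests, execute=lambda t: t.platform == "windows")
gaps = [r for r in results if not r["detected"]]
print(len(gaps))  # count of techniques that ran without triggering a detection
```

The value is in the loop, not the code: the same tests run every day, so a gap opened by an infrastructure change shows up the next morning instead of at the next pentest.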

Why Off-the-Shelf Tools Fall Short

There's no shortage of tooling in this space. Caldera, Atomic Red Team, and commercial platforms all exist for a reason. Gary isn't dismissive of them. But he's been direct about where they break down in practice: most tools validate surface-level techniques against a static environment. Real enterprise environments aren't static. New assets get spun up. Configurations drift. New EDR rules get deployed with behavior that doesn't match documentation.

When you need daily validation at scale across heterogeneous infrastructure, you eventually hit limits that general-purpose tools weren't designed to solve. That's the gap Gary's team filled with custom engineering, and the specific design decisions they made are worth understanding, not just the conclusion.

The MITRE ATT&CK Procedure Gap Nobody Talks About

Here's a tension Gary surfaces that most purple team conversations skip over: MITRE ATT&CK tracks Techniques and Sub-techniques, but the framework's documentation of specific Procedures (the actual implementation-level variants used by real threat actors) is inconsistent and incomplete.

When you're running continuous validation at scale, that ambiguity creates real problems. Which procedure variant are you actually testing? Does it match the specific behavior your detection logic was written to catch? If you're not tracking this carefully, you end up with a false sense of coverage. Your dashboard says you tested T1055 process injection, but you only tested one variant of twelve.

Gary's team addressed this with a custom YAML schema to track procedures at a granular level and integrate that tracking with their threat intelligence and detection engineering workflows. That kind of integration — connecting what attackers actually do with what your detections are supposed to catch — is where purple teaming moves from theater to operational value.
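A rough sense of what procedure-level tracking buys you can be sketched in a few lines. The record structure below is a guess at the shape such a schema might take — the field names and the detection rule name are illustrative, not Gary's actual YAML schema.

```python
# Hypothetical procedure-level record, modeled on the idea of tracking
# implementation variants, not just technique IDs. Field names are assumptions.
procedures = [
    {
        "technique": "T1055",  # process injection
        "procedure_id": "T1055.proc-createremotethread",
        "description": "Classic CreateRemoteThread injection into a remote process",
        "platforms": ["windows"],
        "detections": ["edr_remote_thread_rule"],  # illustrative rule name
        "last_validated": "2024-05-01",
    },
]

def coverage(procedures, technique, known_variants):
    """Compare validated procedure variants against the known variant count."""
    tested = {p["procedure_id"] for p in procedures if p["technique"] == technique}
    return len(tested), known_variants

tested, known = coverage(procedures, "T1055", known_variants=12)
print(f"{tested}/{known} procedure variants validated")  # 1/12
```

A dashboard keyed on technique IDs would mark T1055 green after that one run; tracking at the procedure level makes the 1-of-12 reality visible.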

EDR Evasion: What Security Teams Get Wrong

Gary has presented publicly on EDR evasion techniques, not to help attackers, but because understanding exactly how evasion works is prerequisite knowledge for building detection that actually catches it.

His point of view is direct: modern EDR products are not magic. They have known gaps. The vendors know it, the red teamers know it, and the security teams who assume EDR coverage equals actual coverage are the ones who find out the hard way. The only way to know what your EDR catches is to test it against real techniques, repeatedly, as those techniques evolve.

This is exactly the kind of work that doesn't happen in an annual pentest. By the time the PDF lands, the environment has changed, the EDR vendor has pushed an update, and half the findings are stale. Continuous validation is how you stay ahead of that gap.

What Happens When You Automate the Red Team

One underappreciated outcome of automation is what it does to your blue team's posture. When defenders know that attack simulations are running continuously, they start treating detection as a product with a quality bar, not a box to check. Alert fidelity improves. Detection engineering becomes a discipline, not a reaction.

Gary's experience mirrors what Sprocket sees across customer environments: continuous testing changes behavior on both sides of the equation. The red team gets faster signal on what's working. The blue team closes gaps before real attackers find them. And the program compounds over time instead of resetting with every new pentest cycle.

Building This Without a Dedicated Red Team

Not every organization is Northwestern Mutual. If you're reading this with a team of two and a shoestring budget, Gary's advice isn't "hire five red teamers and build a custom platform." It's to start with the question: how do I know my controls are working today?

Even basic answer-seeking — picking five MITRE techniques relevant to your threat model, running them manually once a quarter, and tracking results over time — is dramatically better than no validation at all. The automation comes later. The discipline of asking the question comes first.
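Even the manual, quarterly version benefits from structured tracking, because the interesting signal is regression: a detection that worked last quarter and silently broke. Here is a minimal sketch using nothing but a CSV log — the technique IDs are examples, and you would pick the five that match your own threat model.

```python
import csv
import io

# Toy quarterly validation log. In practice this would live in a file or
# spreadsheet export; the techniques and results here are illustrative.
LOG = """quarter,technique,detected
2024-Q1,T1003.001,yes
2024-Q1,T1059.001,yes
2024-Q2,T1003.001,no
2024-Q2,T1059.001,yes
"""

def regressions(log_text):
    """Find techniques detected in an earlier quarter but missed later."""
    rows = list(csv.DictReader(io.StringIO(log_text)))
    history = {}      # technique -> detected in most recent prior quarter
    regressed = []
    for r in sorted(rows, key=lambda r: r["quarter"]):
        ok = r["detected"] == "yes"
        if history.get(r["technique"]) and not ok:
            regressed.append((r["quarter"], r["technique"]))
        history[r["technique"]] = ok
    return regressed

print(regressions(LOG))  # [('2024-Q2', 'T1003.001')]
```

Fifteen lines and a spreadsheet export is enough to start answering "are my controls working today?" — the automation can come later.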

Where Offensive Security Programs Go Next

Gary is thinking about where AI fits into this picture, and he's appropriately skeptical. There are real applications: accelerating proof-of-concept development, analyzing code for vulnerable patterns, generating attack variants for coverage testing. But the human stays in the loop. Automation without validation produces noise, and in adversary emulation, noise is worse than silence because it creates false confidence.

The future of programs like Gary's isn't replacing red teamers with AI. It's using AI to extend what skilled practitioners can do — more coverage, faster iteration, better signal — while keeping expert judgment on every critical decision.

Listen to the full episode now:

Apple

Spotify

YouTube

For more information about Ahead of the Breach and to listen to the latest episodes, please visit www.sprocketsecurity.com/aob-podcast. Episodes are also available on all major podcast platforms.

We look forward to bringing you more conversations with actionable insights that help you protect your most valuable assets — and help your clients do the same!