By Ariz Soriano, Associate Director, Red Team Operations, THEOS Cyber

Modern red teaming has a context problem. Every tool speaks a different language, every engagement starts from zero, and AI keeps being sold as the solution without anyone addressing the grounding issue underneath. It's a problem we've been working on at THEOS Cyber, and at .HACK 2026 in Seoul our Associate Director, Ariz Soriano, presented research that reframes it.

Seoul, .HACK 2026, and a reframe that took a while to land

Last week I was in Seoul with Yi-Ting Shen for the .HACK 2026 conference, where we presented on something we’ve been thinking about for a while: most of what’s wrong with modern red teaming isn’t a tooling gap, it’s a context gap. The tools themselves are excellent. The real issue is that every tool speaks a different language, every finding gets flattened into a static PDF, and every engagement effectively starts from zero.

Getting to that framing took a while, and it wouldn’t have landed the same way without the collaboration. Yi-Ting and I first crossed paths at ROOTCON 19, but only briefly: a “hi” in the hallway and not much more. The conversation picked up on LinkedIn afterwards, where we started trading thoughts on AI in offensive security. The exchange kept going for weeks: one idea turned into a shared document, the shared document turned into a research direction, and a few months later we were on stage in Seoul presenting it together.

There is a specific kind of value in building something with a researcher from a completely different context: different tools, different day-to-day, different instincts. A lot of what made it onto the slides only got there because one of us pushed back on the other.

A note on .HACK 2026

The lineup at .HACK was genuinely strong, and what stood out was how much of it was driven by local Korean talent. You do not always see that at security conferences, where the speaker list tends to lean heavily on foreign names. Here it was the opposite, and the quality of research being presented made it clear the Korean cybersecurity scene is doing serious work that deserves more international visibility.

Credit to the organisers, who ran an exceptionally well-managed event from the moment we landed. Logistics, communication, speaker support: everything was handled with a level of care that made the whole experience run smoothly, which is harder to pull off than most people realise. It was a great environment to present in and an even better one to learn from. First time in Korea, and definitely not the last.

The fragmentation trap

Walk into any red team’s morning and you will see the same scene: Nmap output in one window, BloodHound in another, findings pasted into a Notion page, and screenshots dragged into a shared drive. The operator’s brain is the integration layer. That is the literal system architecture of most red teams today.

This is not something we observed in isolation. A 2025 study in the International Journal of Safety and Security Engineering explicitly calls out that fragmented pentesting tools with minimal interoperability cause redundant effort and inconsistent findings.

But duplicated work is not even the worst part. Rich attack data dies inside the final report. The PDF records what worked; everything else gets discarded the moment the engagement closes: the failed attempts, the dead ends, the environmental context the operator absorbed over three weeks.

That data is arguably more valuable than the successes. Failed paths are the map of where defences actually held. And we delete it every time.

The challenge compounds the moment another team is involved. Purple teaming is where the same fragmentation hits a hard wall: the red team carries an attack graph in their head, the blue team stares at millions of raw events in a SIEM, and without a shared data model, both sides end up narrating their own experience at each other in a meeting. Red says what they did. Blue goes digging through logs afterwards to find the matching events. Nobody is looking at the same reality at the same time.

You cannot collaborate if you cannot see the same reality.
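
To make “the same reality” a bit more concrete, here is a minimal sketch, in Python, of what a shared engagement record could look like. The names and fields are hypothetical illustrations, not the data model from our talk; the point is only that once red-team actions exist as structured, timestamped data, the blue team can query the exact window they are investigating instead of reconstructing it from a debrief.

```python
from dataclasses import dataclass, field
from datetime import datetime

# A minimal sketch of a shared engagement record. Every name and field here
# is a hypothetical illustration, not the schema from the talk.

@dataclass
class AttackStep:
    """One red-team action, recorded in a form both teams can query."""
    technique: str                # e.g. a MITRE ATT&CK ID like "T1021.002"
    source_host: str
    target_host: str
    timestamp: datetime
    succeeded: bool               # failed attempts are kept, not discarded
    evidence: list[str] = field(default_factory=list)  # artifact paths, hashes

@dataclass
class Engagement:
    steps: list[AttackStep] = field(default_factory=list)

    def record(self, step: AttackStep) -> None:
        self.steps.append(step)

    def in_window(self, start: datetime, end: datetime) -> list[AttackStep]:
        """Blue team: pull exactly the actions inside a SIEM time range."""
        return [s for s in self.steps if start <= s.timestamp <= end]
```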

Red teaming is reasoning, not clicking

What drives everything else in our research is a fairly simple reframe: hacking is continuous reasoning, not execution.

After every single tool output, a competent operator answers two questions:

    • What does this mean?
    • What do I do next?


When I see port 445 open on a host, I don’t just log it. I immediately think: SMB, potential lateral movement, enumerate shares next, check for null sessions, look for signing requirements. That inference chain is the actual craft. Everything else is typing.
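
That chain is mechanical enough to sketch. The toy rule table below is illustrative, nowhere near a real playbook; it only shows the shape of the loop: observation in, meaning and next actions out.

```python
# A toy sketch of the inference chain described above: observation -> meaning
# -> next actions. The rule table is illustrative, not an exhaustive playbook.

INFERENCE_RULES = {
    445: {
        "meaning": "SMB exposed: potential lateral movement surface",
        "next_actions": [
            "enumerate shares",
            "check for null sessions",
            "check SMB signing requirements",
        ],
    },
    3389: {
        "meaning": "RDP exposed: possible interactive access",
        "next_actions": ["check NLA", "test captured credentials"],
    },
}

def interpret(port: int) -> tuple[str, list[str]]:
    """Answer the operator's two questions for a single observation."""
    rule = INFERENCE_RULES.get(port)
    if rule is None:
        return ("unknown service: fingerprint it first", ["run service detection"])
    return (rule["meaning"], rule["next_actions"])

meaning, actions = interpret(445)
print(meaning)   # SMB exposed: potential lateral movement surface
print(actions)   # ['enumerate shares', 'check for null sessions', ...]
```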

If you have been around for a while, you’ll recognise this as the classic OODA loop: Observe, Orient, Decide, Act. It was originally formulated for fighter pilots, but it maps cleanly onto offensive operations. You observe the output, orient yourself within the target environment, decide on the next tactical move, and act. The output of that action becomes the next observation, and the loop keeps going.

Thousands of iterations of this loop happen in an engagement. A system that could remember every finding and every relationship across those iterations could drive the loop automatically. That is where AI becomes genuinely useful in offensive security: maintaining context across the entire engagement, augmenting the analyst through every iteration of the loop.
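
Putting the two together, here is a minimal sketch of that loop, assuming a persistent store and with run_tool and choose_action as hypothetical placeholders standing in for real tooling and a real decision layer:

```python
# A minimal OODA-style driver over persistent engagement memory. run_tool and
# choose_action are hypothetical placeholders, not real APIs.

class ContextStore:
    """Persistent memory: every observation survives across iterations."""
    def __init__(self):
        self.facts: list[dict] = []

    def add(self, fact: dict) -> None:
        self.facts.append(fact)

def engagement_loop(store, run_tool, choose_action, first_action, max_steps=1000):
    action = first_action
    for _ in range(max_steps):
        observation = run_tool(action)                        # Act + Observe
        store.add({"action": action, "result": observation})  # Orient: fold into context
        action = choose_action(store.facts)                   # Decide over ALL facts so far
        if action is None:   # the decision layer found nothing worth doing next
            break
```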

Why naive AI integration fails in this context

The usual answer to the fragmentation problem is “add more AI,” and that answer misses the root cause. A generic LLM dropped into a red team workflow will hallucinate hosts that do not exist, credentials that were never captured, and pivot paths that do not resolve. That typically happens because the model is fed inaccurate context: tool output that was parsed poorly, or the wrong tool selected for the job. It is a grounding problem, and it is solved by structuring what the AI can see.

The fix isn’t a better model. It’s constraining what the AI can access to the actual, verified state of the target environment: structured data to reason over, a persistent representation of the engagement it can’t wander outside of, and a mechanism to ground its suggestions in reality rather than in its training distribution.

That is the architecture we walked through on stage, and it’s also the part we will not detail here. If you want the deep dive, the full talk will be available through the .HACK 2026 channels.
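
The underlying pattern, though, is generic enough to sketch without giving anything away: serialize only verified state into the model’s context, and check anything the model names against that same state. The helper names and fact format below are hypothetical illustrations, not our implementation.

```python
import json

# A generic sketch of the grounding pattern, not the architecture from the
# talk. Helper names and the fact format are hypothetical.

def build_grounded_prompt(facts: list[dict], question: str) -> str:
    """Constrain the model's context to verified engagement state only."""
    verified = [f for f in facts if f.get("verified")]
    return (
        "You may ONLY reference hosts, credentials, and paths listed below.\n"
        "If the answer is not derivable from this data, say so.\n\n"
        "VERIFIED ENGAGEMENT STATE:\n"
        + json.dumps(verified, indent=2)
        + "\n\nQUESTION: " + question
    )

def hallucinated_hosts(suggested: set[str], facts: list[dict]) -> set[str]:
    """Any host the model names that was never observed is a hallucination."""
    known = {f["host"] for f in facts if "host" in f}
    return suggested - known
```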

Our takeaway

Modern red teaming is a reasoning workflow disguised as a tooling workflow. The industry keeps buying more tools to solve what is fundamentally a context problem, and AI keeps being sold as the fix without anyone addressing the grounding issue underneath.

AI makes red teaming better only when it is plugged into a persistent, structured representation of the target environment. Without that foundation, the result is faster hallucinations.

If your team is rescanning assets because past data was not saved, losing credentials because they were not correlated to where they apply, or scrolling through chat logs to answer blue team questions a week after the fact, the gap is context, and context is the problem to solve.

Stop thinking in isolated tools. Start thinking in persistent context.

Thanks to everyone who came to the talk in Seoul, and to the .HACK organisers for putting together a conference that punches well above its one-day format. If you want to discuss anything from the talk, feel free to reach out.

Special thanks to Yi-Ting Shen for the collaboration. This line of research wouldn’t exist without those long back-and-forth exchanges on LinkedIn, and it was genuinely a pleasure building it out together and sharing the stage in Seoul.