Hands-Free: What LLM-Driven Vulnerability Research Looks Like

Tomer Goldschmidt

June 2nd, 2026

Introducing LLMs into Research

Artificial intelligence (AI) and large language models (LLMs) are changing the way numerous industries function, driving new operational approaches and efficiencies in manufacturing, robotics, technology development, software engineering, and across critical infrastructure sectors.

Team82 is no different. We have experimented with incorporating these technologies into our research methodologies. In this blog, we showcase our use of Anthropic’s Claude Opus 4.6 AI model to uncover vulnerabilities in a popular video intercom platform manufactured by Zenitel. Last November, we disclosed five vulnerabilities in the TCIV-3+ model, a rugged IP-based video intercom deployed in many high-security areas and industrial environments.

Since we had already researched this platform and found a range of highly critical command-injection, out-of-bounds write, and cross-site scripting vulnerabilities, we wanted to see how effective an AI model would be at carrying out the same research. How quickly could it find these security issues compared to traditional, manual research? Would it find new vulnerabilities? Would it find new ways to chain the existing bugs into exploits?

This automated, hands-free approach is likely the next phase of vulnerability research. We have already seen the impact of Anthropic’s Project Glasswing and how much it shrank the time needed to find and exploit flaws. The Claude Mythos frontier AI model behind Project Glasswing is currently available only to a closed preview group of technology companies, including Microsoft, Cisco, Amazon, and NVIDIA, and cybersecurity companies such as CrowdStrike and Palo Alto Networks. Enterprises around the world are already rethinking their vulnerability and exposure management programs in anticipation of a wave of new vulnerability reports likely to come their way in the next few months.

We believe it’s critical to put these models to the test, throw back the curtain on our own vulnerability research methods, and determine how AI can truly change the course of security research.

How Team82 Manually Uncovered Zenitel Vulnerabilities

The Zenitel TCIV-3+ video intercom manages access to secure areas inside buildings and offices. The device features SIP dialing and voice-over-IP (VoIP) functionality, together with a video feed and a remote settings and management interface.

Our first step last year was to take a software update downloaded from the vendor’s website and extract its filesystem. We then reviewed the configuration files in the extracted filesystem, looking for indications of the device’s web service, which is implemented by ipstweb, a UPX-packed binary. We unpacked it with the UPX utility and analyzed it statically, refining the decompilation as much as possible to get a clearer view of the code flow. We then drilled down into as many of the binary’s bug-prone code flows as we could.

Our static analysis and more traditional means of vulnerability research uncovered five vulnerabilities:

CVE-2025-64126: An OS command injection vulnerability that enables code execution (CVSS v3, 9.8)

CVE-2025-64127: An OS command injection vulnerability that enables code execution (CVSS v3, 9.8)

CVE-2025-64128: An OS command injection vulnerability that enables code execution (CVSS v3, 9.8)

CVE-2025-64129: An out-of-bounds write vulnerability that may crash a device (CVSS v3, 9.8)

CVE-2025-64130: A cross-site scripting (XSS) vulnerability that allows JavaScript execution in a victim’s browser (CVSS v3, 9.8)

This analysis took several hours to conduct and refine, which is what motivated us to test the same approach using Claude Opus 4.6.

Research Instrumentation: Claude Code

We began our hands-free research with Claude Code, a command-line tool that lets us interact with Anthropic’s LLMs directly on our underlying operating system and code base. Claude Code is an agentic AI coding assistant capable of executing commands, editing files, and much more, all through natural-language instructions.

To set up our agentic research environment, we created a research working directory containing a CLAUDE.md file, a .mcp.json file, and a targets folder, where we placed the Zenitel software update for the TCIV-3+.

│   .mcp.json
│   CLAUDE.md
│
└───targets
    └───zenitel
            VS-IS_9.1.3.1.zip
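The layout above can be reproduced with a short script. This is a minimal sketch, assuming Claude Code’s project-scoped .mcp.json format (an mcpServers map from a server name to the command and arguments that launch it). The server name ghidra, the python3 command, and the ghidra_mcp_server.py script are hypothetical placeholders, since the Ghidra MCP server used in this research was self-made and unpublished.

```python
import json
from pathlib import Path

# Hypothetical MCP server entry: the real Ghidra MCP server was self-made,
# so the server name, command, and script below are placeholders.
MCP_CONFIG = {
    "mcpServers": {
        "ghidra": {
            "command": "python3",
            "args": ["ghidra_mcp_server.py"],  # placeholder script name
        }
    }
}


def create_workspace(root: Path, update_zip_name: str = "VS-IS_9.1.3.1.zip") -> Path:
    """Create .mcp.json, CLAUDE.md, and targets/zenitel/ under root.

    Returns the path where the vendor update zip should be copied.
    """
    target_dir = root / "targets" / "zenitel"
    target_dir.mkdir(parents=True, exist_ok=True)
    (root / ".mcp.json").write_text(json.dumps(MCP_CONFIG, indent=2) + "\n")

    claude_md = root / "CLAUDE.md"
    if not claude_md.exists():
        # Stub only: the real file described the agent's role, tooling, and report baseline.
        claude_md.write_text("# Research context\n")

    # The vendor update itself is copied in by hand; this only returns its destination.
    return target_dir / update_zip_name
```

Calling create_workspace(Path("research")) creates the folders and configuration files and returns the location for the vendor zip.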

The CLAUDE.md File

The CLAUDE.md file supplies the context that drives the LLM during the Claude Code session. It is crucial in dictating how the LLM approaches our challenge and derives a solution.

In our case, we framed the task as an agent doing security research in a capture-the-flag (CTF) contest. We provided extensive guidance on how to approach binary research and code analysis for vulnerability discovery.

Together with the Project Identity section of the file, we provided extensive knowledge about the tooling at the agent’s disposal. In our case, this was a self-made Model Context Protocol (MCP) server for Ghidra (a disassembler/decompiler) that enables the agent to interact with Ghidra’s CodeBrowser for analysis.

In addition, we explained to the agent in detail how to conduct analysis and reverse engineering. This matters because we don’t want the agent to deviate from our goal of discovering vulnerabilities in the target system.

References to the agent’s operational pipeline.
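To make the MCP interaction more concrete, here is a minimal sketch of the JSON-RPC 2.0 tools/call exchange that an MCP client such as Claude Code uses to invoke a server tool. The tool name decompile_function, its address argument, and the stub body are hypothetical, because the self-made Ghidra server’s actual tools were not published. The sketch also omits MCP’s initialization handshake and stdio transport.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool registry standing in for the self-made Ghidra MCP server.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(func: Callable[..., str]) -> Callable[..., str]:
    """Register a function as a tool under its own name."""
    TOOLS[func.__name__] = func
    return func


@tool
def decompile_function(address: str) -> str:
    # Stub: a real server would query Ghidra's decompiler for this address.
    return f"// decompilation of FUN_{address} would be returned here"


def make_tool_call(request_id: int, name: str, arguments: Dict[str, Any]) -> str:
    """Build the JSON-RPC 2.0 'tools/call' request an MCP client sends."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def handle_request(raw: str) -> Dict[str, Any]:
    """Dispatch a 'tools/call' request to the matching registered tool."""
    req = json.loads(raw)
    if req.get("method") != "tools/call":
        return {"jsonrpc": "2.0", "id": req.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    params = req.get("params", {})
    name = params.get("name")
    if name not in TOOLS:
        return {"jsonrpc": "2.0", "id": req.get("id"),
                "error": {"code": -32602, "message": f"Unknown tool: {name}"}}
    text = TOOLS[name](**params.get("arguments", {}))
    # MCP tool results carry a list of content items; plain text is the simplest.
    return {"jsonrpc": "2.0", "id": req.get("id"),
            "result": {"content": [{"type": "text", "text": text}]}}
```

Each decompilation or cross-reference lookup the agent performs is a round trip of this shape, with the server translating the tool call into Ghidra operations.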

Our Target Positioning

As for the target, all we gave Claude Code in our session was a prompt pointing it at the zenitel folder, which contains the vendor’s zip update file. This folder sits inside the targets folder.

Starting the Research Process

As stated, we chose Anthropic’s Claude Opus 4.6 model. It is not the latest of Anthropic’s frontier models, but it performs reliably on technical tasks.

To initiate the research process we provided a short prompt directing the LLM to begin investigating and analyzing the software update.

As the prompt shows, we initially wanted the LLM to focus on the web service exposed by the target system. This was an intentional choice: we knew this area of the system would expose the largest attack surface.

This is where the speed of these AI models begins to show. One minute into the session, the LLM extracted the firmware filesystem and started to locate the web service components.

Claude Code begins by extracting the firmware filesystem.

Thirty seconds later, it located the ipstweb binary, which is in charge of the web service.

The AI model finds the ipstweb web server binary.
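The extraction and search steps amount to something like the following sketch. It assumes the update zip unpacks directly into a directory tree; real firmware updates often nest further images (for example, a SquashFS root filesystem) that need extra tooling, and the location of ipstweb inside this update is not stated here.

```python
import os
import zipfile
from pathlib import Path
from typing import List


def extract_update(zip_path: Path, out_dir: Path) -> Path:
    """Extract the vendor update archive into out_dir and return out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
    return out_dir


def find_binaries(root: Path, name: str = "ipstweb") -> List[Path]:
    """Walk the extracted tree and return every file with the given name."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename == name:
                hits.append(Path(dirpath) / filename)
    return sorted(hits)
```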

Ninety seconds later, it identified that the binary is packed with UPX. Its first attempt to unpack the binary failed because the UPX utility was missing, so the model installed UPX and then unpacked the binary successfully.
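A rough equivalent of the check-then-unpack step is sketched below. It treats the UPX! marker that UPX writes into packed executables as a heuristic signal (the marker can be stripped or altered to defeat this check) and shells out to the upx command with -d to decompress and -o to name the output file.

```python
import shutil
import subprocess
from pathlib import Path

UPX_MAGIC = b"UPX!"  # marker UPX writes into the headers of packed executables


def is_upx_packed(path: Path) -> bool:
    """Heuristic: report whether the file contains the UPX! marker."""
    return UPX_MAGIC in path.read_bytes()


def unpack_upx(packed: Path, unpacked: Path) -> Path:
    """Decompress a UPX-packed binary with the upx command-line tool."""
    if shutil.which("upx") is None:
        # The same failure the agent hit before installing UPX.
        raise FileNotFoundError("upx utility not found on PATH")
    subprocess.run(["upx", "-d", str(packed), "-o", str(unpacked)], check=True)
    return unpacked
```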

After three and a half minutes, the LLM loaded the binary into Ghidra through the MCP server we provided and started probing it with common attack vectors that could yield easy findings, such as command injection vulnerabilities.

MCP-aided analysis of ipstweb with Ghidra
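The agent’s exact queries were not published. As an illustration of the kind of low-hanging-fruit check involved, the following hypothetical heuristic scans Ghidra-style decompiled C for buffers that a sprintf-family call fills with a %s argument and that later reach system() or popen(). It is a line-based regular-expression sketch, not a data-flow analysis, so casts, wrapper functions, and multi-line calls will slip past it.

```python
import re
from typing import List, Set, Tuple

# A buffer becomes suspect when a sprintf-family call formats into it.
FORMAT_CALL = re.compile(r"\b(?:sprintf|snprintf)\s*\(\s*(\w+)\s*,(.*)\)\s*;")
# Calls that hand a string to the shell.
SHELL_SINK = re.compile(r"\b(system|popen)\s*\(\s*(\w+)")


def find_command_injection_candidates(pseudocode: str) -> List[Tuple[int, str, str]]:
    """Return (line number, sink, buffer) for shell calls on %s-formatted buffers."""
    suspect: Set[str] = set()
    findings: List[Tuple[int, str, str]] = []
    for lineno, line in enumerate(pseudocode.splitlines(), start=1):
        fmt = FORMAT_CALL.search(line)
        if fmt and "%s" in fmt.group(2):
            suspect.add(fmt.group(1))
        sink = SHELL_SINK.search(line)
        if sink and sink.group(2) in suspect:
            findings.append((lineno, sink.group(1), sink.group(2)))
    return findings
```

Every hit is only a lead: the next step, which the agent also took, is to trace where the %s argument comes from and what validation sits in between.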

After six and a half minutes, the LLM compiled its findings into a Markdown report.

A comprehensive summary report is written by Claude.

Faster Vulnerability Research Results with LLMs

The agent discovered numerous vulnerabilities in under 10 minutes, most of them already disclosed to and patched by the vendor. You may question the value of finding vulnerabilities that were already disclosed and patched; however, the technical details of these vulnerabilities were never made public. The LLM found concrete evidence of these vulnerabilities through its own analysis, in a fraction of the time our manual static analysis took.

Even more impressive, it compiled a thorough analysis into a disclosure-grade report. This is because we defined the baseline for a vulnerability report in our CLAUDE.md file.
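The report baseline in our CLAUDE.md is not reproduced here. As a hypothetical sketch of what such a baseline can pin down, the following dataclass fixes the fields every finding must carry and renders them as Markdown; the field names are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List

FENCE = "`" * 3  # Markdown code fence, built here to keep the source readable


@dataclass
class Finding:
    """One vulnerability entry; the field set is a hypothetical report baseline."""
    title: str
    vuln_class: str       # e.g. "OS command injection (CWE-78)"
    component: str        # binary and function where the bug lives
    authenticated: bool   # whether the vulnerable route requires a login
    summary: str
    evidence: str         # decompiled snippet supporting the claim
    reproduction: List[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        lines = [
            f"## {self.title}",
            f"- **Class:** {self.vuln_class}",
            f"- **Component:** {self.component}",
            f"- **Authentication required:** {'yes' if self.authenticated else 'no'}",
            "",
            self.summary,
            "",
            "### Evidence",
            FENCE + "c",
            self.evidence,
            FENCE,
        ]
        if self.reproduction:
            lines += ["", "### Reproduction"]
            lines += [f"{i}. {step}" for i, step in enumerate(self.reproduction, start=1)]
        return "\n".join(lines)
```

Requiring evidence and reproduction fields is what pushes an agent from "this looks risky" toward a claim a vendor can verify.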

To give a sense of the agent’s findings, we provide a glimpse of the report it created. This excerpt covers a vulnerability that we also discovered last year: a command injection in an authenticated route of the web service.

In the report, the agent provides a snippet of the decompiled code. The snippet contains the code flow that runs a system command built by formatting attacker input into the command string. The agent also recognizes that the function FUN_0006bfd8 validates the attacker input as an IP address string, but notes that this check could be bypassed.
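The decompiled logic of FUN_0006bfd8 is not published, so the following is a hypothetical model of how an IP-address check can be bypassed, not a reconstruction of the Zenitel code. A pattern anchored only at the start of the string accepts an address followed by shell syntax, while a strict parse rejects it. The ping command is a placeholder for whatever command the web service actually builds.

```python
import ipaddress
import re
from typing import List

# Hypothetical flawed check: anchored at the start of the string but not the end.
LOOSE_IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def loose_is_ip(value: str) -> bool:
    # re.match only anchors at the beginning, so trailing shell syntax survives.
    return LOOSE_IPV4.match(value) is not None


def strict_is_ip(value: str) -> bool:
    """Accept only a complete, well-formed IPv4 address."""
    try:
        ipaddress.IPv4Address(value)
    except ValueError:
        return False
    return True


def vulnerable_command(ip: str) -> str:
    # Mirrors the flagged pattern: attacker input formatted into a shell string.
    return f"ping -c 1 {ip}"  # placeholder command


def safer_command(ip: str) -> List[str]:
    # Validate strictly, then build an argv list so no shell parses the input.
    if not strict_is_ip(ip):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return ["ping", "-c", "1", ip]
```

With the input "10.0.0.1;reboot", loose_is_ip returns True and the resulting shell string carries the injected command, whereas strict_is_ip rejects it. Passing an argument list rather than a shell string adds a second layer, since even a validation gap would not reach a shell.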

Wrapping Up

During this project, the agent made significant progress in very little time. In less than 10 minutes, it generated a thorough report and provided solid evidence of security issues on the Zenitel platform.

One key observation about the agent’s approach was its ability to keep context throughout the session and avoid rabbit holes that would pull it away from its purpose. This was reflected in the volume and variety of its findings, which ranged from command injections to memory corruption bugs and system misconfigurations.

What stands out most from this experiment is that, given a publicly accessible firmware or software update from the vendor, an LLM agent appears able to perform end-to-end vulnerability research with the right tooling and framing. This could lower the barrier to entry for discovering zero-day vulnerabilities: discovery would no longer depend on proficiency and expertise, and all that would be needed is a copy of the target software.

LLM-based vulnerability research could soon turn the discipline into a commodity practice. Expect the first wave of this new approach to vulnerability discovery to focus on white-box targets, such as open-source software projects, where full access to and visibility into the code are readily available. For a while, limitations such as firmware encryption or the lack of public access to firmware will work in favor of enterprise security teams, but eventually even these limitations will be bypassed.
