Secure.com flags 21 flaws in AI pentest on live stacks

Thu, 30th Apr 2026

Secure.com has published research detailing 21 vulnerabilities found by an AI-driven pentesting pipeline across three live production stacks. Seven of the issues were critical.

The findings came from tests on a multi-tenant eCommerce marketplace, a generative AI imaging platform, and a consumer password manager. The automated system operated without human input over a weekend.

The Dubai-based cybersecurity company said the flaws were concentrated in familiar areas of weak security practice rather than obscure software defects. The issues included frontend runtime configuration data exposed on every page load, unauthenticated scheduler and admin endpoints, unauthenticated notification injection, cross-origin session theft across four backend APIs, a publicly reachable admin dashboard, and a full production environment exposed in a public JavaScript bundle.
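The article does not say how the cross-origin session theft worked. One common cause of that class of issue is an API that echoes back whatever `Origin` it receives while also allowing credentials. The sketch below assumes that pattern and checks a captured set of response headers for it. The function name and the example origins are hypothetical and do not come from the report.

```python
def reflects_origin_with_credentials(request_origin: str, response_headers: dict) -> bool:
    """Return True if the response lets request_origin read credentialed replies.

    Assumption: the caller sent `Origin: request_origin` from a site the API
    should not trust, and passes in the response headers it got back.
    """
    # Header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value.strip() for name, value in response_headers.items()}
    allow_origin = headers.get("access-control-allow-origin", "")
    allow_credentials = headers.get("access-control-allow-credentials", "").lower() == "true"
    # A literal "*" is not flagged: browsers refuse to combine a wildcard
    # origin with credentialed requests, so cookies would not be exposed.
    return allow_credentials and allow_origin == request_origin


untrusted = "https://attacker.example"
reflected = {
    "Access-Control-Allow-Origin": untrusted,
    "Access-Control-Allow-Credentials": "true",
}
pinned = {
    "Access-Control-Allow-Origin": "https://shop.example",
    "Access-Control-Allow-Credentials": "true",
}
risky = reflects_origin_with_credentials(untrusted, reflected)
safe = reflects_origin_with_credentials(untrusted, pinned)
```

A check like this only reads headers that have already been collected. It sends no traffic itself, and it belongs only in testing that the system owner has authorised.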

Uzair Gatz, Chief Executive Officer of Secure.com, said the results showed AI-assisted offensive testing had entered a more accessible phase. "Every finding in this report is a category-one hygiene failure - the kind that has been on the OWASP Top 10 for a decade. No zero days. No novel chains. The floor of what publicly available, open-source AI tooling can now find, validate, and document automatically has shifted materially in 2026. Nation-state budgets and proprietary platforms are no longer the threshold. A weekend, an open model, and a 50-line agent framework are.

"A capable, production-grade AI pentesting agent is no longer gated behind an expensive commercial contract or a well-funded research lab. Any motivated actor can build one this afternoon."

Secure.com argued that the economics of testing are changing as automation cuts the cost and time needed to identify exploitable weaknesses. It estimated that an AI system running continuously costs about US$18 per hour, compared with traditional engagements that rely on skilled human testers and larger budgets.

Open-source shift

Secure.com pointed to a broader rise in open-source AI pentesting tools, citing independent research that identified more than 39 such agents across six architecture patterns as of April 2026. It highlighted growth in systems built around the Model Context Protocol, which can expose established security tools such as nmap, nuclei, Metasploit, and Burp to large language models, as well as tools designed to work directly with Claude Code.

According to the report, these systems have widened access to techniques that previously required specialist teams or expensive commercial software. Examples cited by Secure.com included HexStrike, AutoPentest-AI, Raptor, and Transilience Community Tools.

Multi-agent model

The research also argued that multi-agent designs have overtaken single-agent systems in practical testing work. Single-agent tools, it said, often lose track of earlier findings, misreport command outputs, and struggle with long context windows, especially when handling large data sets such as full network scans.

By contrast, a multi-agent structure separates planning, reconnaissance, exploitation, and reporting while maintaining shared memory between agents. Secure.com said this approach reduces a risk seen in some single-agent systems: the agent invents a successful command result and then continues on that false assumption.
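The separation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline from the report. The class and function names are hypothetical, the exploitation stage is left out on purpose, and `fake_run` stands in for a real tool call. The point is that each agent writes raw output to a shared store, and the reporting step only uses findings that have captured evidence.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SharedMemory:
    """Findings store that every agent reads from and writes to."""
    findings: list = field(default_factory=list)

    def record(self, agent: str, target: str, claim: str, evidence: str) -> None:
        self.findings.append(
            {"agent": agent, "target": target, "claim": claim, "evidence": evidence}
        )

    def with_evidence(self) -> list:
        # Only findings backed by captured output are passed downstream.
        return [f for f in self.findings if f["evidence"].strip()]


def plan(targets: list) -> list:
    """Planning agent: here it only deduplicates and orders the targets."""
    return sorted(set(targets))


def recon(target: str, memory: SharedMemory, run: Callable[[str], str]) -> None:
    """Reconnaissance agent: stores the raw command output, not a summary."""
    memory.record("recon", target, "service responded", run(target))


def report(memory: SharedMemory) -> list:
    """Reporting agent: reads shared memory instead of its own context."""
    return [f"{f['target']}: {f['claim']}" for f in memory.with_evidence()]


def fake_run(target: str) -> str:
    # Stand-in for a real tool call; it returns no output for one host.
    return "" if target == "b.example" else f"200 OK from {target}"


memory = SharedMemory()
for target in plan(["b.example", "a.example", "a.example"]):
    recon(target, memory, fake_run)
summary = report(memory)
```

In this run the claim about `b.example` is recorded, but it is dropped from the summary because no output was captured to support it.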

Benchmarks cited in the report included HPTSA, which it said outperformed single-agent baselines by 4.3 times on zero-day exploitation, and D-CIPHER, which it said achieved leading results across three published test sets. It also pointed to CHECKMATE, a system that combines a large language model with a classical planner for long-horizon tasks.

Public markers

Secure.com said evidence from public bug bounty and research programmes suggests AI systems are already producing tangible results beyond laboratory tests. It cited XBOW, which it said held the top position on HackerOne with more than 1,060 validated submissions, including a CVE-listed remote code execution flaw in a Microsoft system.

The report also referred to AISLE, which it said found 12 out of 12 CVEs in an OpenSSL release; the DARPA AIxCC competition, where finalist systems found 54 vulnerabilities across 54 million lines of code in four hours; and ARTEMIS, which compared AI and human pentesters on a live enterprise network. In that comparison, Secure.com said the AI system placed second and outperformed nine of the 10 human testers while running at around US$18 per hour.

It also drew a distinction between broad public access to open-source and commercial models and the tighter restrictions around more advanced cyber-focused systems. Anthropic, it noted, had limited access to its unreleased Mythos model through Project Glasswing to a small set of partners and critical infrastructure organisations.

Secure.com said its own findings were generated using Claude Opus 4.7 at standard API pricing, suggesting that meaningful offensive testing no longer depends on restricted frontier models. The key differentiator, it argued, is not simply detection but the ability to validate a finding with proof-of-concept work that confirms whether exposed credentials, buckets, or services can actually be reached and used.

One example in the report contrasted a conventional scanner alert with an agent-based workflow that confirms ownership and access, demonstrates write permissions, and then removes a marker file after verification.
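The shape of that workflow can be shown with a local directory standing in for the storage location. This is a hedged sketch, not the report's code. The function name, the `owner_confirmed` flag, and the marker filename are assumptions made for illustration, and a real check would go through the storage provider's own client with explicit authorisation.

```python
import tempfile
import uuid
from pathlib import Path


def verify_write_access(storage_root: Path, owner_confirmed: bool) -> dict:
    """Write, read back, and remove a uniquely named marker file."""
    if not owner_confirmed:
        # Refuse to touch storage whose ownership has not been established.
        raise ValueError("ownership must be confirmed before testing writes")
    marker = storage_root / f"pentest-marker-{uuid.uuid4().hex}.txt"
    result = {"writable": False, "cleaned_up": False}
    try:
        marker.write_text("authorised test marker")
        result["writable"] = marker.read_text() == "authorised test marker"
    except OSError:
        # The write or read failed, so the location is treated as not writable.
        return result
    finally:
        # Always remove the marker so the test leaves nothing behind.
        if marker.exists():
            marker.unlink()
        result["cleaned_up"] = not marker.exists()
    return result


with tempfile.TemporaryDirectory() as tmp:
    outcome = verify_write_access(Path(tmp), owner_confirmed=True)
    leftover = list(Path(tmp).iterdir())
```

The result separates "a scanner saw something" from "write access was demonstrated and the marker was removed", which is the distinction the report draws.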

For defenders, the report raises questions about whether periodic pentesting remains sufficient when the same techniques can be run repeatedly and cheaply by automated systems. For attackers, the barrier to entry appears to be falling as public tools become easier to assemble into working pipelines.

Across the three production environments examined in its research, Secure.com said the weaknesses it found were basic and longstanding, not rare or highly sophisticated flaws that would need specialist exploitation.
