
Cobalt study says automated AI tests miss key flaws

Mon, 29th Jun 2026

Cobalt has published research finding that automated penetration testing tools often miss critical vulnerabilities in large language model applications. The study surveyed 455 cybersecurity professionals at organisations with more than 500 employees.

It found that 78% of security teams had experienced critical false negatives from automated scanning tools, while support for fully automated pentesting fell to 9% from 29% a year earlier. At the same time, 47% said they now prefer a hybrid model that combines automation with human testing.

The findings suggest growing scepticism about using automation alone to assess AI systems, particularly as companies expand their use of LLM-based products and services. The shift towards hybrid testing reflects the limits of tools that struggle to identify flaws tied to context, business logic and application design.

Security teams are not abandoning automation altogether. The research found that 47% favour automated testing for low-risk environments, up 22 percentage points from the previous year. That suggests companies are drawing a clearer line between routine checks and the more complex demands of AI security.

Higher risk

Cobalt linked that caution to data from its earlier State of Pentesting research, which found that AI-related pentests produce high-risk findings at a much higher rate than tests of conventional software. According to that analysis, 32% of all AI-related pentest findings were classified as high risk, compared with 12% across software more broadly.

Remediation is also proving harder. Only 38% of LLM vulnerabilities had been fixed at the time of analysis, leaving 62% still open, the lowest resolution rate across the categories examined.

Mean time to resolve AI and LLM security issues rose to 36 days from 19 days the previous year. That suggests teams are spending longer on problems that are harder to diagnose and close, rather than dealing only with surface-level weaknesses.

The wider burden on security teams is also increasing. Some 82% of respondents said their organisations are dedicating significantly more effort to AI security initiatives, and 77% said they now conduct regular security assessments and pentests for AI-powered products, an 11-point rise from the previous year.

Attack vectors

Among organisations that had experienced confirmed AI-related security incidents, the research identified a range of attack paths rather than a single dominant weakness. Shadow AI was cited in 44% of incidents, followed by data or model poisoning and improper output handling, both at 41%.

Supply chain vulnerabilities were cited in 35% of incidents and prompt injection in 34%, rounding out the top five categories. The mix underlines how AI security concerns extend beyond model behaviour to employee use of unapproved tools, third-party dependencies and the way applications process model outputs.

Even so, the report suggests organisations are not yet matching that threat picture with larger investments in human-led testing. While 60% of respondents said they need stronger LLM testing, only 42% said they plan to increase human-led red-team operations.

That gap may prove significant as companies weigh how to test systems that can behave unpredictably in production and be manipulated through prompts, training data or integration layers. Automated scanners can identify known patterns, but the findings suggest many teams still need specialist testers to probe how AI applications behave in real-world settings.

The survey covered cybersecurity professionals working across software development, healthcare, financial and insurance services, information services and other sectors. Participants included both an externally recruited cohort and a small sample of Cobalt customers.

Comparative data came from separate surveys in 2025 and 2026 conducted by Emerald Research on Cobalt's behalf. The earlier study polled 450 security professionals and was split evenly between leadership roles and technical practitioners.

Andrew Obadiaru, Chief Information Security Officer at Cobalt, said the rise of more advanced automated systems would not remove the need for skilled human testers. "While the industry is rightfully excited about the potential of Mythos-class tools, unguided algorithms are inherently prone to returning even more false positives and costly false negatives than the automated scanners we have today," he said.

He said the structure of AI applications remains a central challenge for machine-only approaches. "LLM vulnerabilities are deeply context-dependent and invisible to tools that lack an architectural understanding of the application. To close the validation gap, automation should be deployed exactly where it excels, but elite human expertise remains foundational to uncovering and remediating the most complex business logic risks," Obadiaru said.

