[ad_1]
Even as OpenAI works to harden it Atlas AI browser against cyber attacks, the company admits quick injectionsa type of attack that manipulates AI agents into following malicious instructions often hidden in web pages or emails is a risk that won’t go away anytime soon. This raises questions about how safely AI agents can operate on the open web.
“A quick injection, like scams and social engineering on the Internet, is unlikely to ever be completely ‘solved,’” OpenAI wrote on Monday. blog post detailing how the company is strengthening Atlas’ armor to combat the incessant attacks. The company admitted that the “agent mode” in ChatGPT Atlas “expands the surface area of security threats.”
OpenAI launched its ChatGPT Atlas browser in October and security researchers rushed to publish their demos, showing that it was possible to write a few words in Google Docs that could change the behavior of the underlying browser. That same day, Brave published a blog post explaining that indirect prompt injection is a systematic challenge for AI-powered browsers, including The Comet of Perplexity.
OpenAI isn’t alone in recognizing that prompt-based injections aren’t going away. The The UK’s National Cyber Security Center warned earlier this month that prompt injection attacks against generative AI applications “may never be fully mitigated,” putting websites at risk of data breaches. The UK government agency advised cyber professionals to reduce the risk and impact of rapid injections, rather than thinking the attacks can be ‘stopped’.
On OpenAI’s part, the company said: « We view rapid injection as a long-term AI security challenge, and we will need to continually strengthen our defenses against it. »
The company’s answer to this Sisyphean task? A proactive, rapid response cycle that the company says holds promise in helping discover new attack strategies internally before exploiting them ‘in the wild’.
That’s not entirely different from what rivals like Anthropic and Google have said: that to combat the persistent risk of high-speed attacks, defenses must be layered and continually stress-tested. Google’s recent workfor example, focuses on architectural and policy level controls for agentic systems.
But where OpenAI takes a different tactic is with its “LLM-based automated attacker.” This attacker is actually a bot that OpenAI has trained using reinforcement learning to play the role of a hacker looking for ways to sneak malicious instructions to an AI agent.
The bot can test the attack in a simulation before using it in real life, and the simulator shows how the target AI would think and what actions it would take if it saw the attack. The bot can then study that response, adjust the attack, and try again and again. That insight into the target AI’s internal reasoning is something outsiders don’t have access to, so in theory OpenAI’s bot should be able to find bugs faster than a real attacker.
It’s a common tactic in AI safety testing: build an agent to find the edge cases and quickly test it in simulation.
« Us [reinforcement learning]A trained attacker can direct an agent to execute sophisticated, long-horizon malicious workflows that unfold over dozens (or even hundreds) of steps,” OpenAI wrote. “We also observed new attack strategies that did not appear in our human rescue teaming campaign or external reports.”

In a demo (partially pictured above), OpenAI showed how the automated attacker delivered a malicious email into a user’s inbox. When the AI agent later scanned the inbox, it followed the hidden instructions in the email and sent a dismissal message instead of composing an out-of-office message. But after the security update, « agent mode » was able to successfully detect the quick injection attempt and notify the user, the company said.
The company says that while rapid injection is difficult to secure in a foolproof manner, it is leaning on large-scale testing and faster patch cycles to harden its systems before they appear in real attacks.
An OpenAI spokesperson declined to say whether the Atlas security update has resulted in a measurable reduction in successful injections, but said the company has been working with third parties to protect Atlas from rapid injections even before launch.
Rami McCarthy, chief security researcher at cybersecurity company Wizsays reinforcement learning is a way to continually adapt to the behavior of attackers, but it’s only part of the picture.
“A useful way to reason about risk in AI systems is autonomy multiplied by access,” McCarthy told TechCrunch.
“Agent browsers tend to be in a challenging part of that space: moderate autonomy combined with very high access,” says McCarthy. « Many current recommendations reflect that trade-off. Restricting logged-in access primarily reduces exposure, while requiring review of confirmation requests limits autonomy. »
These are two of OpenAI’s recommendations for users to reduce their own risk, and a spokesperson said Atlas is also trained to get confirmation from the user before sending messages or making payments. OpenAI also suggests that users give agents specific instructions, rather than giving them access to your inbox and telling them to “take whatever action is necessary.”
“A wide latitude makes it easier for hidden or malicious content to influence the agent, even with security measures in place,” OpenAI said.
While OpenAI says protecting Atlas users from rapid injections is a top priority, McCarthy raises some skepticism about the return on investment for risk-sensitive browsers.
“For most everyday use cases, agentic browsers do not yet deliver enough value to justify their current risk profile,” McCarthy told TechCrunch. “The risk is high given their access to sensitive data like email and payment information, even though that access is also what makes them powerful. That balance will evolve, but today the tradeoffs are still very real.”
[ad_2]
Source link
Comments