Google is stepping up its game to protect you from sneaky online attacks! They're adding new layers of security to Chrome, especially to combat a trick called 'indirect prompt injection.' This is where attackers hide instructions in web content to trick AI into doing things it shouldn't, all through the websites you visit.
So, what's Google doing about it?
First up, they've introduced the User Alignment Critic. Think of it as a second pair of eyes that independently checks the AI's work. It makes sure the AI's actions align with your goals, not the hidden agendas of malicious websites. This is a crucial step because it isolates the 'critic' from the untrusted web content, preventing it from being tricked by those sneaky prompts.
And these checks don't work in isolation: Google's approach complements existing techniques, such as spotlighting, which is designed to keep the AI focused on your instructions rather than on untrusted page content.
The User Alignment Critic runs after the planning is complete to double-check each proposed action. Its primary focus is task alignment: determining whether the proposed action serves the user's stated goal. If the action is misaligned, the Alignment Critic will veto it.
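To make the veto step concrete, here's a minimal Python sketch of that flow. The class and method names (`AlignmentCritic`, `ProposedAction`, `execute_plan`) are illustrative assumptions, not Google's actual implementation; the key property it demonstrates is that the critic reviews each action against the user's goal without ever seeing raw page content.

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    description: str   # e.g. "click the 'Transfer funds' button"
    origin: str        # the site the action targets

class AlignmentCritic:
    """Reviews each planned action against the user's stated goal.

    The critic only sees the user's goal and the proposed action,
    never the raw page content, so injected text on a malicious
    page cannot talk it into approving a bad action.
    """

    def __init__(self, user_goal: str):
        self.user_goal = user_goal

    def approves(self, action: ProposedAction) -> bool:
        # Stand-in for a model call that judges task alignment.
        # Here we approve only actions targeting a hypothetical
        # allow-list derived from the user's goal.
        allowed = {"shop.example.com"}
        return action.origin in allowed

def execute_plan(critic: AlignmentCritic, plan: list[ProposedAction]) -> list[str]:
    executed = []
    for action in plan:
        if critic.approves(action):
            executed.append(action.description)
        else:
            # A misaligned action is vetoed, not executed.
            print(f"VETOED: {action.description}")
    return executed
```

The point of the design is the isolation boundary: even a perfectly crafted injection on the page never reaches the critic's inputs.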
Next, Google is enforcing Agent Origin Sets. This is all about controlling where the AI can go and what it can access. The AI only gets to play with data from websites that are relevant to the task at hand or those you've specifically told it to use. It's like giving the AI a specific set of tools for a job, rather than letting it rummage through the entire toolbox.
This is implemented using a gating function that sorts origins into two categories:
- Read-only origins: Where the AI can get information.
- Read-writable origins: Where the AI can type or click.
The gating function ensures that the AI sticks to its designated areas, reducing the risk of data leaks.
This delineation ensures that only data from a limited set of origins is available to the agent, and that this data can only be passed on to the writable origins. That bounds the risk of cross-origin data leaks.
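The two rules above can be sketched as a pair of small Python functions (the names `gate_action` and `may_pass_data` are my own illustrative stand-ins, not Google's API): reads are allowed on any in-set origin, writes only on writable origins, and data may only flow toward writable origins.

```python
def gate_action(action_type: str, origin: str,
                read_only: set[str], read_writable: set[str]) -> bool:
    """Gating-function sketch: reads are allowed on any origin in
    either set; writes (typing, clicking) only on writable origins."""
    if action_type == "read":
        return origin in read_only or origin in read_writable
    if action_type == "write":
        return origin in read_writable
    return False  # unknown action types are denied by default

def may_pass_data(source: str, destination: str,
                  read_only: set[str], read_writable: set[str]) -> bool:
    """Data gathered from any in-set origin may flow only to a
    writable origin, bounding cross-origin data leaks."""
    in_set = read_only | read_writable
    return source in in_set and destination in read_writable
```

Note the default-deny stance: an origin outside both sets is invisible to the agent, so a malicious page can't even be read, let alone written to.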
Transparency and user control are also key. The AI will keep a log of its actions, and it will ask for your permission before doing anything sensitive, like visiting your bank or making a purchase.
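A rough sketch of how that logging-plus-confirmation pattern fits together (everything here, including the `SENSITIVE` set and the `ask_user` callback, is a hypothetical illustration, not Chrome's real interface):

```python
import time

# Hypothetical categories that require explicit user approval.
SENSITIVE = {"purchase", "bank_login", "send_payment"}

class AgentSession:
    def __init__(self, ask_user):
        self.log = []             # transparent record of every action
        self.ask_user = ask_user  # callback that prompts the user

    def perform(self, action: str) -> bool:
        if action in SENSITIVE and not self.ask_user(action):
            self.log.append((time.time(), action, "blocked: user declined"))
            return False
        self.log.append((time.time(), action, "done"))
        return True
```

The log entry is written whether the action runs or not, so you can audit both what the agent did and what it tried to do.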
Finally, the AI will also actively check each page for prompt injections and work with existing security features to block suspicious content.
This prompt-injection classifier runs in parallel with the planning model's inference, and blocks any action based on content that the classifier judges to be intentionally targeting the model to do something unaligned with the user's goal.
Want to test the system? Google is offering up to $20,000 for anyone who can find a way to break through these new defenses. They're particularly interested in finding exploits that let attackers:
- Make the AI do things without your permission.
- Steal your data.
- Get around the security measures.
But here's the kicker: The U.K.'s National Cyber Security Centre (NCSC) has warned that prompt injection is a persistent problem for large language models (LLMs), cautioning that mitigations like these reduce the risk but don't eliminate it.
So, what do you think? Are these new security features enough to keep you safe, or is there more that Google and other companies need to do? Share your thoughts in the comments below!