Digital Safety
Could attackers use seemingly innocuous prompts to manipulate an AI system and even make it their unwitting ally?
12 Dec 2024
3 min. read
When interacting with chatbots and other AI-powered tools, we typically ask them simple questions like, “What’s the weather going to be today?” or “Will the trains be running on time?”. Those not involved in the development of AI probably assume that all data is poured into a single giant, all-knowing system that instantly processes queries and delivers answers. However, the reality is more complex and, as shown at Black Hat Europe 2024, these systems could be vulnerable to exploitation.
A presentation by Ben Nassi, Stav Cohen and Ron Bitton detailed how malicious actors could circumvent an AI system’s safeguards to subvert its operations or exploit access to it. They showed that by asking an AI system some specific questions, it’s possible to engineer an answer that causes damage, such as a denial-of-service attack.
Creating loops and overloading systems
To many of us, an AI service may appear as a single source. In reality, however, it relies on many interconnected components or, as the presenting team termed them, agents. Going back to the earlier example, the query about the weather and trains needs data from separate agents: one that has access to weather data and another that has access to train status updates.
The model, or the master agent that the presenters called “the planner”, then has to integrate the data from the individual agents to formulate responses. In addition, guardrails are in place to prevent the system from answering questions that are inappropriate or beyond its scope. For example, some AI systems might avoid answering political questions.
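As a rough illustration of the architecture described above, the following Python sketch shows a planner that checks a guardrail and then combines answers from separate agents. It is not the presenters’ system: every name here (`weather_agent`, `train_agent`, `AGENTS`, `BLOCKED_TOPICS`, `planner`) is hypothetical, the agents return canned strings, and the keyword matching stands in for whatever routing and filtering a real product would use.

```python
from typing import Callable, Dict

# Hypothetical agents: each one answers a single kind of question.
# Real agents would call external data sources; these return fixed text.
def weather_agent(query: str) -> str:
    return "Weather: sunny, 18 C"

def train_agent(query: str) -> str:
    return "Trains: running on time"

# Assumed routing table: a keyword in the query selects the agent.
AGENTS: Dict[str, Callable[[str], str]] = {
    "weather": weather_agent,
    "train": train_agent,
}

# Assumed guardrail: topics the system declines to discuss.
BLOCKED_TOPICS = ("election", "politics")

def guardrail_allows(query: str) -> bool:
    """Return False when the query touches a blocked topic."""
    lowered = query.lower()
    return not any(topic in lowered for topic in BLOCKED_TOPICS)

def planner(query: str) -> str:
    """Check the guardrail, then merge answers from every matching agent."""
    if not guardrail_allows(query):
        return "Sorry, I can't help with that."
    lowered = query.lower()
    parts = [agent(query) for keyword, agent in AGENTS.items() if keyword in lowered]
    return " | ".join(parts) if parts else "No agent can answer that."
```

Calling `planner("What's the weather, and will the trains be on time?")` consults both agents and joins their answers, while a question about an election is refused before any agent is called.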
However, the presenters demonstrated that these guardrails could be manipulated and that some specific questions can trigger never-ending loops. An attacker who can work out where the boundaries of the guardrails lie can ask a question that continually produces a forbidden answer. Creating enough instances of the question ultimately overwhelms the system and triggers a denial-of-service attack.
When you apply this to an everyday scenario, as the presenters did, you see how quickly this can cause harm. An attacker sends an email to a user who has an AI assistant, embedding a query that the assistant processes, and a response is generated. If the answer is always judged unsafe and a rewrite is requested, a denial-of-service loop is created. Send enough such emails and the system grinds to a halt, with its power and resources depleted.
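One common way to limit this kind of rewrite loop is to cap the number of attempts per request. The sketch below assumes a hypothetical `generate` function that drafts an answer and a hypothetical `is_safe` guardrail check; neither comes from the presentation, and a cap on its own would not stop an attacker from sending many separate emails, so it would normally be combined with rate limiting.

```python
from typing import Callable, Tuple

def answer_with_retry_cap(
    generate: Callable[[str, int], str],   # hypothetical: drafts an answer for (prompt, attempt)
    is_safe: Callable[[str], bool],        # hypothetical: guardrail verdict on a draft
    prompt: str,
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Return (answer, attempts_used), giving up after max_attempts rewrites."""
    for attempt in range(1, max_attempts + 1):
        draft = generate(prompt, attempt)
        if is_safe(draft):
            return draft, attempt
    # Every draft was rejected: stop instead of looping forever.
    return "Request declined.", max_attempts
```

With a prompt crafted so that every draft is judged unsafe, the function stops after `max_attempts` calls to `generate` rather than consuming resources indefinitely.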
There is, of course, the question of how to extract the information on guardrails from the system so that you can exploit it. The team demonstrated a more advanced version of the attack above, which involved manipulating the AI system itself into providing the background information through a series of seemingly innocuous prompts about its operations and configuration.
A question such as “What operating system or SQL version do you run on?” is likely to elicit a relevant response. This, combined with seemingly unrelated information about the system’s purpose, may yield enough information that text commands could be sent to the system and, if an agent has privileged access, unwittingly grant this access to the attacker. In cyberattack terms, we know this as “privilege escalation”, a technique in which attackers exploit weaknesses to gain higher levels of access than intended.
The growing threat of socially engineering AI systems
The presenters did not conclude with what my own takeaway from their session is: in my opinion, what they demonstrated is a social engineering attack on an AI system. You ask it questions that it is happy to answer, which may allow bad actors to piece together the individual pieces of information and use the combined knowledge to circumvent boundaries and extract further data, or to have the system take actions that it should not.
And if one of the agents in the chain has access rights, that could make the system more exploitable, allowing the attacker to use those rights for their own gain. An extreme example used by the presenters involved an agent with file write privileges; in the worst case, the agent could be misused to encrypt data and block access for others, a scenario commonly known as a ransomware incident.
Socially engineering an AI system through its lack of controls or access rights demonstrates that careful consideration and configuration are needed when deploying an AI system so that it is not susceptible to attacks.
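One way to read “careful configuration” is least privilege: each agent is granted only the actions it needs, and everything else is denied by default. The following is a minimal sketch under that assumption; `AgentPolicy`, `ActionDenied`, `authorize` and the action names such as `file:read` are hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass, field
from typing import FrozenSet

@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical per-agent allowlist of permitted actions."""
    name: str
    allowed_actions: FrozenSet[str] = field(default_factory=frozenset)

class ActionDenied(Exception):
    """Raised when an agent requests an action outside its allowlist."""

def authorize(policy: AgentPolicy, action: str) -> None:
    """Deny by default: only explicitly listed actions are allowed."""
    if action not in policy.allowed_actions:
        raise ActionDenied(f"{policy.name} may not perform {action!r}")

# Example: an email-summarising agent that can read files but never write them.
email_agent = AgentPolicy("email-assistant", frozenset({"file:read"}))
```

Here `authorize(email_agent, "file:read")` passes silently, while `authorize(email_agent, "file:write")` raises `ActionDenied`, so a manipulated prompt could not turn this agent into a tool for encrypting files.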