Hacker News
The Webpage Has Instructions. The Agent Has Your Credentials
redgridtactical
|next
[-]
The simplest mitigation is also the least popular one: don't give the agent credentials in the first place. Scope it to read-only where possible, and treat every page it visits as untrusted input. But that limits what agents can do, which is why nobody wants to hear it.
rocho
|root
|parent
|next
[-]
I wonder if it'd be possible to train an LLM with such architecture: one input for the instructions/conversation and one "data-only" input. Training would ensure that the latter isn't interpreted as instructions, although I'm not knowledgeable enough to understand if that's even theoretically possible: even if the inputs are initially separate, they eventually mix in the neural network. However, I imagine that training could be done with massive amounts of prompt injections in the "data-only" input to penalize execution of those instructions.
stainlu
|next
|previous
[-]
This is npm supply chain attacks but worse in one specific way: with npm you need arbitrary code execution. With MCP, the attack surface is the natural language itself. The model reads the description and follows it. No sandbox escape needed.
The article suggests pinning versions and signing tool descriptions, which is the right direction. But the ecosystem tooling isn't there yet. Most MCP registries have no signing, no auditing, and tool descriptions aren't even shown to users before the model ingests them.
0xbadcafebee
|next
|previous
[-]
stavros
|next
|previous
[-]
indigodaddy
|root
|parent
[-]
amelius
|root
|parent
|next
[-]
stavros
|root
|parent
|previous
[-]
indigodaddy
|root
|parent
[-]
I’d guess this is the type of thing that might actually excel in your agent or these claw clones, because they literally can just do whatever bash/tool type actions on whatever VM or sandboxed environment they live on?
petesergeant
|previous
[-]
Works great with OpenClaw, Claude Cowork, or anything, really