-
Notifications
You must be signed in to change notification settings - Fork 101
Description
As noted in various discussions (most recently in #97), it's unlikely that security challenges around WebMCP can be addressed purely through deterministic specification rules. Browser and/or agent developers will have to deploy custom probablistic defense architectures, such as improved model understanding, "critic" models that prevent misaligned actions, or sophisticated tool security architectures like CaMeL.
This risks being a departure from the traditional Web specification security workflow: New implementers of the specification would not be able to use some of the lessons learned by early adopters and the community in delivering a secure feature.
One thing we can (and should) do is write very good and precise security considerations. However, there are limits to that, particularly in that it is hard to quantify what e.g. "should be resistant to prompt injections" means in practice.
As such, I think this group should embrace the default approach by agent makers to quantify the performance of an agentic system: evaluations. A robust dataset of prompt injection attacks that any implementer of WebMCP MUST (or SHOULD?) prevent with >99% recall would be a good and pragmatic step forward in interoperably enabling security for WebMCP.
Obviously there are a lot of open questions here around both maintenance and architecture for running these evals.