Supporting interoperable probabilistic defense structures through shared attack eval datasets

As noted in various discussions (most recently in #97), it's unlikely that security challenges around WebMCP can be addressed purely through deterministic specification rules. Browser and/or agent developers will have to deploy custom probablistic defense architectures, such as improved model understanding, "critic" models that prevent misaligned actions, or sophisticated tool security architectures like [CaMeL](https://arxiv.org/pdf/2503.18813).

This risks being a departure from the traditional Web specification security workflow: New implementers of the specification would not be able to use some of the lessons learned by early adopters and the community in delivering a secure feature.

One thing we can (and should) do is write very good and precise security considerations. However, there are limits to that, particularly in that it is hard to quantify what e.g. "should be resistant to prompt injections" means in practice.

As such, I think this group should embrace the default approach by agent makers to quantify the performance of an agentic system: [evaluations](https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents). A robust dataset of prompt injection attacks that any implementer of WebMCP MUST (or SHOULD?) prevent with >99% recall would be a good and pragmatic step forward in interoperably enabling security for WebMCP.

Obviously there are a lot of open questions here around both maintenance and architecture for running these evals.

cc @jpagnucco @bvandersloot-mozilla @victorhuangwq 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Supporting interoperable probabilistic defense structures through shared attack eval datasets #106

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Supporting interoperable probabilistic defense structures through shared attack eval datasets #106

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions