Standards for building agents, better
Updated Feb 22, 2026 - TypeScript
Agentic testing for agentic codebases
Ship agents you can audit.
The pre-flight check for AI agents
Qualitative benchmark suite for evaluating AI coding agents and orchestration paradigms on realistic, complex development tasks
AI Agent Security Testing — 112 attacks across 14 categories. Prompt injection, jailbreaks, MCP poisoning, agency hijacking & more. Test any AI agent in 5 minutes.
Agent testing automation 🤖 by simulating users 👥 and agents 🤝 with judge ⚖️(langwatch-scenario)
A Multi-Agent System for Cross-Checking Phishing URLs.
Holdout scenario evaluation harness for AI agents. Doer/Judge/Adversary/Observer roles, probabilistic satisfaction scoring, append-only JSONL audit trails with integrity hashes. Created Dec 2025.
AI Agent Evaluation and Monitoring Guide
Open-source agent simulation and runtime control platform for Claude Code
PHP testing framework for LLM agents — multi-turn dialogs, cassette replay, tool calling, LLM-as-judge assertions
Regression and evaluation toolkit for prompt and agent output quality
Demonstration of testing and evaluation patterns for AI agents using Azure AI evaluation tools with custom evaluators
🧮 Solve mathematical problems and write proofs in natural language using this easy-to-use reasoning harness. Enhance your problem-solving skills effortlessly.