Evaluation Infrastructure for AI Agents
A hands-on, interactive AI-evals course for product folks who want to develop product sense from real-life applications.
🔍 Benchmark jailbreak resilience in LLMs with JailBench for clear insights and stronger model defenses.
Benchmark LLM jailbreak resilience across providers with standardized tests, adversarial mode, rich analytics, and a clean Web UI.
Unofficial TypeScript starter for deterministic local contract testing around Foundry-oriented workflows with Themis.
Free, local Langfuse OSS setup with Ollama for LLM evaluation, scoring, and datasets.
End-to-end AI evals orchestration platform for comparing LLM outputs across providers with transcription, structured logging, human review, and Supabase-backed decision tracking.
Experimentation framework for LLM systems using simulated users, conversational behavioral metrics, and causal inference to evaluate prompt strategies, temperature, and model scaling.
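As a rough illustration of that experimental setup, here is a minimal TypeScript sketch comparing two prompt strategies via simulated users and a difference-in-means effect estimate; the metric and function names are placeholders, not the framework's actual API.

```ts
// Hypothetical sketch: each simulated user returns a behavioral metric
// (e.g. task-completion rate) for the prompt strategy it was assigned to.
type SimulatedUser = () => Promise<number>;

async function runArm(users: SimulatedUser[]): Promise<number[]> {
  return Promise.all(users.map((u) => u()));
}

function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Average treatment effect of prompt B over prompt A on the chosen metric,
// assuming simulated users are randomly assigned to the two arms.
export async function estimateEffect(armA: SimulatedUser[], armB: SimulatedUser[]) {
  const [a, b] = await Promise.all([runArm(armA), runArm(armB)]);
  return { effect: mean(b) - mean(a), meanA: mean(a), meanB: mean(b) };
}
```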
Lightweight eval framework for LLMs & AI apps combining deterministic scoring, LLM-as-judge, and regression testing.
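A minimal sketch of how those three pieces can fit together, assuming a simple weighted blend of a deterministic check and a judge score plus a baseline comparison; the names, weights, and tolerance are illustrative, not taken from the repo.

```ts
type EvalCase = { input: string; expected: string };
type Judge = (input: string, output: string) => Promise<number>; // returns 0..1

async function scoreCase(c: EvalCase, output: string, judge: Judge): Promise<number> {
  // Deterministic component: exact match (could be a regex or schema check instead).
  const deterministic = output.trim() === c.expected.trim() ? 1 : 0;
  // Subjective component: an LLM judge grades the output.
  const judged = await judge(c.input, output);
  return 0.5 * deterministic + 0.5 * judged;
}

// Regression gate: fail the run if the new score drops below the stored baseline.
function regressionCheck(score: number, baseline: number, tolerance = 0.02): void {
  if (score < baseline - tolerance) {
    throw new Error(`Regression: ${score.toFixed(3)} < baseline ${baseline.toFixed(3)}`);
  }
}
```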
Hands-on Agentic AI learning project — ReAct agents, memory systems, evals, and multi-agent architecture. Built as a structured AI PM curriculum.
Multi-agent system orchestrating an AI-driven software team using the Claude Agents SDK. Agents take on defined roles and collaborate autonomously on software tasks.
An AI-powered data extraction tool designed for the Job Intelligence Engine project.
Collection of frameworks and tools for AI evaluations, including tool-use, agentic AI, MCP, and multimodal evaluation.
EvalLoop is a self-improving agent that iterates on its own outputs using evals plus automatic policy patches. It runs a task, scores the result against a rubric, updates its rules, and re-runs until it hits a target score, with a UI showing attempts, score trends, violations, and policy diffs.
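The run, score, patch, re-run cycle described above might look roughly like this; the hook names and policy representation here are hypothetical, not EvalLoop's actual API.

```ts
interface Attempt { output: string; score: number; violations: string[] }

interface Hooks {
  runTask(task: string, policy: string[]): Promise<string>;
  scoreAgainstRubric(output: string): Promise<{ score: number; violations: string[] }>;
  proposePolicyPatch(policy: string[], violations: string[]): Promise<string[]>;
}

async function evalLoop(
  task: string,
  policy: string[],          // the current rules the agent must follow
  targetScore: number,
  hooks: Hooks,
  maxAttempts = 5,
): Promise<Attempt[]> {
  const attempts: Attempt[] = [];
  for (let i = 0; i < maxAttempts; i++) {
    const output = await hooks.runTask(task, policy);                     // run the task
    const { score, violations } = await hooks.scoreAgainstRubric(output); // score vs. rubric
    attempts.push({ output, score, violations });
    if (score >= targetScore) break;                                      // stop once the target is hit
    policy = await hooks.proposePolicyPatch(policy, violations);          // patch the rules and retry
  }
  return attempts; // attempt history drives the UI: score trend, violations, policy diffs
}
```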
Portfolio project showing product strategy, evals, roadmap, and GTM for a GenAI coding assistant
CLI release gate for structured AI changes.
Build deterministic local contract tests for Foundry workflows with TypeScript, schema validation, telemetry, and proof artifact export
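To illustrate that pattern (schema validation plus a deterministic proof artifact), here is a sketch using zod for validation and a SHA-256 hash as the artifact; the schema fields and artifact format are assumptions, not the repo's actual conventions.

```ts
import { z } from "zod";
import { createHash } from "node:crypto";

// Illustrative schema for a structured deployment report (fields are assumed).
const DeploymentReport = z.object({
  contract: z.string(),
  address: z.string().regex(/^0x[0-9a-fA-F]{40}$/),
  chainId: z.number().int(),
  gasUsed: z.number().nonnegative(),
});

export function checkReport(raw: unknown): { ok: boolean; proofHash?: string; errors?: string[] } {
  const parsed = DeploymentReport.safeParse(raw);
  if (!parsed.success) {
    return { ok: false, errors: parsed.error.issues.map((i) => i.message) };
  }
  // Deterministic "proof artifact": a stable hash over the validated payload,
  // suitable for committing alongside telemetry or logs.
  const proofHash = createHash("sha256")
    .update(JSON.stringify(parsed.data))
    .digest("hex");
  return { ok: true, proofHash };
}
```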