Open Source Agent Alignment: Make your agents follow rules. One line of code to enforce, trace, and improve.
-
Updated
Apr 26, 2026 - Python
Open Source Agent Alignment: Make your agents follow rules. One line of code to enforce, trace, and improve.
Review and moderation, your way. Online safety dashboard, queues, routing and automatic enforcement rules, and integrations.
🛡️ Programmable Guardrails for LLM Applications in Java. A framework-agnostic toolkit for input/output validation, PII masking, and jailbreak detection. The Java alternative to NVIDIA NeMo Guardrails.
A JavaScript-based content safety system designed to detect and filter sensitive media in real-time, ensuring platform compliance and user protection.
An intelligent task management assistant built with .NET, Next.js, Microsoft Agent Framework, AG-UI protocol, and Azure OpenAI, demonstrating Clean Architecture and autonomous AI agent capabilities
Step-by-Step tutorial that teaches you how to use Azure Safety Content - the prebuilt AI service that helps ensure that content sent to user is filtered to safeguard them from risky or undesirable outcomes
Transform uncertainty into absolute confidence.
🔍 Benchmark jailbreak resilience in LLMs with JailBench for clear insights and improved model defenses against jailbreak attempts.
│ Real-time NSFW & harmful content detection as a service
Benchmark LLM jailbreak resilience across providers with standardized tests, adversarial mode, rich analytics, and a clean Web UI.
AI application firewall for LLM-powered apps — multi-layered detection (heuristic, ML classifier, semantic, LLM-judge) against prompt injection, jailbreaks, and data leakage - inferwall.com
Technical presentations with hands-on demos
Production-Grade LLM Alignment Engine (TruthProbe + ADT)
Arabic Content Moderator — scan text for toxicity, hate speech, spam. Dialect-aware. Fully offline.
A Chrome extension that uses Claude AI to protect users under 18 from inappropriate content by analyzing webpage content in real-time.
Content moderation (text and image) in a social network demo
Responsible AI toolkit for LLM applications: PII/PHI redaction, prompt injection detection, bias scoring, content safety filters, and output validation. Framework-agnostic Python library with FastAPI demo.
Pre-Publish Security Gate - Scan and redact sensitive information before sharing
The open-source safety stack for AI agents. Policy engine, content scanner, approval workflows, audit trails. 924+ tests. MIT licensed.
抖音视频审核检测|同行举报分析工具|抖音视频风控|抖音风控||优化视频|举报同行|视频监测|视频检测
Add a description, image, and links to the content-safety topic page so that developers can more easily learn about it.
To associate your repository with the content-safety topic, visit your repo's landing page and select "manage topics."