Merged (25 commits)

Commits
5152038
fix: make env_config import resilient for Aegis installed location
solderzzc Mar 8, 2026
3ce975b
feat: add smoke_test to SKILL.md, default confidence → 0.8
solderzzc Mar 8, 2026
a06f3a6
fix: remove unused smoke_test from SKILL.md — Aegis SkillHandler has …
solderzzc Mar 8, 2026
9787dc7
fix: deploy.sh uses bundled scripts/env_config.py instead of repo-rel…
solderzzc Mar 8, 2026
fa12cbb
Merge branch 'master' into develop
solderzzc Mar 8, 2026
9ad4307
fix: ship env_config.py inside skill scripts/ dir — not just in repo …
solderzzc Mar 8, 2026
a32749a
feat: add version requirements to SKILL.md (python >=3.9, ultralytics…
solderzzc Mar 8, 2026
80d1efc
feat: add version requirements to benchmark SKILL.md (node >=18, npm_…
solderzzc Mar 8, 2026
7d3e7a3
feat(benchmark): add Indoor Safety Hazards VLM suite (12 tests)
solderzzc Mar 8, 2026
9b81316
feat: add SmartHome-Bench video anomaly detection skill
solderzzc Mar 8, 2026
7cb3660
feat: add HomeSafe-Bench skill + disk space checks
solderzzc Mar 8, 2026
f316143
feat(homesafe-bench): complete 40/40 fixture frames
solderzzc Mar 8, 2026
44d979d
fix: pin numpy<2.0.0 for CoreML export compatibility
solderzzc Mar 8, 2026
3af15f4
Merge pull request #138 from SharpAI/feature/smarthome-bench
solderzzc Mar 8, 2026
0f9ec05
chore: remove unimplemented skills, add homesafe-bench to README
solderzzc Mar 8, 2026
26aef50
feat(env_config): fix ROCm GPU detection for ROCm 7.2+
solderzzc Mar 8, 2026
a6798d1
Merge pull request #139 from SharpAI/feature/rocm-gpu-detection
solderzzc Mar 8, 2026
eaefade
fix(env_config): graceful CPU fallback when ROCm PyTorch unavailable
solderzzc Mar 8, 2026
fd87c7f
fix(env_config): proactive torch.cuda guard for ROCm PyTorch fallback
solderzzc Mar 9, 2026
dacf7cb
fix(deploy): prevent ultralytics from re-installing CPU onnxruntime
solderzzc Mar 9, 2026
bbd5db6
fix(deploy): simplify ROCm install — correct packages from the start
solderzzc Mar 9, 2026
58f3d54
fix(deploy): auto-detect installed ROCm version for PyTorch index
solderzzc Mar 9, 2026
2d32d52
fix(deploy): fallback through ROCm versions for PyTorch wheels
solderzzc Mar 9, 2026
28aede1
fix(rocm): use PyTorch+HIP for inference instead of ONNX
solderzzc Mar 9, 2026
385e692
Merge pull request #140 from SharpAI/feature/rocm-gpu-detection
solderzzc Mar 9, 2026
9 changes: 4 additions & 5 deletions README.md
@@ -40,10 +40,9 @@ Each skill is a self-contained module with its own model, parameters, and [commu
| Category | Skill | What It Does | Status |
|----------|-------|--------------|:------:|
| **Detection** | [`yolo-detection-2026`](skills/detection/yolo-detection-2026/) | Real-time 80+ class detection — auto-accelerated via TensorRT / CoreML / OpenVINO / ONNX | ✅|
| | [`dinov3-grounding`](skills/detection/dinov3-grounding/) | Open-vocabulary detection — describe what to find | 📐 |
| | [`person-recognition`](skills/detection/person-recognition/) | Re-identify individuals across cameras | 📐 |
| **Analysis** | [`home-security-benchmark`](skills/analysis/home-security-benchmark/) | [131-test evaluation suite](#-homesec-bench--how-secure-is-your-local-ai) for LLM & VLM security performance | ✅ |
| | [`vlm-scene-analysis`](skills/analysis/vlm-scene-analysis/) | Describe what happened in recorded clips | 📐 |
| **Analysis** | [`home-security-benchmark`](skills/analysis/home-security-benchmark/) | [143-test evaluation suite](#-homesec-bench--how-secure-is-your-local-ai) for LLM & VLM security performance | ✅ |
| | [`smarthome-bench`](skills/analysis/smarthome-bench/) | Video anomaly detection benchmark — 105 clips across 7 smart home categories | ✅ |
| | [`homesafe-bench`](skills/analysis/homesafe-bench/) | Indoor safety hazard detection — 40 tests across 5 categories | ✅ |
| | [`sam2-segmentation`](skills/analysis/sam2-segmentation/) | Click-to-segment with pixel-perfect masks | 📐 |
| **Transformation** | [`depth-estimation`](skills/transformation/depth-estimation/) | Monocular depth maps with Depth Anything v2 | 📐 |
| **Annotation** | [`dataset-annotation`](skills/annotation/dataset-annotation/) | AI-assisted labeling → COCO export | 📐 |
@@ -140,7 +139,7 @@ Camera → Frame Governor → detect.py (JSONL) → Aegis IPC → Live Overlay

## 📊 HomeSec-Bench — How Secure Is Your Local AI?

**HomeSec-Bench** is a 131-test security benchmark that measures how well your local AI performs as a security guard. It tests what matters: Can it detect a person in fog? Classify a break-in vs. a delivery? Resist prompt injection? Route alerts correctly at 3 AM?
**HomeSec-Bench** is a 143-test security benchmark that measures how well your local AI performs as a security guard. It tests what matters: Can it detect a person in fog? Classify a break-in vs. a delivery? Resist prompt injection? Route alerts correctly at 3 AM?

Run it on your own hardware to know exactly where your setup stands.

69 changes: 69 additions & 0 deletions skills.json
@@ -96,6 +96,75 @@
"medium",
"large"
]
},
{
"id": "smarthome-bench",
"name": "SmartHome Video Anomaly Benchmark",
"description": "VLM evaluation suite for video anomaly detection in smart home camera footage — 7 categories, 105 curated clips from SmartHome-Bench.",
"version": "1.0.0",
"category": "analysis",
"path": "skills/analysis/smarthome-bench",
"tags": [
"benchmark",
"vlm",
"video",
"anomaly-detection",
"smart-home"
],
"platforms": [
"linux-x64",
"linux-arm64",
"darwin-arm64",
"darwin-x64",
"win-x64"
],
"requirements": {
"node": ">=18",
"ram_gb": 2,
"system_deps": [
"yt-dlp",
"ffmpeg"
]
},
"capabilities": [
"benchmark",
"report_generation"
],
"ui_unlocks": [
"benchmark_report"
]
},
{
"id": "homesafe-bench",
"name": "HomeSafe Indoor Safety Benchmark",
"description": "VLM evaluation suite for indoor home safety hazard detection — 40 tests across 5 categories: fire/smoke, electrical, trip/fall, child safety, falling objects.",
"version": "1.0.0",
"category": "analysis",
"path": "skills/analysis/homesafe-bench",
"tags": [
"benchmark",
"vlm",
"safety",
"hazard",
"indoor"
],
"platforms": [
"linux-x64",
"linux-arm64",
"darwin-arm64",
"darwin-x64",
"win-x64"
],
"requirements": {
"node": ">=18",
"ram_gb": 2
},
"capabilities": [
"benchmark"
],
"ui_unlocks": [
"benchmark_report"
]
}
]
}
17 changes: 11 additions & 6 deletions skills/analysis/home-security-benchmark/SKILL.md
@@ -1,16 +1,21 @@
---
name: Home Security AI Benchmark
description: LLM & VLM evaluation suite for home security AI applications
version: 2.0.0
version: 2.1.0
category: analysis
runtime: node
entry: scripts/run-benchmark.cjs
install: npm

requirements:
node: ">=18"
npm_install: true
platforms: ["linux", "macos", "windows"]
---

# Home Security AI Benchmark

Comprehensive benchmark suite evaluating LLM and VLM models on **131 tests** across **16 suites** — context preprocessing, tool use, security classification, prompt injection resistance, alert routing, knowledge injection, VLM-to-alert triage, and scene analysis.
Comprehensive benchmark suite evaluating LLM and VLM models on **143 tests** across **16 suites** — context preprocessing, tool use, security classification, prompt injection resistance, alert routing, knowledge injection, VLM-to-alert triage, and scene analysis.

## Setup

@@ -71,7 +76,7 @@ This skill includes a [`config.yaml`](config.yaml) that defines user-configurabl

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `mode` | select | `llm` | Which suites to run: `llm` (96 tests), `vlm` (35 tests), or `full` (131 tests) |
| `mode` | select | `llm` | Which suites to run: `llm` (96 tests), `vlm` (47 tests), or `full` (143 tests) |
| `noOpen` | boolean | `false` | Skip auto-opening the HTML report in browser |

Platform parameters like `AEGIS_GATEWAY_URL` and `AEGIS_VLM_URL` are auto-injected by Aegis — they are **not** in `config.yaml`. See [Aegis Skill Platform Parameters](../../../docs/skill-params.md) for the full platform contract.
@@ -107,7 +112,7 @@ AEGIS_SKILL_PARAMS={}

Human-readable output goes to **stderr** (visible in Aegis console tab).

## Test Suites (131 Tests)
## Test Suites (143 Tests)

| Suite | Tests | Domain |
|-------|-------|--------|
@@ -126,7 +131,7 @@ Human-readable output goes to **stderr** (visible in Aegis console tab).
| Alert Routing & Subscription | 5 | Channel targeting, schedule CRUD |
| Knowledge Injection to Dialog | 5 | KI-personalized responses |
| VLM-to-Alert Triage | 5 | Urgency classification from VLM |
| VLM Scene Analysis | 35 | Frame entity detection & description |
| VLM Scene Analysis | 47 | Frame entity detection & description (outdoor + indoor safety) |

## Results

@@ -137,4 +142,4 @@ Results are saved to `~/.aegis-ai/benchmarks/` as JSON. An HTML report with cros
- Node.js ≥ 18
- `npm install` (for `openai` SDK dependency)
- Running LLM server (llama-server, OpenAI API, or any OpenAI-compatible endpoint)
- Optional: Running VLM server for scene analysis tests (35 tests)
- Optional: Running VLM server for scene analysis tests (47 tests)
62 changes: 62 additions & 0 deletions skills/analysis/home-security-benchmark/scripts/run-benchmark.cjs
@@ -1704,6 +1704,68 @@ suite('📸 VLM Scene Analysis', async () => {
prompt: 'Describe this outdoor area. Are there any people present? What objects are visible?',
expect: ['patio', 'furniture', 'table', 'chair', 'grill', 'empty', 'no one', 'no people']
},

// Category E: Indoor Safety Hazards (12)
{
name: 'Stove smoke → kitchen fire hazard', file: 'indoor_fire_stove.png',
prompt: 'Describe this indoor security camera scene. Are there any fire or smoke hazards visible?',
expect: ['smoke', 'fire', 'stove', 'kitchen', 'cook', 'pot', 'steam']
},
{
name: 'Candle near curtain → fire risk', file: 'indoor_fire_candle.png',
prompt: 'Describe this indoor scene. Is there any fire risk from open flames near flammable materials?',
expect: ['candle', 'fire', 'curtain', 'flame', 'drape', 'fabric', 'risk']
},
{
name: 'Overloaded power strip → electrical hazard', file: 'indoor_elec_powerstrip.png',
prompt: 'Describe what you see. Are there any electrical safety hazards?',
expect: ['overload', 'power', 'electrical', 'plug', 'strip', 'cable', 'cord', 'outlet']
},
{
name: 'Frayed cord → electrical fire risk', file: 'indoor_elec_cord.png',
prompt: 'Describe the condition of the electrical cord. Is there any damage that could be dangerous?',
expect: ['fray', 'cord', 'damage', 'wire', 'worn', 'exposed', 'cable']
},
{
name: 'Toys on stairs → trip hazard', file: 'indoor_trip_stairs.png',
prompt: 'Describe this scene. Are there any trip or fall hazards on the staircase?',
expect: ['toy', 'stair', 'trip', 'hazard', 'ball', 'fall', 'step']
},
{
name: 'Wet floor → slip hazard', file: 'indoor_trip_wetfloor.png',
prompt: 'Describe the floor condition in this scene. Is there any slip hazard?',
expect: ['wet', 'slip', 'water', 'floor', 'puddle', 'spill']
},
{
name: 'Person on floor → medical emergency', file: 'indoor_fall_person.png',
prompt: 'Describe what you see. Is anyone in distress or in need of medical help?',
expect: ['person', 'fall', 'lying', 'floor', 'down', 'help', 'cane', 'elder']
},
{
name: 'Open cabinet chemicals → child safety', file: 'indoor_child_cabinet.png',
prompt: 'Describe this kitchen scene. Are there any child safety concerns with accessible chemicals?',
expect: ['cabinet', 'chemical', 'clean', 'open', 'bottle', 'danger', 'safety']
},
{
name: 'Cluttered exit → blocked fire exit', file: 'indoor_blocked_exit.png',
prompt: 'Describe this scene. Is the exit or doorway clear or obstructed?',
expect: ['block', 'exit', 'clutter', 'door', 'box', 'obstruct', 'furniture']
},
{
name: 'Space heater near drape → fire ignition risk', file: 'indoor_fire_heater.png',
prompt: 'Describe this bedroom scene. Is the space heater positioned safely?',
expect: ['heater', 'drape', 'fire', 'curtain', 'close', 'fabric', 'risk']
},
{
name: 'Items on high shelf → falling object risk', file: 'indoor_fall_shelf.png',
prompt: 'Describe the shelf and items on it. Are there any falling object hazards?',
expect: ['shelf', 'fall', 'heavy', 'unstable', 'box', 'stack', 'top']
},
{
name: 'Iron left face-down → burn/fire risk', file: 'indoor_fire_iron.png',
prompt: 'Describe this laundry scene. Is the iron being used safely?',
expect: ['iron', 'burn', 'fire', 'left', 'hot', 'steam', 'unattended', 'board']
},
];

// ─── Run all VLM tests ──────────────────────────────────────────────
123 changes: 123 additions & 0 deletions skills/analysis/homesafe-bench/SKILL.md
@@ -0,0 +1,123 @@
---
name: HomeSafe-Bench
description: VLM indoor safety hazard detection benchmark inspired by HomeSafeBench (arXiv 2509.23690)
version: 1.0.0
category: analysis
runtime: node
entry: scripts/run-benchmark.cjs
install: npm

requirements:
node: ">=18"
npm_install: true
platforms: ["linux", "macos", "windows"]
---

# HomeSafe-Bench — Indoor Safety Hazard Detection

VLM benchmark evaluating indoor home safety hazard detection from static camera frames. Inspired by [HomeSafeBench (arXiv 2509.23690)](https://arxiv.org/abs/2509.23690), adapted for **fixed indoor security cameras** with AI-generated fixture images.

**40 tests** across **5 hazard categories**:

| Category | Tests | Example Hazards |
|----------|-------|-----------------|
| 🔥 Fire / Smoke | 8 | Stove smoke, candle near curtain, iron left face-down |
| ⚡ Electrical | 8 | Overloaded power strip, exposed wire, frayed cord |
| 🚶 Trip / Fall | 8 | Toys on staircase, wet floor, loose rug edge |
| 👶 Child Safety | 8 | Open cabinet with chemicals, uncovered outlet, blind cord |
| 📦 Falling Objects | 8 | Heavy items on high shelf, leaning bookshelf, loose ceiling fan |

## Setup

**Requires `npm install`.** This skill has a `package.json` with dependencies (e.g. `openai` SDK). The deployment agent should check for `package.json` and run `npm install` in the skill directory before launching.

Entry script: `scripts/run-benchmark.cjs`

### Verification

```bash
node scripts/run-benchmark.cjs --help
```

## Quick Start

### As an Aegis Skill (automatic)

When spawned by Aegis, all configuration is injected via environment variables. The benchmark discovers your VLM server automatically, generates an HTML report, and opens it when complete.

### Standalone

```bash
# Run all 40 tests
node scripts/run-benchmark.cjs --vlm http://localhost:5405

# Quick mode (2 tests per category = 10 total)
node scripts/run-benchmark.cjs --vlm http://localhost:5405 --mode quick

# Skip report auto-open
node scripts/run-benchmark.cjs --vlm http://localhost:5405 --no-open
```

## Configuration

### Environment Variables (set by Aegis)

| Variable | Default | Description |
|----------|---------|-------------|
| `AEGIS_VLM_URL` | *(required)* | VLM server base URL |
| `AEGIS_VLM_MODEL` | — | Loaded VLM model ID |
| `AEGIS_SKILL_ID` | — | Skill identifier (enables skill mode) |
| `AEGIS_SKILL_PARAMS` | `{}` | JSON params from skill config |

> **Note**: URLs should be base URLs (e.g. `http://localhost:5405`). The benchmark appends `/v1/chat/completions` automatically.
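
The URL handling in that note can be sketched like this (the helper name is illustrative, not the script's actual internal function):

```javascript
// Append the chat-completions path to a base URL, tolerating a trailing slash.
function chatEndpoint(baseUrl) {
  return baseUrl.replace(/\/+$/, '') + '/v1/chat/completions';
}

console.log(chatEndpoint('http://localhost:5405'));
// → http://localhost:5405/v1/chat/completions
```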

### User Configuration (config.yaml)

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `mode` | select | `full` | Which mode: `full` (40 tests) or `quick` (10 tests — 2 per category) |
| `noOpen` | boolean | `false` | Skip auto-opening the HTML report in browser |
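
How `quick` mode could trim 40 tests down to 10 can be sketched like this (an illustration of "2 per category"; the real selection logic in `run-benchmark.cjs` may differ):

```javascript
// Keep at most `perCategory` tests from each hazard category, in order.
function quickSubset(tests, perCategory = 2) {
  const counts = new Map();
  const out = [];
  for (const t of tests) {
    const n = counts.get(t.category) || 0;
    if (n < perCategory) {
      out.push(t);
      counts.set(t.category, n + 1);
    }
  }
  return out;
}

// 5 categories × 8 tests each → quick mode keeps 5 × 2 = 10.
const all = [];
for (const cat of ['fire', 'electrical', 'trip', 'child', 'falling']) {
  for (let i = 0; i < 8; i++) all.push({ category: cat, name: `${cat}-${i}` });
}
console.log(quickSubset(all).length); // → 10
```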

### CLI Arguments (standalone fallback)

| Argument | Default | Description |
|----------|---------|-------------|
| `--vlm URL` | *(required)* | VLM server base URL |
| `--mode MODE` | `full` | Test mode: `full` or `quick` |
| `--out DIR` | `~/.aegis-ai/homesafe-benchmarks` | Results directory |
| `--no-open` | — | Don't auto-open report in browser |

## Protocol

### Aegis → Skill (env vars)
```
AEGIS_VLM_URL=http://localhost:5405
AEGIS_SKILL_ID=homesafe-bench
AEGIS_SKILL_PARAMS={}
```

### Skill → Aegis (stdout, JSON lines)
```jsonl
{"event": "ready", "vlm": "SmolVLM-500M", "system": "Apple M3"}
{"event": "suite_start", "suite": "🔥 Fire / Smoke"}
{"event": "test_result", "suite": "...", "test": "...", "status": "pass", "timeMs": 4500}
{"event": "suite_end", "suite": "...", "passed": 7, "failed": 1}
{"event": "complete", "passed": 36, "total": 40, "timeMs": 180000, "reportPath": "/path/to/report.html"}
```

Human-readable output goes to **stderr** (visible in Aegis console tab).
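
A minimal sketch of this stdout/stderr split (function names are illustrative, not taken from the real script):

```javascript
// JSON events → stdout: one object per line, consumed by Aegis IPC.
// Progress text → stderr: human-readable, shown in the Aegis console tab.
function formatEvent(obj) {
  return JSON.stringify(obj) + '\n';
}

function emitEvent(obj) {
  process.stdout.write(formatEvent(obj));
}

function logHuman(msg) {
  process.stderr.write(msg + '\n');
}

emitEvent({ event: 'test_result', suite: '🔥 Fire / Smoke', test: 'Stove smoke → kitchen fire hazard', status: 'pass', timeMs: 4200 });
logHuman('✔ Stove smoke → kitchen fire hazard (4.2s)');
```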

## Citation

This benchmark is inspired by:

> **HomeSafeBench: Towards Measuring the Proficiency of Home Safety for Embodied AI Agents**
> arXiv:2509.23690
>
> Unlike the academic benchmark (embodied agent + navigation in simulated 3D environments), our version uses **static indoor camera frames** — matching real-world indoor security camera deployment (fixed wall/ceiling mount). All fixture images are **AI-generated** consistent with DeepCamera's privacy-first approach.

## Requirements

- Node.js ≥ 18
- `npm install` (for `openai` SDK dependency)
- Running VLM server (llama-server with vision model, or OpenAI-compatible VLM endpoint)
13 changes: 13 additions & 0 deletions skills/analysis/homesafe-bench/config.yaml
@@ -0,0 +1,13 @@
params:
- key: mode
label: Test Mode
type: select
options: [full, quick]
default: full
description: "Which test mode: full (40 tests) or quick (10 tests — 2 per category)"

- key: noOpen
label: Don't auto-open report
type: boolean
default: false
description: Skip opening the HTML report in browser after completion
8 changes: 8 additions & 0 deletions skills/analysis/homesafe-bench/deploy.sh
@@ -0,0 +1,8 @@
#!/bin/bash
# HomeSafe-Bench deployment script
# Runs npm install to fetch openai SDK dependency

set -e
cd "$(dirname "$0")"
npm install
echo "✅ HomeSafe-Bench dependencies installed"