Feature: Index custom YAML arrays per-element for metadata filtering #701
Description
Feature Request: Index custom YAML arrays per-element for metadata filtering
Summary
Built-in arrays (tags, type) are indexed per-element and filterable via --tag and --type. Custom YAML arrays in frontmatter store correctly but aren't queryable via --meta or metadata_filters.
Extend the per-element indexing that tags already has to all array fields in frontmatter.
Use Case
User has 1000+ notes with Picoschema-defined structures. Many natural data models use arrays (participants, assignees, dependencies, status history, etc.). Currently these require workarounds (numbered fields, flattening) that lose semantic clarity.
Current Behavior
Example: Simple string array
---
title: Meeting Notes
participants:
- Alice
- Bob
- Charlie
---
Expected: metadata_filters={"participants": "Bob"} returns this note.
Actual: Returns no results. The array is stored but not indexed per-element.
Example: Array of objects
---
title: Project Tracker
tasks:
- name: Design review
  status: done
  priority: high
- name: Write tests
  status: active
  priority: medium
---
Ideal (future): metadata_filters={"tasks.status": "active"} or element-level querying.
Understood: This is significantly more complex than simple contains.
Current Workaround
Numbered fields (task1_name: Design review, task1_status: done) and dot notation on nested objects. Works but loses natural array semantics and makes schemas verbose.
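For illustration only, the numbered-field workaround amounts to something like the following Python sketch. The helper name `to_numbered_fields` is hypothetical and not part of the project; it only shows how an array of objects ends up flattened into `task1_name`, `task1_status`, and so on.

```python
def to_numbered_fields(name: str, items: list[dict[str, str]]) -> dict[str, str]:
    """Flatten a list of objects into numbered scalar fields (hypothetical helper)."""
    flat: dict[str, str] = {}
    for position, item in enumerate(items, start=1):
        for key, value in item.items():
            # e.g. name="task", position=1, key="status" -> "task1_status"
            flat[f"{name}{position}_{key}"] = value
    return flat


tasks = [
    {"name": "Design review", "status": "done"},
    {"name": "Write tests", "status": "active"},
]
flat = to_numbered_fields("task", tasks)
print(flat["task1_name"])    # Design review
print(flat["task2_status"])  # active
```

With this layout, asking "is any task active?" means checking task1_status, task2_status, and every further numbered key separately, which is the loss of array semantics described above.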
Proposed Solution
Phase 1: Simple arrays (parity with tags)
- Index string arrays per-element
- metadata_filters={"participants": "Bob"} matches if "Bob" is in the participants array
- Same behavior as --tag already provides
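A minimal Python sketch of the Phase 1 semantics, assuming frontmatter is already parsed into a dict. The function names `index_entries` and `matches` are illustrative assumptions, not the project's actual indexing code; the real index may live in a database rather than an in-memory set.

```python
from typing import Any

SCALARS = (str, int, float, bool)


def index_entries(frontmatter: dict[str, Any]) -> set[tuple[str, str]]:
    """Return (field, value) pairs: one per scalar field, one per array element."""
    entries: set[tuple[str, str]] = set()
    for key, value in frontmatter.items():
        if isinstance(value, list):
            # One entry per element, so each element is matchable on its own.
            for item in value:
                if isinstance(item, SCALARS):
                    entries.add((key, str(item)))
        elif isinstance(value, SCALARS):
            entries.add((key, str(value)))
    return entries


def matches(frontmatter: dict[str, Any], filters: dict[str, str]) -> bool:
    """True if every filter's (field, value) pair is present in the index."""
    entries = index_entries(frontmatter)
    return all((field, value) in entries for field, value in filters.items())


note = {"title": "Meeting Notes", "participants": ["Alice", "Bob", "Charlie"]}
print(matches(note, {"participants": "Bob"}))   # True
print(matches(note, {"participants": "Dave"}))  # False
```

Scalar fields keep their existing equality behavior here, and arrays gain "contains" behavior, which mirrors what the issue describes for tags.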
Phase 2 (optional): Array-of-objects
- Support dot-notation queries: {"tasks.status": "active"}
- More complex but enables rich structured data in frontmatter
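One possible reading of the Phase 2 dot-notation query, sketched in Python under the assumption that arrays contribute no path segment of their own (so every element of tasks is reachable as tasks.status). The names `iter_paths` and `matches_path` are hypothetical.

```python
from typing import Any, Iterator


def iter_paths(value: Any, prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (dotted_path, scalar) pairs; list elements share their parent's path."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from iter_paths(child, path)
    elif isinstance(value, list):
        for item in value:
            yield from iter_paths(item, prefix)
    else:
        yield prefix, value


def matches_path(frontmatter: dict[str, Any], path: str, expected: Any) -> bool:
    """True if any scalar reachable at the dotted path equals the expected value."""
    return any(p == path and v == expected for p, v in iter_paths(frontmatter))


tracker = {
    "title": "Project Tracker",
    "tasks": [
        {"name": "Design review", "status": "done", "priority": "high"},
        {"name": "Write tests", "status": "active", "priority": "medium"},
    ],
}
print(matches_path(tracker, "tasks.status", "active"))   # True
print(matches_path(tracker, "tasks.status", "blocked"))  # False
```

Under this reading each path is matched independently, so combining tasks.status = active with tasks.priority = high would match the example note even though no single task has both values. Tying conditions to the same element would need the element-level querying mentioned earlier.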
Why This Matters
- Schemas (Picoschema) are making structured frontmatter more common
- Natural data models use arrays (meeting participants, task assignees, dependencies, contributors, etc.)
- Current workaround (numbered fields) is verbose and breaks schema expressiveness
- tags proves the indexing pattern already works; it just needs extension to custom fields
Related
- Built-in array filtering already works: --tag <value>, --type <value> (repeatable)
- Custom arrays are stored correctly, just not indexed for filtering
- User is running production workload (1000+ notes) hitting this limitation
Priority
Medium-High: affects users building structured knowledge bases with schemas, and blocks more expressive data models in frontmatter.