Feature: Index custom YAML arrays per-element for metadata filtering #701

@bm-clawd

Feature Request: Index custom YAML arrays per-element for metadata filtering

Summary

Built-in arrays (tags, type) are indexed per-element and filterable via --tag and --type. Custom YAML arrays in frontmatter are stored correctly but cannot be queried via --meta or metadata_filters.

Extend the per-element indexing that tags already has to all array fields in frontmatter.

Use Case

The reporting user has 1000+ notes with Picoschema-defined structures. Many natural data models use arrays (participants, assignees, dependencies, status history, etc.). Currently these require workarounds (numbered fields, flattening) that lose semantic clarity.

Current Behavior

Example: Simple string array

---
title: Meeting Notes
participants:
  - Alice
  - Bob
  - Charlie
---

Expected: metadata_filters={"participants": "Bob"} returns this note.
Actual: Returns no results. The array is stored but not indexed per-element.

Example: Array of objects

---
title: Project Tracker
tasks:
  - name: Design review
    status: done
    priority: high
  - name: Write tests
    status: active
    priority: medium
---

Ideal (future): metadata_filters={"tasks.status": "active"}, or some other form of element-level querying.
Understood: this is significantly more complex than a simple per-element contains check.

Current Workaround

Numbered fields (task1_name: Design review, task1_status: done) and dot notation on nested objects. This works, but it loses natural array semantics and makes schemas verbose.
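
To make the cost of the workaround concrete, here is a minimal Python sketch of the flattening it implies. The helper name flatten_tasks and the prefix parameter are hypothetical, chosen only for illustration; the key pattern follows the task1_name / task1_status example above.

```python
def flatten_tasks(tasks, prefix="task"):
    """Flatten a list of task objects into numbered scalar fields.

    Hypothetical helper: it mirrors the manual workaround, not any
    existing function in the project.
    """
    flat = {}
    for i, task in enumerate(tasks, start=1):
        for key, value in task.items():
            # e.g. task1_name, task1_status, task2_name, ...
            flat[f"{prefix}{i}_{key}"] = value
    return flat


tasks = [
    {"name": "Design review", "status": "done", "priority": "high"},
    {"name": "Write tests", "status": "active", "priority": "medium"},
]
flat = flatten_tasks(tasks)
```

After flattening, a question like "is any task active?" has to be asked once per numbered field (task1_status, task2_status, and so on), and the schema has to declare every numbered field up front.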

Proposed Solution

Phase 1: Simple arrays (parity with tags)

  • Index string arrays per-element
  • metadata_filters={"participants": "Bob"} matches if "Bob" is in the participants array
  • Same behavior as --tag already provides
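
The Phase 1 behavior could be sketched roughly as follows. This is an in-memory illustration in Python, not the project's actual indexing code: build_array_index and filter_notes are hypothetical names, the second note is made-up sample data, and combining multiple filters with AND is an assumption.

```python
from collections import defaultdict


def build_array_index(notes):
    """Map (field, element) -> set of note ids for list-valued frontmatter fields."""
    index = defaultdict(set)
    for note_id, frontmatter in notes.items():
        for field, value in frontmatter.items():
            if not isinstance(value, list):
                continue  # scalar fields are out of scope for this sketch
            for element in value:
                if isinstance(element, str):  # Phase 1: string elements only
                    index[(field, element)].add(note_id)
    return index


def filter_notes(index, metadata_filters):
    """Return ids of notes matching every filter (AND semantics, assumed)."""
    result = None
    for field, wanted in metadata_filters.items():
        ids = index.get((field, wanted), set())
        result = set(ids) if result is None else result & ids
    return result or set()


notes = {
    "meeting-notes": {
        "title": "Meeting Notes",
        "participants": ["Alice", "Bob", "Charlie"],
    },
    # Made-up second note, only to show a non-match.
    "other-note": {"title": "Other", "participants": ["Dana"]},
}
index = build_array_index(notes)
matches = filter_notes(index, {"participants": "Bob"})
```

Here matches contains only "meeting-notes", which is the result the simple string array example expects.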

Phase 2 (optional): Array-of-objects

  • Support dot-notation queries: {"tasks.status": "active"}
  • More complex but enables rich structured data in frontmatter
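
One possible reading of the Phase 2 dot-notation semantics, sketched in Python. The functions matches_path and matches_filters are hypothetical, and the rule that a path "fans out" over every element of a list is an assumption about how such queries might behave, not a description of existing behavior.

```python
def matches_path(value, path, wanted):
    """True if `wanted` is found at the dotted `path`, fanning out over lists."""
    if isinstance(value, list):
        # A list matches if any of its elements matches the remaining path.
        return any(matches_path(item, path, wanted) for item in value)
    if not path:
        return value == wanted
    if isinstance(value, dict) and path[0] in value:
        return matches_path(value[path[0]], path[1:], wanted)
    return False


def matches_filters(frontmatter, metadata_filters):
    """Each filter is checked independently; all must hold (assumed AND)."""
    return all(
        matches_path(frontmatter, key.split("."), wanted)
        for key, wanted in metadata_filters.items()
    )


tracker = {
    "title": "Project Tracker",
    "tasks": [
        {"name": "Design review", "status": "done", "priority": "high"},
        {"name": "Write tests", "status": "active", "priority": "medium"},
    ],
}
is_active = matches_filters(tracker, {"tasks.status": "active"})
```

One design question this surfaces: because each filter is evaluated on its own, {"tasks.status": "active", "tasks.priority": "high"} matches this note even though no single task is both active and high priority. Requiring all conditions to hold on the same element would need a separate element-level query form.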

Why This Matters

  • Schemas (Picoschema) are making structured frontmatter more common
  • Natural data models use arrays (meeting participants, task assignees, dependencies, contributors, etc.)
  • Current workaround (numbered fields) is verbose and breaks schema expressiveness
  • tags proves the indexing pattern already works — just needs extension to custom fields

Related

  • Built-in array filtering already works: --tag <value>, --type <value> (repeatable)
  • Custom arrays are stored correctly, just not indexed for filtering
  • User is running production workload (1000+ notes) hitting this limitation

Priority

Medium-High — affects users building structured knowledge bases with schemas. Blocking more expressive data models in frontmatter.

Labels: enhancement (New feature or request)
