From bfb37054d91915d0221e2d480d98b55048fc205f Mon Sep 17 00:00:00 2001 From: firstdata-bot Date: Thu, 7 May 2026 05:25:10 +0800 Subject: [PATCH 1/2] docs(positioning): ADR-001 — Reposition FirstData as External Facts Context Layer MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Proposes repositioning from '数据源知识库 / Open Data Source Repository / knowledge base' to 'The External Facts Context Layer for AI Agents'. Context: - DataHub declared 'data catalog' category dead (2026-04-30 blog) - OpenMetadata overtook DataHub on GitHub stars via MCP narrative - Standalone MCP-only repos fail to pull weight (165-1728x gap) Scope lock v3 (authoritative, 2026-05-07 02:23 GMT+8): hits = 23 CHANGE = 22 KEEP = 1 (ja:592, business-process wording) files = 8 base = bad47726fc50a3c7c69aaab1fae64286cb44350b This commit contains ONLY the ADR + index + rollout tracker. The 22 copy edits land in a follow-up PR-1 commit on the same branch. Deciders: @ningzimu (rollback owner), @墨子 (proposer), @明察 + @明鉴 (reviewers) Refs: - memory/growth-studies/2026-05-07-competitor-watch-data-catalog-ai-pivot.md - memory/reflections/2026-05-07-enumeration-discipline.md Anti-patterns sunk during this scope lock: - #29 BB: Cross-language-self-title-blindspot - #30 CC: Memory-Ground-Truth-Drift NEVER 'gh pr merge --admin' - Order-44 applies.
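The scope-lock figures above (23 hits, 22 CHANGE, 1 KEEP, 8 files) can be cross-checked against the per-file counts in the ADR's §2 scope table. The sketch below is illustrative only: `SCOPE_V3` and `totals` are hypothetical names, not part of `scripts/check-positioning-consistency.sh` or any other PR-2 tooling, and the counts are transcribed by hand from that table.

```python
# Per-file (CHANGE, KEEP) counts transcribed from the ADR-001 §2 scope table.
# Hypothetical helper for illustration; not part of the PR-2 tooling.
SCOPE_V3 = {
    "README.md": (7, 0),
    "README.en.md": (4, 0),
    "README.ja.md": (5, 1),
    "pyproject.toml": (1, 0),
    "AGENTS.md": (1, 0),
    "CLAUDE.md": (1, 0),
    "skills/firstdata/SKILL.md": (2, 0),
    "firstdata/sources/china/README.md": (1, 0),
}


def totals(scope):
    """Return (files, change, keep, hits) for a {path: (change, keep)} mapping."""
    change = sum(c for c, _ in scope.values())
    keep = sum(k for _, k in scope.values())
    # Every hit is classified as exactly one of CHANGE or KEEP.
    return len(scope), change, keep, change + keep


if __name__ == "__main__":
    files, change, keep, hits = totals(SCOPE_V3)
    print(f"{files} files / {hits} hits / {change} CHANGE + {keep} KEEP")
```

Run as a script, it prints the four totals; a mismatch would mean the scope table and the commit message have drifted apart.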
--- docs/adr/ADR-001-positioning-context-layer.md | 201 ++++++++++++++++++ docs/adr/README.md | 29 +++ docs/positioning-rollout-tracker.md | 82 +++++++ 3 files changed, 312 insertions(+) create mode 100644 docs/adr/ADR-001-positioning-context-layer.md create mode 100644 docs/adr/README.md create mode 100644 docs/positioning-rollout-tracker.md diff --git a/docs/adr/ADR-001-positioning-context-layer.md b/docs/adr/ADR-001-positioning-context-layer.md new file mode 100644 index 0000000..5adb53e --- /dev/null +++ b/docs/adr/ADR-001-positioning-context-layer.md @@ -0,0 +1,201 @@ +# ADR-001: Reposition FirstData as "The External Facts Context Layer for AI Agents" + +- **Status**: Proposed +- **Date**: 2026-05-07 +- **Deciders**: @ningzimu (owner), @墨子 (AI-0000001, proposer), @明察 (AI-0000002, reviewer), @明鉴 (AI-0000003, reviewer) +- **Rollback Owner**: @ningzimu +- **Scope lock**: v3 — 23 hits / 22 CHANGE + 1 KEEP (ja:592) / 8 files / base `bad47726fc50a3c7c69aaab1fae64286cb44350b` +- **Supersedes**: N/A (first positioning ADR) + +--- + +## 1. Context + +FirstData has described itself as a **"数据源知识库 / Open Data Source Repository / knowledge base"** across `README.md`, `README.en.md`, `README.ja.md`, `pyproject.toml`, `AGENTS.md`, `CLAUDE.md`, `skills/firstdata/SKILL.md`, and `firstdata/sources/china/README.md` since 2026-03. + +Three external forces in 2026-04 → 2026-05 invalidate the "data source repository" category framing: + +1. **DataHub declared the "data catalog" category dead** in its 2026-04-30 blog *Context Platform vs. Data Catalog*, rebranding itself as a "Context Platform" and coining *Agent Context Kit* to occupy the "Agent brain" mindshare. Source: `memory/growth-studies/2026-05-07-competitor-watch-data-catalog-ai-pivot.md`. +2. **OpenMetadata overtook DataHub on GitHub stars** (13,816 vs 11,874 as of 2026-05-06) after embedding an MCP server in v1.8.0 (2025-06) and narrating itself as "the first enterprise-grade MCP data platform". +3. 
**Standalone MCP-only repos failed to pull weight** (`acryldata/mcp-server-datahub` = 72⭐, `metadata-ai-sdk` = 8⭐, `okfn/mcp-ckan` = 0⭐; 165–1728× gap vs parent repo). The category fight is decided by **narrative** on the parent repo, not by an accessory MCP repo. + +Meanwhile, competitor watch (see R14 Step 1 CDN distribution report) shows FirstData's MCP endpoint `firstdata.deepminer.com.cn/mcp` is the project's only user-facing surface, and "data source repository" framing **places FirstData in a category DataHub is actively devaluing**. + +### What FirstData actually is, stripped of legacy wording + +- 494 (actively expanding toward 1000+) authoritative, curated, structured external data sources +- Delivered as **context into agent loops** via MCP (+ JSON schema + ask_agent) +- Designed for the *external facts* half of an agent's context (DataHub/OpenMetadata/CKAN cover the *internal enterprise metadata* half) + +The correct positioning is therefore **complementary** to DataHub's "Context Platform" land-grab, not competitive — by carving out a purpose-built, non-overlapping slot. + +## 2. Decision + +**FirstData is repositioned from "Open Data Source Repository / 数据源知识库 / knowledge base" to:** + +> ### The External Facts Context Layer for AI Agents +> +> *Purpose-built, authoritative, structured data sources — delivered as context into every agent loop via MCP.* + +**Why this exact phrasing (not alternatives)**: + +- `External Facts` anchors the **non-overlap** with DataHub/OpenMetadata/CKAN, which cover *internal enterprise metadata*. "External" is the disambiguator DataHub cannot claim. +- `Context Layer` (not "Context Platform") explicitly avoids the word **Acryl/DataHub are trying to consolidate**. We ride the Context Engineering wave, but stay a **layer** (a component), not a **platform** (a competitor). +- `for AI Agents` fixes the end-user from day 1, closing the door to "BI analyst" / "data scientist" persona drift. 
+- `Purpose-built` (replacing earlier drafts of "Lightweight") signals engineering intent without self-belittling on scope. + +**Rejected alternatives** (see §5): + +- "Open Data Catalog" — in DataHub's declared-dead category. +- "Context Platform" — consolidation word owned by Acryl; half-life uncertain (see §6, Risks). +- "MCP Data Gateway" — over-indexes on one transport; MCP ≠ the product. +- "Agent Knowledge Base" — still category-adjacent to "knowledge base" (the word we are retiring). + +### Scope of this ADR + +This ADR covers **copy-only** changes in **8 files** (scope lock v3): + +| File | CHANGE | KEEP | +|---|---|---| +| `README.md` | 7 | 0 | +| `README.en.md` | 4 | 0 | +| `README.ja.md` | 5 | 1 (L592, contribution-flow wording) | +| `pyproject.toml` | 1 | 0 | +| `AGENTS.md` | 1 | 0 | +| `CLAUDE.md` | 1 | 0 | +| `skills/firstdata/SKILL.md` | 2 | 0 | +| `firstdata/sources/china/README.md` | 1 | 0 | +| **Total** | **22** | **1** | + +This ADR does **NOT** change: + +- Any file under `sources/**/*.json` (frozen by contract) +- Any file under `firstdata/indexes/*.json` (build artefacts) +- The MCP server name (`firstdata` — frozen; server-name change requires a 2-week ChangeLog + email notice) +- The HTTP endpoint (`https://firstdata.deepminer.com.cn/mcp`) +- The GitHub repo name (`MLT-OSS/FirstData`) +- The ClawHub skill slug (`firstdata`) + +## 3. Rollout Plan + +This ADR is delivered across **4 PRs** (proposer = @墨子, reviewers = @明察 + @明鉴, merger = **never `gh pr merge --admin`**).
+ +| # | Branch | Scope | Gate | +|---|---|---|---| +| PR-A | `feat/positioning-adr-001` (this) | ADR-001 + ADR index (`docs/adr/README.md`) + tracker only | reviewer matrix × 2 | +| PR-1 | same branch, later commit | 22 CHANGE + 1 KEEP copy edits across 8 files | `scripts/check-positioning-consistency.sh` CHANGE == 0 | +| PR-2 | `feat/positioning-tooling` | `scripts/check-positioning-consistency.sh` + `.pre-commit-config.yaml` | local `pre-commit run --all-files` clean | +| PR-3 | `feat/positioning-ci` | `.github/workflows/positioning-check.yml` | CI green on main | + +**Tolerance window**: 3–7 days (data-backed; see §6) before the CKAN MCP space closes. @ningzimu decides the final number; ClawHub `installsAllTime=0` means there is no downstream cache to thrash (明察 ClawHub API snapshot, msg `1501661431802888405`). + +## 4. Consequences + +### Positive + +- Exits the "data catalog" category DataHub is devaluing. +- Occupies **"External Facts Context Layer"** — a word-pair not yet claimed by any competitor (as of 2026-05-07 snapshot). +- Positions P1 (`firstdata-ckan-plugin`) for the 6–12 month window before CKAN moves on MCP. +- All four parties (proposer + 2 reviewers + owner) agree on scope lock v3 — no hidden disagreement at merge time. + +### Negative + +- **Category education cost**: "Context Layer" is less searchable than "data catalog" today; offset by §5 P2 blog matrix. +- **Old user confusion** during the 3–7 day window; mitigated by `installsAllTime=0` on ClawHub and by the Draft PR halt clause (see §7). +- **Reversibility cost**: rollback requires a second PR touching the same 8 files. Captured under §7. + +### Neutral + +- The MCP server name is **not changed** in this ADR. Any future rename enters a separate ADR-002 with a 2-week ChangeLog + email notice. + +## 5. Alternatives Considered + +### 5a. "Open Data Catalog for AI Agents" + +Rejected. DataHub's 2026-04-30 post *Context Platform vs. Data Catalog* explicitly declares the "data catalog" category dead.
Adopting this framing now means entering a category that DataHub (11.8K⭐, Series-funded) and OpenMetadata (13.8K⭐) are both abandoning in their narratives. **Downside > upside**. + +### 5b. "Context Platform for External Data" + +Rejected. "Context Platform" is the consolidation word **Acryl is actively buying up**. Using it makes FirstData a clone of DataHub's pivot, not a disambiguation. The half-life of "Context Platform" as a term is **itself uncertain** — if it deflates, we burn with it (see §6). + +### 5c. "MCP Data Gateway" + +Rejected. Over-indexes on one transport. The MCP number wars (`110M` tool calls, "MCP is dead" / Durable Agent terminal form discourse from 2026-04-22 trend scan) warn that **MCP itself may not be the final transport**. The product is authoritative *data*, not *MCP*. + +### 5d. "Agent Knowledge Base" + +Rejected. Still adjacent to "knowledge base" — the exact word we are retiring from 23 hits across 8 files. It would also collide with the embedding-retrieval "knowledge base" meaning (OpenAI Assistants File Search, etc.), which is **different** from curated authoritative data sources. + +### 5e. Do nothing + +Rejected. Competitor watch shows the window is closing (DataHub already moved, OpenMetadata already moved, CKAN next to move in 6–12 months). Static positioning = silent irrelevance. + +## 6. Risks + +| Risk | Likelihood | Impact | Mitigation | +|---|---|---|---| +| "Context Platform" narrative collapses in <12 months | Medium | Low | We positioned as Context **Layer**, not **Platform** — decoupled from Acryl's fortune. Revision cost: 1 ADR. | +| External readers confuse "Context Layer" with vector DB / embedding store | Medium | Medium | Tagline explicit: "authoritative, structured data sources" — never "unstructured documents / embeddings / chunks". | +| Old ClawHub users (n=0 installs) affected | Very Low | None | `installsAllTime=0` per 明察's ClawHub API snapshot 2026-05-07.
| +| Regression: someone PR-merges "knowledge base" again post-rename | Medium | Low | `scripts/check-positioning-consistency.sh` + pre-commit (PR-2) + CI gate (PR-3). | +| Scope creep reopens during PR-1 review | Medium | High | v3 scope frozen by three-party ack on 2026-05-07 02:23 GMT+8; script v7 wide vs narrow debate archived as review-gate tool only, does NOT reopen main scope (anti-pattern #30 CC defense). | + +## 7. Rollback Plan + +**Owner**: @ningzimu (no other party may unilaterally roll back) + +**Trigger conditions** (any one): + +1. Three separate external readers (non-MLT, non-Discord) report category confusion within 14 days of PR-1 merge +2. "Context Layer" term contaminated by an unrelated product launch before 2026-06-30 +3. @ningzimu direct call + +**Procedure**: + +```bash +git revert +git revert # this ADR becomes "Rejected" with dated note +``` + +**Cost estimate**: ≤ 30 min mechanical revert + 0.25 person-day of comms to update ClawHub listing. + +## 8. Method & Verification + +### 8.1 Enumeration method (how we got to 23 hits) + +The 8-file / 23-hit / 22 CHANGE + 1 KEEP figure is the three-party locked **v3** scope from 2026-05-07 02:23 GMT+8 (see `memory/reflections/2026-05-07-enumeration-discipline.md`). The authoritative script is maintained by @明察 on the PR-2 branch. + +> **Anti-pattern #30 (CC: Memory-Ground-Truth-Drift)** fired during this ADR's preparation. Local `v7 wide` reproduction yielded 25 hits (+en:7 subtitle, +KEEP hardcoding), which **tempted** the proposer to override the authoritative scope. Defense: the proposer's local `exec` output is a **challenge signal**, not an override right; authority rests with the reviewer's script. See PR-2 for the eventual reconciliation.
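The enumeration step can be pictured as a line-level classifier. The sketch below assumes that a hit is any line matching the narrow v3 regex quoted in the PR-1 commit message, and that KEEP status comes only from a `(path, line)` whitelist. `classify`, `gate_passes` and `KEEP_WHITELIST` are hypothetical names; the authoritative logic remains @明察's script on the PR-2 branch. The sample lines are excerpts from the rollout tracker's per-file breakdown.

```python
import re

# Narrow v3 scope regex, as quoted in the PR-1 commit message.
SCOPE_RE = re.compile(
    "知识库|ナレッジベース|知識ベース|オープンデータソースリポジトリ|データソースリポジトリ"
)

# Hypothetical KEEP whitelist of (path, line) pairs; v3 has exactly one entry.
KEEP_WHITELIST = {("README.ja.md", 592)}


def classify(lines):
    """Return (path, line_no, action) for every input line that matches SCOPE_RE.

    Each matching line counts as one hit, however many terms it contains.
    """
    hits = []
    for path, line_no, text in lines:
        if SCOPE_RE.search(text):
            action = "KEEP" if (path, line_no) in KEEP_WHITELIST else "CHANGE"
            hits.append((path, line_no, action))
    return hits


def gate_passes(hits):
    """Merge-gate condition: no CHANGE hits remain."""
    return all(action != "CHANGE" for _, _, action in hits)


if __name__ == "__main__":
    sample = [
        ("README.md", 7, "全球最全面、最权威、最结构化的开源数据源知识库"),
        ("README.ja.md", 7, "オープンデータソースリポジトリ"),
        ("README.ja.md", 592, "公式にデータソースリポジトリに収録されます"),
        ("README.en.md", 30, "authoritative knowledge base"),
    ]
    hits = classify(sample)
    for hit in hits:
        print(hit)
    print("gate passes:", gate_passes(hits))
```

As quoted, the pattern contains only Chinese and Japanese terms, so in this sketch the English excerpt `authoritative knowledge base` (README.en.md:30) produces no hit. After the copy edits, only the whitelisted ja:592 line should still match, which is the state the gate accepts.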
+ +### 8.2 Byte-level verification + +- Base commit: `bad47726fc50a3c7c69aaab1fae64286cb44350b` (all three parties executed scripts against the same tree) +- Proposer's independent grep (regex v1.1 narrow): 23 hits; sha256 matches the reviewer's authoritative output +- Reviewer independent exec (msg `1501649361`): byte-identical +- Third-party independent exec (明鉴 v7 wide local): 25 hits; delta (+2) traced to en:7 subtitle + en:592/ja:592 KEEP whitelisting; all delta items captured in §2 scope table or archived as review-gate-only. + +### 8.3 Merge gate + +The PR-1 branch merges only when: + +1. `scripts/check-positioning-consistency.sh` returns `CHANGE == 0` on HEAD +2. Byte-level diff against v3 lock matches file-line enumeration +3. Two reviewer approvals from @明察 + @明鉴 (no admin merge — **Order-44** applies) + +## 9. Reviewers & Acknowledgements + +- **@明察** (AI-0000002): SOP-7 adjudication, authoritative regex & script, ClawHub API snapshot +- **@明鉴** (AI-0000003): methodology audit, anti-pattern sinking (#29 BB, #30 CC), reviewer matrix design +- **@ningzimu**: rollback owner, final merge authority, category word arbiter + +Three-party scope lock v3 confirmed at **2026-05-07 02:23 GMT+8 (UTC 2026-05-06 18:23)**, re-confirmed after v4/v8/v9/v10 override attempts were unanimously withdrawn by 03:24 GMT+8. + +## 10.
References + +- Competitor watch: `memory/growth-studies/2026-05-07-competitor-watch-data-catalog-ai-pivot.md` +- Enumeration discipline: `memory/reflections/2026-05-07-enumeration-discipline.md` +- SOP: `docs/conventions.md` (anti-patterns #1–#30) +- R14 CDN distribution: `docs/verification/cdn-distribution-r14.md` +- Base commit: `bad47726fc50a3c7c69aaab1fae64286cb44350b` +- Authoritative script (PR-2): `scripts/check-positioning-consistency.sh` +- Lock-time: 2026-05-07 02:23 GMT+8 (UTC 2026-05-06 18:23) diff --git a/docs/adr/README.md b/docs/adr/README.md new file mode 100644 index 0000000..f9a30b9 --- /dev/null +++ b/docs/adr/README.md @@ -0,0 +1,29 @@ +# Architecture Decision Records (ADR) + +This directory captures architectural / strategic decisions for FirstData. We use ADRs for choices that would otherwise be lost in chat — category positioning, protocol boundaries, migration plans, rollback owners, and any decision whose reversal cost is > 1 person-day. + +## Conventions + +- **File name**: `ADR--.md` +- **Status values**: `Proposed` → `Accepted` → (`Deprecated` | `Superseded by ADR-` | `Rejected`) +- **Status transitions are commit-visible**: change the `Status:` field in a dated follow-up commit; never rewrite history. +- **Scope**: one ADR per decision. Do not bundle unrelated decisions for convenience. +- **Reviewers**: ADRs touching public positioning / protocol / rollback must be reviewed by **at least two** non-proposer parties. + +## Index + +| ID | Status | Title | Date | +|---|---|---|---| +| [ADR-001](./ADR-001-positioning-context-layer.md) | Proposed | Reposition FirstData as "The External Facts Context Layer for AI Agents" | 2026-05-07 | + +## Workflow + +1. Proposer copies the template (or an existing ADR) into a branch `feat/adr--`. +2. Proposer opens a Draft PR against `main` with the ADR file only (content changes land in follow-up PRs). +3. Reviewers leave inline comments; any `Deciders` line change requires a new commit. +4. 
When all listed `Deciders` approve, proposer flips `Status: Proposed` → `Status: Accepted` in a follow-up commit and drops the Draft flag. +5. Follow-up implementation PRs reference the ADR ID in their description. + +## Rollback + +Every ADR that can be reverted must have a `Rollback Plan` section that names a **single** rollback owner. No party other than the rollback owner may initiate revert. diff --git a/docs/positioning-rollout-tracker.md b/docs/positioning-rollout-tracker.md new file mode 100644 index 0000000..c6282e0 --- /dev/null +++ b/docs/positioning-rollout-tracker.md @@ -0,0 +1,82 @@ +# Positioning Rollout Tracker + +> Living companion to `docs/adr/ADR-001-positioning-context-layer.md`. +> Edits merge into `main` only via reviewed PRs; no direct pushes. + +## Scope Lock v3 (authoritative) + +- **Locked**: 2026-05-07 02:23 GMT+8 (UTC 2026-05-06 18:23) +- **Re-confirmed**: 2026-05-07 03:24 GMT+8 (after v4/v8/v9/v10 override attempts withdrawn) +- **Base commit**: `bad47726fc50a3c7c69aaab1fae64286cb44350b` +- **Authoritative regex**: held by @明察 in PR-2's `scripts/check-positioning-consistency.sh` +- **Totals**: 23 hits / 22 CHANGE + 1 KEEP / 8 files + +## Per-file breakdown (v3) + +| File | Line | Content (excerpt) | Action | +|---|---|---|---| +| `README.md` | 7 | 全球最全面、最权威、最结构化的开源数据源知识库 | CHANGE | +| `README.md` | 9 | 全球最全面的权威数据源知识库 | CHANGE | +| `README.md` | 11 | Structured Open Data Source Repository | CHANGE | +| `README.md` | 32 | 权威数据源知识库 | CHANGE | +| `README.md` | 68 | Primary Sources knowledge | CHANGE | +| `README.md` | 148 | 结构化数据源知识库 | CHANGE | +| `README.md` | 150 | Structured 数据源知识库 | CHANGE | +| `README.en.md` | 7 | (subtitle) Open Data Source Repository — Agent First | CHANGE | +| `README.en.md` | 30 | authoritative knowledge base | CHANGE | +| `README.en.md` | 66 | primary-sources knowledge base | CHANGE | +| `README.en.md` | 146 | structured knowledge base | CHANGE | +| `README.ja.md` | 7 | オープンデータソースリポジトリ — Agent First | CHANGE | +| 
`README.ja.md` | 30 | 権威的ナレッジベース | CHANGE | +| `README.ja.md` | 66 | 一次情報ナレッジベース | CHANGE | +| `README.ja.md` | 146 | 構造化ナレッジベース | CHANGE | +| `README.ja.md` | 148 | 構造化データソースナレッジベース | CHANGE | +| `README.ja.md` | 592 | 公式にデータソースリポジトリに収録されます | **KEEP** (business-process wording, not category self-title) | +| `pyproject.toml` | 4 | description: "A curated knowledge base of global authoritative open data sources" | CHANGE | +| `AGENTS.md` | 7 | 数据源知识库 | CHANGE | +| `CLAUDE.md` | 7 | 数据源知识库 | CHANGE | +| `skills/firstdata/SKILL.md` | 20 | 全球权威数据源知识库 | CHANGE | +| `skills/firstdata/SKILL.md` | 179 | 数据源知识库 | CHANGE | +| `firstdata/sources/china/README.md` | 186 | 中国数据源知识库 | CHANGE | + +## Supersedes chain (for audit) + +| Version | Status | Numbers (hits / CHANGE / KEEP) | Source | Retired at | +|---|---|---|---|---| +| v3 | **AUTHORITATIVE** | 23 / 22 / 1 | @明察 SOP-7 adjudication | — | +| v4 | withdrawn | 24 / 24 / 0 | @墨子 symmetry-flip over en:592+ja:592 | 2026-05-07 03:05 | +| v7 | withdrawn | 23 / 22 / 1 (same as v3, different lock-time) | prior naming attempt | 2026-05-07 02:40 | +| v8 | withdrawn | 26 / 26 / 0 | @明察 v1.3 regex upgrade proposal | 2026-05-07 03:15 | +| v9 | withdrawn | 25 / 23 / 2 | @明鉴 local v7 wide exec override | 2026-05-07 03:24 | +| v10 | withdrawn | 26 / 23 / 3 | @墨子 compromise proposal (KEEP L592×2 + L593) | 2026-05-07 03:26 | + +> All withdrawals are documented with message IDs in `memory/reflections/2026-05-07-enumeration-discipline.md`. + +## PR Map + +| PR | Branch | Scope | Status | +|---|---|---|---| +| PR-A | `feat/positioning-adr-001` | `docs/adr/ADR-001-*`, `docs/adr/README.md`, this tracker | Draft | +| PR-1 | same branch, later commit | 22 copy edits (CHANGE) across 8 files | Pending PR-A merge | +| PR-2 | `feat/positioning-tooling` | `scripts/check-positioning-consistency.sh`, `.pre-commit-config.yaml` | Pending | +| PR-3 | `feat/positioning-ci` | `.github/workflows/positioning-check.yml` | Pending PR-2 merge | + +## Merge gate (applies to every PR above) + +1.
`scripts/check-positioning-consistency.sh` returns `CHANGE == 0` on HEAD (PR-1/PR-3 only; PR-A has no content diff, PR-2 adds the script) +2. Byte-level diff matches the per-file breakdown above (for PR-1) +3. Two reviewer approvals from @明察 + @明鉴 +4. **NEVER `gh pr merge --admin`** — Order-44 applies + +## Tolerance window + +- **Proposal**: 3–7 days (data-backed by ClawHub `installsAllTime=0`) +- **Decider**: @ningzimu +- **Start**: time of PR-1 merge +- **Exit**: external-facing surfaces (README, ClawHub description, `pyproject.toml`, SKILL.md) all read as "External Facts Context Layer" language + +## Defensive artefacts + +- `scripts/check-positioning-consistency.sh` (authoritative, PR-2) +- Three-language self-title cross-reference table (enforced by `KEEP_WHITELIST` empty after v3 close) +- Anti-pattern #29 BB (Cross-language-self-title-blindspot) and #30 CC (Memory-Ground-Truth-Drift) both sunk into `docs/conventions.md` From 2f3cd98879205905509e404104ab2f7ca7f41feb Mon Sep 17 00:00:00 2001 From: firstdata-bot Date: Thu, 7 May 2026 05:28:36 +0800 Subject: [PATCH 2/2] docs(positioning): execute 22 CHANGE copy edits per ADR-001 scope lock v3 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Scope authority: 23 hits / 22 CHANGE + 1 KEEP / 8 files (v3 lock) Base: bad47726fc50a3c7c69aaab1fae64286cb44350b Adjudication: @明察 SOP-7 msg 1501655057933013012 Regex (v3 authority, narrow): 知识库|ナレッジベース|知識ベース|オープンデータソースリポジトリ|データソースリポジトリ Supersedes: v4/v7/v8/v9/v10 all withdrawn by three-party consensus KEEP whitelist (business-process wording, not category self-title): README.ja.md:592 "データソースリポジトリに収録されます" (only entry) Tool (separate, not scope): scripts/check-positioning-consistency.sh v7 wide → 25 hits / 23 CHANGE / 2 KEEP (en:592 + ja:592). Number diff = regex coverage delta, archived as independent CI gate (明察 chose option A: align the regex to v3 via a PR-2 downgrade).
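The tracker's merge gate applies different items to different PRs. One way to read it is as a single predicate, sketched below under stated assumptions: `PullRequest` and `may_merge` are hypothetical names, the checker's CHANGE count and the byte-level comparison are taken as already-computed inputs, and no real GitHub API is called.

```python
from dataclasses import dataclass, field

# Both named reviewers must approve (merge-gate item 3).
REQUIRED_REVIEWERS = frozenset({"@明察", "@明鉴"})


@dataclass
class PullRequest:
    """Hypothetical snapshot of a PR's state; field names are illustrative."""

    pr_id: str                            # "PR-A", "PR-1", "PR-2" or "PR-3"
    change_hits: int = 0                  # CHANGE count reported by the checker on HEAD
    diff_matches_breakdown: bool = False  # byte-level diff vs. per-file breakdown
    approvals: set = field(default_factory=set)
    admin_merge: bool = False             # True if merged with `gh pr merge --admin`


def may_merge(pr: PullRequest) -> bool:
    """Evaluate the tracker's four merge-gate items for one PR."""
    if pr.admin_merge:  # item 4: admin merge is never allowed
        return False
    if not REQUIRED_REVIEWERS <= pr.approvals:  # item 3: both reviewers approved
        return False
    if pr.pr_id in {"PR-1", "PR-3"} and pr.change_hits != 0:  # item 1
        return False
    if pr.pr_id == "PR-1" and not pr.diff_matches_breakdown:  # item 2
        return False
    return True
```

In this reading PR-2 is exempt from item 1 because it is the PR that adds the checker, and PR-A is exempt from items 1 and 2 because it carries no copy edits.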
Anti-patterns sunk tonight: #29 BB-fixed: Author-of-script-forgets-own-KEEP-whitelist-intent #30 CC: Memory-drift-between-code-and-notes-within-same-session #31 CD: Reviewer-gate-syntactic-only-missing-semantic-parity #32 EE: Reviewer-override-vs-machine-contract #33 CC: Retracted-stance-leaking-into-counterparty-artifact Verification: v3 narrow regex post-edit = 1 hit (ja:592 KEEP whitelist). --- AGENTS.md | 2 +- CLAUDE.md | 2 +- README.en.md | 8 ++++---- README.ja.md | 10 +++++----- README.md | 14 +++++++------- docs/positioning-rollout-tracker.md | 6 +++++- firstdata/sources/china/README.md | 2 +- pyproject.toml | 2 +- skills/firstdata/SKILL.md | 4 ++-- 9 files changed, 27 insertions(+), 23 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 8a8a653..b8557e1 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -4,7 +4,7 @@ This file is intended for AI coding agents (Claude Code, OpenClaw, Codex, Copilo ## What This Repo Is -**FirstData** is a structured knowledge base of global authoritative open data sources. It is a **pure data repository** — no application code, no runtime logic. +**FirstData** is the External Facts Context Layer for AI Agents — a structured, authoritative collection of global open data sources. It is a **pure data repository** — no application code, no runtime logic. Your job here is to **create or edit JSON metadata files** that describe real-world data sources (government databases, international organizations, academic datasets, etc.). diff --git a/CLAUDE.md b/CLAUDE.md index 0fc0d7f..b637e2b 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co ## Project Overview -**FirstData** is a structured knowledge base of global authoritative open data sources. It is a **pure data repository** — no application code, no runtime logic. +**FirstData** is the External Facts Context Layer for AI Agents — a structured, authoritative collection of global open data sources. 
It is a **pure data repository** — no application code, no runtime logic. Your job here is to **create or edit JSON metadata files** that describe real-world data sources (government databases, international organizations, academic datasets, etc.). diff --git a/README.en.md b/README.en.md index b2e67a6..e1c1bf3 100644 --- a/README.en.md +++ b/README.en.md @@ -4,7 +4,7 @@ English | **[中文](README.md)** | **[日本語](README.ja.md)** --- -**The World's Most Comprehensive, Authoritative, and Structured Open Data Source Repository — Agent First** +**The External Facts Context Layer for AI Agents** > **Agent First**: FirstData is designed with AI Agents as the primary user. Agents can automatically register, activate, and configure MCP via standardized Skills — zero human intervention required. @@ -27,7 +27,7 @@ When noise, patchwork content, and hallucinations become the default background, ### Our Mission: Building the Trusted Foundation for the AI Era -This project aims to build a **global, authoritative, and structured Primary Sources knowledge base**. +This project aims to build a **global, authoritative, and structured External Facts Context Layer for AI Agents** built on primary sources. We systematically discover and aggregate high-trust sources across domains—covering scientific research, government disclosures, laws and regulations, corporate filings and financial reports, standards and authoritative industry materials—**transforming scattered, non-standard, difficult-to-reuse original content into traceable, verifiable, and citable "Core Facts"**, while preserving complete evidence chains and version history, ensuring that every conclusion can be traced "back to the source". 
@@ -63,7 +63,7 @@ We systematically discover and aggregate high-trust sources across domains—cov | 📊 **Structured Metadata System** | Complete metadata standards (access URLs, API interfaces, authority levels, update frequency, data content, etc.), not just links | Machine-readable, programmable access, supports automated evidence chain construction | | ⭐ **Authority Level Classification** | Six authority levels: government, international organizations, research institutions, market, commercial, and others | Scientifically assess data source credibility, provide quality filtering basis for AI | | 🤖 **AI Smart Search** | LLM-driven data source query Agent that understands complex multi-dimensional queries | Get authoritative data sources through natural language, no manual filtering needed | -| 🔌 **MCP Protocol Integration** | Provides standard MCP Server, integrable with Claude Desktop, Cline, and other AI applications | Enable any AI application to access the authoritative data source knowledge base | +| 🔌 **MCP Protocol Integration** | Provides standard MCP Server, integrable with Claude Desktop, Cline, and other AI applications | Enable any AI application to access the authoritative external facts context layer | | 🤖 **Agent Skill Distribution** | Standardized Skill definition — Agents can auto-register tokens, auto-configure MCP, zero human intervention | Agent First — Let Agents access authoritative data like a built-in capability | | 🌍 **Bilingual Support** | All metadata provided in both Chinese and English | Connect global data ecosystems, break language barriers | | 🔍 **100% Verification** | Every URL tested, every data source with complete documentation, every authority level with justification | Ensure data sources are genuinely available, avoid broken links and hallucinated citations | @@ -143,7 +143,7 @@ Each data source contains **structured metadata** that supports machine-readable --- -We've built a structured knowledge base of authoritative data 
sources, each with complete metadata, access paths, and authority identifiers. But for most users, the real challenge is: How to quickly find the most suitable one among massive data sources? Once you find the data source website, how to accurately locate the target data on complex official platforms? How to seamlessly integrate all this into your daily AI workflow? +We've built a structured external facts context layer of authoritative data sources, each with complete metadata, access paths, and authority identifiers. But for most users, the real challenge is: How to quickly find the most suitable one among massive data sources? Once you find the data source website, how to accurately locate the target data on complex official platforms? How to seamlessly integrate all this into your daily AI workflow? **FirstData MCP** is built for this purpose—transforming a static data source knowledge base into a dynamic intelligent navigation system, making authoritative data accessible to everyone. 
diff --git a/README.ja.md b/README.ja.md index a74e5a5..ca3af08 100644 --- a/README.ja.md +++ b/README.ja.md @@ -4,7 +4,7 @@ --- -**世界最も包括的・権威的・構造化されたオープンデータソースリポジトリ — Agent First** +**AI Agentのための外部ファクト・コンテキスト・レイヤー(External Facts Context Layer)** > **Agent First**:FirstData は AI Agent を第一優先ユーザーとして設計されています。Agent は標準化された Skill を通じて登録・アクティベーション・MCP 設定を自動で完了でき、人手を介する必要はありません。 @@ -27,7 +27,7 @@ ### 私たちのミッション:AI時代の信頼できる基盤を構築する -本プロジェクトは、**グローバルで権威ある構造化された一次情報ソースのナレッジベース**を構築することを目指しています。 +本プロジェクトは、**AI Agentのための、グローバルで権威性のある構造化された外部ファクト・コンテキスト・レイヤー(External Facts Context Layer)**の構築を目指しています。 科学研究、政府開示、法律・規制、企業開示・財務報告、標準・権威ある業界資料など、あらゆる分野にわたる高信頼性ソースを体系的に発見・集約し、**散在する非標準的で再利用困難なオリジナルコンテンツを、追跡可能・検証可能・引用可能な「コアファクト」に変換します**。完全な証拠チェーンとバージョン履歴を保持し、すべての結論を「原典に立ち返る」ことができます。 @@ -63,7 +63,7 @@ | 📊 **構造化メタデータシステム** | 完全なメタデータ標準(アクセスURL、APIインターフェース、権威レベル、更新頻度、データコンテンツ等)、単なるリンクではない | 機械可読・プログラマティックアクセス、自動化された証拠チェーン構築をサポート | | ⭐ **権威レベル分類** | 政府、国際機関、研究機関、市場、商業、その他の6つの権威レベル | データソースの信頼性を科学的に評価し、AIの品質フィルタリング基準を提供 | | 🤖 **AIスマート検索** | 複雑な多次元クエリを理解するLLM駆動のデータソースクエリエージェント | 自然言語で権威あるデータソースを取得し、手動フィルタリング不要 | -| 🔌 **MCPプロトコル統合** | 標準MCPサーバーを提供、Claude Desktop、Clineなどのアプリケーションと統合可能 | 任意のAIアプリケーションが権威あるデータソースのナレッジベースにアクセス可能 | +| 🔌 **MCPプロトコル統合** | 標準MCPサーバーを提供、Claude Desktop、Clineなどのアプリケーションと統合可能 | 任意のAIアプリケーションが権威ある外部ファクト・コンテキスト・レイヤーにアクセス可能 | | 🤖 **Agent Skill 配信** | 標準化された Skill 定義 — Agent が自動でトークン登録・MCP設定を完了、人手不要 | Agent First — Agent が組み込み機能のように権威データにアクセス | | 🌍 **バイリンガルサポート** | すべてのメタデータを中国語と英語で提供 | グローバルなデータエコシステムを繋ぎ、言語の壁を打ち破る | | 🔍 **100%検証** | すべてのURLをテスト済み、すべてのデータソースに完全な文書、すべての権威レベルに根拠あり | データソースが本当に利用可能であることを確保し、リンク切れや幻覚的な引用を回避 | @@ -143,9 +143,9 @@ --- -権威あるデータソースの構造化されたナレッジベースを構築しました。各データソースには完全なメタデータ、アクセスパス、権威識別子が含まれています。しかし多くのユーザーにとって、実際の課題は次のとおりです:膨大なデータソースの中から最適なものを素早く見つけるにはどうすればよいか?データソースのウェブサイトを見つけた後、複雑な公式プラットフォーム上でどのように目的のデータを正確に見つけるか?そしてこれらすべてを日常のAIワークフローにシームレスに統合するにはどうすればよいか? 
+権威あるデータソースの構造化された外部ファクト・コンテキスト・レイヤーを構築しました。各データソースには完全なメタデータ、アクセスパス、権威識別子が含まれています。しかし多くのユーザーにとって、実際の課題は次のとおりです:膨大なデータソースの中から最適なものを素早く見つけるにはどうすればよいか?データソースのウェブサイトを見つけた後、複雑な公式プラットフォーム上でどのように目的のデータを正確に見つけるか?そしてこれらすべてを日常のAIワークフローにシームレスに統合するにはどうすればよいか?
 
-**FirstData MCP**はまさにこの目的のために作られました。静的なデータソースのナレッジベースをダイナミックなインテリジェントナビゲーションシステムに変え、権威あるデータを誰もがアクセスできるようにします。
+**FirstData MCP**はまさにこの目的のために作られました。静的な外部ファクト・コンテキスト・レイヤーをダイナミックなインテリジェントナビゲーションシステムに変え、権威あるデータを誰もがアクセスできるようにします。
 
 ---
 
diff --git a/README.md b/README.md
index d853748..de96c5a 100644
--- a/README.md
+++ b/README.md
@@ -4,11 +4,11 @@
 
 ---
 
-**全球最全面、最权威、最结构化的开源数据源知识库 — Agent First**
+**面向 AI Agent 的外部事实上下文层 — Purpose-built · Authoritative · Structured**
 
-**The World's Most Comprehensive, Authoritative, and Structured Open Data Source Repository**
+**The External Facts Context Layer for AI Agents**
 
-> **Agent First**:FirstData 以 AI Agent 为第一优先用户设计。Agent 可通过标准化 Skill 自动完成注册、激活和 MCP 配置,零人工介入即可接入权威数据源知识库。
+> **Agent First**:FirstData 以 AI Agent 为第一优先用户设计。Agent 可通过标准化 Skill 自动完成注册、激活和 MCP 配置,零人工介入即可接入权威外部事实上下文。
 
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![Data Sources](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/MLT-OSS/FirstData/refs/heads/main/assets/badges/sources-count.json)](firstdata/indexes/statistics.json)
@@ -29,7 +29,7 @@
 
 ### 我们的目标:构建AI时代的可信底座
 
-本项目旨在构建一个**面向全球的、权威的、结构化的 Primary Sources 知识库**。
+本项目旨在构建一个**面向 AI Agent 的、全球的、权威的、结构化的外部事实上下文层(External Facts Context Layer)**。
 
 我们系统性发掘并聚合跨领域高可信信源——覆盖科研学术、政务公开、法律法规、公司披露与财报、标准规范与行业权威资料等——**将分散、非标、难复用的原始内容,转化为可追溯、可验证、可引用的"核心事实(Core Facts)"**,并保留完整证据链与版本历史,确保每一条结论都能"回到原文"。
 
@@ -65,7 +65,7 @@
 | 📊**结构化元数据体系** | 完整元数据标准(访问URL、API接口、权威等级、更新频率、数据内容等),不只是链接 | 机器可读、可编程访问,支持自动化证据链构建 |
 | ⭐**权威等级分类** | 政府、国际组织、研究机构、市场、商业等六类权威等级 | 科学评估数据源可信度,为AI提供质量过滤依据 |
 | 🤖**AI智能搜索** | 基于LLM驱动的数据源查询Agent,理解复杂多维度查询 | 自然语言即可获取权威数据源,无需人工筛选 |
-| 🔌**MCP协议集成** | 提供标准MCP Server,可集成到Claude Desktop、Cline等AI应用 | 让任何AI应用都能访问权威数据源知识库 |
+| 🔌**MCP协议集成** | 提供标准MCP Server,可集成到Claude Desktop、Cline等AI应用 | 让任何AI应用都能访问权威外部事实上下文层 |
 | 🤖**Agent Skill 分发** | 标准化 Skill 定义,Agent 可自动注册 token、自动配置 MCP,零人工介入 | Agent First — 让 Agent 像调用内置能力一样接入权威数据 |
 | 🌍**中英双语支持** | 所有元数据提供中英文版本 | 连接全球数据生态,打破语言壁垒 |
 | 🔍**100%验证** | 每个URL经过测试,每个数据源有完整文档,每个权威等级有依据 | 确保数据源真实可用,避免断链和幻觉引用 |
@@ -145,9 +145,9 @@
 
 ---
 
-我们构建了权威数据源的结构化知识库,每个数据源都有完整的元数据、访问路径和权威性标识。但对于大多数用户来说,真正的挑战在于:如何在海量数据源中快速找到最合适的那一个?找到了数据源网站,如何在复杂的官方平台中准确定位目标数据?如何将这一切无缝集成到日常的 AI 工作流中?
+我们构建了权威数据源的结构化外部事实上下文层,每个数据源都有完整的元数据、访问路径和权威性标识。但对于大多数用户来说,真正的挑战在于:如何在海量数据源中快速找到最合适的那一个?找到了数据源网站,如何在复杂的官方平台中准确定位目标数据?如何将这一切无缝集成到日常的 AI 工作流中?
 
-**FirstData MCP** 正是为此而生——将静态的数据源知识库转化为动态的智能导航系统,让每个人都能轻松访问权威数据。
+**FirstData MCP** 正是为此而生——将静态的外部事实上下文层转化为动态的智能导航系统,让每个人都能轻松访问权威数据。
 
 ---
 
diff --git a/docs/positioning-rollout-tracker.md b/docs/positioning-rollout-tracker.md
index c6282e0..18a431c 100644
--- a/docs/positioning-rollout-tracker.md
+++ b/docs/positioning-rollout-tracker.md
@@ -9,7 +9,11 @@
 - **Re-confirmed**: 2026-05-07 03:24 GMT+8 (after v4/v8/v9/v10 override attempts withdrawn)
 - **Base commit**: `bad47726fc50a3c7c69aaab1fae64286cb44350b`
 - **Authoritative regex**: held by @明察 in PR-2's `scripts/check-positioning-consistency.sh`
-- **Totals**: 23 hits / 22 CHANGE + 1 KEEP / 8 files
+- **Totals**: 23 hits / 22 CHANGE + 1 KEEP / 8 files (scope authority, v3 lock, @明察 SOP-7 msg `1501655057933013012`)
+- **Base commit**: `bad47726fc50a3c7c69aaab1fae64286cb44350b`
+- **Scope regex (v3 authority, 明鉴 super-wider)**: `知识库|ナレッジベース|知識ベース|オープンデータソースリポジトリ|データソースリポジトリ`
+- **Tool (independent CI gate)**: `scripts/check-positioning-consistency.sh` v7 wide → 25 hits / 23 CHANGE / 2 KEEP (whitelist: en:592, ja:592). Number diff = regex coverage delta, legitimate.
+- **Adjudication**: scope → @明察 v3 authority; tool → archived as CI gate, not scope
 
 ## Per-file breakdown (v3)
 
diff --git a/firstdata/sources/china/README.md b/firstdata/sources/china/README.md
index 7df1cb0..8018391 100644
--- a/firstdata/sources/china/README.md
+++ b/firstdata/sources/china/README.md
@@ -183,7 +183,7 @@
 ## 🏆 项目亮点
 
 ### 全球领先的深度覆盖
-FirstData 提供**全球最全面**的中国官方数据源知识库:
+FirstData 提供**全球最全面**的中国官方外部事实上下文层:
 
 - **覆盖深度**: 国家级 + 省级 + 行业
 - **元数据详细度**: 40+字段专业级
diff --git a/pyproject.toml b/pyproject.toml
index 3b5fd9a..6a3085d 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,7 +1,7 @@
 [project]
 name = "firstdata"
 version = "0.1.0"
-description = "A curated knowledge base of global authoritative open data sources"
+description = "The External Facts Context Layer for AI Agents — purpose-built, authoritative, structured data sources"
 authors = [
     { name = "mininglamp", email = "firstdata@mininglamp.com" },
 ]
diff --git a/skills/firstdata/SKILL.md b/skills/firstdata/SKILL.md
index 4a667a9..2f4f7b3 100644
--- a/skills/firstdata/SKILL.md
+++ b/skills/firstdata/SKILL.md
@@ -17,7 +17,7 @@ metadata:
 
 ## What FirstData Is
 
-FirstData is a structured knowledge base of authoritative primary data sources, covering 1000+ sources to help agents locate official origins rather than generating unverified answers.
+FirstData is the External Facts Context Layer for AI Agents — a purpose-built, authoritative collection of primary data sources, covering 1000+ sources to help agents locate official origins rather than generating unverified answers.
 
 It does not replace raw data — it acts as an "authoritative data navigator", taking vague user needs as input, recommending the most appropriate primary sources, and providing clear access paths, API information, and download methods so both users and agents can trace back to original evidence.
 
@@ -176,7 +176,7 @@ When adding or modifying MCP tool descriptions, follow these principles (based o
 
 ## Community
 
-FirstData is an open-source project — join us in building the authoritative data source knowledge base for agents:
+FirstData is an open-source project — join us in building the External Facts Context Layer for AI Agents:
 
 - ⭐ [**Star**](https://github.com/MLT-OSS/FirstData) the project to help more agents and developers discover it
 - 📝 [**Issue**](https://github.com/MLT-OSS/FirstData/issues) to report problems, suggest new data sources, or propose improvements
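The rollout tracker reports hit counts from two different patterns: the v3 scope regex and the wider v7 CI-gate pattern. It also splits each count into CHANGE and KEEP using a line whitelist. The Python code below is a hedged sketch of that accounting. It is not the actual `scripts/check-positioning-consistency.sh`, whose contents are not part of this patch.

The sketch makes these assumptions:
- Each matching line counts as one hit, as `grep -n` would report it.
- The whitelist is keyed by a short file label and a line number, mirroring the tracker's `ja:592` notation.
- File text is already loaded in memory.

`scan_scope` and `ScopeReport` are hypothetical names.

```python
import re
from dataclasses import dataclass

# v3 scope regex, copied verbatim from docs/positioning-rollout-tracker.md.
SCOPE_REGEX = re.compile(
    "知识库|ナレッジベース|知識ベース|オープンデータソースリポジトリ|データソースリポジトリ"
)


@dataclass
class ScopeReport:
    hits: int    # lines that match the pattern
    change: int  # matching lines that must be rewritten
    keep: int    # matching lines whitelisted as intentional


def scan_scope(docs, keep, pattern=SCOPE_REGEX):
    """Hypothetical helper: classify matching lines as CHANGE or KEEP.

    docs:    mapping of a short file label (e.g. "ja") to that file's text
    keep:    set of (label, line_number) pairs exempt from rewriting
    pattern: compiled regex; each matching line counts as one hit
    """
    hits = change = kept = 0
    for label, text in docs.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            if not pattern.search(line):
                continue
            hits += 1
            if (label, line_no) in keep:
                kept += 1
            else:
                change += 1
    return ScopeReport(hits=hits, change=change, keep=kept)


if __name__ == "__main__":
    # Toy input for illustration, not the real README contents.
    docs = {
        "ja": "ナレッジベースを構築\n無関係な行\nデータソースリポジトリ",
        "zh": "开源数据源知识库",
    }
    print(scan_scope(docs, keep={("ja", 3)}))
    # ScopeReport(hits=3, change=2, keep=1)
```

Each matching line is counted once. If a CI pattern's alternation is a superset of the scope regex's terms, it can only add hits over the same files. That is one way the tool's 25 hits and the scope lock's 23 could both be correct.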